Cross Validated Asked by Matin Kh on December 27, 2020

I have a dataset with 200K samples (cases) and 30 variables. Every distance-based clustering or dimension reduction method that I try, such as *DBSCAN*, *hierarchical clustering*, *LLE*, *Isomap*, and so on, fails to run on my machine (normally I get an `R Session Terminated` error) because of the large distance matrix being generated. (*Distance calculation requires O(n^2) time and space.*)
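As a rough estimate of the scale involved, assume each distance is stored as an 8-byte double (an assumption; the storage type is not stated above). A full distance matrix for $n = 200{,}000$ samples would then need about

$$
n^2 \times 8\ \text{bytes} = (2\times10^{5})^2 \times 8\ \text{bytes} = 3.2\times10^{11}\ \text{bytes} \approx 320\ \text{GB},
$$

and storing only one triangle of the symmetric matrix still needs about

$$
\frac{n(n-1)}{2} \times 8\ \text{bytes} \approx 1.6\times10^{11}\ \text{bytes} \approx 160\ \text{GB}.
$$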

Is there any solution to this problem? Is there a good package in R or Matlab for the clustering or dimensionality reduction methods mentioned above that can handle data of this size?

Maybe you could try Mini-Batch K-Means. It never builds the n-by-n distance matrix; each sample is only compared with the k centroids. I have Matlab code for it:

```
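function [c,counts,idx] = mbkmeans(x,k,c,counts)
% One mini-batch k-means pass over the rows of x (N-by-D) with k clusters.
% Optional c (k-by-D centroids) and counts (k-by-1) continue an earlier run.
% knnsearch requires the Statistics and Machine Learning Toolbox.
[N,D] = size(x);
if ~exist('c','var') || isempty(c)
    % Initialise centroids from the first rows of x plus a little noise
    c = x(1:min([k N]),:) + bsxfun(@times,randn(min([k N]),D)*0.001,std(x));
    if N < k
        % Fewer samples than clusters: draw the rest around the data mean
        c(N+1:k,:) = bsxfun(@plus,mean(x),bsxfun(@times,randn(k-N,D),std(x)));
    end
end
if ~exist('counts','var') || isempty(counts)
    counts = zeros(k,1);
end
idx = knnsearch(c,x,'k',1);            % nearest centroid for each sample
add = full(sparse(idx,1,1));           % samples per centroid in this batch
counts(idx) = counts(idx) + add(idx);  % update the cumulative counts
lr = 1 ./ counts(idx);                 % per-sample learning rate
for i = 1:N
    % Move the assigned centroid toward sample i
    c(idx(i),:) = (1 - lr(i)) * c(idx(i),:) + lr(i) * x(i,:);
end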
```
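The update in the final loop can be written compactly. The notation below is introduced here only for illustration: $c_j$ is the centroid nearest to sample $x_i$, and $n_j$ is the value of `counts(j)` after the current batch has been added.

$$
c_j \leftarrow (1-\eta_j)\,c_j + \eta_j\,x_i, \qquad \eta_j = \frac{1}{n_j}
$$

The step size $\eta_j$ shrinks as a centroid accumulates samples, so later batches move it less. If $n_j$ were incremented one sample at a time, this rule would reproduce the running mean of the samples assigned to centroid $j$. The work per batch is roughly proportional to $N \times k$ distance evaluations rather than $N^2$.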

Usage:

```
clusters = mbkmeans(yourdata,numberofclusters); %clusters is a k-by-D matrix of centroids
```

You can pass your entire dataset in a single call and you're done. Or you can feed it smaller subsets one at a time, passing the centroids and counts returned by each call into the next. In this case, use it like this:

```
[c1, counts1] = mbkmeans(subset1,numberofclusters);
[c2, counts2] = mbkmeans(subset2,numberofclusters, c1, counts1); %start clustering using previously created clusters
[c3, counts3] = mbkmeans(subset3,numberofclusters, c2, counts2);
...
[cn, countsn, indices] = mbkmeans(subsetn,numberofclusters, c(n-1), counts(n-1)); %c(n-1), counts(n-1) stand for the outputs of the previous call
```
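The chained calls above can also be written as a loop. The following is a minimal sketch, not part of the original code: it assumes `mbkmeans.m` from above is on the MATLAB path, and the random matrix `x`, the values of `k` and `chunkSize`, and the shuffling step are all illustrative choices.

```matlab
% Sketch: one pass over the data in chunks, reusing centroids and counts.
% Assumes mbkmeans.m (the function above) is on the MATLAB path.
x = randn(50000, 30);         % stand-in for your data (N-by-D); replace with your own
k = 10;                       % illustrative number of clusters
chunkSize = 10000;            % illustrative chunk size
N = size(x, 1);
order = randperm(N);          % shuffle so each chunk is a random sample
c = [];                       % empty -> mbkmeans initialises the centroids
counts = [];                  % empty -> mbkmeans starts the counts at zero
for first = 1:chunkSize:N
    last = min(first + chunkSize - 1, N);
    batch = x(order(first:last), :);
    [c, counts] = mbkmeans(batch, k, c, counts);
end
labels = knnsearch(c, x);     % assign every row to its nearest final centroid
```

The last line is there because the `idx` output of each call only covers the subset passed to that call, and reflects the centroids before that call's update. A final `knnsearch` against the finished centroids gives one label per row of the full dataset.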

For R, there is the `stream` package.

Answered by rcpinto on December 27, 2020
