[BioC] Problem in K- means clustering

Sonali Arora sarora at fhcrc.org
Fri Aug 22 18:38:20 CEST 2014


Hi Aditya,

You have a typo in your code. It should be:

i <- c(2) instead of   i<c(2)
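
To illustrate the difference, here is a small sketch. The starting value of 100 for i is an assumption, standing in for whatever was left in the workspace from an earlier run. R parses i<c(2) as the comparison i < 2, so nothing is assigned:

```r
# Assumption: 'i' is left over from an earlier run (set to 100 by hand here).
i <- 100

# 'i<c(2)' is parsed as the comparison i < 2; it does not change i.
cmp <- i<c(2)
print(cmp)   # [1] FALSE
print(i)     # [1] 100

# 'i <- c(2)' is the assignment that was intended.
i <- c(2)
print(i)     # [1] 2
```

One plausible explanation for the error you saw: if i already held a value that is not below kmax, the body of the while loop never runs, km stays all NA, and plot() then fails with the 'ylim' error. If i did not exist at all, the comparison itself would stop with an "object 'i' not found" error.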

I had no problem running the following:

>rm(list=ls())
>dat2<- mtcars
>kmax<-c(100)
># If there are fewer than 100 genes or arrays,
># make the max. no. of clusters equal to the
># number of genes or arrays
>if(nrow(dat2)<100) {
+     kmax<-nrow(dat2)
+}
># Create an empty vector for storing the
># within SS values
>km<-rep(NA, (kmax-1))
># Minimum number of clusters is 2
>i<-c(2)
># Test all numbers of clusters between 2 and
># max. 100 using the while-loop
>while(i<kmax) {
+     km[i]<-sum(kmeans(dat2, i, iter.max=20000,
+                       nstart=10)$withinss)
+     # Terminate the run if the change in within SS is
+     # less than 1%
+     if(i>=3 & km[i-1]/km[i]<=1.01) {
+         i<-kmax
+     } else {
+         i<-i+1
+     }
+}
># Plot the number of K against the within SS
>plot(2:kmax, km, xlab="K", ylab="sum(withinss)", type="b",
+      pch="+", main="Terminated when change less than 1%")
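
As a side note, in the code above km[1] is always NA and km is plotted against 2:kmax, so each within SS value appears one position to the right of its K. The following is only a sketch of one way to keep K and the within SS values aligned; the names ks, wss and done, the for loop, and the fixed seed are my own assumptions and are not part of the tutorial code:

```r
# Sketch only: 'dat2' is the built-in mtcars data, as in the run above.
dat2 <- mtcars
kmax <- min(100, nrow(dat2))    # cap at the number of rows
ks   <- 2:(kmax - 1)            # same range of K as while(i < kmax)
wss  <- rep(NA_real_, length(ks))

set.seed(1)                     # assumption: fixed seed for repeatable starts
for (j in seq_along(ks)) {
    fit <- kmeans(dat2, centers = ks[j], iter.max = 20000, nstart = 10)
    wss[j] <- fit$tot.withinss  # same value as sum(fit$withinss)
    # Stop when the within SS improved by less than 1% over the previous K
    if (j >= 2 && wss[j - 1] / wss[j] <= 1.01) break
}

# Plot only the values of K that were actually computed
done <- !is.na(wss)
if (any(done)) {
    plot(ks[done], wss[done], xlab = "K", ylab = "sum(withinss)",
         type = "b", pch = "+", main = "Terminated when change less than 1%")
}
```

Plotting only the computed entries also avoids the "need finite 'ylim' values" error in the case where nothing was computed.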


>sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets
[6] methods   base

loaded via a namespace (and not attached):
[1] tools_3.1.1


In future, please attach a script that people can copy and paste into their R session,
along with the output of sessionInfo(). It will help people replicate your problem exactly and troubleshoot it.
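
A hypothetical skeleton for such a script is sketched below. The use of mtcars and the kmeans() call are placeholders I chose for illustration; replace them with the smallest piece of your own code that still shows the problem:

```r
# Hypothetical skeleton for a report; replace the body with the failing code.
dat2 <- mtcars                 # built-in data, so readers need no extra files
set.seed(1)                    # make random starts repeatable
fit <- kmeans(dat2, centers = 3, nstart = 10)
print(fit$size)                # show enough output to describe what happened
sessionInfo()                  # R version, platform and loaded packages
```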

Thanks and Regards,
Sonali.


On 8/22/2014 9:02 AM, Aditya Saxena wrote:
> Dear List,
>
> I am working on Affymetrix's HGU133Plus2 chip data GSE23343 from GEO and
> want to find the optimal number of clusters.
>
> I am following http://koti.mbnet.fi/tuimala/oppaat/r2.pdf, where the code
> below is given on page 113. When I tried it, I did not get a graph but
> received the error shown after the code.
>
> Please suggest where I am making a mistake.
>
> > kmax<-c(100)
> > if(nrow(dat2)<100) {
> + kmax<-nrow(dat2)
> + }
> > km<-rep(NA,(kmax-1))
> > i<c(2)
> > while(i<kmax){
> + km[i]<-sum(kmeans(dat2,i,iter.max=20000,nstart=10)$withinss)
> + if(i>=3 & km[i-1]/km[i]<=1.01){
> + i<-kmax
> + } else {
> + i<-i+1
> + }
> + }
> > plot(2:kmax,km,xlab="K",ylab="sum(withinss)",type="b",pch="+",main="Terminated when change less than 1%")
>
> Error in plot.window(...) : need finite 'ylim' values
> In addition: Warning messages:
> 1: In min(x) : no non-missing arguments to min; returning Inf
> 2: In max(x) : no non-missing arguments to max; returning -Inf
>
> Many thanks,
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


