[BioC] Problem in K-means clustering
Sonali Arora
sarora at fhcrc.org
Fri Aug 22 18:38:20 CEST 2014
Hi Aditya,
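You have a typo in your code. It should be
i <- c(2) instead of i<c(2)
Without the "-", i<c(2) only compares i with 2. It does not assign anything, so i keeps whatever value it had before. Presumably that value was not less than kmax, so the while loop never ran, km stayed all NA, and plot() stopped with the "need finite 'ylim' values" error.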
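A small sketch of what the typo does. It assumes, hypothetically, that i was left over from earlier work with the value 40. The real leftover value is unknown, but any value of at least kmax behaves the same way.

```r
# Hypothetical leftover value: assume i already existed from earlier work
i <- 40
kmax <- 32                # e.g. nrow(mtcars)
i < c(2)                  # a comparison: evaluates to FALSE and leaves i unchanged
km <- rep(NA, kmax - 1)
while (i < kmax) {        # 40 < 32 is FALSE, so the body never runs
  km[i] <- 0
  i <- i + 1
}
all(is.na(km))            # TRUE: nothing was stored, so there is nothing to plot
msg <- tryCatch(suppressWarnings(plot(2:kmax, km)),
                error = function(e) conditionMessage(e))
msg                       # the same "need finite 'ylim' values" error
i <- c(2)                 # the intended assignment
```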
The following ran without problems for me (using the built-in mtcars data set in place of your data):
>rm(list=ls())
>dat2<- mtcars
>kmax<-c(100)
># If there are fewer than 100 genes or arrays,
># make the maximum number of clusters equal to
># the number of genes or arrays
>if(nrow(dat2)<100) {
+ kmax<-nrow(dat2)
+}
># Create an empty vector for storing the
># within SS values
>km<-rep(NA, (kmax-1))
># Minimum number of cluster is 2
>i<-c(2)
># Test numbers of clusters from 2 up to kmax - 1
># (at most 99) using a while loop
>while(i<kmax) {
+ km[i]<-sum(kmeans(dat2, i, iter.max=20000,
+ nstart=10)$withinss)
+ # Terminate the run if the change in within SS is
+ # less than 1%
+ if(i>=3 & km[i-1]/km[i]<=1.01) {
+ i<-kmax
+ } else {
+ i<-i+1
+ }
+}
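># Plot the number of clusters K against the within SS.
># km[k] holds the value for k clusters (km[1] is unused),
># so pair k = 2, ..., kmax-1 with km[-1]
>plot(2:(kmax-1), km[-1], xlab="K", ylab="sum(withinss)", type="b",
+ pch="+", main="Terminated when change less than 1%")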
>sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: i386-w64-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets
[6] methods base
loaded via a namespace (and not attached):
[1] tools_3.1.1
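For reuse, the same loop could be wrapped in a function. The sketch below is only an illustration: elbow_withinss is a hypothetical name, not something from the tutorial or from a package. It keeps the 1% stopping rule, stores each k next to its within SS so the plot pairs them correctly, and calls set.seed() because kmeans() uses random starting centres.

```r
# Hypothetical helper, not from the tutorial: total within-cluster SS for
# k = 2, 3, ... using the same 1% stopping rule as the loop above.
elbow_withinss <- function(x, kmax = min(100, nrow(x) - 1), tol = 0.01,
                           nstart = 10, iter.max = 20000) {
  stopifnot(kmax >= 2)
  ks <- integer(0)
  wss <- numeric(0)
  for (k in 2:kmax) {
    fit <- kmeans(x, centers = k, iter.max = iter.max, nstart = nstart)
    ks <- c(ks, k)
    wss <- c(wss, fit$tot.withinss)    # same value as sum(fit$withinss)
    n <- length(wss)
    # Stop once WSS(k-1) / WSS(k) <= 1 + tol, i.e. WSS barely dropped
    if (n >= 2 && wss[n - 1] / wss[n] <= 1 + tol) break
  }
  data.frame(k = ks, withinss = wss)   # each k stored next to its WSS
}

set.seed(1)                            # kmeans() uses random starting centres
res <- elbow_withinss(mtcars)
plot(res$k, res$withinss, xlab = "K", ylab = "sum(withinss)", type = "b",
     pch = "+", main = "Terminated when change less than 1%")
```

Returning a data frame rather than a partly filled vector avoids both the NA padding and any index bookkeeping when plotting.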
In future, please include a script that people can copy and paste into their R session, along with the output of sessionInfo(). This helps others reproduce your problem exactly and troubleshoot it.
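One common way to include the data itself in such a script is base R's dput(), which prints R code that rebuilds an object. A minimal sketch, with mtcars standing in for the real expression data:

```r
dat2 <- mtcars                  # stand-in for your own data object
small <- head(dat2, 5)          # a small subset is usually enough to show the problem
dput(small)                     # prints R code that recreates 'small'
sessionInfo()                   # paste this output together with the script
```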
Thanks and Regards,
Sonali.
On 8/22/2014 9:02 AM, Aditya Saxena wrote:
> Dear List,
>
> I am working on Affymetrix HGU133Plus2 chip data (GSE23343 from GEO) and
> want to find the optimal number of clusters.
>
> I am following http://koti.mbnet.fi/tuimala/oppaat/r2.pdf, page 113, which
> contains the following code. When I tried it, I did not get a graph;
> instead I received the following error.
>
> Could you please tell me where I am making a mistake?
>
> kmax<-c(100)
>> if(nrow(dat2)<100) {
> + kmax<-nrow(dat2)
> + }
> km<-rep(NA,(kmax-1))
>> i<c(2)
> while(i<kmax){
> + km[i]<-sum(kmeans(dat2,i,iter.max=20000,nstart=10)$withinss)
> + if(i>=3 & km[i-1]/km[i]<=1.01){
> + i<-kmax
> + } else {
> + i<-i+1
> + }
> + }
> plot(2:kmax,km,xlab="K",ylab="sum(withinss)",type="b",pch="+",main="Terminated when change less than 1%")
>
> Error in plot.window(...) : need finite 'ylim' values
> In addition: Warning messages:
> 1: In min(x) : no non-missing arguments to min; returning Inf
> 2: In max(x) : no non-missing arguments to max; returning -Inf
>
> Many thanks,