Fwd: Re: Fwd: Re: [BioC] Clustering in R....

Marcus marcusb at biotech.kth.se
Thu Nov 27 14:15:13 MET 2003


>Date: Thu, 27 Nov 2003 08:12:22 +0100
>To: Marcus <marcusb at biotech.kth.se>
>From: Johan Lindberg <johanl at kiev.biotech.kth.se>
>Subject: Re: Fwd: Re: [BioC] Clustering in R....
>
>That did the trick !
>Thank you Sean for the help.
>
>Now I have some more questions. I read a tutorial on the web saying that it
>is possible to identify in a heatmap which genes are in specific clusters,
>but it didn't include any R code. In other programs, such as GeneSpring,
>you can just click on the dendrogram to get a list of the genes of
>interest. How does one perform such an operation in R? Is it possible? I
>mean, is the heatmap only for visualization, or can one look at the
>different clustered groups in some way?
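
One way to get at the cluster members is to cut the gene tree with cutree()
and then list the row names per group. The sketch below is only an
illustration: the matrix m, its gene and sample names, and the choice of
k = 4 groups are all made-up assumptions, not anything from the data
discussed in this thread.

```r
# Hypothetical example data: 20 "genes" (rows) by 6 "samples" (columns).
set.seed(1)
m <- matrix(rnorm(120), nrow = 20, ncol = 6,
            dimnames = list(paste0("gene", 1:20), paste0("sample", 1:6)))

# Cluster the genes with the same defaults heatmap() uses (dist + hclust).
gclus <- hclust(dist(m))

# Cut the gene tree into k groups; k = 4 is an arbitrary choice.
groups <- cutree(gclus, k = 4)

# List the gene names that fall into each cluster.
genes_by_cluster <- split(names(groups), groups)

# Gene names in the order they appear along the tree, which helps when
# matching a cluster to a region of the plotted dendrogram.
tree_order <- rownames(m)[gclus$order]
```

genes_by_cluster[["2"]] would then give the genes in the second group. The
order in tree_order should match the heatmap rows if the same tree is passed
as Rowv = as.dendrogram(gclus); with the default Rowv, heatmap() reorders the
tree by row means, so the order can differ.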
>
>And a question about levelplot. When you plot the correlation with
>levelplot, you do not get the names of your samples on either the x or the
>y axis. In a plot like barplot this works with the argument
>names.arg=namevector, but I haven't found anything like that for levelplot.
>Any tips, anyone?
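
For levelplot, a possible approach is to give the correlation matrix
dimnames and let lattice pick them up. This is a sketch under assumptions:
the data and the "slide" names are invented, and the behaviour described is
what I would expect from the matrix method of levelplot() in current
versions of lattice, which may differ from older releases.

```r
library(lattice)  # levelplot() lives in the lattice package

# Hypothetical data: 8 observations on 5 named samples.
set.seed(2)
x <- matrix(rnorm(40), nrow = 8, ncol = 5,
            dimnames = list(NULL, paste0("slide", 1:5)))

# cor() carries the column names over as dimnames of the correlation matrix.
cc <- cor(x)

# The matrix method of levelplot() takes its axis labels from those dimnames;
# scales = list(x = list(rot = 90)) only rotates the x labels for legibility.
p <- levelplot(cc, xlab = "", ylab = "",
               scales = list(x = list(rot = 90)))

# print(p) draws the plot on the active graphics device.
```

If the matrix has no dimnames, setting them first with
dimnames(cc) <- list(namevector, namevector) should have the same effect.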
>
>Cheers
>
>/Marcus
>
>
>At 10:09 2003-11-12 +0100, you wrote:
>
>>>Date: Tue, 11 Nov 2003 06:33:22 -0500
>>>Subject: Re: [BioC] Clustering in R....
>>>From: Sean Davis <sdavis2 at mail.nih.gov>
>>>To: Marcus <marcusb at biotech.kth.se>
>>>
>>>Marcus,
>>>
>>>Here is a fairly general method for working with heatmap that I have used.
>>>You can substitute any function that you want for distance (e.g.,
>>>1-correlation, etc.) and for clustering (you don't have to use hclust).  Make
>>>sure that you do the coercion (to distance or dendrogram objects as needed),
>>>though.  Also, some distance functions that you can dream up will not work
>>>with NAs, but dist does.
>>>
>>> > m <- matrix(rnorm(100),nrow=10,ncol=10)
>>> > m
>>>             [,1]        [,2]       [,3]        [,4]       [,5]       [,6]
>>>  [1,] -1.0326191  1.09744204  0.9923254 -0.05780237  1.6853566 -0.5938021
>>>  [2,] -0.6493561 -0.58846041  0.8735639  0.34492342 -0.1398261  1.4288108
>>>  [3,] -1.0020073  0.75130128 -2.6110435  1.27265445  0.1211387  0.7048981
>>>  [4,] -0.1658810  0.45351434 -0.8973168 -0.17738084 -0.1056792 -1.7251339
>>>  [5,]  0.1466563  0.11917823  0.9372353  0.29040600  0.8463049  0.9192848
>>>  [6,]  0.6020565 -0.90338771 -0.7453363 -1.34284821 -0.7684490  0.2177409
>>>  [7,]  0.5290555  0.58798246  0.4085396  0.63305003  0.2014624 -0.5613248
>>>  [8,]  1.4456958  0.06372875  0.1829127  0.20681971  0.5745696 -0.3555856
>>>  [9,]  0.5973093 -0.35483585  1.1074023  0.63930734 -1.2452399 -1.2721422
>>>[10,]  1.2563169  0.92249574 -0.7103717 -0.41067056  0.2277188  0.3861969
>>>              [,7]       [,8]       [,9]       [,10]
>>>  [1,] -1.63852314 -1.0773165  0.5601368  1.05115476
>>>  [2,] -0.14026278 -0.9013605  0.1581475  0.36730440
>>>  [3,]  0.45517561 -1.5211124 -1.1641732  1.97321531
>>>  [4,]  0.08338336  1.4846938  0.3096862  0.44513675
>>>  [5,]  0.85917332  1.0337033 -0.1784938 -0.48848017
>>>  [6,]  0.05054810  1.3712665 -0.6545246  0.10251154
>>>  [7,]  2.30894410 -0.6089214  1.5761573  0.66912925
>>>  [8,] -0.85946317  0.0855971 -0.7014037 -2.19050881
>>>  [9,]  1.53911617  1.1185075  0.2428764 -0.09556405
>>>[10,] -1.61446618  1.0605298  0.5160358  0.04152571
>>> > m[10,1:8] <- NA
>>> > m
>>>             [,1]        [,2]       [,3]        [,4]       [,5]       [,6]
>>>  [1,] -1.0326191  1.09744204  0.9923254 -0.05780237  1.6853566 -0.5938021
>>>  [2,] -0.6493561 -0.58846041  0.8735639  0.34492342 -0.1398261  1.4288108
>>>  [3,] -1.0020073  0.75130128 -2.6110435  1.27265445  0.1211387  0.7048981
>>>  [4,] -0.1658810  0.45351434 -0.8973168 -0.17738084 -0.1056792 -1.7251339
>>>  [5,]  0.1466563  0.11917823  0.9372353  0.29040600  0.8463049  0.9192848
>>>  [6,]  0.6020565 -0.90338771 -0.7453363 -1.34284821 -0.7684490  0.2177409
>>>  [7,]  0.5290555  0.58798246  0.4085396  0.63305003  0.2014624 -0.5613248
>>>  [8,]  1.4456958  0.06372875  0.1829127  0.20681971  0.5745696 -0.3555856
>>>  [9,]  0.5973093 -0.35483585  1.1074023  0.63930734 -1.2452399 -1.2721422
>>>[10,]         NA          NA         NA          NA         NA         NA
>>>              [,7]       [,8]       [,9]       [,10]
>>>  [1,] -1.63852314 -1.0773165  0.5601368  1.05115476
>>>  [2,] -0.14026278 -0.9013605  0.1581475  0.36730440
>>>  [3,]  0.45517561 -1.5211124 -1.1641732  1.97321531
>>>  [4,]  0.08338336  1.4846938  0.3096862  0.44513675
>>>  [5,]  0.85917332  1.0337033 -0.1784938 -0.48848017
>>>  [6,]  0.05054810  1.3712665 -0.6545246  0.10251154
>>>  [7,]  2.30894410 -0.6089214  1.5761573  0.66912925
>>>  [8,] -0.85946317  0.0855971 -0.7014037 -2.19050881
>>>  [9,]  1.53911617  1.1185075  0.2428764 -0.09556405
>>>[10,]          NA         NA  0.5160358  0.04152571
>>> > sampdist=dist(t(m))
>>> > sclus=hclust(sampdist) # sclus is an hclust tree that you can plot(sclus)
>>> > genedist=dist(m)
>>> > gclus=hclust(genedist) # gclus is also an hclust tree, not yet a dendrogram
>>> > heatmap(m,Rowv=gclus,Colv=sclus) #this doesn't work!
>>>Error in lV + rV : non-numeric argument to binary operator
>>> > heatmap(m,Rowv=as.dendrogram(gclus),Colv=as.dendrogram(sclus)) # needs proper coercion to work
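
The substitution mentioned above (1-correlation in place of Euclidean
distance) might look like the following sketch. The helper name cordist, the
random data, the two NA positions, and the choice of average linkage are all
assumptions made for illustration.

```r
# Hypothetical data: 10 genes by 10 samples with a few missing values.
set.seed(3)
m <- matrix(rnorm(100), nrow = 10, ncol = 10)
m[2, 3] <- NA
m[7, 9] <- NA

# 1 - Pearson correlation between the rows of x, returned as a "dist" object.
# use = "pairwise.complete.obs" skips NA pairs; this helper is illustrative.
cordist <- function(x) {
  as.dist(1 - cor(t(x), use = "pairwise.complete.obs"))
}

genedist <- cordist(m)      # distances between genes (rows)
sampdist <- cordist(t(m))   # distances between samples (columns)

# hclust() returns "hclust" objects; heatmap() wants dendrograms.
gdend <- as.dendrogram(hclust(genedist, method = "average"))
sdend <- as.dendrogram(hclust(sampdist, method = "average"))

heatmap(m, Rowv = gdend, Colv = sdend)
```

Note that pairwise-complete correlation can itself return NA when two rows
share too few non-missing columns, in which case hclust() will fail again.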
>>>
>>>Although this works, note that using a gene that has 16 NA values and only
>>>2 M-values is probably not going to be useful, as the distance matrix for
>>>the genes in this example is:
>>>
>>> > genedist
>>>           1        2        3        4        5        6        7        8
>>>2  3.673241
>>>3  5.235695 4.536603
>>>4  4.381494 4.522069 5.046200
>>>5  4.367649 2.821795 5.437622 3.688942
>>>6  5.408318 3.863713 5.380546 3.014530 3.345877
>>>7  4.764409 3.915998 5.194822 3.911820 3.548220 4.830247
>>>8  4.825510 4.216357 6.212646 4.149383 3.314914 3.844966 5.041345
>>>9  5.536079 4.169987 6.179576 3.158424 3.249127 3.637840 3.149486 4.264858
>>>10 2.259752 1.082164 5.724739 1.013612 1.953558 2.621002 2.754763 5.685128
>>>            9
>>>2
>>>3
>>>4
>>>5
>>>6
>>>7
>>>8
>>>9
>>>10 0.6834093
>>>
>>>See how different the distances involving row 10 are from the others: the
>>>NA values were simply dropped.  You will probably have to either deal with
>>>the missing values beforehand or use another distance measure that is not
>>>sensitive to NA values.  I can't tell you what to do on that part, as that
>>>is also somewhat dependent on your need to use that gene and the
>>>practicality of doing more experiments.
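
Dealing with the missing values beforehand could be as simple as dropping
rows that are mostly NA before clustering. The sketch below assumes a
22-by-18 matrix like the one described further down (row 22 with 16 NAs and
2 values); the random data and the 50% cutoff are arbitrary assumptions.

```r
# Hypothetical M-value matrix: 22 spots by 18 slides, last row mostly NA.
set.seed(4)
mymatrix <- matrix(rnorm(22 * 18), nrow = 22, ncol = 18)
mymatrix[22, 1:16] <- NA

# Fraction of missing values per row (gene/spot).
na_frac <- rowMeans(is.na(mymatrix))

# Keep rows with at most 50% missing; the cutoff is an arbitrary assumption.
keep <- na_frac <= 0.5
filtered <- mymatrix[keep, , drop = FALSE]

# Cluster only the retained rows.
gclus <- hclust(dist(filtered))
```

Whether to drop, impute, or re-measure such a gene remains a judgment call
that depends on how much the gene matters to the experiment.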
>>>
>>>Let me know if that helps.
>>>
>>>Sean
>>>
>>>
>>>On 11/10/03 7:41 AM, "Marcus" <marcusb at biotech.kth.se> wrote:
>>>
>>> >
>>> >
>>> >> Hello again. Back from some weeks of laboratory work, I still have some
>>> >> questions on clustering in R.
>>> >>
>>> >> I got a lot of help from Sean Davis (thanks a lot :o)  ) so if he or
>>> >> someone else has the time....
>>> >>
>>> >> My problem is that I have some spots flagged as NA in a matrix of M-values
>>> >> organised slidewise. I want to cluster those, but I get error messages when
>>> >> using heatmap due to the NAs in the matrix. I mailed Andy Liaw (who wrote
>>> >> the heatmap function) and he gave me the tip to look into the daisy
>>> >> function, which is supposed to handle NAs.
>>> >>
>>> >> But what do you get out of the function?
>>> >>
>>> >> test <- daisy(mymatrix)
>>> >> This creates an object of type dissimilarity, right? And you can convert it
>>> >> into a matrix with the help of
>>> >> testII <- as.matrix(test)
>>> >> Is this what I should use hclust on? Or should I do
>>> >> testIII <- as.dist(testII) first? Neither works, so I do not really know
>>> >> which is right.
>>> >>
>>> >> I also tried to use daisy directly with heatmap, but that didn't work; it
>>> >> produced the same error as with dist.
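
On the daisy question: as far as I know, the object daisy() returns has class
c("dissimilarity", "dist"), so hclust() should accept it either directly or
after the as.matrix()/as.dist() round trip. A minimal sketch, with invented
data and only a couple of NAs so that every pair of rows still shares some
observed columns:

```r
library(cluster)  # daisy() is in the cluster package

# Hypothetical data: 10 rows by 6 columns with two missing values.
set.seed(5)
mymatrix <- matrix(rnorm(60), nrow = 10, ncol = 6)
mymatrix[3, 2] <- NA
mymatrix[8, 5] <- NA

# Euclidean dissimilarities between rows; NAs are handled pairwise.
test <- daisy(mymatrix)

# Either route gives hclust() a valid distance object.
hc_direct <- hclust(test)
hc_roundtrip <- hclust(as.dist(as.matrix(test)))
```

If a row has so many NAs that it shares no observed column with some other
row, the corresponding dissimilarity is itself NA and hclust() will still
stop with an error.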
>>> >>
>>> >> heatmap(mymatrix[1:22,], distfun = dist)
>>> >> Error in hclustfun(distfun(x)) : NA/NaN/Inf in foreign function call (arg 11)
>>> >> This is due to the fact that I only have 2 M-values in the twenty-second
>>> >> row and 16 NAs.
>>> >>
>>> >> So basically my question is: how do you get heatmap to work with a
>>> >> matrix of M-values that has spots flagged NA in it? What distance
>>> >> function works, and how do you use it?
>>> >>
>>> >> Could someone please help me and perhaps write an example of how to do it? I
>>> >> think the help files are not so good in this respect.
>>> >>
>>> >> Best regards
>>> >>
>>> >> / Marcus
>>> >
>>> > *******************************************************************************************
>>> > Marcus Gry Björklund
>>> >
>>> > Royal Institute of Technology
>>> > AlbaNova University Center
>>> > Stockholm Center for Physics, Astronomy and Biotechnology
>>> > Department of Molecular Biotechnology
>>> > 106 91 Stockholm, Sweden
>>> >
>>> > Phone (office): +46 8 553 783 45
>>> > Fax: + 46 8 553 784 81
>>> > Visiting address: Roslagstullsbacken 21, Floor 3
>>> > Delivery address: Roslagsvägen 30B
>>> >
>>
>
>*******************************************************************************************
>Johan Lindberg
>Royal Institute of Technology
>AlbaNova University Center
>Stockholm Center for Physics, Astronomy and Biotechnology
>Department of Molecular Biotechnology
>106 91 Stockholm, Sweden
>
>Phone (office): +46 8 553 783 45
>Fax: + 46 8 553 784 81
>Visiting address: Roslagstullsbacken 21, Floor 3
>Delivery address: Roslagsvägen 30B
>******************************************************************************************* 
>
>

*******************************************************************************************
Marcus Gry Björklund

Royal Institute of Technology
AlbaNova University Center
Stockholm Center for Physics, Astronomy and Biotechnology
Department of Molecular Biotechnology
106 91 Stockholm, Sweden

Phone (office): +46 8 553 783 45
Fax: + 46 8 553 784 81
Visiting address: Roslagstullsbacken 21, Floor 3
Delivery address: Roslagsvägen 30B


