[BioC] clustering in R

James W. MacDonald jmacdon at uw.edu
Tue Oct 23 16:36:03 CEST 2012


Hi Priya,

On 10/23/2012 3:34 AM, priya [guest] wrote:
> I have an RMA-normalized gene expression dataset with 22810 rows and 9 columns (types of promoters); a subset of the data is as follows:
>
>      ID_REF GSM362180    GSM362181  GSM362188    GSM362189  GSM362192
>      244901 5.094871713 4.626623079 4.554272515 4.748604391 4.759221647
>      244902 5.194528083 4.985930299 4.817426064 5.151654407 4.838741605
>      244903 5.412329253 5.352970877 5.06250609  5.305709079 8.365082403
>      244904 5.529220594 5.28134657  5.467445095 5.62968933  5.458388909
>      244905 5.024052699 4.714631878 4.792865831 4.843975286 4.657188246
>      244906 5.786557533 5.242403911 5.060605782 5.458148567 5.890061836
>
> I want to do a clustering of the above and tried the hierarchical clustering:
>
>      d<- dist(as.matrix(deg), method = "euclidean")
> where deg is a matrix of the differentially expressed genes (4300 in number). I get the following warning:
>
>        Warning message:
>       In dist(as.matrix(deg), method = "euclidean") : NAs introduced by coercion
>
>   Is it all right to proceed with the clustering in spite of the warning?

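Well, you shouldn't get that warning if your matrix is all numeric. If 
any column is non-numeric (an ID column, say), as.matrix() will coerce 
the whole thing to character, and dist() then ends up with NAs when it 
converts back to numeric. So I would check what is actually in 'deg' 
before going any further with the clustering.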
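
As a rough sketch of that check: the toy data frame below stands in for 
'deg', and its values and the "_at" probe IDs are made up for 
illustration, since I don't know what the real object looks like. The 
idea is to find the non-numeric columns, move the IDs into the row 
names, and hand dist() a purely numeric matrix.

```r
# Toy stand-in for 'deg' (hypothetical IDs and values, not the real data)
deg <- data.frame(
  ID_REF    = c("244901_at", "244902_at", "244903_at"),
  GSM362180 = c(5.09, 5.19, 5.41),
  GSM362181 = c(4.63, 4.99, 5.35),
  stringsAsFactors = FALSE
)

# Which columns are numeric, and which are not?
is_num <- vapply(deg, is.numeric, logical(1))
non_numeric <- names(deg)[!is_num]
print(non_numeric)

# as.matrix() on a mixed data frame yields a character matrix
print(mode(as.matrix(deg)))

# Keep only the numeric columns and carry the IDs as row names
mat <- as.matrix(deg[, is_num, drop = FALSE])
rownames(mat) <- deg$ID_REF

# Sanity checks before computing distances
stopifnot(is.numeric(mat), !anyNA(mat))

d <- dist(mat, method = "euclidean")
```

If the sanity checks pass on the real data, the warning should go away, 
and the row names will become the dendrogram labels.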

>
>
>      hc<- hclust(d)
>      plot(hc, hang = -0.01, cex = 0.7)
>
> I get a dendrogram which is very dense, and the labels are not clear. I also do not know which of the 9 promoters are classified in the tree for the several genes. How would it be possible to label the tree with the promoters, and how can I visualize the genes in a clearer dendrogram? There are around 4300 genes, and I would like a dendrogram that is easier to read.

That is a lot of genes, so you will have to make the dendrogram really 
big if you actually want to see things. The best thing to do IMO is to 
put it in a pdf of the correct size, and then you can zoom in and look 
at different regions. It would probably be easiest to make the pdf 
really wide, so something like

# width and height are in inches; a very wide page spreads the leaves out
pdf("dendrogram.pdf", width = 200, height = 8)
plot(hc, hang = -0.01, cex = 0.7)
dev.off()

As for the promoters being classified by the tree, I am not sure what 
you are asking. If it is simply a labeling issue, note that your 'hc' 
object is a list with a 'labels' member that contains whatever is going 
to be used in labeling the dendrogram. If you want to change what the 
labels are, then you can modify that.
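
A minimal sketch of changing the labels, using a small random matrix in 
place of the real data. The names 'mat' and 'new_labels' and all of the 
values are hypothetical. The one thing to watch is that the replacement 
labels must be in the same order as the rows that went into dist().

```r
# Toy numeric matrix standing in for the expression data (made-up values)
set.seed(1)
mat <- matrix(rnorm(30, mean = 5), nrow = 6,
              dimnames = list(paste0("probe", 1:6), paste0("GSM", 1:5)))

hc <- hclust(dist(mat))

# By default the labels are the row names of the matrix given to dist()
print(hc$labels)

# Hypothetical replacement labels, in the same order as the rows of 'mat'
new_labels <- c("geneA", "geneB", "geneC", "geneD", "geneE", "geneF")
stopifnot(length(new_labels) == length(hc$labels))
hc$labels <- new_labels

plot(hc, hang = -0.01, cex = 0.7)
```

Alternatively, plot() for an hclust object takes a 'labels' argument, 
so you can pass the new labels there and leave 'hc' untouched.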

Best,

Jim



-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


