[BioC] How to do clustering

Thomas Girke thomas.girke at ucr.edu
Wed Jun 27 10:51:49 CEST 2007


Alex,
If you post a message on a new topic to this list, then please start a
new email thread instead of replying to an old thread that deals with a 
different topic.

The answer to your question is that the method for accessing the Wilcoxon
p-values from mas5calls changed with the latest Bioconductor release (2.0)
from

	se.exprs(eset_pma)
	
	to
	
	assayDataElement(eset_pma, "se.exprs")

In the provided example you would type the following:
	
	my_frame <- data.frame(
		exprs(eset_rma),
		exprs(eset_pma),
		assayDataElement(eset_pma, "se.exprs"))

I have now made this change in the exercise code you are referring to.
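
As a minimal, self-contained sketch of the new accessor: the toy object below
only stands in for the real mas5calls() result, and its probe names, sample
names and values are made up for illustration. It assumes the Biobase package
is installed.

```r
library(Biobase)

# Made-up stand-ins for the P/M/A calls and their Wilcoxon p-values.
calls <- matrix(c("P", "A", "M", "P"), nrow = 2,
                dimnames = list(c("probe1", "probe2"), c("s1", "s2")))
pvals <- matrix(c(0.01, 0.60, 0.05, 0.02), nrow = 2,
                dimnames = dimnames(calls))

# Toy ExpressionSet with the calls in 'exprs' and the p-values in 'se.exprs'.
eset_pma <- new("ExpressionSet", exprs = calls, se.exprs = pvals)

# Retrieve the p-values by element name and combine them with the calls.
p <- assayDataElement(eset_pma, "se.exprs")
my_frame <- data.frame(exprs(eset_pma), p)
```

With the real exercise objects, only the last two statements are needed.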

Best,

Thomas



On Tue 06/26/07 21:58, ssls sddd wrote:
> Hi Thomas,
> 
> I have another question and need your help. I followed the link
> http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/R_BioCondManual.html#biocon_limmaaffy
> and tried the code presented in the session of 'BioConductor Exercises'.
> 
> I first downloaded the 'workshop.zip' file and unpacked the files to my
> computer. I also tried six files of my own Affy arrays but found the code
> would not work with *.CEL files. After I manually changed .CEL to .cel,
> the code ran fine.
> 
> The problem is that when I ran the code:
> 
> my_frame <- data.frame(exprs(eset_rma), exprs(eset_pma), se.exprs(eset_pma))
> # Combine RMA intensities, P/M/A calls plus their Wilcoxon p-values in one
> # data frame.
> 
> The error message popped up as:
> 
> >my_frame <- data.frame(exprs(eset_rma), exprs(eset_pma), 
> >se.exprs(eset_pma))
> 
> error in function (classes, fdef, mtable)  :
>        unable to find an inherited method for function "se.exprs", for
> signature "ExpressionSet"
> >
> 
> This also happened with the files from 'workshop.zip'. Can you suggest how
> to correct this?
> 
> 
> Thanks a lot!
> 
> Sincerely,
> 
> Alex
> 
> 
> On 6/19/07, Thomas Girke <thomas.girke at ucr.edu> wrote:
> >
> >Alex,
> >
> >I guess Martin answered your question.
> >
> >A similar result, but with slower computation, can be obtained by applying
> >the IQR function like this:
> >
> >        apply(iris[,1:3], 1, IQR)
> >
> >Thomas
> >
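
A base-R sketch of the IQR filter discussed here, using the built-in iris
data as a stand-in for an expression matrix. The 75th-percentile cut-off
follows the tutorial approach quoted further down; the variable names are
illustrative only.

```r
# iris stands in for the real data; rows play the role of genes/SNPs.
x <- as.matrix(iris[, 1:4])                  # coerce data frame to numeric matrix
iqrs <- apply(x, 1, IQR)                     # one interquartile range per row
keep <- iqrs > quantile(iqrs, probs = 0.75)  # rows in the top quarter by IQR
xsub <- x[keep, , drop = FALSE]              # keep only the most variable rows
dim(xsub)
```

The explicit as.matrix() call matters because row-wise helpers such as rowQ
are defined for matrices, not data frames.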
> >On Tue 06/19/07 21:10, Martin Morgan wrote:
> >> Alex,
> >>
> >> > library(Biobase)
> >> [snip]
> >> > args(rowQ)
> >> function (imat, which)
> >> NULL
> >> > showMethods("rowQ")
> >> Function: rowQ (package Biobase)
> >> imat="ExpressionSet", which="numeric"
> >> imat="exprSet", which="numeric"
> >> imat="matrix", which="numeric"
> >>
> >> so it looks like x should be a matrix rather than a data frame.
> >>
> >> Martin
> >>
> >> "ssls sddd" <ssls.sddd at gmail.com> writes:
> >>
> >> > Hi Thomas,
> >> >
> >> > Thanks! Sorry for getting back to it late because I was out
> >> > of town for a couple of days.
> >> >
> >> > I like the idea of 'removing all rows with low variability across
> >> > samples'. I searched around and found an online tutorial,
> >> > http://www.economia.unimi.it/projects/marray/2006/material/Lab3/MachineLearning/ML-lab.pdf
> >> > which does a very similar thing: it teaches how to filter out genes
> >> > that are not differentially expressed.
> >> >
> >> > It takes the simplistic approach of using the 75th percentile of the
> >> > interquartile range (IQR) as the cut-off point and computes quantiles
> >> > using rowQ.
> >> >
> >> > I followed their method and my code is:
> >> >
> >> > library("Biobase")
> >> > lowQ = rowQ(x, floor(0.25 * 49))#49 for 49 samples
> >> > upQ = rowQ(x, ceiling(0.75 * 49))
> >> > iqrs = upQ - lowQ
> >> > giqr = iqrs > quantile(iqrs, probs = 0.75)
> >> > sum(giqr)
> >> > xsub = x[giqr, ]
> >> > dim(xsub)
> >> >
> >> > But the error message is like:
> >> >
> >> > function (classes, fdef, mtable)  :
> >> >         unable to find an inherited method for function "rowQ", for
> >> > signature "data.frame", "numeric"
> >> >
> >> > Perhaps you have some experience in using 'rowQ'? If I want to use the
> >> > IQR function, how should I approach this?
> >> >
> >> > I really appreciate your help!
> >> >
> >> > Thank you very much!
> >> >
> >> > Sincerely,
> >> >
> >> > Alex
> >> >
> >> >
> >> >
> >> > On 6/13/07, Thomas Girke <thomas.girke at ucr.edu> wrote:
> >> >>
> >> >> Dear Alex,
> >> >>
> >> >> In addition to Sean's advice, I would like to point out that the
> >> >> sample you are giving below indicates that you are trying to pass on
> >> >> to the heatmap function a column dendrogram plus a row dendrogram.
> >> >> With your matrix of 238,000 rows by 49 columns you should have only a
> >> >> column dendrogram, because the row dendrogram would take more than
> >> >> 200 GB of memory to calculate. You can still use the heatmap or
> >> >> heatmap.2 functions by turning off the row sorting by setting the Rowv
> >> >> argument to NA. In addition to this, I would consider filtering your
> >> >> rows in a meaningful manner to a much smaller number, perhaps by using
> >> >> R's IQR function to remove all rows with very low variability. I am
> >> >> suggesting this because you won't see any patterns in the heatmap when
> >> >> you have so many rows. If the row filtering works then you could
> >> >> generate a dendrogram for the row dimension as well. Remember: hclust
> >> >> will require ~4 GB of memory to cluster ~30,000 items and < 1 GB for
> >> >> 10,000 items, and pvclust, which uses hclust internally, will need
> >> >> even more than this.
> >> >>
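
A rough check of these memory figures, assuming the dominant cost is the
pairwise distance matrix passed to hclust, stored as n*(n-1)/2
double-precision values of 8 bytes each. The helper name dist_gb is
hypothetical, and real usage will be higher because of working copies.

```r
# Approximate size in GB of a 'dist' object for n items
# (n * (n - 1) / 2 doubles at 8 bytes each; object overhead ignored).
dist_gb <- function(n) n * (n - 1) / 2 * 8 / 1e9

sizes <- dist_gb(c(10000, 30000, 238000))
round(sizes, 1)
```

Under that assumption the three sizes come out at roughly 0.4, 3.6 and 227 GB,
in line with the "< 1 GB", "~4 GB" and "more than 200 GB" estimates.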
> >> >> As more general advice, when working with large data sets in R always
> >> >> subset your data to something very small to test out your strategy
> >> >> first, because this will save you a lot of time.
> >> >> In your case, this could be done by selecting just the first 100 rows
> >> >> of your matrix like this:
> >> >>                 my_matrix <- my_matrix[1:100, ]
> >> >>
> >> >> Once you have tested things out then just remove the '[1:100,]' part
> >> >> from your script/protocol.
> >> >>
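
A small sketch of this test-on-a-subset workflow combined with Rowv = NA,
using simulated values in place of the real 238,000 x 49 matrix. The
dimensions, names and random data are assumptions for illustration only.

```r
set.seed(1)

# Simulated stand-in for the real data: 1000 rows, 6 samples.
my_matrix <- matrix(rnorm(1000 * 6), nrow = 1000,
                    dimnames = list(paste0("row", 1:1000), paste0("s", 1:6)))

my_matrix <- my_matrix[1:100, ]      # test on the first 100 rows only

# Cluster the columns (samples) only; rows keep their original order.
hc <- hclust(dist(t(my_matrix)))
heatmap(my_matrix, Rowv = NA, Colv = as.dendrogram(hc), scale = "row")
```

Only the six samples are clustered here, so the distance matrix stays tiny
no matter how many rows are plotted.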
> >> >> Best,
> >> >>
> >> >> Thomas
> >> >>
> >> >>
> >> >> On Wed 06/13/07 06:02, Sean Davis wrote:
> >> >> > ssls sddd wrote:
> >> >> > > Dear Dr.Thomas Girke,
> >> >> > >
> >> >> > > I have one more question for you. I tried pvclust in the session
> >> >> > > 'Obtain significant clusters by pvclust bootstrap analysis' for
> >> >> > > my data, x.
> >> >> > >
> >> >> > > But I have a problem with:
> >> >> > >
> >> >> > > heatmap(x, Rowv=dend_colored, Colv=as.dendrogram(hc),
> >> >> > > col=my.colorFct(), scale="row", RowSideColors=mycolhc)
> >> >> > >
> >> >> > > the error was:
> >> >> > >
> >> >> > > error in heatmap(x, Rowv = dend_colored, Colv = as.dendrogram(hc),
> >> >> > > col = my.colorFct(),  :
> >> >> > >         'x' must be a numeric matrix
> >> >> > >
> >> >> > > I ran 'x[1:3,1:3]' and it produced the following:
> >> >> > >
> >> >> > >               AIRNS_A09 AIRNS_A11 AIRNS_A12
> >> >> > > SNP_A-1780271   1.85642   1.50956   1.73154
> >> >> > > SNP_A-1780274   1.72140   1.83712   1.85948
> >> >> > > SNP_A-1780277   2.04241   1.53458   1.65270
> >> >> > >
> >> >> > > I think x is a numeric matrix. Where do you think I may have
> >> >> > > gone wrong?
> >> >> >
> >> >> > Try coercing the x into a matrix directly:
> >> >> >
> >> >> > heatmap(as.matrix(x), Rowv=dend_colored, Colv=as.dendrogram(hc),
> >> >> > col=my.colorFct(), scale="row", RowSideColors=mycolhc)
> >> >> >
> >> >> > Does this fix the problem?  You can always check the class of an
> >> >> > object by doing something like:
> >> >> >
> >> >> > class(x)
> >> >> >
> >> >> > which should report:
> >> >> >
> >> >> > [1] "matrix"
> >> >> >
> >> >> > Hope that helps.
> >> >> >
> >> >> > Sean
> >> >> >
> >> >>
> >> >> --
> >> >> Dr. Thomas Girke
> >> >> Assistant Professor of Bioinformatics
> >> >> Director, IIGB Bioinformatic Facility
> >> >> Center for Plant Cell Biology (CEPCEB)
> >> >> Institute for Integrative Genome Biology (IIGB)
> >> >> Department of Botany and Plant Sciences
> >> >> 1008 Noel T. Keen Hall
> >> >> University of California
> >> >> Riverside, CA 92521
> >> >>
> >> >> E-mail: thomas.girke at ucr.edu
> >> >> Website: http://faculty.ucr.edu/~tgirke
> >> >> Ph: 951-827-2469
> >> >> Fax: 951-827-4437
> >> >>
> >> >
> >> > _______________________________________________
> >> > Bioconductor mailing list
> >> > Bioconductor at stat.math.ethz.ch
> >> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> >> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> >>
> >> --
> >> Martin Morgan
> >> Bioconductor / Computational Biology
> >> http://bioconductor.org
> >>
> >

-- 
Thomas Girke
Assistant Professor of Bioinformatics
Director, IIGB Bioinformatic Facility
Center for Plant Cell Biology (CEPCEB)
Institute for Integrative Genome Biology (IIGB)
Department of Botany and Plant Sciences
1008 Noel T. Keen Hall
University of California
Riverside, CA 92521

E-mail: thomas.girke at ucr.edu
Website: http://faculty.ucr.edu/~tgirke
Ph: 951-827-2469
Fax: 951-827-4437


