[BioC] EdgeR: Using estimateCommondisp for housekeeping genes

Tom Keller kellert at ohsu.edu
Mon Nov 14 23:07:54 CET 2011


This was very helpful to me. But when I compare an R created subset to an expression set I created manually based on the same criteria, they "look" the same, but the R function equals() says FALSE:

> equals(c_eset, co)
[1] FALSE
> equals.Verbose(c_eset, co)
[1] FALSE
attr(,"reason")
[1] "Not same class"
> class(c_eset)
[1] "matrix"
> class(co)
[1] "matrix"
> head(c_eset)
                   01A_H1_08a_446VD.CEL 01A_H1_08b_446VD.CEL 01A_H1_22a_446VD.CEL 01A_H1_22b_446VD.CEL 02A_H1_10a_446VD.CEL 02A_H1_10b_446VD.CEL 02A_H1_24a_446VD.CEL 02A_H1_24b_446VD.CEL
hsa-miR-105_st                 6.795107             6.913433             3.580257             2.749580             6.854192             7.477093             3.834115             3.678890
hsa-miR-10b_st                 5.949392             6.248459             6.824051             6.804110             6.812655             5.530932             5.739557             6.231381
hsa-miR-1180_st                6.294834             5.884889             7.562048             6.952669             6.779352             5.939806             7.055804             7.188879
hsa-miR-1225-5p_st             4.754944             5.403615             3.803957             3.752518             5.513498             6.534705             5.981105             5.429791
hsa-miR-1260_st                3.995493             3.621449             2.846316             2.297834             3.051231             2.566250             3.323427             3.978176
hsa-miR-127-5p_st              6.252869             6.829449             7.170236             7.086132             5.743895             5.812096             6.813619             6.844393
> head(co)
                   01A_H1_08a_446VD.CEL 01A_H1_08b_446VD.CEL 01A_H1_22a_446VD.CEL 01A_H1_22b_446VD.CEL 02A_H1_10a_446VD.CEL 02A_H1_10b_446VD.CEL 02A_H1_24a_446VD.CEL 02A_H1_24b_446VD.CEL
hsa-miR-105_st                 6.795107             6.913433             3.580257             2.749580             6.854192             7.477093             3.834115             3.678890
hsa-miR-10b_st                 5.949392             6.248459             6.824051             6.804110             6.812655             5.530932             5.739557             6.231381
hsa-miR-1180_st                6.294834             5.884889             7.562048             6.952669             6.779352             5.939806             7.055804             7.188879
hsa-miR-1225-5p_st             4.754944             5.403615             3.803957             3.752518             5.513498             6.534705             5.981105             5.429791
hsa-miR-1260_st                3.995493             3.621449             2.846316             2.297834             3.051231             2.566250             3.323427             3.978176
hsa-miR-127-5p_st              6.252869             6.829449             7.170236             7.086132             5.743895             5.812096             6.813619             6.844393
> summary(co)
 01A_H1_08a_446VD.CEL 01A_H1_08b_446VD.CEL 01A_H1_22a_446VD.CEL 01A_H1_22b_446VD.CEL 02A_H1_10a_446VD.CEL 02A_H1_10b_446VD.CEL 02A_H1_24a_446VD.CEL 02A_H1_24b_446VD.CEL
 Min.   : 1.877       Min.   : 2.176       Min.   : 1.767       Min.   : 1.727       Min.   : 1.774       Min.   : 2.022       Min.   : 1.779       Min.   : 1.738      
 1st Qu.: 3.976       1st Qu.: 4.085       1st Qu.: 2.789       1st Qu.: 2.669       1st Qu.: 3.881       1st Qu.: 4.085       1st Qu.: 2.618       1st Qu.: 2.585      
 Median : 5.621       Median : 5.331       Median : 3.767       Median : 3.812       Median : 5.557       Median : 5.522       Median : 3.568       Median : 3.581      
 Mean   : 5.567       Mean   : 5.616       Mean   : 4.247       Mean   : 4.176       Mean   : 5.643       Mean   : 5.708       Mean   : 4.087       Mean   : 4.180      
 3rd Qu.: 6.816       3rd Qu.: 6.861       3rd Qu.: 5.316       3rd Qu.: 5.221       3rd Qu.: 7.040       3rd Qu.: 6.828       3rd Qu.: 5.098       3rd Qu.: 5.184      
 Max.   :10.387       Max.   :10.703       Max.   :11.126       Max.   :11.105       Max.   :10.939       Max.   :10.958       Max.   :10.796       Max.   :10.930      
> summary(c_eset)
 01A_H1_08a_446VD.CEL 01A_H1_08b_446VD.CEL 01A_H1_22a_446VD.CEL 01A_H1_22b_446VD.CEL 02A_H1_10a_446VD.CEL 02A_H1_10b_446VD.CEL 02A_H1_24a_446VD.CEL 02A_H1_24b_446VD.CEL
 Min.   : 1.877       Min.   : 2.176       Min.   : 1.767       Min.   : 1.727       Min.   : 1.774       Min.   : 2.022       Min.   : 1.779       Min.   : 1.738      
 1st Qu.: 3.976       1st Qu.: 4.085       1st Qu.: 2.789       1st Qu.: 2.669       1st Qu.: 3.881       1st Qu.: 4.085       1st Qu.: 2.618       1st Qu.: 2.585      
 Median : 5.621       Median : 5.331       Median : 3.767       Median : 3.812       Median : 5.557       Median : 5.522       Median : 3.568       Median : 3.581      
 Mean   : 5.567       Mean   : 5.616       Mean   : 4.247       Mean   : 4.176       Mean   : 5.643       Mean   : 5.708       Mean   : 4.087       Mean   : 4.180      
 3rd Qu.: 6.816       3rd Qu.: 6.861       3rd Qu.: 5.316       3rd Qu.: 5.221       3rd Qu.: 7.040       3rd Qu.: 6.828       3rd Qu.: 5.098       3rd Qu.: 5.184      
 Max.   :10.387       Max.   :10.703       Max.   :11.126       Max.   :11.105       Max.   :10.939       Max.   :10.958       Max.   :10.796       Max.   :10.930 


I don't understand why equals() gives "FALSE". They seem to be identical.

thanks for any suggestions or references.
Tom
MMI DNA Services Core Facility
503-494-2442
kellert at ohsu.edu
Office: 6588 RJH (CROET/BasicScience)

OHSU Shared Resources






On Nov 13, 2011, at 7:35 AM, Mark Robinson wrote:

> Hi Tonya,
> 
> I believe your question comes down to how to subset a DGEList object.  As the example 'd[housekeeping,]' suggests, it is much like subletting a matrix (if you don't know how to do this, you should consult an Intro to R manual).  
> 
> Here is an example:
> 
> y <- matrix(rnbinom(80,size=1/0.2,mu=10),nrow=20,ncol=4)
> rownames(y) <- paste("Gene",1:nrow(y),sep=".")
> group <- factor(c(1,1,2,2))
> d <- DGEList(counts=y,group=group,lib.size=rep(1000,4))
> 
> If you knew your housekeeping genes were in rows {4,6,10,15} of your table, you could simply call:
> 
> do <- estimateCommonDisp(d[c(4,6,10,15),])
> 
> Of course, there are lots of ways to subset, e.g.:
> http://www.ats.ucla.edu/stat/r/modules/subsetting.htm
> 
> Equivalent to above but slight different approach, you could do:
> 
> gkeep <- paste("Gene",c(4,6,10,15),sep=".")
> do <- estimateCommonDisp(d[rownames(d) %in% gkeep,])
> 
> … and so on.
> 
> On the whole enterprise of doing these analyses w/o replicates, there has been a lot of discussion:
> 
> http://seqanswers.com/forums/showthread.php?t=4055
> http://seqanswers.com/forums/showthread.php?t=10137
> http://seqanswers.com/forums/showthread.php?t=11081
> https://stat.ethz.ch/pipermail/bioconductor/2011-July/040296.html
> … and so on.
> 
> All the best,
> Mark
> 
> ----------
> Prof. Dr. Mark Robinson
> Bioinformatics
> Institute of Molecular Life Sciences
> University of Zurich
> Winterthurerstrasse 190
> 8057 Zurich
> Switzerland
> 
> v: +41 44 635 4848
> f: +41 44 635 6898
> e: mark.robinson at imls.uzh.ch
> o: Y32-J-34
> w: http://tiny.cc/mrobin
> 
> 
> 
> 
> On 12.11.2011, at 23:12, Tonya Mariko Brunetti wrote:
> 
>> Hello,
>> 
>> My name is Tonya and I am very new to both R and edgeR so sorry if this seems silly.  I have recently gotten back results of two samples from a 454 and do not have replicates of either.  I was reading the edgeR manual section about what to do about calculating common dispersion factors if no replicates are available.
>> 
>> One of the options was to use genes that are not suppose to be differentially expressed (ie housekeeping genes) to determine the common dispersion.  In the manaul they show an example do<-estimateCommonDisp(d[housekeeping,]).
>> 
>> Would anyone please explain to me how I can use the housekeeping genes in the data I have collected to estimate this value.  I have tried numerous things for input into the function estimateCommonDisp (see below some of what I have tried) but I guess I don't know how to specify just the housekeeping genes??  Or if anyone has a method for common dispersion in edgeR that will work for no replicates that would be appreciated as well!
>> 
>> estimateCommonDisp(d['RpS2','RpS28b]) (where the stuff in brackets are my housekeeping genes and d is my normalized DGEList
>> estimateCommonDisp(d[RpS2,RpS28b])
>> 
>> 
>> Thank you so much!
>> 
>> Tonya
>> 
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list