[BioC] topTable

J.delasHeras at ed.ac.uk J.delasHeras at ed.ac.uk
Tue Aug 28 19:05:41 CEST 2007


Hi Lev,

I agree with Jenny. I think you're worrying too much. For a gene that
is not expressed in treatments 1 and 2, and is strongly expressed in
3, you'd expect upregulation when comparing 3 vs 1, as you say. But
if comparing 1 vs 2 you'll get WKW ("who-knows-what", as you said),
because both will have some low intensity rather than exactly zero and
the log ratios can vary wildly.
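
To illustrate why noise-vs-noise log ratios are so unstable, here is a small simulation. It is only a sketch: the additive noise model, the intensity levels and all the names (noise_sd, observe, lr_low, lr_high) are assumptions made up for this example, not something taken from real arrays.

```r
set.seed(1)        # fixed seed so the sketch is reproducible
n <- 1000          # hypothetical number of spots per condition

# Assumed additive noise model: observed = true signal + background noise,
# floored at 1 so that the log ratio is always defined.
noise_sd <- 20
observe <- function(signal) pmax(signal + rnorm(n, mean = 0, sd = noise_sd), 1)

# "Not expressed" in both conditions: true signal close to background
low1 <- observe(30)
low2 <- observe(30)

# Clearly expressed in both conditions, with no true change
high1 <- observe(3000)
high2 <- observe(3000)

lr_low  <- log2(low2 / low1)     # noise vs. noise
lr_high <- log2(high2 / high1)   # signal vs. signal

sd(lr_low)    # spread of log ratios near background
sd(lr_high)   # spread of log ratios at high intensity
```

Under these assumptions the spread of the log ratios near background is far larger than at high intensity, even though nothing changes between the two conditions in either case.
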
I think you'll find, as Jenny suggested, that regardless of the measured
log ratio, that particular gene will *not* be classified as
"differentially expressed" for 1 vs. 2 (the P value is likely to be  
quite high for that gene in that contrast). So you will be able to  
compare contrasts and select the genes that you want. You're never  
going to pick absolutely *everything*, because there is always some  
error, and the more you pick, the more mistakes you make. P values  
won't tell you if a gene is DE (differentially expressed) or not, but  
they'll give you an idea of how likely you are to make a mistake by
calling it DE or not DE. Ranking a list of genes by P value first,
and THEN looking at log ratios will allow you to eliminate pretty much  
every case [1] of "no expression but high log ratio" that you're  
concerned about.
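
The "P value first, log ratio second" idea, followed by a comparison of the two contrasts, could be sketched as below. The two data frames are invented stand-ins for the full per-contrast tables (with limma they would come from something like topTable(fit2, coef = 1, number = Inf)), and the cutoffs p_cut and fc_cut as well as the helper call_de are assumptions for illustration only.

```r
# Invented stand-ins for the full tables of the two contrasts.
# Probe A has a large log ratio but a high adjusted P value in 2 vs 1.
tt_2vs1 <- data.frame(
  ID        = c("X", "Y", "A", "Z", "W"),
  logFC     = c(-3.58, -4.71, 2.90, -2.19, 0.10),
  adj.P.Val = c(0.016, 0.016, 0.640, 0.022, 0.950),
  stringsAsFactors = FALSE
)
tt_3vs1 <- data.frame(
  ID        = c("X", "Y", "A", "Z", "W"),
  logFC     = c(-3.40, 0.20, 4.10, -0.30, 0.05),
  adj.P.Val = c(0.012, 0.800, 0.004, 0.700, 0.900),
  stringsAsFactors = FALSE
)

p_cut  <- 0.05   # assumed cutoff on the adjusted P value
fc_cut <- 1      # assumed cutoff on the absolute log2 fold change

# Hypothetical helper: filter on significance first, then on the log ratio,
# and return the surviving IDs ordered by adjusted P value.
call_de <- function(tt) {
  sig <- tt[tt$adj.P.Val < p_cut, ]
  sig <- sig[abs(sig$logFC) >= fc_cut, ]
  sig$ID[order(sig$adj.P.Val)]
}

de_2vs1 <- call_de(tt_2vs1)
de_3vs1 <- call_de(tt_3vs1)

setdiff(de_3vs1, de_2vs1)     # DE in 3 vs 1 but not in 2 vs 1
intersect(de_2vs1, de_3vs1)   # DE in both contrasts
```

In this made-up example probe A never enters the 2 vs 1 list despite its large log ratio, so there is nothing to remove by hand before comparing the contrasts.
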
In my work, I'm mostly trying to find genes that have no expression in  
a particular situation, but that can be activated after a certain  
treatment, so I have to deal with the same issues you are talking  
about. The only spots I remove are the ones that do not pass an  
intensity threshold in BOTH channels, and in ALL slides, because they  
just add noise and do not contribute anything useful. Anything else  
stays, and I think my analyses work reasonably well so far.
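
A minimal sketch of that kind of filter, assuming two-colour data held as plain matrices of background-corrected intensities (rows are spots, columns are slides). The matrices R and G, the simulated values and the threshold of 100 are all hypothetical; with limma the intensities would typically sit in the R and G components of an RGList.

```r
set.seed(2)
n_spots  <- 6
n_slides <- 4

# Hypothetical intensities for the red (R) and green (G) channels
R <- matrix(runif(n_spots * n_slides, 500, 5000), nrow = n_spots)
G <- matrix(runif(n_spots * n_slides, 500, 5000), nrow = n_spots)
rownames(R) <- rownames(G) <- paste0("spot", seq_len(n_spots))

R["spot2", ] <- 40   # spot2 is dim in both channels on every slide
G["spot2", ] <- 35
G["spot5", ] <- 30   # spot5 is dim in one channel only, so it stays

threshold <- 100     # assumed intensity cutoff; choose it from your own data

# Drop a spot only if it is below threshold in BOTH channels on ALL slides
below_everywhere <- rowSums(R < threshold & G < threshold) == n_slides
keep <- !below_everywhere

R_filt <- R[keep, , drop = FALSE]
G_filt <- G[keep, , drop = FALSE]
rownames(R_filt)     # spots that remain in the analysis
```

A spot like spot5, which is off in one channel but on in the other, is exactly the kind of spot that should survive this filter.
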

Jose

[1] you may still get some odd things, partly depending on whether you  
use background correction or not, and how good the method you use is  
(if you do)...
I always do a "clean up" check, looking at actual intensities,  
signal-to-noise ratios (given by the scanning software for every spot)  
etc...



Quoting Lev Soinov <lev_embl1 at yahoo.co.uk>:

> Hi Jenny,
>
>   I would not worry about this at all if I did not need to compare
> the contrasts afterwards. For example, I need to identify those
> genes that are regulated in one contrast but are not differentially
> expressed in the other one. In the example that I gave in my
> previous e-mail, probe A is "not expressed" in treatments 1 and 2,
> but is "expressed" in 3. This means that I will get upregulation in
> 3vs.1 and who-knows-what in 2vs.1. To avoid such situations, I
> remove A from the topTable list for 2vs.1 and conclude that A is
> upregulated in 3vs.1, but cannot be classified in 2vs.1. If I do
> not remove it from the 2vs.1 list, I may end up with a lot of
> spurious results/false conclusions.
>   Does it sound reasonable?
>
>   With kind regards,
>   Lev.
>
>
>
>
>
> Jenny Drnevich <drnevich at uiuc.edu> wrote:
>   Hi Lev,
>
> I think you are a little fixated on removing probes that are "bad" in
> one of your two contrasts. I don't think it's that serious of an
> issue, and I don't know anyone else who worries about it either.
> Especially since as you mention, there are not that many "bad"
> probes. It's highly unlikely that they would be significant anyway,
> so I don't see why you are so set on removing them. At most, I would
> only worry about checking significant genes in each contrast. Even if
> they slipped through, you are expecting some false positives in your
> list anyway, so I don't think they would radically affect the
> conclusions drawn from the lists. Your analysis steps 1-4 are
> fine, and I would stop there.
>
> That's my 2 cents,
> Jenny
>
>> I do some analysis in LIMMA and would be very grateful for your comments.
>> I have three treatments: 1, 2 and 3, comparing 2vs.1 and 3vs.1.
>> Then I analyse the created lists further, identifying genes that
>> are different/similar between the contrasts. As suggested earlier
>> on this list, I:
>> 1. normalise using ALL the data;
>> 2. filter out probes which are not expressed across ALL
>> treatments 1, 2 and 3;
>> 3. run LIMMA on the filtered data;
>> 4. produce two gene lists for the two contrasts 2vs1 and 3vs1,
>> using topTable.
>>
>> To take full advantage of LIMMA, in the above steps 3 and 4,
>> I process the data for all treatments together:
>> design <- model.matrix(~ 0 + factor(c(1,1,1,2,2,2,3,3,3)))
>> colnames(design) <- c("group1", "group2", "group3")
>> contrast.matrix <- makeContrasts(group2 - group1,
>>                                  group3 - group1, levels = design)
>> fit <- lmFit(data_normalised_filtered, design)
>> fit2 <- contrasts.fit(fit, contrast.matrix)
>> fit2 <- eBayes(fit2)
>> topTable(fit2, coef = 1, adjust = "BH")
>> topTable(fit2, coef = 2, adjust = "BH")
>>
>> This means that some probes may have meaningless results for one
>> of the two contrasts. For example, if probe A is "not expressed" in
>> 1 and 2, but is "expressed" in 3, it will be kept in the analysis
>> (step 2), but obviously its fold change or p-values will be
>> meaningless for the 2vs.1 comparison (because we are comparing
>> noise vs. noise here). Recognising this, as the 5th step of my
>> procedure (after running topTable), I remove probes such as A from
>> the topTable results for the comparison 2vs.1, but keep them in the
>> results for the comparison 3vs.1.
>> So, for example, the topTable for the contrast 2vs.1:
>> ID  logFC       t       P.Value  adj.P.Val      B
>> X   -3.58  -14.19  1.068322e-06     0.0164  3.839
>> Y   -4.71  -13.02  2.000032e-06     0.0164  3.589
>> A   -2.52  -11.94  3.721566e-06     0.0203  3.315
>> Z   -2.19  -11.17  5.993895e-06     0.0222  3.086
>> will become:
>> ID  logFC       t       P.Value  adj.P.Val      B
>> X   -3.58  -14.19  1.068322e-06     0.0164  3.839
>> Y   -4.71  -13.02  2.000032e-06     0.0164  3.589
>> Z   -2.19  -11.17  5.993895e-06     0.0222  3.086
>>
>> The other way to make comparisons 2vs.1 and 3vs.1 would be to
>> process them separately, doing filtering for each pair separately
>> as well. But then it would decrease the power.
>> I realise that keeping such partially "bad" probes (probes that
>> are "bad" in one comparison, but are "good" in the other) and
>> removing them after running the topTable can adversely affect
>> "good" probes. It can happen either through eBayes or through the
>> multiple testing correction. My perception is that it would not
>> affect the results a lot, because the "bad" probes are not
>> numerous. Besides, probe rankings should remain the same.
>> Would you say that what I described above is a sensible way to go?
>>
>> Looking forward to your replies,
>> Lev.
>>
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> Jenny Drnevich, Ph.D.
>
> Functional Genomics Bioinformatics Specialist
> W.M. Keck Center for Comparative and Functional Genomics
> Roy J. Carver Biotechnology Center
> University of Illinois, Urbana-Champaign
>
> 330 ERML
> 1201 W. Gregory Dr.
> Urbana, IL 61801
> USA
>
> ph: 217-244-7355
> fax: 217-265-5066
> e-mail: drnevich at uiuc.edu



-- 
Dr. Jose I. de las Heras                      Email: J.delasHeras at ed.ac.uk
The Wellcome Trust Centre for Cell Biology    Phone: +44 (0)131 6513374
Institute for Cell & Molecular Biology        Fax:   +44 (0)131 6507360
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK



