[BioC] loged data or not loged previous to use normalize.quantile

Wolfgang Huber huber at ebi.ac.uk
Fri Apr 1 23:06:41 CEST 2005


Hi Marcelo,

the difference arises because the power of the test you are doing can 
differ depending on whether you consider the data on the "raw" or on the 
log-transformed scale.

Also, the p-value calculated by limma is based on the assumption that 
the null distribution of the test statistic is given by a 
t-distribution; this assumption may hold more or less well in either case.

You are really doing two different tests: test A, say, applies the 
t-statistic to the untransformed intensities, and test B, say, applies 
it to the log-transformed intensities.
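
To make the distinction concrete, here is a minimal sketch (in Python 
rather than R, standard library only). It uses an ordinary equal-variance 
two-sample t-statistic, not limma's moderated one, and the intensities 
are made up purely for illustration. The only point is that the same 
arrays give two different statistics on the two scales.

```python
import math
from statistics import mean, variance


def t_statistic(x, y):
    """Equal-variance two-sample t-statistic for groups x and y."""
    nx, ny = len(x), len(y)
    # Pooled within-group variance (statistics.variance divides by n - 1).
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))


# ASSUMPTION: made-up intensities for one gene, three arrays per group.
control = [100.0, 120.0, 2000.0]
treated = [400.0, 480.0, 8000.0]

# Test A: t-statistic on the untransformed intensities.
t_raw = t_statistic(control, treated)

# Test B: t-statistic on the log2-transformed intensities.
t_log = t_statistic([math.log2(v) for v in control],
                    [math.log2(v) for v in treated])

print(f"test A (raw):  t = {t_raw:.3f}")
print(f"test B (log2): t = {t_log:.3f}")
```

In this toy example the treated values are exactly four times the 
control values, so on the log2 scale the two groups have the same 
variance and differ by a constant shift, whereas on the raw scale the 
variances differ by a factor of 16.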

Then, if you want to use the t-distribution for getting p-values, you 
need to make sure that the null distribution of your test statistic 
is indeed (to a good enough approximation) t-distributed. You can check 
this e.g. by permutations. For that you need either a large number of 
replicates, or to pool variance estimators across genes.
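
As a rough illustration of what pooling variance estimators across genes 
can look like, the sketch below (Python, standard library only) shrinks 
each gene's variance towards a common value by a weighted average, in 
the spirit of the variance moderation used in limma. The data, the prior 
weight `df_prior`, and the use of a plain average as the pooled value 
are all assumptions made for the example; limma estimates the 
corresponding quantities from the data by empirical Bayes.

```python
from statistics import mean, variance


def shrunken_variance(s2_gene, df_gene, s2_prior, df_prior):
    """Weighted average of a gene's own variance and a pooled (prior) variance."""
    return (df_prior * s2_prior + df_gene * s2_gene) / (df_prior + df_gene)


# ASSUMPTION: made-up replicate values (log scale) for three genes.
genes = {
    "gene_a": [7.9, 8.0, 8.1],
    "gene_b": [5.0, 6.5, 8.0],
    "gene_c": [9.8, 10.0, 10.6],
}

df_gene = 2    # n - 1 for three replicates per gene
df_prior = 4   # ASSUMPTION: hypothetical weight given to the pooled value

# Per-gene sample variances, each based on only two degrees of freedom.
per_gene = {name: variance(values) for name, values in genes.items()}

# ASSUMPTION: the pooled value is simply the average of the per-gene variances.
s2_prior = mean(list(per_gene.values()))

for name, s2 in per_gene.items():
    s2_tilde = shrunken_variance(s2, df_gene, s2_prior, df_prior)
    print(f"{name}: own variance = {s2:.4f}, shrunken = {s2_tilde:.4f}")
```

Each shrunken variance lies between the gene's own estimate and the 
pooled value, which stabilises the denominator of the t-statistic when 
there are only a few replicates per gene.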

If you don't want to make a parametric assumption for getting p-values, 
you need a larger number of replicates; if you have these, you can for 
example calculate a permutation p-value.
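
A permutation p-value for a single gene could be sketched as follows 
(Python, standard library only). The data are made up, the statistic is 
an ordinary equal-variance t-statistic, and the function names are 
hypothetical; this is one simple way to do it, not a description of what 
any particular package does.

```python
import math
from itertools import combinations
from statistics import mean, variance


def t_statistic(x, y):
    """Equal-variance two-sample t-statistic for groups x and y."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))


def permutation_p_value(x, y):
    """Two-sided permutation p-value, enumerating every relabeling of the samples."""
    values = list(x) + list(y)
    n, nx = len(values), len(x)
    observed = abs(t_statistic(x, y))
    hits = total = 0
    for idx in combinations(range(n), nx):
        chosen = set(idx)
        gx = [values[i] for i in idx]
        gy = [values[i] for i in range(n) if i not in chosen]
        total += 1
        # Small tolerance so the observed labeling always counts itself.
        if abs(t_statistic(gx, gy)) >= observed - 1e-12:
            hits += 1
    return hits / total


# ASSUMPTION: made-up log2 intensities for one gene, five arrays per group.
group_a = [7.1, 7.4, 6.9, 7.3, 7.0]
group_b = [8.2, 8.6, 8.1, 8.4, 8.8]

p_value = permutation_p_value(group_a, group_b)
print(f"permutation p-value = {p_value:.4f}")
```

The number of replicates limits how small such a p-value can get: with 
three arrays per group there are only 20 distinct relabelings, so a 
two-sided permutation p-value cannot be smaller than 2/20 = 0.1, 
whereas five per group gives 252 relabelings.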

So, there is really no "right" or "wrong" about whether to transform, or 
which transformation to use, as long as you don't violate the 
assumptions of the subsequent tests. If the assumptions are met, then 
the procedure with the highest power is preferable. And that depends 
very much on your data (about which you have not told us much).

Hope that helps.

And here is another shameless plug: have a look at this paper:
Differential Expression with the Bioconductor Project
http://www.bepress.com/bioconductor/paper7

   Best wishes
    Wolfgang

Marcelo Luiz de Laia wrote:
> Dear Bioconductors Friends,
> 
> I have a question that I have not found an answer to. If you know of a 
> paper/article that explains it, please tell me.
> 
> I normalize our data using the normalize.quantile function.
> 
> If I first transform our intensities (single channel) to log2, I do not 
> get any differentially expressed genes in limma.
> 
> But if I do not transform our data, I get some genes with p.value around 
> 0.0001, which is great!
> 
> Of course, when I transform the intensity data to log2, I get some NA.
> 
> Why is there this difference? Am I wrong to do an analysis with data that 
> are not logged?
> 
> Thanks a lot
> 
> Marcelo


-- 
Best regards
   Wolfgang

-------------------------------------
Wolfgang Huber
European Bioinformatics Institute
European Molecular Biology Laboratory
Cambridge CB10 1SD
England
Phone: +44 1223 494642
Fax:   +44 1223 494486
Http:  www.ebi.ac.uk/huber


