[BioC] question about normalization of RNAseq by tweeDEseq using TMM from edgeR

Tue Feb 14 09:23:46 CET 2012

hi,

On Mon, 2012-02-13 at 15:09 -0500, Sermsawat Tunlaya-Anukit wrote:
> Thank you for your answer. I just want to find normalized counts for
> another analysis, such as partial correlation.

as far as i know "partial correlations" are defined for continuous data
only, so i'm afraid you cannot directly calculate them from RNA-seq
count data.

> I used edgeR to calculate differential gene expression. I calculated
> normalized counts by dividing the raw counts by the effective library size
> (normalization factor multiplied by library size) and multiplying by 1000000.

again, i'd recommend that you check the edgeR documentation on how to use
edgeR for normalization and differential gene expression analysis. what
you describe sounds like feeding RPMs (reads per million) into edgeR, which
i believe is wrong: edgeR takes raw counts as input.
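
for reference, the calculation described in the quoted text (raw count
divided by the effective library size, times one million) can be sketched
in base R as below. this is only an illustration: the count table and the
normalization factors are made-up values, and in practice the factors
would come from edgeR rather than being typed in by hand.

```r
# Toy count table: rows are genes, columns are samples (made-up values).
counts <- matrix(c(10, 20, 30,
                    0,  5, 15,
                   90, 75, 55),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:3), c("S1", "S2", "S3")))

# Library size = column total of the raw counts.
lib.size <- colSums(counts)

# Hypothetical normalization factors, assumed here for illustration only;
# in a real analysis they would be estimated (e.g. by TMM in edgeR).
norm.factors <- c(S1 = 1.0, S2 = 0.8, S3 = 1.25)

# Effective library size = normalization factor * library size.
eff.lib.size <- norm.factors * lib.size

# Counts per million relative to the effective library size.
# sweep() divides every column by that sample's effective library size.
cpm <- sweep(counts, 2, eff.lib.size, "/") * 1e6

print(cpm)
```

because this is a pure per-sample rescaling, a raw count of zero always
stays zero in such a table.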

> I saw the tweeDEseq approach and tried to use it.

tweeDEseq takes only a table of counts and a two-sample group indicator
variable as input. the tweeDEseq package does not provide its own
normalization approach; at the moment it relies for this purpose either
on the functionality of edgeR, through the tweeDEseq function
'normalizeCounts()', or on any other package that can produce a normalized
table of counts, such as the BioC packages 'cqn' or 'EDASeq'.

therefore, in order to feed normalized RNA-seq count data into tweeDEseq,
one first needs to obtain a normalized table of counts, such as the one
returned by the function 'normalizeCounts()', whose "normalized" counts
may differ from the raw counts.

> After I saw that the result of the normalization is different from my
> calculation, I just want to know what happened.

when you transform raw counts into normalized counts, the values may
change, becoming larger or smaller. however, their interpretation should
be restricted to the one made by the corresponding differential
expression analysis technique. in the case of tweeDEseq, normalized
counts help to make more accurate calls of differential expression, but i
do not know whether normalized (transformed) counts are useful for other
inferences on RNA-seq data. i do see a danger in making an isolated
biological interpretation of a gene having a positive value of
normalized counts while the raw value was zero.
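
one way to see which entries are affected is to compare the two tables
directly. the sketch below uses only the six rows shown by head(y) and
head(yN) in the original post (quoted further down), retyped by hand; the
object name 'flagged' is just an illustrative choice.

```r
samples <- paste0("X", 1:12)
genes   <- as.character(1:6)

# First six rows of the raw counts, copied from head(y) in the post.
y <- matrix(c(
    0,   0,    0,    1,  11,  18,   16,   12,   9,   12,  25,  19,
   14,  28,   84,   56,  54,  40,  114,   86,  43,   91, 150,  83,
   12,   8,   18,   15,  12,  10,   32,   19,  27,   31,  44,  21,
    0,   0,    0,    0,   0,   0,    0,    0,   0,    0,   0,   0,
    4,   6,    8,    3,   7,  12,   22,   44,  14,    1,   1,   2,
  899, 725, 1563, 1342, 173, 129, 1072, 1607, 172, 1184, 720, 524),
  nrow = 6, byrow = TRUE, dimnames = list(genes, samples))

# First six rows of the normalized counts, copied from head(yN).
yN <- matrix(c(
     1,    1,    0,    1,  13,  22,   7,    7,  13,   8,  13,  17,
    39,   64,   81,   56,  63,  51,  49,   53,  65,  58,  77,  76,
    29,   18,   17,   15,  14,  13,  13,   11,  39,  20,  22,  19,
     4,    3,    0,    0,   0,   1,   0,    0,   1,   0,   0,   0,
    10,   13,    8,    3,   8,  15,  10,   28,  21,   0,   0,   2,
  2306, 1652, 1497, 1342, 201, 164, 468, 1001, 261, 752, 363, 476),
  nrow = 6, byrow = TRUE, dimnames = list(genes, samples))

# Positions that were zero before normalization but positive afterwards.
idx <- which(y == 0 & yN > 0, arr.ind = TRUE)

# Tabulate the affected gene/sample pairs with their normalized value.
flagged <- data.frame(
  gene       = rownames(y)[idx[, "row"]],
  sample     = colnames(y)[idx[, "col"]],
  normalized = yN[idx],
  stringsAsFactors = FALSE)

print(flagged)
```

on the rows posted, this flags six entries: gene 1 in X1 and X2, and
gene 4 in X1, X2, X6 and X9.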

if you are interested in the issue of normalizing RNA-seq data, i'd
recommend that you take a look at these papers and their corresponding
BioC packages ('cqn' and 'EDASeq'):

http://biostatistics.oxfordjournals.org/content/early/2012/01/24/biostatistics.kxr054.long
http://www.biomedcentral.com/1471-2105/12/480/abstract

cheers,
robert.

> 
> Best regards,
> Sermsawat T.
> 
> On Mon, Feb 13, 2012 at 12:52 PM, Robert Castelo
> <robert.castelo at upf.edu> wrote:
>         Dear Sermsawat,
>         
>         the way in which "normalizeCounts()" uses edgeR-TMM normalization is
>         analogous to the edgeR function "exactTest()", which equalizes library
>         sizes using "equalizeLibSizes()", resulting in these changes in the
>         table of counts. let me warn you, however, that you should *not* use
>         the function normalizeCounts() from the tweeDEseq package to feed
>         the resulting table later into some other package for differential
>         expression analysis, such as edgeR or DESeq. if you're going to use
>         some other package for DE analysis then you should go to its specific
>         documentation to see how to input and normalize your data.
>         
>         cheers,
>         robert.
>         
>         On Mon, 2012-02-13 at 00:54 -0500, Sermsawat Tunlaya-Anukit wrote:
>         > I have a question about normalization in the tweeDEseq package,
>         > which uses the TMM method in edgeR to normalize count data. I ran
>         > the normalization as in the manual and found something unusual. The
>         > read count before normalization of gene 4 in samples X1 and X2 is 0,
>         > but after normalization it turns to 4 and 3. Why does normalization
>         > add counts to a 0 count? Is it an effect of the tagwise dispersions?
>         > I post my code below for more information. Thank you in advance.
>         >
>         > Sermsawat Tunlaya-anukit
>         >
>         > > library(tweeDEseq)
>         > > y <- read.table("rawcount.txt", header=T )
>         > > group <- c(1,1,1,2,2,2,2,3,3,3,4,4)
>         > > yN <- normalizeCounts(y, group)
>         > Using edgeR normalization methods.
>         > Calculating library sizes from column totals.
>         > Calculating normalization factors with the TMM method.
>         > Estimating common dispersion.
>         > Estimating tagwise dispersions.
>         > Calculating effective library sizes.
>         > Adjusting counts to effective library sizes using tagwise dispersions.
>         > > head(y)
>         >    X1  X2   X3   X4  X5  X6   X7   X8  X9  X10 X11 X12
>         > 1   0   0    0    1  11  18   16   12   9   12  25  19
>         > 2  14  28   84   56  54  40  114   86  43   91 150  83
>         > 3  12   8   18   15  12  10   32   19  27   31  44  21
>         > 4   0   0    0    0   0   0    0    0   0    0   0   0
>         > 5   4   6    8    3   7  12   22   44  14    1   1   2
>         > 6 899 725 1563 1342 173 129 1072 1607 172 1184 720 524
>         > > head(yN)
>         >        X1   X2   X3   X4  X5  X6  X7   X8  X9 X10 X11 X12
>         > [1,]    1    1    0    1  13  22   7    7  13   8  13  17
>         > [2,]   39   64   81   56  63  51  49   53  65  58  77  76
>         > [3,]   29   18   17   15  14  13  13   11  39  20  22  19
>         > [4,]    4    3    0    0   0   1   0    0   1   0   0   0
>         > [5,]   10   13    8    3   8  15  10   28  21   0   0   2
>         > [6,] 2306 1652 1497 1342 201 164 468 1001 261 752 363 476
>         > > sessionInfo()
>         > R version 2.14.1 (2011-12-22)
>         > Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>         >
>         > locale:
>         > [1] C/en_US.UTF-8/C/C/C/C
>         >
>         > attached base packages:
>         > [1] stats     graphics  grDevices utils     datasets  methods   base
>         >
>         > other attached packages:
>         > [1] tweeDEseq_1.0.11
>         >
>         > loaded via a namespace (and not attached):
>         > [1] MASS_7.3-16  edgeR_2.4.3  limma_3.10.2 tools_2.14.1
>         >