[BioC] Normalization

Ryan C. Thompson rct at thompsonclan.org
Thu Feb 28 20:38:04 CET 2013


Hi Vittoria,

It would be best if you could show the code that gave you a list of 
differentially expressed genes and the code that gave you an empty 
list. Whether you are doing a pairwise comparison or a multi-way 
"ANOVA-style" comparison, edgeR is actually performing the same kind 
of test, gene by gene. In general, if all three pairwise comparisons 
are yielding significant hits, I would expect some significant hits in 
the three-way comparison as well.
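
For reference, here is a minimal sketch of the two kinds of comparison, 
assuming the GLM pipeline in a reasonably recent edgeR (estimateDisp, 
glmFit, glmLRT). The counts are simulated and the object names are made 
up for illustration; this is not your data, and not necessarily the 
code you ran.

```r
# Sketch: pairwise and three-way comparisons from a single edgeR GLM fit.
# Counts are simulated (an assumption); substitute your own count matrix.
library(edgeR)

set.seed(1)
n_genes <- 500
group <- factor(rep(c("CONTROL", "LD", "HD"), each = 3),
                levels = c("CONTROL", "LD", "HD"))

# Negative binomial counts with no true group differences.
counts <- matrix(rnbinom(n_genes * 9, mu = 100, size = 10),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)), NULL))

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)

# Intercept is CONTROL; coefficients 2 and 3 are LD and HD vs. CONTROL.
design <- model.matrix(~ group)
y <- estimateDisp(y, design)
fit <- glmFit(y, design)

# Pairwise tests: one coefficient or one contrast at a time.
lrt_ld_vs_ctrl <- glmLRT(fit, coef = 2)
lrt_hd_vs_ctrl <- glmLRT(fit, coef = 3)
lrt_hd_vs_ld   <- glmLRT(fit, contrast = c(0, -1, 1))

# Three-way "ANOVA-style" test: both group coefficients tested at once.
lrt_any <- glmLRT(fit, coef = 2:3)

topTags(lrt_any)
```

Every one of these calls runs a likelihood ratio test separately for 
each gene; only the hypothesis being tested changes.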

-Ryan

On Thu 28 Feb 2013 11:26:17 AM PST, Vittoria Roncalli wrote:
> Hi Ryan,
>
> Thanks again for your explanation, you saved my day!
> Considering your expertise, I would ask you another question.
> I ran a simple one-way ANOVA on the raw count data (I have 3
> treatments with 3 reps each) and found no significant difference
> between them. Then, with edgeR, I was able to extract a list of
> differentially expressed genes for each pairwise comparison. Is this
> because the ANOVA is calculated on the overall library (total # genes)
> while the DGE comes from a t-test for each individual gene? I found
> this explanation in Bullard et al. 2010, but I am not sure if I have
> misunderstood something.
>
> Does it make sense to you?
>
> Have a good day, and thanks again for your help.
>
> Vittoria
>
> On Wed, Feb 27, 2013 at 9:48 PM, Ryan C. Thompson
> <rct at thompsonclan.org> wrote:
>
>     Hi Vittoria,
>
>     Please use "Reply All" so that your reply also goes to the mailing
>     list.
>
>     The normalization factors are used to adjust the library sizes (I
>     forget the details; I believe they are given in the User's Guide),
>     and then the pseudo counts are obtained by normalizing the counts
>     to the adjusted library sizes. Since you have not used any
>     normalization factors (i.e. all norm factors = 1), the pseudo
>     counts will simply be a constant multiple of counts-per-million,
>     if I'm not mistaken. If you want absolutely no normalization, you
>     would have to set both the normalization factors and the library
>     sizes to 1, I think.
>
>     In any case, the pseudo counts are only for descriptive purposes.
>     The statistical testing in edgeR happens using the raw integer counts.
>
>
>     On 02/27/2013 10:12 PM, Vittoria Roncalli wrote:
>>     Hi Ryan,
>>
>>     thanks for your reply.
>>     I obtain pseudo.counts with the following commands
>>
>>     "
>>
>>     > raw.data <- read.table("counts 2.txt",sep="\t",header=T)
>>
>>     > d <- raw.data[, 2:10]
>>
>>     > d[is.na(d)] <- 0
>>
>>     > rownames(d) <- raw.data[, 1]
>>
>>     > group <- c("CONTROL","CONTROL","CONTROL","LD","LD","LD","HD","HD","HD")
>>
>>     > d <- DGEList(counts = d, group = group)
>>
>>     Calculating library sizes from column totals.
>>
>>     > keep <- rowSums (cpm(d)>1) >=3
>>
>>     > d <- d[keep,]
>>
>>     > dim(d)
>>
>>     [1] 28755 9
>>
>>     > d <- DGEList(counts = d, group = group)
>>
>>     Calculating library sizes from column totals.
>>
>>     > d <- estimateCommonDisp(d)
>>
>>
>>     After estimating the common dispersion, the DGEList contains
>>
>>     $counts
>>
>>     $samples
>>
>>     $commondispersion
>>
>>     $pseudo.counts
>>
>>     $logCPM
>>
>>     $pseudo.lib.size
>>
>>
>>
>>     Then I write a table of the pseudo.counts and will continue
>>     with those for the DGE analysis.
>>
>>     Considering that I did not normalize the libraries, what are the
>>     different counts in the pseudo.counts output?
>>
>>
>>     Thanks so much
>>
>>
>>     Vittoria
>>     On Wed, Feb 27, 2013 at 7:20 PM, Ryan C. Thompson
>>     <rct at thompsonclan.org> wrote:
>>
>>         To answer your first question, when you first create a
>>         DGEList object, all the normalization factors are initially
>>         set to 1 by default. This is equivalent to no normalization.
>>         Once you use calcNormFactors, the normalization factors will
>>         be set appropriately.
>>
>>         I'm not sure about the second question. Could you provide an
>>         example of how you are obtaining pseudocounts with edgeR?
>>
>>
>>         On Wed 27 Feb 2013 05:12:27 PM PST, Vittoria Roncalli wrote:
>>
>>             Hi, I am an edgeR user and I am a little confused about
>>             normalization.
>>             I am using edgeR to find differentially expressed genes
>>             among 3 conditions (RNA-Seq) with 3 replicates each.
>>             I am following the steps in the user guide:
>>
>>             - update counts file (from mapping against reference
>>               transcriptome)
>>             - filter the low-count reads (1 cpm)
>>             - reassess library size
>>             - estimate common dispersion
>>
>>             My first question is about normalization. Why, after I
>>             import my file, is there a norm.factors column next to
>>             the library size?
>>
>>             $samples
>>
>>                               group lib.size norm.factors
>>             X48h_C_r1.sam  CONTROL 10898526            1
>>             X48h_C_r2.sam  CONTROL  7176817            1
>>             X48h_C_r3.sam  CONTROL  9511875            1
>>             X48h_LD_r1.sam      LD 11350347            1
>>             X48h_LD_r2.sam      LD 14836541            1
>>             X48h_LD_r3.sam      LD 12635344            1
>>             X48h_HD_r1.sam      HD 11840963            1
>>             X48h_HD_r2.sam      HD 17335549            1
>>             X48h_HD_r3.sam      HD 10274526            1
>>
>>
>>
>>             Is the normalization automated? How does it differ from
>>             "calcNormFactors"?
>>
>>             Moreover, if I do not run calcNormFactors, what is in the
>>             pseudo.counts output?
>>
>>
>>             I am very confused about those points.
>>
>>
>>             Thanks in advance for your help.
>>
>>
>>             Looking forward to hearing from you.
>>
>>
>>             Vittoria
>>
>>
>>
>>
>>
>>
>>     --
>>
>>     Vittoria Roncalli
>>
>>     Graduate Research Assistant
>>     Center Békésy Laboratory of Neurobiology
>>     Pacific Biosciences Research Center
>>     University of Hawaii at Manoa
>>     1993 East-West Road
>>     Honolulu, HI 96822 USA
>>
>>     Tel: 808-4695693
>>
>
>
>
>
>
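
On the normalization factors discussed in the quoted messages: as far 
as I understand it, edgeR multiplies each library size by its 
normalization factor to get an "effective" library size, so factors of 
1 leave the library sizes unchanged. Below is a minimal base-R sketch 
of that arithmetic. The counts, the helper name cpm_by_hand, and the 
factor values 0.8 and 1.25 are all made up for illustration, and this 
is hand arithmetic under that assumption rather than edgeR's own code.

```r
# Toy counts: 3 genes x 2 samples (made-up numbers).
counts <- matrix(c(10, 20, 70,
                   30, 60, 110),
                 nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))

# Library size = column total, as DGEList computes by default.
lib_size <- colSums(counts)

# Hypothetical helper: counts-per-million using effective library sizes.
cpm_by_hand <- function(counts, lib_size, norm_factors) {
  eff_lib <- lib_size * norm_factors   # effective library size
  t(t(counts) / eff_lib) * 1e6         # divide each column by its size
}

# Factors of 1 (the DGEList default): plain counts-per-million.
cpm_default <- cpm_by_hand(counts, lib_size, c(1, 1))

# Hypothetical non-trivial factors rescale each sample's values.
cpm_scaled <- cpm_by_hand(counts, lib_size, c(0.8, 1.25))

print(cpm_default)
print(cpm_scaled)
```

With all factors equal to 1, each column of cpm_default sums to one 
million; with other factors, every value in a sample is divided by 
that sample's factor.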


