[BioC] Normalization
Ryan C. Thompson
rct at thompsonclan.org
Thu Feb 28 20:38:04 CET 2013
Hi Vittoria,
It would be best if you could show code examples of what gave you an
empty list and what gave you a list of differentially expressed genes
and what code didn't. Whether you you are doing a pairwise comparison
or a multi-way "ANOVA-style" comparison, edgeR is actually performing
the same test. In general, if all three pairwise comparisons are
yielding significant hits, I would expect some significant hits in the
three-way comparison as well.
-Ryan
On Thu 28 Feb 2013 11:26:17 AM PST, Vittoria Roncalli wrote:
> Hi Ryan,
>
> Thanks again for your explanation, you saved my day!
> Considering your expertise, I would ask you another question.
> I run on the raw data counts a simple one way anova (I have 3
> treatments with 3 reps each) and I found out that there is no
> significant difference between them. Then, with EdgeR I was able, to
> extract a list of DGE fro each pairwise comparison. Is this because
> the ANOVA is calculated on the overall library (total # genes) while
> the DGE comes from a t-test for each individual gene? I found this
> explanation on Bullard et al 2010, but I am not sure if I have
> misunderstood something.
>
> Does it make sense to you?
>
> Have a good day,and thanks again for your help.
>
> Vittoria
>
> On Wed, Feb 27, 2013 at 9:48 PM, Ryan C. Thompson
> <rct at thompsonclan.org <mailto:rct at thompsonclan.org>> wrote:
>
> Hi Vittoria,
>
> Please use "Reply All" so that your reply also goes to the mailing
> list.
>
> The normalization factors are used to adjust the library sizes (I
> forget the details, I believe they are given in the User's Guide),
> and then the pseudo counts are obtained by normalizing the counts
> to the adjusted library sizes. Since you have not used any
> normalization factors (i.e. all norm factors = 1), the pseudo
> counts will simply be some constant factor of counts-per-million,
> if I'm not mistaken. If you want absolutely no normalization, you
> would have to set both the normalization factors and library sizes
> to 1, I think.
>
> In any case, the pseudo counts are only for descriptive purposes.
> The statistical testing in edgeR happens using the raw integer counts.
>
>
> On 02/27/2013 10:12 PM, Vittoria Roncalli wrote:
>> Hi Ryan,
>>
>> thanks for your reply.
>> I obtain pesudo.counts with the following commands
>>
>> "
>>
>> > raw.data <- read.table("counts 2.txt",sep="\t",header=T)
>>
>> > d <- raw.data[, 2:10]
>>
>> > d[is.na <http://is.na>(d)] <- 0
>>
>> > rownames(d) <- raw.data[, 1]
>>
>> > group <- c("CONTROL","CONTROL","CONTROL","LD","LD","LD","HD","HD","HD")
>>
>> > d <- DGEList(counts = d, group = group)
>>
>> Calculating library sizes from column totals.
>>
>> > keep <- rowSums (cpm(d)>1) >=3
>>
>> > d <- d[keep,]
>>
>> > dim(d)
>>
>> [1] 28755 9
>>
>> > d <- DGEList(counts = d, group = group)
>>
>> Calculating library sizes from column totals.
>>
>> > d <- estimateCommonDisp(d)
>>
>>
>> After the common dispersion, I get in the DGE list
>>
>> $counts
>>
>> $samples
>>
>> $commondispersion
>>
>> $pseudo.counts
>>
>> $logCPM
>>
>> $pseudo.lib.size
>>
>>
>>
>> Then I write a table for the pseudo.counts and I will continue
>> with those for the DGE.
>>
>> Considering that I did non normalize the libraries, what are the
>> different counts in the pseudo.counts output?
>>
>>
>> Thanks so much
>>
>>
>> Vittoria
>> On Wed, Feb 27, 2013 at 7:20 PM, Ryan C. Thompson
>> <rct at thompsonclan.org <mailto:rct at thompsonclan.org>> wrote:
>>
>> To answer your first question, when you first create a
>> DGEList object, all the normalization factors are initially
>> set to 1 by default. This is equivalent to no normalization.
>> Once you use calcNormFactors, the normalization factors will
>> be set appropriately.
>>
>> I'm not sure about the second question. Could you provide an
>> example of how you are obtaining pseudocounts with edgeR?
>>
>>
>> On Wed 27 Feb 2013 05:12:27 PM PST, Vittoria Roncalli wrote:
>>
>> Hi, I am a edgeR user and I am a little bit confused on
>> the normalization
>> topic.
>> I am using EdgeR to get different expressed genes within
>> 3 conditions
>> (RnaSeq) with 3 replicates each.
>> I am following the user guide step:
>>
>> -update counts file (from mapping against reference
>> transcriptome)
>> - filter the low counts reads (1cpm)
>> - reassess library size
>> - estimate common dispersion
>>
>> Mi first question is related to the normalization. Why,
>> after I import my
>> file, next to the library size there is then column with
>> norm.factors?
>>
>> $samples
>>
>> group lib.size norm.factors
>>
>> X48h_C_r1.sam CONTROL 10898526 1
>>
>> X48h_C_r2.sam CONTROL 7176817 1
>>
>> X48h_C_r3.sam CONTROL 9511875 1
>>
>> X48h_LD_r1.sam LD 11350347 1
>>
>> X48h_LD_r2.sam LD 14836541 1
>>
>> X48h_LD_r3.sam LD 12635344 1
>>
>> X48h_HD_r1.sam HD 11840963 1
>>
>> X48h_HD_r2.sam HD 17335549 1
>>
>> X48h_HD_r3.sam HD 10274526 1
>>
>>
>>
>> Is the normalization automated? What is the difference
>> with the
>> "calNormFactors?"
>>
>> Moreover, if I do not run the calNormFactors, what is
>> into the
>> pseudo.counts output?
>>
>>
>> I am very confused about those points.
>>
>>
>> Thanks in advance for your help.
>>
>>
>> Looking forward to hearing from you.
>>
>>
>> Vittoria
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> <mailto:Bioconductor at r-project.org>
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>>
>>
>> --
>>
>> Vittoria Roncalli
>>
>> Graduate Research Assistant
>> Center Békésy Laboratory of Neurobiology
>> Pacific Biosciences Research Center
>> University of Hawaii at Manoa
>> 1993 East-West Road
>> Honolulu, HI 96822 USA
>>
>> Tel: 808-4695693 <tel:808-4695693>
>>
>
>
>
>
> --
>
> Vittoria Roncalli
>
> Graduate Research Assistant
> Center Békésy Laboratory of Neurobiology
> Pacific Biosciences Research Center
> University of Hawaii at Manoa
> 1993 East-West Road
> Honolulu, HI 96822 USA
>
> Tel: 808-4695693
>
More information about the Bioconductor
mailing list