[BioC] calcNormFactors - normalization

Davis McCarthy dmccarthy at wehi.EDU.AU
Sun May 8 01:55:57 CEST 2011


Lana

The package can handle just 1 sample per group, but in that case you do
not have sufficient degrees of freedom to estimate the dispersion
parameter, which enables the model to account for biological variability
between samples. There are a couple of workarounds that you can use:
1) Use a dispersion value of zero, which is equivalent to a Poisson model.
This is the default if edgeR detects no replication in your groups.
2) Treat all samples as members of one group to estimate the common
dispersion, then plug that value in when you test for DE between
groups.
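To make workaround 2 more concrete, here is a minimal Python sketch (not edgeR code). It pools all samples into a single group and computes a rough method-of-moments estimate of one common negative binomial dispersion from the variance function Var = mu + phi * mu^2. edgeR itself uses likelihood-based estimators, so treat `moment_common_dispersion` and the toy counts as hypothetical illustrations. The sketch also assumes the libraries are already on a comparable scale.

```python
from statistics import mean, variance


def moment_common_dispersion(counts):
    """Rough method-of-moments estimate of one common NB dispersion.

    counts: list of per-gene lists, one count per sample, with every
    sample treated as a member of a single group (needs >= 2 samples).
    Illustration only; this is not edgeR's estimator.
    """
    excess = 0.0  # sum over genes of (sample variance - mean)
    scale = 0.0   # sum over genes of mean^2
    for gene in counts:
        m = mean(gene)
        if m == 0:
            continue  # all-zero genes carry no information
        excess += variance(gene) - m
        scale += m * m
    # Var = mu + phi * mu^2 implies phi ~ excess / scale; clip at zero.
    return max(excess / scale, 0.0) if scale > 0 else 0.0


# Hypothetical counts: 3 genes, one sample from each of two groups.
toy = [[100, 140], [50, 30], [20, 22]]
print(round(moment_common_dispersion(toy), 3))  # 0.05
```

Because the two pooled samples come from different groups, any real DE between them inflates the variance, which is why this route tends to give a conservative (too large) dispersion.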

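The first approach will likely overstate the amount of DE between the
samples, because a Poisson model ignores biological variability. The second
approach will tend to overestimate the dispersion parameter, because any
genuine differences between the groups are counted as variability, so it is
conservative. In an ideal world you would have biological replicate samples.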
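A minimal Python sketch of why a zero dispersion overstates DE: under the negative binomial variance mu + phi * mu^2, setting phi = 0 leaves only the Poisson variance mu, so the same difference in counts looks far more significant. `wald_z` is a hypothetical, crude statistic that assumes equal library sizes; it is not edgeR's exact test, and phi = 0.05 is an arbitrary illustrative value.

```python
import math


def nb_variance(mu, phi):
    """Negative binomial variance; phi = 0 gives the Poisson variance mu."""
    return mu + phi * mu * mu


def wald_z(y1, y2, phi):
    """Crude Wald-type statistic for two counts from equal-sized libraries,
    plugging each observed count in as its own mean (illustration only)."""
    return (y1 - y2) / math.sqrt(nb_variance(y1, phi) + nb_variance(y2, phi))


print(round(wald_z(200, 150, 0.0), 2))   # Poisson: 2.67
print(round(wald_z(200, 150, 0.05), 2))  # phi = 0.05: 0.85
```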

Cheers
Davis


> Davis,
> In other words, the package doesn't handle just 1 sample per group
> for both groups.
> Lana
>
> -----Original Message-----
> From: Davis McCarthy [mailto:dmccarthy at wehi.EDU.AU]
> Sent: Saturday, May 07, 2011 12:41 AM
> To: Lana Schaffer
> Cc: 'Mark Robinson'; 'bioconductor at r-project.org'
> Subject: Re: [BioC] calcNormFactors - normalization
>
> Hi Lana
>
> The package can handle more than one sample per group, and indeed the full
> utility of the methods in edgeR is unlocked when there is replication in
> at least one group.
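For intuition about why replication matters: in a simple group design the residual degrees of freedom are the number of samples minus the number of groups, and with zero residual degrees of freedom nothing is left over to estimate the dispersion. The helper below is a hypothetical Python sketch of that count, not an edgeR function.

```python
def residual_df(groups):
    """Residual degrees of freedom for a one-way group design:
    number of samples minus number of distinct groups."""
    return len(groups) - len(set(groups))


print(residual_df(["d45", "d100"]))                 # 1 sample per group: 0
print(residual_df(["d45", "d45", "d100"]))          # replicated in one group: 1
print(residual_df(["d45", "d45", "d100", "d100"]))  # 2 per group: 2
```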
>
> Cheers
> Davis
>
>
>> Mark,
>> Thanks.
>> Can you use this package with only 1 sample in each group?
>> Lana
>>
>> -----Original Message-----
>> From: Mark Robinson [mailto:mrobinson at wehi.EDU.AU]
>> Sent: Friday, May 06, 2011 5:15 PM
>> To: Lana Schaffer
>> Cc: 'bioconductor at r-project.org'
>> Subject: Re: [BioC] calcNormFactors - normalization
>>
>>
>> On 2011-05-07, at 9:57 AM, Lana Schaffer wrote:
>>
>>> Mark,
>>> My gene library contains only 141 genes.  Is this low number
>>> all right in this model?
>>
>> Yes, this is alright.  For one thing, you pay a much smaller multiple
>> testing penalty.
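As a rough illustration of the smaller multiple testing penalty, the sketch below implements the Benjamini-Hochberg adjustment in plain Python and applies it to one small p-value among many unremarkable ones. The 141-gene case comes from this thread; the 20,000-gene comparison and all p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up FDR control)."""
    m = len(pvalues)
    # Walk from the largest p-value down, keeping a running minimum so the
    # adjusted values stay monotone in the original p-values.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def smallest_adjusted(n_genes, p_small=1e-4, p_other=0.5):
    """One hypothetical small p-value among n_genes - 1 unremarkable ones."""
    pvals = [p_small] + [p_other] * (n_genes - 1)
    return bh_adjust(pvals)[0]


print(smallest_adjusted(141))    # about 0.0141, below 0.05
print(smallest_adjusted(20000))  # 0.5, no longer significant
```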
>>
>>> Gene length is not accounted for in edgeR?
>>
>> Correct, it is not accounted for. But you are comparing each gene across
>> samples, so its length is the same in every sample.
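A toy Python model of Mark's point, assuming expected counts scale with gene length, expression level, and sequencing depth: when the same gene is compared across samples, its length is a common factor that cancels from the fold change. The function names and numbers are hypothetical.

```python
def expected_count(length_kb, expression, lib_size):
    """Toy model: expected reads scale with gene length, expression and depth."""
    return length_kb * expression * lib_size


def fold_change(length_kb, expr_a, expr_b, lib_a, lib_b):
    """Depth-normalized fold change for one gene between samples A and B."""
    a = expected_count(length_kb, expr_a, lib_a) / lib_a
    b = expected_count(length_kb, expr_b, lib_b) / lib_b
    return b / a  # length_kb appears in both a and b, so it cancels


# Same twofold expression change for a short and a long hypothetical gene.
print(fold_change(0.5, 10, 20, 8e6, 4e5))  # 2.0
print(fold_change(5.0, 10, 20, 8e6, 4e5))  # 2.0
```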
>>
>> Mark
>>
>>
>>> Lana
>>>
>>> -----Original Message-----
>>> From: Mark Robinson [mailto:mrobinson at wehi.EDU.AU]
>>> Sent: Friday, May 06, 2011 4:55 PM
>>> To: Lana Schaffer
>>> Cc: 'bioconductor at r-project.org'
>>> Subject: Re: [BioC] calcNormFactors - normalization
>>>
>>> Hi Lana,
>>>
>>> The factor (offset) that gets used in the statistical model is actually
>>> the *product* of lib.size and norm.factors, so the lower depth of
>>> library HCV_100d_2 is taken into account.
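A minimal Python sketch of the product Mark describes, using the lib.size and norm.factors values from the d$samples table in this thread. It shows that the effective library size of HCV_100d_2 stays at roughly 5% of the others, so the lower depth is carried by lib.size rather than by norm.factors. The helper name `effective_lib_sizes` is hypothetical.

```python
# Values copied from the d$samples table in this thread.
samples = {
    "HCV_45d_1":  (7812615, 1.0471701),
    "HCV_45d_2":  (9728373, 1.0004453),
    "HCV_100d_1": (8606449, 0.9516424),
    "HCV_100d_2": (446991, 1.0030340),
}


def effective_lib_sizes(table):
    """Return lib.size * norm.factors for each sample."""
    return {name: lib * nf for name, (lib, nf) in table.items()}


eff = effective_lib_sizes(samples)
largest = max(eff.values())
for name, size in eff.items():
    # Share of the largest effective library, to show the depth gap survives.
    print(f"{name:<11} {size:>12,.0f}  {size / largest:.3f}")
```

The default TMM method in calcNormFactors adjusts for RNA composition, not sequencing depth, which is why factors close to 1 are unremarkable even when library sizes differ twentyfold.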
>>>
>>> Mark
>>>
>>> On 2011-05-07, at 9:49 AM, Lana Schaffer wrote:
>>>
>>>> Greetings,
>>>> Using d <- calcNormFactors(d)
>>>> I get the following normalization factors.
>>>> Why are the factors so similar when the 4th library has about 1/20 of
>>>> the counts of the rest?
>>>>
>>>>> d$samples
>>>>          group lib.size norm.factors
>>>> HCV_45d_1    d45  7812615    1.0471701
>>>> HCV_45d_2    d45  9728373    1.0004453
>>>> HCV_100d_1  d100  8606449    0.9516424
>>>> HCV_100d_2  d100   446991    1.0030340
>>>>
>>>> Lana Schaffer
>>>> Biostatistics, Informatics
>>>> DNA Array Core Facility
>>>> 858-784-2263
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>> ------------------------------
>>> Mark Robinson, PhD (Melb)
>>> Epigenetics Laboratory, Garvan
>>> Bioinformatics Division, WEHI
>>> e: mrobinson at wehi.edu.au
>>> e: m.robinson at garvan.org.au
>>> p: +61 (0)3 9345 2628
>>> f: +61 (0)3 9347 0852
>>> ------------------------------
>>>
>>>
>>> ______________________________________________________________________
>>> The information in this email is confidential and inte...
>>
>
>
> --------------------------------------------------
> Davis J McCarthy
> Research Technician
> Bioinformatics Division
> Walter and Eliza Hall Institute of Medical Research
> 1G Royal Parade, Parkville, Vic 3052, Australia.
> dmccarthy at wehi.edu.au
> http://www.wehi.edu.au
>
>


