[BioC] edgeR and calcNormFactors manually

Gordon K Smyth smyth at wehi.EDU.AU
Sun Apr 27 08:46:58 CEST 2014

Dear Scott,

I'm afraid I still have little idea of what you're trying to achieve.  I'm 
not even quite sure whether you are looking for differential exprssion 
and, if so, between which samples.

Your mathematician is describing least squares regression though the 
origin.  I don't see that that would be sensible thing to do in the edgeR 

Have you considered





On Fri, 25 Apr 2014, Scott Daniel wrote:

> Hi,
> First off, I determined that assigning the normalization factors manually
> as-it-were does affect the results (# of DE genes). I did this by comparing
> it to a set where I just set p and r to a scaling factor of 1. The library
> sizes don't change in the y$samples but the results change.
> To go back to the general discussion of why we did ad-hoc scaling, I'll try
> to explain. The 'input' I am speaking of is the RNA sample preceding a
> polysome fractionation. We then take the resultant products and split them
> into 'RNP' (which includes the monosome, free RNA and ribo-nucelo-proteins)
> and 'Poly' (which includes actively-translating, polysomal RNA). We've then
> done RNA-seq on all three samples (input, rnp, poly).
> The idea is that we scale the poly and rnp counts to the input using a
> simple linear model. We had another mathematician explain it like this:
> "So then what you have to do is basically construct a mx2 matrix A, where m
> = total number of genes. Then you fill the first column of A with the
> counts for all genes in your polysome sub-sample, and the second column of
> A with the counts for all genes in your RNP sub-sample. The you make a mx1
> matrix Y, which is the counts for all genes in the input sub-sample. Then
> you find the least-squares solution (in R I think it's lsqfit(A,Y) or
> something) to the equation Y = Ax and solve for x, which will be a 2x1
> vector with your scaling factors n_p and n_r. Then, you replace your
> polysome counts with P*n_p and your RNP counts with R*n_r"
> So, since we have done this manual scaling, our mathematician recommended
> we calculate normalization factors for input only and then apply these
> factors to the corresponding polysome and rnp samples.
> Thanks.
> Best,
> Scott Daniel
> Zarnescu Lab
> MCB Ph.D. Program
> University of Arizona
> On Thu, Apr 17, 2014 at 6:26 PM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>> Dear Scott,
>> Normalization scaling factors are specific to each library, so you cannot
>> meaningfully compute them for one set of libraries and copy the results to
>> a larger set of libraries.
>> If you try to explain what you are actually trying to achieve, we might be
>> able to give you some advice.
>> As it is, I don't yet understand why you aren't simply using
>> calcNormFactors() in the way it is intended, which would be to apply it to
>> all libraries at once, or why you need to do any prior ad hoc scaling.
>> Not sure what you mean by an 'input' sample, because 'input' is usually
>> associated with ChIP-seq rather than RNA-seq.
>> Best wishes
>> Gordon
>>  Date: Tue, 15 Apr 2014 16:40:53 -0700
>>> From: Scott Daniel <scottdaniel at email.arizona.edu>
>>> To: bioconductor at r-project.org
>>> Subject: [BioC] edgeR and calcNormFactors manually
>>> Dear Colleagues,
>>> I have an RNA-seq library with 27 samples. 9 of these samples are what we
>>> call input as the other 18 are derived from them through a biochemical
>>> process. As such, I have already scaled the genes in those 18 samples to
>>> roughly add-up to the corresponding inputs.
>>> To manage this at the calcNormFactors step I have the following code:
>>> y$samples
>>> y_inputs=y$counts[,grep(".[i]..",colnames(y$counts))]
>>> norm_factor_inputs <- calcNormFactors(y_inputs)
>>> y$samples$norm.factors[grep(".[i]..",row.names(y$samples))] =
>>> norm_factor_inputs
>>> y$samples$norm.factors[grep(".[p]..",row.names(y$samples))] =
>>> norm_factor_inputs
>>> y$samples$norm.factors[grep(".[r]..",row.names(y$samples))] =
>>> norm_factor_inputs
>>> y$samples
>>> So what I'm basically trying to do is calcNormFactors for just the inputs
>>> and then copy them to be the NormFactors for the p and r samples. Is
>>> copying them like this telling edgeR to re-compute the library sizes? It
>>> doesn't seem to be happening because the lib.size before and after these
>>> commands are the same.
>>> Best,
>>> Scott Daniel
>>> Zarnescu Lab
>>> MCB Ph.D. Program
>>> University of Arizona

The information in this email is confidential and intend...{{dropped:4}}

More information about the Bioconductor mailing list