[BioC] Interspecies differential expression of orthologs with Edger

Gordon K Smyth smyth at wehi.EDU.AU
Wed Aug 27 03:45:47 CEST 2014


It works something like this:

   library(edgeR)
   y <- DGEList(counts=counts)
   y <- calcNormFactors(y)

# Column correct log gene lengths
# Columns of gl should add to zero

   gl <- log(geneLength)
   gl <- t(t(gl)-colMeans(gl))

# Combine library sizes, norm factors and gene lengths:

   offset <- expandAsMatrix(getOffset(y))
   offset <- offset + gl

Then

   y$offset <- offset
   y <- estimateGLMCommonDisp(y,design)

etc.

Note that I have not tried this myself.  It should work in principle from 
a differential expression point of view.

On the other hand, there may be side effects regarding dispersion trend 
estimation -- I do not have time to explore this.

Gordon

---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
http://www.statsci.org/smyth

On Wed, 27 Aug 2014, assaf www wrote:

> Probably wrong, but the reason I thought of using quantile normalization is
> the need to correct both for the species-length, and library size.
>
>
> On Wed, Aug 27, 2014 at 2:40 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>
>> That doesn't look helpful to me.  I suggested that you incorporate gene
>> lengths into the offsets, not do quantile normalization of cpms.
>>
>> Sorry, I just don't have time to develop a code example for you.  I hope
>> someone else will help.
>>
>> The whole topic of interspecies differential expression is a can of worms.
>> Even if you adjust for gene length, there will still be differences in gc
>> content and mappability between the species.
>>
>> Gordon
>>
>>
>> On Wed, 27 Aug 2014, assaf www wrote:
>>
>>  Dear Gordon thanks,
>>>
>>> Suppose I start with the following matrices:
>>>
>>> # 'counts' is the Rsem filtered counts
>>>
>>>> counts[1:4,]
>>>>
>>>                     h0  h1  h2  n0  n1  n2
>>> ENSRNOG00000000021   36  17  20  10  25  38
>>> ENSRNOG00000000024 1283 615 731 644 807 991
>>> ENSRNOG00000000028   26  12  11  18  23  28
>>> ENSRNOG00000000029   22  13  12  16  17  15
>>>
>>> # 'geneLength' is the species-specific gene lengths, for species 'h' and
>>> 'n':
>>>
>>>> geneLength[1:3,]
>>>>
>>>                   h0.length h1.length h2.length n0.length n1.length
>>> n2.length
>>> ENSRNOG00000000021      1200      1200      1200      1303      1303
>>> 1303
>>> ENSRNOG00000000024      1050      1050      1050      3080      3080
>>> 3080
>>> ENSRNOG00000000028      1047      1047      1047      1121      1121
>>> 1121
>>>
>>>
>>> does the following code look correct, and may allow allows across species
>>> analysis ?:
>>> (technically it works, I checked)
>>>
>>> # quantile normalization: (as in here:
>>> http://davetang.org/wiki/tiki-index.php?page=Analysing+
>>> digital+gene+expression
>>> )
>>>
>>> x1 = counts*1000/geneLength
>>> library(limma)
>>> x2 = normalizeBetweenArrays(data.matrix(x1),method="quantile")
>>> offset = log(counts+0.1)-log(x2+0.1)
>>>
>>> ...
>>>
>>> y <- estimateGLMCommonDisp(y,design,offset=offset)
>>> y <- estimateGLMTrendedDisp(y,design,offset=offset)
>>> y <- estimateGLMTagwiseDisp(y,design,offset=offset)
>>> fit <- glmFit(y,design,offset=offset)
>>>
>>> ...
>>>
>>>
>>> Thanks in advance for any help..,
>>> Assaf

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list