[BioC] Interspecies differential expression of orthologs with Edger

Thu Aug 28 17:37:04 CEST 2014

I checked, it's true, the results look the same.
As for FPKM, of course you are right.

Thanks a lot
Assaf

On Thu, Aug 28, 2014 at 2:47 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:

> The code should have been:
>
>  offset <- expandAsMatrix(getOffset(y),dim(y))
>  offset <- offset + gl
>
> This should give same result as your code.
>
> rpkm() corrects for gene length as well as library size -- that's the
> whole purpose of RPKMs:
>
>   rpkm(y, gene.length=geneLength)
>
> should work for you without any modification.
>
> Gordon
>
>
> On Wed, 27 Aug 2014, assaf www wrote:
>
>  This is very helpful for me, thanks.
>>
>> A slight change that I made in the code you sent, to avoid some R erros,
>> is
>>
>> # to replace:
>> offset = offset + gl
>> # with:
>> offset = sweep(gl,2,offset,"+")
>>
>> In addition to differential expression tests, I wanted also to extract
>> FPKMs values (and/or normalized CPM values), that would take into account
>> all components of the offset (which if I'm not mistaken are: library_size
>> +
>> TMM + gene_size).
>> I assume rpkm()/cpm() should correct only for library_size + TMM.
>> Is there a possibly "decent" solution for that ?
>>
>> all the best, and thanks,
>> Assaf
>>
>>
>>
>> On Wed, Aug 27, 2014 at 4:45 AM, Gordon K Smyth <smyth at wehi.edu.au>
>> wrote:
>>
>>  It works something like this:
>>>
>>>   library(edgeR)
>>>   y <- DGEList(counts=counts)
>>>   y <- calcNormFactors(y)
>>>
>>> # Column correct log gene lengths
>>> # Columns of gl should add to zero
>>>
>>>   gl <- log(geneLength)
>>>   gl <- t(t(gl)-colMeans(gl))
>>>
>>> # Combine library sizes, norm factors and gene lengths:
>>>
>>>   offset <- expandAsMatrix(getOffset(y))
>>>   offset <- offset + gl
>>>
>>> Then
>>>
>>>   y$offset <- offset
>>>   y <- estimateGLMCommonDisp(y,design)
>>>
>>> etc.
>>>
>>> Note that I have not tried this myself.  It should work in principle from
>>> a differential expression point of view.
>>>
>>> On the other hand, there may be side effects regarding dispersion trend
>>> estimation -- I do not have time to explore this.
>>>
>>> Gordon
>>>
>>> ---------------------------------------------
>>> Professor Gordon K Smyth,
>>> Bioinformatics Division,
>>> Walter and Eliza Hall Institute of Medical Research,
>>> 1G Royal Parade, Parkville, Vic 3052, Australia.
>>> http://www.statsci.org/smyth
>>>
>>> On Wed, 27 Aug 2014, assaf www wrote:
>>>
>>>  Probably wrong, but the reason I thought of using quantile normalization
>>>
>>>> is
>>>> the need to correct both for the species-length, and library size.
>>>>
>>>>
>>>> On Wed, Aug 27, 2014 at 2:40 AM, Gordon K Smyth <smyth at wehi.edu.au>
>>>> wrote:
>>>>
>>>>  That doesn't look helpful to me.  I suggested that you incorporate gene
>>>>
>>>>> lengths into the offsets, not do quantile normalization of cpms.
>>>>>
>>>>> Sorry, I just don't have time to develop a code example for you.  I
>>>>> hope
>>>>> someone else will help.
>>>>>
>>>>> The whole topic of interspecies differential expression is a can of
>>>>> worms.
>>>>> Even if you adjust for gene length, there will still be differences in
>>>>> gc
>>>>> content and mappability between the species.
>>>>>
>>>>> Gordon
>>>>>
>>>>>
>>>>> On Wed, 27 Aug 2014, assaf www wrote:
>>>>>
>>>>>  Dear Gordon thanks,
>>>>>
>>>>>
>>>>>> Suppose I start with the following matrices:
>>>>>>
>>>>>> # 'counts' is the Rsem filtered counts
>>>>>>
>>>>>>  counts[1:4,]
>>>>>>
>>>>>>>
>>>>>>>                      h0  h1  h2  n0  n1  n2
>>>>>>>
>>>>>> ENSRNOG00000000021   36  17  20  10  25  38
>>>>>> ENSRNOG00000000024 1283 615 731 644 807 991
>>>>>> ENSRNOG00000000028   26  12  11  18  23  28
>>>>>> ENSRNOG00000000029   22  13  12  16  17  15
>>>>>>
>>>>>> # 'geneLength' is the species-specific gene lengths, for species 'h'
>>>>>> and
>>>>>> 'n':
>>>>>>
>>>>>>  geneLength[1:3,]
>>>>>>
>>>>>>>
>>>>>>>                    h0.length h1.length h2.length n0.length n1.length
>>>>>>>
>>>>>> n2.length
>>>>>> ENSRNOG00000000021      1200      1200      1200      1303      1303
>>>>>> 1303
>>>>>> ENSRNOG00000000024      1050      1050      1050      3080      3080
>>>>>> 3080
>>>>>> ENSRNOG00000000028      1047      1047      1047      1121      1121
>>>>>> 1121
>>>>>>
>>>>>>
>>>>>> does the following code look correct, and may allow allows across
>>>>>> species
>>>>>> analysis ?:
>>>>>> (technically it works, I checked)
>>>>>>
>>>>>> # quantile normalization: (as in here:
>>>>>> http://davetang.org/wiki/tiki-index.php?page=Analysing+
>>>>>> digital+gene+expression
>>>>>> )
>>>>>>
>>>>>> x1 = counts*1000/geneLength
>>>>>> library(limma)
>>>>>> x2 = normalizeBetweenArrays(data.matrix(x1),method="quantile")
>>>>>> offset = log(counts+0.1)-log(x2+0.1)
>>>>>>
>>>>>> ...
>>>>>>
>>>>>> y <- estimateGLMCommonDisp(y,design,offset=offset)
>>>>>> y <- estimateGLMTrendedDisp(y,design,offset=offset)
>>>>>> y <- estimateGLMTagwiseDisp(y,design,offset=offset)
>>>>>> fit <- glmFit(y,design,offset=offset)
>>>>>>
>>>>>> ...
>>>>>>
>>>>>>
>>>>>> Thanks in advance for any help..,
>>>>>> Assaf
>>>>>>
>>>>>>
>>>>>  ____________________________________________________________
>>> __________
>>> The information in this email is confidential and intended solely for the
>>> addressee.
>>> You must not disclose, forward, print or use it without the permission of
>>> the sender.
>>> ______________________________________________________________________
>>>
>>>
>>
> ______________________________________________________________________
> The information in this email is confidential and inte...{{dropped:10}}