[BioC] Interspecies differential expression of orthologs with Edger

Gordon K Smyth smyth at wehi.EDU.AU
Thu Aug 28 01:47:43 CEST 2014


The code should have been:

  offset <- expandAsMatrix(getOffset(y),dim(y))
  offset <- offset + gl

This should give same result as your code.

rpkm() corrects for gene length as well as library size -- that's the 
whole purpose of RPKMs:

   rpkm(y, gene.length=geneLength)

should work for you without any modification.

Gordon


On Wed, 27 Aug 2014, assaf www wrote:

> This is very helpful for me, thanks.
>
> A slight change that I made in the code you sent, to avoid some R erros, is
>
> # to replace:
> offset = offset + gl
> # with:
> offset = sweep(gl,2,offset,"+")
>
> In addition to differential expression tests, I wanted also to extract
> FPKMs values (and/or normalized CPM values), that would take into account
> all components of the offset (which if I'm not mistaken are: library_size +
> TMM + gene_size).
> I assume rpkm()/cpm() should correct only for library_size + TMM.
> Is there a possibly "decent" solution for that ?
>
> all the best, and thanks,
> Assaf
>
>
>
> On Wed, Aug 27, 2014 at 4:45 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>
>> It works something like this:
>>
>>   library(edgeR)
>>   y <- DGEList(counts=counts)
>>   y <- calcNormFactors(y)
>>
>> # Column correct log gene lengths
>> # Columns of gl should add to zero
>>
>>   gl <- log(geneLength)
>>   gl <- t(t(gl)-colMeans(gl))
>>
>> # Combine library sizes, norm factors and gene lengths:
>>
>>   offset <- expandAsMatrix(getOffset(y))
>>   offset <- offset + gl
>>
>> Then
>>
>>   y$offset <- offset
>>   y <- estimateGLMCommonDisp(y,design)
>>
>> etc.
>>
>> Note that I have not tried this myself.  It should work in principle from
>> a differential expression point of view.
>>
>> On the other hand, there may be side effects regarding dispersion trend
>> estimation -- I do not have time to explore this.
>>
>> Gordon
>>
>> ---------------------------------------------
>> Professor Gordon K Smyth,
>> Bioinformatics Division,
>> Walter and Eliza Hall Institute of Medical Research,
>> 1G Royal Parade, Parkville, Vic 3052, Australia.
>> http://www.statsci.org/smyth
>>
>> On Wed, 27 Aug 2014, assaf www wrote:
>>
>>  Probably wrong, but the reason I thought of using quantile normalization
>>> is
>>> the need to correct both for the species-length, and library size.
>>>
>>>
>>> On Wed, Aug 27, 2014 at 2:40 AM, Gordon K Smyth <smyth at wehi.edu.au>
>>> wrote:
>>>
>>>  That doesn't look helpful to me.  I suggested that you incorporate gene
>>>> lengths into the offsets, not do quantile normalization of cpms.
>>>>
>>>> Sorry, I just don't have time to develop a code example for you.  I hope
>>>> someone else will help.
>>>>
>>>> The whole topic of interspecies differential expression is a can of
>>>> worms.
>>>> Even if you adjust for gene length, there will still be differences in gc
>>>> content and mappability between the species.
>>>>
>>>> Gordon
>>>>
>>>>
>>>> On Wed, 27 Aug 2014, assaf www wrote:
>>>>
>>>>  Dear Gordon thanks,
>>>>
>>>>>
>>>>> Suppose I start with the following matrices:
>>>>>
>>>>> # 'counts' is the Rsem filtered counts
>>>>>
>>>>>  counts[1:4,]
>>>>>>
>>>>>>                      h0  h1  h2  n0  n1  n2
>>>>> ENSRNOG00000000021   36  17  20  10  25  38
>>>>> ENSRNOG00000000024 1283 615 731 644 807 991
>>>>> ENSRNOG00000000028   26  12  11  18  23  28
>>>>> ENSRNOG00000000029   22  13  12  16  17  15
>>>>>
>>>>> # 'geneLength' is the species-specific gene lengths, for species 'h' and
>>>>> 'n':
>>>>>
>>>>>  geneLength[1:3,]
>>>>>>
>>>>>>                    h0.length h1.length h2.length n0.length n1.length
>>>>> n2.length
>>>>> ENSRNOG00000000021      1200      1200      1200      1303      1303
>>>>> 1303
>>>>> ENSRNOG00000000024      1050      1050      1050      3080      3080
>>>>> 3080
>>>>> ENSRNOG00000000028      1047      1047      1047      1121      1121
>>>>> 1121
>>>>>
>>>>>
>>>>> does the following code look correct, and may allow allows across
>>>>> species
>>>>> analysis ?:
>>>>> (technically it works, I checked)
>>>>>
>>>>> # quantile normalization: (as in here:
>>>>> http://davetang.org/wiki/tiki-index.php?page=Analysing+
>>>>> digital+gene+expression
>>>>> )
>>>>>
>>>>> x1 = counts*1000/geneLength
>>>>> library(limma)
>>>>> x2 = normalizeBetweenArrays(data.matrix(x1),method="quantile")
>>>>> offset = log(counts+0.1)-log(x2+0.1)
>>>>>
>>>>> ...
>>>>>
>>>>> y <- estimateGLMCommonDisp(y,design,offset=offset)
>>>>> y <- estimateGLMTrendedDisp(y,design,offset=offset)
>>>>> y <- estimateGLMTagwiseDisp(y,design,offset=offset)
>>>>> fit <- glmFit(y,design,offset=offset)
>>>>>
>>>>> ...
>>>>>
>>>>>
>>>>> Thanks in advance for any help..,
>>>>> Assaf
>>>>>
>>>>
>> ______________________________________________________________________
>> The information in this email is confidential and intended solely for the
>> addressee.
>> You must not disclose, forward, print or use it without the permission of
>> the sender.
>> ______________________________________________________________________
>>
>

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list