[BioC] Interspecies differential expression of orthologs with Edger

Wed Aug 27 18:39:10 CEST 2014

Looking at the code for rpkm, it looks like it should work just fine 
for a matrix or a vector.

On Wed Aug 27 09:30:11 2014, assaf www wrote:
> Sorry I'm not clear:
>   for getting the FPKMs, I guess I only need to:
> rpkm(y, gene.length=geneLength, normalized.lib.sizes=T, log=F)
>
> But gene.length should here be a vector, while in this case its a matrix.
>
>
> Assaf
>
>
> On Wed, Aug 27, 2014 at 6:14 PM, assaf www <assafwww at gmail.com> wrote:
>
>> This is very helpful for me, thanks.
>>
>> A slight change that I made in the code you sent, to avoid some R erros,
>> is
>>
>> # to replace:
>> offset = offset + gl
>> # with:
>> offset = sweep(gl,2,offset,"+")
>>
>> In addition to differential expression tests, I wanted also to extract
>> FPKMs values (and/or normalized CPM values), that would take into account
>> all components of the offset (which if I'm not mistaken are: library_size +
>> TMM + gene_size).
>> I assume rpkm()/cpm() should correct only for library_size + TMM.
>> Is there a possibly "decent" solution for that ?
>>
>> all the best, and thanks,
>> Assaf
>>
>>
>>
>> On Wed, Aug 27, 2014 at 4:45 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>>
>>> It works something like this:
>>>
>>>    library(edgeR)
>>>    y <- DGEList(counts=counts)
>>>    y <- calcNormFactors(y)
>>>
>>> # Column correct log gene lengths
>>> # Columns of gl should add to zero
>>>
>>>    gl <- log(geneLength)
>>>    gl <- t(t(gl)-colMeans(gl))
>>>
>>> # Combine library sizes, norm factors and gene lengths:
>>>
>>>    offset <- expandAsMatrix(getOffset(y))
>>>    offset <- offset + gl
>>>
>>> Then
>>>
>>>    y$offset <- offset
>>>    y <- estimateGLMCommonDisp(y,design)
>>>
>>> etc.
>>>
>>> Note that I have not tried this myself.  It should work in principle from
>>> a differential expression point of view.
>>>
>>> On the other hand, there may be side effects regarding dispersion trend
>>> estimation -- I do not have time to explore this.
>>>
>>> Gordon
>>>
>>> ---------------------------------------------
>>> Professor Gordon K Smyth,
>>> Bioinformatics Division,
>>> Walter and Eliza Hall Institute of Medical Research,
>>> 1G Royal Parade, Parkville, Vic 3052, Australia.
>>> http://www.statsci.org/smyth
>>>
>>> On Wed, 27 Aug 2014, assaf www wrote:
>>>
>>>   Probably wrong, but the reason I thought of using quantile normalization
>>>> is
>>>> the need to correct both for the species-length, and library size.
>>>>
>>>>
>>>> On Wed, Aug 27, 2014 at 2:40 AM, Gordon K Smyth <smyth at wehi.edu.au>
>>>> wrote:
>>>>
>>>>   That doesn't look helpful to me.  I suggested that you incorporate gene
>>>>> lengths into the offsets, not do quantile normalization of cpms.
>>>>>
>>>>> Sorry, I just don't have time to develop a code example for you.  I hope
>>>>> someone else will help.
>>>>>
>>>>> The whole topic of interspecies differential expression is a can of
>>>>> worms.
>>>>> Even if you adjust for gene length, there will still be differences in
>>>>> gc
>>>>> content and mappability between the species.
>>>>>
>>>>> Gordon
>>>>>
>>>>>
>>>>> On Wed, 27 Aug 2014, assaf www wrote:
>>>>>
>>>>>   Dear Gordon thanks,
>>>>>
>>>>>>
>>>>>> Suppose I start with the following matrices:
>>>>>>
>>>>>> # 'counts' is the Rsem filtered counts
>>>>>>
>>>>>>   counts[1:4,]
>>>>>>>
>>>>>>>                       h0  h1  h2  n0  n1  n2
>>>>>> ENSRNOG00000000021   36  17  20  10  25  38
>>>>>> ENSRNOG00000000024 1283 615 731 644 807 991
>>>>>> ENSRNOG00000000028   26  12  11  18  23  28
>>>>>> ENSRNOG00000000029   22  13  12  16  17  15
>>>>>>
>>>>>> # 'geneLength' is the species-specific gene lengths, for species 'h'
>>>>>> and
>>>>>> 'n':
>>>>>>
>>>>>>   geneLength[1:3,]
>>>>>>>
>>>>>>>                     h0.length h1.length h2.length n0.length n1.length
>>>>>> n2.length
>>>>>> ENSRNOG00000000021      1200      1200      1200      1303      1303
>>>>>> 1303
>>>>>> ENSRNOG00000000024      1050      1050      1050      3080      3080
>>>>>> 3080
>>>>>> ENSRNOG00000000028      1047      1047      1047      1121      1121
>>>>>> 1121
>>>>>>
>>>>>>
>>>>>> does the following code look correct, and may allow allows across
>>>>>> species
>>>>>> analysis ?:
>>>>>> (technically it works, I checked)
>>>>>>
>>>>>> # quantile normalization: (as in here:
>>>>>> http://davetang.org/wiki/tiki-index.php?page=Analysing+
>>>>>> digital+gene+expression
>>>>>> )
>>>>>>
>>>>>> x1 = counts*1000/geneLength
>>>>>> library(limma)
>>>>>> x2 = normalizeBetweenArrays(data.matrix(x1),method="quantile")
>>>>>> offset = log(counts+0.1)-log(x2+0.1)
>>>>>>
>>>>>> ...
>>>>>>
>>>>>> y <- estimateGLMCommonDisp(y,design,offset=offset)
>>>>>> y <- estimateGLMTrendedDisp(y,design,offset=offset)
>>>>>> y <- estimateGLMTagwiseDisp(y,design,offset=offset)
>>>>>> fit <- glmFit(y,design,offset=offset)
>>>>>>
>>>>>> ...
>>>>>>
>>>>>>
>>>>>> Thanks in advance for any help..,
>>>>>> Assaf
>>>>>>
>>>>>
>>> ______________________________________________________________________
>>> The information in this email is confidential and intended solely for the
>>> addressee.
>>> You must not disclose, forward, print or use it without the permission of
>>> the sender.
>>> ______________________________________________________________________
>>>
>>
>>
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor