[BioC] Interspecies differential expression of orthologs with Edger

assaf www assafwww at gmail.com
Wed Aug 27 17:14:22 CEST 2014


This is very helpful for me, thanks.

A slight change that I made in the code you sent, to avoid some R erros, is

# to replace:
offset = offset + gl
# with:
offset = sweep(gl,2,offset,"+")

In addition to differential expression tests, I wanted also to extract
FPKMs values (and/or normalized CPM values), that would take into account
all components of the offset (which if I'm not mistaken are: library_size +
TMM + gene_size).
I assume rpkm()/cpm() should correct only for library_size + TMM.
Is there a possibly "decent" solution for that ?

all the best, and thanks,
Assaf



On Wed, Aug 27, 2014 at 4:45 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:

> It works something like this:
>
>   library(edgeR)
>   y <- DGEList(counts=counts)
>   y <- calcNormFactors(y)
>
> # Column correct log gene lengths
> # Columns of gl should add to zero
>
>   gl <- log(geneLength)
>   gl <- t(t(gl)-colMeans(gl))
>
> # Combine library sizes, norm factors and gene lengths:
>
>   offset <- expandAsMatrix(getOffset(y))
>   offset <- offset + gl
>
> Then
>
>   y$offset <- offset
>   y <- estimateGLMCommonDisp(y,design)
>
> etc.
>
> Note that I have not tried this myself.  It should work in principle from
> a differential expression point of view.
>
> On the other hand, there may be side effects regarding dispersion trend
> estimation -- I do not have time to explore this.
>
> Gordon
>
> ---------------------------------------------
> Professor Gordon K Smyth,
> Bioinformatics Division,
> Walter and Eliza Hall Institute of Medical Research,
> 1G Royal Parade, Parkville, Vic 3052, Australia.
> http://www.statsci.org/smyth
>
> On Wed, 27 Aug 2014, assaf www wrote:
>
>  Probably wrong, but the reason I thought of using quantile normalization
>> is
>> the need to correct both for the species-length, and library size.
>>
>>
>> On Wed, Aug 27, 2014 at 2:40 AM, Gordon K Smyth <smyth at wehi.edu.au>
>> wrote:
>>
>>  That doesn't look helpful to me.  I suggested that you incorporate gene
>>> lengths into the offsets, not do quantile normalization of cpms.
>>>
>>> Sorry, I just don't have time to develop a code example for you.  I hope
>>> someone else will help.
>>>
>>> The whole topic of interspecies differential expression is a can of
>>> worms.
>>> Even if you adjust for gene length, there will still be differences in gc
>>> content and mappability between the species.
>>>
>>> Gordon
>>>
>>>
>>> On Wed, 27 Aug 2014, assaf www wrote:
>>>
>>>  Dear Gordon thanks,
>>>
>>>>
>>>> Suppose I start with the following matrices:
>>>>
>>>> # 'counts' is the Rsem filtered counts
>>>>
>>>>  counts[1:4,]
>>>>>
>>>>>                      h0  h1  h2  n0  n1  n2
>>>> ENSRNOG00000000021   36  17  20  10  25  38
>>>> ENSRNOG00000000024 1283 615 731 644 807 991
>>>> ENSRNOG00000000028   26  12  11  18  23  28
>>>> ENSRNOG00000000029   22  13  12  16  17  15
>>>>
>>>> # 'geneLength' is the species-specific gene lengths, for species 'h' and
>>>> 'n':
>>>>
>>>>  geneLength[1:3,]
>>>>>
>>>>>                    h0.length h1.length h2.length n0.length n1.length
>>>> n2.length
>>>> ENSRNOG00000000021      1200      1200      1200      1303      1303
>>>> 1303
>>>> ENSRNOG00000000024      1050      1050      1050      3080      3080
>>>> 3080
>>>> ENSRNOG00000000028      1047      1047      1047      1121      1121
>>>> 1121
>>>>
>>>>
>>>> does the following code look correct, and may allow allows across
>>>> species
>>>> analysis ?:
>>>> (technically it works, I checked)
>>>>
>>>> # quantile normalization: (as in here:
>>>> http://davetang.org/wiki/tiki-index.php?page=Analysing+
>>>> digital+gene+expression
>>>> )
>>>>
>>>> x1 = counts*1000/geneLength
>>>> library(limma)
>>>> x2 = normalizeBetweenArrays(data.matrix(x1),method="quantile")
>>>> offset = log(counts+0.1)-log(x2+0.1)
>>>>
>>>> ...
>>>>
>>>> y <- estimateGLMCommonDisp(y,design,offset=offset)
>>>> y <- estimateGLMTrendedDisp(y,design,offset=offset)
>>>> y <- estimateGLMTagwiseDisp(y,design,offset=offset)
>>>> fit <- glmFit(y,design,offset=offset)
>>>>
>>>> ...
>>>>
>>>>
>>>> Thanks in advance for any help..,
>>>> Assaf
>>>>
>>>
> ______________________________________________________________________
> The information in this email is confidential and inte...{{dropped:10}}



More information about the Bioconductor mailing list