[BioC] Interspecies differential expression of orthologs with Edger
assaf www
assafwww at gmail.com
Thu Aug 28 17:37:04 CEST 2014
I checked, it's true, the results look the same.
As for FPKM, of course you are right.
Thanks a lot
Assaf
On Thu, Aug 28, 2014 at 2:47 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
> The code should have been:
>
> offset <- expandAsMatrix(getOffset(y),dim(y))
> offset <- offset + gl
>
> This should give same result as your code.
>
> rpkm() corrects for gene length as well as library size -- that's the
> whole purpose of RPKMs:
>
> rpkm(y, gene.length=geneLength)
>
> should work for you without any modification.
>
> Gordon
>
>
> On Wed, 27 Aug 2014, assaf www wrote:
>
> This is very helpful for me, thanks.
>>
>> A slight change that I made in the code you sent, to avoid some R erros,
>> is
>>
>> # to replace:
>> offset = offset + gl
>> # with:
>> offset = sweep(gl,2,offset,"+")
>>
>> In addition to differential expression tests, I wanted also to extract
>> FPKMs values (and/or normalized CPM values), that would take into account
>> all components of the offset (which if I'm not mistaken are: library_size
>> +
>> TMM + gene_size).
>> I assume rpkm()/cpm() should correct only for library_size + TMM.
>> Is there a possibly "decent" solution for that ?
>>
>> all the best, and thanks,
>> Assaf
>>
>>
>>
>> On Wed, Aug 27, 2014 at 4:45 AM, Gordon K Smyth <smyth at wehi.edu.au>
>> wrote:
>>
>> It works something like this:
>>>
>>> library(edgeR)
>>> y <- DGEList(counts=counts)
>>> y <- calcNormFactors(y)
>>>
>>> # Column correct log gene lengths
>>> # Columns of gl should add to zero
>>>
>>> gl <- log(geneLength)
>>> gl <- t(t(gl)-colMeans(gl))
>>>
>>> # Combine library sizes, norm factors and gene lengths:
>>>
>>> offset <- expandAsMatrix(getOffset(y))
>>> offset <- offset + gl
>>>
>>> Then
>>>
>>> y$offset <- offset
>>> y <- estimateGLMCommonDisp(y,design)
>>>
>>> etc.
>>>
>>> Note that I have not tried this myself. It should work in principle from
>>> a differential expression point of view.
>>>
>>> On the other hand, there may be side effects regarding dispersion trend
>>> estimation -- I do not have time to explore this.
>>>
>>> Gordon
>>>
>>> ---------------------------------------------
>>> Professor Gordon K Smyth,
>>> Bioinformatics Division,
>>> Walter and Eliza Hall Institute of Medical Research,
>>> 1G Royal Parade, Parkville, Vic 3052, Australia.
>>> http://www.statsci.org/smyth
>>>
>>> On Wed, 27 Aug 2014, assaf www wrote:
>>>
>>> Probably wrong, but the reason I thought of using quantile normalization
>>>
>>>> is
>>>> the need to correct both for the species-length, and library size.
>>>>
>>>>
>>>> On Wed, Aug 27, 2014 at 2:40 AM, Gordon K Smyth <smyth at wehi.edu.au>
>>>> wrote:
>>>>
>>>> That doesn't look helpful to me. I suggested that you incorporate gene
>>>>
>>>>> lengths into the offsets, not do quantile normalization of cpms.
>>>>>
>>>>> Sorry, I just don't have time to develop a code example for you. I
>>>>> hope
>>>>> someone else will help.
>>>>>
>>>>> The whole topic of interspecies differential expression is a can of
>>>>> worms.
>>>>> Even if you adjust for gene length, there will still be differences in
>>>>> gc
>>>>> content and mappability between the species.
>>>>>
>>>>> Gordon
>>>>>
>>>>>
>>>>> On Wed, 27 Aug 2014, assaf www wrote:
>>>>>
>>>>> Dear Gordon thanks,
>>>>>
>>>>>
>>>>>> Suppose I start with the following matrices:
>>>>>>
>>>>>> # 'counts' is the Rsem filtered counts
>>>>>>
>>>>>> counts[1:4,]
>>>>>>
>>>>>>>
>>>>>>> h0 h1 h2 n0 n1 n2
>>>>>>>
>>>>>> ENSRNOG00000000021 36 17 20 10 25 38
>>>>>> ENSRNOG00000000024 1283 615 731 644 807 991
>>>>>> ENSRNOG00000000028 26 12 11 18 23 28
>>>>>> ENSRNOG00000000029 22 13 12 16 17 15
>>>>>>
>>>>>> # 'geneLength' is the species-specific gene lengths, for species 'h'
>>>>>> and
>>>>>> 'n':
>>>>>>
>>>>>> geneLength[1:3,]
>>>>>>
>>>>>>>
>>>>>>> h0.length h1.length h2.length n0.length n1.length
>>>>>>>
>>>>>> n2.length
>>>>>> ENSRNOG00000000021 1200 1200 1200 1303 1303
>>>>>> 1303
>>>>>> ENSRNOG00000000024 1050 1050 1050 3080 3080
>>>>>> 3080
>>>>>> ENSRNOG00000000028 1047 1047 1047 1121 1121
>>>>>> 1121
>>>>>>
>>>>>>
>>>>>> does the following code look correct, and may allow allows across
>>>>>> species
>>>>>> analysis ?:
>>>>>> (technically it works, I checked)
>>>>>>
>>>>>> # quantile normalization: (as in here:
>>>>>> http://davetang.org/wiki/tiki-index.php?page=Analysing+
>>>>>> digital+gene+expression
>>>>>> )
>>>>>>
>>>>>> x1 = counts*1000/geneLength
>>>>>> library(limma)
>>>>>> x2 = normalizeBetweenArrays(data.matrix(x1),method="quantile")
>>>>>> offset = log(counts+0.1)-log(x2+0.1)
>>>>>>
>>>>>> ...
>>>>>>
>>>>>> y <- estimateGLMCommonDisp(y,design,offset=offset)
>>>>>> y <- estimateGLMTrendedDisp(y,design,offset=offset)
>>>>>> y <- estimateGLMTagwiseDisp(y,design,offset=offset)
>>>>>> fit <- glmFit(y,design,offset=offset)
>>>>>>
>>>>>> ...
>>>>>>
>>>>>>
>>>>>> Thanks in advance for any help..,
>>>>>> Assaf
>>>>>>
>>>>>>
>>>>> ____________________________________________________________
>>> __________
>>> The information in this email is confidential and intended solely for the
>>> addressee.
>>> You must not disclose, forward, print or use it without the permission of
>>> the sender.
>>> ______________________________________________________________________
>>>
>>>
>>
> ______________________________________________________________________
> The information in this email is confidential and inte...{{dropped:10}}
More information about the Bioconductor
mailing list