[BioC] Interspecies differential expression of orthologs with Edger

assaf www assafwww at gmail.com
Tue Sep 9 15:00:36 CEST 2014


Thanks Gordon and Sean

OK, I see what you mean now about roast(). Sorry for the mess,
but your answers are highly informative for me.

I guess that simply aggregating, for example, the 1,500 members of the
olfactory receptor gene family would just increase the mess.
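
To make that concern concrete, here is a minimal sketch of the aggregation
step: summing read counts per sample over the members of each gene family.
It is written in plain Python rather than R purely for illustration; the
gene names, family labels and counts are all made up, and the helper
aggregate_counts() is hypothetical, not part of edgeR. In R, base
rowsum(counts, group) computes the same per-group sums.

```python
def aggregate_counts(counts, family_of):
    """Sum per-sample read counts over the genes in each family.

    counts:    dict mapping gene name -> list of counts, one per sample
    family_of: dict mapping gene name -> family label (hypothetical mapping)
    """
    totals = {}
    for gene, per_sample in counts.items():
        family = family_of[gene]
        if family not in totals:
            totals[family] = [0] * len(per_sample)
        # Add this gene's counts to the running family total, sample by sample.
        totals[family] = [t + c for t, c in zip(totals[family], per_sample)]
    return totals


# Toy data (made up): samples 1-2 are species A, samples 3-4 are species B.
counts = {
    "OR_1": [100, 110, 10, 12],     # lower in species B
    "OR_2": [10, 12, 100, 110],     # higher in species B
    "GENE_X": [50, 55, 200, 210],   # higher in species B
}
family_of = {"OR_1": "olfactory", "OR_2": "olfactory", "GENE_X": "single_X"}

family_counts = aggregate_counts(counts, family_of)
print(family_counts["olfactory"])   # [110, 122, 110, 122]
print(family_counts["single_X"])    # [50, 55, 200, 210]
```

In this toy case the two olfactory genes change in opposite directions, so
the family total is identical in both species and the summed feature shows
no difference at all. That is the situation where only the aggregate
behaviour is being tested, not the behaviour of the individual members.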

Let me read up on and check both the roast() approach and the aggregation.
Assaf


On Tue, Sep 9, 2014 at 7:26 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:

> Dear Assaf,
>
> As Sean and the edgeR manual have already told you, you can define the
> genomic features any way you like.  You can still do a comparison using
> edgeR.  Why do you keep asking?
>
> However, it is your responsibility (not ours) to make sure that the
> genomic features you have defined make biological sense for your specific
> problem.  The transcripts arising from within each genomic feature need to
> behave reasonably consistently, or else you need to be interested only in
> the aggregate behaviour.
>
> I can see why it might make sense to define genomic regions based on
> ortholog families rather than individual genes.  Whether it makes sense to
> group together large families of genes, I am a bit sceptical about that. A
> roast() test would seem more appropriate for that sort of thing.
>
> You are assuming that reducing the groups will lower FDR.  This does not
> necessarily follow.
>
> Best wishes
> Gordon
>
> ---------------------------------------------
> Professor Gordon K Smyth,
> Bioinformatics Division,
> Walter and Eliza Hall Institute of Medical Research,
> 1G Royal Parade, Parkville, Vic 3052, Australia.
> http://www.statsci.org/smyth
>
> On Mon, 8 Sep 2014, assaf www wrote:
>
>> Hi Sean,
>>
>> I guess I wasn't clear, sorry.
>>
>> I mean that in principle it is possible to aggregate genes based on their
>> membership in gene families (or any other criterion), and to compare the
>> sum of read counts per sample per group of genes (usually it would be
>> counts per sample per gene). What I would like to learn is whether such a
>> comparison can be done in edgeR.
>>
>> About FDR: in the above case, after grouping there are fewer multiple
>> comparisons, and hence a lower FDR.
>>
>> best
>> Assaf
>>
>> On Mon, Sep 8, 2014 at 12:01 AM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>>
>>
>>>
>>>
>>> On Sun, Sep 7, 2014 at 3:32 PM, assaf www <assafwww at gmail.com> wrote:
>>>
>>>> Dear Gordon,
>>>>
>>>> I am aware of the limitations of cross-species inference.
>>>> Still, it is critical for me to minimize false positives before the
>>>> real-time PCR validation stage.
>>>>
>>>> I am just trying to understand some other things that may or may not be
>>>> related to the cross-species issue.
>>>> The edgeR manual says that any kind of "genomic feature" may be used,
>>>> but can a "genomic feature" also be defined as a group of genes?
>>>> I mean, would it be correct to run edgeR after summing the counts of
>>>> genes belonging to specific categories (e.g. gene families), so that
>>>> instead of 12,000 genes I end up with, say, 2,000 gene groups?
>>>> This could also be good for the FDR, etc.
>>>>
>>>>
>>> Hi, Assaf.
>>>
>>> edgeR and other related tools will happily use counts from arbitrary
>>> genomic features and have been applied to data such as DNase-Seq and
>>> ChIP-Seq.  I'm not sure how doing so will "be good for the FDR", but I
>>> may misunderstand your point.
>>>
>>> Sean
>>>
>>>
>>>
>>>
>>>> Thanks a lot, all the Best,
>>>> Assaf
>>>>
>>>> On Sun, Sep 7, 2014 at 4:11 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>>>>
>>>>> Dear Assaf,
>>>>>
>>>>> You are getting the sort of results that I would expect you to get when
>>>>> you try to compare two RNA sources that are very different.
>>>>>
>>>>> The diagonal lines in the MA plot are simply a result of having low
>>>>> counts (0,1,2 etc) in one species and high counts in the other for the
>>>>> same genes.
>>>>>
>>>>> When you compare different species, I'd intuitively expect almost every
>>>>> gene to be differentially expressed to some degree.  So I'm not
>>>>> surprised that a large proportion of genes are assessed as DE.
>>>>>
>>>>> That's about as much help as I can give you.  I can't give advice that
>>>>> would allow you to get the same sort of results as you might be used
>>>>> to, because comparing different species isn't a normal thing to do.
>>>>>
>>>>> Best wishes
>>>>> Gordon
>>>>>
>>>>>
>>>>>> Date: Fri, 5 Sep 2014 23:22:28 +0300
>>>>>> From: assaf www <assafwww at gmail.com>
>>>>>> To: Gordon K Smyth <smyth at wehi.edu.au>
>>>>>> Cc: Bioconductor mailing list <bioconductor at r-project.org>
>>>>>> Subject: Re: [BioC] Interspecies differential expression of orthologs
>>>>>>         with    Edger
>>>>>>
>>>>>> Thanks Gordon,
>>>>>>
>>>>>> To summarize the results I got on the cross-species data, after
>>>>>> embedding the length effect in the GLM offset matrix, as in the code
>>>>>> you sent, please see the attached MA plot:
>>>>>>
>>>>>> 1) for log fold changes above 5 and below -5, genes' logFC is
>>>>>> positively correlated with mean log CPM, something I haven't seen
>>>>>> before in standard edgeR runs.
>>>>>>
>>>>>> 2) most genes with fold change around > 1.3, or < -1.3, are
>>>>>> significant, which looks to me too "liberal". Please note that each
>>>>>> group contains 6 true biological replicates (variance within each
>>>>>> group is large).
>>>>>>
>>>>>> The first problem worries me most; any idea is very welcome.
>>>>>>
>>>>>> Many thanks,
>>>>>> Assaf
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Sep 3, 2014 at 2:08 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>>>>>>
>>>>>>> On Tue, 2 Sep 2014, assaf www wrote:
>>>>>>>
>>>>>>>> Is the edgeR DE analysis built on the assumption that most genes
>>>>>>>> are not differentially expressed, and that only a small portion of
>>>>>>>> them are (say <20%)?
>>>>>>>
>>>>>>> Only the calcNormFactors() step of edgeR makes any assumption of this
>>>>>>> sort. calcNormFactors assumes either that most genes are not DE or
>>>>>>> that the DE is reasonably symmetric.
>>>>>>>
>>>>>>>> I mean, in cross-species studies, or when comparing different
>>>>>>>> tissues of the same organism, if this assumption doesn't hold,
>>>>>>>> should it be a serious concern?
>>>>>>>
>>>>>>> In a cross-species comparison there will be many DE genes, but some
>>>>>>> will be up and some will be down.  The DE will not be all in one
>>>>>>> direction, so I would guess that normalization will not be a serious
>>>>>>> concern.
>>>>>>>
>>>>>>> Of all the concerns with cross-species comparisons, this seems to me
>>>>>>> to be far from the most serious.
>>>>>>>
>>>>>>> Best wishes
>>>>>>> Gordon
>>>>>>>
>>>>>> -------------- next part --------------
>>>>>> A non-text attachment was scrubbed...
>>>>>> Name: crossspecies.png
>>>>>> Type: image/png
>>>>>> Size: 65085 bytes
>>>>>> Desc: not available
>>>>>> URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20140905/c599392b/attachment-0001.png>
>>>>>>
>>>>>> ------------------------------
>>>>>>
>>>>>>
>>>>> ______________________________________________________________________
>>>>> The information in this email is confidential and inte...{{dropped:10}}
>>>>>
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>>
>>>
>>>
>>


