[BioC] Identifying Processes as Upregulated or Downregulated

James W. MacDonald jmacdon at uw.edu
Tue Feb 11 22:04:52 CET 2014


Hi Joseph,

The flaw in your reasoning is here:

"Let's assume that the process represented by GO_A is such that it
cannot be simultaneously upregulated and downregulated;"

You aren't measuring a process. You are measuring gene expression. And 
the up-regulation and down-regulation of genes can have inhibitory or 
excitatory effects on a particular process.

In addition, GO terms aren't even necessarily related to a single 
process. Instead, we use them as a stand-in for the underlying pathways 
that we hope to measure (but don't really know much about). If we had 
better pathway information we wouldn't even be bothering with GO terms 
at all.

So you can certainly contrive a situation where you would only want to 
consider up-regulated genes for a particular GO term, but that 
situation is unlikely to hold in general. And when you are doing 
multiple hypergeometric tests, using all the GO terms in your universe, 
it is not IMO a good idea to make very strong assumptions, especially 
if you don't need to do so.
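
To make the arithmetic behind your scenario concrete, here is a minimal
sketch with made-up counts. It is plain Python, not GOstats, and the
universe size, list sizes, and hit counts are all assumptions chosen for
illustration. The function name hypergeom_upper_tail is hypothetical. It
computes an over-representation p-value as the upper tail of the
hypergeometric distribution for the up list, the down list, and the
pooled list.

```python
from math import comb


def hypergeom_upper_tail(hits, universe, annotated, selected):
    """Return P(X >= hits) for a hypergeometric draw.

    universe  : number of genes in the gene universe
    annotated : number of universe genes annotated to the GO term
    selected  : number of genes in the submitted list
    hits      : number of selected genes annotated to the GO term
    """
    max_hits = min(annotated, selected)
    total = comb(universe, selected)
    tail = sum(
        comb(annotated, i) * comb(universe - annotated, selected - i)
        for i in range(hits, max_hits + 1)
    )
    return tail / total


# Toy numbers (assumed, not taken from any real experiment).
UNIVERSE = 10_000   # genes in the universe
GO_A_SIZE = 100     # universe genes annotated to GO_A
UP_SIZE, UP_HITS = 300, 6       # up-regulated list and its GO_A members
DOWN_SIZE, DOWN_HITS = 300, 6   # down-regulated list and its GO_A members

# Separate tests, one per direction.
p_up = hypergeom_upper_tail(UP_HITS, UNIVERSE, GO_A_SIZE, UP_SIZE)
p_down = hypergeom_upper_tail(DOWN_HITS, UNIVERSE, GO_A_SIZE, DOWN_SIZE)

# Pooled test: the two lists are disjoint, so sizes and hits simply add.
p_combined = hypergeom_upper_tail(
    UP_HITS + DOWN_HITS, UNIVERSE, GO_A_SIZE, UP_SIZE + DOWN_SIZE
)

print(f"up only:   p = {p_up:.4f}")
print(f"down only: p = {p_down:.4f}")
print(f"combined:  p = {p_combined:.4f}")
```

With these made-up counts, each direction on its own sits above a 0.05
cutoff while the pooled list falls below it, which is the situation you
describe. Whether pooling is appropriate is the biological question
discussed above, not something the arithmetic decides.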

Best,

Jim



On Tuesday, February 11, 2014 3:36:13 PM, Joseph Shaw wrote:
> Hi Jim,
>
> Thanks for your reply.
>
> My worry, originally, was that failure to differentiate between
> upregulated and downregulated processes would lead to spurious
> results.
>
> Let's create another scenario. Assume we have a group of genes
> identified as upregulated and another group of genes identified as
> downregulated. Furthermore, assume two subsets: one belonging to the
> upregulated group and one belonging to the downregulated group. Each
> subset is associated with several GO terms including one GO term which
> is common to both subsets - let's call this common term GO_A.
> Now, it may be the case that, individually, when tested against a
> defined gene universe, neither subset yields statistically significant
> results for GO_A, but combining the aforementioned subsets and testing
> against a gene universe does, in fact, yield a statistically
> significant result for GO_A.
> Let's assume that the process represented by GO_A is such that it
> cannot be simultaneously upregulated and downregulated; if this is the
> case, wouldn't it be incorrect to combine the upregulated and
> downregulated gene lists?
>
> Let's return to the example provided in your previous mail.
> My understanding of the GO DAG is far from exhaustive, so it's very
> possible that I'm wrong, but, given that the GO terms become more
> specific as we move towards leaf nodes, would we eventually arrive at
> terms representative of negative regulation of programmed cell death
> and positive regulation of programmed cell death?
> If this is the case, assume there was a sufficient number of genes
> identified as differentially expressed for both enhancer (identified
> as upregulated in our experiment) and preventer (identified as
> downregulated in our experiment) genes so as to yield statistically
> significant results for separate tests. Would it be incorrect to
> conclude that negative regulation of preventers of programmed cell
> death and positive regulation of enhancers of programmed cell death
> have both been shown to be statistically significant? It
> seems to me that both these results are compatible.
>
> Joseph
>
> On Tue, Feb 11, 2014 at 2:00 PM, James W. MacDonald <jmacdon at uw.edu> wrote:
>> Hi Joseph,
>>
>> I think you are making a simplifying assumption that isn't helpful. In other
>> words, you are assuming that up-regulation of a set of genes means something
>> different than down-regulation, or a mixture thereof. But this flies in the
>> face of much that we know about biological processes.
>>
>> As an example, say we have a set of genes with 'programmed cell death' as
>> their GO term. And further assume that some of these genes enhance this
>> process, and some prevent the process. Now if most of the enhancers are
>> up-regulated, and most of the 'preventers' are down-regulated, are you
>> prepared to say these genes should be tested separately because the
>> up-regulated genes are involved with a different process than the
>> down-regulated genes?
>>
>> Best,
>>
>> Jim
>>
>>
>>
>>
>> On Monday, February 10, 2014 6:43:52 PM, Joseph Shaw wrote:
>>>
>>> Hi all,
>>>
>>> I am in the process of performing some ontological analysis with
>>> GOstats. Given that GOstats doesn't require any information on
>>> relative increases or decreases in expression for its hypergeometric
>>> testing procedure, am I correct in assuming that it does not
>>> differentiate between upregulated and downregulated genes?
>>>
>>> If this is the case then providing a list of differentially expressed
>>> genes (both upregulated and downregulated) to the testing procedure
>>> will result in ontology results where upregulation and downregulation
>>> may be confounded.
>>> In other words, combining upregulated and downregulated genes and
>>> comparing the resulting list to the gene universe will enable the
>>> testing procedure to identify regulated ontological processes, but it
>>> won't be able to identify whether the processes are upregulated or
>>> downregulated. In fact, given that there is no distinction provided as
>>> input, it may even be both.
>>>
>>> To me, it seems that in order to prevent this from happening two
>>> separate testing procedures should be performed: one comparing
>>> upregulated genes to the gene universe and one comparing downregulated
>>> genes to the gene universe. Is this approach advisable? Is there a
>>> correct protocol which addresses the above issue?
>>>
>>> Joseph
>>>
>>
>>
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> University of Washington
>> Environmental and Occupational Health Sciences
>> 4225 Roosevelt Way NE, # 100
>> Seattle WA 98105-6099
