[BioC] applying coxfilter after LIMMA

James W. MacDonald jmacdon at med.umich.edu
Fri Nov 14 15:15:55 CET 2008

Hi Adrian,

First, please don't take things off the BioC list.

Adrian Johnson wrote:
> Hi Jim,
> I am not sure I have phrased this in a way that is correct for
> statisticians (I am a biology student). Say I want to find the prognostic
> potential of the top 100 genes differentially expressed between normal
> and cancer tissues. I have the survival data, remission status, sex
> and other covariate information.
> In a tutorial by Drs. Gentleman and Dudoit and others, they used
> bootstrap based MTP and cox-t statistics from the multtest package to
> associate gene expression measure and survival data. (Website :
> www.stat.berkeley.edu/~sandrine/Docs/Talks/MBI04/mbi.html).
> If I am not mistaken, the aim there was to identify differentially
> expressed genes (using either "f" or "t" statistics) on a filtered
> expression matrix derived from RMA on an Affy study. The filter is: (a)
> the coefficient of variation is between 0.7 and 10; (b) at least 20% of
> the samples have a measured intensity of at least 100 (on the linear
> scale).
> ( http://www.bioconductor.org/workshops/2006/BioC2006/labs/kdhansen/multtest.html
> at section 1: getting started)

That was the goal of that particular workshop, but they didn't mix the 
two (t-tests and survival analysis).

> This above step seems to be old, instead I wanted to test the
> prognostic potential of top 100 genes filtered using adj.P.value (be
> it BH method) from limma topTable function on eBayes fit object, by
> applying cox-proportional hazard method on 100 genes using ER or
> mutation status and survival data.
> The hypothesis is that the differentially expressed genes between
> cancer and normal samples are prognostic genes. Instead of applying
> cox model on every row of the gene expression matrix, I want to apply
> on the genes that I know are differentially expressed.
> I have no idea how this can be done.

The problem with your methodology, IMO, is that a gene may be 
differentially expressed between cancer and normal yet have no 
prognostic ability vis-a-vis survival.

Two examples:

Normal - c(4.5, 4.1, 4.7, 4.5)
Cancer - c(6.8, 7.2, 7.3, 6.6)
Surv.time - c(3, 4.5, 15, 20) ## months

These are likely significantly different, but I doubt there would be any 
significance for the cancer samples in a Cox model.

Normal - c(4.5, 4.1, 4.7, 4.5)
Cancer - c(8.3, 6.4, 5.1, 3.4)
Surv.time - c(3, 4.5, 15, 20) ## months

These might be significantly different between cancer and normal 
(probably not), but the Cox model would likely have a very small p-value.
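
The two examples above can be checked with a rough sketch along these 
lines. It assumes all four cancer patients died (status = 1), which the 
examples do not state, and 'compare' is just a hypothetical helper name. 
With only four samples the Cox fit may not converge when expression and 
survival are perfectly ordered, so the sketch reports the score test 
rather than the Wald test.

```r
library(survival)  # recommended package, normally installed with R

normal    <- c(4.5, 4.1, 4.7, 4.5)
surv.time <- c(3, 4.5, 15, 20)  # months, cancer samples only
status    <- rep(1, 4)          # ASSUMPTION: all four patients died (no censoring)

cancer1 <- c(6.8, 7.2, 7.3, 6.6)  # first example
cancer2 <- c(8.3, 6.4, 5.1, 3.4)  # second example

# 'compare' is a hypothetical helper: it returns the Welch t-test p-value
# (cancer vs. normal) and the Cox score-test p-value (expression vs. survival).
compare <- function(cancer) {
  tt  <- t.test(cancer, normal)
  dat <- data.frame(time = surv.time, status = status, expr = cancer)
  # With four perfectly ordered samples the coefficient may not converge,
  # so warnings are suppressed and the score test (evaluated at beta = 0) is used.
  fit <- suppressWarnings(coxph(Surv(time, status) ~ expr, data = dat))
  c(t.p = tt$p.value, cox.p = unname(summary(fit)$sctest["pvalue"]))
}

res1 <- compare(cancer1)
res2 <- compare(cancer2)
print(rbind(example1 = res1, example2 = res2))
```

The t-test compares cancer against normal, while the Cox model only ever 
sees the cancer samples and their survival times, so the two p-values 
answer different questions.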

Granted these are probably extreme examples, but the point here is that 
the t-test is probably not the best way to filter genes for a survival 
analysis.



> Is my question still valid, or is it still a naive way of connecting two
> totally different things? I appreciate your suggestions and help.
> thank you.
> Adrian
> On Thu, Nov 13, 2008 at 9:01 AM, James W. MacDonald
> <jmacdon at med.umich.edu> wrote:
>> Hi Adrian,
>> Adrian Johnson wrote:
>>> Dear group,
>>> I have two types of samples (cancer and normal) with covariate data
>>> including survival times.
>>> I applied limma and filtered for genes that are significantly
>>> differentially expressed between cancer and normal. Say I have 500
>>> genes after (adj.P.Value using BH) filtering.
>>> Is it meaningful to apply coxfilter to those 500 genes (by supplying
>>> expression values for those 500 genes and survival times for all
>>> samples) instead of using the kOverA filter?
>> What is the hypothesis being tested here? A t-test and a Cox model test for
>> completely different things, so I don't see why you would follow one with
>> the other.
>> Best,
>> Jim
>>> Thanks
>>> Ad.
