[BioC] DESeq and paired samples - pairing vs pooling

Wed May 25 03:09:19 CEST 2011

Dear Tim,

> From: Timothy Hughes <timothy.hughes at medisin.uio.no>
> To: bioconductor at r-project.org
> Subject: [BioC] DESeq and paired samples - pairing vs pooling
>
> We are performing a study of 8 individuals with cancer. We have 8 pairs 
> of samples. Each pair consists of two samples from the same individual: 
> one from cancerous tissue and one from normal tissue.
>
> We have used DESeq to perform a pooled comparison between the normal and 
> cancerous samples and find a number of genes that are differentially 
> expressed.
>
> We would also like to perform a paired analysis (simple comparison 
> between the two tissue samples from the same individual). Our logic is 
> that the pooled analysis will tend to identify genes as differentially 
> expressed only if they are fairly consistently up or down-regulated 
> across individuals.

Not quite sure what you mean by a pooled analysis in this context.  I 
think you mean treating the cancer and normal tissue samples as 
independent groups.  Basically you should perform a paired analysis here, 
because your data is naturally paired, and otherwise you will be ignoring 
the baseline differences between individuals.  The DE genes you have found 
are probably not wrong, but you have probably missed many others.
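
As a rough illustration of why ignoring the pairing costs power, here is a 
minimal sketch in plain Python.  It is not DESeq or edgeR code: the 
log-expression values are made up, and ordinary t-statistics stand in for 
the count-based tests those packages use.  The function names are 
hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Made-up log2 expression values for one gene in 8 individuals.
# Baselines differ a lot between individuals; the cancer sample is
# about 1 unit higher than the matched normal in every individual.
normal = [5.0, 7.1, 9.2, 6.3, 8.4, 10.5, 4.6, 7.7]
shift = [1.1, 0.9, 1.2, 0.8, 1.0, 1.1, 0.9, 1.0]
cancer = [n + s for n, s in zip(normal, shift)]


def unpaired_t(x, y):
    """Welch t-statistic, treating x and y as independent groups."""
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(y) - mean(x)) / se


def paired_t(x, y):
    """Paired t-statistic computed on within-individual differences."""
    d = [b - a for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / sqrt(len(d)))


t_unpaired = unpaired_t(normal, cancer)
t_paired = paired_t(normal, cancer)
print(f"unpaired t = {t_unpaired:.2f}, paired t = {t_paired:.2f}")
```

With these numbers the unpaired statistic is about 1, while the paired 
statistic is above 20, because the between-individual baseline variation 
cancels out of the within-individual differences.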

> But, the etiology of the same cancer type may be heterogeneous and we 
> aim to investigate this by performing the paired analysis.

Unfortunately, a paired analysis doesn't give you a way to handle 
heterogeneity of cancers.  A paired analysis will still look for 
differential expression that is consistent across the patients.  It looks 
for genes that have more or less consistent relative changes between 
normal and cancer for each patient.  It will find genes whose changes are 
common to the majority of the cancers.
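
To make this concrete, here is a small sketch in plain Python, again with 
made-up numbers and an ordinary paired t-statistic as a stand-in for the 
real count-based test.  Two hypothetical genes have the same average 
cancer-vs-normal log-ratio, but only one of them changes consistently 
across patients.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(diffs):
    """Paired t-statistic from per-patient cancer-minus-normal log-ratios."""
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Hypothetical per-patient log2 ratios (cancer minus normal), 8 patients.
# Gene A: modest change in every patient.
consistent = [1.1, 0.9, 1.2, 0.8, 1.0, 1.1, 0.9, 1.0]
# Gene B: large change in 2 patients only, essentially none in the rest.
heterogeneous = [4.1, -0.1, 0.2, -0.2, 3.9, 0.1, -0.1, 0.1]

t_consistent = paired_t(consistent)
t_heterogeneous = paired_t(heterogeneous)
print(f"consistent gene:    t = {t_consistent:.2f}")
print(f"heterogeneous gene: t = {t_heterogeneous:.2f}")
```

Both genes have a mean log-ratio of 1, but the gene that changes in only 
two patients gets a much smaller statistic, so a paired analysis would 
not single it out.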

> In connection with this,
> I have two questions:
> 1. we read in the DESeq paper that this can be done, but are we correct in
> believing that we can interpret the results as I describe above?

I wonder where you have read this?  I don't think the DESeq authors claim 
it handles paired tests.

See above for comments on interpretation.

> 2. Does it make sense to do a paired analysis as described above or would it
> make more sense to pool the normal tissues and then compare each cancerous
> tissue to the pool?

If you want to find genes that are specific to one cancer, and not shared 
with the other cancers or the normal tissues, then comparing each 
individual cancer to the group of normals is probably your best route, or 
at least the simplest 
one.  You could do a standard two-group analysis with n=1 in one of the 
groups.  This does ignore the pairing of the cancer tissue to one of the 
normals but, with 8 individuals, the penalty probably isn't too high.
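
The shape of such a one-against-the-normals comparison can be sketched in 
plain Python.  This is only an illustration under simplifying 
assumptions: the log-expression values are made up, the variance comes 
from the normals of a single gene, and a pooled-variance t-statistic with 
n=1 in one group stands in for the negative binomial test a package like 
DESeq would use.  The cutoff of 5 is arbitrary.

```python
from math import sqrt
from statistics import mean, stdev

# Made-up log2 expression values for one gene.
normals = [5.0, 5.2, 4.9, 5.1, 5.0, 4.8, 5.1, 4.9]
cancers = [5.1, 4.9, 8.0, 5.0, 5.2, 4.8, 5.1, 5.0]


def one_vs_group_t(x, group):
    """Two-group t-statistic with n=1 in one group.

    With a single observation in one group, the pooled variance is just
    the variance of the other group, and the standard error of the
    difference is s * sqrt(1/1 + 1/n).
    """
    n = len(group)
    return (x - mean(group)) / (stdev(group) * sqrt(1 + 1 / n))


t_stats = [one_vs_group_t(x, normals) for x in cancers]

# Arbitrary illustrative cutoff; a real analysis would use adjusted p-values.
flagged = [i for i, t in enumerate(t_stats) if abs(t) > 5]
print("t-statistics:", [round(t, 2) for t in t_stats])
print("cancers that stand out (0-based index):", flagged)
```

Only the third cancer sample stands out from the normals here; the 
pairing information is not used at all, which is the simplification 
described above.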

I can think of ways to do a more careful analysis, but they'd be harder to 
explain in a publication.  Using the edgeR package, you could (i) fit a 
paired samples model, in order to extract the biological coefficient of 
variation (BCV) from all the individuals, then (ii) compare each 
individual cancer to its own paired normal tissue, using the BCV 
previously estimated from all the patients.

Of course, plotting the data to see how different the cancer samples seem 
to be should be the first step.  I personally use plotMDS.dge() in the 
edgeR package for this purpose.

Best wishes
Gordon

> Thanks for your help.
>
> Tim.
>
> -- 
> Tim Hughes PhD (http://digitised.info)
> Medical Genetics Department
> Oslo University Hospital (Ullevål)
> Kirkeveien 166
> 0407 Oslo
> Norway
>
> Tel:  (+47) 23 02 72 55