[BioC] DESeq Run Unequal Sample Size

Simon Anders anders at embl.de
Wed Jul 23 10:10:57 CEST 2014


Dear Gihanna

On 23/07/14 03:53, Gihanna Galindez wrote:
> Hi Dr. Anders, I would just like to consult about our Illumina run. We have
> two groups of samples from a non-model organism. Group A organisms are
> considerably larger than Group B, which are as small as a dot. As a
> result, one QIAgen RNeasy extraction from Group B requires a larger
> number of samples. For each group of samples we have 4 libraries from 4
> corresponding extractions. Thus, we have a total of 8 libraries. All
> libraries from Group A have n=7. On the other hand, all libraries from
> Group B have n=21. Given the unequal sample size from each library, I would
> like to ask if differential expression analysis between Groups A and B will
> still be valid?

This depends a lot on what you mean by "differentially expressed".

In a somewhat trivial sense, all genes will be expressed much more 
strongly in Group A than in Group B. After all, if a Group-A organism is 
so much larger, it will contain way more transcript molecules than a 
Group-B organism for most if not all genes.

You won't see this in RNA-Seq data, though, because the number of reads 
you get out of a library does not depend on the number of mRNA molecules 
that went into the library prep, only on the way the flow cell was seeded.

You are probably not interested in seeing this, either. It won't tell
you anything you did not already know.

What you might be interested in is which transcripts' abundance, as seen
in relation to the other genes in the same cell, depends on the group.
The normalization procedure of DESeq2 aims to choose size factors (i.e.,
scaling factors for normalization) such that most genes or "average"
genes seem to stay unchanged. Hence, you will find genes whose ratio
between these two groups deviates from the overall trend caused by the
size difference. If this is what you want, you are fine.
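
The idea behind such size factors can be illustrated with a minimal
sketch of a median-of-ratios calculation. This is a plain-Python
illustration, not DESeq2's actual code; the function name size_factors
and the toy counts are assumptions made up for the example. Sample 2 is
sequenced twice as deeply as sample 1, and only the last gene changes
relative to the rest.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts (one per sample).
    Genes with a zero count in any sample are skipped, because their
    geometric mean across samples is zero.
    """
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Log of the geometric mean of each usable gene across samples.
    log_geo_means = [sum(math.log(c) for c in row) / len(row) for row in usable]
    n_samples = len(counts[0])
    factors = []
    for j in range(n_samples):
        # Log ratio of each gene in sample j to that gene's geometric mean.
        log_ratios = [math.log(row[j]) - lg
                      for row, lg in zip(usable, log_geo_means)]
        # The median is insensitive to a minority of changed genes.
        factors.append(math.exp(median(log_ratios)))
    return factors

# Hypothetical toy data: 5 genes x 2 samples.
toy_counts = [
    [100, 200],
    [50, 100],
    [10, 20],
    [30, 60],
    [40, 320],  # the only gene that changes relative to the others
]

factors = size_factors(toy_counts)
normalized = [[c / f for c, f in zip(row, factors)] for row in toy_counts]
```

After dividing by the size factors, the first four genes have equal
normalized counts in both samples, while the last gene still differs.
That remaining difference is the kind of deviation from the overall
trend described above.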

But make sure to have a close look at the MA plot.
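
For orientation, the two coordinates of a classic MA plot can be
sketched as follows. This is a hedged illustration, not the plotting
code of DESeq or DESeq2; the function name ma_values, the pseudocount,
and the example means are assumptions. A is the average log2 expression
and M is the log2 fold change between the groups, so genes that follow
the overall trend scatter around M = 0.

```python
import math

def ma_values(mean_a, mean_b, pseudocount=0.5):
    """Return (A, M) for one gene.

    mean_a, mean_b: normalized mean counts in group A and group B.
    pseudocount: small value added to avoid log2(0); an assumption
    of this sketch.
    """
    log_a = math.log2(mean_a + pseudocount)
    log_b = math.log2(mean_b + pseudocount)
    a_value = (log_a + log_b) / 2   # average log2 expression
    m_value = log_b - log_a         # log2 fold change, B over A
    return a_value, m_value

# Hypothetical normalized group means for three genes.
toy_means = [(141.4, 141.4), (56.6, 226.3), (20.0, 5.0)]
points = [ma_values(a, b) for a, b in toy_means]
```

If the cloud of points is clearly not centred on M = 0, the assumption
that most genes stay unchanged deserves a second look.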

   Simon


