[BioC] Gene enrichment question
alexg at ruggedtextile.com
Wed Aug 15 17:02:16 CEST 2012
On 15.08.2012 14:51, Aliaksei Holik wrote:
> Dear listers,
> Apologies if my question is not strictly related to Bioconductor,
> though one never knows, maybe there's a package that does what I
> I am analysing a list of differentially expressed genes from an
> Illumina microarray. In particular I'm trying to compare the list of
> differentially expressed genes to an existing list of genes
> preferentially expressed in the stem cell population (stem cell
> signature). When I do so, 10% of DE genes belong to the stem cell
> signature. What I'd like to do now is to find out how likely that
> would happen by chance, i.e. put a p value on it.
> At the moment I know:
> There're 17119 unique genes in my dataset.
> Of them 86 are differentially expressed.
> The stem cell signature contains 510 genes.
> It is combined from several platforms, which makes it hard to
> establish the total number of unique genes, but it's at least 20819
> (the size of the largest platform).
> There are 9 overlapping genes between DE genes and the stem cell
> signature.
> So I wonder:
> 1) If there's an accepted way to calculate a p value using these
> data. For instance, could I run something like a chi-squared test? E.g.
> stem cell-specific genes represent 510/20819=2.4% of the total dataset.
> So the expected number of stem cell genes in my DE genes would be
> 86x2.4%=2. So my chi-squared test would be based on 9 observed vs 2
> expected.
For the total number of genes I used your lower estimate to be
conservative. To be completely correct I think you would need to remove
any of the 510 genes that are not in your 17,119 gene dataset. That will
only make the P value smaller, though (they cannot be called DE if they
are not in your dataset), and it is already 'significant' by most
journals' standards.
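
One common way to put a P value on an overlap like this is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). Below is a minimal Python sketch of that idea, not necessarily the exact calculation used in this reply. It assumes a universe of 20819 genes (the lower estimate), 510 signature genes, 86 DE genes and 9 overlapping genes. The function names expected_overlap and enrichment_p_value are illustrative only.

```python
from fractions import Fraction
from math import comb


def expected_overlap(universe, signature, de_genes):
    """Expected number of signature genes among the DE genes under random draws."""
    return de_genes * signature / universe


def enrichment_p_value(overlap, universe, signature, de_genes):
    """Upper-tail hypergeometric probability P(X >= overlap).

    universe  : total number of genes that could have been drawn (assumed)
    signature : number of signature genes in that universe
    de_genes  : number of DE genes (the size of the draw)
    overlap   : observed number of DE genes that are in the signature
    """
    total = comb(universe, de_genes)
    upper = min(signature, de_genes)
    # Sum the exact probabilities for every overlap at least as large as observed.
    tail = sum(
        Fraction(comb(signature, i) * comb(universe - signature, de_genes - i), total)
        for i in range(overlap, upper + 1)
    )
    return float(tail)


if __name__ == "__main__":
    # Numbers from the question; 20819 is the lower estimate of the universe size.
    print("expected overlap:", expected_overlap(20819, 510, 86))
    print("P(overlap >= 9): ", enrichment_p_value(9, 20819, 510, 86))
```

A larger universe lowers the expected overlap and therefore the P value, which is why the lower estimate of 20819 is the conservative choice. In R the same upper tail is given by phyper(8, 510, 20819 - 510, 86, lower.tail = FALSE).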