[BioC] Bioconductor Digest, Vol 119, Issue 5: Pathway overrepresentation with KEGGprofile and SPIA

Sat Jan 5 18:09:55 CET 2013

Message: 6
Hi,
Starting from a selection of gene IDs, we can look for overrepresented pathways with either 1. the find_enriched_pathway function (KEGGprofile package) or 2. spia (SPIA package), where pNDE gives the overrepresentation p-value. Both functions work very well. However, starting from the same selection of gene IDs,
I get quite different results.
If I compare the two top-ten lists, they share only one pathway ID. I expected similar results, with perhaps a few small differences!

A. If I look inside the functions at how the p-value is computed, the computation seems to be the same:

KEGGprofile:
    pvalue[x] <- phyper(kegg_result_length[x], keggpathway2gene_length[x],
                        length(unique(unlist(keggpathway2gene))) - kegg_result_length[x],
                        length(unique(unlist(kegg_result))), lower.tail = F)
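For reference, R's phyper(q, m, n, k, lower.tail = FALSE) returns the upper tail P[X > q] of a hypergeometric variable X, the number of "white balls" obtained when k balls are drawn from an urn holding m white and n black balls. The minimal sketch below, with invented counts, checks that identity against an explicit sum of dhyper terms, which makes it easier to read the arguments of the call above.

```r
# Upper-tail hypergeometric probability with phyper.
# All counts are invented for illustration only.
m <- 40    # genes in the pathway ("white balls")
n <- 960   # background genes not in the pathway ("black balls")
k <- 50    # genes in the selection (number of draws)
x <- 6     # selected genes that fall in the pathway

# lower.tail = FALSE gives P[X > x]: the observed count x itself is excluded
p_upper <- phyper(x, m, n, k, lower.tail = FALSE)

# The same quantity summed explicitly over x + 1, ..., min(m, k)
p_sum <- sum(dhyper((x + 1):min(m, k), m, n, k))

print(c(phyper = p_upper, explicit_sum = p_sum))
```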

And SPIA:
    ph[i] <- phyper(q = noMy - 1, m = pSize[i], n = length(all) - pSize[i],
                    k = length(de), lower.tail = FALSE)
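Placed side by side, the two calls as quoted pass different first and third arguments. SPIA uses q = noMy - 1, so with lower.tail = FALSE it returns P[X >= x], while the KEGGprofile line passes the observed count itself and so returns P[X > x]. The third argument also differs: SPIA subtracts the pathway size, whereas the quoted KEGGprofile line subtracts kegg_result_length[x] (read here, as an assumption, as the number of selected genes in pathway x). The sketch below reproduces only these argument patterns with invented counts; it is not either package's code.

```r
# Hypothetical counts shared by both argument patterns
universe <- 1000  # background size
m        <- 40    # pathway size
k        <- 50    # number of selected genes
x        <- 6     # selected genes in the pathway

# Pattern of the KEGGprofile line as quoted: q = x, n = universe - x
p_keggprofile_style <- phyper(x, m, universe - x, k, lower.tail = FALSE)

# Pattern of the SPIA line as quoted: q = x - 1, n = universe - m
p_spia_style <- phyper(x - 1, m, universe - m, k, lower.tail = FALSE)

# Isolating the off-by-one: with the same urn, q = x - 1 and q = x differ by P[X = x]
same_urn_gap <- phyper(x - 1, m, universe - m, k, lower.tail = FALSE) -
  phyper(x, m, universe - m, k, lower.tail = FALSE)

print(c(keggprofile_style = p_keggprofile_style,
        spia_style = p_spia_style,
        gap = same_urn_gap))
```

Both differences push in the same direction here, so the SPIA-style value is the larger one for these counts.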

Hence, the p-value computation seems to be the same.

B. Is the p-value computation really the same? Not quite: the reference (background) used to compute the overrepresentation differs.
KEGGprofile:
the reference is based on keggpathway2gene.

And SPIA:
the reference is based on "all", i.e. all IDs present on the chip. In my case (Illumina HT6 v2), this chip is considered pangenomic.

Hence, the reference should be the same in this case.
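To see how much the reference alone can matter, the sketch below keeps the pathway size, the number of selected genes and the overlap fixed and changes only the background size. The helper ora_pvalue and all counts are hypothetical; the helper follows the q = x - 1 convention of the SPIA call, so it returns P[X >= x]. Note also that, in the calls as quoted, k is not the same either: length(unique(unlist(kegg_result))) in KEGGprofile versus length(de) in SPIA.

```r
# Effect of the background ("universe") on an overrepresentation p-value.
# All counts are invented; only the background size changes between the two calls.
m <- 40   # pathway size
k <- 50   # selected genes counted in the test
x <- 6    # selected genes in the pathway

# Hypothetical helper: P[X >= x] for X ~ Hypergeometric(m white, universe - m black, k draws)
ora_pvalue <- function(x, m, k, universe) {
  phyper(x - 1, m, universe - m, k, lower.tail = FALSE)
}

p_small_background <- ora_pvalue(x, m, k, universe = 5000)   # hypothetical: genes annotated in any pathway
p_large_background <- ora_pvalue(x, m, k, universe = 20000)  # hypothetical: all IDs on the array

print(c(small_background = p_small_background, large_background = p_large_background))
```

With a larger background, the same overlap is less likely by chance, so the p-value drops.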

My question:
In your opinion,
why is there such a major difference between these two methods?
Currently I report both results, but I need to justify the difference.

If the authors of these methods (or anyone else) could give me some explanation, or show me where I'm wrong, I would appreciate it!

Greg Montréal

Hi Greg,
There may be a few reasons why you see those differences between the SPIA pNDE ranking and KEGGprofile (which I am not familiar with):
Firstly, the pathway database may differ between the two packages. You can do a blank test by passing all the genes in the "all" argument as the de argument of spia, and doing something similar with KEGGprofile. You will then see whether the two packages use the same list of pathways, with the same number of recognized gene IDs per pathway.
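The blank test boils down to comparing, for each package, which pathway IDs are present and how many genes of each pathway are recognized on the array. The sketch below does that comparison in base R on two hypothetical pathway-to-gene lists (db_a, db_b) and a hypothetical vector of array IDs (all_ids). Extracting the actual lists from KEGGprofile and SPIA is not shown; all names and IDs here are placeholders, not either package's API.

```r
# Compare two pathway databases: shared pathway IDs and recognized gene counts.
# Toy stand-ins for each package's pathway -> gene ID mapping.
db_a <- list(pathway_1 = c("g1", "g2", "g3", "g4"),
             pathway_2 = c("g5", "g6", "g7"),
             pathway_3 = c("g8", "g9"))
db_b <- list(pathway_1 = c("g1", "g2"),
             pathway_2 = c("g5", "g6", "g7", "g10"),
             pathway_4 = c("g11", "g12"))
all_ids <- c("g1", "g2", "g4", "g5", "g6", "g7", "g8", "g11")  # toy array IDs

shared <- intersect(names(db_a), names(db_b))  # pathways known to both
only_a <- setdiff(names(db_a), names(db_b))    # pathways only in db_a
only_b <- setdiff(names(db_b), names(db_a))    # pathways only in db_b

# Number of genes in each pathway that are present on the array
recognized <- function(db) {
  vapply(db, function(genes) length(intersect(genes, all_ids)), integer(1))
}

size_table <- data.frame(pathway = shared,
                         db_a = recognized(db_a)[shared],
                         db_b = recognized(db_b)[shared],
                         row.names = NULL)

print(list(shared = shared, only_a = only_a, only_b = only_b))
print(size_table)
```

Pathways missing from one list, or shared pathways with different recognized sizes, are enough on their own to reorder a top-ten list.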
Another reason may be that in spia the pathway size (m = pSize[i]) is given by the genes in the pathway that are present on the array (i.e. included in all), while this may not be the case with KEGGprofile.
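The pathway-size effect described above can be illustrated with toy data: counting every pathway gene in m, versus only the pathway genes present on the array (as pSize is described for SPIA), changes the p-value even when the overlap, selection and background are unchanged. All IDs are hypothetical, and the full-pathway count is only a stand-in for a method that does not restrict to the array; it is not a claim about KEGGprofile's code.

```r
# Pathway size counted two ways, and the resulting p-values (toy data).
pathway_genes <- paste0("g", 1:10)                        # a 10-gene pathway
all_ids <- c(paste0("g", 1:6), paste0("bg", 1:94))        # 100 array IDs; only g1..g6 are on the array
de_ids  <- c("g1", "g2", "g3", paste0("bg", 1:7))         # 10 selected IDs

x <- length(intersect(de_ids, pathway_genes))             # selected genes in the pathway

m_full  <- length(pathway_genes)                          # every pathway gene
m_array <- length(intersect(pathway_genes, all_ids))      # only pathway genes on the array

# P[X >= x] with the same background and selection, differing only in m
p_full  <- phyper(x - 1, m_full,  length(all_ids) - m_full,  length(de_ids), lower.tail = FALSE)
p_array <- phyper(x - 1, m_array, length(all_ids) - m_array, length(de_ids), lower.tail = FALSE)

print(c(m_full = m_full, m_array = m_array, p_full = p_full, p_array = p_array))
```

A larger m makes the same overlap look less surprising, so counting pathway genes that are absent from the array inflates the p-value.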

Adi Tarca