[BioC] information retrieval from pubmed

Julian Gehring julian.gehring at embl.de
Sat May 10 09:18:18 CEST 2014


Hi Nick,

COSMIC is generally a well-curated source for your purpose.

Going with Steve's suggestion, you can use the 'COSMIC.67' Bioconductor 
package to get the Cancer Gene Census list:

   data(cgc_67, package = "COSMIC.67")

The Cancer Gene Census (CGC) is a list of genes whose mutations are 
causally implicated in cancer; it currently includes ~600 genes.
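
As a minimal sketch of a first look at the list, assuming the 'COSMIC.67' 
data package is installed: the column layout of 'cgc_67' is not spelled out 
here, so the code only inspects the object rather than assuming any column 
names.

```r
## Assumes the COSMIC.67 data package is installed, e.g. via
## BiocManager::install("COSMIC.67").
data(cgc_67, package = "COSMIC.67")

## Inspect the structure first; check the column names here before
## relying on any of them in downstream code.
str(cgc_67)
head(cgc_67)
```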

If you want to go a step further, you could parse the mutation calls 
from a number of large-scale cancer sequencing studies, mainly those of 
the ICGC and TCGA.  The somatic mutation calls of 8 TCGA studies are 
available in the 'SomaticCancerAlterations' package; from these you 
could then find the genes that overlap the mutations.
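
A rough sketch of that workflow, assuming the 'SomaticCancerAlterations' and 
'GenomicRanges' packages are installed.  The helper 'genes_with_mutations' 
and the toy ranges are made up for illustration and are not part of either 
package; real gene ranges would have to come from an annotation resource of 
your choice.

```r
library(GenomicRanges)
library(SomaticCancerAlterations)

## List the bundled studies and load the first one as a single GRanges.
studies <- scaListDatasets()
print(studies)
mutations <- scaLoadDatasets(studies[1], merge = TRUE)

## Check which annotation columns ship with the calls before using them.
print(names(mcols(mutations)))

## Hypothetical helper: names of the gene ranges hit by >= 1 mutation.
genes_with_mutations <- function(mutations, genes) {
    hits <- findOverlaps(mutations, genes)
    unique(names(genes)[subjectHits(hits)])
}

## Toy data (invented coordinates) to show the overlap step in isolation.
toy_genes <- GRanges(
    seqnames = c("1", "1", "2"),
    ranges = IRanges(start = c(100, 500, 100), end = c(200, 600, 300))
)
names(toy_genes) <- c("geneA", "geneB", "geneC")

toy_mutations <- GRanges(
    seqnames = c("1", "2", "2"),
    ranges = IRanges(start = c(150, 250, 900), width = 1)
)

print(genes_with_mutations(toy_mutations, toy_genes))
```

With real data, 'toy_genes' would be replaced by gene ranges for the same 
genome build as the mutation calls, and the chromosome naming of the two 
objects may need to be harmonised before calling findOverlaps().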

Best wishes
Julian


