[BioC] Difficulties in using the mgsa package for Gene Set Analysis

Thu Jan 17 11:39:54 CET 2013

Dear Juan,

[...]
> Item annotations:
>          symbol                              name
> 1302934 St8sia5 ST8 alpha-N-acetyl-neuraminide...
> ...
> 1302939   Eef1g eukaryotic translation elongat...
> ... and  29261  other items.
>
> Applying the function mgsa() to my list of differentially expressed genes
> and these gene sets doesn't work, as it looks for matches between the
> 'symbol' category in the gene sets and the genes of interest. However, the
> numbers in the 'symbol' category are RGD IDs (from the Rat Genome
> Database, http://rgd.mcw.edu/), and I haven't been able to find a way to
> either change these to something else (Entrez ID, gene symbol, etc) or
> somehow get the RGD IDs for my genes of interest without looking for them
> manually.
>
> So, in order to apply MGSA to my data, I am hoping to get some help on how
> to do one of these three things:
>
> 1) Modify the MgsaGoSets object so it uses as 'symbol' a more common gene
> ID, such as Entrez ID, instead of RGD ID.

I've had a look at the RGD association file. As far as I understand it (I
found no documentation in the README), it provides both RGD IDs and gene
symbols. The readGAF() function reads in both, as you can see in the
output. However, mgsa() only uses the primary ID, and the primary ID is the
RGD ID. If you can turn your list into a list of gene symbols, you could
use the undocumented gaf@itemAnnotations data frame to convert from one
namespace to the other.
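
Something along these lines might work. It is only a sketch that I have not
run against the real file. It assumes that the row names of the
itemAnnotations data frame are the RGD IDs (that is how the printed output
above looks) and that your symbols are spelled exactly like those in the
'symbol' column. The small data frame and the my_symbols vector are made-up
stand-ins so that the example runs on its own.

```r
# Stand-in for gaf@itemAnnotations, copied from the printed output above.
# With the real object you would use: item_annotations <- gaf@itemAnnotations
item_annotations <- data.frame(
  symbol = c("St8sia5", "Eef1g"),
  name = c("ST8 alpha-N-acetyl-neuraminide...",
           "eukaryotic translation elongat..."),
  row.names = c("1302934", "1302939"),   # assumed: row names are the RGD IDs
  stringsAsFactors = FALSE
)

# Hypothetical list of differentially expressed genes, given as symbols.
my_symbols <- c("Eef1g", "St8sia5", "NotInGaf")

# Find each symbol in the annotation table (NA if it is not present).
idx <- match(my_symbols, item_annotations$symbol)

# Symbols without an entry, so they can be checked by hand.
unmatched <- my_symbols[is.na(idx)]

# Translate the matched symbols into primary (RGD) IDs.
rgd_ids <- rownames(item_annotations)[idx]
names(rgd_ids) <- my_symbols
rgd_ids <- rgd_ids[!is.na(rgd_ids)]

# The resulting IDs would then be the observations passed to mgsa(), e.g.
# fit <- mgsa(rgd_ids, gaf)
```

If many symbols end up in 'unmatched', the cause may be a difference in
capitalisation or the use of synonyms, which this simple lookup does not
handle.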

> 2) Obtain the RGD IDs of my list of differentially expressed genes from a
> more common gene ID.

I'm unfortunately no expert in this, but maybe you can use BioMart at
Ensembl for this. Unfortunately, the site isn't working for me at the
moment, so I couldn't try it out.

See http://www.ensembl.org/info/data/biomart.html

Hope this helps.

Bye
Sebastian