[BioC] from RefSeq to GO terms / gene symbol to geneID

Dear Simon and Sean,

sorry to get back to this issue so late, but I have tried out various 
options to try to solve it. I parsed the files you mentioned but did not 
get many hits, since many of my proteins do not have an Entrez Gene ID 
for some reason. In my search I also tried some of the Entrez e-utils 
and could get the accession numbers for my proteins. Can I go from 
accession number to GO term using biomaRt, for example?

Thanks again!

Lina Rosenberg

>>> Dear list,
>>> This might be a question that has been discussed previously but I could not
>>> find any good solution for it. I have lists of human proteins from various
>>> proteomics studies that I want to compare with regard to the GO terms
>>> associated with them. I have the RefSeq GI protein id for the proteins and my
>>> question is how best to map those to other identifiers that I can use in
>>> subsequent GO analysis? 
>>> It might be that this problem is solved best outside R but maybe someone
>>> still can give me a hint to the best solution. For me this is a problem that
>>> comes up quite often - the need to map between different identifiers - and I
>>> have not yet found any really good solution to it. If I for example use IPI I
>>> always lose some proteins/genes since the coverage is rather bad, but maybe
>>> there is no solution that will give perfect mapping?!
> The file located here:
> ftp://ftp.ncbi.nih.gov/gene/DATA/gene2refseq.gz
> and described in detail here:
> ftp://ftp.ncbi.nih.gov/gene/DATA/README
> maps refseq to Entrez Gene ID.  Once you have the Entrez Gene ID, you
> can use the bioconductor annotation packages to get GO mappings.  The
> file above is a tab-delimited text file, so you should be able to read
> it into R and do the matching by GI number rather easily.
> Hope that helps.
> Sean
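
A minimal sketch of the GI matching described above, written in Python rather than R. It assumes that the protein GI sits in column 7 and the GeneID in column 2 of the tab-delimited file; those positions are an assumption here and should be checked against the README. The function name and the sample rows are invented for illustration only.

```python
import csv
import io


def map_gi_to_gene_id(lines, wanted_gis, gi_col=6, gene_col=1):
    """Return {protein GI: set of GeneIDs} for the GIs in wanted_gis.

    lines      -- iterable of tab-delimited text lines (e.g. an open file)
    gi_col     -- zero-based index of the protein GI column (ASSUMED: 6)
    gene_col   -- zero-based index of the GeneID column (ASSUMED: 1)
    """
    wanted = set(wanted_gis)
    hits = {}
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank lines, the '#' header line, and short rows.
        if not row or row[0].startswith("#"):
            continue
        if len(row) <= max(gi_col, gene_col):
            continue
        gi = row[gi_col]
        if gi in wanted:
            # One GI may appear on several rows, so collect a set.
            hits.setdefault(gi, set()).add(row[gene_col])
    return hits


# Invented sample rows that only mimic the assumed column layout.
sample = io.StringIO(
    "#tax_id\tGeneID\tstatus\trna_acc\trna_gi\tprot_acc\tprot_gi\n"
    "9606\t1001\tREVIEWED\tNM_A.1\t11\tNP_A.1\t21\n"
    "9606\t1002\tREVIEWED\tNM_B.1\t12\tNP_B.1\t22\n"
    "10090\t2001\tVALIDATED\tNM_C.1\t13\tNP_C.1\t23\n"
)
my_gis = ["21", "23", "99"]
gi_to_gene = map_gi_to_gene_id(sample, my_gis)
unmapped = [gi for gi in my_gis if gi not in gi_to_gene]
```

For the real file, something like `gzip.open("gene2refseq.gz", "rt")` could be passed in place of `sample`. Keeping the unmapped GIs in a separate list makes it easy to see how many proteins have no Entrez Gene ID.
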
> Hi, Alex,
> You can parse ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz
> There are 4 useful columns: tax_id (column 1), GeneID (column 2), Symbol 
> (column 3), and Synonyms (column 5). You can:
> 1 Read in the file
> 2 filter it based on tax_id
> 3 match your gene symbols to the "Symbol" column and find their Gene ID
> 4 remove the matched gene symbols from your list
> 5 match the rest of the gene symbols to the "Synonyms" column and find their 
> Gene ID
> hope this helps
> nianhua
> Nianhua Li
> Software Developer
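
The five steps above could be sketched roughly as follows, in Python rather than R. The column positions are the ones given in the message. The assumptions are that the Synonyms column separates aliases with "|", and that tax_id 9606 (human) is the organism of interest. The function name and the sample rows are invented for illustration.

```python
import csv
import io


def symbols_to_gene_ids(lines, symbols, tax_id="9606"):
    """Map gene symbols to GeneIDs: official Symbol first, then Synonyms.

    lines  -- iterable of tab-delimited text lines (e.g. an open file)
    tax_id -- taxonomy id to keep (ASSUMED default: 9606, human)
    """
    remaining = set(symbols)

    # Steps 1 and 2: read the file and keep only rows for the chosen tax_id.
    rows = []
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 5 or row[0].startswith("#"):
            continue
        if row[0] == tax_id:
            # Columns 2, 3 and 5: GeneID, Symbol, Synonyms.
            rows.append((row[1], row[2], row[4]))

    # Step 3: match against the Symbol column.
    result = {}
    for gene_id, symbol, _ in rows:
        if symbol in remaining:
            result.setdefault(symbol, set()).add(gene_id)

    # Step 4: drop the symbols that were matched.
    remaining -= set(result)

    # Step 5: match what is left against the Synonyms column
    # (ASSUMED to be '|'-separated).
    for gene_id, _, synonyms in rows:
        for alias in synonyms.split("|"):
            if alias in remaining:
                result.setdefault(alias, set()).add(gene_id)
    return result


# Invented sample rows that only mimic the column layout.
sample = io.StringIO(
    "#tax_id\tGeneID\tSymbol\tLocusTag\tSynonyms\n"
    "9606\t501\tAAA1\t-\tOLD1|OLD2\n"
    "9606\t502\tBBB2\t-\t-\n"
    "10090\t601\tAAA1\t-\tOLD3\n"
)
symbol_map = symbols_to_gene_ids(sample, ["AAA1", "OLD2", "OLD3", "ZZZ"])
```

Each symbol maps to a set because a synonym can point to more than one gene, so ambiguous hits stay visible instead of being silently collapsed.
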
