[BioC] Gene location (Base pair number)

Martin Morgan mtmorgan at fhcrc.org
Sat Jul 4 23:07:37 CEST 2009


Hi Tim --

Tim Smith wrote:
> Hi Martin,
> 
> Thanks for that. I tried your code and got:
> 
> --------------------------------------------
>>   egid = revmap(org.Hs.egSYMBOL)[["WNT16"]]  
>>   org.Hs.egCHRLOC[[egid]]
> 
>         7         7 
> 120752656 120756325 
> 
>>   org.Hs.egCHRLOCEND[[egid]]
> 
>         7         7 
> 120768394 120768394
> --------------------------------------------
> 
> However, if I go to NCBI site (http://www.ncbi.nlm.nih.gov/sites/entrez) and search for 'WNT16', I get the following information for WNT16:
> 
> 
> 
> Chromosome: 7;Location: 7q31
> Annotation: Chromosome 7, NC_000007.13 (120965421..120981158)
> 
> 
> Why is there a discrepancy between the values returned from bioconductor (UCSC?) and NCBI? Is there anything I can do that will get me a match with the NCBI location numbers?

In general, the answer is that the Bioconductor annotation packages are a
snapshot of particular data resources taken at a fixed date, whereas
web-based retrievals return whatever is current when you access them. A
corollary is that the only way to know what data are available today from
NCBI is to visit the NCBI site (today, and not tomorrow or yesterday).

The details of when snapshots are taken can be found on the help pages,
e.g.,

  ?org.Hs.egCHRLOC

or interactively, e.g.,

  org.Hs.eg_dbInfo()
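
As a minimal sketch, the same metadata can also be read into a data frame
by querying the package's SQLite database directly. This assumes
org.Hs.eg.db and DBI are installed and that the 'metadata' table has
'name' and 'value' columns; the grep pattern is only an illustration,
since the entry names differ between package versions. Base pair positions
are only comparable within a single genome build, so the build and date of
the location data are a sensible first thing to look for:

```r
library(org.Hs.eg.db)
library(DBI)

## Read the metadata recorded when the annotation snapshot was built
## (assumption: a 'metadata' table with columns 'name' and 'value')
meta <- dbGetQuery(org.Hs.eg_dbconn(), "SELECT name, value FROM metadata")

## Illustrative filter: keep entries whose name mentions a source
## date, build or URL (entry names vary between package versions)
keep <- grepl("DATE|BUILD|URL", meta$name, ignore.case = TRUE)
meta[keep, ]
```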

The biomaRt package is also worth exploring for retrieving current,
web-based annotations.
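
A minimal sketch of such a retrieval, assuming biomaRt is installed and the
Ensembl mart is reachable over the network. The dataset, filter and
attribute names below are the ones Ensembl uses at the time of writing and
may change; listAttributes(mart) and listFilters(mart) show what is
available. The coordinates refer to whatever genome build that Ensembl
release uses, so they need not match either the annotation package or the
NCBI page:

```r
library(biomaRt)

## Connect to the Ensembl human gene dataset (requires network access)
mart <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")

## Look up chromosome, start, end and strand for one HGNC symbol
loc <- getBM(attributes = c("hgnc_symbol", "chromosome_name",
                            "start_position", "end_position", "strand"),
             filters = "hgnc_symbol",
             values = "WNT16",
             mart = mart)
loc
```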

Martin

> 
> 
> thanks!
> 
> Hi Tim --
> 
> One suggestion is to use the org.Hs.eg.db package. The 'eg' means that
> the information is keyed off Entrez ids, so you need to map your SYMBOL
> to EG
> 
>   egid = revmap(org.Hs.egSYMBOL)[["WNT16"]]
> 
> and then retrieve location information
> 
>   org.Hs.egCHRLOC[[egid]]
>   org.Hs.egCHRLOCEND[[egid]]
> 
> for many symbols, symids, one might
> 
>   egids = mappedLkeys(revmap(org.Hs.egSYMBOL)[symids])
>   as.list(org.Hs.egCHRLOC[egids])
> 
> etc. Some book-keeping might be needed to ensure correct symid -> egid
> -> CHRLOC mapping
> 
> Martin
> 
> Tim Smith wrote:
> 
>> Hi,
>>
>> I wanted the exact base pair locations for several genes (e.g. wnt16 in the human wnt pathway). Which bioconductor package should I use?
>>
>> thanks!
>>
>>
>>
> 
> 
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


