[BioC] Gene location (Base pair number)

Hervé Pagès hpages at fhcrc.org
Tue Jul 7 04:09:50 CEST 2009


Hi Tim,

Tim Smith wrote:
> Hi Martin,
> 
> Thanks for that. I tried your code and got:
> 
> --------------------------------------------
>>   egid = revmap(org.Hs.egSYMBOL)[["WNT16"]]  
>>   org.Hs.egCHRLOC[[egid]]
> 
>         7         7 
> 120752656 120756325 
> 
>>   org.Hs.egCHRLOCEND[[egid]]
> 
>         7         7 
> 120768394 120768394
> --------------------------------------------
> 
> However, if I go to NCBI site (http://www.ncbi.nlm.nih.gov/sites/entrez) and search for 'WNT16', I get the following information for WNT16:
> 
> 
> 
> Chromosome: 7;Location: 7q31
> Annotation: Chromosome 7, NC_000007.13 (120965421..120981158)
> 
> 
> Why is there a discrepancy between the values returned from bioconductor (UCSC?) and NCBI? Is there anything I can do that will get me a match with the NCBI location numbers?
> 

This is because the two sources use different reference assemblies:
   - NCBI is now using the Genome Reference Consortium Human Build 37 (GRCh37),
   - UCSC is still using hg18 (at UCSC, GRCh37 is called the hg19 assembly).
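
As a quick sanity check, here is a small base R sketch that uses only the
numbers quoted above (the outermost CHRLOC/CHRLOCEND values and the NCBI
range) to see how far apart the two sets of coordinates are. The variable
names are made up for illustration.

```r
# Coordinates copied from this thread (WNT16, chromosome 7).
# hg18 values: smallest org.Hs.egCHRLOC start and the org.Hs.egCHRLOCEND end.
hg18   <- c(start = 120752656, end = 120768394)
# GRCh37 values: the range shown on the NCBI page (NC_000007.13).
grch37 <- c(start = 120965421, end = 120981158)

# Per-coordinate shift between the two assemblies.
shift <- grch37 - hg18
print(shift)

# Length of the interval implied by each pair of coordinates.
span <- c(hg18   = hg18[["end"]]   - hg18[["start"]],
          grch37 = grch37[["end"]] - grch37[["start"]])
print(span)
```

Both ends move by roughly 212.8 kb and the interval length is essentially
unchanged, which is what you would expect if the same gene is simply placed
on two different assemblies. This shift is local to this region and is not a
general conversion rule; a coordinate conversion tool such as UCSC's liftOver
is the usual way to move positions between hg18 and hg19.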

Unfortunately it's hard to figure out which assembly is used for the
org.Hs.egCHRLOC or org.Hs.egCHRLOCEND maps. The man page says:

      Mappings were based on data provided by: UCSC Genome
      Bioinformatics (Homo sapiens) (
      ftp://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Homo_sapiens
      ) on 2008-Sep3

and if you connect (by anonymous FTP) to hgdownload.cse.ucsc.edu,
you'll be able to see that the Homo_sapiens folder is actually a
symlink to hg18:

   hpages at thinkpad:~$ ftp hgdownload.cse.ucsc.edu
   Connected to hgdownload.cse.ucsc.edu.
   220 FTP Server ready.
   Name (hgdownload.cse.ucsc.edu:hpages): anonymous
   331 Anonymous login ok, send your complete email address as your password
   Password:
   230 User anonymous logged in.
   Remote system type is UNIX.
   Using binary mode to transfer files.
   ftp> cd goldenPath/currentGenomes
   250 CWD command successful
   ftp> ls
   200 PORT command successful
   150 Opening ASCII mode data connection for file list
   dr-xr-xr-x   2 ftp      ftp          4096 May 11 17:18 .
   dr-xr-xr-x 128 ftp      ftp          4096 Jun 17 00:03 ..
   lr--r--r--   1 ftp      ftp            10 Sep  3  2008 Anolis_carolinensis -> ../anoCar1
   lr--r--r--   1 ftp      ftp            10 Sep  3  2008 Anopheles_gambiae -> ../anoGam1
   lr--r--r--   1 ftp      ftp            10 Sep  3  2008 Apis_mellifera -> ../apiMel2
   lr--r--r--   1 ftp      ftp            10 Sep  3  2008 Bos_taurus -> ../bosTau4
   lr--r--r--   1 ftp      ftp            10 Sep  3  2008 Branchiostoma_floridae -> ../braFlo1
   lr--r--r--   1 ftp      ftp             9 Sep  3  2008 Caenorhabditis_brenneri -> ../caePb2
   lr--r--r--   1 ftp      ftp            12 Sep  3  2008 Caenorhabditis_briggsae -> ../cbJul2002
   lr--r--r--   1 ftp      ftp             6 Sep  3  2008 Caenorhabditis_elegans -> ../ce2
   lr--r--r--   1 ftp      ftp            10 Sep  3  2008 Caenorhabditis_japonica -> ../caeJap1
   lr--r--r--   1 ftp      ftp            10 Sep  3  2008 Caenorhabditis_remanei -> ../caeRem3
   lr--r--r--   1 ftp      ftp            10 Sep  3  2008 Callithrix_jacchus -> ../calJac1
   lr--r--r--   1 ftp      ftp            10 Sep  3  2008 Canis_familiaris -> ../canFam2
   lr--r--r--   1 ftp      ftp            10 Sep  3  2008 Cavia_porcellus -> ../cavPor3
   lr--r--r--   1 ftp      ftp             6 Sep  3  2008 Ciona_intestinalis -> ../ci2
   lr--r--r--   1 ftp      ftp            10 Sep  3  2008 Danio_rerio -> ../danRer5
   lr--r--r--   1 ftp      ftp            10 Sep  3  2008 Drosophila_ananassae -> ../droAna2
   lr--r--r--   1 ftp      ftp            10 Sep  3  2008 Drosophila_erecta -> ../droEre1
   lr--r--r--   1 ftp      ftp            10 Sep  3  2008 Drosophila_grimshawi -> ../droGri1
   lr--r--r--   1 ftp      ftp             6 Sep  3  2008 Drosophila_melanogaster -> ../dm3
   lr--r--r--   1 ftp      ftp            10 Sep  3  2008 Drosophila_mojavensis -> ../droMoj2
   lr--r--r--   1 ftp      ftp            10 Sep  3  2008 Drosophila_persimilis -> ../droPer1
   lr--r--r--   1 ftp      ftp             6 Sep  3  2008 Drosophila_pseudoobscura -> ../dp3
   lr--r--r--   1 ftp      ftp            10 Sep  3  2008 Drosophila_sechellia -> ../droSec1
   lr--r--r--   1 ftp      ftp            10 Sep  3  2008 Drosophila_simulans -> ../droSim1
   lr--r--r--   1 ftp      ftp            10 Sep  3  2008 Drosophila_virilis -> ../droVir2
   lr--r--r--   1 ftp      ftp            10 Sep  3  2008 Drosophila_yakuba -> ../droYak2
   lr--r--r--   1 ftp      ftp            10 Dec  4  2008 Equus_caballus -> ../equCab2
   lr--r--r--   1 ftp      ftp            10 Sep  3  2008 Felis_catus -> ../felCat3
   lr--r--r--   1 ftp      ftp             6 Sep  3  2008 Fugu_rubripes -> ../fr2
   lr--r--r--   1 ftp      ftp            10 Sep  3  2008 Gallus_gallus -> ../galGal3
   lr--r--r--   1 ftp      ftp             7 May 11 17:18 Homo_sapiens -> ../hg18
   lr--r--r--   1 ftp      ftp            10 Sep  3  2008 Monodelphis_domestica -> ../monDom4
   lr--r--r--   1 ftp      ftp             6 Sep  3  2008 Mus_musculus -> ../mm9
   lr--r--r--   1 ftp      ftp            10 Sep  3  2008 Ornithorhynchus_anatinus -> ../ornAna1
   lr--r--r--   1 ftp      ftp            10 Nov  7  2008 Oryzias_latipes -> ../oryLat2
   lr--r--r--   1 ftp      ftp            10 Sep  3  2008 Pan_troglodytes -> ../panTro2
   lr--r--r--   1 ftp      ftp            10 Sep  3  2008 Petromyzon_marinus -> ../petMar1
   lr--r--r--   1 ftp      ftp             6 Sep  3  2008 Rattus_norvegicus -> ../rn4
   lr--r--r--   1 ftp      ftp            10 Sep  3  2008 Rhesus_macaque -> ../rheMac2
   lr--r--r--   1 ftp      ftp            12 Sep  3  2008 SARS_coronavirus -> ../scApr2003
   lr--r--r--   1 ftp      ftp            10 Sep  3  2008 Saccharomyces_cereviciae -> ../sacCer1
   lr--r--r--   1 ftp      ftp            10 Sep  3  2008 Saccharomyces_cerevisiae -> ../sacCer1
   lr--r--r--   1 ftp      ftp            10 Sep  3  2008 Strongylocentrotus_purpuratus -> ../strPur2
   lr--r--r--   1 ftp      ftp            10 Sep  3  2008 Taeniopygia_guttata -> ../taeGut1
   lr--r--r--   1 ftp      ftp             6 Sep  3  2008 Takifugu_rubripes -> ../fr2
   lr--r--r--   1 ftp      ftp            10 Sep  3  2008 Tetraodon_nigroviridis -> ../tetNig1
   lr--r--r--   1 ftp      ftp            10 Sep  3  2008 Xenopus_tropicalis -> ../xenTro1
   226 Transfer complete

The problem is that this symlink can be repointed at any time (for
example to hg19), so the URL given in the org.Hs.egCHRLOC man page does
not identify the assembly and will become misleading sooner or later...
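
One thing worth trying is to look at the metadata stored inside the
annotation package itself. The sketch below assumes that org.Hs.eg.db is
installed and that its SQLite file carries the usual `metadata` table with
`name` and `value` columns. Which entries are present depends on the package
version, and there is no guarantee that the assembly name is among them.

```r
# Sketch only: list the UCSC-related metadata recorded in org.Hs.eg.db.
# Assumes a 'metadata' table with 'name' and 'value' columns; the
# snippet is skipped entirely when the package is not installed.
info <- NULL
if (requireNamespace("org.Hs.eg.db", quietly = TRUE) &&
    requireNamespace("DBI", quietly = TRUE)) {
  library(org.Hs.eg.db)
  # org.Hs.eg_dbconn() returns the DBI connection to the package database.
  info <- DBI::dbGetQuery(org.Hs.eg_dbconn(), "SELECT * FROM metadata")
  # Keep only rows whose name or value mentions UCSC.
  hits <- grepl("UCSC", info$name, ignore.case = TRUE) |
          grepl("UCSC", info$value, ignore.case = TRUE)
  print(info[hits, ])
}
```

If the rows shown only repeat the currentGenomes URL and a date, you are
back to guessing the assembly from the date, as described above.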

Cheers,
H.


> 
> thanks!
> 
> Hi Tim --
> 
> One suggestion is to use the org.Hs.eg.db package. The 'eg' means that
> the information is keyed off Entrez ids, so you need to map your SYMBOL
> to EG
> 
>   egid = revmap(org.Hs.egSYMBOL)[["WNT16"]]
> 
> and then retrieve location information
> 
>   org.Hs.egCHRLOC[[egid]]
>   org.Hs.egCHRLOCEND[[egid]]
> 
> for many symbols, symids, one might
> 
>   egids = mappedLkeys(revmap(org.Hs.egSYMBOL)[symids])
>   as.list(org.Hs.egCHRLOC[egids])
> 
> etc. Some book-keeping might be needed to ensure correct symid -> egid
> -> CHRLOC mapping
> 
> Martin
> 
> Tim Smith wrote:
> 
>> Hi,
>>
>> I wanted the exact base pair locations for several genes (e.g. wnt16 in the human wnt pathway). Which bioconductor package should I use?
>>
>> thanks!

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


