[BioC] get position information for different gene/transcript IDs

Thu Aug 19 03:26:19 CEST 2010

Hi Marc,

Thanks a lot for the information. It is very helpful. May I ask
one more question?

In my case, the snp_position is based on Human Build 36. The package
org.Hs.eg.db was updated in October 2009, so can I assume its chromosome
positions are on Build 36 as well?
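Whichever build the package turns out to use, the SNP positions and the gene coordinates have to come from the same build before any distances are computed. Once they do, the distance step could look roughly like the base-R sketch below. It is only a sketch under assumptions: the helper name `snp_gene_distance` and the SNP table layout (`snp_id`, `chr`, `pos`) are hypothetical, the gene table is assumed to carry the `gene_id`, `Chromosome`, `start_location` and `end_location` columns produced by the `toTable()`/`merge()` calls in Marc's example, and the toy positions are made up.

```r
## Hypothetical helper (not part of org.Hs.eg.db): distance in bp from each
## SNP to each gene on the same chromosome, 0 if the SNP lies inside the gene.
## Assumes SNP and gene coordinates come from the same genome build.
snp_gene_distance <- function(snps, genes) {
  ## org.Hs.egCHRLOC/CHRLOCEND give antisense-strand genes a leading "-",
  ## so use absolute values and put the smaller coordinate first.
  a <- abs(genes$start_location)
  b <- abs(genes$end_location)
  genes <- data.frame(gene_id = genes$gene_id, chr = genes$Chromosome,
                      start = pmin(a, b), end = pmax(a, b),
                      stringsAsFactors = FALSE)
  ## Pair every SNP with every gene on the same chromosome
  pairs <- merge(snps, genes, by = "chr")
  ## Upstream: start - pos > 0; downstream: pos - end > 0; inside: both <= 0
  pairs$distance <- pmax(pairs$start - pairs$pos, pairs$pos - pairs$end, 0)
  pairs[order(pairs$snp_id, pairs$distance), ]
}

## Toy input with made-up positions, just to show the expected layout
snps <- data.frame(snp_id = c("snpA", "snpB"), chr = c("1", "2"),
                   pos = c(1500, 100), stringsAsFactors = FALSE)
genes <- data.frame(gene_id = c("g1", "g2", "g3"),
                    Chromosome = c("1", "1", "2"),
                    start_location = c(1000, -3000, 500),
                    end_location = c(2000, -4000, 900),
                    stringsAsFactors = FALSE)
res <- snp_gene_distance(snps, genes)
res[, c("snp_id", "gene_id", "chr", "distance")]
```

Taking the smallest `distance` per `snp_id` gives the nearest gene for each SNP. Pairing every SNP with every gene on a chromosome keeps the code simple, but the intermediate table can get large for genome-wide SNP lists.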

Thanks again,
Shirley

On Wed, Aug 18, 2010 at 12:56 PM, Marc Carlson <mcarlson at fhcrc.org> wrote:
> Hi Shirley,
>
> You can find this information in an organism package, for example
> org.Hs.eg.db for humans.
>
> The harder part will be getting all the information from the mixed
> bag of IDs that you describe. But you can still retrieve it for each ID
> type separately, as in this example for RefSeq IDs:
>
> ## So for refseq you can start by getting the entrez gene ID
> library(org.Hs.eg.db)
> a = c("NM_000014", "NM_000015", "foo")
> gene_id = unlist2(mget(a, revmap(org.Hs.egREFSEQ), ifnotfound=NA))
> gene_id = gene_id[!is.na(gene_id)]
>
> ## And then once you have the IDs, you can map them to a position
> chrloc = toTable(org.Hs.egCHRLOC[gene_id])
> chrlocend = toTable(org.Hs.egCHRLOCEND[gene_id])
>
> ## And now because we were careful to make sure that we always have an
> ## Entrez Gene ID, you can just merge all the results together to make
> ## this easier to look at:
> merge(merge(data.frame(gene_id=gene_id, refseq=names(gene_id)), chrloc),
>       chrlocend)
>
>
> ## This process can then be repeated for other kinds of IDs etc:
> b = c("Hs.100217", "Hs.100299")
> gene_id = unlist2(mget(b, revmap(org.Hs.egUNIGENE), ifnotfound=NA))
> gene_id = gene_id[!is.na(gene_id)]
>
> chrloc = toTable(org.Hs.egCHRLOC[gene_id])
> chrlocend = toTable(org.Hs.egCHRLOCEND[gene_id])
> ## etc.
>
> Does that help?
>
>  Marc
>
> On 08/17/2010 07:26 PM, shirley zhang wrote:
>> Dear list,
>>
>> I have a list of human SNPs and genes/transcripts which are obtained
>> from different studies.  I would like to calculate the distance
>> between these SNPs and genes/transcripts on the same chromosome.
>> However, the gene/transcript IDs are not consistent across studies.
>> For example, an ID could be a Gene Symbol/Alias (TRIM5l), an RNA
>> nucleotide accession (NM_013387), XM_173221 (its gene symbol is
>> LOC254266, but its record is an obsolete version), Contig22780_RC,
>> HSS00001378, Hs.445401 (UniGene entry Hs.445401), etc.
>>
>> I would like to retrieve the start and end positions for those
>> genes/transcripts. Can anybody tell me how to get the position
>> information for these different gene/transcript IDs? If I also need
>> to map all of the other IDs to gene symbol/Entrez ID, which database
>> or Bioconductor package will let me do the mapping as accurately as
>> possible? What about the biomaRt package?
>>
>> Thanks
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>

-- 
Xiaoling (Shirley) Zhang

M.D., Ph.D. (Bioinformatics)
Boston University, Boston, MA
Tel: (857) 233-9862
Email: zhangxl at bu.edu