[BioC] How to get position for each gene ID/gene symbol instead of position for each transcript

Steve Lianoglou mailinglist.honeypot at gmail.com
Wed Aug 25 04:41:47 CEST 2010


Sorry:

> You can do this pretty "simply" with GenomicFeatures, if you want to
> stick with that:
>
> R> txdb <- loadFeatures('your.transcript.db')
> R> xcripts <- transcriptsBy(txdb, by='gene')
>
> ## This part is really slow -- this will be the subject of my next email
> R> gene.bounds <- seqapply(xcripts, reduce)

Should have used `range` instead of `reduce` here:

R> gene.bounds <- seqapply(xcripts, range)
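
To see the difference, here is a minimal sketch on a toy IRanges object. The coordinates and variable names are made up for illustration and do not come from any real transcript db. `reduce` merges overlapping ranges but keeps disjoint ones apart, while `range` gives back one range that spans all of them, which is what you want for gene bounds:

```r
library(IRanges)

## Three toy "transcripts" of one gene (made-up coordinates)
xcript.ranges <- IRanges(start=c(100, 150, 500), end=c(200, 300, 600))

## reduce() merges the two overlapping ranges but keeps the gap
## between 300 and 500, so two ranges come back: 100-300 and 500-600
merged <- reduce(xcript.ranges)

## range() returns a single range from the smallest start to the
## largest end: 100-600
bounds <- range(xcript.ranges)
```

On GRanges objects, `range` works per chromosome and strand, so a gene with transcripts on more than one of those would still come back with more than one range.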

The rest is the same ...

> the names() of gene.bounds are the Entrez IDs of the genes. You can use
> the org.Hs.eg.db package to map them to gene symbols:
>
> R> library(org.Hs.eg.db)
> R> symbols <- mget(names(gene.bounds), org.Hs.egSYMBOL, ifnotfound=NA)
>
> symbols will now be a list (names are entrez ids, values are the gene
> symbols) that you can manipulate in "the standard R way"
>
> Hope that helps,
>
> -steve
>
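
As for manipulating `symbols` in "the standard R way", one possible sketch is below. It assumes `symbols` looks like what `mget(..., ifnotfound=NA)` returns: a list named by Entrez ID, with NA for IDs that have no symbol. The toy list here is a hypothetical stand-in so the snippet runs without the annotation package; the third ID is deliberately bogus:

```r
## Hypothetical stand-in for the result of mget(); "999999999" is a
## made-up ID with no symbol
symbols <- list("1"="A1BG", "2"="A2M", "999999999"=NA)

## Flatten to a named character vector, taking the first symbol per ID
symbol.vec <- vapply(symbols, function(x) as.character(x[1]), character(1))

## One row per gene, Entrez ID next to its symbol
gene.table <- data.frame(entrez.id=names(symbol.vec), symbol=symbol.vec,
                         stringsAsFactors=FALSE)

## Drop the genes without a symbol
gene.table <- gene.table[!is.na(gene.table$symbol), ]
```

Since the names of `gene.bounds` are the same Entrez IDs, you can use them to match this table back to the gene bounds.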



-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


