[BioC] BSgenome or org.Hs.eg.db to find gene length

Marc Carlson mcarlson at fhcrc.org
Thu Oct 11 19:10:35 CEST 2012


Hi Fatemehsadat,

You could consider doing it this way:

library(Homo.sapiens)
cols(Homo.sapiens) ## shows cols you could use
keytypes(Homo.sapiens) ## shows keytypes
k <- keys(Homo.sapiens, keytype="ENTREZID")  ## discovers all available keys of this kind
result <- select(Homo.sapiens, k, cols=c("TXNAME","TXSTART","TXEND", 
"TXSTRAND"), keytype="ENTREZID")

Then you could process that result according to your definition of what 
you think constitutes the "gene range".  Do you think it is the max 
range?  The average?  Maybe the max range plus some buffering sequence 
to account for likely transcriptional regulators?  It's your call how 
you want to do that step, but the data frame in result should give you 
the range positions for all the transcripts and their associated gene IDs.
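
As one illustration, here is a minimal sketch of the "max range" definition in plain R. It assumes the data frame has the ENTREZID, TXSTART and TXEND columns requested above. The toy data frame and its values are made up so the snippet runs on its own, and the names gene_start, gene_end and gene_length are just for this example.

```r
# Toy stand-in for `result`; the IDs and coordinates are invented for illustration.
result <- data.frame(
  ENTREZID = c("1", "1", "2"),
  TXNAME   = c("txA", "txB", "txC"),
  TXSTART  = c(100L, 150L, 500L),
  TXEND    = c(300L, 400L, 900L),
  TXSTRAND = c("+", "+", "-"),
  stringsAsFactors = FALSE
)

# Drop rows without transcript coordinates (keys with no match come back as NA).
result <- result[!is.na(result$TXSTART) & !is.na(result$TXEND), ]

# "Max range" definition: leftmost start to rightmost end over all transcripts.
gene_start <- tapply(result$TXSTART, result$ENTREZID, min)
gene_end   <- tapply(result$TXEND,   result$ENTREZID, max)

gene_length <- data.frame(
  ENTREZID = names(gene_start),
  start    = as.vector(gene_start),
  end      = as.vector(gene_end),
  length   = as.vector(gene_end - gene_start + 1L)  # coordinates are 1-based, inclusive
)
gene_length
```

Note that this only makes sense when all transcripts of a gene sit on the same chromosome. If that might not hold for your genes, you would probably want to request TXCHROM as well and group by it too.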

OR, you might also consider doing it this way:

result2 <- transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, by= "gene")


This will give you a list-like object (a GRangesList with one element per 
gene) that is also suitable for use in range operations.
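
For instance, a rough sketch of two such range operations is below. Which one matches your idea of "gene length" is again up to you; the variable names (txdb, gene_span, span_width, exonic_length) are just for this example.

```r
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
result2 <- transcriptsBy(txdb, by = "gene")

# range() collapses the transcripts of each gene into their overall span.
# A gene with transcripts on more than one chromosome or strand keeps
# one span per chromosome/strand combination.
gene_span <- range(result2)

# width() gives an IntegerList with the span length(s) for each gene.
span_width <- width(gene_span)
head(span_width)

# Alternative definition: total length of the non-overlapping exonic sequence.
exons_by_gene <- exonsBy(txdb, by = "gene")
exonic_length <- sum(width(reduce(exons_by_gene)))
head(exonic_length)
```

The names of both results are Entrez gene IDs, so you can map them back to symbols with select() afterwards.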

Hope this helps,


   Marc


On 10/11/2012 09:42 AM, Fatemehsadat Seyednasrollah wrote:
> Dear list,
>
> From what I have read, I can find the chromosome number (using org.Hs.egCHR), the chromosome location (org.Hs.egCHRLOC) and the end position (using org.Hs.egCHRLOCEND) for a list of gene symbols. But I did not find a mapping from gene symbol to gene length. Should I subtract the org.Hs.egCHRLOC value from the org.Hs.egCHRLOCEND value for each gene symbol to get the gene length, or is there an easier way to do this for a long list of gene symbols?
>
> Thank you


