[BioC] getting gene ranges from a TranscriptDb (GenomicFeatures package)

Elizabeth Purdom epurdom at stat.berkeley.edu
Tue Sep 14 23:43:08 CEST 2010


  Hello,

I would like to get the range of genes from a TranscriptDb object. 
Ideally, I would like to do:

txdb <- loadFeatures(system.file("extdata", "UCSC_knownGene_sample.sqlite",
                                    package="GenomicFeatures"))
genes(txdb)

And have this return the entire genomic range of the gene (much like 
transcripts() returns the genomic range of the transcript, including the 
introns). However, this does not exist. So I have made a little function 
to do this:

genes<-function(txdb){
     txByGene<-transcriptsBy(txdb,"gene")
     geneStarts<-sapply(start(txByGene),min)
     geneEnds<-sapply(end(txByGene),max)
     geneChr<-sapply(seqnames(txByGene),unique)
     geneStrand<-sapply(strand(txByGene),unique)
   
GRanges(seqnames=geneChr,strand=geneStrand,ranges=IRanges(geneStarts,geneEnds),gene_name=names(txByGene)) 

}

However, this has run me into all kinds of problems because the same 
gene name can be used for transcripts on different chromosomes, namely 
the X and Y chromosome (which makes me wonder about the safety of the 
assumption that gene names are unique -- does this count as unique?).

Before I go further down this route, is there not a simple way to get 
this information?

Thanks,
Elizabeth

-- 

Elizabeth Purdom
Assistant Professor
Department of Statistics
UC, Berkeley

Office: 433 Evans Hall
(510) 642-6154 (office)
(510) 642-7892 (fax)
Mailing: 367 Evans Hall Berkeley, CA 94720-3860
epurdom at stat.berkeley.edu



More information about the Bioconductor mailing list