[BioC] biomaRt 3'UTR coordinates
iaingallagher at btopenworld.com
Fri Dec 5 10:59:41 CET 2008
I'm using the following script to try to retrieve the 3'UTR start and end coordinates from Ensembl.
library(biomaRt)
# read in probes called present on affy array (CPH in this script)
present <- read.table('cph_present_probes.txt', header=FALSE, sep='\t')
# present is a set of transcript ids
# get DB connection to retrieve required info
# get 3'utr coords
utr_coords <- getBM(attributes=c('ensembl_gene_id', 'sequence_3utr_start', 'sequence_3utr_end'),
                    filters='ensembl_transcript_id',
                    values=present,
                    mart=ensmart)
Running the script gives the following error.
1 Query ERROR: caught BioMart::Exception::Usage: Attribute 3utr_start NOT FOUND
Error in getBM(attributes = c("ensembl_gene_id", "sequence_3utr_start", :
Number of columns in the query result doesn't equal number of attributes in query. This is probably an internal error, please report.
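An "Attribute ... NOT FOUND" message usually suggests that the attribute name is not defined for the chosen dataset, so one thing worth checking is the attribute table returned by listAttributes(). A minimal sketch, assuming biomaRt is installed: the helper name find_attributes and the dataset 'hsapiens_gene_ensembl' are assumptions for illustration and are not taken from the script above.

```r
# Hypothetical helper: filter an attribute table (as returned by
# biomaRt::listAttributes) for names matching a pattern.
find_attributes <- function(attr_table, pattern) {
  hits <- grepl(pattern, attr_table$name, ignore.case = TRUE)
  attr_table[hits, , drop = FALSE]
}

# Live lookup; it needs network access, so it only runs interactively.
if (interactive() && requireNamespace("biomaRt", quietly = TRUE)) {
  # The dataset name is an assumption; substitute the one matching the array.
  ensmart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
  print(find_attributes(biomaRt::listAttributes(ensmart), "utr"))
}
```

Any attribute names this prints would be the ones to try in getBM() in place of the names that were rejected.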
Presumably some transcripts have more than one 3'UTR, hence the difference in the number of columns described above.
Can anyone suggest a solution? Either a way to retrieve the start and end coordinates of the 3'UTRs, or the length of the 3'UTRs (my real objective).
I have a separate script that downloads the 3'UTR sequences and then counts the characters, but the datasets are large and that process seems laborious if the information is directly available.
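If start and end coordinates can be retrieved, the 3'UTR length per transcript could be computed from them directly rather than by counting sequence characters. A minimal sketch, assuming a result table with one row per UTR segment: the column names transcript_id, utr_start and utr_end, the function utr_lengths, and the example values are all hypothetical. Segment widths are summed per transcript in case a 3'UTR is reported as more than one segment.

```r
# Hypothetical helper: total 3'UTR length per transcript from a table of
# segment coordinates (one row per segment).
utr_lengths <- function(coords) {
  # Drop rows with no 3'UTR annotation (NA coordinates).
  coords <- coords[!is.na(coords$utr_start) & !is.na(coords$utr_end), ]
  # Coordinates are treated as 1-based and inclusive (an assumption).
  coords$width <- coords$utr_end - coords$utr_start + 1
  # Sum segment widths per transcript.
  aggregate(width ~ transcript_id, data = coords, FUN = sum)
}

# Made-up coordinates for illustration only.
example_coords <- data.frame(
  transcript_id = c("T1", "T1", "T2", "T3"),
  utr_start = c(100, 300, 50, NA),
  utr_end = c(149, 399, 59, NA)
)
print(utr_lengths(example_coords))
```

Transcripts without any annotated 3'UTR are dropped from the result in this sketch rather than being reported with a length of zero.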
R version 2.8.0 (2008-10-20)
attached base packages:
 stats graphics grDevices utils datasets methods base
other attached packages:
loaded via a namespace (and not attached):
 RCurl_0.91-0 XML_1.95-3