[BioC] biomaRt 3'UTR coordinates

Iain Gallagher iaingallagher at btopenworld.com
Fri Dec 5 10:59:41 CET 2008

Hello list.

I'm using the following script to try to retrieve the 3'UTR start and end coordinates from Ensembl.


#read in probes called present on affy array (CPH in this script)

present <- read.table('cph_present_probes.txt', header=FALSE, sep='\t')

#present is a set of transcript ids

#get DB connection to retrieve required info

ensmart <- useMart("ensembl", dataset="hsapiens_gene_ensembl")

#get 3'utr coords

utr_coords <- getBM(attributes=c('ensembl_gene_id', 'sequence_3utr_start', 'sequence_3utr_end'),
                    filters='ensembl_transcript_id',
                    values=present,
                    mart=ensmart)

Running the script gives the following error.

1 Query ERROR: caught BioMart::Exception::Usage: Attribute 3utr_start NOT FOUND
Error in getBM(attributes = c("ensembl_gene_id", "sequence_3utr_start",  : 
  Number of columns in the query result doesn't equal number of attributes in query.  This is probably an internal error, please report.

Presumably some transcripts have more than one 3'UTR, hence the difference in the number of columns described above.

Can anyone suggest a solution? Either a way to retrieve the start and end coords of the 3'UTRs or the length of the 3'UTRs (my real objective).
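
If the coordinates can be retrieved, the length calculation itself is simple. A minimal sketch, assuming the result comes back with one row per 3'UTR segment (a UTR can span more than one exon) and inclusive start and end positions. The data frame and its column names (utr_segments, utr3_start, utr3_end) and the transcript ids are made up for illustration and are not real biomaRt attribute names.

```r
# Hypothetical result table: one row per 3'UTR segment.
# Column names and ids are placeholders, not real biomaRt attributes.
utr_segments <- data.frame(
  ensembl_transcript_id = c("ENST_A", "ENST_A", "ENST_B"),
  utr3_start = c(100, 500, 2000),
  utr3_end   = c(149, 519, 2099),
  stringsAsFactors = FALSE
)

# Length of each segment; both endpoints are counted, hence the + 1.
utr_segments$segment_length <- utr_segments$utr3_end - utr_segments$utr3_start + 1

# Sum the segments that belong to the same transcript.
utr_lengths <- tapply(utr_segments$segment_length,
                      utr_segments$ensembl_transcript_id,
                      sum)
print(utr_lengths)
```

Summing per transcript rather than taking max(end) - min(start) keeps any intron that falls inside the UTR out of the total.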

I have a separate script that downloads the 3'UTR sequences and then counts the characters, but the datasets are large and that process seems laborious if the information is directly available.
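
For reference, the character-counting step of that fallback can be sketched as below. This is only an illustration on a made-up table (utr_seqs and its column names are hypothetical), and it assumes that transcripts without a 3'UTR come back with the placeholder text "Sequence unavailable" instead of a sequence; the download itself is not shown.

```r
# Hypothetical table shaped like a sequence download: one 3'UTR sequence per transcript.
utr_seqs <- data.frame(
  ensembl_transcript_id = c("ENST_A", "ENST_B", "ENST_C"),
  utr3_sequence = c("ACGTACGTAA", "TTTGA", "Sequence unavailable"),
  stringsAsFactors = FALSE
)

# Assumed placeholder for a missing sequence; mark it as NA so it is not counted.
missing_seq <- utr_seqs$utr3_sequence == "Sequence unavailable"
utr_seqs$utr3_sequence[missing_seq] <- NA

# Count characters, keeping NA for transcripts with no sequence.
utr_seqs$utr3_length <- ifelse(is.na(utr_seqs$utr3_sequence),
                               NA_integer_,
                               nchar(utr_seqs$utr3_sequence))
print(utr_seqs[, c("ensembl_transcript_id", "utr3_length")])
```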



> sessionInfo()
R version 2.8.0 (2008-10-20) 


attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] biomaRt_1.16.0

loaded via a namespace (and not attached):
[1] RCurl_0.91-0 XML_1.95-3  
