[BioC] biomaRt 3'UTR coordinates

Iain Gallagher iaingallagher at btopenworld.com
Fri Dec 5 10:59:41 CET 2008


Hello list.

I'm using the following script to try to retrieve the 3'UTR start and end coordinates from Ensembl.

rm(list=ls())
library(biomaRt)

#read in probes called present on affy array (CPH in this script)

present <- read.table('cph_present_probes.txt', header=FALSE, sep='\t')
present <- as.character(present[,1])

#present is a set of transcript ids

#get DB connection to retrieve required info

ensmart <- useMart("ensembl", dataset="hsapiens_gene_ensembl")

#get 3'utr coords

utr_coords<-getBM(attributes=c('ensembl_gene_id', 'sequence_3utr_start', 'sequence_3utr_end'), filters='ensembl_transcript_id', values=present, mart=ensmart)

Running the script gives the following error.

                                                                             V1
1 Query ERROR: caught BioMart::Exception::Usage: Attribute 3utr_start NOT FOUND
Error in getBM(attributes = c("ensembl_gene_id", "sequence_3utr_start",  : 
  Number of columns in the query result doesn't equal number of attributes in query.  This is probably an internal error, please report.

Presumably some transcripts have more than one 3'UTR (hence the difference in the number of columns described above).
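
The first line of the error reports that an attribute was not found, so it may be worth checking which attribute names the mart actually exposes before assuming multiple UTRs are the cause. A minimal sketch, assuming the table returned by listAttributes() has 'name' and 'description' columns; find_attributes and the toy table are hypothetical and only illustrate the filtering step:

```r
# Hypothetical helper: filter an attribute table (columns 'name' and
# 'description', as assumed for listAttributes() output) by a pattern.
find_attributes <- function(attrs, pattern) {
  hit <- grepl(pattern, attrs$name, ignore.case = TRUE) |
         grepl(pattern, attrs$description, ignore.case = TRUE)
  attrs[hit, , drop = FALSE]
}

# Toy table standing in for real listAttributes() output (made-up entries)
toy_attrs <- data.frame(
  name = c("ensembl_gene_id", "example_utr_attribute"),
  description = c("Ensembl Gene ID", "Made-up UTR entry"),
  stringsAsFactors = FALSE
)
find_attributes(toy_attrs, "utr")

# With a live connection (not run here):
# find_attributes(listAttributes(ensmart), "utr")
```

Any attribute name passed to getBM() would need to appear in that table for the selected dataset.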

Can anyone suggest a solution? I need either a way to retrieve the start and end coordinates of the 3'UTRs or, since it is my real objective, the lengths of the 3'UTRs.

I have a separate script which downloads the 3'UTR sequences and then counts the characters, but the datasets are large and that process seems somewhat laborious if the information is directly available.
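
For reference, the sequence-counting fallback can be sketched roughly as follows. This is not the actual script, only an illustration. It assumes the sequences come from getSequence() with seqType="3utr", that the result has a '3utr' column and an 'ensembl_transcript_id' column, and that transcripts without a 3'UTR are returned as the literal string "Sequence unavailable". utr_lengths and the toy data are hypothetical:

```r
# Hypothetical helper: given a data frame shaped like getSequence() output
# (one column of sequences, one of transcript IDs), return 3'UTR lengths.
utr_lengths <- function(seqs, seq_col = "3utr",
                        id_col = "ensembl_transcript_id") {
  s <- as.character(seqs[[seq_col]])
  # Assumption: missing UTRs are reported as "Sequence unavailable"
  len <- ifelse(s == "Sequence unavailable", NA_integer_, nchar(s))
  data.frame(transcript = as.character(seqs[[id_col]]),
             utr3_length = len,
             stringsAsFactors = FALSE)
}

# Toy input standing in for a real download (made-up IDs and sequences)
toy <- data.frame(`3utr` = c("ACGTACGT", "Sequence unavailable"),
                  ensembl_transcript_id = c("ENST_A", "ENST_B"),
                  check.names = FALSE, stringsAsFactors = FALSE)
utr_lengths(toy)

# With a live connection (not run here):
# seqs <- getSequence(id = present, type = "ensembl_transcript_id",
#                     seqType = "3utr", mart = ensmart)
# utr_lengths(seqs)
```

The cost is that every sequence has to be transferred just to be measured, which is why coordinates or lengths straight from the mart would be preferable.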

Thanks

Iain

> sessionInfo()
R version 2.8.0 (2008-10-20) 
x86_64-pc-linux-gnu 

locale:
LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] biomaRt_1.16.0

loaded via a namespace (and not attached):
[1] RCurl_0.91-0 XML_1.95-3  


