[BioC] biomaRt 3'UTR coordinates

Sat Dec 6 15:11:15 CET 2008

Dear Iain

thank you for providing this feedback! In order to do something about 
it, can you provide us with a reproducible example?

You could do this, for example, by defining the content of your vector 
"present" in the script, rather than reading a file from your file 
system that nobody else can see, or by putting the file on a web server 
and passing a connection to its URL in your call to read.table.
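
As a minimal sketch of what such a self-contained script could start with: the transcript IDs below are made-up placeholders and the URL is hypothetical, so substitute a few of your real identifiers or the real location of your file.

```r
# Option 1: define the transcript IDs directly in the script.
# These IDs are placeholders for illustration, not real data.
present <- c("ENST00000000001", "ENST00000000002", "ENST00000000003")

# Option 2 (commented out): read the same one-column, tab-separated file
# from a web server instead of the local file system.
# The URL is hypothetical.
# present <- read.table(url("http://www.example.org/cph_present_probes.txt"),
#                       header = FALSE, sep = "\t")
# present <- as.character(present[, 1])

# Either way, 'present' is a character vector of transcript IDs
# that anyone on the list can recreate.
str(present)
```

A handful of IDs that trigger the error is usually enough; the full list of probes is not needed to reproduce the problem.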

Best wishes
      Wolfgang

----------------------------------------------------
Wolfgang Huber, EMBL-EBI, http://www.ebi.ac.uk/huber

Iain Gallagher wrote:
> Hello list.
> 
> I'm using the following script to try and retrieve the 3'UTR start and end coordinates from Ensembl.
> 
> rm(list=ls())
> library(biomaRt)
> 
> #read in probes called present on affy array (CPH in this script)
> 
> present <- read.table('cph_present_probes.txt', header=F, sep='\t')
> present<-as.character(present[,1])
> 
> #present is a set of transcript ids
> 
> #get DB connection to retrieve required info
> 
> ensmart=useMart("ensembl", dataset="hsapiens_gene_ensembl")
> 
> #get 3'utr coords
> 
> utr_coords<-getBM(attributes=c('ensembl_gene_id', 'sequence_3utr_start', 'sequence_3utr_end'), filters='ensembl_transcript_id', values=present, mart=ensmart)
> 
> Running the script gives the following error.
> 
>                                                                              V1
> 1 Query ERROR: caught BioMart::Exception::Usage: Attribute 3utr_start NOT FOUND
> Error in getBM(attributes = c("ensembl_gene_id", "sequence_3utr_start",  : 
>   Number of columns in the query result doesn't equal number of attributes in query.  This is probably an internal error, please report.
> 
> Presumably some transcripts have more than 1 3'UTR (hence the number of columns difference described above)
> 
> Can anyone suggest a solution? Either a way to retrieve the start and end coords of the 3'UTRs or the length of the 3'UTRs (my real objective).
> 
> I have a separate script which will download the 3'UTR sequences and then count the characters but the datasets are large and that process seems somewhat laborious if the information is directly available.
> 
> Thanks
> 
> Iain
> 
>> sessionInfo()
> R version 2.8.0 (2008-10-20) 
> x86_64-pc-linux-gnu 
> 
> locale:
> LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base     
> 
> other attached packages:
> [1] biomaRt_1.16.0
> 
> loaded via a namespace (and not attached):
> [1] RCurl_0.91-0 XML_1.95-3  
>