[BioC] help with biomaRt bioconductor - Filter upstream_flank NOT FOUND problem

Wolfgang Huber whuber at embl.de
Tue Aug 7 11:08:39 CEST 2012


Dear Steffen / List,
below is a more compact code example that reproduces Tom's problem. I am 
rather confused by the fact that the problem seemed to occur stochastically!

-------------------
library(biomaRt)
options(error=recover)
ensembl = useMart("ensembl")
human = useDataset("hsapiens_gene_ensembl",mart=ensembl)
attr = c('ensembl_gene_id','ensembl_transcript_id',
        'external_gene_id','chromosome_name','strand','transcript_start')
bmres = getBM(attr, 'biotype', values = 'protein_coding', human)

for(id in bmres[,"ensembl_transcript_id"]){
  sequence = getSequence(id=id, type='ensembl_transcript_id',
                        seqType='transcript_flank',upstream = 3000,
                        mart = human)
  sl = with(sequence, nchar(as.character(transcript_flank)))
  cat(id, sl, "\n")
}
-------------------

On running this once, I got
...(lots of lines)
ENST00000520540 3000
ENST00000519310 3000
ENST00000442920 3000
Error in getBM(c(seqType, type), filters = c(type, "upstream_flank"),  :
   Query ERROR: caught BioMart::Exception::Usage: Filter upstream_flank 
NOT FOUND

The next time, the same error already occurred in the very first 
iteration of the for-loop, for id="ENST00000539570". The next time, in 
the third iteration for id="ENST00000510508".

Any idea what is going on here?


Further comments:
- for *Steffen*: The documentation and the code of 'getSequence' do not 
seem to match each other (e.g. the description of argument 'seqType'), 
and MySQL mode is still mentioned although, as far as I understand, it is 
no longer supported. Perhaps some maintenance would be helpful for users.
- for *Tom*: Making queries such as getSequence one id at a time within a 
for-loop is bad practice, since it needlessly clogs the network and the 
BioMart webservers. Please use R's vector capabilities and pass all ids 
in a single call, e.g.

------------------------
sequence = getSequence(id=bmres[,"ensembl_transcript_id"],
   type='ensembl_transcript_id', seqType='transcript_flank',
   upstream = 3000, mart = human)
sl = with(sequence, nchar(as.character(transcript_flank)))
-------------------------
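
If a single call with all ids turns out to be too large for the server, a 
middle ground is to send the ids in a modest number of batches. This is only 
a sketch: 'batch_ids' and 'fetch_in_batches' are hypothetical helper names 
(not part of biomaRt), and the batch size of 500 is an arbitrary 
illustration value, not a documented BioMart limit.

```r
# Split a vector of ids into batches of at most 'size' elements.
# 'size = 500' is an arbitrary choice for illustration.
batch_ids = function(ids, size = 500) {
  split(ids, ceiling(seq_along(ids) / size))
}

# Run one vectorised query per batch and stack the resulting data.frames.
# 'query' is any function that takes a character vector of ids and
# returns a data.frame, for example a wrapper around getSequence.
fetch_in_batches = function(ids, query, size = 500) {
  do.call(rbind, unname(lapply(batch_ids(ids, size), query)))
}
```

With the objects from the example above, 'query' could be 
function(b) getSequence(id=b, type='ensembl_transcript_id', 
seqType='transcript_flank', upstream=3000, mart=human).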

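Once the sequences are in one data.frame, the FASTA records can be built 
without a loop over ids as well. The following is a minimal sketch in base 
R; 'wrap_seq' and 'to_fasta' are hypothetical helper names, and the 60 
characters per line follow the width used in Tom's script.

```r
# Wrap one sequence string into pieces of at most 'width' characters.
wrap_seq = function(s, width = 60) {
  starts = seq(1, max(nchar(s), 1), by = width)
  substring(s, starts, pmin(starts + width - 1, nchar(s)))
}

# Build FASTA text lines from parallel vectors of headers and sequences.
# Each record is a ">" header line followed by the wrapped sequence lines.
to_fasta = function(headers, seqs, width = 60) {
  unlist(mapply(function(h, s) c(paste0(">", h), wrap_seq(s, width)),
                headers, as.character(seqs),
                SIMPLIFY = FALSE, USE.NAMES = FALSE))
}
```

The result can be written in one go with writeLines(to_fasta(...), file). 
When building the headers from the annotation table, I would match rows by 
'ensembl_transcript_id' rather than assume that the sequences come back in 
the same order as the input ids.
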
Best wishes
	Wolfgang


Tom Hait scripsit 08/06/2012 12:37 PM:
> Hello,
>
> I'm a student in bioinformatics in Tel Aviv University.
> I'm working with your biomaRt API in order to automate the downloading of
> FASTA sequences.
> I experienced some problem, here is my code:
>
> #load the biomaRt library
> library(biomaRt)
> #connect to the Ensembl BioMart and open data set of human
> ensembl = useMart("ensembl")
> human = useDataset("hsapiens_gene_ensembl",mart=ensembl)
> #select the attributes that we want from the data set
> attr<-c('ensembl_gene_id','ensembl_transcript_id',
> 'external_gene_id','chromosome_name','strand','transcript_start')
> #downloading the map between transcript id and transcript name
> tmpgene<-getBM(attr, 'biotype', values = 'protein_coding', human)
> #save in a TSV format (the file is saved in txt)
> write.table(tmpgene,"Z:/tomhait/organisms/human/transcript_names.txt",
> row.names=FALSE, quote=FALSE)
> #collect all sequences with upstream flank 3000 bases based on the second
> #column (ensembl_transcript_id) of tmpgene
> i<-1
> for(id1 in tmpgene[,2]){
>   #retrieve sequence
>   sequence<-getSequence(id=id1,
> type='ensembl_transcript_id',seqType='transcript_flank',upstream = 3000,
> mart = human)
>   #check if sequence was retrieved
>   sLengths <- with(sequence, nchar(as.character(transcript_flank)))
>
> #writing to a new file in "Z:/tomhait/organisms/human/mart_export_new.txt"
> #you can change it to "mart_export_new.txt" and it will create a new file
> #in R directory
>   if(length(sLengths) > 0){
>    x<-sequence[,1]
>    y<-strsplit(gsub("([[:alnum:]]{60})", "\\1 ", x), " ")[[1]]
>    title<-paste(paste(">",tmpgene[i,1],sep=""),tmpgene[i,2],tmpgene[i,3],tmpgene[i,4],tmpgene[i,5],tmpgene[i,6],
> sep="|")
>    write(title,file="Z:/tomhait/organisms/human/mart_export_new.txt",ncolumns
> = 1, append=TRUE,sep="")
>    write(y,file="Z:/tomhait/organisms/human/mart_export_new.txt",ncolumns =
> 1, append=TRUE,sep="\n")
>    write("\n",file="Z:/tomhait/organisms/human/mart_export_new.txt",ncolumns
> = 1, append=TRUE,sep="\n")
>   }
>   i<-i+1
> }
>
> I got the message:
> Error in getBM(c(seqType, type), filters = c(type, "upstream_flank"),  :
>    Query ERROR: caught BioMart::Exception::Usage: Filter upstream_flank NOT
> FOUND
>
> Could you please help me to solve this problem?
>
> Best Regards,
>
> Tom Hait.


-- 

Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber


