[BioC] biomaRt query error

James W. MacDonald jmacdon at med.umich.edu
Mon Dec 19 15:57:59 CET 2011


Hi Assa,

On 12/19/2011 8:16 AM, Assa Yeroslaviz wrote:
> Hi,
>
> I have a problem running getSequence in biomaRt.
>
> I've found a two-year-old thread on this, but no solution was mentioned
> there, so I am not sure whether it was fixed or not.
>
> The behavior, though, is exactly as Julie showed in her example. When I run
> the command for several files in a for loop, I get this error for some of
> the queries but not for all.

You should never query an online database using a for loop. First, this
can get your IP banned, as it is considered abusive. Second, any number
of glitches can occur in an online data transaction, and every
additional query raises the chance of hitting one.

A preferable method is to do a single query for all the data you need. 
You can then manipulate the results into sets if need be.
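
As a rough sketch of what I mean (not tested against the live server): read the IDs from every file first, send one getSequence call for the unique set, and split the result back out per file. The helper names collect_ids and split_by_file are made up for illustration. The getSequence arguments are copied from your call below, and I am assuming the returned data frame has a column named after `type`, which is what the getBM(c(seqType, type), ...) call in your error message suggests.

```r
# Hypothetical helper: read the first column of each file as character IDs.
collect_ids <- function(files) {
  ids_by_file <- lapply(files, function(f) {
    as.character(read.table(f, stringsAsFactors = FALSE)[, 1])
  })
  names(ids_by_file) <- basename(files)
  ids_by_file
}

# Hypothetical helper: split one combined result back into per-file subsets.
# Assumes `result` has an ID column named `id_col`.
split_by_file <- function(result, ids_by_file, id_col = "flybase_gene_id") {
  lapply(ids_by_file, function(ids) {
    result[result[[id_col]] %in% ids, , drop = FALSE]
  })
}

# Usage (needs network access and biomaRt, so it is left commented out):
# library(biomaRt)
# ensembl <- useMart("ensembl", dataset = "dmelanogaster_gene_ensembl")
# ids_by_file <- collect_ids(list.files(pattern = "\\.txt$"))
# all_ids <- unique(unlist(ids_by_file, use.names = FALSE))
# result <- getSequence(id = all_ids, type = "flybase_gene_id",
#                       seqType = "coding_gene_flank", upstream = 1000,
#                       mart = ensembl)
# per_file <- split_by_file(result, ids_by_file)
```

This keeps the number of server round trips at one regardless of how many gene lists you have. It does not by itself explain the upstream_flank message, but it gives the server far fewer chances to fail on you.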

Best,

Jim


>
> this is what I do:
> library(biomaRt)
> ensembl<-useMart("ensembl", dataset="dmelanogaster_gene_ensembl")
> ...
> for (j in 1:length(dirs)){
>   ...
>      files<-list.files()
>      for (i in 1:length(files)) {
>          ...
>                      ids<-read.table(files[i])[,1]
>                      cat("in the list of genes we have here", ids[1],"genes\n")
>                      result<-getSequence(id=as.character(ids),
>                          type="flybase_gene_id", seqType="coding_gene_flank",
>                          upstream="1000", mart=ensembl)
>                     ...
>                }
>        }
>
> ids is a vector of flybase IDs (FBgn0030073 FBgn0038346 FBgn0025366
> FBgn0011770 FBgn0029828 FBgn0031701 ...)
>
> This is what the output looks like:
> [1] "working directory is: /home/ayeroslaviz/myProject/4_genelists/geneids"
> unpaired_downregulated_0.1_nonRed.txt
> in the list of genes we have here 48 genes
> saving to ../martSequences/unpaired_downregulated_0.1_nonRed.txt
> unpaired_downregulated_0.1.txt
> in the list of genes we have here 48 genes
> saving to  ../martSequences/unpaired_downregulated_0.1.txt
> unpaired_listfile_decreasing.tsv
> unpaired_listfile_decreasing.txt
> unpaired_listfile_increasing.tsv
> unpaired_listfile_increasing.txt
> unpaired_upregulated_0.1_nonRed.txt
> in the list of genes we have here 28 genes
> Error in getBM(c(seqType, type), filters = c(type, "upstream_flank"),  :
>    Query ERROR: caught BioMart::Exception::Usage: Filter upstream_flank NOT FOUND
>
> the output lists the files created during the loop for the different input
> files.
> I ran the complete loop again and it showed no error messages. The third
> time I ran it, it gave the same error message.
>
> The problem happens mainly when I am running these lines as a script from
> the shell.
> home at vm1-pipeline:~/myProject$ ./run.sh config.txt
> the config file is at: ...
> my working directory is at: ...
>
> Run full analysis? (y)es, (n)o:
> n
>
> Select STEPS to run:
> ...
>
> 8
> 8
> Loading required package: Biobase
>
> Welcome to Bioconductor
>
>    Vignettes contain introductory material. To view, type
>    'browseVignettes()'. To cite Bioconductor, see
>    'citation("Biobase")' and for packages 'citation("pkgname")'.
>
> Error in getBM(c(seqType, type), filters = c(type, "upstream_flank"),  :
>    Query ERROR: caught BioMart::Exception::Usage: Filter upstream_flank NOT FOUND
> Calls: getSequence ->  getBM
> Execution halted
>
> I would appreciate any ideas to solve this problem.
>
> Thanks
> Assa
>
>
>> date()
> [1] "Mon Dec 19 14:12:10 2011"
>> R.version
>                 _
> platform       x86_64-pc-linux-gnu
> arch           x86_64
> os             linux-gnu
> system         x86_64, linux-gnu
> status
> major          2
> minor          14.0
> year           2011
> month          10
> day            31
> svn rev        57496
> language       R
> version.string R version 2.14.0 (2011-10-31)
>> sessionInfo()
> R version 2.14.0 (2011-10-31)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
>   [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
>   [7] LC_PAPER=C                 LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] biomaRt_2.10.0 stringr_0.5    affy_1.32.0    Biobase_2.14.0
>
> loaded via a namespace (and not attached):
> [1] affyio_1.22.0         BiocInstaller_1.2.1   plyr_1.6
> [4] preprocessCore_1.16.0 RCurl_1.7-0           tools_2.14.0
> [7] XML_3.6-0             zlibbioc_1.0.0

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826



