[BioC] GEOquery

Cei Abreu-Goodger cei at sanger.ac.uk
Fri Jul 11 17:57:40 CEST 2008

Ok, I just realized that the options can be passed quite easily:

But now we return to the original issue: how do I use this parameter to 
get getGEO working, since it doesn't pass on extra parameters?
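
One possible route, as a rough sketch rather than a confirmed fix: RCurl can, as far as I understand, take default curl options from the global R option "RCurlOptions" when a new handle is created. I have not confirmed that RCurl 0.9-3 does this, nor that getGEO creates a fresh handle for each request. The helper fetch_listing() below is hypothetical; it only stands in for a caller that does not forward extra arguments to getURL().

```r
# Sketch only: assumes RCurl reads its defaults from the "RCurlOptions"
# global option when it creates a new curl handle (unverified for 0.9-3).
old <- options(RCurlOptions = list(ftp.use.epsv = FALSE))

# fetch_listing() is a hypothetical helper, not part of RCurl or GEOquery.
fetch_listing <- function(url) {
  if (!requireNamespace("RCurl", quietly = TRUE)) {
    stop("RCurl is not installed")
  }
  # No per-call options here, to mimic a caller such as getGEO that
  # does not pass extra parameters through to getURL().
  RCurl::getURL(url)
}

# Example call (needs network access):
# fetch_listing("ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE4201/")

# To restore the previous settings afterwards:
# options(old)
```

If the assumption holds, any later getURL() call would inherit ftp.use.epsv = FALSE without the caller having to pass it explicitly.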

Let me re-state:


getGEO times out when no ftp_proxy is set (which could be solved if I were 
able to disable the ftp.use.epsv option of RCurl):

Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
  couldn't connect to host

or if I use our proxy server, it gets trapped in HTML garbage:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  line 1 did not have 8 elements

This apparently cannot be worked around on my side, so I've already asked 
our IT department whether they could change the proxy server settings.
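
To tell the two failure modes apart before the listing is parsed, a check along these lines might help. It is only a sketch: looks_like_html() is a hypothetical helper, and the "proxied" string is an invented stand-in, not the actual output of our proxy.

```r
# looks_like_html() is a hypothetical helper for illustration only.
# It returns TRUE when the text contains an obvious HTML marker.
looks_like_html <- function(listing) {
  grepl("<(html|!doctype|a[[:space:]]+href)", listing, ignore.case = TRUE)
}

# Plain FTP listing, as quoted from Sean's reply.
plain <- "-r--r--r--   1 ftp      anonymous   930471 Apr 13 05:32 GSE4201_series_matrix.txt.gz\n"

# Invented example of a listing that a proxy has converted to HTML.
proxied <- "<html><body><a href=\"GSE4201_series_matrix.txt.gz\">GSE4201_series_matrix.txt.gz</a></body></html>"

looks_like_html(plain)    # FALSE
looks_like_html(proxied)  # TRUE
```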

Any suggestions?


 > sessionInfo()
R version 2.7.0 (2008-04-22)


attached base packages:
[1] stats     graphics  grDevices datasets  tools     utils     methods 
[8] base    

other attached packages:
[1] biomaRt_1.14.0 GEOquery_2.4.0 RCurl_0.9-3    Biobase_2.0.1

loaded via a namespace (and not attached):
[1] XML_1.95-2

Cei Abreu-Goodger wrote:
> Hi Sean,
> I'm trying to help Harpreet to get the GEOquery library working 
> properly over here. Thanks to what you pointed out, we are able to 
> track the problem down to curl using our http proxy, which for ftp 
> transfers is not required. We still have one problem, that I can't 
> figure how to turn off the "ftp.use.epsv" option in RCurl. So, on a 
> linux terminal, I can use:
> curl --disable-epsv "ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE4201/"
> -r--r--r--   1 ftp      anonymous   930471 Apr 13 05:32 GSE4201_series_matrix.txt.gz
> (without the --disable-epsv it times out unless I set the ftp_proxy, 
> but then I get the HTML index instead of the file listing)
> inside R, I imagine I have to turn the "ftp.use.epsv" option off, and 
> I've tried doing something like this:
> myCurl <- getCurlOptionsConstants()
> myCurl[["ftp.use.epsv"]] <- 0
> getURL("ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE4201/", .opts=list(myCurl))
> but it keeps timing out...
> I also tried:
> curlSetOpt("ftp.use.epsv"=0)
> but that doesn't seem to have any effect on what 
> getCurlOptionsConstants() returns, it just creates a CURLOptions 
> object, which I can't figure out how to use.
> Do you have any suggestions, or should I search for help directly with 
> the RCurl developers?
> Many thanks,
> Cei
>> So, this appears to be the problem.  It looks like your proxy is
>> intercepting the ftp directory listing and converting it to HTML.  I
>> do not know how to solve this problem, as it appears to be a proxy
>> configuration issue at your institution.  However, I can't say for
>> sure.  The output of the getURL() command should look like:
>>> getURL("ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE4201/")
>> [1] "-r--r--r--   1 ftp      anonymous   930471 Apr 13 05:32 GSE4201_series_matrix.txt.gz\n"
>> Notice how yours is much longer and is HTML, not plain text.
>> Sean

Cei Abreu-Goodger, PhD

Wellcome Trust Sanger Institute
Computational and Functional Genomics
Wellcome Trust Genome Campus
Hinxton, Cambridge, CB10 1SA, UK

