[BioC] GEOquery

Cei Abreu-Goodger cei at sanger.ac.uk
Fri Jul 11 17:57:40 CEST 2008


Ok, I just realized that the options can be passed quite easily:
getURL("ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE4201/", 
"ftp.use.epsv"=0)

But now we return to the original issue: how do I use this parameter to 
get getGEO working, since it doesn't pass on extra parameters?
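
One thing that might be worth trying is setting the option globally instead of per call. This is only a sketch, and it rests on two assumptions that are not confirmed anywhere in this thread: that RCurl picks up its defaults from the global "RCurlOptions" option whenever a request is made without an explicit handle, and that getGEO reaches the FTP server through RCurl without overriding those defaults.

```r
# Assumption: RCurl reads default curl options from the global
# "RCurlOptions" option when no explicit curl handle is supplied.
options(RCurlOptions = list(ftp.use.epsv = FALSE))

# Confirm the default has been registered for this session.
getOption("RCurlOptions")

# Not run here (needs network access); whether getGEO honours the
# default depends on how it calls RCurl internally.
# library(GEOquery)
# g <- getGEO("GSE4201", GSEMatrix = TRUE)
```

If that works, the same options() line could go in a .Rprofile so that it applies to every session.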

Let me re-state:

library(GEOquery)
g <- getGEO("GSE4201", GSEMatrix = TRUE)

This times out when no ftp_proxy is set (which could be solved if I were 
able to disable the ftp.use.epsv option of RCurl):

Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
  couldn't connect to host


Or, if I use our proxy server, it fails on the HTML that the proxy returns:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  line 1 did not have 8 elements


This apparently cannot be worked around on my side; I've already asked our 
IT department whether they could change the proxy server settings.
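
To tell the two failure modes apart quickly, the text returned by getURL() can be checked for HTML before anything tries to parse it. The snippet below is a minimal sketch: looks_like_html is a hypothetical helper (not part of RCurl or GEOquery), and the sample listing is the plain-text output Sean showed.

```r
# Hypothetical helper: TRUE when a response looks like an HTML page
# rather than a plain-text FTP directory listing.
looks_like_html <- function(x) {
  length(grep("<html|<!DOCTYPE", x, ignore.case = TRUE)) > 0
}

# The plain-text listing expected from the GEO FTP server.
listing <- paste("-r--r--r--   1 ftp      anonymous   930471 Apr 13 05:32",
                 "GSE4201_series_matrix.txt.gz\n")

looks_like_html(listing)                                   # FALSE
looks_like_html("<html><body>Index of ...</body></html>")  # TRUE
```

Running the same check on the real getURL() result, with and without the proxy, should show which setting hands back the HTML index.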

Any suggestions?

Cei

 > sessionInfo()
R version 2.7.0 (2008-04-22)
x86_64-unknown-linux-gnu

locale:
C

attached base packages:
[1] stats     graphics  grDevices datasets  tools     utils     methods 
[8] base    

other attached packages:
[1] biomaRt_1.14.0 GEOquery_2.4.0 RCurl_0.9-3    Biobase_2.0.1

loaded via a namespace (and not attached):
[1] XML_1.95-2





Cei Abreu-Goodger wrote:
> Hi Sean,
>
> I'm trying to help Harpreet to get the GEOquery library working 
> properly over here. Thanks to what you pointed out, we are able to 
> track the problem down to curl using our http proxy, which for ftp 
> transfers is not required. We still have one problem, that I can't 
> figure how to turn off the "ftp.use.epsv" option in RCurl. So, on a 
> linux terminal, I can use:
>
> curl --disable-epsv 
> "ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE4201/"
> -r--r--r--   1 ftp      anonymous   930471 Apr 13 05:32 
> GSE4201_series_matrix.txt.gz
>
> (without the --disable-epsv it times out unless I set the ftp_proxy, 
> but then I get the HTML index instead of the file listing)
>
> inside R, I imagine I have to turn the "ftp.use.epsv" option off, and 
> I've tried doing something like this:
>
> myCurl <- getCurlOptionsConstants()
> myCurl[["ftp.use.epsv"]] <- 0
> getURL("ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE4201/", 
> .opts=list(myCurl))
>
> but it keeps timing out...
>
> I also tried:
>
> curlSetOpt("ftp.use.epsv"=0)
>
> but that doesn't seem to have any effect on what 
> getCurlOptionsConstants() returns, it just creates a CURLOptions 
> object, which I can't figure out how to use.
>
> Do you have any suggestions, or should I search for help directly with 
> the RCurl developers?
>
> Many thanks,
>
> Cei
>> So, this appears to be the problem.  It looks like your proxy is
>> intercepting the ftp directory listing and converting it to HTML.  I
>> do not know how to solve this problem, as it appears to be a proxy
>> configuration issue at your institution.  However, I can't say for
>> sure.  The output of the getURL() command should look like:
>>
>>  
>>> getURL("ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE4201/")
>>>     
>> [1] "-r--r--r--   1 ftp      anonymous   930471 Apr 13 05:32
>> GSE4201_series_matrix.txt.gz\n"
>>
>> Notice how yours is much longer and is HTML, not plain text.
>>
>> Sean
>>
>>
>>   
>
>


-- 
Cei Abreu-Goodger, PhD

Wellcome Trust Sanger Institute
Computational and Functional Genomics
Wellcome Trust Genome Campus
Hinxton, Cambridge, CB10 1SA, UK





