[BioC] Retrieving MAQC data from GEO using GEOquery

Mark Dunning mark.dunning at gmail.com
Thu Jul 8 11:08:45 CEST 2010


Hi Sean,

Downloading the series file directly and running getGEO worked for me.
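
For the archives, here is a minimal sketch of that approach wrapped up as
two small functions. The function names (series_matrix_url, fetch_platform)
are made up for illustration and are not part of GEOquery. The URL layout is
copied from the address in Sean's reply below; it is an assumption that other
series follow the same pattern, and it may not match the server's layout at
a later date.

```r
# Build the series-matrix URL for one platform of a multi-platform GEO series.
# ASSUMPTION: the directory layout is the one shown in Sean's reply.
series_matrix_url <- function(gse, gpl,
                              base = "ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix") {
  fname <- sprintf("%s-%s_series_matrix.txt.gz", gse, gpl)
  sprintf("%s/%s/%s", base, gse, fname)
}

# Download that single file and parse it with getGEO(filename = ...).
# Hypothetical helper; requires network access and the GEOquery package.
fetch_platform <- function(gse, gpl, destdir = tempdir()) {
  url <- series_matrix_url(gse, gpl)
  destfile <- file.path(destdir, basename(url))
  download.file(url, destfile = destfile, mode = "wb")
  GEOquery::getGEO(filename = destfile)
}

# Usage (not run here):
# gse <- fetch_platform("GSE5350", "GPL2507")
```

Since the download error is intermittent on the NCBI side, simply calling
fetch_platform() again a few minutes later is the easiest retry.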

Many thanks,

Mark

On Wed, Jul 7, 2010 at 5:21 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>
>
> On Wed, Jul 7, 2010 at 12:05 PM, Mark Dunning <mark.dunning at gmail.com>
> wrote:
>>
>> Hi,
>>
>> I am trying to retrieve the MAQC arrays from GEO. However I am only
>> interested in the arrays that were run on Illumina and the dataset
>> contains 19 different platforms. Is there a way of specifying which
>> platform I want to retrieve? The getGEO command seems to fail on the
>> first platform in the series and never gets to the one I'm interested
>> in (GPL2507).
>
> Hi, Mark.  I really should support this use case directly, but I don't have
> the syntactic sugar in place to do so right now.  It is on the TODO list,
> though.  However, what you want to do is pretty simple to do directly:
> download.file('ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE5350/GSE5350-GPL2507_series_matrix.txt.gz',destfile='GSE5350-GPL2507_series_matrix.txt.gz')
> gse = getGEO(filename="GSE5350-GPL2507_series_matrix.txt.gz")
>>
>> >library(GEOquery)
>> >temp = getGEO(GEO="GSE5350", GSEMatrix=TRUE, GSElimits=c(127,150))
>> Found 19 file(s)
>> GSE5350-GPL1355_series_matrix.txt.gz
>> trying URL
>> 'ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE5350/GSE5350-GPL1355_series_matrix.txt.gz'
>> Error in
>> download.file(sprintf("ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/%s/%s",
>>  :
>>  cannot open URL
>>
>> 'ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE5350/GSE5350-GPL1355_series_matrix.txt.gz'
>>
>
> This error is intermittent and is on the NCBI end.  I get this on a pretty
> regular basis.  Try back in a few minutes and it will probably work.
>
>>
>> I tried using the GSElimits parameter but it still persists in trying
>> to download all the data.
>>
>
> Unfortunately, GSElimits only applies to a full SOFT format download
> (GSEMatrix=FALSE).  There is no easy way to use GSElimits when
> GSEMatrix=TRUE, since there are multiple files involved.
> Sean
>
>>
>> Cheers,
>>
>> Mark
>>
>>
>> > sessionInfo()
>> R version 2.11.1 (2010-05-31)
>> x86_64-unknown-linux-gnu
>>
>> locale:
>>  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
>>  [5] LC_MONETARY=C              LC_MESSAGES=en_GB.UTF-8
>>  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] GEOquery_2.12.0 RCurl_1.4-2     bitops_1.0-4.1  Biobase_2.8.0
>>
>
>


