[BioC] queryGEO fails on GDS files (GEO Datasets)

Sean Davis sdavis2 at mail.nih.gov
Wed Jan 4 16:50:50 CET 2006


Peter,

I have recently uploaded a new package to Bioconductor called GEOquery.  It
is available as a development package
(http://www.bioconductor.org/packages/bioc/1.8/html/GEOquery.html), but it
doesn't depend on much, so it should work with recent R and Bioconductor
releases.  It can download and parse GDS, GSM, GPL, and GSE records.
(GSE download and parsing seems to be broken on Windows, at least for some
GSEs; I am working on that.)  After installing, you could do:
 
> library(GEOquery)
# the following takes about a minute or so....
> gds813 <- getGEO('GDS813')

And then to convert to an exprSet, simply do:

> eset <- GDS2eSet(gds813, do.log2=TRUE)
> eset
Expression Set (exprSet) with
    22690 genes
    20 samples
         phenoData object with 4 variables and 38 cases
     varLabels
        : sample
        : disease.state
        : tissue
        : description

Sean


On 1/4/06 10:27 AM, "Peter" <bioconductor-mailinglist at maubp.freeserve.co.uk>
wrote:

> This follows on from a question from Saurin D. Jani, on the list a year ago:
> 
> https://stat.ethz.ch/pipermail/bioconductor/2005-January/007405.html
> 
> A working example:
> 
> library(AnnBuilder)
> geo <- GEO()
> queryGEO(geo,"GSM107")
> 
> This downloads and parses:-
> 
> http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM107&targ=self&form=text&view=data
> 
> This fails for GEO Datasets (GDS files) like GDS813 (Saurin's example)
> because the URL isn't accepted: NCBI returns an HTML page that
> redirects you to:
> 
> http://www.ncbi.nlm.nih.gov/projects/geo/gds/gds_browse.cgi?gds=813
> 
> This page in turn can be used (by a human, a little more tricky in code)
> to download the actual GDS file - but only in compressed form:
> 
> ftp://ftp.ncbi.nih.gov/pub/geo/data/gds/soft_gz/GDS813.soft.gz
> 
> What this means is that at the moment, queryGEO doesn't support GDS
> files.  Even if it did, they are generally large and only available in
> compressed format, which makes things more complicated.
> 
> Would it make more sense to provide two separate functions:
> 
> Firstly, to download the file (dealing with all possible URLs) and if
> need be decompress it.
> 
> Secondly, to parse a GEO file from the provided handle/filename/url
> 
> This makes sense for other large GEO files like the GPL annotation
> files, as well as the GEO datasets (GDS files).  It seems wasteful and
> slow to download them fresh each time.
> 
> Peter
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
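
Peter's two-function proposal (fetch and cache a file once, then parse it from a local path) can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not how AnnBuilder or GEOquery actually work: the names `geoTextURL`, `gdsSoftURL`, `fetchGEOFile` and `readSoftFile` are hypothetical, and the URL patterns are simply the ones quoted in this thread (January 2006), which may no longer be current. Decompression is handled by reading through `gzfile()`, which also accepts uncompressed files.

```r
# Hypothetical helpers (not part of AnnBuilder or GEOquery) sketching
# the two-step design: fetch a GEO file once into a local cache, then
# parse it from disk. URL patterns are those quoted in this thread
# (January 2006) and are assumptions; NCBI may have moved them since.

geoTextURL <- function(acc) {
  # Plain-text "data" view of a single record, e.g. "GSM107"
  paste0("http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=", acc,
         "&targ=self&form=text&view=data")
}

gdsSoftURL <- function(acc) {
  # gzip-compressed SOFT file for a GEO DataSet, e.g. "GDS813"
  paste0("ftp://ftp.ncbi.nih.gov/pub/geo/data/gds/soft_gz/", acc, ".soft.gz")
}

fetchGEOFile <- function(url, destdir = tempdir(), destname = basename(url)) {
  # Step 1: download only when no cached copy exists.
  # Pass 'destname' explicitly for CGI URLs, whose basename is not a
  # usable file name.
  dest <- file.path(destdir, destname)
  if (!file.exists(dest)) {
    download.file(url, dest, mode = "wb")
  }
  dest
}

readSoftFile <- function(path) {
  # Step 2: parse a local file. gzfile() reads both gzip-compressed and
  # plain files, so no separate decompression step is needed.
  con <- gzfile(path, open = "rt")
  on.exit(close(con))
  lines <- readLines(con)
  # SOFT convention: "^" starts an entity, "!" marks an attribute and
  # "#" describes a table column; other lines hold the table itself.
  meta <- grepl("^[!^#]", lines)
  list(header = lines[meta], table = lines[!meta])
}

# Example (needs network access on the first call only):
# gds <- readSoftFile(fetchGEOFile(gdsSoftURL("GDS813")))
```

Because the cache check is only `file.exists()`, repeated calls for the same accession reuse the local copy, which addresses the concern about downloading large GDS and GPL files afresh each time.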
