[BioC] queryGEO fails on GDS files (GEO Datasets)
sdavis2 at mail.nih.gov
Wed Jan 4 16:50:50 CET 2006
I have recently uploaded a new package to Bioconductor called GEOquery. It
is available as a development package
(http://www.bioconductor.org/packages/bioc/1.8/html/GEOquery.html), but it
has few dependencies, so it should work with recent R and Bioconductor
releases. It can download and parse GDS, GSM, GPL, and GSE records.
(GSE download and parsing seems to be broken on Windows, at least for some
GSEs--working on that). After installing, you could do:
# the following takes about a minute or so....
> gds813 <- getGEO('GDS813')
And then to convert to an exprSet, simply do:
> eset <- GDS2eSet(gds813,do.log2=TRUE)
Expression Set (exprSet) with
phenoData object with 4 variables and 38 cases
On 1/4/06 10:27 AM, "Peter" <bioconductor-mailinglist at maubp.freeserve.co.uk> wrote:
> This follows on from a question from Saurin D. Jani, on the list a year ago:
> A working example:
> geo <- GEO()
> This downloads and parses:-
> This fails for GEO Datasets (GDS files) like GDS813 (Saurin's example)
> because the URL isn't accepted - the NCBI returns an HTML page which
> redirects you to:
> This page in turn can be used (by a human, a little more tricky in code)
> to download the actual GDS file - but only in compressed form:
> What this means is that at the moment, queryGEO doesn't support GDS
> files. Even if it did, they are generally large and only available in
> compressed format, making things generally more complicated.
> Would it make more sense to provide two separate functions:
> Firstly, to download the file (dealing with all possible URLs) and if
> need be decompress it.
> Secondly, to parse a GEO file from the provided handle/filename/url
> This makes sense for other large GEO files like the GPL annotation
> files, as well as the GEO datasets (GDS files). It seems wasteful and
> slow to download them fresh each time.
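For what it's worth, the two-function split suggested above could look
roughly like the sketch below. This is only an illustration in base R, not
code from GEOquery or queryGEO: the names fetch_geo_file and parse_geo_file
are made up, the caller has to supply the URL (none is assumed here), and
the "parser" only separates lines starting with ^, ! or # from the
tab-delimited table, on the assumption that the file follows the usual SOFT
layout.

```r
# Hypothetical helper (not part of any package): download a GEO file once
# and reuse the local copy on later calls.
fetch_geo_file <- function(url, destfile, force = FALSE) {
  if (force || !file.exists(destfile)) {
    # mode = "wb" keeps compressed files intact on Windows
    download.file(url, destfile, mode = "wb", quiet = TRUE)
  }
  destfile
}

# Hypothetical helper: parse a SOFT-style file that is already on disk.
# gzfile() reads gzip-compressed and uncompressed files alike.
parse_geo_file <- function(path) {
  con <- gzfile(path, open = "rt")
  on.exit(close(con))
  lines <- readLines(con)

  # Assumption: metadata lines start with ^, ! or #; the rest is the table.
  is_meta <- grepl("^[!#^]", lines)
  data_lines <- lines[!is_meta & nzchar(lines)]

  tab <- NULL
  if (length(data_lines) > 0) {
    tab <- read.delim(text = data_lines, header = TRUE,
                      stringsAsFactors = FALSE, check.names = FALSE)
  }
  list(meta = lines[is_meta], table = tab)
}
```

Because fetch_geo_file returns the local path and skips the download when
the file already exists, a large GDS or GPL file is fetched only once, and
parse_geo_file can be rerun on the cached copy as often as needed.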