[BioC] queryGEO fails on GDS files (GEO Datasets)

Sean Davis sdavis2 at mail.nih.gov
Wed Jan 4 17:33:17 CET 2006

On 1/4/06 10:50 AM, "Sean Davis" <sdavis2 at mail.nih.gov> wrote:

> Peter,
> I have recently uploaded a new package to Bioconductor called GEOquery.  It
> is available as a development package
> (http://www.bioconductor.org/packages/bioc/1.8/html/GEOquery.html), but it
> doesn't depend on much, so should work with recent R and Bioconductor
> releases.  It is capable of downloading and parsing GDS, GSM, GPL, and GSE.
> (GSE download and parsing seems to be broken on Windows, at least for some
> GSEs--working on that).  After installing, you could do:
>> library(GEOquery)
> # the following takes about a minute or so....
>> gds813 <- getGEO('GDS813')
> And then to convert to an exprSet, simply do:
>> eset <- GDS2eSet(GDS,do.log2=TRUE)

Made a typo in the line above:

 eset <- GDS2eSet(gds813,do.log2=TRUE)

This will make an exprSet that includes the sample information from the GDS
that was downloaded and parsed using getGEO above.

>> eset
> Expression Set (exprSet) with
>     22690 genes
>     20 samples
>          phenoData object with 4 variables and 38 cases
>      varLabels
>         : sample
>         : disease.state
>         : tissue
>         : description
> Sean
> On 1/4/06 10:27 AM, "Peter" <bioconductor-mailinglist at maubp.freeserve.co.uk>
> wrote:
>> Would it make more sense to provide two separate functions:
>> Firstly, to download the file (dealing with all possible URLs) and if
>> need be decompress it.

See the function "getGEOfile" in the GEOquery package.
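
A rough sketch of the download-only step, in case it helps. It assumes that
getGEOfile accepts a destdir argument and returns the path of the file it
saved; the directory name "geo_cache" is made up for illustration. Check
?getGEOfile for the version you have installed.

```r
library(GEOquery)

# Hypothetical cache directory, so the file outlives the R session
cache_dir <- "geo_cache"
dir.create(cache_dir, showWarnings = FALSE)

# Download only, no parsing; assumed to return the path of the saved file
gds_file <- getGEOfile("GDS813", destdir = cache_dir)
gds_file
```

Keeping the file in a directory of your own choosing, rather than a temporary
one, is what lets you avoid downloading it again later.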

>> Secondly, to parse a GEO file from the provided handle/filename/URL.
>> This makes sense for other large GEO files like the GPL annotation
>> files, as well as the GEO datasets (GDS files).  It seems wasteful and
>> slow to download them fresh each time.

The getGEO function also includes a filename argument.  The file given by
the filename will be parsed as a GEO file; .gz files are handled
appropriately as long as the file extension '.gz' is present.
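
As a sketch of parsing a local copy instead of downloading fresh each time:
the path below is hypothetical and stands in for wherever a previously
downloaded GDS file was saved. If no such file exists, the code falls back to
fetching by accession as before.

```r
library(GEOquery)

# Hypothetical path to a GDS file saved by an earlier download
gds_file <- file.path("geo_cache", "GDS813.soft.gz")

if (file.exists(gds_file)) {
  # Parse the local copy; the '.gz' extension signals that it is compressed
  gds813 <- getGEO(filename = gds_file)
} else {
  # Nothing saved locally yet: download and parse by accession
  gds813 <- getGEO("GDS813")
}
```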

