[BioC] GEOquery: getGEO() doesn't work (error "invalid 'nlines' argument")

ecsi at gmx.net ecsi at gmx.net
Tue May 29 16:17:33 CEST 2012


Hi Sean,

> The "system.file" part of your command above is not necessary (and is 
> probably the problem).  System.file is for locating files that came 
> with a specific software package.  So, you want something like:
>
> GSE19711 <- getGEO('mypath/GSE19711_family.soft.gz')

This works! Thanks a lot!

> Note that you will have to do a fair bit of work to get the data out 
> of a SOFT format file.  Instead, you should consider using a GSEMatrix 
> file. Alternatively, download the raw data and use a 
> platform-appropriate package to read in and analyze the data. 
>  Finally, note that you do not need to download files separately.

Well, my problem is that I am not quite sure about the "best" way to get 
the data I need. I'll try to give an example:

We have the GEO Series GSE19711. For all the samples of this series, I 
need some specific information. Let's use the first sample of GSE19711 
as an example: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM491937

I need to know the age of the patient ("ageatdiagnosis"), whether it is a 
pre- or a post-treatment sample, the sex of the patient (in this case all 
samples are from women), and maybe some other information (in the case of 
other series). And of course I need the data matrix itself, so that I can 
finally create something similar to an ExpressionSet, but using the 
methylumi package, because all this is about methylation and not gene 
expression.

I have to deal with several thousand samples from many different GEO 
series, so I want to automate fetching the phenodata for the patients. 
While searching for a way to do this, I found the GEOquery package. I 
thought it would be the best way to deal with the SOFT files, because 
these files are available for all the series I want to analyze and, as 
far as I could tell, they contain all the available information. (So far 
I have worked only with expression data, where I used RAW files, but 
there were always phenodata files available as well, so it was a lot 
easier.)
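
To make this concrete, here is a rough sketch of what I have in mind for 
the SOFT route. It assumes that each sample's characteristics are stored 
as "key: value" strings (for example in characteristics_ch1), which I 
have not checked for every series. The helper names 
parse_characteristics and characteristics_table are my own and not part 
of GEOquery, and the toy input at the end is invented, not real GEO data.

```r
# Split GEO-style "key: value" strings into a named character vector.
# Assumption: the first ": " separates key and value; entries without it are dropped.
parse_characteristics <- function(x) {
  pos <- regexpr(": ", x, fixed = TRUE)
  keep <- pos > 0
  x <- x[keep]
  pos <- pos[keep]
  keys <- substr(x, 1, pos - 1)
  values <- substr(x, pos + 2, nchar(x))
  setNames(values, keys)
}

# Build a data frame with one row per sample and one column per key.
# Keys that a sample does not report become NA.
characteristics_table <- function(char_list) {
  parsed <- lapply(char_list, parse_characteristics)
  keys <- unique(unlist(lapply(parsed, names)))
  rows <- lapply(parsed, function(p) unname(p[keys]))
  out <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  colnames(out) <- keys
  rownames(out) <- names(char_list)
  out
}

# Invented toy input, only to show the expected shape of the data.
toy <- list(
  sampleA = c("ageatdiagnosis: 60", "sex: F"),
  sampleB = c("sex: F", "free text without separator")
)
pheno <- characteristics_table(toy)
print(pheno)

# With GEOquery this would become something like (not run here):
# library(GEOquery)
# gse   <- getGEO(filename = "mypath/GSE19711_family.soft.gz")
# chars <- lapply(GSMList(gse), function(gsm) Meta(gsm)$characteristics_ch1)
# pheno <- characteristics_table(chars)
```

The resulting table would still have to be matched to the columns of the 
data matrix by sample name before it can be used as phenodata.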

If you can think of a better way to get the data I need and to match 
the samples to their phenodata easily, please tell me. I would be very 
happy.

Simone


