[BioC] GEOquery, GSEMatrix parameter and lifecycle of GEO series data

Gustavo Fernández Bayón gbayon at gmail.com
Wed Jun 27 10:51:53 CEST 2012


Hi everybody. 

I am running into some trouble while downloading and parsing a dataset of methylation values. I do not think these are technical problems: GEOquery works perfectly and makes getting this kind of data easy. However, I do not fully understand the lifecycle of GEO series data, so I would like to ask the list for any hints about the behavior described below, so that I can try to fix it.

I first downloaded and parsed the desired GSE data file with the default value of the GSEMatrix parameter (TRUE). I then extracted the ExpressionSet and the assayData I was looking for.

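library(GEOquery)  # getGEO()
library(Biobase)   # exprs(), featureNames()
my.gse <- getGEO('GSE30870', destdir='/Users/gbayon/Documents/GEO/')  # GSEMatrix=TRUE by default
my.expr.set <- my.gse[[1]]         # first ExpressionSet in the returned list
beta.values <- exprs(my.expr.set)  # matrix of methylation values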

What surprised me at first was seeing many strange values (all containing the string 'NA') in the featureNames of the expression set.

> head(featureNames(my.expr.set), n=20)
[1] "NA" "cg00000108" "cg00000109" "cg00000165" "NA.1" "NA.2" "NA.3" 
[8] "NA.4" "cg00000363" "NA.5" "NA.6" "NA.7" "NA.8" "cg00000734"
[15] "NA.9" "cg00000807" "cg00000884" "NA.10" "NA.11" "NA.12"
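
To get a feel for how widespread the problem is, the placeholder names can be counted with a small check. This is only a sketch in base R: the vector below is copied from the head() output above, and is.placeholder is a hypothetical helper name, not part of GEOquery or Biobase. On the real object, featureNames(my.expr.set) would take the place of the hard-coded vector.

```r
# Feature names copied from the head() output shown in the post.
feature.names <- c("NA", "cg00000108", "cg00000109", "cg00000165", "NA.1",
                   "NA.2", "NA.3", "NA.4", "cg00000363", "NA.5", "NA.6",
                   "NA.7", "NA.8", "cg00000734", "NA.9", "cg00000807",
                   "cg00000884", "NA.10", "NA.11", "NA.12")

# Hypothetical helper: TRUE for names that are exactly "NA" or "NA.<number>".
is.placeholder <- function(x) grepl("^NA(\\.[0-9]+)?$", x)

placeholder   <- is.placeholder(feature.names)
n.placeholder <- sum(placeholder)            # how many names look like placeholders
valid.names   <- feature.names[!placeholder] # names that look like real probe IDs

cat(n.placeholder, "of", length(feature.names), "feature names are placeholders\n")
```

The pattern is anchored on both ends so that a genuine probe ID that merely contains "NA" would not be flagged.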



If I select an individual GSM in the series and download it, the featureNames are fine. If I download the GSE with GSEMatrix=FALSE, I get a list of GSM data sets, and the results are again fine. This made me suspect the intermediate, pre-parsed matrix form. I have not found any information about the lifecycle of this kind of data, that is, how the matrix is built. Is it a manual process? Is it automatic?
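
Since the per-sample GSM records look fine, one possible workaround is to assemble the matrix from them directly. The following is a minimal sketch, not a tested recipe for GSE30870: build.matrix is a hypothetical helper, the toy tables and the sample names GSM_A and GSM_B are made up for illustration, and the code assumes each sample table has ID_REF and VALUE columns, which may not hold for every GSM record.

```r
# Hypothetical helper: combine per-sample tables into a probes x samples matrix.
# Each element of `tables` is assumed to be a data frame with ID_REF and VALUE
# columns (an assumption about the GSM tables, not something the post confirms).
build.matrix <- function(tables) {
  # Use the probe order of the first sample as the reference row order.
  probes <- as.character(tables[[1]]$ID_REF)
  values <- sapply(tables, function(tab) {
    # Align each sample to the reference order by probe ID.
    idx <- match(probes, as.character(tab$ID_REF))
    as.numeric(as.character(tab$VALUE[idx]))
  })
  # Force a matrix shape (sapply returns a plain vector when there is one probe).
  matrix(values, nrow = length(probes),
         dimnames = list(probes, names(tables)))
}

# Toy input with made-up values; the second sample lists probes in another order.
toy.tables <- list(
  GSM_A = data.frame(ID_REF = c("cg00000108", "cg00000109", "cg00000165"),
                     VALUE  = c(0.91, 0.85, 0.12)),
  GSM_B = data.frame(ID_REF = c("cg00000165", "cg00000108", "cg00000109"),
                     VALUE  = c(0.20, 0.88, 0.79))
)
toy.matrix <- build.matrix(toy.tables)

# With GEOquery, the real input could be obtained along these lines (not run here):
#   gse <- getGEO('GSE30870', GSEMatrix=FALSE)
#   tables <- lapply(GSMList(gse), Table)
```

Matching on ID_REF rather than relying on row position means that samples listing their probes in a different order still line up, and probes missing from a sample show up as NA values rather than as NA row names.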

If it is a manual process, then I guess I will have to contact whoever was responsible for uploading the data to see if they can fix it. If it is not, I would like to know whether this is something related to BioC or, more plausibly, to GEO.

Any help would be appreciated.

Regards,
Gustavo

