[BioC] GEOquery package

Sean Davis sdavis2 at mail.nih.gov
Tue Aug 30 17:58:55 CEST 2011


On Tue, Aug 30, 2011 at 11:36 AM, Jing Huang <huangji at ohsu.edu> wrote:
> Dear Sean and all members,
>
> I am trying to extract GSE data from GEO and analyze it. I am wondering whether the GSE data have already been normalized and log2 transformed. R scripts and output are copied below. Can somebody help me with this?
>
>>Table(GSMList(gse)[[1]])[1:5, ]
>     ID_REF       VALUE
> 1 1007_s_at 7.693888187
> 2   1053_at 8.571408272
> 3    117_at 5.179812431
> 4    121_at 7.468027592
> 5 1255_g_at 3.118550777
>> Columns(GSMList(gse)[[1]])[1:5, ]
>     Column                Description
> 1    ID_REF
> 2     VALUE log2 signal intensity, RMA       <<<<< Does this mean that the value is log2 transformed and the data was normalized by RMA?
> NA     <NA>                       <NA>
> NA.1   <NA>                       <NA>
> NA.2   <NA>                       <NA>
>
> According to the GEOquery package, I should do the following steps in order to get the eset:

Hi, Jing.

In general, you can simply use:

gse = getGEO('GSEXXXXX')

Then, gse will be a list of ExpressionSets.  In the vast majority of
settings, the manual steps below are no longer needed.  This is
pointed out in the vignette.
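
The getGEO() shortcut above can be sketched roughly as follows. This
is only an illustration: load_gse_matrix and platform_index are
hypothetical names (not part of GEOquery), 'GSEXXXXX' is still a
placeholder accession, and the sketch assumes the series has at least
one platform and that GEOquery and Biobase are installed.

```r
# Hypothetical helper (not part of GEOquery): fetch a GSE and return the
# expression matrix for one platform. Calling it needs network access and
# the GEOquery and Biobase packages.
load_gse_matrix <- function(accession, platform_index = 1) {
  # With GSEMatrix = TRUE (the default), getGEO() returns a list of
  # ExpressionSets, one per platform in the series.
  gse <- GEOquery::getGEO(accession, GSEMatrix = TRUE)
  eset <- gse[[platform_index]]
  # exprs() extracts the probes-by-samples matrix of values as deposited.
  Biobase::exprs(eset)
}

# Usage (placeholder accession, as above):
# expr <- load_gse_matrix("GSEXXXXX")
# expr[1:5, ]
```

The values come back exactly as the submitter deposited them, so
whether they are already normalized or logged still has to be checked
separately.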

As for the data and log2 transformation, the VALUE column description
("log2 signal intensity, RMA") suggests that these data are already
log2 transformed, so a further log2() step should not be needed.
However, there is no standard at GEO, so you will need to read the
details from the GEO website, read the paper, or contact the original
submitters to be sure.
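
As a rough sanity check (not a substitute for reading the submitter's
description), something like the following can flag values that are
probably already on a log scale. The function name looks_log_scale and
the cutoff of 100 are my own assumptions, chosen only because log2
intensities are usually small numbers while unlogged intensities tend
to run far higher.

```r
# Heuristic only: guess whether expression values are already log-scaled.
# looks_log_scale and max_cutoff = 100 are illustrative assumptions.
looks_log_scale <- function(x, max_cutoff = 100) {
  x <- as.numeric(x)
  x <- x[is.finite(x)]          # drop NA, NaN and infinite values
  if (length(x) == 0) return(NA)
  max(x) < max_cutoff           # small maximum suggests a log scale
}

# VALUE entries from the first GSM table quoted in this thread
vals <- c(7.693888187, 8.571408272, 5.179812431, 7.468027592, 3.118550777)

looks_log_scale(vals)     # small values: consistent with log2 data
looks_log_scale(2^vals)   # back-transformed values exceed the cutoff

# Logging the first value again gives about 2.9437
log2(vals[1])
```

That last number matches the first GSM424759 entry in the data.matrix
output quoted further down, which is consistent with the extra log2()
call having been applied to values that were already logged.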

Sean



>> probesets <- Table(GPLList(gse)[[1]])$ID
>> data.matrix <- do.call("cbind", lapply(GSMList(gse), function(x) {
> + tab <- Table(x)
> + mymatch <- match(probesets, tab$ID_REF)
> + return(tab$VALUE[mymatch])
> + }))
>> data.matrix <- apply(data.matrix, 2, function(x) {
> + as.numeric(as.character(x))
> + })
>> data.matrix <- log2(data.matrix)
>> data.matrix[1:5, ]
>
>     GSM424759 GSM424760 GSM424761 GSM424762 GSM424763 GSM424764 GSM424765
> [1,]  2.943713  2.917086  2.926155  2.983485  2.973219  2.962445  2.926030
> [2,]  3.099532  3.136898  3.152696  3.217172  3.206948  3.198448  3.135146
> [3,]  2.372900  2.309177  2.354380  2.373350  2.368464  2.381139  2.314555
> [4,]  2.900727  2.873853  2.863911  2.879232  2.927384  2.913594  2.852870
> [5,]  1.640876  1.645330  1.494274  1.792643  1.719597  1.648126  1.605055
>
> Is the log2 transformation  necessary for this dataset?
> Many thanks
>
> Jing
>
>


