[BioC] getGEO and wilcox.test

Ovokeraye Achinike-Oduaran ovokeraye at gmail.com
Tue Mar 20 13:37:28 CET 2012


Hi,

Sorry about the vagueness.

This is how I retrieved my data from GEO. I'm trying to test for
differential expression (DE) of the genes between the two conditions (IR
and IS), but I couldn't figure out how to apply this information to
wilcox.test().

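library(GEOquery)   # getGEO(), GDS2eSet()
library(Biobase)    # pData()
# Download the GDS record and convert it to an ExpressionSet (log2 scale)
gds157dat <- getGEO('GDS157', destdir=".")
gds157eset <- GDS2eSet(gds157dat, do.log2=TRUE)
# Short labels for the two conditions
groups <- as.character(pData(gds157eset)$metabolism)
groups[groups=="insulin sensitive"] <- "IS"
groups[groups=="insulin resistant"] <- "IR"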
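
For reference, here is a rough sketch of one way a group vector like the one
above could be fed to wilcox.test() gene by gene. It is only an illustration
under assumptions: `expr` is a simulated stand-in for the expression matrix
(made-up random values, not GDS157 data), and the names `expr`, `pvals` and
`padj` are hypothetical.

```r
# Simulated stand-in for an expression matrix: 100 genes x 10 samples.
# The values are random numbers, NOT real GDS157 measurements.
set.seed(1)
expr <- matrix(rnorm(100 * 10), nrow = 100,
               dimnames = list(paste("gene", 1:100, sep = ""),
                               paste("S", 1:10, sep = "")))

# Group labels in the same order as the columns of expr
groups <- rep(c("IR", "IS"), each = 5)

# One two-sample Wilcoxon rank-sum test per gene (row): IR vs IS
pvals <- apply(expr, 1, function(x) {
  wilcox.test(x[groups == "IR"], x[groups == "IS"])$p.value
})

# Benjamini-Hochberg adjustment for testing many genes at once
padj <- p.adjust(pvals, method = "BH")
head(sort(padj))
```

With the real data, `expr` would presumably be `exprs(gds157eset)`, with
`groups` built as above. Rows in which a group has no finite values (for
example after taking log2 of non-positive intensities) would make
wilcox.test() stop with an error, so they would need to be filtered first.
Note also that with 5 samples per group and no ties, the smallest possible
two-sided exact p-value is 2/252 (about 0.008), which limits what can
survive multiple-testing adjustment across thousands of genes.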

sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_.1252  LC_CTYPE=English_.1252
[3] LC_MONETARY=English_.1252 LC_NUMERIC=C
[5] LC_TIME=English_.1252

attached base packages:
[1] stats4    splines   stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] coin_1.0-21         modeltools_0.2-19   mvtnorm_0.9-9992
 [4] survival_2.36-12    XML_3.9-4.1         RCurl_1.91-1.1
 [7] bitops_1.0-4.1      puma_2.6.0          mclust_3.4.11
[10] limma_3.10.2        ArrayExpress_1.14.0 affy_1.32.1
[13] GEOquery_2.20.8     Biobase_2.14.0

loaded via a namespace (and not attached):
[1] affyio_1.22.0         BiocInstaller_1.2.1   preprocessCore_1.16.0
[4] zlibbioc_1.0.0
>

Regards,

Avoks

On Tue, Mar 20, 2012 at 2:15 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>
>
> On Tue, Mar 20, 2012 at 7:56 AM, Vincent Carey <stvjc at channing.harvard.edu>
> wrote:
>>
>> Please read the posting guide
>> http://www.bioconductor.org/help/mailing-list/posting-guide/ before
>> querying this list.
>>
>> You have not given any information on how you have used getGEO.  To help
>> you, I issued
>>
>> > library(GEOquery)
>> Setting options('download.file.method.GEOquery'='auto')
>> > gg = getGEO("GDS157")
>> File stored at:
>>
>> /var/folders/4D/4DI98FkjGzq0K2niUTEHSE+++TM/-Tmp-//RtmpGnz9Cf/GDS157.soft.gz
>> > gg
>> An object of class "GDS"
>
>
> At this point, if you would like to work with an ExpressionSet instead of a
> GDS object, try:
>
> expset = GDS2eSet(gg)
>
> Sean
>
>>
>> channel_count
>> [1] "1"
>> dataset_id
>> [1] "GDS157" "GDS157"
>> description
>> [1] "Analysis of gene expression in pooled vastus lateralis muscle samples
>> from insulin-sensitive and insulin-resistant equally obese, non-diabetic
>> Pima Indians. A search for susceptibility genes for type 2 diabetes.    "
>> ...
>>
>> > getClass("GDS")
>> Class "GDS" [package "GEOquery"]
>>
>> Slots:
>>
>> Name:           gpl    dataTable       header
>> Class:          GPL GEODataTable         list
>>
>> Extends: "GEOData"
>> > getClass("GEODataTable")
>> Class "GEODataTable" [package "GEOquery"]
>>
>> Slots:
>>
>> Name:     columns      table
>> Class: data.frame data.frame
>>
>> Here I am using R's self-describing capacities to learn about what the
>> query returned.
>>
>> > gg@dataTable@columns
>>    sample        metabolism
>> 1  GSM2289 insulin resistant
>> 2  GSM2294 insulin resistant
>> 3  GSM2299 insulin resistant
>> 4  GSM2304 insulin resistant
>> 5  GSM2309 insulin resistant
>> 6  GSM2313 insulin sensitive
>> 7  GSM2318 insulin sensitive
>> 8  GSM2323 insulin sensitive
>> 9  GSM2328 insulin sensitive
>> 10 GSM2333 insulin sensitive
>>
>> description
>> 1  Value for GSM2289: insulin resistant sample pool 1 muscle on HuFL; src: muscle
>> 2  Value for GSM2294: insulin resistant sample pool 2 muscle on HuFL; src: muscle
>> 3  Value for GSM2299: insulin resistant sample pool 3 muscle on HuFL; src: muscle
>> 4  Value for GSM2304: insulin resistant sample pool 4 muscle on HuFL; src: muscle
>> 5  Value for GSM2309: insulin resistant sample pool 5 muscle on HuFL; src: muscle
>> 6  Value for GSM2313: insulin sensitive sample pool 1 muscle on HuFL; src: muscle
>> 7  Value for GSM2318: insulin sensitive sample pool 2 muscle on HuFL; src: muscle
>> 8  Value for GSM2323: insulin sensitive sample pool 3 muscle on HuFL; src: muscle
>> 9  Value for GSM2328: insulin sensitive sample pool 4 muscle on HuFL; src: muscle
>> 10 Value for GSM2333: insulin sensitive sample pool 5 muscle on HuFL; src: muscle
>>
>> Now I start to see that the collection of samples may be viewed as falling
>> into two classes.  If you want to use wilcox.test to address a two-sample
>> problem arising from this experiment, you will have to use the information
>> shown above to assign the numerical gene expression values to the two
>> classes.  There is more than enough information in the above to begin this
>> process; for biological interpretation you need to know a little more: you
>> will need to know that GPL80 is documented in the package hu6800.db.
>>
>> On Tue, Mar 20, 2012 at 7:24 AM, Ovokeraye Achinike-Oduaran <
>> ovokeraye at gmail.com> wrote:
>>
>> > Hi all,
>> >
>> > I am not quite sure how to use the expression set I get from getGEO(),
>> > say gds157, in wilcox.test().
>> >
>> > Please help.
>> >
>> > Thanks.
>> >
>> > Avoks
>> >
>> > _______________________________________________
>> > Bioconductor mailing list
>> > Bioconductor at r-project.org
>> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>> > Search the archives:
>> > http://news.gmane.org/gmane.science.biology.informatics.conductor
>> >
>>
>
>
