[BioC] GEOquery and Sample Subsets

Sean Davis sdavis2 at mail.nih.gov
Tue Jun 4 20:10:52 CEST 2013


On Tue, Jun 4, 2013 at 2:02 PM, Thomas H. Hampton
<Thomas.H.Hampton at dartmouth.edu> wrote:
> This looks totally cool.
>
> Is there a place where one can view the schema of the relational db?

Hi, Tom.

See the vignette for a diagram and for examples.  We obviously also
assume some familiarity with SQL.
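
As a rough sketch of how one might look at the schema from R and pull
the subset rows for a single GDS without downloading any expression
data: the code below assumes the GEOmetadb SQLite file has been (or can
be) downloaded with getSQLiteFile(), and that the database has a
gds_subset table with a gds column. Those table and column names are
assumptions here, so check them against the dbListTables() and
dbListFields() output before relying on them.

```r
library(GEOmetadb)
library(DBI)
library(RSQLite)

# The SQLite file is a large one-time download; reuse a local copy if present.
sqlfile <- "GEOmetadb.sqlite"
if (!file.exists(sqlfile)) {
  sqlfile <- getSQLiteFile()  # downloads and unpacks into the working directory
}

con <- dbConnect(SQLite(), sqlfile)

# Inspect the schema: which tables exist, and which fields one of them has.
dbListTables(con)
dbListFields(con, "gds_subset")  # "gds_subset" is an assumed table name

# Pull only the sample-subset metadata for one GDS (no expression values).
# The "gds" column name is an assumption; confirm it with dbListFields() above.
subsets <- dbGetQuery(con, "SELECT * FROM gds_subset WHERE gds = 'GDS505'")
str(subsets)

dbDisconnect(con)
```

Because the whole database is local, the same query can be repeated for
thousands of accessions without further network traffic.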

Sean


> In any case -- Thanks tons!
>
> Tom
>
>
>
> ________________________________________
> From: seandavi at gmail.com [seandavi at gmail.com] on behalf of Sean Davis [sdavis2 at mail.nih.gov]
> Sent: Tuesday, June 04, 2013 1:19 PM
> To: Thomas H. Hampton
> Cc: bioconductor at r-project.org; Jack zhu
> Subject: Re: [BioC] GEOquery and Sample Subsets
>
> On Tue, Jun 4, 2013 at 1:14 PM, Thomas H. Hampton
> <Thomas.H.Hampton at dartmouth.edu> wrote:
>> Exactly!
>
> This might help:
>
> http://www.bioconductor.org/packages/release/bioc/html/GEOmetadb.html
>
> Let us know if you have questions.
>
> Sean
>
>
>> Thanks.
>>
>> ________________________________________
>> From: seandavi at gmail.com [seandavi at gmail.com] on behalf of Sean Davis [sdavis2 at mail.nih.gov]
>> Sent: Tuesday, June 04, 2013 12:54 PM
>> To: Thomas H. Hampton
>> Cc: bioconductor at r-project.org
>> Subject: Re: [BioC] GEOquery and Sample Subsets
>>
>> On Tue, Jun 4, 2013 at 12:38 PM, Thomas H. Hampton
>> <Thomas.H.Hampton at dartmouth.edu> wrote:
>>> I am using GEOquery to establish sample subsets of GEO data -- that is, I would
>>> like to know which samples are replicates.
>>>
>>> I am doing it something like this:
>>>
>>> gds505 <- getGEO("GDS505")
>>> Columns(gds505)
>>>
>>>> str(Columns(gds505))
>>> 'data.frame': 17 obs. of  4 variables:
>>>  $ sample       : Factor w/ 17 levels "GSM11805","GSM11814",..: 2 4 5 7 9 10 12 14 16 1 ...
>>>  $ disease.state: Factor w/ 2 levels "normal","RCC": 2 2 2 2 2 2 2 2 2 1 ...
>>>  $ individual   : Factor w/ 10 levels "001","005","011",..: 6 4 1 2 3 5 8 9 10 6 ...
>>>  $ description  : chr  "Value for GSM11814: C035 Renal Clear Cell Carcinoma U133A; src: Trizol...
>>>
>>> The problem I have is that the getGEO command retrieves a rather large object:
>>>
>>>> print(object.size(gds505), units="Mb")
>>> 12.6 Mb
>>>
>>> This takes up a lot of time and bandwidth if you plan to do it for thousands of accessions.
>>>
>>> Is there a way to retrieve less?
>>
>> Hi, Tom.  Are you saying that you really want just the metadata to
>> start; in other words, you just want the sample information without
>> the expression values?
>>
>> Sean
>>
>>
>>> I am happy to use R, BioConductor, bioperl or whatever.
>>>
>>> Best,
>>>
>>> Tom
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor