[BioC] Combining GEOquery with samr or siggenes

Sean Davis sdavis2 at mail.nih.gov
Thu Sep 1 13:54:54 CEST 2011


On Thu, Sep 1, 2011 at 7:40 AM, Voke AO <ovokeraye at gmail.com> wrote:
> Hi,
>
> This would work if I had the exact sample sizes and knew which columns
> are cases or controls. I guess my question is more like: if I knew
> 100 GDS files had data related to a disease of interest, it would be a
> tedious process to go through each one and create a cl or cls file to
> correspond to the data. In such a situation, would it be possible to
> have code that will
> 1. Go through the specified number of GDS files
> 2. Detect the different classes of samples and assign numbers
> accordingly for SAM/Siggenes for each GDS file, to be used later in
> the loop for the SAM/Siggenes analysis of each file?
>
> I'm very much a beginner in programming, so I'm hoping I don't sound too naive.

Unfortunately, there will be some programming involved here.  As for
the class info, the phenoData slot of an ExpressionSet resulting from
a call to GDS2eSet will be fully populated.  Often, there will be a
standard column "disease.state" that you could use to create the cls
object.

 > gds = getGEO("GDS10")
 > eset = GDS2eSet(gds)
 > pData(eset)$disease.state
 [1] diabetic           diabetic           diabetic-resistant diabetic-resistant
 [5] diabetic-resistant diabetic-resistant diabetic-resistant diabetic-resistant
 [9] diabetic-resistant diabetic-resistant nondiabetic        nondiabetic
[13] nondiabetic        nondiabetic        diabetic           diabetic
[17] diabetic-resistant diabetic-resistant diabetic-resistant diabetic-resistant
[21] diabetic-resistant diabetic-resistant diabetic-resistant diabetic-resistant
[25] nondiabetic        nondiabetic        nondiabetic        nondiabetic
Levels: diabetic diabetic-resistant nondiabetic
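
To turn a column like that into numeric class labels, something along
these lines might work. This is only a rough sketch: class_labels() is a
made-up helper name, the small data frame is toy data standing in for
pData(eset), and I am assuming that integer codes 1..k are what you
want. Check the samr and siggenes documentation for the exact encoding
each one expects (two-class designs in particular may need a different
coding).

```r
# Hypothetical helper (not part of GEOquery): convert one phenotype
# column into integer class labels. Returns NULL when the column is
# missing, since not every GDS has a "disease.state" column.
class_labels <- function(pheno, column = "disease.state") {
  if (!column %in% colnames(pheno)) {
    return(NULL)
  }
  groups <- droplevels(factor(pheno[[column]]))
  list(
    labels = as.integer(groups),  # 1..k, in the order of levels(groups)
    key    = levels(groups)       # which class each integer stands for
  )
}

# Toy stand-in for pData(eset); sample names and rows are invented.
pheno <- data.frame(
  sample = paste0("S", 1:6),
  disease.state = c("diabetic", "diabetic", "diabetic-resistant",
                    "nondiabetic", "nondiabetic", "diabetic-resistant")
)

cl <- class_labels(pheno)
cl$labels
cl$key

# Inside a loop over GDS ids, the real call would look like:
#   eset <- GDS2eSet(getGEO(gdsid, destdir = "."))
#   cl   <- class_labels(pData(eset))
#   if (is.null(cl)) next   # no usable class column; inspect by hand
```

Returning NULL for a missing column lets the loop skip (or flag) the GDS
records that need to be looked at by hand rather than stopping on the
first one that lacks the column.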

Hope that helps.

Sean

> Thanks again.
>
> ~V
>
> On Thu, Sep 1, 2011 at 12:50 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>> On Thu, Sep 1, 2011 at 5:57 AM, Voke AO <ovokeraye at gmail.com> wrote:
>>> Hi,
>>>
>>> Is it possible to write code that pulls out the data for several GDS
>>> records (like a batch process) using GEOquery, so that they can
>>> subsequently be analyzed with SAM or Siggenes in a kind of loop?
>>
>> Yes.  Here is a simple example.  You will need to supply the code to
>> do any actual analysis and return the actual result, but I hope you
>> get the idea.  I used sapply as the loop structure, but you could use
>> any loop structure that you like.
>>
>> Hope that helps.
>>
>> Sean
>>
>>> gdslist = c('GDS3717','GDS3718','GDS3719')
>>> analysisfunc = function(gdsid) {
>>  gdsdat = getGEO(gdsid,destdir=".")
>>  gdseset = GDS2eSet(gdsdat)
>>  message("DO SIGGENES STUFF HERE")
>>  return(sprintf("Results from %s would be here",gdsid))
>> }
>>> resultlist = sapply(gdslist,analysisfunc)
>> File stored at:
>> ./GDS3717.soft.gz
>> File stored at:
>> /var/folders/23/234W5ZnqHPih-U4YppHVCU+++TI/-Tmp-//RtmpgeksBL/GPL570.annot.gz
>> DO SIGGENES STUFF HERE
>> File stored at:
>> ./GDS3718.soft.gz
>> File stored at:
>> /var/folders/23/234W5ZnqHPih-U4YppHVCU+++TI/-Tmp-//RtmpgeksBL/GPL1261.annot.gz
>> DO SIGGENES STUFF HERE
>> File stored at:
>> ./GDS3719.soft.gz
>> File stored at:
>> /var/folders/23/234W5ZnqHPih-U4YppHVCU+++TI/-Tmp-//RtmpgeksBL/GPL1319.annot.gz
>> DO SIGGENES STUFF HERE
>> There were 50 or more warnings (use warnings() to see the first 50)
>>> resultlist
>>                             GDS3717                              GDS3718
>> "Results from GDS3717 would be here" "Results from GDS3718 would be here"
>>                             GDS3719
>> "Results from GDS3719 would be here"
>>
>


