[BioC] GEOmetadb query

Jack Zhu zhujack at mail.nih.gov
Fri Jan 22 17:14:40 CET 2010


Hi Boris and Sean,

I agree that we probably cannot directly find human cancer samples
with both gene expression and copy number data. In GEO, a gene
expression GSM and a copy number GSM are two different entries, even
if the molecules for both were extracted from the same human
cancer sample. Whether a user can find any link between these two
GSMs depends on how the submitter submitted the data to GEO. At the
GSE level, however, GEO has introduced the Super GSE concept, which
tries to put all related GSEs (possibly on different platforms) under
one Super GSE for a manuscript (you might want to double-check with
GEO about this). I am not sure how many such Super GSEs there are.

If you want to find out which labs/centers submitted both gene expression
and copy number data, I would try this (it needs some reading and manual
comparison at the end):

> library(GEOmetadb)
> getSQLiteFile()
> con <- dbConnect(SQLite(), "GEOmetadb.sqlite")

####  Find human cancer expression GSMs:

> gsm_human_cancer_exp <- dbGetQuery(con, "SELECT DISTINCT gsm FROM gsm WHERE characteristics_ch1 LIKE '%cancer%' AND molecule_ch1 = 'total RNA' AND organism_ch1 = 'Homo sapiens'")
## Convert to GSE
> gse_conversion1 <-  geoConvert(gsm_human_cancer_exp[[1]], 'gse')
> gse_human_cancer_exp <- unique(gse_conversion1$gse$to_acc)


####  Find human cancer aCGH GSMs (might not be accurate):
> gsm_human_cancer_cgh <- dbGetQuery(con, "SELECT DISTINCT gsm FROM gsm WHERE characteristics_ch1 LIKE '%cancer%' AND molecule_ch1 = 'genomic DNA' AND organism_ch1 = 'Homo sapiens'")
## Convert to GSE
> gse_conversion2 <-  geoConvert(gsm_human_cancer_cgh[[1]], 'gse')
> gse_human_cancer_cgh <- unique(gse_conversion2$gse$to_acc)

## Manually check whether any GSEs in gse_human_cancer_exp and
## gse_human_cancer_cgh come from the same submitter or lab
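
Part of the manual comparison could perhaps be narrowed down in R first. The
following is only a sketch under stated assumptions, not a tested recipe:
pair_series() is a hypothetical helper (it is not part of GEOmetadb), and the
usage comments assume that the gse table has a "contact" column holding the
submitter details as one string. dbListFields(con, "gse") should confirm
whether that column exists in your copy of the database. Exact string
matching will miss labs that wrote their contact details differently between
submissions, so the result is a starting point for reading, not a final list.

```r
# Hypothetical helper, not part of GEOmetadb: given per-series metadata for
# the expression set and the CGH set (data frames with a "gse" column and a
# submitter-like column named by `by`), list every expression/CGH pair of
# series that share exactly the same value in that column.
pair_series <- function(exp_meta, cgh_meta, by = "contact") {
  # Drop rows with a missing submitter value so NA never matches NA
  exp_meta <- exp_meta[!is.na(exp_meta[[by]]), c("gse", by), drop = FALSE]
  cgh_meta <- cgh_meta[!is.na(cgh_meta[[by]]), c("gse", by), drop = FALSE]
  # Inner join on the submitter column; the two "gse" columns are
  # renamed to gse_exp and gse_cgh
  merge(exp_meta, cgh_meta, by = by, suffixes = c("_exp", "_cgh"))
}

# Usage in the session above, while `con` is still open
# (the "contact" column name is an assumption):
# meta     <- dbGetQuery(con, "SELECT gse, title, contact FROM gse")
# exp_meta <- meta[meta$gse %in% gse_human_cancer_exp, ]
# cgh_meta <- meta[meta$gse %in% gse_human_cancer_cgh, ]
# pairs    <- pair_series(exp_meta, cgh_meta)
#
# Series that themselves contain both kinds of samples:
# gse_both <- intersect(gse_human_cancer_exp, gse_human_cancer_cgh)
```

Rows where gse_exp equals gse_cgh are series that hold both RNA and genomic
DNA samples; the remaining rows are separate series from the same contact.
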

> dbDisconnect(con)


Hope this helps.

Jack


On Thu, Jan 21, 2010 at 11:56 PM, Davis, Sean (NCI) <seandavi at gmail.com> wrote:
> ---------- Forwarded message ----------
> From: Boris Zybailov <boriszybailov at gmail.com>
> Date: Thu, Jan 21, 2010 at 11:51 PM
> Subject: Re: [BioC] GEOmetadb query
> To: Sean Davis <seandavi at gmail.com>
>
>
> Thank you for the quick response.
> Yes, this is exactly what I need
>
> On Thu, Jan 21, 2010 at 11:49 PM, Sean Davis <seandavi at gmail.com> wrote:
>> On Thu, Jan 21, 2010 at 11:42 PM,  <BorisZybailov at gmail.com> wrote:
>>> Dear list,
>>>
>>> I installed GEOmetadb in order to find all the human cancer-related gene
>>> expression GEO series,
>>> for which there are also aCGH data available. But I can not figure out how
>>> to do this and
>>> I would really appreciate any advice.
>>
>> Hi, Boris.  Just to clarify, you want to find human cancer samples
>> with both gene expression and copy number data?
>
> Hi, Jack.  I don't think what he wants to do is possible with GEO.
> However, I suppose one could pull out all GSEs with both CGH and
> expression data and hope those have what he wants (paired human cancer
> samples).  Do you mind following up with one of your masterful
> queries?
>
> Thanks,
> Sean
>
