[BioC] processing Illumina HT12v4.0 expression data from GEO

Abhishek Pratap abhishek.vit at gmail.com
Wed Jul 23 17:43:12 CEST 2014


Hi Sean

Thanks for the details. I was actually wondering whether I can get the
raw data so that I can do my own normalization. For example, for
Affy-based GEO studies I normally see the raw CEL files as well, which
can be used with fRMA to produce normalized data.
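
To be concrete about what I mean by my own normalization, here is a
rough sketch run on simulated intensities rather than real GEO data. It
uses limma's backgroundCorrect.matrix and normalizeBetweenArrays to
mimic the normexp plus quantile approach that the GEO record describes.
The helper name normalize_raw, the offset of 16 and the simulated matrix
are my own assumptions. With real Illumina control probes, limma's neqc
would probably be the more appropriate entry point.

```r
library(limma)

# Hypothetical helper (not part of limma): normexp background correction,
# log2 transform, then quantile normalization across arrays.
normalize_raw <- function(raw, offset = 16) {
  # Fits the normal + exponential convolution model to each column
  bg <- backgroundCorrect.matrix(raw, method = "normexp",
                                 offset = offset, verbose = FALSE)
  normalizeBetweenArrays(log2(bg), method = "quantile")
}

# Simulated probe-by-sample intensities (exponential signal plus normal
# background), standing in for a raw intensity matrix
set.seed(1)
raw <- matrix(rexp(2000, rate = 1 / 500) + rnorm(2000, mean = 100, sd = 10),
              nrow = 500, ncol = 4,
              dimnames = list(paste0("probe", 1:500), paste0("sample", 1:4)))

norm <- normalize_raw(raw)
dim(norm)
```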

In this specific study
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE58037 I don't see
raw data comparable to the CEL files in Affy studies. Is this a
particular case where the raw data is missing, or do beadarray-based
studies generally not have raw data in GEO?
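
In case it helps anyone checking the same thing, this is roughly how I
would look for raw files programmatically. It is only a sketch: I am
assuming that getGEOSuppFiles accepts fetch_files = FALSE and returns a
data frame with an fname column. The regular expression and the helper
names find_raw_candidates and list_raw_candidates are my own guesses at
how non-normalized files tend to be named.

```r
# Hypothetical helper: pick supplementary file names that look like raw data.
# The pattern is a guess at common naming, not a GEO rule.
find_raw_candidates <- function(fnames,
                                pattern = "non[-_ ]?normali[sz]ed|raw|idat") {
  fnames[grepl(pattern, fnames, ignore.case = TRUE)]
}

# Lists (without downloading) the supplementary files of a GEO series.
# Needs network access and the GEOquery package; not called here.
list_raw_candidates <- function(acc) {
  supp <- GEOquery::getGEOSuppFiles(acc, fetch_files = FALSE)
  find_raw_candidates(as.character(supp$fname))
}

# Offline illustration with made-up file names
example_files <- c("GSE00000_RAW.tar", "GSE00000_non-normalized.txt.gz",
                   "GSE00000_series_matrix.txt.gz")
find_raw_candidates(example_files)
```

For this study the call would be list_raw_candidates("GSE58037"); I have
not included its output here.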

Cheers!
-Abhi

On Wed, Jul 23, 2014 at 3:40 AM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>
>
>
> On Thu, Jul 17, 2014 at 8:17 PM, Abhishek Pratap <abhishek.vit at gmail.com>
> wrote:
>>
>> Hi Guys
>>
>> I would like to know the basic analysis workflow for downloading and
>> processing Illumina HT12v4.0 expression data from GEO. I have seen the
>> beadarray vignette but am not sure which normalization process to use.
>>
>> For example, with Affy datasets I normally download the raw data and
>> normalize it with the fRMA package to produce a final gene-level
>> expression matrix.
>>
>> Here is some code; the final goal is to produce a normalized
>> expression matrix at the gene level.
>>
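>> library(GEOquery)
>> # getGEO() returns a list of ExpressionSets, one per platform
>> gse <- getGEO("GSE58037")
>> gse <- gse[[1]]
>> # Extract the expression matrix (probes x samples)
>> mat <- exprs(gse)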
>>
>
> Hi, Abhi.
>
> The "mat" variable above will give you expression measures as submitted by
> the authors.  NCBI GEO provides a description:
>
> "The data were normalised using normal-exponential convolution model-based
> background correction and quantile normalization. Merging of the data,
> background removal and normalization processes were performed using the
> limma R package. All of the batches were normalized at once after excluding
> probes with low quality."
>
> If you do not want to use those normalized values, then you will need to
> define for yourself what the best approach is.  I don't know of an accepted
> "best" approach for such arrays.
>
> Sean
>
>
>>
>> Appreciate any pointers
>>
>> Thanks!
>> -Abhi
>>


