[BioC] Affymetrix data double normalisation

shirley zhang shirley0818 at gmail.com
Tue Sep 25 17:19:46 CEST 2012


Hi Jim,

I have a similar question: how should I analyze two large Affymetrix
gene expression datasets together?

I have >2,000 Affymetrix arrays from a relatively old group. These data
were normalized a year ago, which took a lot of effort
(miss/mixed sample correction, quality checks, etc.)

Recently, we got another 3,000 arrays on the same Affymetrix
platform, but from a relatively younger group. These data were
normalized separately from the earlier data.

My question is: if I would like to analyze these data together
(>5,000 samples), what would you suggest? I can think of two possible
approaches:

1. Re-normalize all >5,000 samples together.
2. Double-normalize the two datasets: for example, apply a standard
transformation (z-score) or global median normalization to each
dataset separately, then combine them for the downstream statistical
analysis.
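
A minimal base-R sketch of option 2, using small simulated matrices in
place of the real data. The names old_expr, new_expr, zscore_rows and
median_normalize are made up for illustration, and it assumes both
datasets are log2-scale probeset-by-sample matrices with the same
probesets in the same row order:

```r
# Toy stand-ins for the two separately normalized datasets
# (rows = probesets, columns = samples); all names are hypothetical.
set.seed(1)
probes <- paste0("probe", 1:5)
old_expr <- matrix(rnorm(5 * 4, mean = 8, sd = 1), nrow = 5,
                   dimnames = list(probes, paste0("old", 1:4)))
new_expr <- matrix(rnorm(5 * 6, mean = 9, sd = 2), nrow = 5,
                   dimnames = list(probes, paste0("new", 1:6)))

# Per-dataset z-score: centre and scale each probeset (row)
# using only the samples of its own dataset.
zscore_rows <- function(x) {
  (x - rowMeans(x)) / apply(x, 1, sd)
}

# Per-dataset global median normalization (assumes log-scale data):
# shift every sample (column) so that its median equals the overall
# median of that dataset.
median_normalize <- function(x) {
  sample_medians <- apply(x, 2, median)
  sweep(x, 2, sample_medians - median(x), "-")
}

# Combine the transformed datasets and keep track of their origin.
combined_z   <- cbind(zscore_rows(old_expr), zscore_rows(new_expr))
combined_med <- cbind(median_normalize(old_expr), median_normalize(new_expr))
batch <- factor(rep(c("old", "new"), c(ncol(old_expr), ncol(new_expr))))
```

One thing to keep in mind with the z-score variant: because the age
group coincides with the dataset here, centring each dataset separately
would also remove any real mean difference between the two groups.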

Thanks in advance for your help,
Shirley

On Tue, Sep 25, 2012 at 10:21 AM, James W. MacDonald <jmacdon at uw.edu> wrote:
> Hi Jun,
>
> On 9/25/2012 7:11 AM, Jun Han [guest] wrote:
>>
>> Hi,
>> I would like to use gcrma to do a within group normalization first (30
>> groups in total), then input all the normalised 30 groups to do another
>> global gcrma.
>> Is this possible? Does the gcrma accept the inputs from the first
>> normalisation output?
>
>
> The short answer is no. When you run gcrma(), you do background correction,
> normalization, and finally summarization of the probe-level data, resulting
> in probeset-level data. In other words, you are taking the PM probes and
> summarizing them into a single value at the probeset level (after background
> correcting and normalizing).
>
> Since gcrma() expects you to be inputting an AffyBatch containing PM and MM
> probe data, it fails when you input an ExpressionSet containing summarized
> probeset level data.
>
> I assume you are trying to combine two groups that you think should not be
> normalized and summarized together. This leads to two questions - first, why
> don't you think these data can be combined prior to the gcrma() step, and
> second, if the answer to the first question is because of a batch effect,
> have you looked at, e.g., sva or ComBat?
>
> Best,
>
> Jim
>
>
>> Many thanks.
>> Jun
>>
>>   -- output of sessionInfo():
>>
>>> gcrma12<-gcrma(gcrma1,gcrma2)
>>
>> Error in function (classes, fdef, mtable)  :
>>    unable to find an inherited method for function "indexProbes", for
>> signature "ExpressionSet", "character"
>>
>> --
>> Sent via the guest posting facility at bioconductor.org.
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099