[BioC] Affymetrix data double normalisation

Tue Sep 25 17:21:29 CEST 2012

Hi Shirley,

A third option is to use fRMA, which is designed specifically for the 
situation you are in right now.
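
A minimal sketch of what that could look like, assuming the raw CEL files
for both groups are still available and that the frozen parameter vectors
package matching your array type is installed (the thread does not name the
platform, so I cannot say which one). The helper names chunk_files() and
frma_in_chunks(), the chunk size of 100, and the directory path are all
hypothetical. Because fRMA preprocesses arrays against precomputed reference
values rather than against each other, the files can be processed a chunk at
a time and the results bound together afterwards:

```r
# Split a vector of file paths into consecutive chunks of at most `size` files.
chunk_files <- function(files, size = 100) {
  if (length(files) == 0) return(list())
  unname(split(files, ceiling(seq_along(files) / size)))
}

# Preprocess CEL files chunk by chunk with fRMA and column-bind the results.
# Packages are called via `::`, so nothing is loaded until this function runs.
frma_in_chunks <- function(cel_files, size = 100) {
  pieces <- lapply(chunk_files(cel_files, size), function(f) {
    batch <- affy::ReadAffy(filenames = f)  # probe-level AffyBatch
    Biobase::exprs(frma::frma(batch))       # probeset-level expression matrix
  })
  do.call(cbind, pieces)
}

# Hypothetical usage (the path is a placeholder, not from this thread):
# cel_files <- list.files("path/to/cel_files", pattern = "\\.cel$",
#                         ignore.case = TRUE, full.names = TRUE)
# expr_all <- frma_in_chunks(cel_files)
```

Chunking is only there to keep memory use manageable with several thousand
arrays. It is still worth checking the combined matrix for a residual batch
effect between the two groups before the downstream analysis.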

Best,

Jim

On 9/25/2012 11:19 AM, shirley zhang wrote:
> Hi Jim,
>
> I have a similar question: how best to analyze two large Affymetrix
> gene expression datasets together.
>
> I have >2,000 Affymetrix arrays from a relatively old group. These data
> were normalized a year ago, which took a lot of effort
> (miss/mixed sample correction, quality checks, etc.)
>
> Then recently, we got another 3,000 arrays on the same Affymetrix
> platform, but from a relatively younger group. These data were
> normalized separately from the previous data.
>
> Now, my question is: if I would like to analyze these data together
> (>5,000 samples), what would you suggest? The two possible approaches I
> can think of are the following:
>
> 1. Re-normalize all 5,000 samples together.
> 2. Double-normalize the two datasets, for example by applying a
> standardization (z-score) or global median normalization to
> each dataset, then pool them for the downstream statistical
> analysis.
>
> Thanks in advance for your help,
> Shirley
>
> On Tue, Sep 25, 2012 at 10:21 AM, James W. MacDonald <jmacdon at uw.edu> wrote:
>> Hi Jun,
>>
>> On 9/25/2012 7:11 AM, Jun Han [guest] wrote:
>>> Hi,
>>> I would like to use gcrma to do a within-group normalization first (30
>>> groups in total), then feed all 30 normalised groups into a second,
>>> global gcrma run.
>>> Is this possible? Does gcrma accept the output of the first
>>> normalisation as input?
>>
>> The short answer is no. When you run gcrma(), you do background correction,
>> normalization, and finally summarization of the probe-level data, resulting
>> in probeset-level data. In other words, you are taking the PM probes and
>> summarizing them into a single value at the probeset level (after background
>> correcting and normalizing).
>>
>> Since gcrma() expects you to be inputting an AffyBatch containing PM and MM
>> probe data, it fails when you input an ExpressionSet containing summarized
>> probeset level data.
>>
>> I assume you are trying to combine two groups that you think should not be
>> normalized and summarized together. This leads to two questions - first, why
>> don't you think these data can be combined prior to the gcrma() step, and
>> second, if the answer to the first question is because of a batch effect,
>> have you looked at e.g., sva or ComBat?
>>
>> Best,
>>
>> Jim
>>
>>
>>> Many thanks.
>>> Jun
>>>
>>>    -- output of sessionInfo():
>>>
>>>> gcrma12<-gcrma(gcrma1,gcrma2)
>>> Error in function (classes, fdef, mtable)  :
>>>     unable to find an inherited method for function "indexProbes", for
>>> signature "ExpressionSet", "character"
>>>
>>> --
>>> Sent via the guest posting facility at bioconductor.org.
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> University of Washington
>> Environmental and Occupational Health Sciences
>> 4225 Roosevelt Way NE, # 100
>> Seattle WA 98105-6099