[BioC] summarizing probe intensities before or after normalization- 1. how to do with RMA 2. Opinions?

James W. MacDonald jmacdon at med.umich.edu
Mon Sep 11 16:57:09 CEST 2006


Hi Karl,

You have to convert the expression data to a data.frame in order to get 
a boxplot for each column.

boxplot(data.frame(exprs(datrma)))
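
A minimal sketch of the same idea on made-up numbers, using only base R (the matrix m and its column names are hypothetical stand-ins, not your data). One caveat I am fairly sure of: normalize.quantiles() returns a plain matrix, so if datrma has already been overwritten with that matrix, as in your script, exprs() would likely fail and you would pass the matrix to data.frame() directly.

```r
# Hypothetical stand-in for a probesets x arrays expression matrix
set.seed(1)
m <- matrix(rnorm(300, mean = 7, sd = 2), ncol = 3,
            dimnames = list(NULL, c("array1", "array2", "array3")))

# A data.frame is a list of columns, so boxplot() treats each column
# (array) as its own group and draws one box per array.
# plot = FALSE only computes the box statistics; drop it to draw the plot.
bp <- boxplot(data.frame(m), plot = FALSE)

ncol(bp$stats)  # one column of five-number statistics per box
bp$names        # box labels, taken from the column names
```

With real data, the only change should be swapping m for the expression matrix.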

I am shocked at the boxplots for rma(). I have never seen RMA processed 
data look that different, which makes me wonder what the raw data look 
like. Also, are you sure these are RMA processed data (e.g., you didn't 
accidentally mask the rma() processed data with other data)? Unless the 
raw data are completely horrible, I don't know how you could get such 
disparate results.
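
To show why I would expect the boxes to line up: quantile normalization gives every array the same empirical distribution. rma() applies it at the probe level, before summarization, so the summarized values need not be identical across arrays, but they should be close. A rough base-R sketch follows; the function name quantile_norm_sketch and the simulated data are hypothetical, this is not the code rma() actually runs, and it ignores missing values and breaks ties arbitrarily.

```r
# Base-R sketch of quantile normalization (illustration only)
quantile_norm_sketch <- function(x) {
  # Rank of each value within its own column (array)
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Target distribution: mean across arrays of the sorted values
  target <- rowMeans(apply(x, 2, sort))
  # Replace each value by the target value that has the same rank
  out <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(x)
  out
}

set.seed(2)
# Hypothetical arrays with deliberately different locations and spreads
x <- cbind(a = rnorm(100, 6, 1), b = rnorm(100, 8, 2), c = rnorm(100, 7, 3))
xn <- quantile_norm_sketch(x)

apply(x,  2, summary)  # before: columns differ
apply(xn, 2, summary)  # after: columns share the same summary statistics
```

If your post-RMA boxplots look nothing like the second summary, that points to the raw data or to the wrong object being plotted rather than to RMA itself.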



Best,

Jim

k. brand wrote:
> James,
> 
> Thank you for your fast detailed response.
> 
> At your suggestion I tried the following, i.e.,
> 
> library(affyPLM)
> dat <- ReadAffy()
> datrma <- rma(dat, normalize=FALSE)
> datrma <- normalize.quantiles(exprs(datrma))
> boxplot(datrma)
> 
> Find attached the boxplot output "RMA then QnormJimscript.jpeg", which 
> suggests some syntax is missing, since there is only one box for the 12 
> arrays?! In fact, I saw this output too in my own attempts at this. 
> Unfortunately my lack of R knowledge prevented me from getting around 
> whatever the problem is. If you have a further suggestion I'm all 
> ears.
> 
> "data", the result of inheriting my teacher's bad habits, is now "dat"...
> 
> Further, I attach a boxplot (better than a histogram, right?), "RMA vs Qunt 
> norm of MAS5 preproced & summed.jpg", comparing the two methods in my 
> original post. When I look at the variation of the RMA processed and 
> normalized data, I question how effectively I can compare these arrays 
> with each other, especially alongside the method shown, which is MAS5 
> preprocessed then quantile normalized using a colleague's script. These 
> arrays look much more comparable, even ideally so. Why don't I just use 
> this approach, you may ask? A: I'm convinced RMA is a superior 
> preprocessing and summarizing method... I just need to be able to 
> reconcile the apparent variation in the final output, or perhaps 
> better understand it.
> 
> Your further thoughts and suggestions are greatly appreciated,
> 
> karl
> 
> 
> 
> 
> 
> on 9/11/2006 3:42 PM James W. MacDonald said the following:
> 
>> k. brand wrote:
>>
>>> Dear All,
>>>
>>> I compared two normalization approaches for an experiment using 
>>> twelve affy 430-2.0 chips. (histogram plot comparing both methods 
>>> forwarded on request).
>>>
>>> #1. RMA
>>> library(affy)
>>> data <- ReadAffy()
>>> datarma <- rma(data)
>>> exprs2excel(datarma, file="dataRMA.csv")
>>>
>>> Plotting histograms of the output shows arrays NOT perfectly aligning 
>>> at the means and spreads.
>>>
>>> I used a custom script to effect a quantile normalization on MAS5 
>>> preprocessed but unnormalized data-
>>>
>>> #2. Mas5 sans interchip normalization
>>> library(affy)
>>> data <- ReadAffy()
>>> datamas5sannorm <- mas5(data, normalize=FALSE)
>>> exprs2excel(datamas5sannorm, file="datamas5sannorm.csv")
>>> f.qnorm <- function(x,qinit=0.75,perc=100)  {...
>>>
>>> The means and spreads of this normalization approach do align perfectly.
>>>
>>> THUS- summarizing probe intensities before or after normalization does 
>>> appear to make a noticeable difference, as may be expected.
>>>
>>> My questions/requests-
>>>
>>> 1. Help to effect Bolstad normalization of the RMA preprocessed and 
>>> summarized data. Whilst I succeed in generating unnormalized RMA 
>>> preprocessed data with-
>>>
>>> library(affy)
>>> data <- ReadAffy()
>>> datarma <- rma(data, normalize=FALSE)
>>
>>
>> Next step would be
>>
>> datarma <- normalize.quantiles(exprs(datarma))
>>
>> also note that 'data' is not a very good variable name, as you are 
>> masking an existing function. When creating variable names it is often 
>> enlightening to type the name first at an R prompt to see if you get 
>> any response.
>>
>>
>>>
>>> As a result of my limited R experience, I failed in finding a method 
>>> to effect Bolstad (quantile) normalization on this output.
>>>
>>> 2. Thoughts/comments on the benefits/caveats of normalizing before or 
>>> after summarizing probe intensities.
>>
>>
>> Normalizing after summarization for something like rma() seems 
>> questionable to me. Since the expression values are based on fitting a 
>> model to the PM probe values, if you don't normalize first you are 
>> ignoring any non-biological variability which may end up biasing your 
>> results. Using median polish for the model fit should help protect 
>> against this, but I don't know that I would want to take chances.
>>
>> As an aside, how far off are the histograms? Are you sure that there 
>> is a reasonable difference? Eyeballing a histogram isn't the best way 
>> to determine if the mean and variance are different or not. A quick 
>> run through with some data here shows very little difference:
>>
>>  > eset <- justRMA(filenames=list.celfiles()[1:10])
>>  > apply(exprs(eset),2,summary)
>>         Sample 1 Sample 2 Sample 3 Sample 4 Sample 5 Sample 6 Sample 7
>> Min.       4.085    4.070    4.091    4.051    4.068    4.090    4.087
>> 1st Qu.    5.835    5.859    5.832    5.812    5.842    5.858    5.852
>> Median     7.079    7.069    7.048    7.061    7.070    7.077    7.080
>> Mean       7.225    7.227    7.224    7.227    7.229    7.225    7.232
>> 3rd Qu.    8.352    8.324    8.351    8.363    8.361    8.330    8.347
>> Max.      14.550   14.440   14.420   14.400   14.490   14.430   14.260
>>
>> Best,
>>
>> Jim
>>
>>
>>>
>>> I look forward to any thoughts, advice & suggestions from users.
>>>
>>> thanks in advance,
>>>
>>> Karl
>>>
>>>
>>> ===========================================
>>>
>>>    > sessionInfo()
>>> Version 2.3.0 (2006-04-24)
>>> i386-pc-mingw32
>>>
>>> attached base packages:
>>> [1] "tools"     "methods"   "stats"     "graphics"  "grDevices" "utils"
>>>        "datasets"  "base"
>>>
>>> other attached packages:
>>>        affy   affyio  Biobase
>>> "1.10.0"  "1.0.0" "1.10.0"
>>>
>>
>>
> 


-- 
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623

