[BioC] How to analyze Affy data, CEL files not available

Thu Feb 8 05:02:27 CET 2007

I have found that MAS expression values still show between-array 
patterns that multi-array normalization can remove.  (Try a pairs 
plot of the arrays to see this.)

Quantile normalization will remove these patterns, but it is very 
stringent: it forces every array to have the same distribution, so it 
can remove signal as well as noise.
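
To make the quantile normalization step concrete, here is a minimal sketch in plain Python. It is only an illustration on toy numbers, not the code any Bioconductor package uses; the function name quantile_normalize and the example values are made up, and ties are broken by position for simplicity.

```python
from statistics import mean

def quantile_normalize(columns):
    """Quantile-normalize equal-length arrays (one list of values per array).

    Each value is replaced by the mean, across arrays, of the values
    that share its rank. Ties are broken by position (a simplification).
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all arrays must have the same number of genes")
    # For each array, the gene indices in increasing order of intensity.
    orders = [sorted(range(n), key=col.__getitem__) for col in columns]
    # Reference distribution: mean across arrays at each rank.
    reference = [mean(col[order[r]] for col, order in zip(columns, orders))
                 for r in range(n)]
    normalized = []
    for order in orders:
        out = [0.0] * n
        for rank, gene in enumerate(order):
            out[gene] = reference[rank]
        normalized.append(out)
    return normalized

# Toy data: three arrays, four genes each (hypothetical values).
arrays = [[5.0, 2.0, 3.0, 4.0],
          [4.0, 1.0, 4.5, 2.0],
          [3.0, 4.0, 6.0, 8.0]]
normalized = quantile_normalize(arrays)
```

After this step every array holds exactly the same set of values, only in a different gene order. That is why the method is stringent: a genuine global difference between arrays would be erased along with the technical one.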

If one array makes a good baseline, you could loess-normalize every 
other array against it as a common reference.  If not, cyclic loess 
could be done: pick an array as the reference and normalize every 
other array against it, then move to the next array and renormalize 
all the arrays using that (now normalized) array as the reference, 
and so on.  I would have to do a web search to find the paper that 
suggested this.
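
The reference-based and cyclic schemes described above can be sketched as follows. This is a rough illustration under stated assumptions, not the algorithm from the paper mentioned: loess_fit is a simplified local linear smoother with tricube weights and no robustness iterations, all function names are hypothetical, and intensities are assumed positive so that log2 is defined. Pure Python like this is far too slow for real arrays; it is meant for toy data only.

```python
import math

def loess_fit(x, y, span=0.7):
    """Local linear fit with tricube weights, evaluated at each x.

    A simplified stand-in for a real loess routine.
    """
    n = len(x)
    k = min(n, max(2, math.ceil(span * n)))  # points in each neighborhood
    fitted = []
    for x0 in x:
        dist = [abs(xi - x0) for xi in x]
        h = sorted(dist)[k - 1] or 1e-12  # distance to k-th nearest point
        w = [max(0.0, 1.0 - (d / h) ** 3) ** 3 for d in dist]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

def normalize_to_reference(array, reference, span=0.7):
    """Remove the intensity-dependent trend of one array versus a reference."""
    log_a = [math.log2(v) for v in array]
    log_r = [math.log2(v) for v in reference]
    m = [a - r for a, r in zip(log_a, log_r)]        # M: log ratio
    avg = [(a + r) / 2 for a, r in zip(log_a, log_r)]  # A: mean log intensity
    trend = loess_fit(avg, m, span)
    return [2 ** (a - t) for a, t in zip(log_a, trend)]

def cyclic_reference_loess(arrays, span=0.7, cycles=1):
    """Let each array take a turn as the reference for all the others."""
    arrays = [list(a) for a in arrays]
    for _ in range(cycles):
        for ref in range(len(arrays)):
            arrays = [a if i == ref
                      else normalize_to_reference(a, arrays[ref], span)
                      for i, a in enumerate(arrays)]
    return arrays

# Toy data: the same signal at three overall brightness levels.
reference = [50.0 * 1.3 ** i for i in range(20)]
brighter = [1.5 * v for v in reference]
dimmer = [0.8 * v for v in reference]
normalized = normalize_to_reference(brighter, reference)
cycled = cyclic_reference_loess([reference, brighter, dimmer])
```

In the toy example the only difference between arrays is a constant scale factor, so the fitted trend is flat and every array is pulled back onto the reference. With real data the trend usually depends on intensity, which is the reason to fit a curve rather than a single scale factor.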

--Naomi

At 01:02 PM 2/7/2007, James W. MacDonald wrote:
>Hi Bobby,
>
>Bobby Prill wrote:
> > I would like to analyze a set of 40 Affy experiments, but I do not
> > have the CEL files.  What I have is a spreadsheet of the MAS
> > expression measures, one column per array.  Each row corresponds to
> > one gene.
> >
> > I load the data:
> > eset = read.exprSet(exprs="mydata.txt", phenoData="phenoData.txt")
> >
> > My general question is, should/can I perform some sort of
> > normalization so that the arrays are comparable from one to
> > another?   or is this what MAS has already done?  (I'm not familiar
> > with Affy MAS.)
> >
> > Other problems include:
> >
> > 1. MA plots indicate that the data cloud is skewed (not perfectly
> > centered on M==0 line).  Should I loess?
>
>Almost certainly not. A loess normalization is almost always an
>intra-array normalization for spotted cDNA microarrays rather than
>something useful for the Affy chip type. I would look at a boxplot of
>the data to see if the samples tend to line up. MAS5.0 usually ends up
>doing a scaling and centering of the data, so you will likely see boxes
>with fairly equal medians and inter-quartile ranges.
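
The boxplot check suggested above comes down to comparing each array's median and inter-quartile range on the log2 scale. A minimal numeric sketch, assuming positive MAS values; the array names and numbers are invented, and box_summary is a hypothetical helper:

```python
import math
from statistics import median, quantiles

def box_summary(values):
    """Median and inter-quartile range of log2 intensities for one array."""
    logs = [math.log2(v) for v in values]
    q1, _, q3 = quantiles(logs, n=4)
    return {"median": median(logs), "iqr": q3 - q1}

# Toy data: chip2 is chip1 with every value doubled (hypothetical values).
arrays = {
    "chip1": [100.0, 200.0, 400.0, 800.0, 1600.0],
    "chip2": [200.0, 400.0, 800.0, 1600.0, 3200.0],
}
summaries = {name: box_summary(vals) for name, vals in arrays.items()}
for name, s in summaries.items():
    print(f"{name}: median={s['median']:.2f} iqr={s['iqr']:.2f}")
```

Here the two arrays have the same spread but medians one log2 unit apart, which is the kind of offset a simple scaling corrects. If the medians and IQRs already agree across all 40 arrays, further normalization may not be needed.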
>
>I suppose you could do a quantile normalization at this point, but that
>might not be necessary or a good idea.
>
> >
> > 2.  Also, the M values have high variance at low A, which I think is
> > a byproduct of the MAS. Probably nothing I can do about this.
>
>Nope.
> >
> > I think the typical advice would be to obtain CEL files and run
> > rma().  But if I'm stuck with the MAS expression calls, what to do?
>
>I would make sure the boxplots line up reasonably well, then go on to
>higher level analyses. If you have the P/M/A calls you can filter out
>the probesets called 'absent', or use one of the various options in the
>genefilter package.
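
One way to read the P/M/A filtering suggestion is to keep a probeset only if it is called present on enough arrays. The sketch below assumes the calls arrive as one "P", "M" or "A" flag per array for each probeset; the 25% threshold, the probeset IDs and the function name filter_by_calls are all assumptions for illustration, and this is not the genefilter API.

```python
def filter_by_calls(expr, calls, min_present_fraction=0.25):
    """Keep probesets called present ('P') on at least the given
    fraction of arrays. The default threshold is an arbitrary choice."""
    kept = {}
    for probe, values in expr.items():
        flags = calls[probe]
        present = sum(1 for flag in flags if flag == "P")
        if present / len(flags) >= min_present_fraction:
            kept[probe] = values
    return kept

# Toy data: three probesets measured on four arrays (hypothetical IDs).
expr = {
    "probe_1": [812.0, 790.5, 901.2, 640.0],
    "probe_2": [12.1, 9.8, 15.0, 11.3],
    "probe_3": [150.0, 30.2, 28.9, 35.5],
}
calls = {
    "probe_1": ["P", "P", "P", "P"],
    "probe_2": ["A", "A", "A", "A"],
    "probe_3": ["P", "A", "M", "A"],
}
filtered = filter_by_calls(expr, calls)
```

With the default threshold, probe_2 is dropped because it is absent everywhere, while probe_3 is kept because it is present on one of four arrays. Raising min_present_fraction makes the filter stricter.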
>
>HTH,
>
>Jim
>
>
> >
> > Thanks.
> >
> > - Bobby
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
>--
>James W. MacDonald, M.S.
>Biostatistician
>Affymetrix and cDNA Microarray Core
>University of Michigan Cancer Center
>1500 E. Medical Center Drive
>7410 CCGC
>Ann Arbor MI 48109
>734-647-5623
>

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111