[BioC] RMA vs VSN

Roger Vallejo rvallejo at psu.edu
Mon Jun 21 21:08:12 CEST 2004


This may be of interest to those preprocessing data from Affymetrix
chips. We compared RMA and VSN by running an lme-ANOVA on data
preprocessed each way. If you are wondering whether to use RMA or VSN,
or what the potential pitfalls and benefits of either normalization or
background correction approach are, please read below and draw your own
conclusions.
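
The trade-off described in the quoted discussion (skipping background
correction lowers the variance of low-expressed genes but attenuates
fold-change estimates) can be illustrated with a small base-R
simulation. This is only a sketch under stated assumptions, not the RMA
algorithm: the constant background of 100, the signal levels, the
additive noise, and the plain subtraction used as "background
correction" are all hypothetical choices made for illustration. RMA
itself fits a convolution model rather than subtracting a constant.

```r
# Toy model of one low-expressed gene measured in two conditions.
# All numbers below are illustrative assumptions, not real array data.
set.seed(1)
n           <- 1000   # simulated measurements per condition
bg          <- 100    # assumed constant background intensity
signal_ctrl <- 50     # true signal, control
signal_trt  <- 100    # true signal, treated (true 2-fold change)
noise_sd    <- 10     # additive measurement noise

# Observed intensity = background + signal + noise
obs_ctrl <- bg + signal_ctrl + rnorm(n, sd = noise_sd)
obs_trt  <- bg + signal_trt  + rnorm(n, sd = noise_sd)

# No background correction: log2 of the raw intensities
lfc_raw <- mean(log2(obs_trt)) - mean(log2(obs_ctrl))
sd_raw  <- sd(log2(obs_ctrl))

# Naive background subtraction (floored at 1 so the log is defined)
sub_ctrl <- pmax(obs_ctrl - bg, 1)
sub_trt  <- pmax(obs_trt  - bg, 1)
lfc_sub  <- mean(log2(sub_trt)) - mean(log2(sub_ctrl))
sd_sub   <- sd(log2(sub_ctrl))

cat("log2 fold change, no bg correction:", round(lfc_raw, 2), "\n")
cat("log2 fold change, bg subtracted:   ", round(lfc_sub, 2), "\n")
cat("SD of log2 control, no bg correction:", round(sd_raw, 2), "\n")
cat("SD of log2 control, bg subtracted:   ", round(sd_sub, 2), "\n")
```

In this toy setting the uncorrected data show the smaller spread on the
log scale but also the smaller estimated fold change, so a test on them
sees less noise and a weaker effect at the same time.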

 

Thank you for the enlightening discussion with colleagues in the
Bioconductor group.

 

Roger

 

 

Roger L. Vallejo, Ph.D.

Assist. Professor of Genomics & Bioinformatics

Genomics & Bioinformatics Laboratory

Department of Dairy & Animal Science

The Pennsylvania State University

305 Henning Building

University Park, PA 16802

Phone:       (814) 865-1846 

Email:        rvallejo at psu.edu

 

-----Original Message-----
From: Rafael Irizarry [mailto:ririzarr at jhsph.edu] 
Sent: Monday, June 21, 2004 2:12 PM
To: Roger Vallejo
Subject: Re: [BioC] RMA vs VSN

 

you should consider posting something on the bioc list. i think this
may help many others.

On Jun 21, 2004, at 11:35 AM, Roger Vallejo wrote:

 

> Dear Rafael,

> I am glad that I asked this question about RMA and VSN. Your comments
> below are true. I quickly checked the outputs from the LME-ANOVA using
> data preprocessed separately with RMA and with VSN. Indeed, several
> interesting genes detected as significant with RMA are missed with VSN.
> Also, the p-values for those significant genes are generally more
> striking with RMA than with VSN. God knows what else I might be missing
> by using VSN, because I am only checking the genes that we know are
> related to immune and inflammatory events. So the rate of false
> "undiscoveries" increases with VSN, in exchange for a slightly lower
> FDR. I would rather maximize gene discovery at a slightly higher but
> still acceptable FDR.

> Thanks for the excellent point!

> Roger

>


> -----Original Message-----

> From: Rafael A. Irizarry [mailto:ririzarr at jhsph.edu]

> Sent: Saturday, June 19, 2004 3:58 PM

> To: Roger Vallejo

> Cc: rafa at jhu.edu

> Subject: RE: [BioC] RMA vs VSN

>

> i believe the difference does not come from vsn but from
> bg.correct=FALSE. try
>
>  eset <- expresso(Data, bg.correct=FALSE,
>                   normalize.method="quantiles",
>                   pmcorrect.method="pmonly",
>                   summary.method="medianpolish")
>
> i suspect you will get similar results.
>
> when you do not bg correct, the variance level for low expressed genes
> is much smaller, but the estimates of fold change also get attenuated.
> false discoveries are lower but false "undiscoveries" increase.

>

> On Sat, 19 Jun 2004, Roger Vallejo wrote:

>

>> Dear Rafael,

>> Thank you very much for your comments.

>> Our results are somewhat different for VSN vs. RMA. If they were
>> similar, I would likely have kept using RMA, because it is part of our
>> standard array data preprocessing functions. The p-values are smaller,
>> and thereby the PER and FDR are slightly more acceptable (although not
>> by much), when using p-values from VSN normalization and lme-ANOVA. If
>> I decide to use VSN in the way that I indicated (please see the
>> functions below), I would like to make sure that I am not
>> over-normalizing my data, as you indicated, and, most importantly, that
>> I am using a data normalization function that is as good as RMA.
>> Thanks for your comments.

>> Roger

>>


>> -----Original Message-----

>> From: Rafael A. Irizarry [mailto:ririzarr at jhsph.edu]

>> Sent: Saturday, June 19, 2004 2:05 PM

>> To: Roger Vallejo

>> Cc: bioconductor at stat.math.ethz.ch

>> Subject: Re: [BioC] RMA vs VSN

>>

>> vsn and rma are not competitors. the first is a normalization
>> technique, the second is a way to obtain expression measures from affy
>> arrays, which includes background adjustment, normalization, and
>> summarization. rma uses quantile normalization as a default.
>> changing this to vsn yields, in general, very similar results.
>>
>> notice, some use rma to obtain an expression measure and then
>> use vsn to normalize that, although i worry this could result in
>> over-normalization.

>>

>> On Sat, 19 Jun 2004, Roger Vallejo wrote:

>>

>>> We have a small experiment with a high FDR (around 0.40): 8 Affymetrix
>>> mouse GeneChips with 22k genes, 2 replications, saline and E. coli
>>> treated mammary tissue, evaluated at 24 hr and 48 hr post injection.
>>>
>>> I have run both data preprocessing approaches via expresso and then
>>> run an lme-ANOVA on each result. As expected, I got a lower FDR and
>>> much smaller p-values when using VSN. The FDR was estimated using the
>>> QVALUE package. Obviously, I feel tempted to use VSN instead of RMA.
>>> However, before proceeding I would like to hear some comments from the
>>> Bioconductor group on this approach. The question is:
>>>
>>> Is VSN better than RMA?
>>>
>>> I have read the literature, and both claim to be the function to use!
>>>
>>> Personally, I lean towards using VSN. I might be wrong, so I
>>> would appreciate any suggestions or comments on this.

>>>

>>> These are the functions that I used:

>>>

>>> *************************************************************
>>>
>>> For RMA:
>>>
>>>> library(affy)
>>>> Data <- ReadAffy(widget=TRUE)
>>>> eset <- expresso(Data, bgcorrect.method="rma",
>>>     normalize.method="quantiles", pmcorrect.method="pmonly",
>>>     summary.method="medianpolish")
>>>
>>> *************************************************************
>>>
>>> For VSN:
>>>
>>>> library(affy)
>>>> Data <- ReadAffy(widget=TRUE)
>>>> library(vsn)
>>>> normalize.AffyBatch.methods <- c(normalize.AffyBatch.methods, "vsn")
>>>> eset <- expresso(Data, bg.correct=FALSE, normalize.method="vsn",
>>>     pmcorrect.method="pmonly", summary.method="medianpolish")
>>>
>>> *************************************************************

>>>

>>> Thank you very much.

>>>

>>> Roger

>>>


>>> _______________________________________________

>>> Bioconductor mailing list

>>> Bioconductor at stat.math.ethz.ch

>>> https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor

>>>

>>

>>

>

 

