[BioC] How GAGE handles missing values?

Wed Feb 2 22:56:33 CET 2011

Hi Nhan,
GAGE actually handles missing values pretty well. Here is a brief description from my previous answer to a similar question.
Before the gene set differential expression tests, we calculate a differential expression statistic (fold change, signal-to-noise ratio, etc.) for each gene. Other methods do a group-on-group comparison in this step (the whole experimental sample group vs. the whole control group), so a missing value in any sample may affect the per-gene differential expression statistics. GAGE instead compares one experimental sample to one control sample at a time. A missing expression value therefore produces an NA fold change only in that particular pair-wise comparison and does not affect the fold changes in other pair-wise comparisons. Within any given pair-wise comparison, the NA fold changes are omitted from the gene set test, so they usually make little difference as long as the gene set still has enough effective genes.
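To make the pair-wise idea concrete, here is a minimal sketch in Python with NumPy. It is not GAGE's actual R code: the function names, the toy data, and the use of a plain NA-omitting mean as the gene set summary are all assumptions made for illustration (GAGE itself runs a proper gene set test on the per-pair fold changes).

```python
import numpy as np

def pairwise_log_fold_changes(expr, ctrl):
    """Hypothetical helper: per-pair log fold changes.

    expr, ctrl: genes x pairs arrays of log2 expression values, where
    column j of expr is matched with column j of ctrl.
    A NaN in either sample yields NaN only for that gene in that pair.
    """
    return expr - ctrl

def gene_set_mean_per_pair(fold_changes, gene_idx):
    """Hypothetical stand-in for a gene set statistic: the mean fold
    change over the genes in the set, computed separately for each pair
    and skipping NaN entries."""
    return np.nanmean(fold_changes[gene_idx, :], axis=0)

# Toy data: 4 genes x 3 matched pairs, with a few missing values (NaN).
expr = np.array([[8.0, 7.5,    9.0],
                 [6.0, np.nan, 6.5],
                 [5.0, 5.5,    5.0],
                 [7.0, 7.0,    np.nan]])
ctrl = np.array([[7.0, 7.0, 8.0],
                 [5.0, 5.0, 6.0],
                 [5.0, 5.0, np.nan],
                 [6.0, 6.5, 6.0]])

fc = pairwise_log_fold_changes(expr, ctrl)

# Gene 1 is missing in pair 1 only; its fold changes in pairs 0 and 2 survive.
gene_set = [0, 1, 3]
set_means = gene_set_mean_per_pair(fc, gene_set)
print(fc)
print(set_means)
```

In this toy example the missing value for the second gene only blanks out that gene in the second pair; the other pairs, and the other genes within the second pair, still contribute to the gene set summary.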
I have run an experiment on missing values before: with 50% of all genes in a microarray dataset randomly removed (replaced by NA), GAGE results were largely unaffected. You can try it out with your own data. Hope that helps.
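If you want to try this kind of check, a rough sketch in Python with NumPy could look like the following. It runs on simulated fold changes, not real microarray data, and it does not call GAGE: the sizes, the seed, the +1 shift for the gene set, and the NA-omitting mean used as the gene set summary are all assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

n_genes, n_pairs = 1000, 4
# Simulated per-pair log fold changes (assumed: standard normal noise).
fc = rng.normal(loc=0.0, scale=1.0, size=(n_genes, n_pairs))

# Hypothetical gene set: the first 100 genes, shifted up by 1 in every pair.
set_idx = np.arange(100)
fc[set_idx, :] += 1.0

# Randomly "remove" 50% of all genes by replacing their values with NaN.
removed = rng.choice(n_genes, size=n_genes // 2, replace=False)
fc_missing = fc.copy()
fc_missing[removed, :] = np.nan

# Gene set summary per pair, before and after removal (NaN entries skipped).
full_means = np.nanmean(fc[set_idx, :], axis=0)
missing_means = np.nanmean(fc_missing[set_idx, :], axis=0)

# Number of genes in the set that still have values ("effective genes").
effective = int(np.sum(~np.isnan(fc_missing[set_idx, 0])))

print("effective genes in set:", effective)
print("set means, full data:   ", full_means)
print("set means, 50% removed: ", missing_means)
```

With your own data you would replace the simulated matrix with your per-pair fold changes and compare the actual GAGE output before and after masking.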
Weijun

--- On Wed, 2/2/11, Nhan Thi Ho <nho at epi.msu.edu> wrote:

> From: Nhan Thi Ho <nho at epi.msu.edu>
> Subject: How GAGE handles missing values?
> To: "Luo Weijun" <luo_weijun at yahoo.com>
> Date: Wednesday, February 2, 2011, 9:25 AM
> Dear Dr Luo,
> We are facing another issue when using GAGE. As some of our
> arrays have artifacts, we treat the regions with
> artifacts as missing values (we simply remove the values in
> those regions and treat them as NA. We also tried
> imputation for the regions with artifacts, but for
> now we have decided to treat them as NA).
> If we analyze cases vs. controls as groups, missing values
> may be less problematic. But that is not what we want to do,
> because our data are in matched pairs with a huge variation
> in sample storage time among pairs.
> However, if we analyze the data as pairs and only one of the
> two arrays in a pair has missing values, that pair is
> affected. For a subset of our samples, many pairs are
> affected that way.
> I have been looking through the GAGE manual and have not
> found how GAGE handles missing values.
> Could you please help me out with this?
> Thank you very much. 
> Nhan 
> 
> Nhan Thi HO, MD
> PhD Student
> Dept of Epidemiology
> Michigan State University,
> B601 West Fee Hall,
> East Lansing, 48824 MI, USA
> Office Phone: 517- 363 8263 ext 111
> Hand Phone: 517- 599 8775
> Email: nho at epi.msu.edu