[BioC] External RNA controls on Rat Gene ST 2.0 chip lfc ~ 1 after xps rma??

Wed Jun 25 17:26:26 CEST 2014

Matt,
Here are some comments that may be helpful, but they don't directly address your question... 

The "Subgroup B" ERCC spike-ins *should* have lfc = 0, since they are present at the same concentration in both mixes. I like to look at that group first.

I also like to look at the raw data "within subject" across the (log) concentration range and see where linearity breaks down and the concentrations become indistinguishable (i.e., the asymptotic parts of the sigmoid curve). I am suspicious of any differences among groups for a gene whose expression level falls in those areas.
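
As a rough illustration of that check, here is a minimal Python sketch (not R, and not tied to xps) that computes the local slope of log2 intensity against log2 concentration and reports the concentration range where the slope stays above a cutoff. The function names, the 0.5 cutoff, and the numbers are all made up for illustration; it also assumes distinct concentrations and a single contiguous linear region.

```python
import math

def local_slopes(conc, intensity):
    """Slope of log2(intensity) vs log2(concentration) between neighbouring points."""
    pts = sorted(zip(conc, intensity))
    xs = [math.log2(c) for c, _ in pts]
    ys = [math.log2(i) for _, i in pts]
    return [(ys[k + 1] - ys[k]) / (xs[k + 1] - xs[k]) for k in range(len(xs) - 1)]

def linear_range(conc, intensity, min_slope=0.5):
    """Lowest and highest concentration bounding the intervals with slope >= min_slope.

    min_slope is an arbitrary illustrative cutoff, not a recommended value.
    Returns None if no interval passes the cutoff.
    """
    pts = sorted(zip(conc, intensity))
    slopes = local_slopes(conc, intensity)
    keep = [k for k, s in enumerate(slopes) if s >= min_slope]
    if not keep:
        return None
    return pts[keep[0]][0], pts[keep[-1] + 1][0]

# Hypothetical raw intensities for one sample: flat at both ends, roughly linear in the middle
conc = [1, 2, 4, 8, 16, 32, 64, 128]
intensity = [50, 52, 60, 110, 215, 420, 520, 540]
print(linear_range(conc, intensity))  # (4, 32)
```

Controls outside the reported range sit on the flat parts of the curve, which is where I would not trust differences between groups.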

You might also consider looking at a density plot for each sample with a rug plot showing the values of the ERCC controls. (Non-graphically, use ecdf() to see where they fall in each sample.) Are the upper tails dominated by ERCCs? If so, I would be concerned about using RMA because quantile normalization may be too strong in the presence of such (intentional) differences. For example, Mix 1 has a max concentration of 30,000 while Mix 2 only goes up to 15,000. Based on my understanding, if those controls are indeed the strongest signals in your samples, then quantile normalization would by definition force them to be equal. Indeed, Bolstad et al. (2003) mention this in their quantile normalization paper, which is one of the three papers that make up the RMA procedure:

"One possible problem with this method is that it forces
the values of quantiles to be equal. This would be
most problematic in the tails where it is possible that a
probe could have the same value across all the arrays.
However, in practice, since probeset expression measures
are typically computed using the value of multiple probes,
we have not found this to be a problem."
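
To make the concern concrete, here is a minimal Python sketch of quantile normalization on two toy arrays. It is a simplified illustration, not the RMA or xps implementation: ties are broken by position, and the intensities are invented, with the last value in each array standing in for the strongest spike-in (30,000 vs 15,000).

```python
import math

def quantile_normalize(arrays):
    """Quantile-normalize equal-length lists of numbers.

    Each array's k-th smallest value is replaced by the mean of the k-th
    smallest values across all arrays. Ties are broken by position, which
    is a simplification of what production implementations do.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    out = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        normed = [0.0] * n
        for rank, idx in enumerate(order):
            normed[idx] = reference[rank]
        out.append(normed)
    return out

# Hypothetical intensities; the last entry plays the role of the top spike-in
mix1 = [100.0, 220.0, 410.0, 900.0, 30000.0]
mix2 = [110.0, 200.0, 430.0, 880.0, 15000.0]
n1, n2 = quantile_normalize([mix1, mix2])

lfc_before = math.log2(mix1[-1] / mix2[-1])  # 1.0
lfc_after = math.log2(n1[-1] / n2[-1])       # 0.0
print(lfc_before, lfc_after)
```

In this toy case the two-fold difference in the top value disappears entirely, because both arrays are assigned the same top quantile. Real data are summarized over multiple probes, which is the mitigation the quoted passage refers to.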

Based on this, I would filter out ERCC controls that are in the non-linear range or that dominate the tails; you want the ERCC controls you use to be intermingled with the "real" data to help avoid these problems.
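
A minimal Python sketch of the tail filter, mirroring what ecdf() gives you in R: compute the fraction of a sample's values at or below each control and keep only controls that fall between two cutoffs. The function names, the 5%/95% cutoffs, the control names, and the data are hypothetical choices for illustration only.

```python
from bisect import bisect_right

def ecdf_position(sample_values, x):
    """Fraction of values in the sample that are <= x."""
    ordered = sorted(sample_values)
    return bisect_right(ordered, x) / len(ordered)

def controls_outside_tails(sample_values, controls, lower=0.05, upper=0.95):
    """Names of controls whose ECDF position lies within [lower, upper].

    lower and upper are arbitrary illustrative cutoffs.
    """
    ordered = sorted(sample_values)
    n = len(ordered)
    return [name for name, x in controls.items()
            if lower <= bisect_right(ordered, x) / n <= upper]

# Hypothetical sample of 100 expression values and three made-up controls
sample = [float(v) for v in range(1, 101)]
controls = {"ctrl_low": 2.0, "ctrl_mid": 50.0, "ctrl_high": 100.0}

print(ecdf_position(sample, 50.0))               # 0.5
print(controls_outside_tails(sample, controls))  # ['ctrl_mid']
```

You would run this per sample and keep only the controls that pass in every sample, since a control can sit in the tail of one array but not another.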

Just some thoughts!

Wade

-----Original Message-----
From: Thornton, Matthew [mailto:Matthew.Thornton at med.usc.edu] 
Sent: Tuesday, June 24, 2014 2:39 PM
To: bioconductor at r-project.org
Subject: [BioC] External RNA controls on Rat Gene ST 2.0 chip lfc ~ 1 after xps rma??

Hello!

I am processing Affymetrix Rat Gene 2.0 ST chips with the Bioconductor package xps using RMA normalization. I have included the ExFold ERCC external RNA controls with 2 mixes of different concentrations. I am able to pull out intensities for the ERCC controls at different points along the processing scheme. If I pull the ERCC raw intensities, order them by increasing concentration, and transform both the concentration and intensity by log base 2, I see a nice sigmoid curve that I can fit with a cubic polynomial.

However, when I pull out the ERCC controls after summarization, reorder them by concentration, and roughly calculate the log-fold change, the values are all close to 1?? My supposition is that I am overfitting the data with RMA and that I need to find a better normalization scheme. Does anyone have any ideas for different normalization and summarization methods that I should look at, such as iter-PLIER or FARMS? Any advice or comments are welcome.

Thanks,

Matt

matthew.thornton at med.usc.edu