[BioC] External RNA controls on Rat Gene ST 2.0 chip lfc ~ 1 after xps rma??
Matthew.Thornton at med.usc.edu
Wed Jun 25 21:03:45 CEST 2014
Thank you for the suggestions! I will look at where the ERCC controls fall in the data. I am thinking of using a pared-down set of the ERCC controls that are in the 'linear' range and fall within the range of my experimental data. I plan to use the spike-in probe procedure in the vsn package. I will also try mas5 and iterate over the different processing procedures in xps. It is good to have an outside metric for assessing normalization. If my observed log-fold changes match the expected log-fold changes, it will give me a little more confidence in my data. When you process data with the ERCC controls, what normalization methods do you use?
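As a rough illustration of the observed-versus-expected comparison, here is a minimal Python sketch. The control names, the replicate values, and the 2:1 expected ratio are all hypothetical; only the idea that a 1:1 control should give a log-fold change of 0 comes from this thread. It assumes the summarized values are already on the log2 scale, so the log-fold change is a difference of means, not a ratio.

```python
from statistics import mean

def observed_lfc(log2_mix1, log2_mix2):
    """Observed log2 fold change: difference of mean log2 expression."""
    return mean(log2_mix1) - mean(log2_mix2)

# Hypothetical summarized log2 values (three replicates per mix).
# "expected" is the log2 of the nominal Mix 1 / Mix 2 ratio.
controls = {
    "ctrl_ratio_1to1": {"mix1": [8.1, 8.0, 8.2], "mix2": [8.0, 8.1, 8.2], "expected": 0.0},
    "ctrl_ratio_2to1": {"mix1": [9.4, 9.6, 9.5], "mix2": [8.6, 8.5, 8.7], "expected": 1.0},
}

report = {}
for name, c in controls.items():
    obs = observed_lfc(c["mix1"], c["mix2"])
    # Store the observed value and its deviation from the expected value.
    report[name] = (obs, obs - c["expected"])
    print(f"{name}: observed={obs:.2f} expected={c['expected']:.2f}")
```

A consistently compressed observed value (deviation pulling toward 0) across controls would point at the normalization rather than at any single probeset.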
matthew.thornton at med.usc.edu
From: Davis, Wade [davisjwa at health.missouri.edu]
Sent: Wednesday, June 25, 2014 8:26 AM
To: Thornton, Matthew; bioconductor at r-project.org
Subject: RE: [BioC] External RNA controls on Rat Gene ST 2.0 chip lfc ~ 1 after xps rma??
Here are some comments that may be helpful, but they don't directly address your question...
The "Subgroup B" ERCC spike-ins *should* have lfc=0. I like to look at that group first.
I also like to look at the raw data "within subject" across the (log) concentration range and see where the linearity breaks down and the concentrations become indistinguishable (i.e., the asymptotic parts of the sigmoid curve); I am suspicious of any differences among groups for a gene with expression levels falling in those areas.
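One way to make "where the linearity breaks down" concrete is to look at local slopes between adjacent concentration points. The following Python sketch is only an illustration: the function name linear_range, the slope threshold of 0.5, and the data are assumptions, not part of any package discussed here. It expects distinct concentration values within one sample.

```python
def linear_range(log_conc, log_int, min_slope=0.5):
    """Return (low, high) log-concentration bounds of the longest run of
    consecutive segments whose local slope is at least min_slope.
    Returns None if no segment qualifies."""
    pairs = sorted(zip(log_conc, log_int))
    best = None   # (start_index, end_index) of the longest qualifying run
    start = None  # start index of the run currently being extended
    for i in range(len(pairs) - 1):
        (x0, y0), (x1, y1) = pairs[i], pairs[i + 1]
        slope = (y1 - y0) / (x1 - x0)
        if slope >= min_slope:
            if start is None:
                start = i
            run = (start, i + 1)
            if best is None or run[1] - run[0] > best[1] - best[0]:
                best = run
        else:
            start = None
    if best is None:
        return None
    return pairs[best[0]][0], pairs[best[1]][0]

# Hypothetical within-sample data: flat at both ends, roughly linear between.
log_conc = [0, 2, 4, 6, 8, 10, 12, 14]
log_int = [5.0, 5.1, 5.6, 7.2, 9.0, 10.8, 11.9, 12.1]
bounds = linear_range(log_conc, log_int)
print(bounds)
```

Controls outside the returned bounds sit on the flat parts of the curve, where differences between groups are hard to trust.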
You might also consider looking at a density plot for each sample with a rug plot showing the values of the ERCC controls. (Non-graphically, use ecdf() to see where they fall in each sample.) Are the upper tails dominated by ERCCs? If so, I would be concerned about using RMA because quantile normalization may be too strong in the presence of such (intentional) differences. For example, Mix 1 has a max concentration of 30,000 while Mix 2 only goes up to 15,000. Based on my understanding, if those controls are indeed the strongest signals in your samples, then by definition they would be equal after RMA. Indeed, Bolstad et al. (2003) mention this in their quantile normalization paper, which is one of the three papers that make up the RMA procedure:
"One possible problem with this method is that it forces
the values of quantiles to be equal. This would be
most problematic in the tails where it is possible that a
probe could have the same value across all the arrays.
However, in practice, since probeset expression measures
are typically computed using the value of multiple probes,
we have not found this to be a problem."
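To see the tail effect on a toy example, here is a minimal Python sketch of quantile normalization. It is a simplification: ties are not averaged, and the intensities are made up. In RMA the normalization is applied at the probe level before summarization, so real probeset values need not come out exactly equal, as the quoted passage notes.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length lists of intensities.
    Ties are broken by position; tied ranks are not averaged."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean across arrays at each rank.
    ref = [sum(s[r] for s in sorted_arrays) / len(arrays) for r in range(n)]
    out = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        normed = [0.0] * n
        for rank, idx in enumerate(order):
            normed[idx] = ref[rank]
        out.append(normed)
    return out

# Hypothetical log2 intensities; the last entry is a spike-in that is the
# strongest signal on both arrays but differs by 1.0 between the mixes.
array_mix1 = [5.0, 6.0, 7.0, 8.0, 14.0]
array_mix2 = [5.2, 6.1, 6.9, 8.1, 13.0]
norm1, norm2 = quantile_normalize([array_mix1, array_mix2])
print("before:", array_mix1[-1] - array_mix2[-1])
print("after: ", norm1[-1] - norm2[-1])
```

Because the spike-in holds the top rank on both arrays, it is assigned the same reference value on each, and the intended difference disappears.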
Based on this, I would filter out ERCC controls that are in the non-linear range or dominate the tails; you want the ERCC controls you use to be intermingled with the "real" data to help avoid these problems.
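A possible way to automate that filter is to compute each control's position in the empirical CDF of the sample and drop those in the extreme tails. This Python sketch is an assumption-laden illustration: the function names, the 5% and 95% cutoffs, and all values are hypothetical.

```python
from bisect import bisect_right

def ecdf_position(sample_values, x):
    """Fraction of sample values <= x (empirical CDF evaluated at x)."""
    ordered = sorted(sample_values)
    return bisect_right(ordered, x) / len(ordered)

def keep_intermingled(sample_values, ercc, lower=0.05, upper=0.95):
    """Keep controls whose ECDF position lies within [lower, upper]."""
    return {
        name: value
        for name, value in ercc.items()
        if lower <= ecdf_position(sample_values, value) <= upper
    }

# Hypothetical log2 intensities for the non-control probesets of one sample.
sample = [i / 10 + 2 for i in range(1, 101)]  # 2.1, 2.2, ..., 12.0
# Hypothetical control values for the same sample.
ercc = {"ctrl_low": 2.05, "ctrl_mid": 7.05, "ctrl_high": 13.5}

kept = keep_intermingled(sample, ercc)
print(kept)
```

Here the sample values exclude the controls themselves; running it per sample and keeping only controls retained in every sample would be one conservative choice.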
Just some thoughts!
From: Thornton, Matthew [mailto:Matthew.Thornton at med.usc.edu]
Sent: Tuesday, June 24, 2014 2:39 PM
To: bioconductor at r-project.org
Subject: [BioC] External RNA controls on Rat Gene ST 2.0 chip lfc ~ 1 after xps rma??
I am processing Affymetrix Rat Gene 2.0 ST chips with the Bioconductor package xps using RMA normalization. I have included the ExFold ERCC external RNA controls with 2 mixes of different concentrations. I am able to pull out intensities for the ERCC controls at different points along the processing scheme. If I pull the ERCC raw intensities, order them by increasing concentration, and transform both the concentration and intensity by log base 2, I see a nice sigmoid curve that I can fit with a cubic polynomial.
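For reference, the log2 transform and cubic fit described above could look roughly like this in Python. It is a sketch under stated assumptions: the concentrations and intensities are invented, fit_cubic is a hypothetical helper that solves the normal equations directly (adequate for a handful of centered points, not a general-purpose fitter), and it is not the code actually used on this data.

```python
from math import log2

def fit_cubic(x, y):
    """Least-squares fit of y ~ c0 + c1*x + c2*x**2 + c3*x**3.
    Solves the normal equations by Gaussian elimination with pivoting."""
    n = 4
    A = [[sum(xi ** (j + k) for xi in x) for k in range(n)] for j in range(n)]
    b = [sum(yi * xi ** j for xi, yi in zip(x, y)) for j in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for k in range(col, n):
                A[r][k] -= f * A[col][k]
            b[r] -= f * b[col]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(A[r][k] * coef[k] for k in range(r + 1, n))
        coef[r] = (b[r] - s) / A[r][r]
    return coef

# Hypothetical control concentrations and raw intensities for one array.
conc = [0.5, 2, 8, 32, 128, 512, 2048, 8192]
raw = [40, 45, 70, 200, 800, 2500, 5000, 6000]

# Order by concentration, then log2-transform both axes.
pairs = sorted(zip(conc, raw))
x = [log2(c) for c, _ in pairs]
y = [log2(i) for _, i in pairs]
x_mean = sum(x) / len(x)
xc = [xi - x_mean for xi in x]  # centering keeps the system well conditioned
coef = fit_cubic(xc, y)
print([round(c, 4) for c in coef])
```

The coefficients apply to the centered log2 concentration, so subtract x_mean before evaluating the fitted curve at a new point.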
However, when I pull out the ERCC controls after summarization, reorder them by concentration, and roughly calculate the log-fold changes, they are all close to 1. My supposition is that I am overfitting the data with RMA and that I need to find a better normalization scheme. Does anyone have ideas for other normalization and summarization methods that I should look at, such as iter-PLIER or FARMS? Any advice or comments are welcome.
matthew.thornton at med.usc.edu