[BioC] HGU133Plus2 CDF vs hgu133plus2hsentrezgcdf CDF (30% difference in results)

Steve Lianoglou lianoglou.steve at gene.com
Sun Sep 14 15:51:26 CEST 2014


On Sat, Sep 13, 2014 at 11:31 AM, Mahes Muniandy [guest]
<guest at bioconductor.org> wrote:
> Hello,
> My name is Mahes Muniandy and I am a doctoral student. I have been analysing Affymetrix HGU133Plus2 CEL files to determine differential expression in twin pairs (within-pair differences). I have used affy, gcrma, nsFilter and limma to do my analysis. I ran the analysis using the HGU133Plus2 CDF available in Bioconductor and then repeated the whole analysis using the HGU133Plus2 CDF from Brainarray. The limma results differ significantly (2351 differentially expressed genes for the former and 2700 genes for the latter analysis). 630 genes (about 30%) of the 2351 genes do not appear in the list of 2700 genes.
> I have read "Evolving Gene/Transcript Definitions Significantly Alter the Interpretation of GeneChip Data", M. Dai et al., and see some convincing arguments there. But I am confused about which limma results to go with. Could you advise me on the guiding principles that I should follow in order to decide which CDF to use? I do realise that the onus is on me to decide but, sadly, I am quite lost in this matter. I would appreciate any help available.

I'd start by checking whether the genes that are significant in one
analysis but not the other seem reasonable for your experiment (i.e.,
run a GO analysis on the differing genes and see whether the enriched
terms are relevant to the data/treatment you are studying).
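
As a minimal sketch of how to pull out those differing genes (Python
is used here only for illustration; the same set operations exist in
R as setdiff() and intersect()). The gene IDs below are hypothetical,
and the sketch assumes both result lists have already been mapped to
a shared identifier such as Entrez Gene IDs:

```python
# Hypothetical significant-gene lists from the two limma runs,
# keyed by a shared identifier (assumed here: Entrez Gene IDs).
affy_sig = {"1017", "1029", "7157", "672"}
brainarray_sig = {"1017", "7157", "5728"}

# Genes called significant by both analyses.
shared = affy_sig & brainarray_sig

# Genes called significant by only one analysis; these are the
# lists to feed into a GO analysis.
only_affy = affy_sig - brainarray_sig
only_brainarray = brainarray_sig - affy_sig

print("shared:", sorted(shared))
print("only standard CDF:", sorted(only_affy))
print("only Brainarray CDF:", sorted(only_brainarray))
```

Each "only" list can then be tested for GO enrichment separately.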

Another thing to check is a scatter plot of the t-statistics from one
analysis against those from the other. Are the differences you are
seeing the result of genes dancing around the thresholds of
significance? If you define significance by a certain FDR *and* a
minimum absolute log-fold-change, you might get better concordance.
If the concordance still isn't good, I'd go back to the differing
genes and try to interpret them to see which analysis makes more
sense.
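
A rough sketch of that threshold check, again in Python purely for
illustration. The gene names, the (logFC, adjusted p-value, t) values,
and the helper names significant(), overlap_fraction() and paired_t()
are all made up for this example; real input would come from the
limma result tables of the two runs, matched on a shared gene ID:

```python
def significant(results, fdr=0.05, min_abs_lfc=0.0):
    """Gene IDs passing an FDR cutoff and a minimum |log fold change|."""
    return {
        gene
        for gene, (lfc, adj_p, _t) in results.items()
        if adj_p <= fdr and abs(lfc) >= min_abs_lfc
    }


def overlap_fraction(a, b):
    """Jaccard index: shared genes divided by genes in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def paired_t(results_a, results_b):
    """(gene, t_a, t_b) for genes present in both analyses, for plotting."""
    common = sorted(results_a.keys() & results_b.keys())
    return [(g, results_a[g][2], results_b[g][2]) for g in common]


# Hypothetical results: gene -> (logFC, adjusted p-value, t-statistic)
standard_cdf = {
    "g1": (1.8, 0.001, 6.2),
    "g2": (0.3, 0.040, 2.9),
    "g3": (-1.2, 0.010, -4.8),
    "g4": (0.2, 0.200, 1.1),
}
brainarray_cdf = {
    "g1": (1.7, 0.002, 5.9),
    "g2": (0.3, 0.060, 2.7),
    "g3": (-1.1, 0.020, -4.5),
    "g4": (0.25, 0.045, 2.8),
}

# Concordance with an FDR cutoff only.
fdr_only = overlap_fraction(
    significant(standard_cdf), significant(brainarray_cdf)
)

# Concordance with an FDR cutoff *and* a minimum |logFC| of 1.
fdr_and_lfc = overlap_fraction(
    significant(standard_cdf, min_abs_lfc=1.0),
    significant(brainarray_cdf, min_abs_lfc=1.0),
)

print("FDR only:", fdr_only)
print("FDR + |logFC| >= 1:", fdr_and_lfc)
print(paired_t(standard_cdf, brainarray_cdf))
```

In this toy data, g2 and g4 sit just either side of the FDR cutoff in
the two runs, so they disagree under the FDR-only rule and drop out
of both lists once the fold-change filter is added. The pairs from
paired_t() are what you would put on the two axes of the scatter
plot.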


Steve Lianoglou
Computational Biologist
