[BioC] HGU133Plus2 CDF vs hgu133plus2hsentrezgcdf CDF (30% difference in results)

Marcin Cieślik marcin.cieslik at gmail.com
Tue Sep 16 00:30:25 CEST 2014


Hi All,

> you could re-align the probe sequences against the current genome and
> see how many are still measuring the intended target.

I did this recently for the human Agilent whole-genome array, and I
strongly suggest using a spliced aligner (STAR worked very well). Bottom
line: the number of probes mapped to a gene with an Ensembl ID or an
approved symbol actually increased.
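
For anyone who wants to try the counting step, here is a minimal Python
sketch. It assumes the alignments and the gene annotation have already been
parsed into plain tuples; the function names, the toy coordinates and the
rule "a probe is usable if it overlaps exactly one gene" are illustrative
assumptions, not the procedure I actually ran. It also treats each alignment
as a single interval, which ignores the separate blocks of a spliced
alignment.

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def assign_probes(alignments, genes):
    """Map each probe to the set of genes that its alignments overlap.

    alignments: iterable of (probe_id, chrom, start, end)
    genes:      iterable of (gene_id, chrom, start, end)
    """
    genes_by_chrom = defaultdict(list)
    for gene_id, chrom, start, end in genes:
        genes_by_chrom[chrom].append((gene_id, start, end))

    hits = {}
    for probe_id, chrom, start, end in alignments:
        # Register the probe even if none of its alignments hits a gene.
        probe_hits = hits.setdefault(probe_id, set())
        for gene_id, g_start, g_end in genes_by_chrom.get(chrom, ()):
            if overlaps(start, end, g_start, g_end):
                probe_hits.add(gene_id)
    return hits


def summarize(hits):
    """Count probes hitting exactly one gene, several genes, or none."""
    return {
        "unique": sum(1 for g in hits.values() if len(g) == 1),
        "ambiguous": sum(1 for g in hits.values() if len(g) > 1),
        "unmapped": sum(1 for g in hits.values() if not g),
    }


# Hypothetical toy data, not real probe or gene coordinates.
genes = [
    ("GENE_A", "chr1", 100, 500),
    ("GENE_B", "chr1", 450, 900),
    ("GENE_C", "chr2", 100, 300),
]
alignments = [
    ("p1", "chr1", 120, 179),    # inside GENE_A only
    ("p2", "chr1", 460, 519),    # overlaps GENE_A and GENE_B
    ("p3", "chr2", 1000, 1059),  # no annotated gene here
    ("p4", "chr2", 250, 309),    # overlaps GENE_C
]

hits = assign_probes(alignments, genes)
print(summarize(hits))
```

Running the same summary on the vendor annotation and on the re-aligned
probes gives two sets of counts that can be compared directly.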


Yours,
Marcin



On Mon, Sep 15, 2014 at 10:58 AM, James W. MacDonald <jmacdon at uw.edu> wrote:

> Another thing to consider is that the probesets for that array are based on
> UniGene build 133, which was current somewhere around 10 years ago (if not
> longer). That is a long time ago, considering the speed with which the
> human genome has been updated, so there may be many probesets on that array
> that no longer measure anything recognizable.
>
> If you care to find out how bad (or good) the conventional Affymetrix
> probeset definitions are, you could re-align the probe sequences against
> the current genome and see how many are still measuring the intended
> target. Or you could assume that the updated alignments from MBNI are
> better, and just go with that (certainly easier, but you know what they say
> about assumptions...).
>
> Personally, I would go with option A, which would have two benefits. One,
> you would get to have some fun learning how to do something different. And
> really, who doesn't like that? Two, it would give you a rock-solid
> rationale for your choice of CDF, which should be impressive to your
> advisor because you a) thought about the problem and then b) did something
> to actively quantify the differences, so you can make an informed choice.
>
> Best,
>
> Jim
>
>
>
> On Sun, Sep 14, 2014 at 9:51 AM, Steve Lianoglou
> <lianoglou.steve at gene.com> wrote:
>
> > Hi,
> >
> > On Sat, Sep 13, 2014 at 11:31 AM, Mahes Muniandy [guest]
> > <guest at bioconductor.org> wrote:
> > > Hello,
> > > My name is Mahes Muniandy and I am a doctoral student. I have been
> > > analysing Affymetrix HGU133Plus2 CEL files to determine differential
> > > expression in twin pairs (within-pair differences). I have used affy,
> > > gcrma, nsFilter and limma for my analysis. I ran the analysis using
> > > the HGU133Plus2 CDF available in Bioconductor and then repeated the
> > > whole analysis using the HGU133Plus2 CDF from Brainarray. The limma
> > > results differ significantly (2351 differentially expressed genes for
> > > the former and 2700 genes for the latter analysis). 630 genes (about
> > > 30%) of the 2351 genes do not appear in the list of 2700 genes.
> > >
> > > I have read "Evolving Gene/Transcript Definitions Significantly Alter
> > > the Interpretation of GeneChip Data" (M. Dai et al.) and see some
> > > convincing arguments there. But I am confused about which limma
> > > results to go with. Could you advise me on the guiding principles that
> > > I should follow in order to decide which CDF to use? I do realise that
> > > the onus is on me to decide but, sadly, I am quite lost in this matter.
> > > I would appreciate any help available.
> >
> > I'd start by investigating whether or not the genes included in one
> > analysis and not the other seem reasonable for your experiment (i.e., do
> > some GO analysis on the differences and see if they are relevant to
> > the data/treatment you are studying).
> >
> > Another thing to check is to plot the t-statistics from each analysis
> > against each other. Is the difference you are seeing a result of genes
> > dancing around thresholds of significance? If you define significance
> > by a certain FDR *and* a minimum absolute log-fold-change, you might
> > get better concordance. If the concordance still isn't perfect, I'd go
> > back and start looking at the differing genes and try to interpret the
> > differences to see which analysis makes more sense.
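
The threshold check described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: both result
tables are assumed to be keyed by the same gene identifier (with the
standard CDF that means collapsing probesets to genes first), each entry is
assumed to hold (logFC, adjusted p-value, t-statistic), and the function
names, cutoffs and toy values are hypothetical rather than taken from limma
or from the original analysis.

```python
def significant(results, max_fdr=0.05, min_abs_lfc=0.0):
    """Gene IDs passing both an FDR cutoff and a minimum |log fold change|."""
    return {
        gene
        for gene, (lfc, adj_p, _t) in results.items()
        if adj_p <= max_fdr and abs(lfc) >= min_abs_lfc
    }


def compare(results_a, results_b, max_fdr=0.05, min_abs_lfc=0.0):
    """Concordance of the significant gene lists from two analyses."""
    sig_a = significant(results_a, max_fdr, min_abs_lfc)
    sig_b = significant(results_b, max_fdr, min_abs_lfc)
    only_a = sig_a - sig_b
    return {
        "n_a": len(sig_a),
        "n_b": len(sig_b),
        "shared": len(sig_a & sig_b),
        "only_a": len(only_a),
        "frac_a_missing_from_b": len(only_a) / len(sig_a) if sig_a else 0.0,
    }


def paired_t(results_a, results_b):
    """(t_a, t_b) pairs for genes present in both analyses, for a scatter plot."""
    shared = sorted(results_a.keys() & results_b.keys())
    return [(results_a[g][2], results_b[g][2]) for g in shared]


# Hypothetical toy results: gene -> (logFC, adjusted p-value, t-statistic).
standard = {
    "G1": (1.8, 0.001, 6.2),
    "G2": (0.3, 0.04, 2.9),    # significant, but just under the FDR cutoff
    "G3": (-1.2, 0.01, -4.1),
    "G4": (0.2, 0.30, 1.1),
}
brainarray = {
    "G1": (1.7, 0.002, 5.9),
    "G2": (0.25, 0.06, 2.6),   # same gene, just over the FDR cutoff
    "G3": (-1.3, 0.008, -4.4),
    "G5": (1.1, 0.03, 3.3),
}

print(compare(standard, brainarray))                   # FDR cutoff only
print(compare(standard, brainarray, min_abs_lfc=1.0))  # FDR plus |logFC| >= 1
print(paired_t(standard, brainarray))
```

In the toy data, G2 is the kind of gene that dances around the threshold:
its t-statistics are close in both analyses, yet it is called significant
in only one of them until the fold-change filter removes it from both lists.
For the counts quoted in the original question, the same fraction is
630 / 2351, or roughly 0.27.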
> >
> > HTH,
> > -steve
> >
> > --
> > Steve Lianoglou
> > Computational Biologist
> > Genentech
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
> > http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
>
>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>
