[BioC] filtered Exon Arrays: Core vs Extended Dataset
mrobinson at wehi.EDU.AU
Thu May 7 02:39:11 CEST 2009
I can offer my view on what you are seeing.
Some of the ~120,000 transcript clusters in the extended set are also
represented in the core set, just with more probesets included in
them. You might say the extended set is a superset of the core set
(I'm assuming that when you say extended, you really mean
core+extended). Because the extended set includes probesets based on
lower-confidence annotation (e.g. EST-only evidence), these extra
probes will be measuring background at a higher rate.
So, would a differentially expressed (DE) core transcript also be DE
in the extended set? Some of the time. But a lot of the time the extra
probes that make up the probeset will measure non-existent ESTs (i.e.
background) and dilute the ability to detect DE.
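To make the dilution argument concrete, here is a toy simulation in base R. Everything in it is an assumption for illustration only: the probe counts, the noise level, the fold change, and the use of a plain mean over probes as the summary (real exon array summarisation, e.g. RMA or PLIER, behaves differently in detail).

```r
# Toy simulation of signal dilution; all numbers are made up for illustration.
set.seed(1)
n_per_group <- 5    # arrays per condition
n_core      <- 4    # probes carrying a real signal
n_extra     <- 12   # background-only probes added by the extended set
shift       <- 1    # true log2 fold change in the core probes

group <- rep(c(0, 1), each = n_per_group)

# One row per probe, one column per array; noise SD of 0.5 everywhere.
core  <- matrix(rnorm(n_core * 2 * n_per_group, sd = 0.5), nrow = n_core)
core  <- sweep(core, 2, shift * group, "+")  # add the shift to group-1 arrays
extra <- matrix(rnorm(n_extra * 2 * n_per_group, sd = 0.5), nrow = n_extra)

# Summarise the transcript as the mean over its probes (a simplification).
core_summary     <- colMeans(core)
extended_summary <- colMeans(rbind(core, extra))

# Observed log2 fold change between the two groups.
fc <- function(x) mean(x[group == 1]) - mean(x[group == 0])
fc(core_summary)      # estimate of the true shift
fc(extended_summary)  # pulled towards shift * n_core / (n_core + n_extra)
```

Under these assumptions the averaged noise also shrinks when more probes are added, but only by the square root of the probe count, while the signal shrinks in direct proportion to the fraction of background probes, so the net effect is a weaker test statistic.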
Of course, I could be wrong. You might verify this for yourself by
looking at the probe-level data for a transcript that is very DE in
the core set and not DE in the extended data ...
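A minimal sketch of that probe-level check, assuming you have already extracted a log2 probe-by-array matrix for one transcript and know which probes belong to the core set. The helper `probe_logfc`, the objects `probe_mat` and `is_core`, and the toy values are all hypothetical stand-ins, not output from any package.

```r
# Hypothetical helper: per-probe log2 fold change between two groups.
probe_logfc <- function(probe_mat, group) {
  rowMeans(probe_mat[, group == 1, drop = FALSE]) -
    rowMeans(probe_mat[, group == 0, drop = FALSE])
}

# Toy stand-in for one transcript: 3 core probes and 3 extended-only probes.
group <- c(0, 0, 1, 1)
probe_mat <- rbind(
  core1 = c(8.0, 8.2,  9.1,  9.3),
  core2 = c(7.5, 7.4,  8.6,  8.5),
  core3 = c(9.0, 9.1, 10.0, 10.2),
  ext1  = c(4.1, 4.0,  4.2,  3.9),
  ext2  = c(3.8, 4.2,  4.0,  4.1),
  ext3  = c(4.3, 4.1,  4.0,  4.2)
)
is_core <- grepl("^core", rownames(probe_mat))

# Compare the fold changes of core probes with those of the extra probes.
lfc <- probe_logfc(probe_mat, group)
tapply(lfc, ifelse(is_core, "core", "extended-only"), mean)
```

If the extended-only probes sit near background and show fold changes around zero while the core probes move together, that would be consistent with the dilution explanation.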
On 07/05/2009, at 6:55 AM, Lana Schaffer wrote:
> I have used Limma with both the core (~17,000) and extended (~120,000)
> Affymetrix datasets. Do you think that significant transcripts in the
> core dataset would also be found to be significant in the extended
> dataset?
> I have found that ~88% of the significantly expressed transcripts from
> the core dataset are not found among the significantly expressed
> transcripts of the extended dataset.
> Furthermore, 86% (1352/1575) of those significant core transcripts are
> found in the filtered extended dataset (input to Limma), but are not
> found to be significant in the filtered extended dataset.
>                                      Core         Extended
> Limma: adj.pvalue=0.05               1575         1142
> overlap extended filtered dataset    1352 (86%)
> datasets                             17,939       112,213
> filtered datasets                    17,939       61,717
> Filtering was performed by standard deviation according to the
> following code.
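> library(genefilter)  # assumed source of rowSds()
> rs = rowSds(GL.un)   # per-row standard deviation
> lambda = 0.45        # drop rows at or below the 45th percentile of SD
> filtered = GL.un[ rs > quantile(rs, lambda, na.rm=TRUE), ]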
> What are your suggestions for this discrepancy?
> Lana Schaffer
> The Scripps Research Institute
> DNA Array Core Facility
> La Jolla, CA 92037
> (858) 784-2263
> (858) 784-2994
> schaffer at scripps.edu
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robinson at garvan.org.au
e: mrobinson at wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852