[BioC] question regarding differential expression

Ina Hoeschele inah at vbi.vt.edu
Mon Sep 27 23:27:04 CEST 2010


Hi,
  in a couple of months we will begin data collection on a major epigenomics project using the Illumina Infinium 450K platform. In the past I have used beadarray and methylumi to import the methylation data into R and perform some basic QC and analyses. My main concern now is the switch from the 27K to the 450K chip. When will the new chip be fully supported by revised packages (roughly)?
Thanks, Ina

----- Original Message -----
From: "Sean Davis" <sdavis2 at mail.nih.gov>
To: "Jack Luo" <jluo.rhelp at gmail.com>
Cc: "James W. MacDonald" <jmacdon at med.umich.edu>, bioconductor at stat.math.ethz.ch
Sent: Monday, September 27, 2010 3:06:56 PM
Subject: Re: [BioC] question regarding differential expression

On Mon, Sep 27, 2010 at 2:47 PM, Jack Luo <jluo.rhelp at gmail.com> wrote:

> Jim,
>
> Thanks for your detailed explanation on this, it's really helpful. I agree
> with you that the term "present/absent" might be problematic; perhaps a more
> accurate term is reliable/unreliable. I am not sure I agree that the
> technical and biological variability are completely confounded: it's a
> well-randomized experiment with disease/healthy status, not something like
> all disease in one day/batch/..., all healthy in another day/batch/.... The
> last two paragraphs of your email answered my question very accurately
> (that's exactly what I am asking). Sorry to bother you with another
> question: do you think the difference is technical or biological? In our
> data, we have the same set of samples (say, 100 healthy vs. 100 disease) run
> in two different batches (batch differences could be due to lots of things,
> like reagent, hybwash...). Comparing the differential expression from one
> batch to another, I found many genes that are differentially expressed in
> the 1st batch that are like gene B: a higher present% call in one group than
> in the other. However, in the 2nd batch, lots of them lose the present%
> difference between the two groups and also go from differentially expressed
> to non-differentially expressed (I found this for both RMA and MAS5), which
> makes me wonder whether the differential expression in the 1st batch is due
> to technical reasons, not biological ones (since the biology of the two
> batches is identical because they come from the same 200 samples).
>
>
Hi, Jun.

It is not unexpected for a differentially expressed gene to show a
different number of P/A calls in one group than in the other.  In the ideal
case, a gene is highly expressed in one group and not expressed at all in
the other.  Jim made the point that P/A calls only roughly measure actual
presence or absence, so take them with a grain of salt.  Also, a difference
in P/A calls between groups does not by itself imply that a gene is
differentially expressed, or that it is not.  The hypothesis test that uses
the measured signal is what is usually considered when looking at
differential expression.
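
As a rough illustration of testing the measured signal rather than counting
P/A calls, here is a minimal sketch on simulated data. Everything in it is an
assumption made for illustration: the log2 signal values are random numbers,
the fixed CUTOFF is only a crude stand-in for a detection call (it is not the
MAS5 detection algorithm), and a simple permutation test stands in for
whatever test you would normally run.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        # Shuffle group labels and recompute the mean difference.
        rng.shuffle(pooled)
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def present_fraction(signals, cutoff):
    """Fraction of samples whose signal exceeds a (hypothetical) cutoff."""
    return sum(s > cutoff for s in signals) / len(signals)


CUTOFF = 7.0  # hypothetical log2 cutoff, NOT a real detection call
rng = random.Random(1)

# Gene near the cutoff, same true mean in both groups: call fractions can
# differ by chance even though there is no true difference in signal.
near_healthy = [rng.gauss(7.0, 0.5) for _ in range(20)]
near_disease = [rng.gauss(7.0, 0.5) for _ in range(20)]

# Gene far above the cutoff in both groups, shifted by 1 log2 unit:
# identical call fractions, but a clear difference in signal.
high_healthy = [rng.gauss(10.0, 0.5) for _ in range(20)]
high_disease = [rng.gauss(11.0, 0.5) for _ in range(20)]

for name, a, b in [("near cutoff", near_healthy, near_disease),
                   ("well above cutoff", high_healthy, high_disease)]:
    print(name,
          "present%:", present_fraction(a, CUTOFF), present_fraction(b, CUTOFF),
          "p-value:", permutation_p_value(a, b))
```

The second gene is called "present" in every sample of both groups, yet the
test on the signal separates the groups clearly, which is why the call counts
alone say little either way.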

Sean
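
Jim's point in the quoted message below, that randomizing sample types over
processing days lets technical variability show up as noise instead of as a
group difference, can be sketched with a small simulation. This is only an
illustration under stated assumptions: the gene has no true biological
difference, the "day effect" is a made-up additive shift of 1 log2 unit, and
all names and numbers are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def apparent_group_difference(design, n_per_group=100, day_effect=1.0,
                              noise_sd=0.5, seed=0):
    """Return mean(disease) - mean(healthy) for a gene with NO true difference.

    design="confounded": all healthy samples on day 0, all disease on day 1.
    design="randomized": processing days are shuffled across all samples.
    """
    rng = random.Random(seed)
    groups = ["healthy"] * n_per_group + ["disease"] * n_per_group
    days = [0] * n_per_group + [1] * n_per_group
    if design == "randomized":
        rng.shuffle(days)
    elif design != "confounded":
        raise ValueError("design must be 'confounded' or 'randomized'")

    signal = {"healthy": [], "disease": []}
    for group, day in zip(groups, days):
        # Same baseline for both groups; only the processing day shifts it.
        signal[group].append(8.0 + day_effect * day + rng.gauss(0.0, noise_sd))
    return mean(signal["disease"]) - mean(signal["healthy"])


confounded_diff = apparent_group_difference("confounded")
randomized_diff = apparent_group_difference("randomized")
print("confounded design, apparent difference:", round(confounded_diff, 3))
print("randomized design, apparent difference:", round(randomized_diff, 3))
```

In the confounded design the whole day effect is attributed to disease
status; in the randomized design it is spread over both groups and ends up as
extra within-group spread.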


> On Fri, Sep 24, 2010 at 9:40 AM, James W. MacDonald
> <jmacdon at med.umich.edu> wrote:
>
> > Hi Jack,
> >
> >
> > On 9/23/2010 4:45 PM, Jack Luo wrote:
> >
> >> Hi,
> >>
> >> This is a conceptual question related to microarrays, rather than the
> >> usage of any Bioconductor package. I apologize if this bothers anyone.
> >>
> >> I am struggling to understand the concept of differential expression in
> >> terms of its sources (whether it is technical or biological). Suppose I
> >> have an experiment with two groups (healthy vs. disease) and try to find
> >> some differentially expressed genes. Take two genes, for example, both of
> >> which are differentially expressed (DE) between healthy and disease.
> >>
> >> Gene A has a present detection call for all the samples under study (but
> >> the detection call p-value in the healthy group is on the order of 1e-2 ~
> >> 1e-3, while the detection call p-value in the disease group is much more
> >> significant (say, 1e-10)).
> >> Gene B has a 50% present call in healthy and a 100% present call in cancer.
> >>
> >
> > First let's backtrack and talk about P/M/A calls, and what they mean. The
> > statistics underlying these calls test whether or not the PM probes in
> > aggregate appear to be different from the corresponding MM probes in a
> > given probeset. Others will disagree, but I think it is incorrect to
> > assume that an absent call means that the transcript being measured is
> > absent. What it really means is that we cannot say that the PM probes are
> > binding more transcript than the MM probes.
> >
> > If you make the assumption that the MM probes do a good job of measuring
> > background, then the absent call really means it is absent. However, a
> > large percentage of MM probes have higher fluorescence readings than the
> > corresponding PM probe (it varies by chip, but is usually > 30%; you can
> > check with your data to verify). In addition, the MM probe intensity will
> > increase with increasing amounts of transcript. These are two of the
> > reasons that Affy has abandoned the use of MM probes (more real estate on
> > the chip being a third), and why very few people use MAS5 for computing
> > expression values any more.
> >
> > So I would personally caution you against interpreting these p-values as
> > indicating presence or absence of the transcript.
> >
> > As to your question, technical and biological variability are completely
> > confounded here, so you have to set up your experiments in such a way
> > that the contribution from technical variability is minimized. For
> > instance, if you do all controls one day and diseased the next, you cannot
> > possibly tell if any differences were due to biology or to technical
> > differences. However, if you randomize sample types over days processed,
> > then the technical variability (which still exists, and is confounded with
> > biological variability) will tend to appear as noise, and be captured by
> > the residual term.
> >
> > Also, in my opinion there isn't any difference between the two situations
> > (assuming I understand situation B correctly). What I think you are
> > asking is this: are there any substantive differences between a situation
> > where a gene is apparently unexpressed in sample A but expressed to a
> > certain degree in sample B, and a situation where a gene is expressed in
> > both samples, but at a two-fold (or greater) level in B vs. A?
> >
> > In my opinion, there is no difference between those scenarios. In each
> > situation, the gene is expressed at a much lower level in one sample
> > versus the other. The relative levels are unimportant, as the absolute
> > accuracy of our measuring device is not good.
> >
> > Best,
> >
> > Jim
> >
> >
> >
> >> My question is: what's the correct interpretation in terms of whether the
> >> differential expression is technical or biological? Are they both DE for
> >> technical reasons, or is A DE for biological reasons and B for technical
> >> ones, or are they both DE for biological reasons?
> >>
> >> Thanks a bunch,
> >>
> >> -Jack
> >>
> >> _______________________________________________
> >> Bioconductor mailing list
> >> Bioconductor at stat.math.ethz.ch
> >> https://stat.ethz.ch/mailman/listinfo/bioconductor
> >> Search the archives:
> >> http://news.gmane.org/gmane.science.biology.informatics.conductor
> >>
> >
> > --
> > James W. MacDonald, M.S.
> > Biostatistician
> > Douglas Lab
> > University of Michigan
> > Department of Human Genetics
> > 5912 Buhl
> > 1241 E. Catherine St.
> > Ann Arbor MI 48109-5618
> > 734-615-7826
> > **********************************************************
> > Electronic Mail is not secure, may not be read every day, and should not
> > be used for urgent or sensitive issues
> >
