[BioC] differences or 1-channel (was Re: Agilent Arrays)

Sat Jun 25 03:59:39 CEST 2005

So my understanding is that if there are no technical replicates, one could
do a single-channel analysis in limma, using the duplicateCorrelation()
function with array as the blocking variable to mark the 2 samples on the
same array.  This would be equivalent to having a random effect for array,
hence allowing the simplicity of the single-channel analysis with a
statistically appropriate means of handling the within-spot correlation.
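
To make the random-effect reading concrete, here is a minimal simulation
sketch in plain Python (not limma code, and not anyone's actual analysis).
It assumes equal variance in both channels and a shared per-spot random
effect sized so that the intraspot correlation equals rho. The function
names are hypothetical, and rho = 0.8 is simply the value reported further
down the thread for replicated control spots.

```python
import random
import statistics


def simulate_spots(n_spots, rho, total_var=1.0, seed=0):
    """Simulate red/green log-intensities that share a random spot effect.

    Each channel has variance total_var; the shared effect carries a
    fraction rho of it, so corr(red, green) within a spot is rho.
    """
    rng = random.Random(seed)
    sd_spot = (rho * total_var) ** 0.5            # shared random effect
    sd_noise = ((1.0 - rho) * total_var) ** 0.5   # channel-specific noise
    red, green = [], []
    for _ in range(n_spots):
        spot = rng.gauss(0.0, sd_spot)
        red.append(spot + rng.gauss(0.0, sd_noise))
        green.append(spot + rng.gauss(0.0, sd_noise))
    return red, green


def contrast_variances(red, green):
    """Variance of red-minus-green within a spot and across different spots."""
    within = [r - g for r, g in zip(red, green)]
    # Pair each red value with the green value of the next spot instead.
    shifted_green = green[1:] + green[:1]
    across = [r - g for r, g in zip(red, shifted_green)]
    return statistics.variance(within), statistics.variance(across)


red, green = simulate_spots(n_spots=20000, rho=0.8, seed=1)
var_within, var_across = contrast_variances(red, green)
relative_efficiency = var_within / var_across

print("within-spot contrast variance: ", round(var_within, 3))
print("across-spot contrast variance: ", round(var_across, 3))
print("ratio (within / across):       ", round(relative_efficiency, 3))
```

Under these assumptions the within-spot contrast has variance
2*sigma^2*(1 - rho) against 2*sigma^2 for a contrast across arrays, so the
ratio should come out close to 1 - rho. In this simple model, ignoring the
pairing costs roughly a fraction rho of the information, which is why the
correlation has to be estimated rather than assumed away.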

--Naomi

At 09:48 PM 6/24/2005, Gordon K Smyth wrote:
>On Sat, June 25, 2005 12:36 am, Wolfgang Huber said:
> >> Basically, you're saying that if the arrays are very high quality, you can
> >> get away with an inefficient analysis.
> >
> > Gordon, I did not say that, it sounds stupid, please do not misquote
> > people.
>
>Actually I didn't quote you at all.  The word "Basically" in this context is a
>signal that I am interpreting your comments and their consequences rather than
>quoting you.  You can disagree with my interpretation or can argue that it is
>mistaken, as you do below, but being misquoted is quite a different thing! :)
>
> >> Naomi is referring to what I call the "intraspot" correlation, see for
> >> example the intraspotCorrelation() function in the limma package, and it
> >> is critically important. The correlation isn't a bad thing, nor is it
> >> restricted to poor quality arrays. Rather it means that contrasts
> >> estimated within a spot are highly accurate.
> >
> > I agree that contrasts estimated from within one array are more
> > accurate than those from different arrays.
>
>And in order to combine these two types of contrasts efficiently in an
>analysis, one needs to quantify the difference in accuracy.  Hence the need to
>estimate the intraspot correlation.
>
> > Note that when I said
> > "treat a two-color array like two single-color arrays", this was in
> > the paragraph on how to normalize, not on differential expression. But
> > apparently this still triggered off a few people ...
>
>Part of the trouble is that you continued on in the next paragraph to consider
>differential expression, and you seemed (to me at least) to be implying that
>the same conclusions continue to apply with only one caveat.  Thanks for the
>clarification.
>
>As you know, I personally prefer to take advantage of the two-colour technology
>even at the normalisation stage, but that's another matter.
>
> > Two aspects were raised by Claus' question that started this thread:
> > how to normalize these data, and how to identify differentially
> > expressed genes.  My experience is that multi-channel normalization
> > methods like vsn (or quantiles for that matter) work well for sets of
> > mass-produced two-color arrays. Then, it is still better to look at
> > contrasts within arrays. But it is at least possible (even if less
> > accurate / precise) to look at contrasts across arrays by directly
> > comparing the intensities, rather than always having to go through a
> > chain of log-ratios.
>
>Claus asked what is specific to Agilent.  As I understand it, all your comments
>here apply to any type of two-colour array.  Did you intend to say something
>specific about Agilent arrays or am I still misunderstanding what you mean?
>
> >> Why not do it properly and get the full benefit of the high
> >> quality arrays? My experience is that high quality
> >> Agilent arrays can beat affy for accuracy if treated properly.
> >
> > Agreed. Do you think it's because of the two colors or of the longer
> > (and hence more specific) probes ?
>
>Well, Affy actually has more nucleotides per gene than Agilent when one takes
>into account the multiple probes per probe set.  I don't want to speculate too
>much on the reasons, but the fact that Agilent can reliably lay down 60mers
>rather than 25mers strongly suggests that the deposition process is more
>accurate.  The two colours are certainly important.  Calculations in our lab
>suggest that one typically loses around 70% of information in a two colour
>experiment by going from direct to indirect comparisons, and 80-90% when going
>to single channel comparisons across different arrays without taking the
>intraspot correlations into account.  So Agilent may be well behind Affy if not
>treated optimally.
>
>Gordon
>
> > Best wishes
> >  Wolfgang
> >
> > <quote who="Gordon Smyth">
> >> Wolfgang,
> >>
> >> Naomi is referring to what I call the "intraspot" correlation, see for
> >> example the intraspotCorrelation() function in the limma package, and it
> >> is critically important. The correlation isn't a bad thing, nor is it
> >> restricted to poor quality arrays. Rather it means that contrasts
> >> estimated within a spot are highly accurate. It is what makes the
> >> two-colour technology intrinsically more accurate than one channel
> >> technology, other things being equal. See
> >> http://www.statsci.org/smyth/pubs/ISI2005-116.pdf for some discussion.
> >>
> >> Basically, you're saying that if the arrays are very high quality, you can
> >> get away with an inefficient analysis. Why not do it properly and get the
> >> full benefit of the high quality arrays? My experience is that high
> >> quality Agilent arrays can beat affy for accuracy if treated properly.
> >>
> >> Gordon
> >>
> >>>Date: Thu, 23 Jun 2005 15:29:38 +0100 (BST)
> >>>From: "Wolfgang Huber" <huber at ebi.ac.uk>
> >>>Subject: Re: [BioC] Agilent Arrays
> >>>To: "Naomi Altman" <naomi at stat.psu.edu>
> >>>Cc: bioconductor at stat.math.ethz.ch
> >>>
> >>>Hi Naomi,
> >>>
> >>>and why is that important? Also, what is the within gene correlation
> >>>between green foreground of array 1 and green foreground of array 2?
> >>>
> >>>Bw
> >>>  Wolfgang
> >>>
> >>><quote who="Naomi Altman">
> >>> > I am working with Agilent arrays on which we have spotted many
> >>> > replicates of the control spots.
> >>> > The within gene correlation between red and green foreground is about
> >>> > 0.8 for the unnormalized data - i.e. pretty high!
> >>> >
> >>> > --Naomi
> >>> >
> >>> > At 03:23 AM 6/23/2005, Wolfgang Huber wrote:
> >>> >>Hi Claus,
> >>> >>
> >>> >>for the normalization of arrays where the spotting etc. variability
> >>> >>between chips is not strong, you can treat the data from m two-colour
> >>> >>arrays as if it were 2*m single colour ones, and use methods like
> >>> >>"quantiles" or "vsn".
> >>> >>
> >>> >>Note that for almost all genes, the hybridization is not limited by
> >>> >>the amount of probe DNA, hence the competition between red and green
> >>> >>target is negligible for almost all genes (except possibly the most
> >>> >>highly expressed ones). This justifies treating a two-color array like
> >>> >>two single-color arrays.
> >>> >>
> >>> >>Only later when you consider the contrasts of interest for finding
> >>> >>differentially expressed genes, you want to make sure that these are
> >>> >>not confounded with dye.
> >>> >>
> >>> >>PS, I think your question is very directly Bioconductor related!
> >>> >>
> >>> >>Best wishes
> >>> >>   Wolfgang
> >>> >>
> >>> >>
> >>> >><quote who="Claus Mayer">
> >>> >> > Dear all!
> >>> >> >
> >>> >> > Apologies for asking a question which is not directly Bioconductor
> >>> >> > related: After some experience with spotted 2-channel arrays and
> >>> >> > Affy data, I am currently analysing my first data set based on
> >>> >> > Agilent arrays. I know that packages like marray or limma have
> >>> >> > facilities to read these data and that they can be normalised and
> >>> >> > analysed like any other 2-colour arrays. On the other hand the
> >>> >> > printing technology of these arrays (using inkjet-printing of 60mer
> >>> >> > oligos) is closer in spirit to Affy, if I understand this correctly.
> >>> >> > This seems to show in the data as well. For example the strongest
> >>> >> > correlations I found in the single channel (log-)intensities were
> >>> >> > not between the two channels observed on the same slide (like with
> >>> >> > spotted arrays), but between the two channels (differently dyed on
> >>> >> > different arrays in a loop design) that contained the same sample
> >>> >> > (which is quite reassuring). This made me wonder whether (once dye
> >>> >> > and array effects have been removed by some normalisation method)
> >>> >> > with Agilent arrays one might really use single channel intensities
> >>> >> > as measures of gene expression instead of reducing them to the
> >>> >> > log-ratio only as is usually done for two-channel data.
> >>> >> >
> >>> >> > This would have consequences on the way these arrays should be
> >>> >> > normalised (rather by a multichip method than individually) and
> >>> >> > also allow more flexibility in the design of experiments.
> >>> >> >
> >>> >> > As I said before this is my first Agilent data set, so I would be
> >>> >> > interested to hear opinions of others with more experience. Before
> >>> >> > I start to re-invent the wheel here, I'd also be interested to know
> >>> >> > whether any of you is aware of tools, software, papers, etc.
> >>> >> > dealing with the analysis of Agilent array data specifically
> >>> >> > (rather than just applying standard methods for 2-coloured cDNA
> >>> >> > arrays).
> >>> >> >
> >>> >> > Any help/comments appreciated
> >>> >> >
> >>> >> > Claus
> >>> >> >
> >>> >> > --
> >>> >> >
> >>> >> > ***********************************************************************
> >>> >> >  Claus-D. Mayer                       | http://www.bioss.ac.uk
> >>> >> >  Biomathematics & Statistics Scotland | email: claus at bioss.ac.uk
> >>> >> >  Rowett Research Institute            | Telephone: +44 (0) 1224 716652
> >>> >> >  Aberdeen AB21 9SB, Scotland, UK.     | Fax: +44 (0) 1224 715349
> >>> >> >
> >>> >> > _______________________________________________
> >>> >> > Bioconductor mailing list
> >>> >> > Bioconductor at stat.math.ethz.ch
> >>> >> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> >>> >> >
> >>> >> >
> >>> >>
> >>> >>
> >>> >>-------------------------------------
> >>> >>Wolfgang Huber
> >>> >>European Bioinformatics Institute
> >>> >>European Molecular Biology Laboratory
> >>> >>Cambridge CB10 1SD
> >>> >>England
> >>> >>Phone: +44 1223 494642
> >>> >>Http:  www.ebi.ac.uk/huber
> >>> >>
> >>> >
> >>> > Naomi S. Altman                                814-865-3791 (voice)
> >>> > Associate Professor
> >>> > Bioinformatics Consulting Center
> >>> > Dept. of Statistics                              814-863-7114 (fax)
> >>> > Penn State University                         814-865-1348 (Statistics)
> >>> > University Park, PA 16802-2111

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111