[BioC] Agilent CGH data

Tue Sep 25 18:50:10 CEST 2007

Sean Davis wrote:
> jhs1jjm at leeds.ac.uk wrote:
>> R 2.5.0 on openSUSE 10.2 x86_64.
>>
>> Hi,
>>
>> I'm using the arrayQuality package to analyse 3 44k Agilent CGH arrays with the
>> aim of identifying regions of gain/loss.
>>
>> With the HTML report generated using the agQuality function I'm not getting the
>> coloured loess curve on the MA plot for raw M. Additionally I'm only getting 1
>> value for the dot plot of controls normalized M values (-)3xLv1 (n=330) and
>> likewise for the control A values. Alternatively when I run the maQualityPlots
>> function on my mraw object created in marray I get these but don't get the
>> comparative box plot.
>>
>> Firstly is this important as I'm unsure of how useful the comparative boxplots
>> are as some values are NA? Secondly is this an appropriate tool to use and are
>> there any others that may be of more use both for quality control and for
>> analysis further down the line? Thank you kindly for any input.
> 
> Hi, John.  Are these CGH arrays or expression arrays?  The two probably
> need some different treatment.  You imply you are using CGH arrays in
> looking for regions of gain/loss.  Is this the case?

And, then, of course, there is the subject, "Agilent CGH data"--SORRY!

In this case, you do not want to rely on loess or other non-linear
normalization methods.  Also, the MA plots for the best arrays DO show a
positive slope--this is totally expected and sought after.  In other
words, with higher M-values, we expect higher A-values.

We have found that a pretty good measure of quality of CGH arrays is the
dlrs (derivative log ratio spread), a robust estimate of the
probe-to-probe noise:

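dlrs <-
  function(x) {
    # x: log ratios, already ordered by chromosome and position
    nx <- length(x)
    if (nx<3) {
      stop("Vector length>2 needed for computation")
    }
    # pair each value with its neighbour and take the difference
    tmp <- embed(x,2)
    diffs <- tmp[,2]-tmp[,1]
    # IQR/1.34 estimates the SD of the differences (1.34 ~ IQR of a
    # standard normal); sqrt(2) rescales it to the per-probe noise.
    # na.rm=TRUE so that missing log ratios do not stop the computation
    dlrs <- IQR(diffs, na.rm=TRUE)/(sqrt(2)*1.34)
    return(dlrs)
  }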

Run this on the log ratios (ordered by chromosome and position).  Good
values are less than 0.2 or so, but arrays with slightly higher values
can still be used.
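
As a rough sketch of how that could look for several arrays at once:
the example below uses simulated data, and the names chrom, pos and M
(plus the array1..array3 column names) are made up for illustration,
not taken from arrayQuality or marray. It assumes you can pull out a
probes-by-arrays matrix of log ratios together with the chromosome and
position of each probe. The dlrs here is the same calculation written
with diff(), with NAs dropped.

```r
# Same DLRS calculation as above, written with diff(); NAs are dropped.
dlrs <- function(x) {
  if (length(x) < 3) stop("Vector length>2 needed for computation")
  IQR(diff(x), na.rm = TRUE) / (sqrt(2) * 1.34)
}

# Simulated stand-in for real data (all names here are hypothetical):
# 600 probes on 3 chromosomes, 3 arrays with increasing noise.
set.seed(1)
n     <- 600
chrom <- rep(1:3, each = 200)
pos   <- unlist(lapply(1:3, function(i) sample(1e6, 200)))
M     <- cbind(array1 = rnorm(n, sd = 0.10),
               array2 = rnorm(n, sd = 0.15),
               array3 = rnorm(n, sd = 0.40))

# Order probes by chromosome, then by position within chromosome.
ord <- order(chrom, pos)

# One DLRS value per array (column).
scores <- apply(M[ord, , drop = FALSE], 2, dlrs)
print(round(scores, 3))
```

Note that this takes differences straight across chromosome boundaries
as well. With thousands of probes per chromosome those few extra
differences should hardly move an IQR-based estimate, but you could
also compute the differences within each chromosome if you prefer.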

As for analysis, you may want to look into the snapCGH package, as it
allows multiple analyses to be run with the same data structures.

Sean