[BioC] quality assessment and preprocessing for tiling array-based CGH data

Wed Oct 22 19:29:58 CEST 2008

On Wed, Oct 22, 2008 at 1:14 PM, Leon Yee <yee.leon at gmail.com> wrote:
> Sean Davis wrote:
>>>
>>> Hi, Sean
>>>
>>>  Thanks for your advice. However, I still have several questions:
>>>
>>>  1. The input of dlrs is the log ratios. Are these the log ratios
>>> extracted from the text file produced by Feature Extraction, or those
>>> calculated from RGList --> MAList?  I have searched the mailing list and
>>> seen a post of yours that mentioned the difference between the log ratio
>>> from Feature Extraction and the default M value from read.maimages.
>>
>> You can read the Agilent FE manual for more details, but you can
>> probably use either and come to very similar conclusions.  If you use
>> the MAList version, make sure to use only median centering, or no
>> normalization at all.
>>
>>>  2. I can get the log ratios of all features, including those with
>>> control type -1 and 1, but these features don't have chromosome
>>> positions. Does this mean I don't need them for quality assessment?
>>
>> We have not routinely used these probes, no.  If an array fails
>> miserably, then these control probes might be useful for determining
>> the reason for the failure, though.
>>
>>>  3. Some probes with names like "chr2_random:xxxxx-yyyyyy" will not get
>>> a proper mapping on the chromosome, so I should remove these values
>>> from the input of dlrs. Is that right?
>>
>> You can either remove them or treat chr2_random as a separate chromosome.
>>
>>>  4. How should I handle the 1000 probes that are each replicated 3
>>> times?  Each group of three maps to the same chromosome position.
>>
>> You could choose one at random or use a mean or median of them.  My
>> guess is that they agree very closely with one another so the choice
>> should not affect the results much.
>
> Hi, Sean
>
>    Thank you very much for your detailed reply and help.
>
>    Where can I find references or official documentation about the dlrs
> method?

It is a standard robust estimator of the variance and is not specific
to microarrays.  If you look at the code, it simply takes the
differences between adjacent probes and then normalizes the result.  If
the array is "noisy", the dlrs will be high.  This assumes that the
contribution due to large copy number changes is negligible, which is
likely true since even the most abnormal cancer samples have fewer
than 1000 breaks, so very few adjacent-probe differences on a tiling
array span a breakpoint.
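
To make the idea concrete, here is a minimal sketch in Python (standard
library only).  It is not the dlrs code itself.  It assumes one common
formulation: the interquartile range of the adjacent-probe differences,
divided by 1.349 (the IQR of a standard normal) and by sqrt(2) (a
difference of two independent probes has twice the variance of one).
The normalization used in the actual dlrs code may differ.  The input is
assumed to be log ratios from a single chromosome, already sorted by
position; the simulated profile and its parameters are made up for
illustration.

```python
import random
import statistics


def dlrs(log_ratios):
    """Spread of adjacent-probe differences, scaled to estimate per-probe noise.

    Assumes log_ratios come from one chromosome, sorted by genomic position.
    """
    # Differences between each probe and its left-hand neighbour.
    diffs = [b - a for a, b in zip(log_ratios, log_ratios[1:])]
    # Quartiles of the differences; the IQR ignores the few large jumps
    # that occur where a difference spans a copy number breakpoint.
    q1, _median, q3 = statistics.quantiles(diffs, n=4)
    # 1.349 converts an IQR to a standard deviation under normality;
    # sqrt(2) undoes the variance doubling caused by differencing.
    return (q3 - q1) / (1.349 * 2 ** 0.5)


# Simulated profile (hypothetical values): noise SD 0.2 with one breakpoint.
rng = random.Random(0)
noise_sd = 0.2
flat = [rng.gauss(0.0, noise_sd) for _ in range(10000)]
gained = [rng.gauss(1.0, noise_sd) for _ in range(10000)]
profile = flat + gained

spread = dlrs(profile)
naive_sd = statistics.pstdev(profile)
print("dlrs:", round(spread, 3), "plain SD:", round(naive_sd, 3))
```

In this simulation the plain standard deviation is inflated by the copy
number step, while the difference-based estimate stays close to the
probe-level noise, which is the property that makes it useful for
comparing array quality.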

>    In addition, we have designed our arrays with dye-swap [test-cy3 vs
> ref-cy5, and test-cy5 vs ref-cy3]. Is there any method for utilizing the
> information here for quality assessment?

Not that I know of, but you could certainly look at correlations
between replicates, etc.  Our experience with Agilent CGH arrays is
that the contribution due to dye bias is small compared to changes due
to copy number.
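
As a rough sketch of the replicate-correlation check for a dye-swap
pair (Python, standard library only): this assumes both arrays report M
as log2(Cy5/Cy3) with probes in the same order, so the swapped array's
values are sign-flipped before correlating.  The function names and the
simulated data are hypothetical, and no particular cutoff for a "good"
correlation is implied.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def dye_swap_correlation(m_forward, m_swapped):
    """Correlate a dye-swap pair after flipping the swapped array's sign.

    Assumes M = log2(Cy5/Cy3) on both arrays, probes in matching order.
    """
    return pearson(m_forward, [-m for m in m_swapped])


# Simulated pair (hypothetical values): half the probes carry a
# single-copy gain (log2(3/2) is about 0.58), noise SD 0.15 on each array.
rng = random.Random(1)
signal = [0.0] * 500 + [0.58] * 500
forward = [s + rng.gauss(0.0, 0.15) for s in signal]
swapped = [-s + rng.gauss(0.0, 0.15) for s in signal]

r_flipped = dye_swap_correlation(forward, swapped)
r_raw = pearson(forward, swapped)
print("sign-flipped r:", round(r_flipped, 2), "raw r:", round(r_raw, 2))
```

A pair that measures the same copy number changes should correlate
positively after the sign flip.  A flat profile with no copy number
changes will give a correlation near zero even on good arrays, so this
check is only informative when the sample has some real aberrations.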

Sean
