[BioC] CGH microarrays significance test

Wed Mar 21 17:06:51 CET 2007

Dear Joao,

On Wednesday 21 March 2007 16:33, João Fadista wrote:
> Dear list,
>
> I have a CGH microarray experiment where I compare male vs. female in each
> sample (3 technical replicates with dye swaps = 6 samples). So in theory I
> would expect to see a difference in log2ratios of the X chromosome compared
> to the autosomes. This experiment is made mainly to assess/optimize the
> reliability of the protocol and the in-house microarray platform for CGH
> microarrays experiments.
>
> I already used packages in Bioconductor that deal with CGH microarrays but
> I would also like to have a statistical test to see if there is a
> significance difference between the mean values of log2ratios from the X
> chromosome compared to the autosomes. I already did a two-sample T-test and
> a Wilcox.test where the log2ratios for autosomal clones represent the first
> sample and log2ratios for clones from chromosome X represent the second
> sample.
>

I am confused here. It is not clear to me whether you want to compare males 
with females (as you say in the first paragraph) or the autosomes with the X. 
I think the latter, so here are some thoughts:

1. First, you have something like a paired design: for each subject you 
measure both the autosomes and the X. Since these are all arrayed on the same 
glass slide, etc., you definitely want to account for this, following more or 
less the logic behind a paired t-test.

2. You do not only have one value for the X and one value for the autosomal, 
but actually a collection of each. And the autosomals come in 22 packages.

3. My first thought would be to use a mixed-effects model (with package nlme) 
including terms for subject and, possibly, chromosome (within the 
autosomals); the chromosome random effect might be crossed with subject or 
nested within subject. I'd be inclined to nest it within subject.

4. By using the mixed-effects model you can also include your technical 
replicates as technical replicates by adding a term for biological sample. 

5. A simpler, more direct approach would be to just take the average of all 
autosomals and of all the X within subject, average this over technical 
replicates, and do a paired t-test. But I would not recommend it.

6. With nlme and mixed-effects models in general there is a battery of 
diagnostics; in addition, you have very large sample sizes relative to the 
number of (random and fixed) effects you are modeling.

7. You can also use heteroscedastic models with mixed effects to account for 
the differences in variances between samples, thus performing the weighting 
you refer to.

8. (You have gene information; technically, you might want to incorporate a 
crossed gene effect. But you will then probably have difficulties fitting the 
model, and you'll end up with a huge number of terms.)
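
To make the simple approach in point 5 concrete, here is a minimal sketch, 
written in Python rather than R and using made-up numbers, purely for 
illustration. It assumes each array is a list of (chromosome, log2ratio) 
pairs with dye-swapped arrays already sign-flipped; the function names 
unit_means and paired_t are hypothetical and do not come from any package. 
It returns only the t statistic and its degrees of freedom, and it ignores 
all the structure (chromosome, replicate) that a mixed-effects model would 
capture, which is why I would not recommend it as the final analysis.

```python
from math import sqrt
from statistics import mean, stdev


def unit_means(clones):
    """Mean log2ratio of autosomal clones and of X clones for one array.

    `clones` is a list of (chromosome, log2ratio) pairs (assumed layout).
    Clones on Y, if any, are left out of the autosomal group.
    """
    autosomal = [r for chrom, r in clones if chrom not in ("X", "Y")]
    x = [r for chrom, r in clones if chrom == "X"]
    return mean(autosomal), mean(x)


def paired_t(autosomal_means, x_means):
    """Paired t statistic and degrees of freedom for (X - autosomal)."""
    if len(autosomal_means) != len(x_means):
        raise ValueError("need one autosomal and one X mean per unit")
    diffs = [x - a for a, x in zip(autosomal_means, x_means)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two paired units")
    standard_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Made-up log2ratios for three units (not real data).
arrays = [
    [("1", 0.02), ("2", -0.05), ("X", -0.61), ("X", -0.55)],
    [("1", -0.03), ("2", 0.04), ("X", -0.48), ("X", -0.52)],
    [("1", 0.01), ("2", 0.00), ("X", -0.70), ("X", -0.58)],
]
pairs = [unit_means(a) for a in arrays]
t_stat, df = paired_t([a for a, _ in pairs], [x for _, x in pairs])
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
```

The t statistic would then be compared against a t distribution with the 
given degrees of freedom; in R, t.test with paired = TRUE does all of this 
in one call.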

These are some half-cooked ideas. I do not think the above will be a simple 
10-minute walk in the woods, but I think it might be a worthwhile modelling 
exercise.

A different approach: since you have used some of the CGH packages, you 
probably have estimates of regions of gain and loss. Thus, a different type 
of analysis would be not to use the log2ratios, but to use instead the 
inferences about gains and losses, by arguing that the latter are actually 
denoised versions of the former (and, thus, "better things" to base your 
downstream inferences upon).
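
As a rough illustration of working with the calls instead of the log2ratios, 
the sketch below (again Python, with made-up calls) computes the fraction of 
clones called gained or lost on the autosomes and on the X. The input layout, 
a list of (chromosome, call) pairs with call coded -1, 0 or 1, and the 
function name altered_fraction are assumptions for the example, not the 
output format of any particular CGH package. Keep in mind that clones within 
a called segment are not independent, so these fractions are descriptive 
only.

```python
def altered_fraction(calls):
    """Fraction of clones called gained or lost, for autosomes and for X.

    `calls` is a list of (chromosome, call) pairs, where call is
    -1 (loss), 0 (no change) or 1 (gain). This layout is assumed.
    """
    def fraction_nonzero(values):
        if not values:
            raise ValueError("no clones in this group")
        return sum(1 for v in values if v != 0) / len(values)

    autosomal = [c for chrom, c in calls if chrom not in ("X", "Y")]
    x = [c for chrom, c in calls if chrom == "X"]
    return fraction_nonzero(autosomal), fraction_nonzero(x)


# Made-up calls for one array (not real data).
example_calls = [
    ("1", 0), ("2", 0), ("3", 1),
    ("X", -1), ("X", -1), ("X", 0), ("X", -1),
]
autosomal_fraction, x_fraction = altered_fraction(example_calls)
print(f"autosomes: {autosomal_fraction:.2f}, X: {x_fraction:.2f}")
```

For a male vs. female hybridization you would hope for a fraction close to 1 
on the X and close to 0 on the autosomes.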

Best,

R.

> 1 - Should have done another more robust test? Is there any other kind of
> statistical tests that I can perform to assess the reliability of my
> experiment (assuming that the pre-processing and normalization is already
> optimized)?
>
> 2 - Is it statistical acceptable to average my technical replicates (the
> average is a weighted average where the arrays with "more quality" have a
> higher weight) in order to reduce the variance?
>
>
> Med venlig hilsen / Regards
>
> João Fadista
> Ph.d. studerende / Ph.d. student
>
>
>
>  	 AARHUS UNIVERSITET / UNIVERSITY OF AARHUS
> Det Jordbrugsvidenskabelige Fakultet / Faculty of Agricultural Sciences
> Forskningscenter Foulum / Research Centre Foulum
> Genetik og Bioteknologi / Dept. of Genetics and Biotechnology
> Blichers Allé 20, P.O. BOX 50
> DK-8830 Tjele

-- 
Ramón Díaz-Uriarte
Statistical Computing Team
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
Fax: +-34-91-224-6972
Phone: +-34-91-224-6900

http://ligarto.org/rdiaz
PGP KeyID: 0xE89B3462
(http://ligarto.org/rdiaz/0xE89B3462.asc)
