[BioC] Combining replicate spots in CGH data

Thu Dec 7 13:55:26 CET 2006

Dear Ramon,

Thanks for the insights about the replicate spots.

About the RJaCGH package: I would like to know what the main features of your heterogeneous HMM algorithm are. I ask because I would like to compare it with the only other heterogeneous HMM algorithm I know of that was designed for CGH analysis.

That algorithm is implemented in the snapCGH package and is called BioHMM. It incorporates the distance between clones into the model, assigning a higher probability of a state change to clones that are farther apart on a chromosome.
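To make the idea concrete, here is a minimal Python sketch of a distance-dependent transition matrix. It is not snapCGH's actual BioHMM parameterization, which this thread does not give; the function name `distance_transition_matrix`, the exponential form, and the parameters `p_max` and `scale` are hypothetical choices for illustration. The probability of leaving the current state grows with the gap d between adjacent clones, from 0 at d = 0 towards `p_max` for very distant clones.

```python
import numpy as np


def distance_transition_matrix(d, n_states=3, p_max=0.5, scale=1e6):
    """Hypothetical distance-dependent transition matrix (not BioHMM's).

    d: genomic distance (bp) between two adjacent clones.
    The probability of changing state rises from 0 at d = 0 towards p_max
    as d grows; 'scale' (bp) sets how fast. Requires n_states >= 2."""
    p_change = p_max * (1.0 - np.exp(-d / scale))
    # Split the switching probability evenly among the other states
    A = np.full((n_states, n_states), p_change / (n_states - 1))
    np.fill_diagonal(A, 1.0 - p_change)
    return A


near = distance_transition_matrix(10_000)     # clones 10 kb apart
far = distance_transition_matrix(5_000_000)   # clones 5 Mb apart
print(near.round(4))
print(far.round(4))
```

A forward or Viterbi pass would then use a different matrix for each pair of neighbouring clones, which is what makes the chain non-homogeneous.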

Best regards

João Fadista
Ph.d. student

Danish Institute of Agricultural Sciences
Research Centre Foulum
Dept. of Genetics and Biotechnology
Blichers Allé 20, P.O. BOX 50
DK-8830 Tjele

Phone:   +45 8999 1900
Direct:  +45 8999 8999

E-mail:  Joao.Fadista at agrsci.dk
Web:     http://www.agrsci.org

This email may contain information that is confidential.
Any use or publication of this email without written permission from DIAS is not allowed.
If you are not the intended recipient, please notify DIAS immediately and delete this email.

-----Original Message-----
From: Ramon Diaz-Uriarte [mailto:rdiaz at cnio.es] 
Sent: Thursday, December 07, 2006 12:18 PM
To: bioconductor at stat.math.ethz.ch
Cc: João Fadista
Subject: Re: [BioC] Combining replicate spots in CGH data

On Wednesday 06 December 2006 17:12, João Fadista wrote:
> Dear all,
>
> I was wondering if there are methods for combining replicate spots
> other than the average or the median. I am asking this with regard to
> CGH data analysis, because I do not know how, or whether, we can take
> advantage of the genomic structure of array CGH data when combining
> replicate spots.
>
> For the sake of argument, here are two hypothetical examples:
> - Combining replicate spots differently depending on which region of
>   the chromosome or genome they are in;
> - Giving more weight to spots that we know are more reliable.
>
> Something along those lines, if you know what I mean.
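One standard way to implement the second idea is inverse-variance weighting, sketched below in Python. This is a swapped-in textbook technique, not something proposed in this thread. It assumes each spot's reliability can be summarized as a standard deviation (`spot_sd`, hypothetical) and that replicate spots are independent; the function name `weighted_combine` is also hypothetical.

```python
import numpy as np


def weighted_combine(log_ratios, spot_sd):
    """Inverse-variance weighted combination of one clone's replicate spots.

    Hypothetical helper. spot_sd gives an assumed per-spot standard
    deviation (a reliability measure): smaller sd -> larger weight.
    Spots are assumed independent."""
    y = np.asarray(log_ratios, dtype=float)
    w = 1.0 / np.asarray(spot_sd, dtype=float) ** 2
    estimate = np.sum(w * y) / np.sum(w)
    variance = 1.0 / np.sum(w)   # variance of the weighted estimate
    return estimate, variance


# The noisy middle spot (sd 0.5) barely moves the estimate
est, var = weighted_combine([0.40, 0.10, 0.45], spot_sd=[0.1, 0.5, 0.1])
print(est, var)
```

With equal `spot_sd` values this reduces to the ordinary mean, and the returned variance 1/sum(w) makes explicit how much precision each combined value carries.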

Dear Joao,

This is nothing elaborate; just a couple of thoughts.

1. I assume you mean true replicate spots. In other words, they are the exact same piece of DNA, and they map to exactly the same location in the chromosome.

2. Ideally, I'd like a method that can deal with replicate spots without even asking you to take the mean or the median. One problem I find with means or medians is that, if you do not have the exact same number of replicates at every location, then the summarized values have different variances at different locations (for instance, the mean of n independent replicates, each with variance sigma^2, has variance sigma^2/n).
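A quick Python simulation illustrates point 2. It assumes, hypothetically, a per-spot standard deviation of 0.2 and independent replicates, and checks that the variance of the mean of n replicate log ratios tracks sigma^2/n, so locations with 1, 2 or 4 spots end up with summaries of visibly different precision.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.2            # assumed per-spot SD of the log ratio
true_log_ratio = 0.0   # assumed: no copy-number change at these locations
n_locations = 100_000

empirical = {}
for n_reps in (1, 2, 4):
    # Each row is one location with n_reps independent replicate spots
    spots = rng.normal(true_log_ratio, sigma, size=(n_locations, n_reps))
    empirical[n_reps] = spots.mean(axis=1).var()
    print(f"{n_reps} spot(s): var(mean) = {empirical[n_reps]:.5f}, "
          f"sigma^2/n = {sigma**2 / n_reps:.5f}")
```

If those means are then fed to a model that assumes one common emission variance per state, locations summarized from a single spot are treated as being as precise as those summarized from four.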

I think (non-homogeneous) HMMs and related techniques are well suited to dealing with an arbitrary (and varying) number of replicate spots: at location "t" you happen to have more than one observation, and you are fitting a model where those observed log ratios come from an emission function, blablabla. By not taking means/medians/whatever, you do not violate the assumptions about the variance of the emission functions. In other words, conditional on being in state "k", your log ratios are, say, ~ N(mu, sigma).
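The Python sketch below shows what "not taking means" can look like for a Gaussian HMM. At each location the emission log-likelihood for state k is the sum of the log N(mu_k, sigma_k) densities of all replicate log ratios observed there, so locations may carry any number of spots. It assumes replicates are independent given the state and treats sigma as a standard deviation. It is a generic forward algorithm, not RJaCGH's implementation, and all function names and parameter values are hypothetical. Passing one transition matrix per gap (`log_trans[t - 1]`) allows non-homogeneous transitions, for example ones that depend on inter-clone distance.

```python
import numpy as np


def log_normal_pdf(y, mu, sigma):
    # Log density of N(mu, sigma^2), with sigma a standard deviation
    return -0.5 * np.log(2 * np.pi * sigma**2) - (y - mu) ** 2 / (2 * sigma**2)


def log_emission(obs, mus, sigmas):
    """Per-state log-likelihood of all replicate log ratios at one location.

    obs may hold any number (>= 1) of replicate spots; they are assumed
    independent given the hidden state, so their log densities add up."""
    obs = np.asarray(obs, dtype=float)
    return np.array([log_normal_pdf(obs, m, s).sum() for m, s in zip(mus, sigmas)])


def forward_loglik(observations, log_init, log_trans, mus, sigmas):
    """Sequence log-likelihood via the forward algorithm (hypothetical helper).

    observations: list with one array of replicate log ratios per location.
    log_trans: array of shape (T - 1, K, K); log_trans[t - 1] holds the log
    transition probabilities from location t - 1 to location t, so each gap
    can have its own matrix (non-homogeneous HMM)."""
    alpha = log_init + log_emission(observations[0], mus, sigmas)
    for t in range(1, len(observations)):
        # alpha_j(t) = logsumexp_i(alpha_i(t-1) + log A_t[i, j]) + log b_j(y_t)
        alpha = np.logaddexp.reduce(alpha[:, None] + log_trans[t - 1], axis=0)
        alpha = alpha + log_emission(observations[t], mus, sigmas)
    return np.logaddexp.reduce(alpha)


# Example with assumed parameters: loss / normal / gain states
mus = np.array([-0.5, 0.0, 0.5])
sigmas = np.array([0.2, 0.2, 0.2])
log_init = np.log(np.full(3, 1 / 3))
A = np.full((3, 3), 0.05)
np.fill_diagonal(A, 0.9)
# Three locations carrying 2, 1 and 3 replicate spots: nothing is averaged
obs = [np.array([0.05, -0.02]), np.array([0.48]), np.array([0.51, 0.44, 0.55])]
log_trans = np.log(np.stack([A] * (len(obs) - 1)))
print(forward_loglik(obs, log_init, log_trans, mus, sigmas))
```

Because each location contributes a product of per-spot densities rather than a single averaged value, the per-state variance sigma_k keeps the same meaning everywhere, whatever the number of replicates.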

(I'll admit we have a "hidden agenda", with our RJaCGH package :-).

R.

>
>
> Best regards
>
> João Fadista

--
Ramón Díaz-Uriarte
Bioinformatics
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
Fax: +-34-91-224-6972
Phone: +-34-91-224-6900

http://ligarto.org/rdiaz
PGP KeyID: 0xE89B3462
(http://ligarto.org/rdiaz/0xE89B3462.asc)
