[BioC] segmentation aCGH data

Thu Oct 11 10:17:31 CEST 2007

On Wednesday 10 October 2007 17:04, jhs1jjm at leeds.ac.uk wrote:
> Hi Ramon,
>
> Ah, of course, I'd forgotten I'd performed that step. I'm still getting
> some segments with high means corresponding to single genes but this'll
> be because they are represented by more than 1 probe I guess. The
> DNAcopy document has a step in it to remove local trends in the data.
> I'm undoing splits that are not at least 3 SDs apart as set out in the
> document.
>

Ah, OK. I thought you were referring to other trends (I've heard people 
mention waves, and relations to GC content, etc.; the latter, I think, is 
commonly corrected for in Affymetrix data).
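For reference, here is a minimal sketch of the "undo splits that are not at least 3 SDs apart" step John describes. It is written in Python rather than R and assumes a simplified rule: repeatedly merge the adjacent pair of segments whose means are closest, until every remaining pair differs by at least `n_sd` noise SDs. This follows the idea behind DNAcopy's `undo.splits = "sdundo"` / `undo.SD` options, but it is not DNAcopy's implementation; the function names and the MAD-of-differences noise estimate are illustrative assumptions.

```python
from statistics import mean, median


def noise_sd(values):
    """Robust probe-level noise estimate (assumed, not DNAcopy's).

    Differencing successive probes removes most of the step structure,
    so the scaled MAD of the differences tracks noise rather than jumps.
    """
    diffs = [b - a for a, b in zip(values, values[1:])]
    m = median(diffs)
    mad = median(abs(d - m) for d in diffs)
    # 1.4826 makes the MAD consistent with the SD under Gaussian noise;
    # dividing by sqrt(2) undoes the variance doubling from differencing.
    return 1.4826 * mad / 2 ** 0.5


def undo_splits(values, breakpoints, n_sd=3.0, sd=None):
    """Drop breakpoints whose flanking segment means differ by < n_sd * sd.

    values: log-ratios in genomic order.
    breakpoints: sorted indices where a new segment starts (0 excluded).
    Returns the surviving breakpoints.
    """
    if sd is None:
        sd = noise_sd(values)
    bps = list(breakpoints)
    while bps:
        bounds = [0] + bps + [len(values)]
        means = [mean(values[s:e]) for s, e in zip(bounds, bounds[1:])]
        # gaps[i] is the mean difference across breakpoint bps[i]
        gaps = [abs(b - a) for a, b in zip(means, means[1:])]
        i = min(range(len(gaps)), key=gaps.__getitem__)
        if gaps[i] >= n_sd * sd:
            break
        del bps[i]  # remove the weakest split, then recompute the means
    return bps


# Toy data: a spurious split at 3 and a real shift of about 1.0 at 6.
values = [0.0, 0.1, -0.1, 0.0, 0.05, -0.05, 1.0, 1.1, 0.9, 1.0, 1.05, 0.95]
print(undo_splits(values, [3, 6]))
```

Merging one split at a time and recomputing means matters: removing one weak breakpoint changes the means on both sides, which can change whether a neighbouring split survives.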

> To summarize then, I might use DNAcopy to identify regions, but in order
> to look at single-probe aberrations I'd want to use one of the other
> methods, e.g., an HMM.
>

We often analyze data with four or five different methods (our own HMM in 
RJaCGH, Olshen's CBS, the HMM of Marioni et al., Picard et al.'s CGHseg, and 
Hsu et al.'s wavelet-based smoothing) because different approaches are 
sensitive to different features of the data (or can be misled by different 
features of the data). (Of course, we do think our approach is the best 
overall performer, but this way we can keep learning about the relative 
strengths of different methods and/or detect bugs in the code.)

Detecting single-point aberrations might be trickier than, say, detecting a 
long alteration that involves tens of probes. But failing to detect 
single-gene alterations can matter a great deal in some studies (e.g., IIRC, 
Aguirre et al., in their PNAS 2004 study of pancreatic adenocarcinoma, 
discuss not detecting the loss of the tumor suppressor SMAD4). 
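To illustrate why smoothing before segmentation (the smooth.CNA step discussed in this thread) makes single-probe aberrations undetectable, here is a minimal Python sketch of neighbour-based outlier smoothing. The rule is a simplified assumption loosely modeled on the idea behind smooth.CNA, not its actual algorithm: a probe lying more than `outlier_k` robust SDs beyond all of its neighbours within `region` probes is replaced by their median. The names `smooth_outliers`, `mad_sd`, `region`, and `outlier_k` are hypothetical.

```python
from statistics import median


def mad_sd(values):
    """Robust SD estimate: scaled median absolute deviation."""
    m = median(values)
    return 1.4826 * median(abs(v - m) for v in values)


def smooth_outliers(values, region=2, outlier_k=4.0, sd=None):
    """Pull isolated single-probe outliers back toward their neighbours.

    Simplified, hypothetical rule: a probe more than outlier_k * sd above
    the maximum (or below the minimum) of the probes within `region`
    positions on either side is replaced by the median of those probes.
    """
    if sd is None:
        sd = mad_sd(values)
    smoothed = list(values)
    for i, v in enumerate(values):
        nbrs = values[max(0, i - region):i] + values[i + 1:i + 1 + region]
        if not nbrs:
            continue
        if v > max(nbrs) + outlier_k * sd or v < min(nbrs) - outlier_k * sd:
            smoothed[i] = median(nbrs)
    return smoothed


# Toy log-ratios: flat except for one high probe at index 5.
ratios = [0.0, 0.1, -0.1, 0.0, 0.05, 2.0, -0.05, 0.0, 0.1, -0.1]
print(smooth_outliers(ratios))
```

Here the 2.0 at index 5 becomes 0.0 and every other probe is left alone, so any segmentation run on the smoothed values has no single-probe gain left to find. A two- or three-probe gain survives this kind of rule, because each of its probes has at least one neighbour at a similar level.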

As for the need for validation, etc.: if you have a gene covered by a bunch 
of probes and only a single probe is being called aberrant, then I'd be more 
concerned; but you might be averaging over probes, or using platforms where 
some genes have only one probe, etc. In general, many/most of the current 
aCGH studies are really exploratory studies (i.e., they are in the "copy 
number differences discovery" stage, not the "copy number association 
studies" stage), with results that need to be validated further (other aCGH 
platforms, other molecular techniques); there are several papers in the 
July 2007 issue of Nature Genetics (volume 39) that go into these issues.
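The "gene covered by a bunch of probes, only one called aberrant" check can be made mechanical. This is a minimal Python sketch, assuming per-probe calls are already available as `(gene, is_aberrant)` pairs; the function name `lone_probe_genes` and the `min_probes` threshold are hypothetical choices, not part of any package mentioned here.

```python
from collections import defaultdict


def lone_probe_genes(calls, min_probes=3):
    """Genes with at least min_probes probes, exactly one called aberrant.

    calls: iterable of (gene, is_aberrant) pairs, one per probe.
    """
    tally = defaultdict(lambda: [0, 0])  # gene -> [n_probes, n_aberrant]
    for gene, aberrant in calls:
        tally[gene][0] += 1
        tally[gene][1] += int(bool(aberrant))
    return sorted(g for g, (n, k) in tally.items() if n >= min_probes and k == 1)


# Hypothetical per-probe calls from some segmentation method.
calls = [
    ("geneA", True), ("geneA", False), ("geneA", False),  # lone call: flagged
    ("geneB", True), ("geneB", True), ("geneB", False),   # two calls: not flagged
    ("geneC", True),                                      # one probe: cannot judge
]
print(lone_probe_genes(calls))
```

Genes with fewer than `min_probes` probes are deliberately excluded: as noted above, when a gene has only one probe this check cannot distinguish a real focal change from a bad probe, so those calls need validation by other means.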

Best,

R.

> Thanks
> John
>
> Quoting Ramon Diaz-Uriarte <rdiaz at cnio.es> on Wed 10 Oct 2007 15:22:22 BST:
> > Dear John,
> >
> > On Wednesday 10 October 2007 15:52, jhs1jjm at leeds.ac.uk wrote:
> > > Hi list,
> > >
> > > I've been looking at 3*44k and 2*244k Agilent CGH arrays. To date I've
> > > used limma to read in the processed signals (no background correction
> > > or normalization, as this has already been done), then the DNAcopy
> > > package for segmentation, as well as the snapCGH package to employ
> > > other segmentation methods rather than use each segmentation package
> > > individually.
> > >
> > > Firstly, using the DNAcopy segmentation I can see a significant pattern
> > > across my 3*44k arrays which disappears when I perform the step to
> > > remove unnecessary change points due to trends in the data. As these
> >
> > How exactly are you removing "unnecessary change points due to trends
> > in the data"?
> >
> > > are in the same locations across the 3 arrays, is it likely that
> > > this is biologically significant rather than being a trend? Obviously
> > > others do not have a definitive answer for this, but I wondered if
> > > anyone had seen similar results in a different scenario.
> > >
> > > Additionally, I'm wondering what segmentation methods people have
> > > tended to employ. The heterogeneous nature of my data means that I
> > > need to identify single-probe as well as larger region aberrations,
> > > and I'd read that the CBS algorithm is not particularly suited to
> > > doing this?
> >
> > If you run the "smooth.CNA" function (in the DNAcopy package), as is
> > recommended in the documentation for DNAcopy (IIRC), then single-probe
> > aberrations are not detectable (you are smoothing them away).
> >
> > Single-probe aberrations might be detected with the HMM in snapCGH or
> > our HMM in RJaCGH, available from CRAN
> > (http://cran.r-project.org/src/contrib/Descriptions/RJaCGH.html).
> > (Details of the method are available in the paper:
> > http://compbiol.plosjournals.org/perlserv/?request=get-document&doi=10.1371%2Fjournal.pcbi.0030122).
>
> > Best,
> >
> > R.
> >
> > > Apologies if this is a bit vague.
> > >
> > > Thanks for any input,
> > >
> > > John
> > >
> > > _______________________________________________
> > > Bioconductor mailing list
> > > Bioconductor at stat.math.ethz.ch
> > > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > > Search the archives:
> > > http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
> > --
> > Ramón Díaz-Uriarte
> > Statistical Computing Team
> > Centro Nacional de Investigaciones Oncológicas (CNIO)
> > (Spanish National Cancer Center)
> > Melchor Fernández Almagro, 3
> > 28029 Madrid (Spain)
> > Fax: +-34-91-224-6972
> > Phone: +-34-91-224-6900
> >
> > http://ligarto.org/rdiaz
> > PGP KeyID: 0xE89B3462
> > (http://ligarto.org/rdiaz/0xE89B3462.asc)
> >
> >
> >
