[BioC] segmentation aCGH data

Thu Oct 11 00:44:03 CEST 2007

Sean,

Thanks, that helps a lot. I've purposely stayed away from the Agilent
software, firstly because I'm not on campus (which is where it is
installed) and secondly because I wanted to do the analysis using R,
Bioconductor and any other open source software I can get my hands on.
I was also wondering whether it's the case that many packages and the
algorithms they use appear in Bioconductor/R first, and that it may
take time to implement them on a commercial platform, i.e. with a nice
GUI?

I wonder also if you could help with another matter. At the moment I'm
exporting the DNAcopy segment output as a CSV file, then opening it in
OpenOffice Calc and matching the map positions against the Agilent text
file to find the corresponding genes. This is fine for the 44k arrays,
but Calc cannot show all the rows of the 244k text file, so I cannot
match the map positions with genes for those arrays.
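
One way around the spreadsheet row limit would be to do the lookup in a
script rather than by eye. The following is only a minimal sketch in
Python, not a tested pipeline. It assumes the segment table is a CSV
with DNAcopy's usual chrom, loc.start and loc.end columns, and that the
probe annotation has been reduced to a tab-delimited table with
hypothetical column names chrom, position and gene. The sample data and
gene names are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def load_probes(handle):
    """Index annotated probes by chromosome: {chrom: [(position, gene), ...]}."""
    probes = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["gene"]:  # skip probes that carry no gene annotation
            probes[row["chrom"]].append((int(row["position"]), row["gene"]))
    return probes


def annotate_segments(seg_handle, probes):
    """Attach to each segment the genes whose probes fall inside it."""
    annotated = []
    for seg in csv.DictReader(seg_handle):
        # int(float(...)) also copes with positions written as 1e+05
        start = int(float(seg["loc.start"]))
        end = int(float(seg["loc.end"]))
        genes = sorted({gene
                        for pos, gene in probes.get(seg["chrom"], [])
                        if start <= pos <= end})
        annotated.append({"chrom": seg["chrom"], "start": start, "end": end,
                          "seg.mean": float(seg["seg.mean"]), "genes": genes})
    return annotated


# Made-up stand-ins for the two files; real data would be opened with open().
SEGMENTS_CSV = """ID,chrom,loc.start,loc.end,num.mark,seg.mean
sample1,1,1000,5000,3,0.45
sample1,2,200,900,2,-0.60
"""
PROBES_TSV = (
    "chrom\tposition\tgene\n"
    "1\t1500\tGENE_A\n"
    "1\t4000\tGENE_B\n"
    "1\t9000\tGENE_C\n"
    "1\t2000\t\n"
    "2\t300\tGENE_D\n"
    "2\t950\tGENE_E\n"
)

probes = load_probes(io.StringIO(PROBES_TSV))
annotated = annotate_segments(io.StringIO(SEGMENTS_CSV), probes)
for seg in annotated:
    print(seg["chrom"], seg["start"], seg["end"], ",".join(seg["genes"]))
```

A real Agilent export will likely have different column names and extra
header rows, so load_probes would need adapting before this could be
used on the 244k file.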

Regards

John

Quoting Sean Davis <sdavis2 at mail.nih.gov> on Wed 10 Oct 2007 17:15:52 BST:

> jhs1jjm at leeds.ac.uk wrote:
> > Hi Sean,
> >
> > Since it's a two-colour platform, I'm looking at relative amounts,
> > so wouldn't that mean I wouldn't see copy number variants? Would
> > they not be in both my samples? I was also pondering the advantages
> > of using R and Bioconductor versus, say, Agilent's z-score, for the
> > purposes of my discussion. Is the simple answer that it is a more
> > flexible approach to these matters? Also, if possible, could you
> > expand a bit on the single-probes argument?
>
> If using Agilent CGHAnalytics, you will probably want to use ADM-1,
> not z-score.  For the 44k arrays, a threshold of around 6 is probably
> appropriate.  For the 244k arrays, something closer to 10 or 11 is
> more appropriate.  ADM-1 is exquisitely sensitive to single probes
> that are extreme values.  These may represent real signal, or may be
> noise.  There is no way to tell without validation, in my opinion.
> However, if there are two or more probes behaving similarly, then you
> can be more assured of real biology.  The real biology could be
> directly disease-related or not.  The ones that are not are copy
> number variants (although there is now plenty of evidence that copy
> number variants can be disease-associated, as well).  When using
> high-resolution oligo arrays, you will need to become familiar with
> copy number polymorphism and databases for annotating them.
> CGHAnalytics contains a catalog of those built-in.
>
> As for R/Bioc versus commercial packages, that will be dictated by
> the questions you want to ask.  We find that we routinely need and
> want to ask questions that are not easily answered by commercial
> packages.  That said, a good visualization tool for CGH is HIGHLY
> useful, and there are now several available.
>
> Sean
>