[BioC] Call for comments on analyzing aCGH data with huge number of probes on a single chromosome

Sean Davis sdavis2 at mail.nih.gov
Fri Apr 4 18:35:31 CEST 2008


On Fri, Apr 4, 2008 at 12:09 PM, pingzhao Hu <phu at sickkids.ca> wrote:
>
>  Sean,
>  Thanks!
>  The goal is to identify copy number variation from normal human samples.
>  I have tried CBS, cghFLasso
>  (http://biostatistics.oxfordjournals.org/cgi/content/abstract/kxm013v1),
>  our own method
>  (http://biostatistics.oxfordjournals.org/cgi/content/abstract/kxl035v1),
>  and other methods.

You probably have a few options.  First, you could try "smoothing" the
data by using a moving window average or some such thing to reduce
noise and reduce the number of probes.  I think Nimblegen does this
for data that they give back to customers when they run CGH as a
service.  With the reduced-dimensionality data, you could then apply
your method of choice.  Obviously, you lose resolution doing this.
Another alternative is an algorithm called "stepgram" developed by
Doron Lipson.  It is used in the CGHAnalytics commercial package
available from Agilent (where it is called ADM-1).  It is also
available as a Windows executable from here:

http://bioinfo.cs.technion.ac.il/stepgram/

I have an R package that uses that algorithm that, unfortunately, I am
not allowed to distribute.  That said, it is by far the fastest
algorithm that I have tested for CGH analysis.  For comparison, for
200k probes, Stepgram runs in 4 seconds, aCGH in about 50 seconds,
DNAcopy (CBS) and GLAD in about 400 seconds.
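
To make the window-averaging idea above concrete, here is a minimal
sketch in Python.  It assumes non-overlapping windows of a fixed number
of consecutive probes, already sorted by position; the function name
window_average and the choice of the window midpoint as the new probe
position are illustrative assumptions, not a description of what
Nimblegen actually does.

```python
from statistics import fmean


def window_average(positions, log_ratios, window=10):
    """Average consecutive probes in non-overlapping windows.

    positions and log_ratios must be sorted by genomic position and have
    equal length.  Returns (window_midpoints, window_means).
    (Hypothetical helper for illustration only.)
    """
    if len(positions) != len(log_ratios):
        raise ValueError("positions and log_ratios must have the same length")
    if window < 1:
        raise ValueError("window must be >= 1")
    mids, means = [], []
    for start in range(0, len(positions), window):
        # The last window may hold fewer than `window` probes.
        end = min(start + window, len(positions))
        # Represent the window by the midpoint of its first and last probe.
        mids.append((positions[start] + positions[end - 1]) / 2)
        means.append(fmean(log_ratios[start:end]))
    return mids, means


# Toy example: six probes reduced to two windows of three probes each.
pos = [100, 200, 300, 400, 500, 600]
lr = [0.0, 0.25, -0.25, 1.0, 0.5, 1.5]
mids, means = window_average(pos, lr, window=3)
```

With a window of w probes the segmentation method sees roughly 1/w as
many points, at the cost of not resolving changes shorter than one
window.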

Hope that helps,

Sean


>  Pingzhao
>
>
>  At 11:45 AM 4/4/2008, Sean Davis wrote:
>  >On Fri, Apr 4, 2008 at 11:38 AM, pingzhao Hu <phu at sickkids.ca> wrote:
>  > >
>  > >  Hi All,
>  > >  I have a question about analyzing aCGH data with a huge number of
>  > >  probes on a single chromosome.
>  > >  We have a set of customized NimbleGen aCGH human sample data. Each sample
>  > >  has 40 million probes. Even a single chromosome has >3M probes.
>  > >
>  > >  I tried some R-based and Matlab-based aCGH analysis software to
>  > >  analyze just a single chromosome in
>  > >  a single sample using our supercomputer, but with no success! Some software
>  > >  just shows error messages (though it works fine for small
>  > >  data sets) and some software cannot complete the analysis even after
>  > >  1-2 days of CPU time.
>  > >
>  > >  I am wondering whether anyone on the list has experience in
>  > >  analyzing aCGH data at such a scale.
>  > >  If you have, can you share some of your experience with me?
>  > >
>  > >  Would it be a good idea to first divide the chromosome into some small
>  > >  pieces (say each piece has 10,000 probes) and then run the algorithm
>  > >  on each piece of the chromosome?
>  >
>  >What are the goals of the analysis?  What types of samples (cancer,
>  >comparative genomics, normal DNA)?  And what methods have you tried?
>  >
>  >Sean
>
>
>
>  ========================================
>  Pingzhao Hu
>  Statistical Analysis Facility
>  The Centre for Applied Genomics (TCAG)
>  The Hospital for Sick Children Research Institute
>  MaRS Centre - East Tower
>  101 College Street, Room 15-705
>  Toronto, Ontario, M5G 1L7, Canada
>  Tel.: (416) 813-7654 x6016
>  Email: phu at sickkids.ca
>  Web: http://www.tcag.ca/statisticalAnalysis.html
>
>  _______________________________________________
>  Bioconductor mailing list
>  Bioconductor at stat.math.ethz.ch
>  https://stat.ethz.ch/mailman/listinfo/bioconductor
>  Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>


