[BioC] Using aCGH library on Affymetrix Cytogenetics 2.7M microarray data

Ryan Goosen [guest] guest at bioconductor.org
Mon Nov 19 11:33:09 CET 2012


Dear Bioconductor mailing list,

I am trying to use the R/Bioconductor "aCGH" library to process my copy number data.

In particular, I have copy number data (log2ratios) generated from analysis of Affymetrix Cytogenetics 2.7M arrays (http://media.affymetrix.com/support/technical/datasheets/cytogenetics_research_solution.pdf), which have ~2 million copy number probes, and ~400,000 SNP probes for detecting LOH.

I have written an R script to retrieve the data for the ~2 million copy number probes in the form of log2 ratios. These data are generated using apt-copynumber-cyto (part of Affymetrix Power Tools), which produces .CYCHP.txt files. I have determined that the copy number log2 ratios start at line 549 of these text files and continue for 2141465 rows.
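
A minimal sketch of how the read step described above could look, assuming the copy number block is tab-delimited and that line 549 is its first data row. The helper name read_cn_block and the toy file are hypothetical and only illustrate the skip/nrows idea; the real column layout of a .CYCHP.txt file is not shown here.

```r
# Hypothetical helper: read the copy number block from one .CYCHP.txt file.
# Assumes a tab-delimited block whose first data row is at `first_line`,
# so no header row is read and columns get default names V1, V2, ...
read_cn_block <- function(path, first_line = 549, n_rows = 2141465) {
  read.delim(path,
             header = FALSE,          # first_line is treated as data
             skip = first_line - 1,   # skip everything before the first data row
             nrows = n_rows,          # stop at the end of the copy number block
             stringsAsFactors = FALSE)
}

# Toy demonstration: 3 preamble lines, 4 data rows, then 1 unrelated line.
tmp <- tempfile(fileext = ".txt")
writeLines(c("#meta1", "#meta2", "#meta3",
             "C-0001\t1\t712577\t-0.268",
             "C-0002\t1\t713263\t-0.324",
             "C-0003\t1\t714145\t0.486",
             "C-0004\t1\t714635\t-0.672",
             "S-0001\t1\t800000\t0.5"), tmp)
toy <- read_cn_block(tmp, first_line = 4, n_rows = 4)
str(toy)
```

For the real files the defaults (first_line = 549, n_rows = 2141465) would apply, and the per-sample log2 ratio columns would then be combined into one data frame such as cnData.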

Original object after parsing:
> str(cnData)
'data.frame':	2141465 obs. of  23 variables:
 $ ProbeSetName: Factor w/ 2141465 levels "C-00IGZ","C-00IHI",..: 580623 580624 580625 580626 580627 580628 580629 1967674 580630 580631 ...
 $ Chromosome  : Factor w/ 24 levels "1","10","11",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Position    : int  712577 713263 714145 714635 718604 750062 752757 754192 755354 760401 ...
 $ 10T         : num  -0.268 -0.324 0.486 -0.672 -0.191 ...
 $ 13T         : num  -1.032 -0.522 0.414 -0.552 -0.901 ...
 $ 14T         : num  -0.917 -0.698 0.723 -1.475 -0.771 ...
 $ 15T         : num  -0.541 -0.161 0.248 -0.529 -0.859 ...
 $ 16T         : num  -0.469 -0.43 0.129 -0.317 -1.051 ...
 $ 23T         : num  -0.0257 0.0107 0.2888 0.3228 0.1635 ...
 $ 33T         : num  0.071 0.959 0.422 -0.019 0.35 ...
 $ 34T         : num  -0.846 -0.471 0.48 -1.141 -0.466 ...
 $ 37T         : num  -1.014 -0.279 0.327 -0.796 -0.485 ...
 $ 3T          : num  -0.46 -0.221 0.117 0.423 -0.266 ...
 $ 41T         : num  -2.021 -0.7997 0.4713 -0.0937 -1.1054 ...
 $ 44T         : num  -0.7501 -0.2017 0.0135 -1.1092 -0.356 ...
 $ 4T          : num  0.00255 -0.05183 0.09327 -0.2049 -0.07572 ...
 $ 55T         : num  -0.2161 -0.5777 0.1861 -0.0936 -0.1689 ...
 $ 56T         : num  0.0622 0.1612 0.2907 0.3115 0.2649 ...
 $ 60T         : num  0.0222 0.0937 -0.1307 0.3206 -0.0847 ...
 $ 61T         : num  0.1707 -0.0255 0.2095 -0.0505 -0.1473 ...
 $ 63T         : num  0.00136 -0.01699 -0.15279 -0.2546 0.06513 ...
 $ 8T          : num  -0.146 -0.101 0.389 -0.465 -0.357 ...
 $ IT          : num  -0.2524 0.2645 0.7298 -0.563 -0.0915 ...


To use the aCGH library, I have subset my data to create an aCGH object through the create.aCGH() function, which seems to have worked.

The R statements I used were as follows:

aCGH.object = create.aCGH(log2.ratios = cnData[4:23], clones.info = cnData[1:3])

colnames(aCGH.object$clones.info)[1] = "Clone"
colnames(aCGH.object$clones.info)[2] = "Chrom"
colnames(aCGH.object$clones.info)[3] = "kb"

aCGH.object$clones.info$Chrom = as.integer(aCGH.object$clones.info$Chrom)
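
One detail that may be worth checking in the conversion above: in R, as.integer() applied to a factor returns the internal level codes, not the numbers written in the labels. With levels ordered "1","10","11",... as in the str() output, chromosome 10 would be coded as 2. The sketch below shows the difference on a toy factor. It assumes the 24 levels are "1" to "22" plus "X" and "Y" (only the first three levels are visible above), and that coding X as 23 and Y as 24 is acceptable; the helper chrom_to_int is hypothetical.

```r
# Toy factor mimicking the Chromosome column (assumed labels, see above).
chrom <- factor(c("1", "2", "10", "X", "Y"))

# Level codes: positions in the sorted level set, not chromosome numbers.
codes <- as.integer(chrom)

# Hypothetical helper: convert the labels themselves to integers,
# mapping X to 23 and Y to 24 (an assumed convention for human data).
chrom_to_int <- function(f) {
  lab <- as.character(f)
  lab[lab == "X"] <- "23"
  lab[lab == "Y"] <- "24"
  as.integer(lab)
}

fixed <- chrom_to_int(chrom)
print(data.frame(label = chrom, codes = codes, fixed = fixed))
```

If this applies to the real data, the same helper could be used in place of the as.integer() call on aCGH.object$clones.info$Chrom.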


The resultant object is as follows (each column in the $log2.ratios data-frame is a unique sample):

> str(aCGH.object)
List of 4
 $ log2.ratios        :'data.frame':	2141465 obs. of  20 variables:
  ..$ 10T: num [1:2141465] -0.268 -0.324 0.486 -0.672 -0.191 ...
  ..$ 13T: num [1:2141465] -1.032 -0.522 0.414 -0.552 -0.901 ...
  ..$ 14T: num [1:2141465] -0.917 -0.698 0.723 -1.475 -0.771 ...
  ..$ 15T: num [1:2141465] -0.541 -0.161 0.248 -0.529 -0.859 ...
  ..$ 16T: num [1:2141465] -0.469 -0.43 0.129 -0.317 -1.051 ...
  ..$ 23T: num [1:2141465] -0.0257 0.0107 0.2888 0.3228 0.1635 ...
  ..$ 33T: num [1:2141465] 0.071 0.959 0.422 -0.019 0.35 ...
  ..$ 34T: num [1:2141465] -0.846 -0.471 0.48 -1.141 -0.466 ...
  ..$ 37T: num [1:2141465] -1.014 -0.279 0.327 -0.796 -0.485 ...
  ..$ 3T : num [1:2141465] -0.46 -0.221 0.117 0.423 -0.266 ...
  ..$ 41T: num [1:2141465] -2.021 -0.7997 0.4713 -0.0937 -1.1054 ...
  ..$ 44T: num [1:2141465] -0.7501 -0.2017 0.0135 -1.1092 -0.356 ...
  ..$ 4T : num [1:2141465] 0.00255 -0.05183 0.09327 -0.2049 -0.07572 ...
  ..$ 55T: num [1:2141465] -0.2161 -0.5777 0.1861 -0.0936 -0.1689 ...
  ..$ 56T: num [1:2141465] 0.0622 0.1612 0.2907 0.3115 0.2649 ...
  ..$ 60T: num [1:2141465] 0.0222 0.0937 -0.1307 0.3206 -0.0847 ...
  ..$ 61T: num [1:2141465] 0.1707 -0.0255 0.2095 -0.0505 -0.1473 ...
  ..$ 63T: num [1:2141465] 0.00136 -0.01699 -0.15279 -0.2546 0.06513 ...
  ..$ 8T : num [1:2141465] -0.146 -0.101 0.389 -0.465 -0.357 ...
  ..$ IT : num [1:2141465] -0.2524 0.2645 0.7298 -0.563 -0.0915 ...
 $ clones.info        :'data.frame':	2141465 obs. of  3 variables:
  ..$ Clone: Factor w/ 2141465 levels "C-00IGZ","C-00IHI",..: 580623 580624 580625 580626 580627 580628 580629 1967674 580630 580631 ...
  ..$ Chrom: int [1:2141465] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ kb   : int [1:2141465] 712577 713263 714145 714635 718604 750062 752757 754192 755354 760401 ...
 $ phenotype          : NULL


I have tried to make this object match the structure and data types of the example data sets in the aCGH vignette. The only column that appears to be missing is aCGH.object$clones.info$Target, and I am unsure what it is meant to contain.

When I generate basic plots of my data with plot(aCGH.object), plotGenome(aCGH.object), or plotFreqStat(aCGH.object), the graphs appear overly noisy, and the chromosome markers do not seem to be linked to the data set correctly: they are all bunched towards the left-hand side of the graphs. Copies of the graphs are here:

https://www.dropbox.com/s/9z3n70arvxir2iu/aCGH.cn.plot.png
https://www.dropbox.com/s/uzoont4mvratqsz/aCGH.cn.plotFreqStats.png
https://www.dropbox.com/s/12fn3lwnfosaqpc/aCGH.cn.plotGenome.png
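
A few consistency checks on the clones.info table might help narrow down the plotting problem. The sketch below is only a diagnostic aid under stated assumptions: the helper check_clones_info is hypothetical, and whether the plotting functions expect positions in kilobases (as the column name kb suggests) rather than the base-pair values that Position appears to hold should be confirmed against the aCGH documentation.

```r
# Hypothetical helper: basic checks on a clones.info-like data frame
# with an integer column Chrom and a position column kb.
check_clones_info <- function(ci) {
  list(
    # distinct chromosome codes present in the table
    chrom_values = sort(unique(ci$Chrom)),
    # TRUE if rows are already ordered by chromosome, then by position
    sorted = !is.unsorted(order(ci$Chrom, ci$kb)),
    # largest position per chromosome; human chromosome 1 is roughly 249 Mb,
    # so maxima near 2.5e8 suggest base pairs and near 2.5e5 suggest kilobases
    max_pos = tapply(ci$kb, ci$Chrom, max)
  )
}

# Toy table using the first two Position values from the post.
toy_info <- data.frame(Chrom = c(1L, 1L, 2L, 2L),
                       kb = c(712577, 713263, 10000, 20000))
res <- check_clones_info(toy_info)
print(res)

# If the positions turn out to be in base pairs, convert them to kilobases.
toy_info$kb <- toy_info$kb / 1000
```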


As such, my question essentially is: Have I created the aCGH object correctly or am I missing something?


Many thanks for your time and assistance.

Yours sincerely,
Ryan




 -- output of sessionInfo(): 

> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

--
Sent via the guest posting facility at bioconductor.org.


