[BioC] bead-level data from Infinium methylation arrays

Tim Triche, Jr. ttriche at usc.edu
Thu Jul 9 20:07:08 CEST 2009


On Wed, Jul 8, 2009 at 2:16 AM, Mark Dunning <mark.dunning at gmail.com> wrote:
> Hi Tim,
>
> Do you know what scanning software was used to create these bead-level
> data? BeadScan or the newer iScan system? I'm wondering if the format
> of the files has changed since we wrote readIllumina. When the object
> 'dat1' is created in readIllumina it assumes a set number of columns
> in the bead-level text files (4, 6 or 7), so if the number of columns is
> something different then this dat1 object will not be created, causing
> the function to fail with an error.

I confirmed with the staff of the data production facility that my
files are from BeadScan.  I don't yet have a copy of the settings.xml
file in use, or changes to it, but I'll get one.  I have attached
other files suggested by you and Dr. Carey, along with a feeble patch
I wrote.

The files I have are chipnumber_array_color.(idat|xml|locs|tif),
chipnumber_array.txt, and chipnumber.sdf for each chip, along with a
Metrics.txt file, a manifest file (Excel, but I converted it to CSV in
hopes of turning it into an annotation package), and a targets.txt
file which I wrote in the format shown by the example bead-level-data
in the vignette.
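
The naming scheme described above can be made explicit with a short script. What follows is a minimal Python sketch, not part of beadarray or of the attached patch; the function name `group_bead_files` and the regular expression are assumptions inferred only from the file names listed in this message.

```python
import re
from collections import defaultdict

# Assumed pattern: <chip>_<array>[_<channel>].<ext>, for example
# 4321207025_A_Grn.idat or 4321207025_A.txt. This is inferred from the
# file listing in this message, not from any Illumina specification.
BEAD_FILE = re.compile(
    r"^(?P<chip>\d+)_(?P<array>[A-Za-z0-9]+)(?:_(?P<channel>Grn|Red))?\.(?P<ext>\w+)$"
)

def group_bead_files(filenames):
    """Group file names by (chip, array); return non-matching names separately."""
    groups = defaultdict(list)
    other = []
    for name in filenames:
        match = BEAD_FILE.match(name)
        if match is None:
            # Chip-level or project-level files such as the .sdf or Metrics.txt
            other.append(name)
        else:
            groups[(match.group("chip"), match.group("array"))].append(name)
    return dict(groups), other

if __name__ == "__main__":
    names = ["4321207025_A_Grn.idat", "4321207025_A.txt", "4321207025.sdf"]
    groups, other = group_bead_files(names)
    print(groups)
    print(other)
```

Grouping this way makes it easy to see whether any array section is missing one of its per-channel files before attempting to read it.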

The .txt files with which I am provided have only the columns 'Code',
'Grn', and 'Red' (all with integer-valued contents).  If I'm not hosed
-- if the .txt and .tif files are enough -- could anyone provide a bit
of guidance in terms of where I should start hacking?  I'm not averse
to monkeying around in the C code but I don't know where I should look
first.
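
To illustrate what the three-column files do and do not contain, here is a minimal Python sketch that checks the header and summarises the beads per probe code. It is independent of beadarray and of the attached patch; the function name `summarise_beads` is hypothetical, and the expected header is taken from the excerpt attached to this message. Since the files carry no X,Y columns, a summary like this uses only the per-bead intensities.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Header as it appears in the attached excerpt (tab-delimited).
EXPECTED_COLUMNS = ["Code", "Grn", "Red"]

def summarise_beads(handle):
    """Read tab-delimited bead rows; return {code: (n_beads, mean_grn, mean_red)}."""
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames != EXPECTED_COLUMNS:
        # Fail early with a clear message rather than deep inside later steps.
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    grn, red = defaultdict(list), defaultdict(list)
    for row in reader:
        code = int(row["Code"])
        grn[code].append(int(row["Grn"]))
        red[code].append(int(row["Red"]))
    return {code: (len(grn[code]), mean(grn[code]), mean(red[code])) for code in grn}

if __name__ == "__main__":
    # First three data rows of the attached excerpt.
    sample = "Code\tGrn\tRed\n10008\t106\t1847\n10008\t139\t1680\n10008\t135\t1675\n"
    print(summarise_beads(io.StringIO(sample)))
```

For a real file, an open file object (for example `open("4321207025_A.txt", newline="")`) can be passed in place of the `io.StringIO` sample.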

I did write a simple kludge to read in Infinium two-channel data.  It
is not clever, just a small patch to readIllumina to deal with the
3-column format I have.  Nonetheless it causes the package to inspect
the .tif files, putting quite a strain on my pokey laptop.  Then an
error (and not the one I added as a checkpoint) is thrown:

Error in data[, 2] = bgCorrectSingleArray(fg = greenIntensities[[5]],  :
  replacement has length zero

I didn't request background correction, for what that's worth.

The lack of useful X,Y location information seems to be the culprit
here.  I am not sure how best to fix this.  Files with the extension
.locs are provided, but I could not find useful specs on this file
format.  Am I stymied with regard to accessing the bead-level data?
(A presentation by Matt Ritchie at Cambridge hinted that this may be
the case.  Dr. Carey's reply suggested that perhaps the oft-changing
Illumina file formats might also be involved.)

I could request that the core facility not default to these
proprietary formats, if that is an insurmountable obstacle.  Have
others found themselves in this situation before?

Thanks for any suggestions,

--tim
-------------- next part --------------
Code	Grn	Red
10008	106	1847
10008	139	1680
10008	135	1675
10008	52	1315
10008	59	1832
10008	96	1250
10008	65	1314
10008	66	1457
10008	85	1560
-------------- next part --------------
4321207025_A_Grn.idat
4321207025_A_Grn.locs
4321207025_A_Grn.tif
4321207025_A_Grn.xml
4321207025_A_Red.idat
4321207025_A_Red.locs
4321207025_A_Red.tif
4321207025_A_Red.xml
4321207025_A.txt
4321207025_B_Grn.idat
4321207025_B_Grn.locs
4321207025_B_Grn.tif
4321207025_B_Grn.xml
4321207025_B_Red.idat
4321207025_B_Red.locs
4321207025_B_Red.tif
4321207025_B_Red.xml
4321207025_B.txt
4321207025.sdf
files.txt
Metrics.txt
probe_sequences.csv
readIllumina.diff
readIllumina.orig.R
readIllumina.patched.R
targets.txt
-------------- next part --------------
R version 2.10.0 Under development (unstable) (2009-06-25 r48836) 
i686-pc-linux-gnu 

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C             
 [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8    
 [5] LC_MONETARY=C             LC_MESSAGES=en_US.utf8   
 [7] LC_PAPER=en_US.utf8       LC_NAME=C                
 [9] LC_ADDRESS=C              LC_TELEPHONE=C           
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C      

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] beadarray_1.13.4 Biobase_2.5.4    sandwich_2.2-1   zoo_1.5-6       
[5] Design_2.2-0     survival_2.35-4  Hmisc_3.6-0     

loaded via a namespace (and not attached):
[1] cluster_1.12.0  grid_2.10.0     hwriter_1.1     lattice_0.17-25
[5] limma_2.19.2    tools_2.10.0   

