[BioC] Reading Illumina IDAT files

Thu Jun 24 10:58:09 CEST 2010

Hi Mark,

The function you refer to can only handle idat files from current-version
Infinium genotyping arrays.  Idats from expression arrays are not
supported - it is my understanding that the information in these files is
encrypted, which adds an extra layer of complexity to the whole operation.

Idats from Infinium arrays scanned using older scanner settings also
produce errors - we have added a check for this in the devel version of
crlmm.
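
As a rough illustration of that kind of up-front check, here is a minimal
sketch.  It is not the code in crlmm: the function name idatVersion is
hypothetical, and the header layout (a 4-byte "IDAT" prefix followed by an
8-byte little-endian version number) is assumed from the readIDAT trace
quoted below.

```r
# Hypothetical helper, not part of crlmm: read only the start of an IDAT
# file and return its version number, failing early on anything else.
idatVersion <- function(idatFile) {
  con <- file(idatFile, "rb")
  on.exit(close(con))

  # Assumed layout: the first 4 bytes are the literal prefix "IDAT".
  prefix <- readChar(con, 4, useBytes = TRUE)
  if (!identical(prefix, "IDAT")) {
    stop("not an IDAT file: ", idatFile)
  }

  # Assumed layout: the next 8 bytes hold the version, little-endian.
  # Read them as two 4-byte words (low word first) so the result does not
  # depend on 8-byte integer support in readBin.
  words <- readBin(con, "integer", n = 2, size = 4, endian = "little")
  if (length(words) != 2 || !isTRUE(words[2] == 0L)) {
    stop("unreadable IDAT version field: ", idatFile)
  }
  words[1]
}

# Possible use before a full parse (file name taken from the quoted message):
#   if (idatVersion("4687778079_A_Grn.idat") != 3) stop("unsupported IDAT version")
```

Checking the returned value before parsing would replace the "invalid 'nrow'
value" error with a clearer message.  That version 3 is the supported one
is taken from the quoted message below, not verified here.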

Best wishes,

Matt

> Dear list,
>
> I'd like to be able to parse Illumina gene expression IDAT files and I've
> been playing with the crlmm:::readIDAT function, which is designed to read
> Illumina Infinium IDAT files.  This function dies on about the 9th line or
> so because 'nFields' is a very large negative number (see below). I'm
> trying to read in a MouseRef-8_V2_0_R1_11278551_A.bgx.xml type of array,
> but would like to be able to read all types of gene expression arrays.
>
> Here is the output that I get:
>
> library(ff)
> library(crlmm)
> f <- "4687778079_A_Grn.idat"
> debug(crlmm:::readIDAT)
> crlmm:::readIDAT(f)
> #<snip>
> Browse[2]>
> debug: fileSize <- file.info(idatFile)$size
> Browse[2]>
> debug: tempCon <- file(idatFile, "rb")
> Browse[2]>
> debug: prefixCheck <- readChar(tempCon, 4)
> Browse[2]>
> debug: if (prefixCheck != "IDAT") {
> }
> Browse[2]> prefixCheck
> [1] "IDAT"
> Browse[2]>
> debug: NULL
> Browse[2]>
> debug: versionNumber <- readBin(tempCon, "integer", n = 1, size = 8,
>     endian = "little", signed = FALSE)
> Browse[2]>
> debug: nFields <- readBin(tempCon, "integer", n = 1, size = 4,
>     endian = "little", signed = FALSE)
> Browse[2]> versionNumber
> [1] 1
> Browse[2]>
> debug: fields <- matrix(0, nFields, 3)
> Browse[2]> nFields
> [1] -1398219826
> Browse[2]>
> Error in matrix(0, nFields, 3) : invalid 'nrow' value (< 0)
>
>
> I've also come across the illumina.py file within the glu-genetics project
> at googlecode, which as far as I can tell is Python code to parse Illumina
> arrays, based upon this crlmm code. Between crlmm's code and the
> glu-genetics code, I gather that the readIDAT function only reads IDAT
> version 3 files, whereas I'm pretty sure mine are IDAT version 1 (as
> indicated by the versionNumber value above).
>
> I don't know whether Infinium IDATs are indeed a different version to
> gene expression IDATs, but I was hoping someone could point me in the
> right direction. Does anyone have a parser for generic IDAT files, or does
> anyone know how to reverse engineer binary files?
>
> cheers,
> Mark
>
> ----------------------------------------------------------------------
> Mark Cowley, PhD
>
> Peter Wills Bioinformatics Centre
> Garvan Institute of Medical Research
> ----------------------------------------------------------------------
>
> sessionInfo()
> R version 2.11.0 (2010-04-22)
> x86_64-apple-darwin9.8.0
>
> locale:
> [1] en_AU.UTF-8/en_AU.UTF-8/C/C/en_AU.UTF-8/en_AU.UTF-8
>
> attached base packages:
> [1] tools     stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] crlmm_1.6.2         oligoClasses_1.10.0 Biobase_2.8.0       ff_2.1-2            bit_1.1-4
>
> loaded via a namespace (and not attached):
>  [1] affyio_1.16.0         annotate_1.26.0       AnnotationDbi_1.10.1  Biostrings_2.16.2
>  [5] DBI_0.2-5             ellipse_0.3-5         genefilter_1.30.0     IRanges_1.6.4
>  [9] mvtnorm_0.9-9         preprocessCore_1.10.0 RSQLite_0.9-0         splines_2.11.0
> [13] survival_2.35-8       xtable_1.5-6
