[BioC] beadarray readIllumina suggestions

Mon Jun 25 15:57:51 CEST 2007

Dear Keith,

Thanks for your mail and suggestions.  You're right, there is an
inconsistency in how the code handles the path argument.  It was fixed a
little while ago in the development version of beadarray (1.5.1, from
memory), so if you upgrade to the latest development version of the package,
the 'path' argument should work as described.
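
If upgrading is not possible right away, the base-R sketch below shows the idea behind a consistent 'path': every file is listed relative to one directory, and each text file is paired with its green image. Pairing the files this way also skips stray files such as "targets.txt". `find_arrays()` is a hypothetical helper, not part of beadarray. It assumes the "<name>_Grn.tif" / "<name>.txt" naming implied by the `dir(pattern = "_Grn.tif")` call quoted below.

```r
# Hypothetical helper (not part of beadarray): pair each green image
# ("<stem>_Grn.tif") in 'path' with its bead-level text file ("<stem>.txt").
# Full paths are built from 'path', so nothing depends on the working
# directory, and stray files such as "targets.txt" or "notes.txt" are
# skipped because they have no matching image.
find_arrays <- function(path = ".", textType = ".txt") {
  imgs <- dir(path = path, pattern = "_Grn\\.tif$")
  txts <- dir(path = path, pattern = paste0("\\", textType, "$"))
  # Strip the suffixes to get file stems, then keep stems present in both sets
  stems <- intersect(sub("_Grn\\.tif$", "", imgs),
                     substr(txts, 1, nchar(txts) - nchar(textType)))
  data.frame(array = stems,
             image = file.path(path, sprintf("%s_Grn.tif", stems)),
             text  = file.path(path, sprintf("%s%s", stems, textType)),
             stringsAsFactors = FALSE)
}
```

You could then pass the `array` column as the 'arrayNames' argument, e.g. `readIllumina(arrayNames = find_arrays("/path/to/data")$array, path = "/path/to/data")`. That call is an untested assumption about how the two arguments combine.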

I'd also recommend explicitly specifying the arrays you want to read in,
using the 'arrayNames' argument of readIllumina().  I always do this to
ensure that the arrays are read in the same order as they appear in my
targets file (which contains the sample information), rather than in the
default order.
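
As a sketch, the array names could be taken straight from the targets file, so that the sample sheet fixes the order. The tab-delimited layout and the 'ArrayName' column are assumptions about how such a file might look. `get_array_names()` is hypothetical, and the readIllumina() call is shown only as a comment.

```r
# Hypothetical helper: read a tab-delimited targets file (assumed to have a
# column 'ArrayName' holding the file stems, e.g. "1234567_A") and return
# the names in the row order of the file.
get_array_names <- function(targetsFile, column = "ArrayName") {
  targets <- read.delim(targetsFile, stringsAsFactors = FALSE)
  if (!column %in% names(targets))
    stop("column '", column, "' not found in ", targetsFile)
  targets[[column]]
}

# arrays <- get_array_names("/path/to/data/targets.txt")
# BLData <- readIllumina(arrayNames = arrays, path = "/path/to/data")
```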

And yes, the column names of the .txt or .csv files do vary with the version
of BeadScan and with the type of array (single-channel or two-colour).  There
is some checking in readIllumina to see how many columns there are, and from
memory 4 columns are supported.  If your format doesn't read in properly,
perhaps you could send us some example files to look at.
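
Before sending files, the column layout can be checked quickly with the sketch below. `check_bead_columns()` is hypothetical and not part of beadarray. It assumes tab-separated .txt files and comma-separated .csv files. The two headers mentioned in this thread are the only layouts it is illustrated with; other BeadScan versions may differ.

```r
# Hypothetical check (not part of beadarray): read only the header line of a
# bead-level text file and report how many columns it has and their names.
# Tab-separated fields are assumed; use sep = "," for .csv files.
check_bead_columns <- function(file, sep = "\t") {
  header <- readLines(file, n = 1)
  cols <- strsplit(header, sep, fixed = TRUE)[[1]]
  list(n = length(cols), names = cols)
}

# A header "Code<TAB>Grn<TAB>GrnX<TAB>GrnY" gives n = 4, whereas the
# "ProbeID G Gb GrnX GrnY" layout would give n = 5.
```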

Best wishes,

Matt

On 25/6/07 14:07, "Keith James" <kdj at sanger.ac.uk> wrote:

> 
> I am reading single channel bead level data using the readIllumina function.
> The docs indicate that there is a path parameter:
> 
> path  character string specifying the location of files to be read by the
> function
> 
> Calling the function with a path argument results in an error:
> 
> readIllumina(path = "/path/to/data", txtType = ".txt")
> Error in strtrim(x, width) : invalid 'width' argument
>> traceback()
> 4: strtrim(xyFiles, nchar(xyFiles) - 4)
> 3: as.vector(y)
> 2: intersect(strtrim(GImages, nchar(GImages) - 8), strtrim(xyFiles,
>        nchar(xyFiles) - 4))
> 1: readIllumina(path = "/path/to/data", txtType = ".txt")
> 
> I've seen in another post that readIllumina expects the files to be in the
> working directory, and this is the case since this line in the function
> relies on the default path for dir calls:
> 
> GImages = dir(pattern = "_Grn.tif")
> 
> At first I took this to be a documentation bug, but in fact the path argument
> is honoured for loading the csv files:
> 
> file = csv_files[i]
> if (!is.null(path))
>     file = file.path(path, file)
>
> and the annotation (.opa file):
>
> if (!is.null(path))
>     annoFile = file.path(path, annoFile)
>
> but apparently not for loading the metrics file:
>
> if (metrics) {
>     metrics = dir(pattern = metricsFile)
> 
> My suggestion is to make the behaviour consistent across all the data, i.e. to
> honour a path argument for tif and metrics files.
> 
> It is probably worth noting in the docs the assumptions the function makes
> about the files it expects in the data directory, i.e. that *all* .tif images
> and *all* .txt files will be loaded. My data directory contained other .tif
> and .txt files ("targets.txt", "notes.txt"), which caused the function to
> choke. I think it is optimistic to assume that users will have no other
> such files present.
> 
> In addition, I wonder whether the column names in the csv/txt files vary with
> the version of the scanner or scanner software. Instead of
> 
> ProbeID G Gb GrnX GrnY
> 
> or similar, we have
> 
> Code    Grn     GrnX    GrnY
> 
> So, 4 columns rather than 3 or 5.
> 
> Finally, we appreciate all the work you've done in enabling us to work with
> raw Illumina data. Many thanks.
> 
>> sessionInfo()
> R version 2.5.0 (2007-04-23)
> i686-pc-linux-gnu
> 
> locale:
> LC_CTYPE=en_GB;LC_NUMERIC=C;LC_TIME=en_GB;LC_COLLATE=en_GB;LC_MONETARY=en_GB;LC_MESSAGES=en_GB;LC_PAPER=en_GB;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB;LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] "grid"      "tools"     "stats"     "graphics"  "grDevices" "utils"
> [7] "datasets"  "methods"   "base"
> 
> other attached packages:
>    beadarray beadarraySNP  quantsmooth      lodplot     quantreg      SparseM
>      "1.4.0"      "1.2.0"      "1.2.0"        "1.1"       "4.06"       "0.73"
>         affy       affyio  geneplotter      lattice     annotate      Biobase
>     "1.14.0"      "1.4.0"     "1.14.0"     "0.15-4"     "1.14.1"     "1.14.0"
>        limma 
>     "2.10.0" 
>