[BioC] Question about 'updateCel' from package 'affxparser'

Henrik Bengtsson hb at biostat.ucsf.edu
Sun Aug 28 02:58:10 CEST 2011


Hi,

sorry for the late response on this.  Yes, if there is a bug, it could
to be related updateCel() of affxparser.  Though, I cross my fingers
that there is another reason for all this - something that is not a
bug but a simple mistake.

I am aware of the note in help("seek") saying "Use of seek on Windows
is discouraged. We have found so many errors in the Windows
implementation of file positioning that users are advised to use it
only at their own risk, and asked not to waste the R developers' time
with bug reports on Windows' deficiencies.".  Unfortunately, there is
no other option than using seek(), but of course, it should not give
errors.

FYI, updateCel() has been used a lots on Windows for many years
without any issues, although I acknowledge that issues as those you
are reporting could indeed be missed.  One way to know that the
problem is with updateCel() you could add the following, e.g.

for (j in 1:9) {
  filename <- filenames[j];
  y <- exp(output[,j]);

  # Update CEL file
  updateCel(filename, indices=indices, intensities=y);

  # Assert correctness
  y2 <- readCel(filename, indices=indices, readIntensities=TRUE,
readHeader=FALSE, readStdvs=FALSE, readPixels=FALSE, readXY=FALSE,
readOutliers=FALSE, readMasked=FALSE)$intensities;

  if (!all.equal(y2, y)) {
     # Report the data we tried to write
     str(list(filename=filename, indices=indices, intensities=y,
rereadIntensities=y2));
     stop("Error in updateCel()!").
  }
} # for (j ...)

One could do an even fancier version where one compare the complete
CEL file before and after, which then would detect incorrectly written
values outside the file areas that 'indices' points to.

Also before troubleshooting more, what is your sessionInfo()?

/Henrik
(co-author of affxparser)


On Fri, Jul 29, 2011 at 1:27 AM, Steve Pederson
<stephen.pederson at adelaide.edu.au> wrote:
> Hi,
>
>
>
> I'm building an array analysis package using an MCMC process for each gene &
> am writing the key values (posterior quantiles, mean, sd etc) for each gene
> to disk using CEL files and the aroma.affymetrix architecture. However I've
> noticed that on rare occasion, the incorrect values are written to the CEL
> files.
>
>
>
> As an example of what goes wrong, if I have a vector of 8 values (e.g.
> intensities=1:8) and am updating 8 cells (e.g. indices=221632:221639) with
> these 8 values, occasionally the first value (i.e. intensities[1]) would be
> incorrectly given to all 8 cells on the CEL file. This has happened in an
> obvious manner only to 2 different genes in 2 of 3 datasets & I can't spot a
> connection in the data structures that may cause the misfire. If I can see
> these 2 obvious problems, I'm unsure as to what less obvious errors there
> may be lurking in the dataset & hence my confidence in the results is
> somewhat diminished.
>
>
>
> The code from the relevant section of my function (where I write batches of
> genes at a time) is:
>
> monoUgcMap <- getUnitGroupCellMap(monoCdf, unit=batchUnits) # Get the
> UnitGroupCellMap from a monocell cdf for the current batch units
>
> indices <- monoUgcMap$cell
>
> for (j in 1:9) { # The output after the MCMC process will always have 9
> columns, which are written to separate CEL files
>
>      updateCel(filenames[j], indices=indices, intensities=exp(output[,j]))
> #update the CEL file for the current batch
>
> }
>
>
>
> I am wondering if the issue arises from the usage of 'seek' within the
> function 'updateCel', as I am on a Windows 7 x64 system? The line I'm
> referring to  from the 'updateCel' source code is:
>
>
>
> seek(con, origin="start", where=offset, rw="read")
>
>
>
> Reading the 'seek' help page didn't really shed much light (to my flawed
> brain) on what this step does, but I did spot the warning about this
> function under Windows.  I'm still a bit unclear as to what this step does,
> so am looking for either a clarification as to the point of this line of
> code, or any suggestions for a simple workaround.
>
>
>
> As it appears to be a random & rare misfire, I'll have to make changes & run
> a few datasets (over multiple days) just to check it so may be a bit slow
> with any responses...
>
>
>
> Thanks in advance,
>
>
>
> Steve
>
>
>        [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>



More information about the Bioconductor mailing list