[BioC] Biocore response to Affymetrix data format changes

Kulp, David david_kulp at affymetrix.com
Thu Jul 3 10:18:49 MEST 2003


Ahh, it's the Xeotron chip that got my attention!  Because of that we've
changed our mind.  :->

But in all seriousness, there's no need to hassle your local sales rep; I
received the BioCore's email without any transmission errors.  Although it
wasn't mentioned on this list, since that email we've had offline
discussions with some of the core team, too.  And we've agreed to publish
the binary format specifications.

For the typical software developer, we still encourage the use of a stable
programming API that we provide.  In theory, it should shield developers
from the hassles of format changes.  And when you're developing for one or a
few operating systems and/or distributing binaries to your customers, then
using the API is almost certainly the preferred alternative.  

Some of the file formats are wrinkled with legacy decisions that mean they
may not be elegantly implemented.  And Affymetrix can't promise that formats
will remain stable, but we'll publish what those changes are -- warts and
all -- for those that want them.

We prefer that BioConductor software -- or any other program -- not create
Affymetrix files to minimize confusion and customer support problems.  If
you share BioConductor data, consider distributing R objects or using the
MAGE-ML public exchange format, since that's what it's for.

Affymetrix will make a more formal announcement about the release of file
format specifications soon, but July is a very busy month for our software
team, so full "roll-out" (online docs) will take some time.

I also would encourage BioConductor developers to join the Affymetrix
Developer Network.  Info is at
http://www.affymetrix.com/support/developer/index.affx.  The "ADN" hosts
regular meetings where issues such as the release of file formats have been
openly discussed in the past.  Your voice is welcome.

Cheers,
David Kulp

> -----Original Message-----
> From: James MacDonald [mailto:jmacdon at med.umich.edu] 
> Sent: Monday, June 30, 2003 1:20 PM
> To: isaac.neuhaus at bms.com; stvjc at channing.harvard.edu
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] Biocore response to Affymetrix data format changes
> 
> 
> I think Rafael's suggestion is probably the most effective. If we as end
> users let Affymetrix know that we are unhappy with the coming changes in
> their policy, maybe we can get them to change.
> 
> Unfortunately, right now Affy has a monopoly in the market and they
> know full well that any complaints will likely not be reflected in
> decreased sales. I am sure they are worried that Bioconductor will have
> an effect on the sales of MAS6 (which undoubtedly will generate a probe
> summary identical to rma), and the best way to eliminate that threat is
> to make Bioconductor as difficult to use as possible.
> 
> However, letting your friendly local Affy sales rep know that you are
> not too keen on their proposed changes may have an effect. In addition,
> letting them know that you are VERY interested in the Xeotron chips may
> have an even greater effect ;-D
> 
> Jim
> 
> 
> 
> James W. MacDonald
> UMCCC Microarray Core Facility
> 1500 E. Medical Center Drive
> 7410 CCGC
> Ann Arbor MI 48109
> 734-647-5623
> 
> >>> Isaac Neuhaus <isaac.neuhaus at bms.com> 06/30/03 02:48PM >>>
> Many of us work in the pharmaceutical industry and have been taking
> advantage of your excellent tools.
> We are also, 'monetarily speaking', important Affymetrix customers. I
> would like to know how WE, in the pharmaceutical industry, could help
> and facilitate your continuing effort in developing these useful tools.
> 
> Isaac
> 
> Vincent Carey 525-2265 wrote:
> 
> > D. Kulp of Affymetrix commented on the upcoming proprietary GeneChip
> > data formats in a Bioconductor mailing list post of 25 June 2003.
> > He notes that Windows/Java linkable libraries will be provided
> > for reading the binary GeneChip format, and that MAGE-ML
> > exports will be available.  He proposes
> >  1) Bioconductor can provide free compiled libraries using
> > the API and the Affymetrix linkable libraries
> >  2) Bioconductor applications use MAGE-ML, as data bloat is
> > not noteworthy and the export contains 'all the CEL data you
> > expect'.
> >
> > Kulp comments that these observations show that the details
> > of the change are "fairly simple".  In fact, the change has
> > far-reaching implications for those who work with Bioconductor
> > software and Affymetrix data.
> >
> > The Bioconductor project has adopted a policy of programming
> > only to public and open APIs.  Primary reasons:
> >  a) R is free software under the GPL.  Although we have made
> > an effort to release the main Bioc components under LGPL, as
> > a collaborative gesture towards commercial entities who wish
> > to use our tools, R itself is GPL.  It is not possible to
> > legally distribute tools that combine compilations of
> > non-free software with GPL software.
> >  b) Beyond the restrictions of the GPL in relation to R,
> > the Digital Millennium Copyright Act (DMCA) creates legal
> > complications for those who create compilations of mixed
> > free and proprietary software.  We have no resources to spend
> > on legal advice or on adapting our research to a complex
> > legal landscape.  Commitment to public and open APIs allows
> > us to carry on research in a natural and efficient way largely
> > independently of DMCA restriction and interpretation in
> > the complex area of reverse engineering.
> >  c) Commitment to public and open APIs leverages the user
> > community's capabilities to discover problems and to
> > fix them.  While distribution of compiled libraries with
> > open components as interfaces to proprietary formats may
> > SEEM consistent with open source software methodology,
> > this is an illusion.  We have benefited from user-contributed
> > bug fixes and would cease to do so under the regimen proposed
> > by Kulp, because users would lack access to key elements of
> > the interface.
> >  d) Commitment to public and open APIs sharply reduces
> > effort required to support multiple platforms.  When compiled
> > libraries are distributed one frequently encounters conflicts
> > with resident versions of supporting libraries and one
> > needs to introduce substantial technology for bridging
> > distributed objects to platforms whose resources may be
> > out of date or noncompliant with basic standards.  Time spent
> > on nonstandard portability methodology is time subtracted
> > from research on computational biology.  As researchers
> > we cannot accept this additional cost.
> >  e) Commitment to public and open APIs is the only approach
> > compatible with the recognition that microarray analysis
> > technology is immature and must be fully open to scrutiny
> > if science is to advance in an efficient way.  Comparisons
> > of MAS4, MAS5, Li and Wong's MBEI and RMA probe-level
> > analyses indicate that the procedures yield different results.
> > Users have a right to expect that results from different
> > methodologies can be fully rationalized, and this can only
> > occur with open implementations.
> >
> > These five points respond to Kulp's suggestion that we
> > provide free binaries to the user community.  The suggestion
> > seems simple and positive but it is not feasible at all.
> >
> > Kulp's second suggestion is to employ the MAGE-ML format.
> > It does appear that this constitutes a public and open API
> > and one that we could program to.  However it does appear
> > that there will be significant information restrictions and
> > performance costs if we are forced to go in this direction.
> > We have one report of significant data bloat with the
> > current embodiments of this technology.  A 7 megabyte
> > CEL file had a 30 MB XML representation, and a 21 MB
> > CDF file had a 400 MB XML representation.  Kulp suggests
> > that XML bloat does not occur, and that may be due to
> > his access to newer forms of the transformation.  We
> > believe that compliant MAGE-ML representations will be
> > massive.  Requiring Bioconductor to work from MAGE-ML
> > will lead to additional burdens on users that will
> > impede research progress.
> >
> > In summary, Bioconductor's commitment to open and public
> > APIs is dictated by legal and scientific considerations.
> > Affymetrix' transition to closed file formats is difficult
> > to understand.  No one questions the technical utility of
> > a change to a binary format.  Making it secret has no
> > utility that we can discern.  Bioconductor and its users
> > have provided R&D to Affymetrix essentially free of charge.
> > The upcoming Affymetrix GeneChip Microarray Low-Level Workshop
> > ( http://eci-events.com/AffyGeneChip/ ) is proof that Affymetrix
> > appreciates and is open to these contributions.
> > Accommodating a non-public, non-open API for Affymetrix data
> > would constitute a precedent that might impact methods
> > adopted by other companies in this field.  We respectfully
> > ask that Affymetrix set a rather different precedent:
> > open the new file format to support and encourage research
> > and development in the microarray analysis domain.
> > An open format will clearly benefit both Affymetrix and
> > the scientific community.
> >
> > Sincerely,
> > The Bioconductor Core Team
> >
> >     * Douglas Bates, University of Wisconsin, USA.
> >     * Vince Carey, Harvard Medical School, USA.
> >     * Marcel Dettling, Federal Inst. Technology, Switzerland.
> >     * Sandrine Dudoit, Division of Biostatistics, UC Berkeley, USA.
> >     * Byron Ellis, Harvard Department of Statistics, USA.
> >     * Laurent Gautier, Technical University of Denmark, Denmark.
> >     * Robert Gentleman, Harvard Medical School, USA.
> >     * Jeff Gentry, Dana-Farber Cancer Institute, USA.
> >     * Kurt Hornik, Technische Universität Wien, Austria.
> >     * Torsten Hothorn, Institut fuer Medizininformatik, Biometrie und Epidemiologie, Germany.
> >     * Wolfgang Huber, DKFZ Heidelberg, Molecular Genome Analysis, Germany.
> >     * Stefano Iacus, University of Milan, Italy.
> >     * Rafael Irizarry, Department of Biostatistics (JHU), USA.
> >     * Friedrich Leisch, Technische Universität Wien, Austria.
> >     * Martin Maechler, Federal Inst. Technology, Switzerland.
> >     * Gordon Smyth, Walter and Eliza Hall Institute, Australia.
> >     * Anthony Rossini, University of Washington and the Fred Hutchinson Cancer Research Center, USA.
> >     * Gunther Sawitzki, Institut für Angewandte Mathematik, Germany.
> >     * Luke Tierney, University of Iowa, USA.
> >     * Jean Yee Hwa Yang, University of California, San Francisco, USA.
> >     * Jianhua (John) Zhang, Dana-Farber Cancer Institute, USA.
> >


