[BioC] affy : ReadAffy() fails on HuGene-1_1-st-v1 chips

Groot, Philip de philip.degroot at wur.nl
Fri Jan 27 09:18:20 CET 2012


Dear James,

I apologize for the email from yesterday. I totally agree with your points. In addition, I really appreciate the effort that is undertaken in establishing the oligo package. I am using the library myself regularly!

However, there is a reason why I reacted this way. The affy problem has been reported by me previously. And it was fixed in Bioconductor 2.8! Now it is broken again. Can happen, but this line really annoyed me:

(quote): " I have not made the change in the release version because it isn't a bug."

Definitely, it IS a bug. However, it does not affect analysis of Affymetrix arrays because the 1st and 2nd generation arrays are square. So you don't notice the problem and this is fine. This is also the reason why I am in doubt whether the fix will really stay in with the next release. It has been removed before without good reason...
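
To illustrate why a rows/columns mix-up can go unnoticed on square arrays, here is a minimal sketch in base R. It is not the actual affy or affyio code: the function name `dims_match_swapped` and the array sizes are made up purely for illustration.

```r
# Hypothetical dimension check that reads the reference dimensions in the
# wrong (swapped) order. This is NOT the affy/affyio implementation.
dims_match_swapped <- function(header, reference) {
  identical(unname(header[c("Rows", "Cols")]),
            unname(reference[c("Cols", "Rows")]))
}

# Illustrative sizes only, not taken from any real chip definition.
square     <- c(Rows = 1000L, Cols = 1000L)
non_square <- c(Rows = 900L,  Cols = 1100L)

dims_match_swapped(square, square)          # the swap is invisible here
dims_match_swapped(non_square, non_square)  # the swap makes the check fail
```

On a square layout both orders give the same pair of numbers, so the check passes; on a non-square layout the very same file is rejected.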

In addition, people have invested some time to properly create CDFs for the GeneTitan plates that work properly with the affy library and provide (at least) identical RMA results to oligo. In my opinion, the great success and support of Bioconductor is for a significant part based on the affy library and the solutions that it offered when microarray analysis was in its infancy: it contributed to evolving Bioconductor to its current state! I think that the Bioconductor project should allow a "transitional period" where both affy and oligo can be utilized for analysing the most recent Affymetrix arrays. In addition, a lot of publications and tutorials are available that point people to the affy library and hence stimulate people to try it in the first place! Eventually, we should use oligo. No doubt about it, but the process should be a smooth transition. Currently, this is not the case. In addition, I am trying to help and I have the feeling that this is not well appreciated.
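
For anyone who wants to repeat such a comparison, a rough sketch of how two expression matrices could be checked against each other follows. The helper `compare_expression` is a hypothetical name, and the toy matrices stand in for real data; in practice the inputs would presumably come from `exprs()` applied to the `rma()` result of each package, and the probeset or sample names may need harmonising first.

```r
# Compare two expression matrices (probesets x samples) after aligning
# rows and columns by name. Toy data replaces real RMA output here.
compare_expression <- function(a, b, tolerance = 1e-8) {
  common_rows <- intersect(rownames(a), rownames(b))
  common_cols <- intersect(colnames(a), colnames(b))
  a <- a[common_rows, common_cols, drop = FALSE]
  b <- b[common_rows, common_cols, drop = FALSE]
  list(
    n_compared   = length(a),
    max_abs_diff = max(abs(a - b)),
    equal        = isTRUE(all.equal(a, b, tolerance = tolerance))
  )
}

set.seed(1)
m1 <- matrix(rnorm(12), nrow = 4,
             dimnames = list(paste0("ps", 1:4), c("s1", "s2", "s3")))
m2 <- m1[4:1, c("s3", "s1", "s2")]  # same values, different row/column order

res <- compare_expression(m1, m2)
res
```

Aligning by name before comparing matters because the two pipelines are not guaranteed to return rows or columns in the same order.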

In summary: the oligo library has my full support, but I do hope that the affy-issue will be fixed because it is a good thing for the Bioconductor community.

Regards,

Dr. Philip de Groot
Bioinformatician / Microarray analysis expert

Wageningen University / TIFN
Netherlands Nutrigenomics Center (NNC)
Nutrition, Metabolism & Genomics Group
Division of Human Nutrition
PO Box 8129, 6700 EV Wageningen
Visiting Address:
"De Valk" ("Erfelijkheidsleer"),
Building 304,
Verbindingsweg 4, 6703 HC Wageningen
Room: 0052a
T: 0317 485786
F: 0317 483342
E-mail: Philip.deGroot at wur.nl
I:         http://humannutrition.wur.nl
           https://madmax.bioinformatics.nl
           http://www.nutrigenomicsconsortium.nl




-----Original Message-----
From: James W. MacDonald [mailto:jmacdon at med.umich.edu]
Sent: donderdag 26 januari 2012 15:32
To: Groot, Philip de
Cc: 'Osselaer, Steven [JRDBE Extern]'; Goehlmann, Hinrich [JRDBE]; bioconductor at r-project.org
Subject: Re: [BioC] affy : ReadAffy() fails on HuGene-1_1-st-v1 chips

Hi Philip,

On 1/26/2012 4:28 AM, Groot, Philip de wrote:
> Hello all,
>
> Just to be sure:
>
>> If you follow the discussion that Mike linked to, this has been corrected in the devel version of the affy package. I made this change because it didn't have an adverse effect on the intended target of the affy package, which is the 3' biased arrays. I have not made the change in the release version because it isn't a bug.
> I think it is not nice that the problem will recur every time a new release comes out. So I do hope that the patch will be included in the next Bioconductor release? Please confirm!

For those who don't understand how the BioC release cycle works, here is a short primer. At any one time there are two versions: the release version, which is considered to be stable, and the devel version, upon which developers are still working.

At each release the developers finish up all changes they have made to their packages, and the devel version is then split into a new release branch, which is then 'released'. This new release is then considered to be stable, and only bug fixes of sufficient gravity can be made. Since this patch doesn't fix a bug, it was not applied to the release version.

Therefore, by definition, all changes made to code in the devel version will make their way into the next release.
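
As a small illustration of the above, the sketch below uses base R only. It relies on the Bioconductor convention that package versions are x.y.z, with an even y on the release branch and an odd y on the devel branch; the helper names are hypothetical, and "1.34.0" is simply the number one would expect for the release that follows the 1.33.z devel line.

```r
# Split a version string into its integer components c(x, y, z).
version_parts <- function(version) {
  unclass(package_version(version))[[1]]
}

# Bioconductor convention: odd y means devel branch, even y means release.
is_devel_version <- function(version) {
  version_parts(version)[2] %% 2 == 1
}

# TRUE when `version` is at least the version in which a change first appeared.
includes_change <- function(version, changed_in) {
  package_version(version) >= package_version(changed_in)
}

is_devel_version("1.32.0")            # release-branch version from the report
is_devel_version("1.33.1")            # devel version mentioned in this thread
includes_change("1.32.0", "1.33.1")   # current release predates the change
includes_change("1.34.0", "1.33.1")   # next release would carry it
```

For an installed package, `packageVersion("affy")` returns an object that can be compared in the same way.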

>
> In addition, I thoroughly tested affy and oligo RMA normalization using either the CDF (http://nmg-r.bioinformatics.nl/NuGO_R.html) or the pd.mapping (Bioconductor oligo) libraries. The RMA results are identical up to the last digit!
>
> In conclusion: it works both ways, so let's support it properly then! Note: I do agree that the oligo package is better suited for handling 3rd generation Affymetrix arrays, but intentionally sabotaging the affy library (sorry, but it just looks like this) is not the way to force people to move to oligo. Just my 2 cents.

That is a pretty harsh condemnation, and I will assume that you don't really mean it like it sounds, so I will try to show restraint.

A little background; several years ago it became clear that Affymetrix was going to have many more types of chips than just the original 3' biased chips for which the affy/makecdfenv pipeline was developed. After some discussion, Rafael Irizarry decided that rather than trying to reverse engineer an already existing and popular package to support all these new chips (in the six month span between releases), it would be better to create an entirely new pipeline that is intended to support ALL chips that Affy produces. The amount of time it took to get oligo/pdInfoBuilder to the current matured state is testament to the wisdom of that choice. Trying to 'fix' affy in six months would have been a disaster.

So, three points;

1.) Characterizing this as sabotage is (arrogant, ignorant, foolish, infuriating). I leave it to others to decide which.
2.) The affy and makecdfenv packages are open source. If you (or anybody else, for that matter) want to fork the code into your own package that supports all and sundry, please feel free to do so.
3.) The original plan was for the affy package to be deprecated, and then removed from BioC. In deference to the vast user base who use this package, and the existing personal code that is based on affy, it was not deprecated. In addition, we have made changes where we can to make affy accommodate these new chips, even when it isn't in anybody's interest to do so. This, I believe, invalidates your accusation that people are being 'forced to move to oligo'.

Best,

Jim


>
> Regards,
>
> Dr. Philip de Groot
>
> -----Original Message-----
> From: Osselaer, Steven [JRDBE Extern] [mailto:SOSSELAE at ITS.JNJ.COM]
> Sent: dinsdag 24 januari 2012 15:22
> To: James W. MacDonald
> Cc: Goehlmann, Hinrich [JRDBE]; bioconductor at r-project.org
> Subject: Re: [BioC] affy : ReadAffy() fails on HuGene-1_1-st-v1 chips
>
> Thank you for this information, James.
> We will look into this and try to start using 'oligo' for these types of arrays.
>
> Kind regards,
> Steven
>
> -----Original Message-----
> From: James W. MacDonald [mailto:jmacdon at med.umich.edu]
> Sent: Tuesday, 24 January 2012 15:18
> To: Osselaer, Steven [JRDBE Extern]
> Cc: Mike Smith; bioconductor at r-project.org
> Subject: Re: [BioC] affy : ReadAffy() fails on HuGene-1_1-st-v1 chips
>
> Hi Steven,
>
> If you follow the discussion that Mike linked to, this has been corrected in the devel version of the affy package. I made this change because it didn't have an adverse effect on the intended target of the affy package, which is the 3' biased arrays. I have not made the change in the release version because it isn't a bug.
>
> I also made this change because people seem to want to use the affy
> package for analyzing the Gene ST chips even though it was never
> intended for this purpose, and doesn't really do a good job. The oligo
> package is intended to be used with these chips, and that is the
> package we recommend you use.
>
> I think some of the hesitation to use oligo stems from the fact that
> it had a long development cycle, and in earlier incarnations was not
> completely documented. This is no longer true, and I would recommend
> you at least take a look.
>
> Best,
>
> Jim
>
>
>
> On 1/24/2012 9:01 AM, Osselaer, Steven [JRDBE Extern] wrote:
>> Thanks a lot, Mike.
>>
>> Applying the patch makes the ReadAffy() call functional again for these
>> types of chips.
>>
>>
>>
>> Kind regards,
>>
>> Steven Osselaer
>>
>>
>>
>> From: Mike Smith [mailto:grimbough at gmail.com]
>> Sent: Tuesday, 24 January 2012 14:30
>> To: Osselaer, Steven [JRDBE Extern]
>> Cc: bioconductor at r-project.org
>> Subject: Re: [BioC] affy : ReadAffy() fails on HuGene-1_1-st-v1 chips
>>
>>
>>
>> Hi Steven,
>>
>>
>>
>> I think this may be related to a problem that was raised on the
>> Bioc-devel mailing list a couple of months ago:
>>
>>
>>
>> https://stat.ethz.ch/pipermail/bioc-devel/2011-November/002955.html
>>
>>
>>
>> If indeed it's the same issue then the discussion above indicates it
>> was patched as of affy version 1.33.1
>>
>>
>>
>> Mike
>>
>>
>>
>> On Tue, Jan 24, 2012 at 1:07 PM, Osselaer, Steven [JRDBE Extern]
>> <SOSSELAE at its.jnj.com>   wrote:
>>
>> Dear Wolfgang,
>>
>> I was under the impression that it was a problem with the software as I
>> can read the same CEL files with the R 2.13.1 software: see transcript
>> for the same code but run under R 2.13.1 below.
>>
>> Kind regards,
>> Steven
>>
>> R version 2.13.1 (2011-07-08)
>>
>> Copyright (C) 2011 The R Foundation for Statistical Computing ISBN
>> 3-900051-07-0
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> R is free software and comes with ABSOLUTELY NO WARRANTY.
>> You are welcome to redistribute it under certain conditions.
>> Type 'license()' or 'licence()' for distribution details.
>>
>>    Natural language support but running in an English locale
>>
>> R is a collaborative project with many contributors.
>> Type 'contributors()' for more information and 'citation()' on how to
>> cite R or R packages in publications.
>>
>> Type 'demo()' for some demos, 'help()' for on-line help, or
>> 'help.start()' for an HTML browser interface to help.
>> Type 'q()' to quit R.
>>
>>> library(affy)
>> Loading required package: Biobase
>>
>> Welcome to Bioconductor
>>
>>    Vignettes contain introductory material. To view, type
>>    'browseVignettes()'. To cite Bioconductor, see
>>    'citation("Biobase")' and for packages 'citation("pkgname")'.
>>
>>> sessionInfo()
>> R version 2.13.1 (2011-07-08)
>>
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] affy_1.30.0    Biobase_2.12.2
>>
>> loaded via a namespace (and not attached):
>> [1] affyio_1.20.0         preprocessCore_1.14.0
>>
>>> celFiles<- list.files(pattern="CEL$")
>>> celFiles
>>  [1] "27002.CEL" "27003.CEL" "27004.CEL" "27005.CEL" "27006.CEL" "27007.CEL"
>>  [7] "27008.CEL" "27009.CEL" "27010.CEL" "27011.CEL" "27012.CEL" "27013.CEL"
>> [13] "27014.CEL" "27015.CEL" "27016.CEL" "27017.CEL" "27018.CEL" "27019.CEL"
>> [19] "27020.CEL" "27021.CEL" "27022.CEL" "27023.CEL" "27024.CEL" "27025.CEL"
>> [25] "27026.CEL" "27027.CEL" "27028.CEL" "27029.CEL" "27030.CEL" "27031.CEL"
>> [31] "27032.CEL" "27033.CEL" "27034.CEL" "27035.CEL" "27036.CEL" "27037.CEL"
>> [37] "27038.CEL" "27039.CEL" "27040.CEL" "27041.CEL" "27042.CEL" "27043.CEL"
>> [43] "27044.CEL" "27045.CEL" "27046.CEL" "27047.CEL" "27048.CEL" "27049.CEL"
>> [49] "27050.CEL" "27051.CEL" "27052.CEL" "27053.CEL" "27054.CEL" "27055.CEL"
>> [55] "27056.CEL" "27057.CEL" "27058.CEL" "27059.CEL" "27060.CEL" "27061.CEL"
>> [61] "27062.CEL" "27063.CEL" "27064.CEL" "27065.CEL" "27066.CEL" "27067.CEL"
>> [67] "27068.CEL" "27069.CEL" "27070.CEL" "27071.CEL" "27072.CEL" "27073.CEL"
>> [73] "27074.CEL" "27075.CEL" "27076.CEL" "27077.CEL" "27078.CEL" "27079.CEL"
>> [79] "27080.CEL" "27081.CEL" "27082.CEL" "27083.CEL" "27084.CEL" "27085.CEL"
>> [85] "27086.CEL" "27087.CEL" "27088.CEL" "27089.CEL" "27090.CEL" "27091.CEL"
>> [91] "27092.CEL" "27093.CEL" "27094.CEL" "27095.CEL" "27096.CEL"
>>> rawData<- ReadAffy(filenames=celFiles)
>>>
>>> q()
>>
>> -----Original Message-----
>> From: bioconductor-bounces at r-project.org
>> [mailto:bioconductor-bounces at r-project.org] On Behalf Of Wolfgang
> Huber
>> Sent: Tuesday, 24 January 2012 13:59
>> To: bioconductor at r-project.org
>> Subject: Re: [BioC] affy : ReadAffy() fails on HuGene-1_1-st-v1 chips
>>
>>
>> Dear Steven
>>
>> thank you. What is your question, or why and how do you think someone
>> other than the party who gave you the apparently faulty CEL file can
>> help you?
>>
>>          Best wishes
>>          Wolfgang
>>
>>
>>
>>
>> Steven Osselaer [guest] scripsit 01/24/2012 10:42 AM:
>>> Reading HuGene-1_1-st-v1 CEL files results in an error message about
>>> incorrect dimensions of the first CEL file of the list
>>> TRANSCRIPT :
>>>
>>> R version 2.14.1 (2011-12-22)
>>> Copyright (C) 2011 The R Foundation for Statistical Computing ISBN
>>> 3-900051-07-0
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> R is free software and comes with ABSOLUTELY NO WARRANTY.
>>> You are welcome to redistribute it under certain conditions.
>>> Type 'license()' or 'licence()' for distribution details.
>>>
>>>      Natural language support but running in an English locale
>>>
>>> R is a collaborative project with many contributors.
>>> Type 'contributors()' for more information and 'citation()' on how
>>> to cite R or R packages in publications.
>>>
>>> Type 'demo()' for some demos, 'help()' for on-line help, or
>>> 'help.start()' for an HTML browser interface to help.
>>> Type 'q()' to quit R.
>>>
>>>> library(affy)
>>> Loading required package: Biobase
>>>
>>> Welcome to Bioconductor
>>>
>>>      Vignettes contain introductory material. To view, type
>>>      'browseVignettes()'. To cite Bioconductor, see
>>>      'citation("Biobase")' and for packages 'citation("pkgname")'.
>>>
>>>> sessionInfo()
>>> R version 2.14.1 (2011-12-22)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> locale:
>>>     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>>     [7] LC_PAPER=C                 LC_NAME=C
>>>     [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] affy_1.32.0    Biobase_2.14.0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] affyio_1.22.0         BiocInstaller_1.2.1   preprocessCore_1.16.0
>>> [4] zlibbioc_1.0.0
>>>> celFiles<- list.files(pattern="CEL$")
>>>> celFiles
>>>  [1] "27002.CEL" "27003.CEL" "27004.CEL" "27005.CEL" "27006.CEL" "27007.CEL"
>>>  [7] "27008.CEL" "27009.CEL" "27010.CEL" "27011.CEL" "27012.CEL" "27013.CEL"
>>> [13] "27014.CEL" "27015.CEL" "27016.CEL" "27017.CEL" "27018.CEL" "27019.CEL"
>>> [19] "27020.CEL" "27021.CEL" "27022.CEL" "27023.CEL" "27024.CEL" "27025.CEL"
>>> [25] "27026.CEL" "27027.CEL" "27028.CEL" "27029.CEL" "27030.CEL" "27031.CEL"
>>> [31] "27032.CEL" "27033.CEL" "27034.CEL" "27035.CEL" "27036.CEL" "27037.CEL"
>>> [37] "27038.CEL" "27039.CEL" "27040.CEL" "27041.CEL" "27042.CEL" "27043.CEL"
>>> [43] "27044.CEL" "27045.CEL" "27046.CEL" "27047.CEL" "27048.CEL" "27049.CEL"
>>> [49] "27050.CEL" "27051.CEL" "27052.CEL" "27053.CEL" "27054.CEL" "27055.CEL"
>>> [55] "27056.CEL" "27057.CEL" "27058.CEL" "27059.CEL" "27060.CEL" "27061.CEL"
>>> [61] "27062.CEL" "27063.CEL" "27064.CEL" "27065.CEL" "27066.CEL" "27067.CEL"
>>> [67] "27068.CEL" "27069.CEL" "27070.CEL" "27071.CEL" "27072.CEL" "27073.CEL"
>>> [73] "27074.CEL" "27075.CEL" "27076.CEL" "27077.CEL" "27078.CEL" "27079.CEL"
>>> [79] "27080.CEL" "27081.CEL" "27082.CEL" "27083.CEL" "27084.CEL" "27085.CEL"
>>> [85] "27086.CEL" "27087.CEL" "27088.CEL" "27089.CEL" "27090.CEL" "27091.CEL"
>>> [91] "27092.CEL" "27093.CEL" "27094.CEL" "27095.CEL" "27096.CEL"
>>>> rawData<- ReadAffy(filenames=celFiles)
>>> Error in read.affybatch(filenames = l$filenames, phenoData = l$phenoData,  :
>>>   Cel file 27002.CEL does not seem to have the correct dimensions
>>>> traceback()
>>> 3: .Call("read_abatch", filenames, rm.mask, rm.outliers, rm.extra,
>>>        ref.cdfName, dim.intensity[c("Rows", "Cols")], verbose, PACKAGE = "affyio")
>>> 2: read.affybatch(filenames = l$filenames, phenoData = l$phenoData,
>>>        description = l$description, notes = notes, compress = compress,
>>>        rm.mask = rm.mask, rm.outliers = rm.outliers, rm.extra = rm.extra,
>>>        verbose = verbose, sd = sd, cdfname = cdfname)
>>> 1: ReadAffy(filenames = celFiles)
>>>
>>>
>>>
>>>
>>> --
>>> Sent via the guest posting facility at bioconductor.org.
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>> --
>> Best wishes
>>          Wolfgang
>>
>> Wolfgang Huber
>> EMBL
>> http://www.embl.de/research/units/genome_biology/huber
> --
> James W. MacDonald, M.S.
> Biostatistician
> Douglas Lab
> University of Michigan
> Department of Human Genetics
> 5912 Buhl
> 1241 E. Catherine St.
> Ann Arbor MI 48109-5618
> 734-615-7826
>
> **********************************************************
> Electronic Mail is not secure, may not be read every day, and should
> not be used for urgent or sensitive issues
>
>
>



