[BioC] xps: hugene11 chip gives problems

cstrato cstrato at aon.at
Sat Jan 12 00:10:47 CET 2013


Dear Philip,

I am glad to hear that using 'celnames' could solve your problem.

It is interesting to hear that you have never had problems with names of 
CEL-files. Personally I prefer to change the names, especially the names 
of the CEL-files from GEO which are simply numbers with a prefix.
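
A minimal sketch of how such alternative names could be derived automatically, assuming that extra dots and other special characters in the file name are what needs to be avoided (the thread only shows that 'Brain_01_1.1' fails while 'Brain01' works). The helper make_celnames is hypothetical and not part of xps:

```r
# Hypothetical helper (not part of xps): derive plain names for 'celnames'.
make_celnames <- function(celfiles) {
  # drop any directory part and the trailing .CEL extension (case-insensitive)
  nm <- sub("\\.cel$", "", basename(celfiles), ignore.case = TRUE)
  # replace every character other than letters, digits and underscore
  nm <- gsub("[^A-Za-z0-9_]", "_", nm)
  # keep the names unique in case two files collapse to the same name
  make.unique(nm, sep = "_")
}

celfiles <- c("Brain_01_1.1.CEL", "Prostate_01_1.1.CEL")
celnames <- make_celnames(celfiles)
print(celnames)
```

The resulting vector can then be passed as celnames=celnames to import.data(), together with the unchanged celfiles, so that the original CEL-files do not need to be renamed on disk.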

Have a nice weekend, too.
Christian


On 1/11/13 10:34 PM, Groot, Philip de wrote:
> Dear Christian,
>
> Thank you very much! I was thinking that it must have been something in the CEL-file itself, but it turns out to be the filename! I'll adapt the script on our production server to fix the issue. I should mention that we have been using xps for quite some years now and never encountered this issue before!
>
> I worked through your recommendations from yesterday. I could indeed properly load the Affymetrix sample data, and changing the location of the root-scheme did not fix the issue either. Fortunately, we do understand this now!
>
> And you are right: if xps is updated, I need to recreate the schemes too. This needs only to be done once every 6 months (usually) and is not a big problem. And it also forces me to check the Affymetrix site for updated annotations etc. I just feel more comfortable if the schemes are created by the current running version of xps.
>
> Have a nice weekend.
>
> Regards,
>
>
> Dr. Philip de Groot Ph.D.
> Bioinformatics Researcher
>
> Wageningen University / TIFN
> Nutrigenomics Consortium
> Nutrition, Metabolism & Genomics Group
> Division of Human Nutrition
> PO Box 8129, 6700 EV Wageningen
> Visiting Address: Erfelijkheidsleer: De Valk, Building 304
> Dreijenweg 2, 6703 HA  Wageningen
> Room: 0052a
> T: +31-317-485786
> F: +31-317-483342
> E-mail:   Philip.deGroot at wur.nl
> Internet: http://www.nutrigenomicsconsortium.nl
>               http://humannutrition.wur.nl/
>               https://madmax.bioinformatics.nl/
> ________________________________________
> From: cstrato [cstrato at aon.at]
> Sent: 11 January 2013 21:05
> To: Groot, Philip de
> Cc: bioconductor at r-project.org
> Subject: Re: [BioC] xps: hugene11 chip gives problems
>
> Dear Philip,
>
> Meanwhile I did another test and renamed my CEL-files to mimic your
> names. This is what I get:
>   > celfiles <- c("Brain_01_1.1.CEL","Prostate_01_1.1.CEL")
>   > data.genome11 <- import.data(scheme.hugene11, "tmp_HuBrPr",
> filedir=datdir, celdir=celdir, celfiles=celfiles)
> Opening file
> </Volumes/MitziData/CRAN/Workspaces/hugene11/na33/hugene11stv1.root> in
> <READ> mode...
> Creating new temporary file
> </Volumes/MitziData/CRAN/Workspaces/hugene11/tmp_HuBrPr_cel.root>...
> Importing
> </Volumes/MitziData/CRAN/Workspaces/hugene11/celtest/Brain_01_1.1.CEL>
> as <Brain_01_1.1.cel>...
>      hybridization statistics:
>         1 cells with minimal intensity 17.5
>         1 cells with maximal intensity 22402.1
> New dataset <DataSet> is added to Content...
> Importing
> </Volumes/MitziData/CRAN/Workspaces/hugene11/celtest/Prostate_01_1.1.CEL> as
> <Prostate_01_1.1.cel>...
>      hybridization statistics:
>         2 cells with minimal intensity 14.5
>         1 cells with maximal intensity 23266.3
>   > for (i in 1:length(rawCELName(data.genome11, fullpath = FALSE)))
> +    cat(sprintf("%s\n", rawCELName(data.genome11, fullpath = FALSE)[i]))
> Error: Tree set <> could not be found in file content
> Error: Tree set <> could not be found in file content
>
>
> As you can see, I can now replicate your error.
>
> The solution is simple, i.e. use parameter 'celnames'. Now the result is:
>   > celfiles <- c("Brain_01_1.1.CEL","Prostate_01_1.1.CEL")
>   > celnames <- c("Brain01","Prostate01")
>   > data.genome11 <- import.data(scheme.hugene11, "tmp_HuBrPr",
> filedir=datdir, celdir=celdir, celfiles=celfiles, celnames=celnames)
> Opening file
> </Volumes/MitziData/CRAN/Workspaces/hugene11/na33/hugene11stv1.root> in
> <READ> mode...
> Creating new temporary file
> </Volumes/MitziData/CRAN/Workspaces/hugene11/tmp_HuBrPr_cel.root>...
> Importing
> </Volumes/MitziData/CRAN/Workspaces/hugene11/celtest/Brain_01_1.1.CEL>
> as <Brain01.cel>...
>      hybridization statistics:
>         1 cells with minimal intensity 17.5
>         1 cells with maximal intensity 22402.1
> New dataset <DataSet> is added to Content...
> Importing
> </Volumes/MitziData/CRAN/Workspaces/hugene11/celtest/Prostate_01_1.1.CEL> as
> <Prostate01.cel>...
>      hybridization statistics:
>         2 cells with minimal intensity 14.5
>         1 cells with maximal intensity 23266.3
>   > for (i in 1:length(rawCELName(data.genome11, fullpath = FALSE)))
> +    cat(sprintf("%s\n", rawCELName(data.genome11, fullpath = FALSE)[i]))
> Brain_01_1.1.CEL
> Prostate_01_1.1.CEL
>
> As you can see, now everything works fine. Parameter 'celnames' was
> introduced from the beginning to allow alternative names without the
> need to change the names of the original CEL-files, since CEL-files
> often had names such as 'Breast_tissue;24/08/1999;batch-1,lot-2.1.CEL'.
>
> I hope that using parameter 'celnames' does solve your problem.
>
> Best regards,
> Christian
>
>
> On 1/10/13 9:10 PM, cstrato wrote:
>> Dear Philip,
>>
>> I have just tried a subset of CEL-files from the Affymetrix
>> "gene_1_1_st_ap_tissue_sample_data" for HuGene_1.1 array, but I cannot
>> repeat the error you get. Here is my output for one CEL-file only:
>>
>>   > library(xps)
>>
>> Welcome to xps version 1.19.1
>>       an R wrapper for XPS - eXpression Profiling System
>>       (c) Copyright 2001-2013 by Christian Stratowa
>>
>>   > scheme <- root.scheme("./na33/hugene11stv1.root")
>>   > x.xps <- import.data(scheme, "tmp_x", celdir = "./cel", celfiles =
>> "HumanBrain_1.CEL", verbose = TRUE)
>> Opening file <./na33/hugene11stv1.root> in <READ> mode...
>> Creating new temporary file
>> </Volumes/MitziData/CRAN/Workspaces/hugene11/tmp_x_cel.root>...
>> Importing <./cel/HumanBrain_1.CEL> as <HumanBrain_1.cel>...
>>      hybridization statistics:
>>         1 cells with minimal intensity 17.5
>>         1 cells with maximal intensity 22402.1
>> New dataset <DataSet> is added to Content...
>>   > cat("The loaded .CEL-files are:\n");
>> The loaded .CEL-files are:
>>   > for (i in 1:length(rawCELName(x.xps, fullpath = FALSE)))
>> +   cat(sprintf("%s\n", rawCELName(x.xps, fullpath = FALSE)[i]));
>> HumanBrain_1.CEL
>>   >
>>   > sessionInfo()
>> R version 2.15.0 (2012-03-30)
>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>
>> locale:
>> [1] C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] xps_1.19.1
>>
>> loaded via a namespace (and not attached):
>> [1] tools_2.15.0
>>   >
>>
>>
>> As you see everything is ok. I did also run the triplicates of the Brain
>> and Prostate samples and could do RMA w/o problems.
>>
>> Could you please try the following two options:
>>
>> 1, Could you try to use the CEL-files from the Affymetrix dataset to
>> make sure that there is no problem with the CEL-files.
>>
>> 2, I see that you did create the ROOT scheme files in directory:
>> scmdir <- paste(.path.package("xps"), "schemes/", sep = "/")
>>
>> I must admit that I have never tried to store the scheme files in the
>> package directory, since I have the feeling that this may cause
>> troubles, especially when you update R and/or the xps package to a new
>> version.
>> Could you please try to save your file "hugene11stv1.root" in a
>> different directory such as '/home/degroot/schemes' or better to create
>> this file in this directory, and then try if you still get the problem.
>>
>> Best regards,
>> Christian
>>
>>
>> On 1/10/13 1:03 PM, Groot, Philip de wrote:
>>> Hi Christian,
>>>
>>> I am trying to do an analysis using xps and the hugene11 chip. However,
>>> I run into problems for which I need your help.
>>>
>>> I created a small test-script to demonstrate the problem:
>>>
>>> library(xps)
>>>
>>> scheme <-
>>> root.scheme("/local2/R-2.15.2/library/xps/schemes/hugene11stv1.root")
>>>
>>> x.xps <- import.data(scheme, "tmp_x", celdir = ".", celfiles =
>>> "G092_A05_01_1.1.CEL", verbose = TRUE)
>>>
>>> cat("The loaded .CEL-files are:\n");
>>>
>>> for (i in 1:length(rawCELName(x.xps, fullpath = FALSE)))
>>>
>>>     cat(sprintf("%s\n", rawCELName(x.xps, fullpath = FALSE)[i]));
>>>
>>> Upon execution, I get:
>>>
>>>> library(xps)
>>>
>>> Welcome to xps version 1.18.1
>>>
>>>       an R wrapper for XPS - eXpression Profiling System
>>>
>>>       (c) Copyright 2001-2012 by Christian Stratowa
>>>
>>>> scheme <-
>>>> root.scheme("/local2/R-2.15.2/library/xps/schemes/hugene11stv1.root")
>>>
>>>> x.xps <- import.data(scheme, "tmp_x", celdir = ".", celfiles =
>>>> "G092_A05_01_1.1.CEL", verbose = TRUE)
>>>
>>> Opening file </local2/R-2.15.2/library/xps/schemes/hugene11stv1.root> in
>>> <READ> mode...
>>>
>>> Creating new temporary file
>>> </mnt/geninf16/home/guests/pdegroot/dataanalysis/PHILIPG/tmp_x_cel.root>...
>>>
>>>
>>> Importing <./G092_A05_01_1.1.CEL> as <G092_A05_01_1.1.cel>...
>>>
>>>      hybridization statistics:
>>>
>>>         1 cells with minimal intensity 19
>>>
>>>         1 cells with maximal intensity 21364.4
>>>
>>> New dataset <DataSet> is added to Content...
>>>
>>>>
>>>
>>>> cat("The loaded .CEL-files are:\n");
>>>
>>> The loaded .CEL-files are:
>>>
>>>> for (i in 1:length(rawCELName(x.xps, fullpath = FALSE)))
>>>
>>> +   cat(sprintf("%s\n", rawCELName(x.xps, fullpath = FALSE)[i]));
>>>
>>> Error: Tree set <> could not be found in file content
>>>
>>> Error: Tree set <> could not be found in file content
>>>
>>> NA
>>>
>>> The weird thing is: I only have this problem with the hugene11 chip. As
>>> far as I can see, all other chips work properly (still na32 based).
>>>
>>> This affects all other steps, because there is no “content” to normalise
>>> etc.
>>>
>>> I created the root-scheme as follows:
>>>
>>> scmdir <- paste(.path.package("xps"), "schemes/", sep = "/")
>>>
>>> scheme <- import.exon.scheme("hugene11stv1",filedir=scmdir,
>>> layoutfile=paste(libdir, "HuGene-1_1-st-v1.r4.clf", sep="/"),
>>> schemefile=paste(libdir,"HuGene-1_1-st-v1.r4.pgf", sep="/"),
>>> probeset=paste(anndir,"HuGene-1_1-st-v1.na33.1.hg19.probeset.csv",
>>> sep="/"),
>>> transcript=paste(anndir,"HuGene-1_1-st-v1.na33.1.hg19.transcript.csv",
>>> sep="/"), add.mask = TRUE)
>>>
>>> (libdir and anndir are also defined, of course).
>>>
>>> I even updated the na32 annotation to the latest Affymetrix version
>>> (na33) to exclude a problem there. It does not fix the issue.
>>>
>>> Please note that I am running root version 5.32/04 as version 5.32/01 is
>>> no longer available for download. Root works properly as far as I can
>>> see.
>>>
>>> Do you have any clue where this problem originates from? Thank you!
>>>
>>> sessionInfo():
>>>
>>>> sessionInfo()
>>>
>>> R version 2.15.2 (2012-10-26)
>>>
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> locale:
>>>
>>> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>
>>>    [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>
>>>    [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>>
>>>    [7] LC_PAPER=C                 LC_NAME=C
>>>
>>>    [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>>
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>>
>>> [1] xps_1.18.1
>>>
>>> loaded via a namespace (and not attached):
>>>
>>> [1] tools_2.15.2
>>>
>>> Regards,
>>>
>>> *Dr. Philip de Groot
>>> Bioinformatician / Microarray analysis expert*
>>>
>>> Wageningen University / TIFN
>>> Netherlands Nutrigenomics Center (NNC)
>>>
>>> Nutrition, Metabolism & Genomics Group
>>> Division of Human Nutrition
>>> PO Box 8129, 6700 EV Wageningen
>>> Visiting Address:
>>>
>>> "De Valk" ("Erfelijkheidsleer"),
>>>
>>> Building 304,
>>> Verbindingsweg 4, 6703 HC Wageningen
>>> Room: 0052a
>>> T: 0317 485786
>>> F: 0317 483342
>>> E-mail: Philip.deGroot at wur.nl
>>> I: http://humannutrition.wur.nl
>>>
>>> https://madmax.bioinformatics.nl
>>>
>>> http://www.nutrigenomicsconsortium.nl
>>>
>>>
>>>
>>
>
>


