[BioC] Problem reading Cel files - Oligo Package

Henrik Bengtsson hb at biostat.ucsf.edu
Wed Aug 28 18:09:03 CEST 2013


Hi Jim, sure. /Henrik

On Wed, Aug 28, 2013 at 8:13 AM, Henrik Bengtsson <hb at biostat.ucsf.edu> wrote:
> On Wed, Aug 28, 2013 at 7:44 AM, James W. MacDonald <jmacdon at uw.edu> wrote:
>> Hi Atul,
>>
>>
>> On 8/27/2013 11:18 PM, Atul wrote:
>>>
>>> Hi All,
>>>
>>> I am trying to read four *.Cel files into oligo and getting this error:
>>>
>>> > celFiles <- list.celfiles()
>>> > celFiles
>>> [1] "Iris.CEL" "Liv1.CEL" "Liv2.CEL" "Liv3.CEL"
>>> > AF_data = read.celfiles(celFiles)
>>> All the CEL files must be of the same type.
>>> Error: checkChipTypes(filenames, verbose, "affymetrix", TRUE) is not TRUE
>>>
>>> Then I tried reading the files separately (one by one) and found that one
>>> sample (Iris.CEL) shows the annotation package as 'pd.huex.1.0.st.v1' while the
>>> rest (Liv1, Liv2, Liv3) show 'pd.huex.1.0.st.v2'. I checked on GEO and found that,
>>> although the samples are from different studies, all were generated using the
>>> same chip (Human Exon 1.0 ST Array), and the one which is giving the error
>>> (Iris.CEL) has 'HuEx-1_0-st-v2.r2.dt1.hg18.core.ps' mentioned under the data
>>> processing description, which means it is also version 2 of HuEx 1.0 ST.
>>>
>>> So I explicitly specified the annotation package 'pd.huex.1.0.st.v2' instead
>>> of the one recognized by oligo ('pd.huex.1.0.st.v1'), and the file is read
>>> without any problem:
>>>
>>> > celFiles <- list.celfiles()
>>> > celFiles
>>> [1] "Iris.CEL"
>>> > AF_data = read.celfiles(celFiles,pkgname='pd.huex.1.0.st.v2')
>>> Platform design info loaded.
>>> Reading in : Iris.CEL
>>>
>>> But if I add the other files and try the same thing, then the error is back:
>>> > celFiles <- list.celfiles()
>>> > celFiles
>>> [1] "Iris.CEL" "Liv1.CEL" "Liv2.CEL" "Liv3.CEL"
>>> > AF_data = read.celfiles(celFiles,pkgname='pd.huex.1.0.st.v2')
>>> All the CEL files must be of the same type.
>>> Error: checkChipTypes(filenames, verbose, "affymetrix", TRUE) is not TRUE
>>>
>>>
>>> Can anybody please tell me why the annotation package for Iris.CEL, which is
>>> from HuEx 1.0 ST v2 (per the NCBI GEO description), is recognized as
>>> 'pd.huex.1.0.st.v1'? If I explicitly specify the package name pd.huex.1.0.st.v2
>>> and read Iris.CEL alone, it works. But if I read it together with the other CEL
>>> files, which have the same annotation (pd.huex.1.0.st.v2), it gives the error.
>>
>>
>> The Iris.cel file is a HuEx-1_0-st-v1, according to the header in that file:
>>
>>> sapply(fls, oligo:::getCelChipType, useAffyio=TRUE)
>> GSM1008547_02_V-2_Pool-Normal-Iris_11-18-09_S1.CEL.gz
>>                                      "HuEx-1_0-st-v1"
>>                                      GSM486433.CEL.gz
>>                                      "HuEx-1_0-st-v2"
>>
>> And the others you are trying to read are version 2. It doesn't really
>> matter what GEO says, as the information on GEO comes from the submitter, and
>> they evidently made a mistake.
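
A minimal sketch of checking the header-reported chip type of every file
before calling read.celfiles(), so that a mismatch like the one above shows
up early. It assumes the affyio package is installed and that
read.celfile.header() exposes the chip type as the cdfName element; the
helper names cel_chip_types and group_by_chip_type are hypothetical and not
part of oligo.

```r
# Hypothetical helper: header-reported chip type for each CEL file.
# By default this assumes affyio is installed and that the header list
# returned by read.celfile.header() has a cdfName element.
cel_chip_types <- function(files, reader = NULL) {
  if (is.null(reader)) {
    reader <- function(f) affyio::read.celfile.header(f)$cdfName
  }
  # vapply() keeps the file names as names of the result
  vapply(files, reader, character(1))
}

# Hypothetical helper: group file names by chip type so that any
# mismatch is visible before the files are read.
group_by_chip_type <- function(chip_types) {
  split(names(chip_types), unname(chip_types))
}
```

Calling group_by_chip_type(cel_chip_types(list.celfiles())) would then return
one vector of file names per chip type found in the headers.
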
>>
>> I don't know what, if any, differences there are between the two versions.
>> In addition, there isn't anything I can see on the Affy website that says
>> what differences there may be. Certainly they have the same number of probes
>> and the probe IDs are all the same.
>
> I have some old notes on this at
> http://aroma-project.org/chipTypes/HuEx-1_0-st-v2:
>
> "Note II: Older CEL files for this chip type, may be reported to have
> chip type 'HuEx-1_0-st-v1'.  This chip is slightly different from the
> 'HuEx-1_0-st-v2' chip.  According to Affymetrix support, the
> difference is only in the control probes; "There is only a minor
> difference between the v1 and the v2 library files and it has to do
> with the manufacturing controls on the array. There is no difference
> with the probes interrogating the exons between v1 and v2.", cf.
> Thread 'Discussion on affymetrix-defined-transcript-clusters' (Nov
> 25-Dec 2, 2008).  We don't have details on the exact differences and
> we don't have access to the HuEx-1_0-st.v1.CDF (please fwd if you have
> it), but from Affymetrix' feedback it sounds like one could use the
> new HuEx-1_0-st-v2.CDF. "
>
> I guess one could compare the probe sequences for the two to
> ultimately find out how they differ.
>
> /Henrik
>
>> So you can combine:
>>
>>> fls <- dir(pattern = "CEL.gz")
>>> dat1 <- read.celfiles(fls[1], pkgname="pd.huex.1.0.st.v2")
>> Loading required package: pd.huex.1.0.st.v2
>> Loading required package: RSQLite
>> Loading required package: DBI
>> Platform design info loaded.
>> Reading in : GSM1008547_02_V-2_Pool-Normal-Iris_11-18-09_S1.CEL.gz
>>> ## note that you would use all three of the other CEL files for this step
>>> dat2 <- read.celfiles(fls[2])
>> Platform design info loaded.
>> Reading in : GSM486433.CEL.gz
>>> dat <- combine(dat1, dat2)
>> Warning messages:
>> 1: In alleq(levels(x[[nm]]), levels(y[[nm]])) : 1 string mismatch
>> 2: data frame column 'exprs' levels not all.equal
>> 3: In alleq(levels(x[[nm]]), levels(y[[nm]])) : 1 string mismatch
>> 4: data frame column 'dates' levels not all.equal
>>> all.equal(featureNames(dat1), featureNames(dat2))
>> [1] TRUE
>>> dat
>> ExonFeatureSet (storageMode: lockedEnvironment)
>> assayData: 6553600 features, 2 samples
>>   element names: exprs
>> protocolData
>>   rowNames: GSM1008547_02_V-2_Pool-Normal-Iris_11-18-09_S1.CEL.gz
>>     GSM486433.CEL.gz
>>   varLabels: exprs dates
>>   varMetadata: labelDescription channel
>> phenoData
>>   rowNames: GSM1008547_02_V-2_Pool-Normal-Iris_11-18-09_S1.CEL.gz
>>     GSM486433.CEL.gz
>>   varLabels: index
>>   varMetadata: labelDescription channel
>> featureData: none
>> experimentData: use 'experimentData(object)'
>> Annotation: pd.huex.1.0.st.v2
>>
>> You should note however that this isn't a recommendation on my part that you
>> should do this. I don't know what these data are, nor what you are planning
>> to do with them. In general combining data from two or more completely
>> different experiments is a very tricky endeavor. Using something like fRMA
>> (if there are frozen estimates for this chip type) might be a better way to
>> go.
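
Jim's two-file example above could be generalized to any number of file
groups roughly as follows. This is only a sketch under stated assumptions:
read_and_combine is a hypothetical name, the defaults assume oligo and
BiocGenerics are installed, and forcing a single pkgname for every group is
exactly the step Jim cautions about, so his caveat applies here as well.

```r
# Hypothetical helper: read each group of CEL files with the same forced
# annotation package, then merge the resulting objects pairwise.
# Defaults are evaluated lazily, so oligo and BiocGenerics are only
# needed when read_fun / combine_fun are not supplied by the caller.
read_and_combine <- function(groups, pkgname,
                             read_fun = oligo::read.celfiles,
                             combine_fun = BiocGenerics::combine) {
  sets <- lapply(groups, function(files) read_fun(files, pkgname = pkgname))
  Reduce(combine_fun, sets)
}
```

Here groups would be a list of character vectors of file names, one vector
per header-reported chip type, and pkgname would be "pd.huex.1.0.st.v2" as in
the example above.
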
>>
>> Best,
>>
>> Jim
>>
>>
>>
>>>
>>> NCBI GEO ID:
>>> Iris.cel - GSM1008547
>>> Liv1/2/3 - GSM486433/GSM486434/GSM486435
>>>
>>> Awaiting help.
>>>
>>> AK
>>>
>>>
>>> Session Info:
>>>
>>> > sessionInfo()
>>> R version 3.0.1 (2013-05-16)
>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>
>>> locale:
>>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8
>>>  [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=C                 LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] pd.huex.1.0.st.v2_3.8.0 RSQLite_0.11.4          DBI_0.2-7               oligo_1.24.2            Biobase_2.20.1          oligoClasses_1.22.0
>>> [7] BiocGenerics_0.6.0
>>>
>>> loaded via a namespace (and not attached):
>>>  [1] affxparser_1.32.3     affyio_1.28.0         BiocInstaller_1.10.1  Biostrings_2.28.0     bit_1.1-10            codetools_0.2-8
>>>  [7] ff_2.2-11             foreach_1.4.0         GenomicRanges_1.12.4  IRanges_1.18.1        iterators_1.0.6       preprocessCore_1.22.0
>>> [13] splines_3.0.1         stats4_3.0.1          tools_3.0.1           zlibbioc_1.6.0
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> University of Washington
>> Environmental and Occupational Health Sciences
>> 4225 Roosevelt Way NE, # 100
>> Seattle WA 98105-6099
>>
>>
