[BioC] a problem in reading in cel files

James F. Reid james.reid at ifom-ieo-campus.it
Fri Feb 10 14:44:32 CET 2012


Hi Manuela,

It looks like GSM310016.CEL starts with a blank line before the [CEL]
header; I have no idea why.
Removing this first empty line solves the issue. It may be worth checking
the other CEL files too.
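
A minimal sketch of that check in base R, in case it helps. It assumes the files are text-format (version 3) CEL files like this one; the two helper names are made up for illustration and are not part of affy or affyio. Binary CEL files should not be rewritten this way.

```r
# Hypothetical helpers, not part of affy or affyio.
# Assumption: text-format (version 3) CEL files.

# TRUE if the first line of the file is empty or whitespace-only.
starts_with_blank_line <- function(path) {
  first <- readLines(path, n = 1L, warn = FALSE)
  length(first) == 1L && grepl("^[[:space:]]*$", first, useBytes = TRUE)
}

# Drop every blank line that precedes the first non-blank line,
# then write the remaining lines to `out`.
strip_leading_blank_lines <- function(path, out) {
  lines <- readLines(path, warn = FALSE)
  nonblank <- which(!grepl("^[[:space:]]*$", lines, useBytes = TRUE))
  if (length(nonblank) > 0L) {
    lines <- lines[nonblank[1L]:length(lines)]
  }
  writeLines(lines, out)
  invisible(out)
}

# List the CEL files in the working directory that start with a blank line.
cel_files <- list.files(pattern = "\\.CEL$", ignore.case = TRUE)
bad <- cel_files[vapply(cel_files, starts_with_blank_line, logical(1))]
print(bad)
```

Each file reported in `bad` can then be repaired with strip_leading_blank_lines(), writing to a new file name so that the original download is kept as a backup.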

J.

On 10/02/12 10:40, Manuela Di Russo wrote:
> Dear all,
> I am learning to analyse Affymetrix microarray data but I have a problem reading .cel files in.
> I downloaded from GEO the raw data provided as supplementary files (GSE12345_RAW.tar), then I extracted the cel files into a directory which I have set as my working directory.
> Here is the R code I used:
>
>> setwd("C:/BACKUP/Dati/Progetti/Landi/meta-analisi MPM/GSE12345_RAW")
>> library(affy)
> Loading required package: Biobase
>
> Welcome to Bioconductor
>
>    Vignettes contain introductory material. To view, type
>    'browseVignettes()'. To cite Bioconductor, see
>    'citation("Biobase")' and for packages 'citation("pkgname")'.
>
>> dir()
>   [1] "data analysis.txt"     "E-GEOD-12345.sdrf.txt" "E-GEOD-12345.sdrf.xls"
>   [4] "GSM309986.CEL"         "GSM309987.CEL"         "GSM309988.CEL"
>   [7] "GSM309989.CEL"         "GSM309990.CEL"         "GSM309991.CEL"
> [10] "GSM310012.CEL"         "GSM310013.CEL"         "GSM310014.CEL"
> [13] "GSM310015.CEL"         "GSM310016.CEL"         "GSM310068.CEL"
> [16] "GSM310070.CEL"         "target.txt"            "target.xls"
>> pd<- read.AnnotatedDataFrame("target.txt",header=TRUE,row.names=1,as.is=TRUE)
>> pData(pd)
>           FileName              Target
> N1  GSM309986.CEL      pleural tissue
> N2  GSM309987.CEL      pleural tissue
> N3  GSM309988.CEL      pleural tissue
> N4  GSM309989.CEL      pleural tissue
> MM1 GSM309990.CEL mesothelioma tissue
> MM2 GSM309991.CEL mesothelioma tissue
> MM3 GSM310012.CEL mesothelioma tissue
> MM4 GSM310013.CEL mesothelioma tissue
> MM5 GSM310014.CEL mesothelioma tissue
> MM6 GSM310015.CEL mesothelioma tissue
> MM7 GSM310016.CEL mesothelioma tissue
> MM8 GSM310068.CEL mesothelioma tissue
> MM9 GSM310070.CEL mesothelioma tissue
>> rawData<- read.affybatch(filenames=pData(pd)$FileName,phenoData=pd)
> Error in try(.Call("ReadHeaderDetailed", filename, PACKAGE = "affyio")) :
>    Is GSM310016.CEL really a CEL file? tried reading as text, gzipped text, binary, gzipped binary, command console and gzipped command console formats.
>
> Error in read.celfile.header(filenames[i], info = "full") :
>    Failed to get full header information for GSM310016.CEL
>> rawData1<-ReadAffy()
> Error in try(.Call("ReadHeaderDetailed", filename, PACKAGE = "affyio")) :
>    Is C:/BACKUP/Dati/Progetti/Landi/meta-analisi MPM/GSE12345_RAW/GSM310016.CEL really a CEL file? tried reading as text, gzipped text, binary, gzipped binary, command console and gzipped command console formats.
>
> Error in read.celfile.header(filenames[i], info = "full") :
>    Failed to get full header information for C:/BACKUP/Dati/Progetti/Landi/meta-analisi MPM/GSE12345_RAW/GSM310016.CEL
>> sessionInfo()
> R version 2.14.1 (2011-12-22)
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=Italian_Italy.1252  LC_CTYPE=Italian_Italy.1252
> [3] LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C
> [5] LC_TIME=Italian_Italy.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] affy_1.32.1    Biobase_2.14.0
>
> loaded via a namespace (and not attached):
> [1] affyio_1.22.0         BiocInstaller_1.2.1   preprocessCore_1.16.0
> [4] zlibbioc_1.0.0
>> traceback()
> 7: stop("Failed to get full header information for ", filename)
> 6: read.celfile.header(filenames[i], info = "full")
> 5: FUN(1:13[[11L]], ...)
> 4: lapply(X = X, FUN = FUN, ...)
> 3: sapply(seq_len(length(filenames)), function(i) {
>         sdate<- read.celfile.header(filenames[i], info = "full")[["ScanDate"]]
>         if (is.null(sdate) || length(sdate) == 0)
>             NA_character_
>         else sdate
>     })
> 2: read.affybatch(filenames = l$filenames, phenoData = l$phenoData,
>         description = l$description, notes = notes, compress = compress,
>         rm.mask = rm.mask, rm.outliers = rm.outliers, rm.extra = rm.extra,
>         verbose = verbose, sd = sd, cdfname = cdfname)
> 1: ReadAffy()
>
> Maybe there is a problem reading the cel file header, so I opened one of the cel files in a text editor, but it seems correct.
> Can anyone help me?
> Thank you very much!
> Manuela
>
> ----------------------------------------------------------------------------------------
> Manuela Di Russo, Ph.D. Student
> Department of Experimental Pathology, MBIE
> University of Pisa
> Pisa, Italy
> e-mail: manuela.dirusso at for.unipi.it
> tel: +39050993538
>


