[BioC] Working flow_own CEL file reading problem

James W. MacDonald jmacdon at med.umich.edu
Fri May 20 21:11:56 CEST 2005


Junshi Yazaki wrote:
> Hi Jim, Seth, Reddy, Paul,
> 
> Thank you very much for your suggestions. I may have made the cdf 
> environment. Could you please help me confirm whether the env is OK or 
> not? Next I tried reading and normalizing CEL files from our custom 
> affy array. If my workflow is useful for affy beginners like me, could 
> you please help me?
> 
> First, I typed the following:
> 
>> source("http://www.bioconductor.org/getBioC.R")
>> getBioC("all")
>> library(makecdfenv)
>> library(affy)
>> make.cdf.package ("arabidopsistlgF.cdf")
> 
> 
> Then I moved to the Terminal on my Mac,
> 
>> R CMD INSTALL arabidopsistlgFcdf

It should actually be arabidopsistlgfcdf. Note that the F is lower case.
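
To see why the F ends up lower case: the package name is derived from the CDF name by lower-casing it, dropping separators, and appending "cdf". The base-R sketch below only approximates that rule. The helper name expected_cdf_pkg is made up for illustration, and any special-case renaming that affy's own cleancdfname() may apply is ignored, so treat cleancdfname() as the authority.

```r
# Hypothetical helper: approximate how a chip/CDF name is turned into a
# cdf package name. Assumption: lower-case, strip "_", "-" and spaces,
# then append "cdf". Use affy's cleancdfname() for the real answer.
expected_cdf_pkg <- function(cdf_name) {
  cleaned <- tolower(cdf_name)
  cleaned <- gsub("[_ -]", "", cleaned)
  paste0(cleaned, "cdf")
}

expected_cdf_pkg("arabidopsistlgF")      # base name of the .cdf file
expected_cdf_pkg("arabidopsis_tlgF_4x")  # chip name reported in the error output
```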

> 
> Return to R,
> 
>> arabidopsistlgF = make.cdf.env("arabidopsistlgF.cdf")
> 

This is an unnecessary step - you already made and installed the package.

> 
> And I shut down my Mac. Are these steps correct for making the cdf 
> environment? And then I started again.
> 
>>  source("http://www.bioconductor.org/getBioC.R")
>>  getBioC()
>>  library(affy)
>> Data <- ReadAffy()

At this point, try

cleancdfname(cdfName(Data))

If the result is not arabidopsistlgfcdf, then you need to make your 
cdfenv again, using the name that cleancdfname() returns.

I am betting the cleancdfname will be arabidopsistlgf4xcdf, so you will 
need to do

make.cdf.package("arabidopsistlgF.cdf", packagename="arabidopsistlgf4xcdf")

And then install using R CMD INSTALL
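
Put together, the steps above might look like the following. This is only a sketch: rebuild_cdf_package is a hypothetical wrapper name, it assumes the CEL files and arabidopsistlgF.cdf sit in the current working directory, and the species value is my assumption (newer makecdfenv versions, as far as I know, want one; drop it if yours does not accept it).

```r
# Hypothetical wrapper; requires the affy and makecdfenv packages.
rebuild_cdf_package <- function(cdf_file, species = "Arabidopsis_thaliana") {
  # Read all CEL files in the working directory.
  Data <- affy::ReadAffy()
  # Package name affy will look for, based on the chip name in the CEL files.
  pkg <- affy::cleancdfname(affy::cdfName(Data))
  # Write a package source directory under exactly that name.
  makecdfenv::make.cdf.package(cdf_file, packagename = pkg, species = species)
  pkg
}

# Not run here, because it needs the real files:
# pkg <- rebuild_cdf_package("arabidopsistlgF.cdf")
# Then, in a terminal:  R CMD INSTALL <value of pkg>
```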


>> eset <- rma(Data)
> 
> 
> I got the error below,
> ***********
> Note: You did not specify a download type.  Using a default value of: 
> Source
> This will be fine for almost all users
> 
> Error in getCdfInfo(object) : Could not obtain CDF environment, problems 
> encountered:
> Specified environment specified did not contain arabidopsis_tlgF_4x
> Library - package arabidopsistlgf4xcdf not installed
> Data for package affy did not contain arabidopsis_tlgF_4x
> Bioconductor - arabidopsistlgf4xcdf not available
> *********
> Q1. I have a question. Do I need to type the commands below every time 
> I restart? If I have to remake the cdf env every time, this step will 
> take a lot of time (the cdf file is big).

No. If you install correctly, it should be there for you every time you 
run R.
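
A quick way to confirm that the install took is to start a fresh R session and ask whether R can find the package. A small sketch (the helper name is made up; system.file() is base R and returns an empty string when a package is not installed):

```r
# Hypothetical helper: TRUE if R can locate an installed package by name.
cdf_package_installed <- function(pkg) {
  nzchar(system.file(package = pkg))
}

cdf_package_installed("arabidopsistlgf4xcdf")
```

If this comes back FALSE, the R CMD INSTALL step did not put the package where this R installation looks for it.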

> **********
> 
>> source("http://www.bioconductor.org/getBioC.R")
>> getBioC("all")
>> library(makecdfenv)
>> library(affy)
>> make.cdf.package ("arabidopsistlgF.cdf")
> 
> **********
> And next, I tried makecdfenv again like below,
> 
>>  env =  make.cdf.env("arabidopsistlgF.cdf")
>>  library(makecdfenv)
>>  env =  make.cdf.env("arabidopsistlgF.cdf")
>>  cel.files=list.files(pattern=".CEL$")
>>  data=ReadAffy(filenames=cel.files)
>>  pname<- cleancdfname(whatcdf("J_HpaII_Wt_10uM.CEL"))
>>  temp=rma(data)
> 
> 
> I got the error below,
> ******
> Note: You did not specify a download type.  Using a default value of: 
> Source
> This will be fine for almost all users
> 
> Error in getCdfInfo(object) : Could not obtain CDF environment, problems 
> encountered:
> Specified environment specified did not contain arabidopsis_tlgF_4x
> Library - package arabidopsistlgf4xcdf not installed
> Data for package affy did not contain arabidopsis_tlgF_4x
> Bioconductor - arabidopsistlgf4xcdf not available
> *********
> So I made a copy of "arabidopsistlgF.cdf" and renamed it 
> "arabidopsistlgF4x". Then I continued,
> 
>>   env =  make.cdf.env("arabidopsistlgF4x.cdf")
>>  cel.files=list.files(pattern=".CEL$")
>>  data=ReadAffy(filenames=cel.files)
>>
>>  pname<- cleancdfname(whatcdf("J_HpaII_Wt_10uM.CEL"))
> 
> 
> I got an error again,
> ********
> Error in whatcdf("J_HpaII_Wt_10uM.CEL") : Could not open file 
> J_HpaII_Wt_10uM.CEL
> ********
> 
> I thought I might need the following for cel file normalization,
> 
>>  library(gcrma)
> 
> Loading required package: matchprobes
> 
>>  Data <- ReadAffy()
>>  eset <- gcrma(Data)
> 
> 
> I got an error again,
> ********
> Computing affinities[1] "Checking to see if your internet connection 
> works..."
> Warning message:
> unable to connect to 'www.bioconductor.org' on port 80.
> Note: http://www.bioconductor.org/repository/devel/package/Source does 
> not seem to have a valid repository, skipping
> Warning messages:
> 1: Failed to read replisting at 
> http://www.bioconductor.org/repository/devel/package/Source in: 
> getReplisting(repURL, repFile, method = method)
> 2: unable to connect to 'www.bioconductor.org' on port 80.
> Note: http://www.bioconductor.org/repository/devel/package/Win32 does 
> not seem to have a valid repository, skipping
> Note: You did not specify a download type.  Using a default value of: 
> Source
> This will be fine for almost all users
> 
> Error in getCDF(cdfpackagename) : Environment arabidopsistlgf4xcdf was 
> not found in the Bioconductor repository.
> In addition: Warning message:
> Failed to read replisting at 
> http://www.bioconductor.org/repository/devel/package/Win32 in: 
> getReplisting(repURL, repFile, method = method)
> ********
> Q2. I cannot read my cel files now. Our cdf file name is 
> "arabidopsistlgF.cdf", but the cif file name is 
> "arabidopsistlgF_4x.cif". Do I need to use the same name for the cif 
> and cdf? The cel file includes the cif file name. And how can I start 
> to read the cel files?

Once you have the cdfenv installed correctly, you can read celfiles 
using ReadAffy().
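
For example, something along these lines, assuming the CEL files are all in one directory. The helper names are hypothetical. Note the escaped dot in the pattern: an unescaped ".CEL$" would also match a name such as "xCEL".

```r
# List CEL files in a directory (case-insensitive extension match).
find_cel_files <- function(dir = ".") {
  list.files(dir, pattern = "\\.CEL$", ignore.case = TRUE, full.names = TRUE)
}

# Read the files and compute RMA expression values. Requires affy and
# an installed cdf package matching the chip.
read_and_normalize <- function(dir = ".") {
  cel_files <- find_cel_files(dir)
  if (length(cel_files) == 0) stop("no CEL files found in ", dir)
  Data <- affy::ReadAffy(filenames = cel_files)
  affy::rma(Data)
}

# Not run: eset <- read_and_normalize("path/to/celfiles")
```

Passing full paths to ReadAffy() also sidesteps "Could not open file" errors that come from R's working directory not being where the CEL files live.
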
> 
> Q3. I would also like to read and normalize a lot of cel files. Could 
> you please suggest which package is better for reading and normalizing 
> an affy custom array? And which is better, rma (Robust Multi-Array 
> Average expression measure) or gcrma (Background adjustment using 
> sequence information)?

Which are better, apples or oranges? I guess it all depends on who you ask.


> 
> Q4. If our array has over 3 million data points, how long will reading 
> and normalization take for a single data set (does it depend on 
> machine power)? Do you have an estimate of the computation time? 
> Reading the cdf file takes me about 20 min.

If it takes that long to read in the cdf I am betting you are using 
virtual memory. In that case, you really need to get more RAM or things 
will be crushingly slow.

HTH,

Jim


> 
> Thank you very much,
> Junshi


-- 
James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623


