[BioC] Anyone Know how to make a fake CEL file?

Wed Oct 15 16:24:50 MEST 2003

Hi Johannes,
I am going to use real values from experimental CEL files to make various fake CEL files with varying percentages of genes changing over time, e.g. having 20% of the genes change with a 2-5 fold change by the third time point. The point of all this is to understand what happens during normalization. For analyses of anywhere between 2 and 30 chips, normalizing seems to be fine, but when you start incorporating very different experimental conditions in very large data groups (60+), normalizing seems to minimize the differences between the conditions.
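
A minimal sketch of the design described above, in base R. The intensity matrix is simulated and is only a stand-in for real CEL values; the names (base, changed, fc), the probe set dimensions, the up-regulation-only choice, and the intermediate second time point are all assumptions made for illustration.

```r
set.seed(1)

# Stand-in for real data: 1000 hypothetical probe sets with 16 probes each,
# log-normal intensities (a real analysis would take these from CEL files).
n_sets   <- 1000
n_probes <- 16
base <- matrix(rlnorm(n_sets * n_probes, meanlog = 7, sdlog = 1),
               nrow = n_sets, ncol = n_probes)

# Pick 20% of the probe sets and draw a fold change between 2 and 5 for each.
changed <- sample(n_sets, size = 0.2 * n_sets)
fc <- rep(1, n_sets)
fc[changed] <- runif(length(changed), min = 2, max = 5)

# Three time points: unchanged at t1, half of the log fold change at t2
# (an assumption), full fold change at t3. Row i is scaled by fc[i].
t1 <- base
t2 <- base * sqrt(fc)
t3 <- base * fc

# Fraction of probe sets that differ between t1 and t3.
mean(rowSums(t3 != t1) > 0)
```

Writing such values back into the intensity matrix of a real data object would additionally need the probe locations discussed below.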

We would like to analyze 100+ chips of real data to understand various cell types, but with that many chips that are very different from one another, normalization seems to severely limit the ability to see the differences between the conditions. That is why it would be nice to see the effects of RMA (background correction and normalization) and a comparison with MAS 5.0 values with spline normalization on a large set of "fake" CEL data.
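
The normalization step in RMA is quantile normalization, which forces every chip to share one intensity distribution. The following base R sketch is not the affy implementation; quantile_normalize and the simulated chips are hypothetical. It shows that a chip in which 20% of the values were tripled ends up with the same distribution as the untouched chip.

```r
set.seed(2)

# Simple quantile normalization: replace each value by the mean of the
# values that share its rank across all chips (columns).
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")
  target <- rowMeans(apply(x, 2, sort))
  apply(ranks, 2, function(r) target[r])
}

# Two simulated chips; chip2 has 20% of its values multiplied by 3.
chip1 <- rlnorm(5000, meanlog = 7, sdlog = 1)
chip2 <- chip1
up <- sample(length(chip2), 0.2 * length(chip2))
chip2[up] <- chip2[up] * 3

x  <- cbind(chip1, chip2)
qn <- quantile_normalize(x)

colMeans(x)                    # chip means differ before normalization
apply(qn, 2, quantile)         # quantiles agree after normalization
```

Whether this explains the loss of differences seen on the real chips is exactly what the fake CEL files are meant to test.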

> i2xy <- function(i) cbind((i-1) %% 640, (i-1) %/% 640) 
  # this is for HG-U95Av2 chips (640 columns)
Does anyone know the corresponding i2xy function that would be needed for the mu72av2 chip? 
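
The same mapping can be written with the chip width as a parameter, so that only the number of columns needs to be known for another chip type. This is a sketch, assuming the index runs row by row as in the function above; the names i2xy_generic and xy2i_generic and the argument nc are hypothetical, and the correct nc for a given chip has to be looked up (640 is the value used above).

```r
# One-based index -> zero-based (x, y), for a chip with 'nc' columns.
i2xy_generic <- function(i, nc) {
  cbind(x = (i - 1) %% nc, y = (i - 1) %/% nc)
}

# Inverse mapping: zero-based (x, y) -> one-based index.
xy2i_generic <- function(x, y, nc) {
  y * nc + x + 1
}

# With nc = 640 this reproduces the function above.
xy <- i2xy_generic(c(1, 640, 641), nc = 640)
xy

# Round trip back to the original indices.
xy2i_generic(xy[, "x"], xy[, "y"], nc = 640)
```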

I would appreciate any feedback from the Bioconductor community. I haven't found anything on the internet or in the literature that addresses this problem.

thanks everyone, 
Richard Park 

-----Original Message-----
From: Johannes Freudenberg [mailto:mai98ftu at studserv.uni-leipzig.de]
Sent: Tuesday, October 14, 2003 16:6 PM
To: Park, Richard
Subject: Re: [BioC] Anyone Know how to make a fake CEL file?

Hi,

> how do you know the corresponding x and y locations on
> the chip that correspond with the various affy ids?

The information on the probe locations is stored in the cdf environments and 
can be accessed as follows:

> env <- getCdfInfo(Dilution) #get the CDF environment
> 
> #get the probe locations
> loc <- mget(ls(env), envir = env) # named list, one matrix per probe set
> 
> #That's how it's done in S-Plus
> #loc <- getCdfInfo(Dilution)
> 
> loc[[1]] # show the probe locations of the first gene
          pm     mm
 [1,] 175218 175858
 [2,] 356689 357329
 [3,] 227696 228336
 [4,] 237919 238559
 ...
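
The lookup can be tried without the affy package by using a small mock environment. In this sketch the environment, the probe set names, and the second matrix are hypothetical; the first matrix reuses the four index pairs shown above.

```r
# Mock environment standing in for a CDF environment (hypothetical names).
env <- new.env()
assign("probeset_A",
       cbind(pm = c(175218, 356689, 227696, 237919),
             mm = c(175858, 357329, 228336, 238559)),
       envir = env)
assign("probeset_B",
       cbind(pm = c(1001, 2002), mm = c(1641, 2642)),
       envir = env)

# All probe set names, then a named list of location matrices.
ids <- ls(env)
loc <- mget(ids, envir = env)

loc[["probeset_A"]][, "pm"]   # PM indices of one probe set
unlist(lapply(loc, nrow))     # number of probe pairs per probe set
```

Looking entries up by name rather than by position makes it easier to tie the locations back to specific affy ids.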

These indices refer to the rows of the intensity matrix, which is stored in the
@exprs slot of the AffyBatch object. To get the corresponding x and y
coordinates you can use the i2xy() function:

> i2xy <- function(i) cbind((i-1) %% 640, (i-1) %/% 640) 
  # this is for HG-U95Av2 chips (640 columns)
  #corrected version, older BioC version incorrect!
  #search BioC mailing list archive for more details
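
As a worked example, applying the function to the four PM/MM index pairs listed above shows that each MM index is exactly 640 larger than its PM index, i.e. the MM probe sits in the same column, one row further down. The function is repeated here only to make the snippet self-contained, with column names x and y added for readability.

```r
i2xy <- function(i) cbind(x = (i - 1) %% 640, y = (i - 1) %/% 640)

pm <- c(175218, 356689, 227696, 237919)
mm <- c(175858, 357329, 228336, 238559)

pm_xy <- i2xy(pm)
mm_xy <- i2xy(mm)

pm_xy[1, ]   # x = 497, y = 273
mm_xy[1, ]   # x = 497, y = 274

# Same x for every pair, and y differs by one.
all(mm_xy[, "x"] == pm_xy[, "x"])
all(mm_xy[, "y"] == pm_xy[, "y"] + 1)
```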

Out of curiosity, may I ask how you are going to 'fake' the different 
treatments?  Are you using real data or simulated data?

Best wishes,
Johannes

Quoting "Park, Richard" <Richard.Park at joslin.harvard.edu>:

> I am trying to make a couple of fake cel files to represent a time
> course treatment between three time points. 
> I am trying to test the effects of normalization on various possible
> treatments. 
> Is there a way to make a fake CEL file? 
> and if there is, how do you know the corresponding x and y locations on
> the chip that correspond with the various affy ids? I know that this
> information is located in the various cdf files, but I am unaware of how
> to access that information. 
> 
> Thanks for any help, 
> 
> 
> Richard Park 
> Immunology - Computational Data Analyzer
> Joslin Diabetes Center
> Ph: 617-732-2482
> Richard.Park at joslin.harvard.edu