[BioC] How to permute the CEL files

Martin Morgan mtmorgan at fhcrc.org
Sun Jun 21 05:58:21 CEST 2009


Hi Javier --

Have you tried to use Rprof() to profile your code, and see where it
is slow? my guess is below...

Javier Pérez Florido <jpflorido at gmail.com> writes:

> Hello everybody,
> I'd like to permutate the values (intensity raw data) of the genes
> within a CEL file of an Affymetrix experiment for using the "new" CEL
> file as a control. The key idea is that I want to keep the order of
> the genes in the chip, but permutate the intensities of the genes,
> that is, asign to each gene contained in the array the intensity of
> other gene selected randomly (if and only if the number of probes is
> the same). To do so, and using only AFFX genes, the code is:
>
> data<-Dilution
> dataBackup<-Dilution
>
> ->geneNames<-featureNames(data) # Get gene names
> ->AFFX_genes<-grep("^AFFX",geneNames) # Get positions of AFFX genes
> ->numArrays<-length(sampleNames(data)) # Num of arrays in the experiment
>
>
> ->for (i in 1:numArrays) #Permutate each array
> ->{
>
>    ->permutation_AFFX<-sample(1:numAFFX_genes) # Get a permutation for
> AFFX genes
>
>    # Indexes of PMs and MMs probes for AFFX genes
>    ->indexes_pm<-indexProbes(data,which="pm",genenames=geneNames[AFFX_genes])
>    ->indexes_mm<-indexProbes(data,which="mm",genenames=geneNames[AFFX_genes])
>      # Indexes of PMs and MMs probes for the AFFX genes selected randomly
>    ->indexes_pm_permutation<-indexProbes(data,which="pm",genenames=geneNames[AFFX_genes[permutation_AFFX]]) 
>
>    ->indexes_mm_permutation<-indexProbes(data,which="mm",genenames=geneNames[AFFX_genes[permutation_AFFX]]) 
>
>
>     -> for (j in 1:numAFFX_genes)# Change the intensity value of each
> AFFX gene (the intensity value of each probe) using the intensities of
> other AFFX gene selected randomly...
>     ->{
>        # ...if and only if the number of PM and MM probes for the
> current gene and the one used for permutation is the same
>       -> if(length(indexes_pm[[j]])==length(indexes_pm_permutation[[j]]))
>       ->{
> intensity(data)[indexes_pm[[j]],i]<-intensity(dataBackup)[indexes_pm_permutation[[j]],i]
> # Assign new intensity to the PM probes
>           -> 
> intensity(data)[indexes_mm[[j]],i]<-intensity(dataBackup)[indexes_mm_permutation[[j]],i]
> # Assign new intensity to the MM probes
>        ->}
> ->}

I suspect the slow part is this 'for'. I did the test as

  ok <- sapply(indexes_pm, length) == sapply(indexes_mm, length)

To perform the assignment I made a helper function

  ulist <- function(x) unlist(x, uses.names=FALSE)

and then

  intensity(data)[ulist(indexes_pm[ok]),i] <-
      intensity(dataBackup)[ulist(indexes_pm_permutation[ok]),i]
  intensity(data)[ulist(indexes_mm[ok]),i] <-
      intensity(dataBackup)[ulist(indexes_mm_permutation[ok]),i]

your code wasn't directly reproducible (data Dilution not loaded,
numAFFX_genes not defined, unbalanced }, -> at start of lines, long
comment lines wrapped by your or my email client) so I don't know
whether this actually works, but hopefully gets you going.

Martin

> The problem is that, this way, it takes so much time to perform the
> operation.
> I've also tried using lapply functions, but the are not improvements
> in time. Is there another way of changing the intensities rather than
> using indexProbes and intensity R functions?
>
> Thanks in advance,
> Javier
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



More information about the Bioconductor mailing list