[R] How can I avoid the for and If loops in my function?

Thu Jun 19 17:58:17 CEST 2014

Now that it is simplified, we see that you are incrementing the size
of an array at each step.  This is
a very inefficient practice in R.  For comparison,

## what you are doing
system.time({
  tmp <- NULL
  for (i in 1:10000) tmp <- rbind(tmp, 1:2)
})
##   user  system elapsed
##  0.622   0.297   1.256

## much faster
system.time({
  tmp <- matrix(nrow=10000, ncol=2)
  for (i in 1:10000) tmp[i,] <- 1:2
})
##   user  system elapsed
##  0.027   0.002   0.030

On Thu, Jun 19, 2014 at 11:04 AM, Mramba,Lazarus K <lmramba at ufl.edu> wrote:
> Hi Martin,
>
> Thanks for the useful comments. It has greatly improved the efficiency of my
> functions. I am sorry that I forgot to paste the functions varG and others
> which I have done here for reproducibility.
>
>
> #############################
> ## a function to calculate h2
> ## (heritability)
> ###############################
> herit<-function(varG,varR=1)
> {
>   h<-4*varG/(varG+varR)
>   h
> }
> ###################################
> # function to calculate random error
> varR<-function(varG,h2)
> {
>   varR<- varG*(4-h2)/h2
>   varR
> }
>
> ##########################################
> # function to calculate treatment variance
> varG<-function(varR=1,h2)
> {
>   varG<-varR*h2/(4-h2)
>   varG
> }
>
> ###############################
>
> I used the package "gmp" so that I can use the function "is.whole()".
>
> I have managed to reduce the complexity of the function that I needed help.
> It now takes half the time it used to take before, but it is several hours
> and wish I can cut these hours down by doing away with the for and if loops
> in this code.
> Any help will be appreciated.
>
> newfunF<- function(matrix0,n,traceI)
> {
>   newmatdf<-matrix0
>   trace<-traceI
>   mat <- NULL
>   Design_best <- newmatdf
>   mat <- rbind(mat, c(value = trace, iterations = 0))
>   Des<-list()
>   newmatdf<-swapsimple(matrix0)
>   for(i in 2:n){
>      newmatdf<-swapsimple(newmatdf)
>
>     Des[[i]]<-newmatdf
>     if(swapmainF(newmatdf) < trace){
>       Design_best <- Des[[i]]<-newmatdf
>       trace<- swapmainF(newmatdf)
>       mat <- rbind(mat, c(trace = trace, iterations = i))
>     }
>    }
>  list(mat=mat,Design_best=Design_best)
>
> }
>
>
>
>
>
> On Thu, 19 Jun 2014 16:50:15 +0200, Martin Maechler wrote:
>>>>>>>
>>>>>>> lmramba  <lmramba at ufl.edu>
>>>>>>>     on Wed, 18 Jun 2014 20:00:15 +0000 writes:
>>
>>
>>     > Hi Jim. If I avoid the dataframe, how can use the function
>> model.matrix() to
>>     > build the incident matrices X, And Z? I tried saving the design
>> as matrix
>>     > but ghen I got the wrong design matrix.
>>
>> I think you are entirely right here, Laz.
>> That indeed you have data frame and a formula --> model.matrix()
>> to get the matrix.
>>
>> I have no time currently to delve into your
>> example, and I see
>> - it is not reproducible {you use a function
>>   varG() that is undefined}
>> - you use foreach just so you can use %do% in one place which I
>>   think makes no sense
>> - you use package 'gmp' which I don't think you'd use, but I
>>   don't know as your code is not reproducible ....
>> - you use "<<-" in quite a few places in your code, which is
>>   considered really bad programming style and makes it very hard
>>   to understand the code by reading it ...
>>
>> ... *but* .. after all that ...
>> ...
>> as maintainer of the Matrix package
>> I'm close to absolutely sure that you want to work with *sparse*
>> matrices as Matrix provides.
>>
>> So in fact, do use
>>
>> ## "require":  just so you are not tempted to call a package a "library"
>> require(Matrix)
>>
>> help(sparse.model.matrix)
>>
>> and then do
>> - use  sparse.model.matrix() instead of model.matrix().
>>
>> Further, do
>> - use Diagonal() instead of  diag()  for *constructing* diagonal matrices.
>>
>> Please let us know if this helps
>>
>> [and maybe fix your example to become reproducible: do
>>
>>    rm(list=ls(all=TRUE))
>>
>>  before  source(...) ing the reproducible example script...
>> ]
>>
>> Martin Maechler, ETH Zurich
>>
>>
>>     > Thanks.
>>
>>     > Laz
>>
>>
>>     > Sent from my LG Optimus G™, an AT&T 4G LTE smartphone
>>     > ------ Original message ------
>>     > From: jim holtman
>>     > Date: 6/18/2014 3:49 PM
>>     > To: Laz;
>>     > Cc: R mailing list;
>>     > Subject:Re: [R] How can I avoid the for and If loops in my function?
>>
>>     > First order of business, without looking in detail at the code,
>> is to avoid
>>     > the use of dataframes.  If all your values are numerics, then
>> use a matrix.
>>     > It will be faster execution.
>>     > I did see the following statements:
>>     > newmatdf<-Des[[i]]
>>     > Des[[i]]<-newmatdf
>>     > why are you just putting back what you pulled out of the list?
>>
>>     > Jim Holtman
>>     > Data Munger Guru
>>
>>     > What is the problem that you are trying to solve?
>>     > Tell me what you want to do, not how you want to do it.
>>     > On Wed, Jun 18, 2014 at 12:41 PM, Laz <[1]lmramba at ufl.edu> wrote:
>>
>>     > Dear R-users,
>>     > I have a 3200 by 3200 matrix that was build from a data frame that
>> had
>>     > 180 observations,  with variables: x, y, blocks (6 blocks) and
>>     > treatments (values range from 1 to 180) I am working on. I build
>> other
>>     > functions that seem to work well. However, I have one function that
>> has
>>     > many If loops and a long For loop that delays my results for over 10
>>     > hours ! I need your help to avoid these loops.
>>     > ########################################################
>>     > ## I need to avoid these for loops and if loops here :
>>     > ########################################################
>>     > ### swapsimple() is a function that takes in a dataframe,
>> randomly swaps
>>     > two elements from the same block in a data frame and generates a new
>>     > dataframe called newmatdf
>>     > ### swapmainF() is a function that calculates the trace of the final
>> N
>>     > by N matrix considering the incident matrices and blocks and
>> treatments
>>     > and residual errors in a linear mixed model framework using
>> Henderson
>>     > approach.
>>     > funF<- function(newmatdf, n, traceI)
>>     > {
>>     > # n = number of iterations (swaps to be made on pairs of
>> elements of the
>>     > dataframe, called newmatdf)
>>     > # newmatdf : is the original dataframe with N rows, and 4 variables
>>     > (x,y,blocks,genotypes)
>>     > matrix0<-newmatdf
>>     > trace<-traceI  ##  sum of the diagonal elements of the N by N matrix
>>     > (generated outside this loop) from the original newmatdf dataframe
>>     > res <- list(mat = NULL, Design_best = newmatdf, Original_design =
>>     > matrix0) # store our output of interest
>>     > res$mat <- rbind(res$mat, c(value = trace, iterations = 0)) #
>>     > initialized values
>>     > Des<-list()
>>     > for(i in seq_len(n)){
>>     > ifelse(i==1,
>>     > newmatdf<-swapsimple(matrix0),newmatdf<-swapsimple(newmatdf))
>>     > Des[[i]]<-newmatdf
>>     > if(swapmainF(newmatdf) < trace){
>>     > newmatdf<-Des[[i]]
>>     > Des[[i]]<-newmatdf
>>     > trace<- swapmainF(newmatdf)
>>     > res$mat <- rbind(res$mat, c(trace = trace, iterations = i))
>>     > res$Design_best <- newmatdf
>>     > }
>>     > if(swapmainF(newmatdf) > trace & nrow(res$mat)<=1){
>>     > newmatdf<-matrix0
>>     > Des[[i]]<-matrix0
>>     > res$Design_best<-matrix0
>>     > }
>>     > if(swapmainF(newmatdf)> trace & nrow(res$mat)>1){
>>     > newmatdf<-Des[[length(Des)-1]]
>>     > Des[[i]]<-newmatdf
>>     > res$Design_best<-newmatdf
>>     > }
>>     > }
>>     > res
>>     > }
>>     > The above function was created to:
>>     > Take  an original matrix, called matrix0, calculate its trace.
>>     > Generate a new matrix, called newmatdf after  swapping two
>> elements of the
>>     > old one and  calculate the trace. If the trace of the newmatrix is
>>     > smaller than
>>     > that of the previous matrix, store both the current trace together
>>     > with the older trace and their  iteration values. If the newer
>> matrix has
>>     > a trace larger than the previous trace, drop this trace and drop
>> this
>>     > matrix too (but count its iteration).
>>     > Re-swap the old matrix that you stored previously and recalculate
>> the
>>     > trace. Repeat the
>>     > process many times, say 10,000. The final results should be a list
>>     > with the original initial matrix and its trace, the final best
>>     > matrix that had the smallest trace after the 10000 simulations and a
>>     > dataframe  showing the values of the accepted traces that
>>     > were smaller than the previous and their respective iterations.
>>     > $Original_design
>>     > x  y block genotypes
>>     > 1    1  1     1        29
>>     > 7    1  2     1         2
>>     > 13   1  3     1         8
>>     > 19   1  4     1        10
>>     > 25   1  5     1         9
>>     > 31   1  6     2        29
>>     > 37   1  7     2         4
>>     > 43   1  8     2        22
>>     > 49   1  9     2         3
>>     > 55   1 10     2        26
>>     > 61   1 11     3        18
>>     > 67   1 12     3        19
>>     > 73   1 13     3        28
>>     > 79   1 14     3        10
>>     > ------truncated ----
>>     > the final results after running  funF<-
>>     > function(newmatdf,n,traceI)  given below looks like this:
>>     > ans1
>>     > $mat
>>     > value iterations
>>     > [1,] 1.474952          0
>>     > [2,] 1.474748          1
>>     > [3,] 1.474590          2
>>     > [4,] 1.474473          3
>>     > [5,] 1.474411          5
>>     > [6,] 1.474294         10
>>     > [7,] 1.474182         16
>>     > [8,] 1.474058         17
>>     > [9,] 1.473998         19
>>     > [10,] 1.473993         22
>>     > ---truncated
>>     > $Design_best
>>     > x  y block genotypes
>>     > 1    1  1     1        29
>>     > 7    1  2     1         2
>>     > 13   1  3     1        18
>>     > 19   1  4     1        10
>>     > 25   1  5     1         9
>>     > 31   1  6     2        29
>>     > 37   1  7     2        21
>>     > 43   1  8     2         6
>>     > 49   1  9     2         3
>>     > 55   1 10     2        26
>>     > ---- truncated
>>     > $Original_design
>>     > x  y block genotypes
>>     > 1    1  1     1        29
>>     > 7    1  2     1         2
>>     > 13   1  3     1         8
>>     > 19   1  4     1        10
>>     > 25   1  5     1         9
>>     > 31   1  6     2        29
>>     > 37   1  7     2         4
>>     > 43   1  8     2        22
>>     > 49   1  9     2         3
>>     > 55   1 10     2        26
>>     > 61   1 11     3        18
>>     > 67   1 12     3        19
>>     > 73   1 13     3        28
>>     > 79   1 14     3        10
>>     > ------truncated
>>     > Regards,
>>     > Laz
>>     > [[alternative HTML version deleted]]
>>     > ______________________________________________
>>     > [2]R-help at r-project.org mailing list
>>     > [3]https://stat.ethz.ch/mailman/listinfo/r-help
>>     > PLEASE do read the posting
>>     > guide [4]http://www.R-project.org/posting-guide.html
>>     > and provide commented, minimal, self-contained, reproducible code.
>>
>>     > References
>>
>>     > 1. mailto:lmramba at ufl.edu
>>     > 2. mailto:R-help at r-project.org
>>     > 3. https://stat.ethz.ch/mailman/listinfo/r-help
>>     > 4. http://www.R-project.org/posting-guide.html
>>     > ______________________________________________
>>     > R-help at r-project.org mailing list
>>     > https://stat.ethz.ch/mailman/listinfo/r-help
>>     > PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>>     > and provide commented, minimal, self-contained, reproducible code.
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.