[R] How can I avoid the for and If loops in my function?

Thu Jun 19 16:50:15 CEST 2014

>>>>> lmramba  <lmramba at ufl.edu>
>>>>>     on Wed, 18 Jun 2014 20:00:15 +0000 writes:

    > Hi Jim. If I avoid the dataframe, how can I use the function model.matrix() to
    > build the incidence matrices X and Z? I tried saving the design as a matrix
    > but then I got the wrong design matrix.

I think you are entirely right here, Laz.
You do indeed need a data frame and a formula --> model.matrix()
to get the matrix.

I have no time currently to delve into your example, and I see
- it is not reproducible {you use a function
  varG() that is undefined}
- you use foreach just so you can use %do% in one place which I
  think makes no sense
- you use package 'gmp' which I don't think you need, but I
  don't know as your code is not reproducible ....
- you use "<<-" in quite a few places in your code, which is
  considered really bad programming style and makes it very hard
  to understand the code by reading it ...
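
To illustrate that last point with a minimal sketch (the names update_bad()
and update_good() are made up for this illustration and have nothing to do
with your script): instead of letting a function reach out and modify a
variable in the calling environment, pass the state in and return the new
state.

```r
## Style to avoid: the function silently changes a variable outside itself.
best <- Inf
update_bad <- function(x) {
  if (x < best) best <<- x      # hidden side effect on the global 'best'
  invisible(NULL)
}

## Easier to read and test: state goes in as an argument, new state comes out.
update_good <- function(best, x) min(best, x)

best <- Inf
for (x in c(3, 1, 2)) best <- update_good(best, x)
best
```

With the second form, everything a function changes is visible at the place
where it is called.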

... *but* .. after all that ...
...
as maintainer of the Matrix package 
I'm close to absolutely sure that you want to work with *sparse*
matrices as Matrix provides.

So in fact, do use

## "require":  just so you are not tempted to call a package a "library"
require(Matrix)

help(sparse.model.matrix)

and then do 
- use  sparse.model.matrix() instead of model.matrix().

Further, do 
- use Diagonal() instead of  diag()  for *constructing* diagonal matrices.
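
A minimal sketch of what I mean, on a made-up toy data frame. The column
names 'block' and 'trt', the two formulas, and the identity placeholder
'R_inv' are assumptions for illustration only; they are not taken from your
design, so adapt them to your own model.

```r
require(Matrix)

## Hypothetical toy design: 6 blocks of 5 plots, treatments assigned at random.
set.seed(1)
d <- data.frame(block = factor(rep(1:6, each = 5)),
                trt   = factor(sample(1:10, 30, replace = TRUE)))

## Same data frame + formula interface as model.matrix(), but sparse result
Xd <- model.matrix(~ block, d)            # dense, for comparison only
X  <- sparse.model.matrix(~ block, d)     # intercept + block contrasts
Z  <- sparse.model.matrix(~ 0 + trt, d)   # one indicator column per treatment level

## Diagonal() stores only the diagonal; diag(n) would allocate all n^2 entries
R_inv <- Diagonal(nrow(d))                # identity, placeholder for a residual precision
ZtZ   <- crossprod(Z, R_inv %*% Z)        # t(Z) %*% R_inv %*% Z, computed sparsely

## The sparse and the dense fixed-effects matrices hold the same numbers
stopifnot(all.equal(as.matrix(X), Xd, check.attributes = FALSE))
```

The gain is small for 30 rows, but for matrices in the thousands of rows that
are mostly zeros, the sparse versions typically need far less memory and time.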

Please let us know if this helps

[and maybe fix your example to become reproducible: do

   rm(list=ls(all=TRUE))

 before  source(...) ing the reproducible example script...
]

Martin Maechler, ETH Zurich

    > Thanks.

    > Laz

    > ------ Original message ------
    > From: jim holtman
    > Date: 6/18/2014 3:49 PM
    > To: Laz;
    > Cc: R mailing list;
    > Subject:Re: [R] How can I avoid the for and If loops in my function?

    > First order of business, without looking in detail at the code, is to avoid
    > the use of dataframes.  If all your values are numerics, then use a matrix.
    > It will be faster execution.
    > I did see the following statements:
    > newmatdf<-Des[[i]]
    > Des[[i]]<-newmatdf
    > why are you just putting back what you pulled out of the list?

    > Jim Holtman
    > Data Munger Guru

    > What is the problem that you are trying to solve?
    > Tell me what you want to do, not how you want to do it.
    > On Wed, Jun 18, 2014 at 12:41 PM, Laz <[1]lmramba at ufl.edu> wrote:

    > Dear R-users,
    > I have a 3200 by 3200 matrix that was built from a data frame that had
    > 180 observations, with variables: x, y, blocks (6 blocks) and
    > treatments (values range from 1 to 180) I am working on. I built other
    > functions that seem to work well. However, I have one function that has
    > many If loops and a long For loop that delays my results for over 10
    > hours! I need your help to avoid these loops.
    > ########################################################
    > ## I need to avoid these for loops and if loops here :
    > ########################################################
    > ### swapsimple() is a function that takes in a dataframe, randomly swaps
    > ### two elements from the same block in a data frame and generates a new
    > ### dataframe called newmatdf
    > ### swapmainF() is a function that calculates the trace of the final N
    > ### by N matrix considering the incidence matrices and blocks and treatments
    > ### and residual errors in a linear mixed model framework using Henderson's
    > ### approach.
    > funF<- function(newmatdf, n, traceI)
    > {
    > # n = number of iterations (swaps to be made on pairs of elements of the
    > # dataframe, called newmatdf)
    > # newmatdf : is the original dataframe with N rows, and 4 variables
    > # (x,y,blocks,genotypes)
    > matrix0<-newmatdf
    > trace<-traceI  ##  sum of the diagonal elements of the N by N matrix
    > ## (generated outside this loop) from the original newmatdf dataframe
    > res <- list(mat = NULL, Design_best = newmatdf, Original_design =
    > matrix0) # store our output of interest
    > res$mat <- rbind(res$mat, c(value = trace, iterations = 0)) # initialized values
    > Des<-list()
    > for(i in seq_len(n)){
    > ifelse(i==1,
    > newmatdf<-swapsimple(matrix0),newmatdf<-swapsimple(newmatdf))
    > Des[[i]]<-newmatdf
    > if(swapmainF(newmatdf) < trace){
    > newmatdf<-Des[[i]]
    > Des[[i]]<-newmatdf
    > trace<- swapmainF(newmatdf)
    > res$mat <- rbind(res$mat, c(trace = trace, iterations = i))
    > res$Design_best <- newmatdf
    > }
    > if(swapmainF(newmatdf) > trace & nrow(res$mat)<=1){
    > newmatdf<-matrix0
    > Des[[i]]<-matrix0
    > res$Design_best<-matrix0
    > }
    > if(swapmainF(newmatdf)> trace & nrow(res$mat)>1){
    > newmatdf<-Des[[length(Des)-1]]
    > Des[[i]]<-newmatdf
    > res$Design_best<-newmatdf
    > }
    > }
    > res
    > }
    > The above function was created to:
    > Take an original matrix, called matrix0, and calculate its trace.
    > Generate a new matrix, called newmatdf, by swapping two elements of the
    > old one, and calculate its trace. If the trace of the new matrix is
    > smaller than that of the previous matrix, store the current trace together
    > with the older traces and their iteration values. If the new matrix has
    > a trace larger than the previous trace, drop this trace and drop this
    > matrix too (but count its iteration).
    > Re-swap the old matrix that you stored previously and recalculate the
    > trace. Repeat the process many times, say 10,000. The final result
    > should be a list with the original initial matrix and its trace, the
    > final best matrix that had the smallest trace after the 10,000
    > simulations, and a dataframe showing the values of the accepted traces
    > that were smaller than the previous ones and their respective iterations.
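
As I read the description above, this is an accept-only-if-better search. The
posted loop calls swapmainF() three or four times per iteration and stores
every design in a list, so one possible restructuring is to evaluate the
criterion once per swap and keep only the current best design. The sketch
below makes several assumptions: funF2() is a made-up name, and swapsimple()
and swapmainF() are toy stand-ins written only so that the code runs (the
real ones were not posted, and the toy criterion is not a trace). Ties are
treated as rejections here.

```r
## Hypothetical stand-ins; replace with the real functions.
swapsimple <- function(d) {                  # swap two genotypes within one block
  b   <- sample(unique(d$block), 1)          # assumes more than one block
  idx <- sample(which(d$block == b), 2)      # assumes more than two plots per block
  d$genotypes[idx] <- d$genotypes[rev(idx)]
  d
}
swapmainF <- function(d) sum(abs(diff(d$genotypes)))  # toy criterion, NOT a real trace

funF2 <- function(design0, n, traceI) {
  best    <- design0
  best_tr <- traceI
  ## preallocate the log instead of growing it with rbind()
  mat <- matrix(NA_real_, nrow = n + 1, ncol = 2,
                dimnames = list(NULL, c("value", "iterations")))
  mat[1, ] <- c(traceI, 0)
  k <- 1                                     # rows of 'mat' in use
  for (i in seq_len(n)) {
    cand   <- swapsimple(best)               # always swap from the current best
    tr_new <- swapmainF(cand)                # criterion evaluated once per iteration
    if (tr_new < best_tr) {                  # accept strict improvements only
      best    <- cand
      best_tr <- tr_new
      k <- k + 1
      mat[k, ] <- c(best_tr, i)
    }                                        # otherwise 'cand' is simply dropped
  }
  list(mat = mat[seq_len(k), , drop = FALSE],
       Design_best = best, Original_design = design0)
}

set.seed(42)
d0  <- data.frame(x = 1, y = 1:12, block = rep(1:3, each = 4),
                  genotypes = sample(1:12))
ans <- funF2(d0, n = 200, traceI = swapmainF(d0))
```

The for loop itself stays, because each swap depends on the previous result;
if swapmainF() dominates the run time, calling it once per iteration instead
of three or four times would already remove a large share of the work.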
    > $Original_design
    > x  y block genotypes
    > 1    1  1     1        29
    > 7    1  2     1         2
    > 13   1  3     1         8
    > 19   1  4     1        10
    > 25   1  5     1         9
    > 31   1  6     2        29
    > 37   1  7     2         4
    > 43   1  8     2        22
    > 49   1  9     2         3
    > 55   1 10     2        26
    > 61   1 11     3        18
    > 67   1 12     3        19
    > 73   1 13     3        28
    > 79   1 14     3        10
    > ------truncated ----
    > The final result after running funF(newmatdf, n, traceI), given
    > below, looks like this:
    > ans1
    > $mat
    > value iterations
    > [1,] 1.474952          0
    > [2,] 1.474748          1
    > [3,] 1.474590          2
    > [4,] 1.474473          3
    > [5,] 1.474411          5
    > [6,] 1.474294         10
    > [7,] 1.474182         16
    > [8,] 1.474058         17
    > [9,] 1.473998         19
    > [10,] 1.473993         22
    > ---truncated
    > $Design_best
    > x  y block genotypes
    > 1    1  1     1        29
    > 7    1  2     1         2
    > 13   1  3     1        18
    > 19   1  4     1        10
    > 25   1  5     1         9
    > 31   1  6     2        29
    > 37   1  7     2        21
    > 43   1  8     2         6
    > 49   1  9     2         3
    > 55   1 10     2        26
    > ---- truncated
    > $Original_design
    > x  y block genotypes
    > 1    1  1     1        29
    > 7    1  2     1         2
    > 13   1  3     1         8
    > 19   1  4     1        10
    > 25   1  5     1         9
    > 31   1  6     2        29
    > 37   1  7     2         4
    > 43   1  8     2        22
    > 49   1  9     2         3
    > 55   1 10     2        26
    > 61   1 11     3        18
    > 67   1 12     3        19
    > 73   1 13     3        28
    > 79   1 14     3        10
    > ------truncated
    > Regards,
    > Laz
    > [[alternative HTML version deleted]]
    > ______________________________________________
    > [2]R-help at r-project.org mailing list
    > [3]https://stat.ethz.ch/mailman/listinfo/r-help
    > PLEASE do read the posting
    > guide [4]http://www.R-project.org/posting-guide.html
    > and provide commented, minimal, self-contained, reproducible code.

    > References

    > 1. mailto:lmramba at ufl.edu
    > 2. mailto:R-help at r-project.org
    > 3. https://stat.ethz.ch/mailman/listinfo/r-help
    > 4. http://www.R-project.org/posting-guide.html