[R] Help on improving an algorithm

MacQueen, Don macqueen1 at llnl.gov
Thu Jul 28 18:16:56 CEST 2016


This is a good example to illustrate that R is a vectorized language, and
programming concepts that would work in fortran, C, or any of many other
languages are not always effective in R.

The vectorized method below has produced the same value for 'p' in every
example I¹ve tried, and is much faster.

Note that the desired calculation looks at every combination of one row
from each of four matrices, i.e., 126^4 combinations (a large number).
This suggests the use of expand.grid().

I don't see any need to actually construct the 'conti' matrix. Only its
row sums are needed, and those can be calculated directly from the rows of
the input matrices.

The vectorized method makes using 126 rows tractable. It still takes a
while on my machine, but it's tolerable. The original method takes longer
than I'm willing to wait. I used a smaller number of rows for testing.

## number of rows in each input matrix
nr <- 126
nr <- 11
## number of columns in each input matrix
nc <- 6

## simulate input data, in order to create
## a reproducible example
xr <- 0:2
tmpa <- matrix( sample(xr, nr*nc, replace=TRUE) , ncol=nc)
tmpb <- matrix( sample(xr, nr*nc, replace=TRUE) , ncol=nc)
tmpc <- matrix( sample(xr, nr*nc, replace=TRUE) , ncol=nc)
tmpd <- matrix( sample(xr, nr*nc, replace=TRUE) , ncol=nc)

cat('method 2 (vectorized)\n')
t2 <- system.time({
  rsa <- rowSums(tmpa)
  rsb <- rowSums(tmpb)
  rsc <- rowSums(tmpc)
  rsd <- rowSums(tmpd)

  bigr <- expand.grid(rsa, rsb, rsc, rsd)

  p2 <- sum( rowSums( bigr > 8) == 0)
  print(p2)
})

cat('original method (looping)\n')

torig <- system.time({
  p <- 0
  for (i in 1:nrow(tmpa)) {
    for (j in 1:nrow(tmpb)) {
      for (k in 1:nrow(tmpc)) {
        for (l in 1:nrow(tmpd)) {
          conti <- rbind(tmpa[i,], tmpb[j,], tmpc[k,],tmpd[l,])
          if (sum(apply(conti,1,sum)>8)==0) p <- p+1
          ##        print(p)
        }
      }
    }
  }

print(p)
})

cat('\n')
print(t2)
print(torig)


A couple of minor points:
  dim(tmp_a)[1] can be replaced by nrow(tmp_a)
  in the original code, apply() can be replaced by rowSums()
It might be faster if rowSums() was replaced by .rowSums(); I haven't
tried that.

-Don

 
-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 7/27/16, 1:41 PM, "R-help on behalf of li li"
<r-help-bounces at r-project.org on behalf of hannah.hlx at gmail.com> wrote:

>Hi all,
> I have four matrix tmp_a, tmp_b, tmp_c and tmp_d whose dimensions are
>shown as below.
>I want to take one row from each of these matrices and then put the four
>selected rows into a matrix.
>I want to count the number of such matrices for which the vector of row
>sum
>is less than or equal to (8,8,8,8,8) (componentwise).
>Below is the code I use now. Is there a way to make this more efficient
>and
>faster?
> Thanks for the help in advance.
>
>> dim(tmp_a);dim(tmp_b); dim(tmp_c); dim(tmp_d)[1] 252   6
>[1] 126   6
>[1] 126   6
>[1] 126   6
>
>
>p <- 0
>for (i in 1:dim(tmp_a)[1]){
>    for (j in 1:dim(tmp_b)[1]){
>        for (k in 1:dim(tmp_c)[1]){
>            for (l in 1:dim(tmp_c)[1]){
>            conti <- rbind(tmp_a[i,], tmp_b[j,], tmp_c[k,],tmp_d[l,])
>            if (sum(apply(conti,1,sum)>8)==0)
>                p <- p+1
>                print(p)}}}}
>
>	[[alternative HTML version deleted]]
>
>______________________________________________
>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list