[R] Efficient way to use data frame of indices to initialize matrix

Cutler, Gene gcutler at amgen.com
Wed Dec 8 18:51:48 CET 2010


Thanks for the three great answers!  For those who are curious, I timed the three approaches:

nr <- 15812
nc <- 64636
mymat <- matrix(nrow=nr, ncol=nc)
mymat[1,1] <- 1 # see note below

# mydf is created elsewhere
dim(mydf)
# 10910263        3
colnames(mydf)
# "x" "y" "a"

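# approach 1 (linear index; R matrices are column-major, so the stride is nr):
# mymat[ mydf$x + (mydf$y-1) * nr ] <- mydf$a

# approach 2 (two-column index matrix, row index first):
# mymat[ as.matrix(mydf[,c("x","y")]) ] <- mydf$a

# approach 3:
# mymat[ cbind(mydf$x, mydf$y) ] <- mydf$a


# Corrected calls. The timings below were recorded with the calls as originally
# posted, which used nc instead of nr in #1 and as.matrix(mydf$x, mydf$y) in #2
# (a one-column matrix built from mydf$x alone), so treat the #1 and #2 figures
# as indicative only.
system.time( for (i in 1:10) mymat[ mydf$x + (mydf$y-1) * nr ] <- mydf$a )
system.time( for (i in 1:10) mymat[ as.matrix(mydf[,c("x","y")]) ] <- mydf$a )
system.time( for (i in 1:10) mymat[ cbind(mydf$x, mydf$y) ] <- mydf$a )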


#   user  system elapsed 
# 10.478   3.837  14.317 <- #1
#  9.064   1.711  10.777 <- #2
# 10.747   2.702  13.450 <- #3

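In these timings approach #2 has the lowest elapsed time (10.8 s, against 13.5 s for #3 and 14.3 s for #1).  Note that the first assignment into the newly created matrix takes about 8 seconds of elapsed time all on its own, which is why the setup above includes that initialization line: it keeps that one-off cost out of the timed loops.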
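
A likely reason for that one-off cost (my reading, not something measured above): matrix(nrow=nr, ncol=nc) with no data argument is filled with logical NA, so the first numeric assignment makes R convert the whole matrix to double.  A small sketch with toy dimensions, chosen only for illustration:

```r
# Toy dimensions, chosen only for illustration (not the sizes used above).
m_lgl <- matrix(nrow = 3, ncol = 4)   # default fill is logical NA
storage.mode(m_lgl)                   # "logical"

m_lgl[1, 1] <- 1                      # a numeric value coerces the whole matrix
storage.mode(m_lgl)                   # "double"

# Allocating as double up front avoids that later conversion.
m_dbl <- matrix(NA_real_, nrow = 3, ncol = 4)
storage.mode(m_dbl)                   # "double"
```

Whether allocating with NA_real_ removes the 8 seconds at full size is untested here.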

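Before reading much into timings like these, it may be worth confirming on a small case that the three index forms fill the same cells.  The sketch below uses made-up toy data with the same column names as mydf; the helper fill() is hypothetical and exists only for this check:

```r
# Toy data with the same columns as mydf (x = row, y = column, a = value).
# The values are made up for illustration.
tmpdf <- data.frame(x = c(1, 2, 3, 2), y = c(2, 3, 1, 4), a = c(10, 20, 30, 40))
nr <- 3
nc <- 4

# Hypothetical helper: start from an empty double matrix and assign via idx.
fill <- function(idx) {
  m <- matrix(NA_real_, nrow = nr, ncol = nc)
  m[idx] <- tmpdf$a
  m
}

m1 <- fill(tmpdf$x + (tmpdf$y - 1) * nr)      # linear index, column-major
m2 <- fill(as.matrix(tmpdf[, c("x", "y")]))   # two-column (row, col) matrix
m3 <- fill(cbind(tmpdf$x, tmpdf$y))           # same index, built with cbind

stopifnot(identical(m1, m2), identical(m2, m3))

# Pitfall: as.matrix() on a vector ignores a second vector argument,
# so this is a one-column matrix of x values, not (x, y) pairs.
dim(as.matrix(tmpdf$x, tmpdf$y))              # 4 1
```
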
--
Gene


> -----Original Message-----
> From: David Winsemius [mailto:dwinsemius at comcast.net]
> Sent: Tuesday, December 07, 2010 11:00 AM
> To: Greg Snow
> Cc: Gene; r-help at r-project.org
> Subject: Re: [R] Efficient way to use data frame of indices to
> initialize matrix
> 
> 
> On Dec 7, 2010, at 1:49 PM, Greg Snow wrote:
> 
> > tmpdf <- data.frame( x = c(1,2,3), y=c(2,3,1), a=c(10,20,30) )
> > mymat <- matrix(0, ncol=3, nrow=3)
> > mymat[ as.matrix(tmpdf[,c('x','y')]) ] <- tmpdf$a
> 
> cbind is also useful for assembling the arguments to the matrix `[<-`
> function:
> 
> tmpdf <- data.frame( x = c(1,2,3), y=c(2,3,1), a=c(10,20,30) )
>   mymat <- matrix(NA, ncol=max(tmpdf$y), nrow=max(tmpdf$x))
>   mymat[ cbind(tmpdf$x,tmpdf$y) ] <- tmpdf$a
> 
>   mymat
>       [,1] [,2] [,3]
> [1,]   NA   10   NA
> [2,]   NA   NA   20
> [3,]   30   NA   NA
> 
> 
> > --
> > Gregory (Greg) L. Snow Ph.D.
> > Statistical Data Center
> > Intermountain Healthcare
> > greg.snow at imail.org
> > 801.408.8111
> >
> >
> >> -----Original Message-----
> >> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Gene
> >> Sent: Tuesday, December 07, 2010 11:31 AM
> >> To: r-help at r-project.org
> >> Subject: [R] Efficient way to use data frame of indices to initialize
> >> matrix
> >>
> >> I have a data frame with three columns, x, y, and a.  I want to
> >> create
> >> a matrix from these values such that for matrix m:
> >> m[x,y] == a
> >>
> >> Obviously, I can go row by row through the data frame and insert the
> >> value a at the correct x,y location in the matrix.  I can make that
> >> slightly more efficient (perhaps), by doing something like this:
> >>> for (each.x in unique(df$x)) m[each.x, df$y[df$x == each.x]] <-
> >> df$a[df$x == each.x]
> >>
> >> But I feel that there must be a more efficient, or at least more
> >> elegant way to do this.
> >>
> >> --
> >> Gene
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> 
> David Winsemius, MD
> West Hartford, CT


