[R] Matrix to "database" -- best practices/efficiency?

Jonathan Greenberg greenberg at ucdavis.edu
Tue Jun 8 07:07:06 CEST 2010


It's a race!  I decided to go ahead and time everyone's results, and
all of the methods (except mine) are around the same speed.  I ran
them a few times and Gabor's application of melt() tends to be a tad
bit faster than the other two, although that is far from conclusive --
do these methods share code in common?  Should I expect one of
these to have a smaller during-processing footprint than the others?
Thanks again!

--j
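
One rough way to probe the footprint question is to reset R's memory
high-water mark with gc(reset = TRUE), run a method, and read the
"max used" figure back from gc(). The sketch below does that for the
two base-R methods only. peak_mb is a hypothetical helper name, not
something from this thread, and the figure is only sampled when the
garbage collector runs, so treat it as a coarse proxy rather than an
exact measurement.

```r
# Hypothetical helper: run f() and report roughly how many Mb the
# session's peak memory use rose above its level at the reset.
peak_mb <- function(f) {
  before <- gc(reset = TRUE)   # reset the "max used" counters
  f()
  after <- gc()
  # The last column of gc()'s matrix is "max used" expressed in Mb.
  sum(after[, ncol(after)]) - sum(before[, ncol(before)])
}

id_m <- seq(10, 6000, by = 10)
id_n <- seq(100, 10000, by = 100)
my_matrix <- matrix(1:60, nrow = 600, ncol = 100,
                    dimnames = list(id_m, id_n))

# Peak growth while running as.data.frame.table
peak_table <- peak_mb(function() as.data.frame.table(my_matrix))

# Peak growth while running expand.grid + cbind
peak_grid <- peak_mb(function() cbind(
  expand.grid(id_m = id_m, id_n = id_n),
  mat = as.vector(my_matrix)
))
```

Comparing peak_table and peak_grid over several runs gives a first
impression; on a matrix this small the differences may well be lost
in the 0.1 Mb rounding that gc() applies.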

# My terrible approach:
my_matrix=matrix(c(1:60),nrow=600,ncol=100)
id_m=seq(10,6000,by=10)
id_n=seq(100,10000,by=100)

system.time(
for (a in 1:length(id_m))
{
	for (b in 1:length(id_n))
	{
		if ((a==1) && (b==1))
		{
			my_database=c(id_m[a],id_n[b],my_matrix[a,b])
		} else
		{
			my_database=rbind(my_database,c(id_m[a],id_n[b],my_matrix[a,b]))
		}
	}
}
)
   user  system elapsed
173.601  10.288 202.433
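
Much of the loop's cost most likely comes from rbind() copying the
growing result on every iteration, not from the looping itself. A
minimal sketch, assuming the same my_matrix, id_m and id_n as above,
allocates the result once and fills it in place. The names n_row,
n_col and k are illustrative only, and no timing is shown because
this variant was not measured in this thread.

```r
my_matrix <- matrix(1:60, nrow = 600, ncol = 100)
id_m <- seq(10, 6000, by = 10)
id_n <- seq(100, 10000, by = 100)

n_row <- length(id_m)
n_col <- length(id_n)

# Allocate the full result once instead of growing it with rbind().
my_database <- matrix(NA_real_, nrow = n_row * n_col, ncol = 3)

k <- 0
for (a in seq_len(n_row)) {
  for (b in seq_len(n_col)) {
    k <- k + 1
    # Same row content and same row order as the rbind() loop.
    my_database[k, ] <- c(id_m[a], id_n[b], my_matrix[a, b])
  }
}
```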

# Gabor's method with reshape
library(reshape)
my_matrix = matrix(c(1:60),nrow=600,ncol=100,dimnames=list(seq(10,6000,by=10),seq(100,10000,by=100)))
system.time(
		my_database <- melt(my_matrix)
)

   user  system elapsed
  0.006   0.006   0.014

# Jorge's method with as.data.frame.table
my_matrix = matrix(c(1:60),nrow=600,ncol=100,dimnames=list(seq(10,6000,by=10),seq(100,10000,by=100)))
system.time(
		my_database <- as.data.frame.table(my_matrix)
)

   user  system elapsed
  0.027   0.005   0.036

# Bill's method with expand.grid
my_matrix=matrix(c(1:60),nrow=600,ncol=100)
id_m=seq(10,6000,by=10)
id_n=seq(100,10000,by=100)
system.time(
	my_database <- cbind(
		expand.grid(id_m = id_m, id_n = id_n),
		mat = as.vector(my_matrix)
	)
)

   user  system elapsed
  0.007   0.006   0.020
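
Before comparing speeds it may be worth confirming that the methods
return the same rows. The sketch below checks the two base-R methods
against each other (melt() is left out so that nothing beyond base R
is needed). The names db_grid, db_table and same_* are illustrative
only.

```r
my_matrix <- matrix(1:60, nrow = 600, ncol = 100)
id_m <- seq(10, 6000, by = 10)
id_n <- seq(100, 10000, by = 100)

# Bill's approach: expand.grid varies its first factor fastest,
# which matches the column-major order of as.vector(my_matrix).
db_grid <- cbind(
  expand.grid(id_m = id_m, id_n = id_n),
  mat = as.vector(my_matrix)
)

# Jorge's approach, with the ids supplied as dimnames.
dimnames(my_matrix) <- list(id_m, id_n)
db_table <- as.data.frame.table(my_matrix)

# as.data.frame.table returns the ids as factors (Var1, Var2) and the
# values as Freq; convert the factors back to numbers to compare.
same_rows <- all(as.numeric(as.character(db_table$Var1)) == db_grid$id_m)
same_cols <- all(as.numeric(as.character(db_table$Var2)) == db_grid$id_n)
same_vals <- all(db_table$Freq == db_grid$mat)
```

All three flags should come out TRUE. Note that the for-loop version
emits its rows with id_n varying fastest, so it holds the same rows
in a different order.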

On Mon, Jun 7, 2010 at 9:30 PM,  <Bill.Venables at csiro.au> wrote:
> I think what you are groping for is something like this
>
> my_matrix <- matrix(1:60, nrow = 6)
> id_a <- seq(10,60,by=10)
> id_b <- seq(100,1000,by=100)
> my_database <- cbind(
>  expand.grid(id_a = id_a, id_b = id_b),
>  mat = as.vector(my_matrix)
> )
>
>
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Jonathan Greenberg
> Sent: Tuesday, 8 June 2010 12:34 PM
> To: r-help
> Subject: [R] Matrix to "database" -- best practices/efficiency?
>
> I have a matrix of, say, M and N dimensions:
>
> my_matrix=matrix(c(1:60),nrow=6,ncol=10)
>
> I have two "id" vectors corresponding to the rows and columns, e.g.:
>
> id_m=seq(10,60,by=10)
> id_n=seq(100,1000,by=100)
>
> I would like to create a "proper" database (let's say a data.frame for
> this example -- I'm going to be loading these into an SQLite database,
> but we'll leave that complication out of this discussion for now) of m
> x n rows, and 3 columns, where the 3 columns relate to the values from
> m, n, and my_matrix respectively, e.g. a single row follows the form:
>
> c(id_m[a],id_n[b],my_matrix[a,b])
>
> I can, of course, for-loop this thing with an if-then, e.g.:
>
> ***
>
> for (a in 1:length(id_m))
> {
>        for (b in 1:length(id_n))
>        {
>                if ((a==1) && (b==1))
>                {
>                        my_database=c(id_m[a],id_n[b],my_matrix[a,b])
>                } else
>                {
>                        my_database=rbind(my_database,c(id_m[a],id_n[b],my_matrix[a,b]))
>                }
>        }
> }
>
> ***
>
> But my gut is telling me this is an incredibly inefficient way of
> doing this -- is there a faster approach to doing this same process?
> Thanks!
>
> --j