[R] "unsparse" a vector

Petr Savicky savicky at cs.cas.cz
Thu Feb 9 12:35:36 CET 2012


On Wed, Feb 08, 2012 at 05:01:01PM -0500, Sam Steingold wrote:
> loop is too slow.
> it appears that sparseMatrix does what I want:
> 
> ll <- lapply(l,length)
> i <- rep(1:4, ll)
> vv <- unlist(l)
> j1 <- as.factor(substring(vv,1,1))
> t <- table(j1)
> j <- position of elements of j1 in names(t)
> sparseMatrix(i,j,x=as.numeric(substring(vv,2,2)), dimnames = names(t))
> 
> so, the question is, how do I produce a vector of positions?
> 
> i.e., from vectors
> [1] "A" "B" "A" "C" "A" "B"
> and
> [1] "A" "B" "C"
> I need to produce a vector
> [1] 1 2 1 3 1 2
> of positions of the elements of the first vector in the second vector.

This particular step can be done with match():

  match(c("A", "B", "A", "C", "A", "B"), c("A", "B", "C"))
  [1] 1 2 1 3 1 2
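To connect this back to the sparseMatrix() idea, here is a minimal sketch of the whole pipeline. The original list l is not shown, so the sketch assumes (hypothetically) that l is a list of character vectors with entries like "A1", where the first character names a column and the second is a digit value. It uses lengths(l) instead of lapply(l, length), so rep() gets a plain integer vector of counts. It also fills an ordinary dense matrix through a two-column (row, column) index instead of calling Matrix::sparseMatrix().

```r
# Hypothetical input (assumption): each list element holds codes such as "A1",
# where the first character names a column and the second is a digit value.
l <- list(c("A1", "B2"), c("A3"), c("C1", "A2"), c("B1"))

vv  <- unlist(l)
i   <- rep(seq_along(l), lengths(l))   # row index: which list element
key <- substring(vv, 1, 1)             # column label, e.g. "A"
lev <- sort(unique(key))               # column names: "A" "B" "C"
j   <- match(key, lev)                 # column index: position of key in lev
x   <- as.numeric(substring(vv, 2, 2)) # cell value

# Dense result: index the matrix with a two-column (row, col) matrix.
m <- matrix(0, nrow = length(l), ncol = length(lev),
            dimnames = list(NULL, lev))
m[cbind(i, j)] <- x
m
```

With the Matrix package, the same i, j and x could instead be passed as sparseMatrix(i, j, x = x, dims = c(length(l), length(lev)), dimnames = list(NULL, lev)). Note that dimnames has to be a list of row and column names, not a plain character vector such as names(t). In the dense version above, a repeated (row, column) pair simply overwrites the earlier value.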

> PS. Of course, I would much prefer a dataframe to a matrix...

As the final result or also as an intermediate result?

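Changing individual elements of a data frame row by row (as in
df[i, 1] <- 0) is much slower than doing the same with a matrix,
since each such assignment goes through the R-level method
[<-.data.frame, which has a large per-call overhead.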

Compare

  n <- 10000
  mat <- matrix(1:(2*n), nrow=n)
  df <- as.data.frame(mat)

  system.time( for (i in 1:n) { mat[i, 1] <- 0 } )

     user  system elapsed 
    0.021   0.000   0.021 

  system.time( for (i in 1:n) { df[i, 1] <- 0 } )

     user  system elapsed 
    4.997   0.069   5.084 
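If the data frame is needed only as the final result, one way to avoid this cost is to do all the row-by-row updates on a matrix and convert it once at the end. A minimal sketch; the update rule (zeroing column 1 in even rows) is only a hypothetical stand-in for the real per-row work:

```r
n <- 10000

# Do the element-wise work on a matrix ...
mat <- matrix(1:(2 * n), nrow = n)
for (i in seq_len(n)) {
  if (i %% 2 == 0) mat[i, 1] <- 0L   # hypothetical per-row update
}

# ... and convert to a data frame only once, at the end.
df <- as.data.frame(mat)
```

as.data.frame() on an unnamed matrix gives columns named V1, V2, and so on.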

The slowdown is specific to indexing the data frame row by row.
Extracting a whole column as a vector, modifying it in the loop,
and assigning it back is about as fast as the matrix version:

  system.time( {
    col1 <- df[[1]]
    for (i in 1:n) { col1[i] <- 0 }
    df[[1]] <- col1
  } )

    user  system elapsed 
   0.019   0.000   0.019 
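When each update does not depend on earlier iterations, the loop can often be dropped altogether and the column changed with a single vectorized assignment. A sketch on the same kind of data frame; the selection rule (even values in column 2) is only a hypothetical example:

```r
n <- 10000
df <- as.data.frame(matrix(1:(2 * n), nrow = n))

# Replace a whole column in one assignment (no loop, no per-row method calls).
df[[1]] <- rep(0L, n)

# Update only selected rows of one column using a logical index;
# here, hypothetically, the rows where column 2 holds an even value.
sel <- df[[2]] %% 2 == 0
df[[2]][sel] <- -1L
```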

Hope this helps.

Petr Savicky.
