[R] Quicker way of combining vectors into a data.frame

Sebastian Weber sebastian.weber at physik.tu-darmstadt.de
Thu Nov 30 18:29:53 CET 2006


Hi!

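I have not tried this myself, so I can't say for sure, but how about
allocating a matrix to hold everything, putting each vector into its own
column, and finally assigning column names via dimnames:

data <- matrix(0, nrow=length(vec1), ncol=5)
data[,1] <- vec1   # one column per vector
...
# NULL leaves the rows unnamed; the column names here are only placeholders
dimnames(data) <- list(NULL, c("vec1", "vec2", "vec3", "vec4", "vec5"))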

as.data.frame(data)
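
For what it is worth, a self-contained sketch of the idea might look like
the following. The vectors vec1 to vec3 and the column names are made up
for illustration, and I have not timed this against data.frame(), so treat
it as something to experiment with rather than a guaranteed speed-up.

```r
# Hypothetical input: three numeric vectors of equal length
n <- 4471
vec1 <- (1:n) / 10
vec2 <- sqrt(1:n)
vec3 <- rep(c(1.5, 2.5), length.out = n)

# Preallocate one numeric matrix, one column per vector
data <- matrix(0, nrow = n, ncol = 3)
data[, 1] <- vec1
data[, 2] <- vec2
data[, 3] <- vec3

# Name the columns only (NULL leaves the rows unnamed)
dimnames(data) <- list(NULL, c("vec1", "vec2", "vec3"))

# Convert to a data frame once, at the very end
fab <- as.data.frame(data)
```

cbind(vec1, vec2, vec3) should build the same matrix in one step, taking
the column names from the argument names.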

I forgot to mention: this of course assumes that all of your vectors are
numeric, since a matrix can only hold values of a single type.
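
To illustrate why the type matters, here is a small sketch with made-up
vectors x and y (not from the original question). Mixing a numeric and a
character vector in a matrix coerces everything to character, whereas a
data frame keeps a separate type per column.

```r
# Hypothetical vectors: one numeric, one character
x <- c(1.5, 2.5, 3.5)
y <- c("a", "b", "c")

# A matrix holds a single type, so the numbers are coerced to character
m <- cbind(x, y)
is.character(m)  # TRUE

# A data frame stores each column with its own type
d <- data.frame(x = x, y = y)
is.numeric(d$x)  # TRUE
```

So the matrix route is only safe when every vector is numeric (or, more
generally, when they all share one type).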

Hope that helps!

Greetings,

Sebastian

On Thu, 2006-11-30 at 17:00 +0000, Gavin Simpson wrote:
> Hi,
> 
> In a function, I compute 10 (un-named) vectors of reasonable length
> (4471 in the particular example I have to hand) that I want to combine
> into a data frame object, that the function will return.
> 
> This is very slow, so *I'm* doing something wrong if I want it to be
> quick and efficient, though I'm not sure what the best way to do this
> would be.
> 
> I know it is the combining into data frame bit that is slow, because
> I've Rprof'ed it:
> 
> $by.self
>                         self.time self.pct total.time total.pct
> "names<-.default"           16.58     52.8      16.58      52.8
> "unlist"                     7.22     23.0       7.26      23.1
> "data.frame"                 1.72      5.5      29.38      93.6
> "duplicated.default"         1.66      5.3       1.66       5.3
> "+"                          1.20      3.8       1.20       3.8
> "list"                       0.40      1.3       0.40       1.3
> "as.data.frame.numeric"      0.28      0.9       3.32      10.6
> "apply"                      0.26      0.8       1.70       5.4
> "pmatch"                     0.22      0.7       0.22       0.7
> "paste"                      0.20      0.6       0.90       2.9
> "deparse"                    0.14      0.4       0.70       2.2
> "eval"                       0.12      0.4      31.28      99.7
> "names<-"                    0.12      0.4      16.70      53.2
> "FUN"                        0.12      0.4       1.32       4.2
> "names"                      0.12      0.4       0.14       0.4
> "as.list.default"            0.12      0.4       0.12       0.4
> "duplicated"                 0.10      0.3       1.76       5.6
> "gc"                         0.10      0.3       0.10       0.3
> 
> And I stepped through it under debug() and all the calculations before
> are quick, and then this bit takes a little over 20 seconds to complete
> 
>  fab <- data.frame(lc.ratio = lc.ratio, Q = Q,
>                      fNupt = fNupt,
>                      rho.n = rho.n, rho.s = rho.s,
>                      net.Nimm = net.Nimm,
>                      net.Nden = net.Nden,
>                      CLminN = CLminN,
>                      CLmaxN = CLmaxN,
>                      CLmaxS = CLmaxS)
> 
> I can get it down to c. 5 seconds if I do (not Rprof'ed):
> 
>  fab <- data.frame(lc.ratio, Q,
>                      fNupt,
>                      rho.n, rho.s,
>                      net.Nimm,
>                      net.Nden,
>                      CLminN,
>                      CLmaxN,
>                      CLmaxS)
> 
> But this still seems quite a long time, so I'm thinking that there must
> be a quicker way of doing what I want (end up with a data.frame with the 10
> vectors in it).
> 
> Can anyone enlighten me?
> 
> > version
>                _                                          
> platform       i686-pc-linux-gnu                          
> arch           i686                                       
> os             linux-gnu                                  
> system         i686, linux-gnu                            
> status         Patched                                    
> major          2                                          
> minor          4.0                                        
> year           2006                                       
> month          10                                         
> day            03                                         
> svn rev        39576                                      
> language       R                                          
> version.string R version 2.4.0 Patched (2006-10-03 r39576)
> 
> > sessionInfo()
> R version 2.4.0 Patched (2006-10-03 r39576) 
> i686-pc-linux-gnu 
> 
> locale:
> LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=en_GB.UTF-8;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] "methods"   "stats"     "graphics"  "grDevices" "utils"     "datasets"
> [7] "base"
> 
> Thanks in advance,
> 
> G
> -- 
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>  Gavin Simpson                 [t] +44 (0)20 7679 0522
>  ECRC & ENSIS, UCL Geography,  [f] +44 (0)20 7679 0565
>  Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
>  Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
>  UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


