[R] efficient ways to dynamically grow a dataframe

R. Michael Weylandt michael.weylandt at gmail.com
Thu Dec 1 16:48:12 CET 2011


I'd also suggest you read Circle 2 ("Growing Objects") of "The R
Inferno" (just google it), which has some helpful tips on how to deal
with these sorts of problems.
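
As a rough illustration of the kind of problem that circle covers, here
is a small sketch that contrasts growing a data frame row by row with
collecting the rows in a preallocated list and binding them once at the
end. The size n and the toy columns id and x are made up for the
example; they are not your model.

```r
n <- 200L  # hypothetical number of rows, kept small so both loops finish quickly

# Growing pattern: every rbind() copies everything accumulated so far.
grown <- data.frame()
for (i in seq_len(n)) {
  grown <- rbind(grown, data.frame(id = i, x = i^2))
}

# Alternative: store each piece in a preallocated list, then bind once.
pieces <- vector("list", n)
for (i in seq_len(n)) {
  pieces[[i]] <- data.frame(id = i, x = i^2)
}
bound <- do.call(rbind, pieces)
```

Both loops build the same rows; the second one avoids re-copying the
accumulated result on every iteration.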

Also, did you know that matrices can have column names and that
rbind() preserves them? E.g.,

m <- matrix(1:6, 3); colnames(m) <- letters[1:2]

print(m)

print(rbind(m, c(10, 11)))
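
Putting that together with Jim's advice to allocate the matrix up
front, a minimal sketch could look like the following. The values of N
and n_periods (the "T" in your post) and the random-walk update for x
are placeholders I made up, since your actual rule for x was not shown.

```r
N <- 1000L        # number of individuals (assumed value)
n_periods <- 10L  # number of periods, "T" in the original post (assumed value)

# Allocate the full result once, with named columns.
m <- matrix(NA_real_, nrow = N * n_periods, ncol = 3,
            dimnames = list(NULL, c("id", "t", "x")))
m[, "id"] <- rep(seq_len(N), times = n_periods)
m[, "t"]  <- rep(seq_len(n_periods), each = N)

# Placeholder model: a random walk stands in for the real update rule.
set.seed(1)
x <- rnorm(N)                          # assumed base-year values
for (t in seq_len(n_periods)) {
  x <- x + rnorm(N)                    # update all individuals at once
  rows <- ((t - 1L) * N + 1L):(t * N)  # rows that belong to period t
  m[rows, "x"] <- x                    # fill by position and column name
}

df1 <- as.data.frame(m)                # convert to a data frame once, at the end
```

The columns are still addressed by name throughout, and each period is
written by row position rather than by searching with
df1$id == i & df1$t == t for every individual.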

Michael

On Thu, Dec 1, 2011 at 9:02 AM, jim holtman <jholtman at gmail.com> wrote:
> First, dataframes can be much slower than matrices, for example, if
> you are changing/accessing values a lot.  I would suggest that you use
> a matrix since it seems that all your values are numeric.  Allocate a
> large empty matrix to start (hopefully as large as you need).  If you
> exceed this, you have the option of 'rbind'ing more empty rows on and
> continuing.  This might depend on how large your final matrix might be
> (you did not state the boundary conditions).
>
> On Thu, Dec 1, 2011 at 6:34 AM, Matteo Richiardi
> <matteo.richiardi at unito.it> wrote:
>> Hi,
>> I'm trying to write a small microsimulation in R: that is, I have a
>> dataframe with info on N individuals for the base-year and I have to
>> grow it dynamically for T periods:
>>
>> df = data.frame(
>>  id = 1:N,
>>  x =....
>> )
>>
>> The most straightforward way to solve the problem that came to my mind
>> is to create for every period a new dataframe:
>>
>> for(t in 1:T){
>>  for(i in 1:N){
>>  row = data.frame(
>>   id = i,
>>   t = t,
>>   x = ...
>>   )
>>   df = rbind(df,row)
>>  }
>> }
>>
>> This is very inefficient and my PC gets stuck immediately as N is
>> raised above a few thousand.
>> As an alternative, I created an empty dataframe for all the projected
>> periods, and then filled it:
>>
>> df1 = data.frame(
>>  id = rep(1:N,T),
>>  t = rep(1:T, each = N),
>>  x = rep(NA,N*T)
>> )
>>
>> for(t in 1:T){
>>  for(i in 1:N){
>>  x = ...
>>  df1[df1$id==i & df1$t==t,"x"] = x
>>  }
>> }
>> df = rbind(df,df1)
>>
>> This is also too slow, and my PC gets stuck. I don't want to go for
>> a matrix, because I'd lose the column names and everything would
>> become too error-prone.
>> Any suggestions on how to do it?
>> Thanks in advance,
>> Matteo
>>
>>
>>
>>
>> --
>> Matteo Richiardi
>> University of Turin
>> Faculty of Law
>> Department of Economics "Cognetti De Martiis"
>> via Po 53, 10124 Torino
>> Email: matteo.richiardi at unito.it
>> Tel. +39 011 670 3870
>> Web page: http://www.personalweb.unito.it/matteo.richiardi/
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.


