[R] merge data

Chuck White chuckwhite8 at charter.net
Wed Nov 11 06:18:29 CET 2009


David -- thank you for your response. 

merge does work but it creates another dataframe. df1 is very large and I did not want another copy created. What I ended up doing is:
df1 <- merge(df1, df2, by="week")

In terms of memory allocation, will memory for two dataframes be allocated or will the additional column be added to df1?
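
A small sketch may help frame the memory question. The toy df1 and df2 below are made up (keyed on week, as in the merge call above). merge() returns a new data frame, so while df1 <- merge(df1, df2, by="week") is evaluated the old df1 and the merged result both exist; the old one can only be garbage collected after the name is rebound. Adding a single column through match() builds just that column, although how much R copies internally depends on the R version, so this is an illustration rather than a guarantee.

```r
# Toy data (made-up values): df2 maps week -> index, df1 repeats weeks
set.seed(1)
df2 <- data.frame(week = 1:130,
                  index = seq(1000, by = 10, length.out = 130))
df1 <- data.frame(week = sample(df2$week, 40000, replace = TRUE),
                  x = rnorm(40000))

# merge() builds and returns a separate data frame; df1 is left untouched
merged <- merge(df1, df2, by = "week")
stopifnot(!("index" %in% names(df1)))

# With the default sort = TRUE the result is ordered by the key,
# so df1's original row order is not preserved
stopifnot(!is.unsorted(merged$week))

# match() finds, for each df1$week, the matching row of df2;
# only the new column is created and attached to df1
df1$index <- df2$index[match(df1$week, df2$week)]
```

The match() form also keeps the rows of df1 in their original order, which merge() does not.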

Thanks.

---- David Winsemius <dwinsemius at comcast.net> wrote: 
> 
> On Nov 10, 2009, at 12:36 PM, Chuck White wrote:
> 
> > df1 -- dataframe with column date and several other columns. #rows >40k.
> > Several of the dates are repeated.
> > df2 -- dataframe with two columns date and index. #rows ~130.
> > This is really a map from date to index.
> >
> > I would like to create a column called index in df1 which has the  
> > corresponding index from df2.
> >
> > The following works:
> > index <- NULL
> > for(wk in df1$week){
> >    index <- c(index,df2$index[df2$week==wk])
> > }
> > and then add index to df1.
> >
> > Can you please suggest a better way of doing this? I didn't think  
> > merge was suitable for this...is it? THANKS.
> 
> I think merge should work, but if you really have looked at the  
> various arguments, tested reasonable examples and are still convinced  
> it wouldn't, then see what you get with:
> 
>  > df1 <- data.frame(dt = Sys.Date() - sample(100:120, 30, replace=TRUE), 1:30)
>  > df2 <- data.frame(dt2 = Sys.Date() -100:120, index=LETTERS[1:21])
> 
>  > df1$index <- df2[ match(df1$dt,df2$dt2), "index"]
>  > df1
>             dt X1.30 index
> 1  2009-07-30     1     D
> 2  2009-07-16     2     R
> 3  2009-07-23     3     K
> 4  2009-07-29     4     E
> 5  2009-07-15     5     S
> 6  2009-08-02     6     A
> 7  2009-07-18     7     P
> 8  2009-07-21     8     M
> 9  2009-07-27     9     G
> 10 2009-07-26    10     H
> 11 2009-07-31    11     C
> 12 2009-07-26    12     H
> 13 2009-07-18    13     P
> 14 2009-07-23    14     K
> 15 2009-07-21    15     M
> 16 2009-07-19    16     O
> 17 2009-07-14    17     T
> 18 2009-07-16    18     R
> 19 2009-07-15    19     S
> 20 2009-07-13    20     U
> 21 2009-07-28    21     F
> 22 2009-07-20    22     N
> 23 2009-07-24    23     J
> 24 2009-07-20    24     N
> 25 2009-07-16    25     R
> 26 2009-07-30    26     D
> 27 2009-07-14    27     T
> 28 2009-08-02    28     A
> 29 2009-07-19    29     O
> 30 2009-07-26    30     H
> 
> I tried merge(df1, df2, by.x=1, by.y=1) and got the same result modulo  
> the order of the output.
> 
> 
> --
> 
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>
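
The three approaches in this thread (the loop, match(), and merge()) can be checked against each other on a tiny example. The data below are hypothetical and the column is named week to follow the loop in the original question; this is only a sketch of the comparison, not a benchmark.

```r
# Hypothetical small example: df2 maps each week to an index
df2 <- data.frame(week = c(1, 2, 3), index = c("A", "B", "C"),
                  stringsAsFactors = FALSE)
df1 <- data.frame(week = c(3, 1, 3, 2, 1), value = 1:5)

# Loop from the original question: grows the vector one element at a time
index_loop <- NULL
for (wk in df1$week) {
  index_loop <- c(index_loop, df2$index[df2$week == wk])
}

# Vectorised lookup: match() returns the row of df2 for each df1$week
index_match <- df2$index[match(df1$week, df2$week)]
stopifnot(identical(index_loop, index_match))

# merge() yields the same pairs but sorted by week;
# reorder by value to compare with the original row order
m <- merge(df1, df2, by = "week")
m <- m[order(m$value), ]
stopifnot(identical(m$index, index_match))
```

One difference to keep in mind: if a week in df1 has no entry in df2, match() gives NA for that row, the loop silently drops it (so the lengths no longer agree), and merge() drops the row unless all.x=TRUE is given.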



