[R] functionality of "update" in SAS

Denis Chabot chabotd at globetrotter.net
Wed Sep 20 21:42:36 CEST 2006


Dear list,

I've searched the archives but found nothing, although I may have
used the wrong search terms. I've also double-checked the upData
function in Hmisc, but it does something else.

I'm wondering if one can update a dataframe by "forcing into" it a  
shorter dataframe containing the corrections, like the "update"  
provided in SAS data steps.

In this simple example:
a <- data.frame(id=c(1:5),x=rnorm(5))
b <- data.frame(id=4,x=rnorm(1))
 > a
   id          x
1  1  0.6557921
2  2  0.1897523
3  3  0.7976721
4  4  0.2107103
5  5 -0.8855786
 > b
   id         x
1  4 0.8369147

I would like the "updated" dataframe to look like (row names are not  
important to me)

    id          x
1   1  0.6557921
2   2  0.1897523
3   3  0.7976721
4   4  0.8369147
5   5 -0.8855786

I thought this could be done with merge, but merge never removes the
old version of a row; it just gives me two rows with id==4.

I thought of this solution:

reject <- a$id %in% b$id
a2 <- a[!reject,]
a3 <- rbind(a2,b)
 > a3
    id          x
1   1  0.6557921
2   2  0.1897523
3   3  0.7976721
5   5 -0.8855786
11  4  0.8369147

This works, although it is obviously not the best way to make the
correction in a simple case like this. But providing a few lines of
corrected data can be an effective method with large dataframes,
especially if many identifier (grouping) variables are needed to
identify each line that needs updating. In that context my solution
above rapidly becomes ugly.
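
To make the multi-identifier case concrete, here is a minimal sketch of
the same reject-and-rbind idea with two identifier variables. The data
and the column names (site, year) are hypothetical and used only for
illustration. The identifier columns are pasted into a single composite
key so that %in% can still be used.

```r
# Hypothetical data: two identifier variables (site, year), one value (x)
a <- data.frame(site = rep(c("A", "B"), each = 3),
                year = rep(2001:2003, times = 2),
                x    = c(10, 11, 12, 20, 21, 22))
b <- data.frame(site = "B", year = 2002, x = 99)

keys <- c("site", "year")

# One composite key per row, built by pasting the identifier columns;
# the separator is assumed not to occur in the identifier values
key_a <- do.call(paste, c(a[keys], sep = "\r"))
key_b <- do.call(paste, c(b[keys], sep = "\r"))

# Drop the rows of a that are corrected in b, then append b
reject <- key_a %in% key_b
a3 <- rbind(a[!reject, ], b)

# Restore the ordering by the identifier columns and reset row names
a3 <- a3[do.call(order, unname(as.list(a3[keys]))), ]
rownames(a3) <- NULL
```

This still replaces whole rows, so b must carry every column of a.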

Furthermore (but I can live with this constraint) this method removes  
entire rows, so I need to make sure the dataframe used to make  
corrections contains all the Y variables in the original dataframe,  
even those that do not need correcting.

If a method exists to just change one variable in 5 lines for a  
dataframe of 5000 lines and 30 variables, I'd appreciate learning  
about it. But I'll already be thrilled if I can update whole lines at  
a time.
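
One possible sketch of the single-variable case, assuming a single id
column and using match() to locate the rows to correct. The extra
column y is hypothetical and is there only to show that the other
variables are left untouched, and the values are made up rather than
taken from the example above.

```r
# Hypothetical data: y stands in for the variables that need no correction
a <- data.frame(id = 1:5,
                x  = c(0.66, 0.19, 0.80, 0.21, -0.89),
                y  = letters[1:5])
b <- data.frame(id = 4, x = 0.84)

# For each row of b, the position of the matching id in a (NA if absent)
idx <- match(b$id, a$id)
ok  <- !is.na(idx)

# Overwrite only x, and only in the matched rows
a$x[idx[ok]] <- b$x[ok]
```

Here b only needs the identifier and the corrected variable. Ids in b
that do not occur in a are silently skipped rather than appended.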

Sincerely,

Denis Chabot


