[R] lookups and joins

roger bos roger.bos at gmail.com
Tue Jan 25 14:57:52 CET 2005


Paul,

You don't want to write you own function.  merge() will do that for
you very quickly and efficiently.  Just to elaborate on Mike's reply,
here is an example of how to use merge:

> test <- merge(out, trt1m, by="gvkey")
> names(out)
[1] "gvkey"    "datadate" "Price"    "FV"       "ER"       "Rank"    
> names(trt1m)
[1] "gvkey" "trt1m"
> names(test)
[1] "gvkey"    "datadate" "Price"    "FV"       "ER"       "Rank"     "trt1m"   
> 
"gvkey" is the column that is joined on and the new data frame (test)
has all the columns of both data frames.  Now you can subscript as
usual to get just the data you need.  Read ?merge to see what
all.x=TRUE does in case you need to use that.

Thanks,

Roger

On Mon, 24 Jan 2005 22:41:54 -0000, michael watson (IAH-C)
<michael.watson at bbsrc.ac.uk> wrote:
> All together now:
> 
> ?merge
> 
> :-)
> 
> 
> -----Original Message-----
> From:   r-help-bounces at stat.math.ethz.ch on behalf of Paul Sorenson
> Sent:   Mon 1/24/2005 10:34 PM
> To:     r-help at stat.math.ethz.ch
> Cc:
> Subject:        [R] lookups and joins
> I have some data coming from SQL sources that I wish to relate in various ways.  For reasons only known to our IT people, this can't be done in SQL at present.
> 
> I am looking for an R'ish technique for looking up new columns on a data frame.  As a simple, hardwired example I have tried the following:
> 
> # This gives me two columns, one the lookup value and the second one
> # the result column, ie my lookup table.
> stcl = read.csv("stockclass.csv")
> stockclass = as.vector(stcl$stock_class)
> # This gives me what appears to be a dictionary or map
> names(stockclass) = as.vector(stcl$stock_group)
> 
> getstockclass = function(stock_group) {
>        try(stockclass[[stock_group]], TRUE)
> }
> csg$stk_class=factor(sapply(csg$stock_group, getstockclass))
> 
> I need the try since if there is a missing value I get an exception.
> 
> I also tried something along the lines of (from memory):
> getstockclass = function(stock_group) {
>        stcl[which(stcl$stock_group == stock_group),]$stock_class
> }
> 
> These work but I just wanted to check if there was an inbuilt way to do this kind of thing in R?  I searched on "join" without much luck.
> 
> Really what I would like is a generic function that:
>        - Takes 2 data frames,
>        - Some kind of specification on which column(s) to join
>        - Outputs the joined frames, or perhaps a vector which is an index vector that I can use on the second data frame.
> 
> I don't really want to reinvent SQL and my data sets are not huge.
> 
> cheers
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>




More information about the R-help mailing list