[R] Quickest way to match two vectors besides %in%?

Duncan Murdoch murdoch at stats.uwo.ca
Tue Nov 8 21:00:16 CET 2005


On 11/8/2005 2:28 PM, Pete Cap wrote:
> Hello list,
> 
> I have two data frames, X (48469,2) and Y (79771,5).
> 
> X[,1] contains distinct values of Y[,2].
> I want to match values in X[,1] and Y[,2], then take
> the corresponding value in X[,2] and place it in
> Y[,4].
> 
> So far I have been doing it like so:
> for(i in 1:48469) {
> Y[which(X[i,1] == Y[,2]), 4] <- X[i,2]
> }
> 
> But it chunks along so very slowly that I can't help
> but wonder if there's a faster way, mainly because on
> my box it takes R about 30 seconds to simply COUNT to
> 48,469 in the for loop.
> 
> I have already tried using %in%.  It tells me if the
> values in X[,1] are IN Y[,2], which is useful in
> removing unnecessary values from X[,1].  But it does
> not tell me exactly where they match.  which(X[,1]
> %in% Y[,2]) does, but it only matches on the first
> instance.
> 
> This is the slowest part of the script I'm working
> on--if I could improve it I could shave off some
> serious operating time.  Any pointers?
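For reference: `%in%` is defined in terms of `match()`, and `match()` returns exactly the positions asked about here. For each element of its first argument it gives the index of the first occurrence in the second argument, or NA if there is none. Because X[,1] holds distinct values, matching Y[,2] against X[,1] finds the single X row for each Y row, so one vectorised assignment can replace the loop. This is a minimal sketch on hypothetical toy data; the column names (`key`, `val`, `target`, etc.) are illustrative assumptions standing in for the real X and Y, and this is an alternative to the merge() approach in the reply, not the method it proposes.

```r
# Hypothetical toy data: X[,1] = key (distinct), X[,2] = val;
# Y[,2] = key, Y[,4] = target (the column to fill in).
X <- data.frame(key = c("a", "b", "c"), val = c(10, 20, 30),
                stringsAsFactors = FALSE)
Y <- data.frame(id = 1:5,
                key = c("b", "z", "a", "b", "c"),
                other = letters[1:5],
                target = NA_real_,
                extra = 5:1,
                stringsAsFactors = FALSE)

# For each Y$key, the row of X where it occurs (NA if absent).
# X$key is distinct, so the first match is the only match.
idx <- match(Y$key, X$key)
hit <- !is.na(idx)

# A single vectorised assignment instead of one loop pass per X row.
Y$target[hit] <- X$val[idx[hit]]
Y$target
# [1] 20 NA 10 20 30
```

With positional indexing as in the question, the last assignment would read `Y[hit, 4] <- X[idx[hit], 2]`.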

Look at the merge() function to combine the X and Y columns in a new
data frame, then process that to merge the X[,2] and Y[,4] values.

It will be something like

Z <- merge(X, Y, by.x=1, by.y=2, all.y=TRUE)

changes <- !is.na(Z[,2])
Z[changes,5] <- Z[changes,2]

but you are almost certainly better off (from a maintenance point of
view) using the names of the columns rather than guessing at column
numbers.
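As a runnable sketch of the merge() route with named columns, here it is on hypothetical toy data; the names `key`, `val`, `target`, etc. are illustrative assumptions, not the poster's real columns. merge() returns its rows sorted by the key, so the sketch carries an `id` column from Y to restore the original row order afterwards.

```r
# Hypothetical toy data laid out like the question: X is (key, val),
# Y has the key in column 2 and the column to fill in column 4.
X <- data.frame(key = c("a", "b", "c"), val = c(10, 20, 30),
                stringsAsFactors = FALSE)
Y <- data.frame(id = 1:5,
                key = c("b", "z", "a", "b", "c"),
                other = letters[1:5],
                target = NA_real_,
                extra = 5:1,
                stringsAsFactors = FALSE)

# all.y = TRUE keeps every row of Y; val is NA where the key has no
# partner in X.
Z <- merge(X, Y, by = "key", all.y = TRUE)

# Copy X's value into target wherever a match was found.
changes <- !is.na(Z$val)
Z$target[changes] <- Z$val[changes]

# merge() sorts rows by key; put them back in Y's original order.
Z <- Z[order(Z$id), ]
Z$target
# [1] 20 NA 10 20 30
```

In the merged result the key column comes first, then the remaining X columns, then the remaining Y columns, so `val` is Z[,2] and `target` is Z[,5]; that is why the positional version copies column 2 into column 5.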

Duncan Murdoch
