[R] Trying to avoid the loop while merging two data frames

Dimitri Liakhovitski dimitri.liakhovitski at gmail.com
Tue Dec 22 18:27:57 CET 2015


Hello!
I have a loop-based solution for my task, but it is too slow for my
real-life problem, which is much larger in scope. I cannot use
merge() here. Any advice on how to do it faster?
Thanks a lot for any hint on how to speed it up!

# I have 'mydata' data frame:
set.seed(123)
mydata <- data.frame(myid = 1001:1100,
                     version = sample(1:20, 100, replace = TRUE))
head(mydata)
table(mydata$version)

# I have 'myinfo' data frame that contains information for each 'version':
set.seed(12)
myinfo <- data.frame(version = sort(rep(1:20, 30)),
                     a = rnorm(60), b = rnorm(60),
                     c = rnorm(60), d = rnorm(60))
head(myinfo, 40)

### MY SOLUTION WITH A LOOP:
### Looping through each id of mydata and grabbing
### all columns from 'myinfo' for the corresponding 'version':

# 1. Creating placeholder list for the results:
result <- split(mydata[c("myid", "version")], f = list(mydata$myid))
length(result)
(result)[1:3]


# 2. Looping through each element of 'result':
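for(i in seq_along(result)){
      id <- result[[i]]$myid
      # all rows of 'myinfo' that belong to this id's version:
      result[[i]] <- myinfo[myinfo$version == result[[i]]$version, ]
      result[[i]]$myid <- id
      # 'myid' first, followed by all columns of 'myinfo':
      result[[i]] <- result[[i]][c("myid", names(myinfo))]
}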
result <- do.call(rbind, result)
head(result) # This is the desired result
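
One possible loop-free route that does not call merge(), offered only as
a sketch: look up the row numbers of 'myinfo' for each version once, then
pull all the rows in a single indexing step. The names 'idx', 'rows' and
'result2' are made up for this illustration, and the data are rebuilt so
the block runs on its own. Whether it is fast enough on the real data is
untested.

```r
# Rebuild the example data so the sketch is self-contained
set.seed(123)
mydata <- data.frame(myid = 1001:1100,
                     version = sample(1:20, 100, replace = TRUE))
set.seed(12)
myinfo <- data.frame(version = sort(rep(1:20, 30)),
                     a = rnorm(60), b = rnorm(60),
                     c = rnorm(60), d = rnorm(60))

# Row numbers of 'myinfo' grouped by version (list names are "1".."20")
idx <- split(seq_len(nrow(myinfo)), myinfo$version)

# For every id (in 'mydata' order), the 'myinfo' rows of its version
rows <- idx[as.character(mydata$version)]

# Repeat each myid once per matching row and bind it to those rows
result2 <- data.frame(myid = rep(mydata$myid, lengths(rows)),
                      myinfo[unlist(rows, use.names = FALSE), ],
                      row.names = NULL)
head(result2)
```

The rows come out in the order of 'mydata', with 'myid' as the first
column; row names are reset to 1..n rather than the id-based names that
do.call(rbind, ...) produces.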

-- 
Dimitri Liakhovitski


