[R] Avoiding loops using 'for' and pairwise comparison of columns

Blaser Nello nblaser at ispm.unibe.ch
Mon Jun 24 12:18:00 CEST 2013


Here's a possible solution that avoids the loop:

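# all ordered pairs of column indices, one pair per row
k <- as.matrix(expand.grid(1:ncol(x),1:ncol(x)))
# agreement for each pair, reshaped into a square table
a1 <- as.data.frame(matrix(sapply(1:nrow(k), function(n)
    agree(x[,k[n,]])$value), nrow=ncol(x)))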
colnames(a1) <- colnames(x)
rownames(a1) <- colnames(x)

> identical(a, a1)
[1] TRUE
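
If only exact matches matter, the percentages can also be computed
without calling agree() for every pair. The following is a minimal
sketch, not part of the solution above: it assumes that agree() is used
with its default tolerance of 0 (so only identical values count as
agreement) and that x contains no missing values. The names m and pct
are made up for illustration.

```r
# Example data from the original post
c1 <- c(1,1,1,0.25,0,1,1,1,0,1)
c2 <- c(0,0,1,1,0,1,0,1,0.5,1)
c3 <- c(0,1,1,1,0,0.75,1,1,0.5,0)
x <- data.frame(c1, c2, c3)

m <- as.matrix(x)

# For each column j, compare it with every column at once:
# m == m[, j] recycles column j down each column of m, and
# colMeans() of the logical result is the share of matching rows.
pct <- sapply(seq_len(ncol(m)), function(j) colMeans(m == m[, j]) * 100)
dimnames(pct) <- list(colnames(x), colnames(x))
pct
```

This returns a plain numeric matrix rather than a data frame, so it
would need as.data.frame() before any comparison with a.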

Or, if you want to avoid calculating each pair twice:

a2 <- as.data.frame(matrix(0, nrow=ncol(x), ncol=ncol(x)))
colnames(a2) <- colnames(x)
rownames(a2) <- colnames(x)
# each unordered pair of columns once
k <- t(combn(1:ncol(x), 2))
a2[lower.tri(a2)] <- sapply(1:nrow(k), function(n)
    agree(x[,k[n,]])$value)
# a column always agrees 100% with itself
a2 <- a2+diag(100,ncol(x))
# mirror the lower triangle into the upper triangle
a2[upper.tri(a2)] <- t(a2)[upper.tri(a2)]

> identical(a, a2)
[1] TRUE
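
The lower-triangle fill relies on combn() listing the column pairs in
the same order in which lower.tri() picks out cells (column by column).
That assumption can be checked on its own; the sketch below uses a
made-up size n = 4, and the names idx and order_matches are
hypothetical.

```r
n <- 4                       # hypothetical number of columns
k <- t(combn(n, 2))          # one pair per row: (smaller, larger)

# Positions of the lower-triangle cells, in the column-major order
# in which an assignment to a2[lower.tri(a2)] fills them
idx <- which(lower.tri(matrix(0, n, n)), arr.ind = TRUE)

# Pair (i, j) from combn() should land in row j, column i
order_matches <- all(idx[, "col"] == k[, 1], idx[, "row"] == k[, 2])
order_matches
```

If order_matches is TRUE, the n-th value computed from k goes into the
cell that belongs to that pair.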

Best, 
Nello

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Kulupp
Sent: Monday, 24 June 2013 11:02
To: r-help at r-project.org
Subject: [R] Avoiding loops using 'for' and pairwise comparison of
columns

Dear R-experts,

I'd like to avoid the use of very slow 'for'-loops but I don't know how.

My data look as follows (the original data has 1600 rows and 30
columns):

# data example
c1 <- c(1,1,1,0.25,0,1,1,1,0,1)
c2 <- c(0,0,1,1,0,1,0,1,0.5,1)
c3 <- c(0,1,1,1,0,0.75,1,1,0.5,0)
x <- data.frame(c1,c2,c3)

I need to compare every column with every other column and want to know
the percentage of similar values for each column pair. To calculate the
percentage of similar values, I used the function 'agree' from the
irr package. I solved the problem with a loop, but it is very slow.

library(irr)     # required for the function 'agree'

# empty data frame for the results
a <- as.data.frame(matrix(data=NA, nrow=ncol(x), ncol=ncol(x)))
colnames(a) <- colnames(x)
rownames(a) <- colnames(x)

# the loop to write the data
for (j in 1:ncol(x)){
   for (i in 1:ncol(x)){
     a[i,j] <- agree(cbind(x[,j], x[,i]))$value
   }
}


I would be very pleased to receive your suggestions on how to avoid the
loop. Furthermore, the resulting data frame could be displayed as a
triangular matrix without duplicates of each pairwise comparison, but I
don't know how to do this.
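
One way to show each pair only once is to blank the upper triangle
before printing. A minimal sketch, using a small symmetric matrix whose
values are placeholders filled in by hand for illustration; the names m
and tri are made up.

```r
# Placeholder symmetric matrix standing in for the result table
m <- matrix(c(100,  50,  50,
               50, 100,  60,
               50,  60, 100),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("c1", "c2", "c3"), c("c1", "c2", "c3")))

# Keep the diagonal and lower triangle, drop the mirrored upper triangle
tri <- m
tri[upper.tri(tri)] <- NA

# na.print = "" shows the blanked cells as empty space
print(tri, na.print = "")

# as.dist() is another option: it keeps only the lower triangle,
# without the diagonal
as.dist(m)
```

The same upper.tri() assignment should also work on a data frame such
as a, although the na.print argument is shown here only for a matrix.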

Kind regards

Thomas



