[R] Aggregate weights for a unique set of rows

Rui Barradas ruipbarradas at sapo.pt
Fri Jun 29 02:04:00 CEST 2012


Hello,

Would around two orders of magnitude interess?

f1 <- function(Nodes, Weights){
	drop.index <- duplicated(Nodes)
	n.unique <- Nodes[!drop.index, ]
	w.unique <- numeric(length(n.unique[,1]))

	lw <- length(Weights)
	for (i in seq_along(w.unique)){
		index <- as.logical(2 == rowSums(
			Nodes == matrix(rep(n.unique[i,],lw), byrow=TRUE, nrow=lw)))
		w.unique[i] <- sum(Weights[index])
	}
	list(n.unique=n.unique, w.unique=w.unique)
}

f2 <- function(Nodes, Weights){
	rows <- paste(Nodes[,1], Nodes[,2], sep=".")
	w.uniq <- tapply(Weights, rows, sum)
	attributes(w.uniq) <- NULL
	ord <- order(unique(rows))
	list(n.unique=unique(Nodes), w.unique=w.uniq[order(ord)])
}

# Test it
M <- 100  # see text below
n <- 2e5

set.seed(1234)
nd <- matrix(sample(M, n*2, TRUE), n, 2)
ww <- rep(1, n)

t1 <- system.time(r1 <- f1(nd, ww))
t2 <- system.time(r2 <- f2(nd, ww))
identical(r1, r2)

print(rbind(t1=t1, t2=t2, ratio=t1/t2), digits=3)
       user.self sys.self elapsed user.child sys.child
t1       310.41       67  379.07         NA        NA
t2         5.59        0    5.62         NA        NA
ratio     55.53      Inf   67.45         NA        NA


With bigger M the number of pairwise combinations increases and so does 
the number of unique rows. This causes the time taken by f1 to really 
increase, but f2 scales up rather slowly. The ratio above becomes really 
better and better.

Hope this helps,

Rui Barradas


Em 28-06-2012 14:06, Weiser, Constantin escreveu:
> Hi, all together. I have - a maybe trivial - problem with aggregating a
> list of weights.
>
> Here is the problem:
>   - At first I have set of nodes (X/Y-coordinates) and associated weights,
> where the set
>     of nodes is typically not unique
>   - I want to get a set of unique nodes and the sum of associated weights
>
> I am grateful for any help
>
>
> See for example:
>
> 	# weights:
> 	w <- c(1, 1, 1, 1, 1)
>
> 	# not unique set of nodes (X/Y-coordinates):
> 	nodes <- matrix(c(1,2,3,4,5,6,1,2,1,4),ncol=2, byrow=T)
>
>
> desired Result:
>
> 	#nodes
> 	    [,1] [,2]
> 	[1,]    1    2
> 	[2,]    3    4
> 	[3,]    5    6
> 	[4,]    1    4
>
>
>
> 	#weights
> 	2 1 1 1
>
>
>
> That is my solution, but it is very slow (typical size of nodes -->
> 200000x2):
>
> weights <- c(1, 1, 1, 1, 1)
> nodes <- matrix(c(1,2,3,4,5,6,1,2,1,4),ncol=2, byrow=T)
>
>
> ## to be replaced by a faster code
> 	drop.index <- duplicated(nodes)
> 	n.unique <- nodes[!drop.index, ]
> 	w.unique <- numeric(length(n.unique[,1]))
>
> 	lw <- length(weights)
> 	for (i in seq_along(w.unique)){
> 		index <- as.logical(2==rowSums(nodes==matrix(rep(n.unique[i,],lw),byrow
> = TRUE, nrow=lw)))
> 		w.unique[i] <- sum(weights[index])
> 	}
> ##
>
> n.unique
> w.unique
>
>
>
>
>
> ^
> |                X
> |               /
> |              /eiser, Constantin
> |             / Gutenberg University of Mainz, Germany
> | *    /\    /  Chair of Statistics & Econometrics
> |  \  /  \  /   Jakob-Welder-Weg 4, 55128 Mainz
> |   \/    \/    House of Law and Economics II, Room 00-116
> |               Tel: 0049 6131 39 22715
> +--------------------------------------------------------->
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



More information about the R-help mailing list