[R] How to improve this code?
Gabor Grothendieck
ggrothendieck at myway.com
Mon Apr 5 07:32:07 CEST 2004
If I understand correctly, storelist and customerlist are two-column matrices
of longitude and latitude (rdist.earth expects longitude in the first column)
and you want all store/customer pairs less than a certain distance apart,
sorted by store and distance.
dd is the distance matrix of all pairs, with one row per store and one column
per customer. We turn it into a data frame of row numbers (i.e. store
numbers), column numbers (i.e. customer numbers) and distances, subset it to
the pairs within range and then sort it. Then we tapply seq to each group of
rows from the same store to get ranks within stores.
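As a small illustration of these steps, here is a sketch on a made-up 2 x 3
distance matrix (the numbers are invented for the example, not taken from your
data), using base R only:

```r
# Made-up distances in km: 2 stores (rows) x 3 customers (columns)
dd <- matrix(c(120, 80,  10,
                30, 95, 200), nrow = 2, byrow = TRUE)
maxd <- 100

# row(dd) and col(dd) give the row/column number of every cell;
# c() flattens all three column by column, so they stay aligned
flat <- data.frame(store = c(row(dd)), cust = c(col(dd)), dist = c(dd))

# keep the pairs within range, then sort by store and distance
out <- flat[flat$dist < maxd, ]
out <- out[order(out$store, out$dist), ]

# rank within store: 1, 2, ... for each block of rows from the same store
rk <- unlist(tapply(out$store, out$store, seq_along), use.names = FALSE)
out <- cbind(rank = rk, out)

# store 1: customer 3 (10 km), then customer 2 (80 km)
# store 2: customer 1 (30 km), then customer 2 (95 km)
print(out)
```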
Note that this forms some very large objects if your data are large: with the
100 stores and 10000 customers in your example, dd alone has one million
entries.
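library(fields)
maxd <- 100  # maximum distance, in km
# distance matrix in km: rows are stores, columns are customers
dd <- rdist.earth( storelist, customerlist, miles = FALSE )
# one row per (store, customer) pair, keeping only pairs closer than maxd
out <- data.frame( store=c(row(dd)), cust=c(col(dd)), dist=c(dd) )[c(dd)<maxd,]
out <- out[ order( out$store, out$dist ),]
# rank within each store; out is already sorted by store, then distance
rk <- unlist( tapply( out$store, out$store, seq_along ), use.names = FALSE )
out <- cbind( rank=rk, out )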
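Your files are ID, LONGITUDE, LATITUDE rather than plain coordinate matrices,
so here is a rough sketch of how the same idea might be adapted: pass columns
2:3 to the distance function and use the row/column numbers to look up the
IDs. Stores with no customers in range simply contribute no rows, and nothing
is written if there are no rows at all. The tiny stores/custs matrices and the
temporary output file are made up for the example, and gc_km is a hypothetical
base-R stand-in (haversine, Earth radius assumed to be 6371 km) so that the
sketch runs without fields; with fields loaded you would presumably call
rdist.earth(stores[, 2:3], custs[, 2:3], miles = FALSE) instead.

```r
# Hypothetical stand-in for rdist.earth(x1, x2, miles = FALSE): haversine
# distance in km between all rows of x1 and x2 (col 1 = lon, col 2 = lat)
gc_km <- function(x1, x2, R = 6371) {
  rad <- pi / 180
  lon1 <- x1[, 1] * rad; lat1 <- x1[, 2] * rad
  lon2 <- x2[, 1] * rad; lat2 <- x2[, 2] * rad
  h <- outer(lat1, lat2, function(a, b) sin((b - a) / 2)^2) +
    outer(cos(lat1), cos(lat2)) *
    outer(lon1, lon2, function(a, b) sin((b - a) / 2)^2)
  2 * R * asin(pmin(1, sqrt(h)))
}

# Made-up input in the ID, LONGITUDE, LATITUDE layout
stores <- cbind(id = c(101, 102), lon = c(-60, -70), lat = c(50, 40))
custs  <- cbind(id = c(9001, 9002, 9003),
                lon = c(-60.5, -59.8, -20), lat = c(50.2, 49.9, 10))
maxd <- 100                  # trade area size in km
outfile <- tempfile(fileext = ".txt")

dd <- gc_km(stores[, 2:3, drop = FALSE], custs[, 2:3, drop = FALSE])

# translate row/column numbers into store and customer IDs
out <- data.frame(storeid  = stores[c(row(dd)), 1],
                  custid   = custs[c(col(dd)), 1],
                  distance = c(dd))[c(dd) <= maxd, ]
out <- out[order(out$storeid, out$distance), ]

# only rank and write when at least one pair is within range
if (nrow(out) > 0) {
  rk <- unlist(tapply(out$storeid, out$storeid, seq_along), use.names = FALSE)
  out <- cbind(rank = rk, out)
  write.table(out, file = outfile, sep = ",", row.names = FALSE, quote = FALSE)
}
```

Here store 102 has no customers within 100 km, so it does not appear in the
output at all.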
Danny Heuman <dsheuman <at> rogers.com> writes:
:
: Hi all,
:
: I've got some functioning code that I've literally taken hours to
: write. My 'R' coding is getting better...it used to take days :)
:
: I know I've done a poor job of optimizing the code. In addition, I'm
: missing an important step and don't know where to put it.
:
: So, three questions:
:
: 1) I'd like the resulting output to be sorted on distance (ascending)
: and to have the 'rank' column represent the sort order, so that rank 1
: is the first customer and rank 10 is the 10th. Where do I do this?
:
: 2) Can someone suggest ways of 'optimizing' or improving the code?
: It's the only way I'm going to learn better ways of approaching R.
:
: 3) If there are no customers in the store's Trade Area, I'd like the
: output file to have nothing written to it. How can I do that?
:
: All help is appreciated.
:
: Thanks,
:
: Danny
:
:
: *********************************************************
: library(fields)
:
: #Format of input files: ID, LONGITUDE, LATITUDE
:
: #Generate Store List
: storelist <- cbind(1:100, matrix(rnorm(100, mean = -60, sd = 3), ncol = 1),
: matrix(rnorm(100, mean = 50, sd = 3), ncol = 1))
:
: #Generate Customer List
: customerlist <- cbind(1:10000,
: matrix(rnorm(10000, mean = -60, sd = 20), ncol = 1),
: matrix(rnorm(10000, mean = 50, sd = 10), ncol = 1))
:
: #Output file
: outfile <- "c:\\output.txt"
: outfilecolnames <- c("rank","storeid","custid","distance")
: write.table(t(outfilecolnames), file = outfile, append=TRUE,
: sep=",",row.names=FALSE, col.names=FALSE)
:
: #Trade Area Size
: TAsize <- c(100)
:
: custlatlon <- customerlist[, 2:3]
:
: for(i in 1:length(TAsize)){
:   for(j in 1:nrow(storelist)){
:     cat("Store: ", storelist[j]," TA Size = ", TAsize[i], "\n")
:
:     storelatlon <- storelist[j, 2:3]
:
:     whichval <- which(rdist.earth(t(as.matrix(storelatlon)),
:       as.matrix(custlatlon), miles=F) <= TAsize[i])
:
:     dist <- as.data.frame(rdist.earth(t(as.matrix(storelatlon)),
:       as.matrix(custlatlon), miles=F)[whichval])
:
:     storetag <- as.data.frame(cbind(1:nrow(dist),storelist[j,1]))
:     fincalc <- as.data.frame(cbind(1:nrow(dist),(customerlist[whichval,1]),
:       rdist.earth(t(as.matrix(storelatlon)),
:       as.matrix(custlatlon), miles=F)[whichval]))
:
:     combinedata <- data.frame(storetag, fincalc)
:
:     combinefinal <- subset(combinedata, select= c(-1,-3))
:
:     flush.console()
:
:     write.table(combinefinal, file = outfile, append=TRUE,
:       sep=",", col.names=FALSE)
:   }
:
: }
: