[R] faster row by row data frame processing

Mon Dec 20 20:37:51 CET 2004

Something like this perhaps?

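# Example data: 100 rows, 10 columns
x <- matrix(rnorm(1000),ncol=10)
# Within-row ranks of the absolute values (1 = smallest); apply() returns
# one column per row of x, hence the t()
y <- t(apply(abs(x),1,rank,ties.method="first"))

# Ranks above thresh mark the top values: here the 2 largest of 10 per row
thresh <- 8
x[y>thresh] <- sign(x[y>thresh])
x[y<=thresh] <- 0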
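
To apply the same idea to a data frame laid out as in the question (a date column followed by several numeric columns, possibly with NAs), something along these lines might work. It is only a sketch: top_n_signs and the toy data frame all_df are made-up names, and it assumes that NAs never count among the top values and should end up as 0.

```r
# Hypothetical helper: +/-1 for the `top` largest absolute values in each row,
# 0 elsewhere. Assumes column 1 is the date and at least two numeric columns follow.
top_n_signs <- function(df, top) {
  m <- as.matrix(df[, -1, drop = FALSE])
  # Rank in descending order of absolute value within each row (1 = largest);
  # NAs keep an NA rank so they can never be selected
  r <- t(apply(-abs(m), 1, rank, ties.method = "first", na.last = "keep"))
  out <- sign(m)
  out[is.na(r) | r > top] <- 0
  # Put the date column back in front
  data.frame(df[1], out)
}

# Toy data standing in for the real data frame (made up for illustration)
set.seed(1)
all_df <- data.frame(date = as.Date("2004-12-01") + 0:4,
                     matrix(rnorm(30), ncol = 6))
all_df[2, 3] <- NA
res <- top_n_signs(all_df, top = 2)
print(res)
```

Working on a matrix and converting back to a data frame once at the end avoids indexing the data frame row by row.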

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch 
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of bogdan romocea
> Sent: Monday, December 20, 2004 1:52 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] faster row by row data frame processing
> 
> 
> Dear R users,
> 
> I have a data frame with a few thousand rows and several 
> hundred numeric columns (plus a date column). For each row 
> (day), I want to assign +/- 1 to the highest X absolute 
> values, 0 to the other values, and save all that in a 
> separate data frame. 
> 
> I have a working solution (below), however I find it rather 
> slow. Is there something I could do to increase the speed? 
> (The code is CPU-bound; Pentium 4 @ 2.4 GHz, 512 MB RAM, Win 
> XP, R 2.0.0.)
> 
> Thank you,
> b.
> 
> 
> #all is the original data frame (date + a number of columns) 
> #set up the output data frame
> DailyTopN <- data.frame(all[1,1],matrix(ncol=ncol(all)-1))
> names(DailyTopN) <- names(all)
> top <- 20
> for (i in 1:1000)	#the rows to be processed
> 	{
> 	#data frame row as vector
> 	onerow <- na.omit(as.matrix(all[i,][2:ncol(all)])[1, ])
> 	#select the 'top' highest absolute values
> 	#rank on -abs() so that rank 1 is the largest absolute value
> 	r <- rank(-abs(onerow),ties.method="random")
> 	selected <- names(r[which(r <= top)])
> 	#set +/-1 for the highest absolute values, 0 for the others
> 	DailyTopN[i,selected] <- 1 * sign(all[i,selected])
> 	DailyTopN[i,1] <- all[i,1]	#add the date
> 	}
> DailyTopN[is.na(DailyTopN)] <- 0
> rownames(DailyTopN) <- 1:nrow(DailyTopN)
> 
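
Much of the time in a loop like the quoted one probably goes into the repeated row-wise indexing of, and assignment into, a data frame. A rough way to check this on your own machine is to time a loop of that kind against a matrix-based version on synthetic data and confirm that both give the same result. The sketch below does that; loop_version, apply_version and the data sizes are made up for illustration, and the timings will differ from machine to machine.

```r
# Synthetic stand-in for the real data (sizes are assumptions, kept small)
set.seed(42)
n_row <- 200; n_col <- 50; top <- 5
m <- matrix(rnorm(n_row * n_col), nrow = n_row)
df <- as.data.frame(m)

# Row-by-row version that reads from and writes to data frames
loop_version <- function(df, top) {
  out <- as.data.frame(matrix(0, nrow(df), ncol(df),
                              dimnames = list(NULL, names(df))))
  for (i in seq_len(nrow(df))) {
    row <- unlist(df[i, ])
    # Positions of the `top` largest absolute values in this row
    idx <- order(abs(row), decreasing = TRUE)[seq_len(top)]
    out[i, idx] <- sign(row[idx])
  }
  as.matrix(out)
}

# Matrix version using apply() and rank()
apply_version <- function(m, top) {
  r <- t(apply(-abs(m), 1, rank, ties.method = "first"))
  out <- sign(m)
  out[r > top] <- 0
  out
}

t_loop <- system.time(a <- loop_version(df, top))
t_apply <- system.time(b <- apply_version(m, top))

# Both versions should mark exactly the same cells
stopifnot(all(unname(a) == b))
print(t_loop)
print(t_apply)
```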