[R] help speeding up simple Theil regression function

Brad Schneid bps0002 at auburn.edu
Sun Oct 21 21:05:49 CEST 2012


Wow.  Thank you greatly, that is amazing.  

 Thiel statistic ==> (Pedantic comment: it is Theil (swap the i and e)
Yes sir; I do that every time. 
Dyslexia perhaps?

Thanks again. 



Berend Hasselman wrote
> On 21-10-2012, at 20:06, Brad Schneid wrote:
> 
>> Hello,
>> 
>> I am working on a simple non-parametric (Theil) regression function and
>> and
>> am following Hollander and Wolfe 1999 text.  I would like some help
>> making
>> my function faster.  I have compared with pre-packaged version from
>> "MBLM",
>> which isnt very fast either, but it appears mine is faster with N = 1000
>> (see results below).  I plan on running this function repeatedly, and I
>> generally have data lengths of ~ N = 6000 or more.  
>> 
>> # My function following Hollander and Wolfe text, Chapter 9
>> np.lm <-function(dat, X, Y, ...){
>> 	# Ch 9.2: Slope est. (X) for Thiel statistic
>> 	combos <- combn(nrow(dat), 2)
>> 	i.s <- combos[1,] 
>> 	j.s <- combos[2,] 
>> 	num <- vector("list", length=length(i.s))
>> 	dom <- vector("list", length=length(i.s))
>> 
>> 		for(i in 1:length(i.s)){
>> 			num[[i]]  <- dat[j.s[i],Y] - dat[i.s[i],Y]    
>> 			dom[[i]]  <- dat[j.s[i],X] - dat[i.s[i],X]    
>> 	        	 	}	
>> 	
>> 	X <- median( sort( do.call(c, num) / do.call(c, dom) ) )
>> 	# Ch 9.4: Intercept est. for Thiel statistic
>> 	Intercept <- median(dat[,"Y"] - X*dat[,"X"])
>> 	out <- data.frame(Intercept, X)
>> 	return(out)
>> 		}   # usage: np.lm(dat, X=1, Y=2)
>> ################################################################
>> 
>> library("mblm") # I will compare to mblm() function
>> 
>> X <- rnorm(1000)
>> Y <- rnorm(1000)
>> dat <- data.frame(X, Y)
>> 
>> system.time(np.lm(dat, X=1, Y=2) )
>>   user  system elapsed 
>> 118.610   0.130 119.144 
>> 109.000   0.040 109.416 # ran it twice
>> 86.190   0.100  86.589 # 3rd time
> 
> Alternative function without your i loop (it isn't needed and can be
> vectorized):
> 
> np.lm.alt <-function(dat, X, Y, ...){
> 	# Ch 9.2: Slope est. (X) for Thiel statistic ==> (Pedantic comment: it is
> Theil (swap the i and e)
> 	combos <- combn(nrow(dat), 2)
> 	i.s <- combos[1,] 
> 	j.s <- combos[2,] 
> 	
> 	Y.num <- dat[j.s,Y] - dat[i.s,Y]
> 	X.dom <- dat[j.s,X] - dat[i.s,X]
> 	X <- median( Y.num / X.dom)
> 	# Ch 9.4: Intercept est. for Thiel statistic ==> (Pedantic comment: it is
> Theil (swap the i and e)
> 	Intercept <- median(dat[,"Y"] - X*dat[,"X"])
> 	out <- data.frame(Intercept, X)
> 	return(out)
> }   # usage: np.lm(dat, X=1, Y=2)
> 
> 
> Try the compiler package on you original function:
> 
> library(compiler)
> np.lm.c <- cmpfun(np.lm)
> 
> Test speed and correct results:
> 
> X <- rnorm(500)
> Y <- rnorm(500)
> dat <- data.frame(X, Y)
> 
> system.time(npout.c <- np.lm.c(dat, X=1, Y=2) )
> system.time(npout.1 <- np.lm(dat, X=1, Y=2) )
> system.time(npout.a <- np.lm.alt(dat, X=1, Y=2) )
> identical(npout.1,npout.c)      
> identical(npout.1,npout.a)
> 
> Results:
> 
>> system.time(npout.c <- np.lm.c(dat, X=1, Y=2) )
>    user  system elapsed 
>  21.442   0.066  21.517 
>> system.time(npout.1 <- np.lm(dat, X=1, Y=2) )
>    user  system elapsed 
>  21.068   0.073  21.161 
>> system.time(npout.a <- np.lm.alt(dat, X=1, Y=2) )
>    user  system elapsed 
>   0.303   0.010   0.313 
>> identical(npout.1,npout.c)      
> [1] TRUE
>> identical(npout.1,npout.a)
> [1] TRUE
> 
> You may try and test this with larger data lengths.
> 
> 
> Berend
> 
> ______________________________________________

> R-help@

>  mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.





--
View this message in context: http://r.789695.n4.nabble.com/help-speeding-up-simple-Theil-regression-function-tp4646923p4646933.html
Sent from the R help mailing list archive at Nabble.com.




More information about the R-help mailing list