[R] function optimization: reducing the computing time

Matthieu Dubois Matthieu.Dubois at psp.ucl.ac.be
Tue Jul 24 09:24:39 CEST 2007


Dear useRs,

I have written a function that implements a Bayesian method to  
compare a patient's score on two tasks with that of a small control  
group, as described in Crawford, J. and Garthwaite, P. (2007).  
Comparison of a single case to a control or normative sample in  
neuropsychology: Development of a Bayesian approach. Cognitive  
Neuropsychology, 24(4):343–372.

The function (see below) returns the expected results, but the  
computation time is quite long (at least for a test that may be  
routinely used). There is certainly room for improvement. It would  
really be helpful if some of you experts could have a look ...

Thanks a lot.
Regards,

Matthieu


FUNCTION
----------
The function takes the performance on two tasks and estimates the  
rarity (the p-value) of the difference between the patient's two  
scores, in comparison to the difference in the control subjects. A  
standardized and an unstandardized version are provided (controlled  
by the parameter standardized: T vs. F). Also, for consistency with  
the original publication, either the raw data or summary statistics  
can be used for the control group.

##################################################
# Bayesian (un)Standardized Difference Test
##################################################

#from Crawford and Garthwaite (2007) Cognitive Neuropsychology
# implemented by Matthieu Dubois, Matthieu.Dubois<at>psp.ucl.ac.be

#PACKAGE MCMCpack REQUIRED

# patient: a vector with the two scores
# controls: matrix/data.frame with the raw scores (one column per task)
# mean.c, sd.c, r, n: possibility to enter summary statistics
#   (mean, standard deviation, correlation, group size)
# n.simul: number of simulations
# two.sided (Boolean): two-sided (T) vs. one-sided (F) Bayesian
#   credible interval
# standardized (Boolean): standardized (T) vs. unstandardized (F) test
# values are: $p.value (one-tailed), $confidence.interval
crawford.BSDT <- function(patient, controls, mean.c=0, sd.c=0, r=0,
                          n=0, na.rm=F, n.simul=100000, two.sided=T,
                          standardized=T)
{
	library(MCMCpack)
	
	#if no summaries are entered, they are computed
	if(missing(n))
	{	
		if(!is.data.frame(controls)) controls <- as.data.frame(controls)
		n <- dim(controls)[1]
		mean.c <- sapply(controls, mean, na.rm=na.rm)
		sd.c <- sapply(controls, sd, na.rm=na.rm)
		
		na.method <- ifelse(na.rm,"complete.obs","all.obs")
		
		r <- cor(controls[,1], controls[,2], use=na.method)
	}
	
	#variance/covariance matrix
	s.xx <- (sd.c[1]^2) * (n-1)
	s.yy <- (sd.c[2]^2) * (n-1)
	s.xy <- sd.c[1] * sd.c[2] * r * (n-1)
	
	A <- matrix(c(s.xx, s.xy, s.xy, s.yy), ncol=2)
	
	#estimation function
	if(standardized)
	{
		estimation <- function(patient, mean.c, n, A)
		{
			#estimation of a variance/covariance matrix (sigma)
			sigma = riwish(n,A)	#random obs. from an inverse-Wishart distribution
	
			#estimation of the means (mu)
			z <- rnorm(2)
			T <- t(chol(sigma)) #Cholesky decomposition
			mu <- mean.c + T %*% z/sqrt(n)
	
			#standardization
			z.x <- (patient[1]-mu[1]) / sqrt(sigma[1,1])
			z.y <- (patient[2]-mu[2]) / sqrt(sigma[2,2])
			rho.xy <- sigma[1,2] / sqrt(sigma[1,1]*sigma[2,2])
		
			z.star <- (z.x - z.y) / sqrt(2-2*rho.xy)
		
			#conditional p-value
			p <- pnorm(z.star)
			p
		}
	}
	else
	{
		estimation <- function(patient, mean.c, n, A)
		{
			#estimation of a variance/covariance matrix (sigma)
			sigma = riwish(n,A)	#random obs. from an inverse-Wishart distribution
	
			#estimation of the means (mu)
			z <- rnorm(2)
			T <- t(chol(sigma)) #Cholesky decomposition
			mu <- mean.c + T %*% z/sqrt(n)
	
			num <- (patient[1]-mu[1]) - (patient[2] - mu[2])
			denom <- sqrt(sigma[1,1]+sigma[2,2]-(2*sigma[1,2]))
			
			z.star <- num/denom
		
			#conditional p-value
			p <- pnorm(z.star)
			p
		}
	}
	
	#application
	p <- replicate(n.simul, estimation(patient, mean.c, n, A))
		
	#outputs
	pval <- mean(p)
	CI <- if(two.sided) 100*quantile(p, c(0.025,0.975)) else 100*quantile(p, 0.95)
	output <- list(p.value=pval, confidence.interval=CI)
	output	
}
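
For reference, the simulation loop can in principle be vectorized: for the
bivariate case, the inverse-Wishart draws can be generated all at once via the
Bartlett decomposition of the Wishart distribution, and the 2x2 inverses and
Cholesky factors written in closed form. The sketch below (my own illustrative
code, not a tested drop-in replacement; the name crawford.BSDT.fast is made up)
covers only the unstandardized test with summary-statistics input, and assumes
that MCMCpack's riwish(v, S) is equivalent to solve(rwish(v, solve(S))), so
that sigma below follows the same inverse-Wishart as in the function above.

```r
crawford.BSDT.fast <- function(patient, mean.c, sd.c, r, n, n.simul=100000)
{
	# sums-of-squares matrix A, as in crawford.BSDT
	s.xy <- sd.c[1] * sd.c[2] * r * (n-1)
	A <- matrix(c(sd.c[1]^2 * (n-1), s.xy, s.xy, sd.c[2]^2 * (n-1)), ncol=2)

	# W ~ Wishart(n, solve(A)), so sigma = solve(W) ~ inverse-Wishart(n, A).
	# Bartlett decomposition: W = (L %*% B) %*% t(L %*% B), with L the lower
	# Cholesky factor of the scale and B lower triangular with
	# sqrt(chi-square) diagonal and N(0,1) below the diagonal.
	L <- t(chol(solve(A)))
	c1  <- rchisq(n.simul, n)	# squared diagonal terms of B
	c2  <- rchisq(n.simul, n-1)
	n21 <- rnorm(n.simul)		# off-diagonal term of B

	b11 <- L[1,1] * sqrt(c1)
	b21 <- L[2,1] * sqrt(c1) + L[2,2] * n21
	b22 <- L[2,2] * sqrt(c2)
	w11 <- b11^2; w12 <- b11 * b21; w22 <- b21^2 + b22^2

	# closed-form 2x2 inverse: sigma = solve(W)
	det.w <- w11 * w22 - w12^2
	s11 <- w22 / det.w; s22 <- w11 / det.w; s12 <- -w12 / det.w

	# mu = mean.c + chol-factor(sigma) %*% z / sqrt(n), vectorized
	z1 <- rnorm(n.simul); z2 <- rnorm(n.simul)
	t11 <- sqrt(s11); t21 <- s12 / t11; t22 <- sqrt(s22 - t21^2)
	mu1 <- mean.c[1] + t11 * z1 / sqrt(n)
	mu2 <- mean.c[2] + (t21 * z1 + t22 * z2) / sqrt(n)

	# unstandardized test
	z.star <- ((patient[1] - mu1) - (patient[2] - mu2)) /
	          sqrt(s11 + s22 - 2 * s12)
	p <- pnorm(z.star)
	list(p.value=mean(p), confidence.interval=100*quantile(p, c(0.025, 0.975)))
}
```

Because everything is a single vectorized expression over n.simul draws, the
100,000 interpreted calls to estimation() and riwish() disappear entirely.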



TIME ESTIMATION
--------------
# the values used in these examples are taken from the original paper
# system times are estimated for both the standardized and unstandardized versions.

system.time(crawford.BSDT(c(95,105), mean.c=c(100,100), sd.c=c(10,10), n=5, r=0.6, standardized=F))

    user  system elapsed
230.709  19.686 316.464

system.time(crawford.BSDT(c(90,110), mean.c=c(100,100), sd.c=c(10,10), n=5, r=0.6, standardized=T))
    user  system elapsed
227.618  15.656 293.810
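
Most of the elapsed time above is R-level call overhead rather than
arithmetic: replicate() makes 100,000 interpreted calls to estimation(),
which in turn calls riwish(), chol(), etc. A toy comparison (unrelated to
the statistics, purely to illustrate the overhead of per-draw calls versus
one vectorized call over the same number of draws):

```r
n.simul <- 100000
# one R-level call per draw, as in replicate(n.simul, estimation(...))
t.call <- system.time(p.loop <- replicate(n.simul, pnorm(rnorm(1))))["elapsed"]
# a single vectorized call over all draws at once
t.vec  <- system.time(p.vec  <- pnorm(rnorm(n.simul)))["elapsed"]
c(per.call = t.call, vectorized = t.vec)
```

On any machine the vectorized version should be faster by orders of
magnitude, which suggests that vectorizing the simulation (or at least
moving work out of the per-iteration function) is where the gains are.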


R version
-------
 > sessionInfo()
R version 2.5.1 (2007-06-27)
powerpc-apple-darwin8.9.1

locale:
en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] "stats"     "graphics"  "grDevices" "utils"     "datasets"
[6] "methods"   "base"

other attached packages:
MCMCpack     MASS     coda  lattice
  "0.8-2" "7.2-34" "0.11-2" "0.16-2"




Matthieu Dubois
Ph.D. Student

Cognition and Development Lab
Catholic University of Louvain
10, Place Cardinal Mercier
B-1348 Louvain-la-Neuve - Belgium

E-mail: Matthieu.Dubois at psp.ucl.ac.be
Web:  http://www.code.ucl.ac.be/MatthieuDubois/


