[R] how to get r-squared for a predefined curve or function with "other" data points

David Winsemius dwinsemius at comcast.net
Thu Feb 16 23:16:29 CET 2012


On Feb 16, 2012, at 11:43 AM, protoplast wrote:

> hello mailing list!
> i still consider myself an R beginner, so please bear with me if my
> questions seems strange.
>
> i'm in the field of biology, and have done consecutive hydraulic
> conductivity measurements in three parallels ("Sample"), resulting  
> in three
> sets of conductivity values ("PLC" for percent loss of conductivity,
> relative to 100%) at multiple pressures ("MPa").
>
> ---
>   Sample      MPa    PLC
>     1      -0.3498324    0.000000
>     1      -1.2414770   15.207821
>     1      -1.7993249   23.819995
>     1      -3.0162866   33.598570
>     1      -3.5184321   46.376933
>     1      -3.9899791   67.532226
>     1      -4.2731145   89.735541
>     1      -4.7597244   99.836239
>     2      -0.2754036    0.000000
>     2      -1.2912619   12.476132
>     2      -1.5128974   13.543273
> ...
> ---
>
> since each sample is a statistical unit, i have fitted each sample- 
> subset to
> a sigmoid curve:
>
> ---
> plot(
> 	NA,
> 	NA,
> 	main="",
> 	xlim=c(-20,0),
> 	ylim=c(0,100),
> 	xlab = "water potential [MPa]",
> 	ylab = "percent loss of conductivity [%]",
> 	xaxp = c(0,-20,4),
> 	yaxp = c(0,100,5),
> 	tck = 0.02,
> 	mgp = c(1.5,0.1,0),
> )
>
> for(i in 1:3){
> 	x <- subset(curvedata,Group == i)$MPa
> 	y <- subset(curvedata,Group == i)$PLC
> 	name <- subset(curvedata,Group == i)$Sample
> 	points(x,y)
>
> 	vlc <- nls(y ~ 100/(1+exp(a*(x-b))), start=c(a=1, b=-3),  
> data=list(x,y))
>
> 	curve(100/(1+exp(coef(vlc)[1]*(x-coef(vlc)[2]))), col=1, add = TRUE)
> 		
> 	Rsquared <- 1 - var(residuals(vlc))/var(y)
>
> 	summarizeall[i ,"Run"] <- i
> 	summarizeall[i ,"Sample"] <- name[1]
> 	summarizeall[i ,"a"] <- coef(vlc)[1]
> 	summarizeall[i ,"b"] <- coef(vlc)[2]
> 	summarizeall[i ,"R2"] <- Rsquared
>
> 	listnow <- data.frame(list(Run = c(i),Sample = c(name[1]), a =
> c(coef(vlc)[1]), b = c(coef(vlc)[2]), R2 = c(Rsquared)))
> 	print(listnow)
>
> 	i <- i+1
> }
> ---
>
> ...and get three slightly different curves with three different  
> estimatinos
> of fit (r², Rsquared).
>
> ---
>> summarizeall
>  Sample   a       b        R2
> 1   1 1.388352 -3.277755 0.9379886
> 2   2 1.800007 -3.363075 0.9327164
> 3   3 1.736857 -2.743972 0.9882998
>
>> average
>   Var n     a          b         R2
> 1 Mean 3 1.6417389 -3.1282673 NA
> 2   SE . 0.1279981  0.1937197 NA
> ---
>
> by averaging parameters a and b of the curve, i create a "mean  
> curve" that
> is added to the plot (red curve in the attached image).
>
> http://r.789695.n4.nabble.com/file/n4394609/conductivity-curve.gif
>
> ---
> meana <- average[1,"a"]
> meanb <- average[1,"b"]
> curve(), col=2, lwd=2, add = TRUE)
> ---
>
> and now here's my problem:
> i'd like to calculate R squared for all points on that mean curve.
> since i have to average the curve parameters, i loose the curve's  
> residuals
> that are stored in my variable vlc (the result of the nls function)  
> for
> every sample.
> just fitting one curve to all the data points is not good enough.

So just calculate them?
# pseudo-code: residual= actual - predicted

gresid <- curvedata$PLC - 100/(1+exp(meana*(curvedata$MPa-meanb))


If you are convinced that your formula for R^2 makes sense and this  
practice is generally accepted in your domain, then you can apply it  
across the whole dataset. I would have thought that a single  
regression model built with nlmer might have been more statistically  
sound. (But this is a bit outside my domain of comfort for giving  
advice.)

>
> an extensive google search over several days hasn't gotten me  
> anywhere, but
> maybe someone here can help me?
>
> is there an efficient way to calculate r squared for a predefined  
> function
> with "unrelated" data points?
> (unrelated as in "not used directly for fitting")
>
>
> thanks in advance
> markus

David Winsemius, MD
West Hartford, CT



More information about the R-help mailing list