[R] confidence / prediction ellipse

Rolf Turner rolf.turner at xtra.co.nz
Thu Feb 7 23:13:41 CET 2013


Please see in-line below.

On 02/08/2013 05:20 AM, Giuseppe Amatulli wrote:
> Hi Rolf,
> sorry for this late answer and thanks for your kind explanation and
> relative R code. I really appreciate.
> In reality the concept that I'm trying to address is a bit more complex.
> I'm fitting a model y vs 6 predictors with MARS / RandomForest /
> Multiple Linear Regression Models having 140 observations.
> I have the prediction of each model and would like to delineate the
> prediction ellipses for 3 models, for the 95% probability, and
> plotting them together with the observation vs prediction.
> I think that the prediction-ellipses code that you provide to me is
> valid also for predictions derived by not-linear model (such as MARS
> and RF).
> Is it correct?
         Probably not.  The degrees of freedom will very likely be
         wrong.  You need to consult a suitable book on multivariate
         analysis --- quite possibly Seber's book "Multivariate 
Observations"
         would help.
> or should i use an alternative solution ?
         At the very least some adjustment will be required I think.
>
> Moreover, I was expecting that the  abline (lm(b,a)) would be
> correspond to the main axis of the prediction ellipse,
         Why do you expect that?
> but is not this
> the case.
> why?
         No idea.  What are "b" and "a"?  In general if a and b are
         jointly Gaussian you can

             (i) regress b on a and plot the fitted regression line, which
             is what abline(lm(b~a)) will do, or

             (ii) form the prediction ellipse for (a,b) .

         The major axis of this prediction ellipse will NOT be the 
regression
         line.  I have a vague recollection that this major axis is the 
line which
         minimizes the sum of squares of the *orthogonal* distances of the
         points to the line (whereas the regression line minimizes the 
sum of
         squares of the *vertical* distances).

         But none of this seems to me to have much if anything to do with
         what you are trying to accomplish.


         At any rate this discussion has nothing to do with R.  You 
should ask
         about it on some statistics discussion forum, or consult with 
an expert
         on multivariate statistics.

     cheers,

             Rolf Turner
> Thanks in advance
> Giuseppe
>
> On 28 January 2013 19:04, Rolf Turner <rolf.turner at xtra.co.nz> wrote:
>> I believe that the value of "radius" that you are using is incorrect. If you
>> have a data
>> matrix X whose columns  are jointly distributed N(mu,Sigma) then a
>> confidence
>> ellipse for mu is determined by
>>
>>      n * (x - Xbar)' S^{-1}(x - Xbar) ~ T^2
>>
>> where Xbar is the mean vector for X and S is the sample covariance matrix,
>> and where "T^2" means Hotelling's T-squared distribution, which is equal to
>>
>>      (n-1)*2/(n-2) * F_{2,n-2}
>>
>> the latter representing the F distribution on 2 and n-2 degrees of freedom.
>>
>> Thus (I think) your radius should be
>>
>>      radius <- sqrt(2 * (npts-1) * qf(0.95, 2, npts-2)/(npts*(npts-2)))
>>
>> where npts <- length(a).  Note that it is qf(0.95,2,npts-2) and *NOT*
>> qf(0.95,2,npts-1).
>>
>> To get the corresponding *prediction* ellipse simply multiply the foregoing
>> radius by sqrt(npts+1).  By "prediction ellipse" I mean an ellipse such that
>> the probability that a new independent observation from the same population
>> will fall in that ellipse is the given probability (e.g. 0.95). Note that
>> this does
>> not mean that 95% of the data will fall in the calculated ellipse (basically
>> because
>> of the *dependence* between S and the individual observations).
>>
>> These confidence and prediction ellipses are (I'm pretty sure) valid under
>> the assumption that the data are (two dimensional, independent) Gaussian,
>> and that you use the sample covariance and sample mean as "shape" and
>> "centre".  I don't know what impact your robustification procedure of using
>> cov.trob() will/would have on the properties of these ellipses.
>>
>> A script which does the ellipses for your toy data, using the sample
>> covariance
>> and sample mean (rather than output from cov.trob()) is as follows:
>>
>> #
>> # Script scr.amatulli
>> #
>>
>> require(car)
>> a <- c(12,12,4,5,63,63,23)
>> b <- c(13,15,7,10,73,83,43)
>> npts   <- length(a)
>> shape  <- var(cbind(a, b))
>> center <- c(mean(a),mean(b))
>> rconf  <- sqrt(2 * (npts-1) * qf(0.95, 2, npts-2)/(npts*(npts-2)))
>> rpred  <- sqrt(npts+1)*rconf
>>
>> conf.elip <- ellipse(center, shape, rconf,draw = FALSE)
>> pred.elip <- ellipse(center, shape, rpred,draw = FALSE)
>> plot(pred.elip, type='l')
>> points(a,b)
>> lines(conf.elip,col="red")
>>
>>      cheers,
>>
>>          Rolf Turner
>>
>>
>> On 01/27/2013 10:12 AM, Giuseppe Amatulli wrote:
>>> Hi,
>>> I'm using the R library(car) to draw confidence/prediction ellipses in a
>>> scatterplot.
>>> >From what i understood  the ellipse() function return an ellipse based
>>> parameters:  shape, center,  radius .
>>> If i read  dataEllipse() function i can see how these parameters are
>>> calculated for a confidence ellipse.
>>>
>>> ibrary(car)
>>>
>>> a=c(12,12,4,5,63,63,23)
>>> b=c(13,15,7,10,73,83,43)
>>>
>>> v <- cov.trob(cbind(a, b))
>>> shape <- v$cov
>>> center <- v$center
>>>
>>> radius <- sqrt(2 * qf(0.95, 2, length(a) - 1))   # radius <- sqrt(dfn *
>>> qf(level, dfn, dfd))
>>>
>>> conf.elip = ellipse(center, shape, radius,draw = F)
>>> plot(conf.elip, type='l')
>>> points(a,b)
>>>
>>> My question is how I can calculate shape, center and radius  to obtain a
>>> prediction ellipses rather than a confidence ellipse?



More information about the R-help mailing list