[R] contour for plotting confidence interval on scatter plot of bivariate normal distribution

Michael Friendly friendly at yorku.ca
Sun Mar 4 01:10:41 CET 2012


On 3/3/2012 12:34 PM, drflxms wrote:
> Thanx a lot Greg for the hint and letting me not alone with this!
>
> Tried ellipse and it works well. But I am looking for something more
> precise. The ellipse fits the real border to the nearest possible
> ellipse. I want the "real" contour, if possible.

The 'contours' you are talking about here are estimates of the level
sets of your bivariate distribution with varying density.  There is
no "real" contour -- only what you can estimate from your data.

Following
your subject line, "plotting confidence interval on scatter plot of 
bivariate normal distribution", the most straight-forward approach
is in terms of a data ellipse, e.g.

library(car)
dataEllipse(x,y, levels=c(0.40, 0.68))

where the ellipses plotted have the property that the 0.40 coverage
data ellipse has univariate mean +- 1SD shadows on the X,Y axes,
while the 0.68 ellipse has bivariate mean +- 1SD shadows on those
axes.  This is a precise answer to the question you posed, and others
have supplied other versions, but I think car::dataEllipse() is the most
generally useful, but this depends on the assumption that your data
are at least approximately bivariate normal.


>
> Meanwhile I found an interesting function named draw.contour
> http://quantitative-ecology.blogspot.com/2008/03/plotting-contours.html
> provided by "Forester", an "Assistant Professor in the Department of
> Fisheries". - I love it how statistics brings people from completely
> different specialities together. - I am a neurologist.

Similar plots of empirical contours of density for (x,y) can be obtained
with contour() on the result of MASS::kde2d() and KernSmooth::bdke2D().
These may seem to you to be "more precise", but then they don't give
a way to estimate confidence regions with any coverage.  Take your choice.

>
> This function results not exactly in the same curve, I got with the code
> I posted last, but is in deed very close by (parallel). So I think both
> solutions are probably true and differ only because of rounding and
> method (..!?).
>
> Still I am not completely happy with
>
> 1. the numeric solution of setting to 68% to get 1SD. I'd prefer a
> symbolic, formula driven solution instead.
> 2. me not comprehending the code completely.
>
> While 2. will be hopefully solved soon by me delving into the code, 1
> remains...
>
> Best, Felix
>
>
> Am 03.03.12 17:28, schrieb Greg Snow:
>> Look at the ellipse package (and the ellipse function in the package)
>> for a simple way of showing a confidence region for bivariate data on
>> a plot (a 68% confidence interval is about 1 SD if you just want to
>> show 1 SD).
>>
>> On Sat, Mar 3, 2012 at 7:54 AM, drflxms<drflxms at googlemail.com>  wrote:
>>> Dear all,
>>>
>>> I created a bivariate normal distribution:
>>>
>>> set.seed(138813)
>>> n<-100
>>> x<-rnorm(n); y<-rnorm(n)
>>>
>>> and plotted a scatterplot of it:
>>>
>>> plot(x,y)
>>>
>>> Now I'd like to add the 2D-standard deviation.
>>>
>>> I found a thread regarding plotting arbitrary confidence boundaries from
>>> Pascal Hänggi
>>> http://www.mail-archive.com/r-help@r-project.org/msg24013.html
>>> which cites the even older thread
>>> http://tolstoy.newcastle.edu.au/R/help/03b/5384.html
>>>
>>> As I am unfortunately only a very poor R programmer, the code of Pascal
>>> Hänggi is a myth to me and I am not sure whether I was able to translate
>>> the recommendation of Brain Ripley in the later thread (which provides
>>> no code) into the the correct R code. Brain wrote:
>>>
>>> You need a 2D density estimate (e.g. kde2d in MASS) then compute the
>>> density values at the points and draw the contour of the density which
>>> includes 95% of the points (at a level computed from the sorted values
>>> via quantile()). [95% confidence interval was desired in thread instead
>>> of standard deviation...]
>>>
>>> So I tried this...
>>>
>>> den<-kde2d(x, y, n=n) #as I chose n to be the same as during creating
>>> the distributions x and y (see above), a z-value is assigned to every
>>> combination of x and y.
>>>
>>> # create a sorted vector of z-values (instead of the matrix stored
>>> inside the den object
>>> den.z<-sort(den$z)
>>>
>>> # set desired confidence border to draw and store it in variable
>>> confidence.border<- quantile(den.z, probs=0.6827, na.rm = TRUE)
>>>
>>> # draw a line representing confidence.border on the existing scatterplot
>>> par(new=TRUE)
>>> contour(den, levels=confidence.border, col = "red", add = TRUE)
>>>
>>> Unfortunately I doubt very much this is correct :( In fact I am sure
>>> this is wrong, because the border for probs=0.05 is drawn outside the
>>> values.... So please help and check.
>>> Pascal Hänggis code seems to work, but I don't understand the magic he
>>> does with
>>>
>>> pp<- array()
>>> for (i in 1:1000){
>>>         z.x<- max(which(den$x<  x[i]))
>>>         z.y<- max(which(den$y<  y[i]))
>>>         pp[i]<- den$z[z.x, z.y]
>>> }
>>>
>>> before doing the very same as I did above:
>>>
>>> confidencebound<- quantile(pp, 0.05, na.rm = TRUE)
>>>
>>> plot(x, y)
>>> contour(den, levels = confidencebound, col = "red", add = TRUE)
>>>
>>>
>>> My problems:
>>>
>>> 1.) setting probs=0.6827 is somehow a dirty trick which I can only use
>>> by simply knowing that this is the percentage of values inside +-1sd
>>> when a distribution is normal. Is there a way doing this with "native"
>>> sd function?
>>> sd(den.z) is not correct, as den.z is in contrast to x and y not normal
>>> any more. So ecdf(den.z)(sd(den.z)) results in a percentile of 0.5644 in
>>> this example instead of the desired 0.6827.
>>>
>>> 2.) I would like to have code that works with any desired confidence.
>>> Unfortunately setting probs to the desired confidence would probably be
>>> wrong (?!) as it relates to den.z instead of x and y, which are the
>>> underlying distributions I am interested in. To put it short I want the
>>> confidence of x/y and not of den.z.
>>>
>>>
>>> I am really completely stuck. Please help me out of this! Felix
>>>
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>>
>



More information about the R-help mailing list