[R] contour for plotting confidence interval on scatter plot of bivariate normal distribution

Sat Mar 3 23:08:03 CET 2012

Greg, extremely cool thoughts! Thank you for delving into it this deep.

As I mentioned, I am a neurologist with unfortunately poor statistical
training. You are professional statisticians. So I'd like to apologize
for any unprofessional nomenclature and strange thoughts beforehand.

As my previous postings show, I think a lot in pictures. Therefore I
found the interactive demo(bivar) in the rgl package very impressive.

I believe everything you are writing is completely true for
theoretical/parametric normal distributions. When you plot the density
of such an probability distribution it will have the form of a sugar
loaf (skewed or unskewed doesn't matter in this case). The projection of
the standard deviations of all possible slices to the x-y-plane below
the sugar loaf will be an ellipse for shore. Corresponding to that you
would get nested ellipses when you apply contour() to the 2d scatter
plot of the data of that bivariate distribution.

But the distribution generated via rnorm are random. demo(bivar) shows
both for such an rnorm() distribution: 1.) the "sugar loaf" parametric
theoretical 3d normal density, and 2.) the crumpled non-parametric
"real" 3d density (looks like a mountain), which is derived from the
data of the 2d scatter plot. I grasped the terms parametric and
non-parametric form the code of demo(bivar) by the way.
In this case contour() will not draw nested ellipses but contour lines
similar to those on a mountain map.
As far as I understand this is closer to reality because fewer
assumptions are made compared to the parametric approach. That's why I
preferred contours over ellipses.

Hopefully I could made my thoughts understood and they are not
completely ridiculous.
Thank you again for your detailed explanations which I am going to study
in detail right now... :) Just wanted to answer before that more
detailed study, as this is certainly taking some time...

Best, Felix

Am 03.03.12 22:14, schrieb Greg Snow:
> To further explain.  If you want contours of a bivariate normal, then
> you want ellipses.   The density for a bivariate normal (with 0
> correlation to keep things simple, but the theory will extend to
> correlated cases)  is proportional to exp( -1/2 ( x1^2/v1 + x2^2/v2 )
> so a contour of the distribution will be all points such that x1^2/v1
> + x2^2/v2 = c for some constant c (each c will give a different
> contour), but that is the definition of an ellipse (well divide both
> sides by c so that the right side is 1 to get the canonical form).
> The ellipse function in the ellipse package chooses c from the chi
> squared distribution (since if x1 and x2 are normally distributed with
> mean 0 (or have the mean subtracted), then x1^2/v1 +x2^2/v2 is chi
> squared distributed with 2 degrees of freedom.
> 
> So if you really want to you can try to approximate the contours in
> some other way, but any decent approach will just converge to the
> ellipse.
> 
> On Sat, Mar 3, 2012 at 1:26 PM, drflxms <drflxms at googlemail.com> wrote:
>> Wow, David,
>>
>> thank you for these sources, which I just screened. bagplot looks most
>> promising to me. I found it in the package ‘aplpack’ as well as in the R
>> Graph Gallery
>> http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=112
>>
>> Ellipses are not exactly what I am heading for. I am looking for a 2D
>> equivalent of plotting 1D standard deviation boundaries. In other words
>> how to plot SD boundaries on a 2D scatter plot.
>> So I started searching for contour/boundaries etc. instead of ellipse
>> leading me to Pascal Hänggi:
>> http://www.mail-archive.com/r-help@r-project.org/msg24013.html
>>
>> To describe it in an image: I want to cut the density mountain above the
>> scatter plot (see demo(bivar) in the rgl package) in a way so that the
>> part of the mountain, that covers 68% of the data on the x-y-plane below
>> it (+-1 SD) is removed. Then I'd like to project the edge that results
>> from the cut to the x-y-plane below the mountain. This should be the 2d
>> equivalent of 1s SD boundaries.
>>
>> I think this might be achieved as well by Hänggis code as by the
>> function of Forester. Unfortunately they result in slightly different
>> boundaries which shouldn't be the case. And I did not figure out which
>> one is correct if one is correct at all (!?).
>>
>> Can anyone explain the difference?
>>
>> I compared them with this code:
>>
>> # parameters:
>> n<-100
>>
>> # generate samples:
>> set.seed(138813)
>> x<-rnorm(n); y<-rnorm(n)
>> a<-list("x"=x,"y"=y) # input for Foresters function which is appended at
>> the very end
>>
>> # estimate non-parameteric density surface via kernel smoothing
>> library(MASS)
>> den<-kde2d(x, y, n=n)
>>
>> z <- array()
>> for (i in 1:n){
>>        z.x <- max(which(den$x < x[i]))
>>        z.y <- max(which(den$y < y[i]))
>>        z[i] <- den$z[z.x, z.y]
>> }
>>
>> # store class/level borders of confidence interval in variables
>> confidence.border <- quantile(z, probs=0.05, na.rm = TRUE)# 0.05
>> corresponds to 0.95 in draw.contour
>>
>> plot(x,y)
>> draw.contour(a, alpha=0.95)
>> par(new=TRUE)
>> contour(den, levels=confidence.border, col = "red", add = TRUE)
>>
>>
>> ###########################################
>> ## drawcontour.R
>> ## Written by J.D. Forester, 17 March 2008
>> ###########################################
>>
>> ## This function draws an approximate density contour based on
>> empirical, bivariate data.
>>
>> ##change testit to FALSE if sourcing the file
>> testit=TRUE
>>
>> draw.contour<-function(a,alpha=0.95,plot.dens=FALSE, line.width=2,
>> line.type=1, limits=NULL, density.res=800,spline.smooth=-1,...){
>>  ## a is a list or matrix of x and y coordinates (e.g.,
>> a=list("x"=rnorm(100),"y"=rnorm(100)))
>>  ## if a is a list or dataframe, the components must be labeled "x" and "y"
>>  ## if a is a matrix, the first column is assumed to be x, the second y
>>  ## alpha is the contour level desired
>>  ## if plot.dens==TRUE, then the joint density of x and y are plotted,
>>  ## otherwise the contour is added to the current plot.
>>  ## density.res controls the resolution of the density plot
>>
>>  ## A key assumption of this function is that very little probability
>> mass lies outside the limits of
>>  ## the x and y values in "a". This is likely reasonable if the number
>> of observations in a is large.
>>
>>  require(MASS)
>>  require(ks)
>>  if(length(line.width)!=length(alpha)){
>>    line.width <- rep(line.width[1],length(alpha))
>>  }
>>
>>  if(length(line.type)!=length(alpha)){
>>    line.type <- rep(line.type[1],length(alpha))
>>  }
>>
>>  if(is.matrix(a)){
>>    a=list("x"=a[,1],"y"=a[,2])
>>  }
>>  ##generate approximate density values
>>  if(is.null(limits)){
>>    limits=c(range(a$x),range(a$y))
>>  }
>>  f1<-kde2d(a$x,a$y,n=density.res,lims=limits)
>>
>>  ##plot empirical density
>>  if(plot.dens) image(f1,...)
>>
>>  if(is.null(dev.list())){
>>    ##ensure that there is a window in which to draw the contour
>>    plot(a,type="n",xlab="X",ylab="Y")
>>  }
>>
>>  ##estimate critical contour value
>>  ## assume that density outside of plot is very small
>>
>>  zdens <- rev(sort(f1$z))
>>  Czdens <- cumsum(zdens)
>>  Czdens <- (Czdens/Czdens[length(zdens)])
>>  for(cont.level in 1:length(alpha)){
>>    ##This loop allows for multiple contour levels
>>    crit.val <- zdens[max(which(Czdens<=alpha[cont.level]))]
>>
>>    ##determine coordinates of critical contour
>>    b.full=contourLines(f1,levels=crit.val)
>>    for(c in 1:length(b.full)){
>>      ##This loop is used in case the density is multimodal or if the
>> desired contour
>>      ##  extends outside the plotting region
>>
>> b=list("x"=as.vector(unlist(b.full[[c]][2])),"y"=as.vector(unlist(b.full[[c]][3])))
>>
>>      ##plot desired contour
>>      line.dat<-xspline(b,shape=spline.smooth,open=TRUE,draw=FALSE)
>>      lines(line.dat,lty=line.type[cont.level],lwd=line.width[cont.level])
>>    }
>>  }
>> }
>>
>> ##############################
>> ##Test the function
>> ##############################
>>
>> ##generate data
>> if(testit){
>>  n=10000
>>  a<-list("x"=rnorm(n,400,100),"y"=rweibull(n,2,100))
>>
>> draw.contour(a=a,alpha=c(0.95,0.5,0.05),line.width=c(2,1,2),line.type=c(1,2,1),plot.dens=TRUE,
>> xlab="X", ylab="Y")
>> }
>>
>> Am 03.03.12 19:56, schrieb David Winsemius:
>>>
>>> On Mar 3, 2012, at 12:34 PM, drflxms wrote:
>>>
>>>> Thanx a lot Greg for the hint and letting me not alone with this!
>>>>
>>>> Tried ellipse and it works well. But I am looking for something more
>>>> precise. The ellipse fits the real border to the nearest possible
>>>> ellipse. I want the "real" contour, if possible.
>>>>
>>>> Meanwhile I found an interesting function named draw.contour
>>>> http://quantitative-ecology.blogspot.com/2008/03/plotting-contours.html
>>>> provided by "Forester", an "Assistant Professor in the Department of
>>>> Fisheries". - I love it how statistics brings people from completely
>>>> different specialities together. - I am a neurologist.
>>>>
>>>> This function results not exactly in the same curve, I got with the code
>>>> I posted last, but is in deed very close by (parallel). So I think both
>>>> solutions are probably true and differ only because of rounding and
>>>> method (..!?).
>>>>
>>>> Still I am not completely happy with
>>>>
>>>> 1. the numeric solution of setting to 68% to get 1SD. I'd prefer a
>>>> symbolic, formula driven solution instead.
>>>> 2. me not comprehending the code completely.
>>>>
>>>> While 2. will be hopefully solved soon by me delving into the code, 1
>>>> remains...
>>>>
>>>
>>> You could take a look at answers to similar questions on StackOverflow:
>>>
>>> http://stackoverflow.com/questions/6655268/ellipse-containing-percentage-of-given-points-in-r
>>>
>>>
>>> http://stackoverflow.com/questions/7055848/plot-ellipse-bounding-a-percentage-of-points
>>>
>>>
>>> http://stackoverflow.com/questions/7718569/how-to-plot-90-confidence-bands-with-locfit-in-r
>>>
>>>
>>> http://stackoverflow.com/questions/7511889/ellipse-representing-horizontal-and-vertical-error-bars-with-r
>>>
>>>
>>> Searching on "plot confidence ellipse" in the rhelp archives should be
>>> similarly productive.
>>>
>>> There's also a non-parametric 2d boundary plotting method called a
>>> bagplot and I think it's implemented in a package by the same name.
>>>
>>
> 
> 
>