[R] contour for plotting confidence interval on scatter plot of bivariate normal distribution
David Winsemius
dwinsemius at comcast.net
Sun Mar 4 02:19:16 CET 2012
I do agree with the recommendation to consider car::dataEllipse() for
plotting a best fit for the theoretical bivariate Normal situation.
However, the bagplot function from Has Peter Wold: http://www.wiwi.uni-bielefeld.de/~wolf/#software
is more likely to "line up" and be suitable for drawing "contours"
on with a two-dimensional kernel density estimate for a 2d ECDF in a
distribution-agnostic analysis. I had never thought of trying to
integrate its output with an `rgl` representation of a 2d density, but
it seems like an interesting and perhaps challenging exercise.
The real question is how much prior assumptions and knowledge about
distributions in nature and the joint relationships of the variables
under consideration should constrain the efforts of the OP regarding
a graphical representation. In medical situation, my experience is
that log-normal or Gamma distributions are more common than Normal
ones. This is in part because preclinical and clinical disease will
skew measures.
(And I do thank drflxms for initiating a very interesting discussion.
I share his belief that graphical representations are important tools.)
--
David.
On Mar 3, 2012, at 7:10 PM, Michael Friendly wrote:
> On 3/3/2012 12:34 PM, drflxms wrote:
>> Thanx a lot Greg for the hint and letting me not alone with this!
>>
>> Tried ellipse and it works well. But I am looking for something more
>> precise. The ellipse fits the real border to the nearest possible
>> ellipse. I want the "real" contour, if possible.
>
> The 'contours' you are talking about here are estimates of the level
> sets of your bivariate distribution with varying density. There is
> no "real" contour -- only what you can estimate from your data.
>
> Following
> your subject line, "plotting confidence interval on scatter plot of
> bivariate normal distribution", the most straight-forward approach
> is in terms of a data ellipse, e.g.
>
> library(car)
> dataEllipse(x,y, levels=c(0.40, 0.68))
>
> where the ellipses plotted have the property that the 0.40 coverage
> data ellipse has univariate mean +- 1SD shadows on the X,Y axes,
> while the 0.68 ellipse has bivariate mean +- 1SD shadows on those
> axes. This is a precise answer to the question you posed, and others
> have supplied other versions, but I think car::dataEllipse() is the
> most
> generally useful, but this depends on the assumption that your data
> are at least approximately bivariate normal.
>
>
>>
>> Meanwhile I found an interesting function named draw.contour
>> http://quantitative-ecology.blogspot.com/2008/03/plotting-contours.html
>> provided by "Forester", an "Assistant Professor in the Department of
>> Fisheries". - I love it how statistics brings people from completely
>> different specialities together. - I am a neurologist.
>
> Similar plots of empirical contours of density for (x,y) can be
> obtained
> with contour() on the result of MASS::kde2d() and
> KernSmooth::bdke2D().
> These may seem to you to be "more precise", but then they don't give
> a way to estimate confidence regions with any coverage. Take your
> choice.
>
>>
>> This function results not exactly in the same curve, I got with the
>> code
>> I posted last, but is in deed very close by (parallel). So I think
>> both
>> solutions are probably true and differ only because of rounding and
>> method (..!?).
>>
>> Still I am not completely happy with
>>
>> 1. the numeric solution of setting to 68% to get 1SD. I'd prefer a
>> symbolic, formula driven solution instead.
>> 2. me not comprehending the code completely.
>>
>> While 2. will be hopefully solved soon by me delving into the code, 1
>> remains...
>>
>> Best, Felix
>>
>>
>> Am 03.03.12 17:28, schrieb Greg Snow:
>>> Look at the ellipse package (and the ellipse function in the
>>> package)
>>> for a simple way of showing a confidence region for bivariate data
>>> on
>>> a plot (a 68% confidence interval is about 1 SD if you just want to
>>> show 1 SD).
>>>
>>> On Sat, Mar 3, 2012 at 7:54 AM, drflxms<drflxms at googlemail.com>
>>> wrote:
>>>> Dear all,
>>>>
>>>> I created a bivariate normal distribution:
>>>>
>>>> set.seed(138813)
>>>> n<-100
>>>> x<-rnorm(n); y<-rnorm(n)
>>>>
>>>> and plotted a scatterplot of it:
>>>>
>>>> plot(x,y)
>>>>
>>>> Now I'd like to add the 2D-standard deviation.
>>>>
>>>> I found a thread regarding plotting arbitrary confidence
>>>> boundaries from
>>>> Pascal Hänggi
>>>> http://www.mail-archive.com/r-help@r-project.org/msg24013.html
>>>> which cites the even older thread
>>>> http://tolstoy.newcastle.edu.au/R/help/03b/5384.html
>>>>
>>>> As I am unfortunately only a very poor R programmer, the code of
>>>> Pascal
>>>> Hänggi is a myth to me and I am not sure whether I was able to
>>>> translate
>>>> the recommendation of Brain Ripley in the later thread (which
>>>> provides
>>>> no code) into the the correct R code. Brain wrote:
>>>>
>>>> You need a 2D density estimate (e.g. kde2d in MASS) then compute
>>>> the
>>>> density values at the points and draw the contour of the density
>>>> which
>>>> includes 95% of the points (at a level computed from the sorted
>>>> values
>>>> via quantile()). [95% confidence interval was desired in thread
>>>> instead
>>>> of standard deviation...]
>>>>
>>>> So I tried this...
>>>>
>>>> den<-kde2d(x, y, n=n) #as I chose n to be the same as during
>>>> creating
>>>> the distributions x and y (see above), a z-value is assigned to
>>>> every
>>>> combination of x and y.
>>>>
>>>> # create a sorted vector of z-values (instead of the matrix stored
>>>> inside the den object
>>>> den.z<-sort(den$z)
>>>>
>>>> # set desired confidence border to draw and store it in variable
>>>> confidence.border<- quantile(den.z, probs=0.6827, na.rm = TRUE)
>>>>
>>>> # draw a line representing confidence.border on the existing
>>>> scatterplot
>>>> par(new=TRUE)
>>>> contour(den, levels=confidence.border, col = "red", add = TRUE)
>>>>
>>>> Unfortunately I doubt very much this is correct :( In fact I am
>>>> sure
>>>> this is wrong, because the border for probs=0.05 is drawn outside
>>>> the
>>>> values.... So please help and check.
>>>> Pascal Hänggis code seems to work, but I don't understand the
>>>> magic he
>>>> does with
>>>>
>>>> pp<- array()
>>>> for (i in 1:1000){
>>>> z.x<- max(which(den$x< x[i]))
>>>> z.y<- max(which(den$y< y[i]))
>>>> pp[i]<- den$z[z.x, z.y]
>>>> }
>>>>
>>>> before doing the very same as I did above:
>>>>
>>>> confidencebound<- quantile(pp, 0.05, na.rm = TRUE)
>>>>
>>>> plot(x, y)
>>>> contour(den, levels = confidencebound, col = "red", add = TRUE)
>>>>
>>>>
>>>> My problems:
>>>>
>>>> 1.) setting probs=0.6827 is somehow a dirty trick which I can
>>>> only use
>>>> by simply knowing that this is the percentage of values inside
>>>> +-1sd
>>>> when a distribution is normal. Is there a way doing this with
>>>> "native"
>>>> sd function?
>>>> sd(den.z) is not correct, as den.z is in contrast to x and y not
>>>> normal
>>>> any more. So ecdf(den.z)(sd(den.z)) results in a percentile of
>>>> 0.5644 in
>>>> this example instead of the desired 0.6827.
>>>>
>>>> 2.) I would like to have code that works with any desired
>>>> confidence.
>>>> Unfortunately setting probs to the desired confidence would
>>>> probably be
>>>> wrong (?!) as it relates to den.z instead of x and y, which are the
>>>> underlying distributions I am interested in. To put it short I
>>>> want the
>>>> confidence of x/y and not of den.z.
>>>>
>>>>
>>>> I am really completely stuck. Please help me out of this! Felix
>>>>
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>>
>>>
>>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
West Hartford, CT
More information about the R-help
mailing list