[R] qqplot for count data

Jean-Christophe BOUËTTÉ jcbouette at gmail.com
Thu Sep 1 16:39:47 CEST 2011


Dear list,
I just tried to do the same thing and could not find anything on a
weighted qqplot. My weights are actually counts (positive integers).
Here is a modification of qqplot, following Duncan Murdoch's
suggestion. Any feedback would be welcome!

Thanks,
Jean-Christophe

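weighted.qqplot <- function(x, y, plot.it = TRUE,
                            xlab = deparse(substitute(x)),
                            ylab = deparse(substitute(y)),
                            x.counts = rep(1L, length(x)),
                            y.counts = rep(1L, length(y)), ...) {
    ## sort the data, carrying each value's count along with it
    ox <- order(x)
    oy <- order(y)
    sx <- x[ox]
    sy <- y[oy]
    ## cumulative counts, normalized to total 1 so that samples with
    ## different total counts are compared at the same probability levels
    swx <- cumsum(x.counts[ox]) / sum(x.counts)
    swy <- cumsum(y.counts[oy]) / sum(y.counts)
    ## common grid of probability levels: as in qqplot, one point per
    ## observation of the shorter sample
    n <- min(length(sx), length(sy))
    p <- seq(max(swx[1L], swy[1L]), 1, length.out = n)
    ## interpolated quantiles at the common levels
    ## (rule = 2 guards against round-off at the upper end)
    qx <- approx(swx, sx, xout = p, rule = 2)$y
    qy <- approx(swy, sy, xout = p, rule = 2)$y
    if (plot.it)
        plot(qx, qy, xlab = xlab, ylab = ylab, ...)
    invisible(list(x = qx, y = qy))
}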

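## Sample example: x and y are expanded samples; (a, b) and (c, d) hold
## the same data as distinct values plus counts
set.seed(1)
n <- 15
a <- runif(n); b <- 1L:n; x <- rep(a, b)
c <- runif(n); d <- n:1L; y <- rep(c, d)

## ordinary qqplot of the expanded data
qqplot(x, y, type = "b", pch = "+", col = "red")
## overlay the weighted versions with lines() rather than par(new = TRUE),
## so that all three curves share the same axes
w1 <- weighted.qqplot(x, y, plot.it = FALSE)
w2 <- weighted.qqplot(a, c, x.counts = b, y.counts = d, plot.it = FALSE)
lines(w1$x, w1$y, type = "b")
lines(w2$x, w2$y, type = "b", pch = "*", col = "grey")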


From: Duncan Murdoch <murdoch_at_stats.uwo.ca>
Date: Thu 16 Mar 2006 - 05:50:27 EST


On 3/15/2006 1:38 PM, Vivek Satsangi wrote:
> Folks,
> I am documenting what I finally did, for the next person who comes along...
>
> Following Dr. Murdoch's suggestion, I looked at qqplot. The following
> approach might be helpful to get to the same information as given by
> qqplot.
> To summarize the ask: given x, y, xw and yw, show (visually is okay)
> whether x and y are from the same distribution. xw is the weight of
> each x observation and yw is the weight of each y observation.
>
> Put x and xw into a dataframe.
> Sort by x.
> Calculate cumulative x weights, normalized to total 1.
>
> Put y and yw into a dataframe.
> Sort by y.
> Calculate cumulative weights, normalized to total 1.
>
> Plot x and y against their cumulative normalized weights. The two
> lines should have similar shapes (to the eye); if they do not, the
> distributions are "different".
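A minimal R sketch of the steps quoted above, assuming numeric vectors x and y with positive weights xw and yw of matching lengths. The data are simulated purely for illustration, and cum.weights is a hypothetical helper name, not an existing function.

```r
## hypothetical helper: sort by value and attach cumulative weights
## normalized to total 1 (returned as a data frame, as suggested above)
cum.weights <- function(v, w) {
    o <- order(v)
    data.frame(value = v[o], cumw = cumsum(w[o]) / sum(w))
}

## simulated data, for illustration only
set.seed(42)
x <- rnorm(20); xw <- rpois(20, 3) + 1
y <- rnorm(30); yw <- rpois(30, 3) + 1

dx <- cum.weights(x, xw)
dy <- cum.weights(y, yw)

## each sorted sample against its cumulative normalized weight;
## similar shapes suggest similar distributions
plot(dx$cumw, dx$value, type = "b", xlim = c(0, 1),
     ylim = range(dx$value, dy$value),
     xlab = "cumulative normalized weight", ylab = "value")
lines(dy$cumw, dy$value, type = "b", col = "red")
```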


One variation that would make the result more like a qqplot: you could
work out a vector of weights w (perhaps the cumulative weights from x
or from y or perhaps something else) and plot y(w) versus x(w), where
y(w) and x(w) are the linear interpolation values that approx gives
you.
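A possible R sketch of this variation, assuming numeric vectors x and y with positive weights xw and yw (simulated here for illustration). The grid w is taken to be the normalized cumulative weights of x, which is only one of the choices mentioned; qx and qy are illustrative names.

```r
## simulated data, for illustration only
set.seed(42)
x <- rnorm(20); xw <- rpois(20, 3) + 1
y <- rnorm(30); yw <- rpois(30, 3) + 1

ox <- order(x); oy <- order(y)
cwx <- cumsum(xw[ox]) / sum(xw)   # normalized cumulative weights of x
cwy <- cumsum(yw[oy]) / sum(yw)   # normalized cumulative weights of y

w <- cwx                          # common grid: the cumulative weights of x
## x(w): reproduces the sorted x, since w is x's own grid
qx <- approx(cwx, x[ox], xout = w)$y
## y(w): rule = 2 clamps levels below cwy[1] to the smallest y
qy <- approx(cwy, y[oy], xout = w, rule = 2)$y

plot(qx, qy, xlab = "x(w)", ylab = "y(w)")
abline(0, 1, lty = 2)             # reference line y = x
```

Points close to the dashed line y = x suggest that the two weighted samples have similar distributions.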

Duncan Murdoch


