[R] (newbie) Weighted qqplot?

Vivek Satsangi vivek.satsangi at gmail.com
Wed Mar 15 19:38:34 CET 2006


Folks,
I am documenting what I finally did, for the next person who comes along...

Following Dr. Murdoch's suggestion, I looked at qqplot. The following
approach might be helpful to get to the same information as given by
qqplot.
To summarize the ask: given x, y, xw and yw, show (visually is okay)
whether x and y are from the same distribution. xw is the weight of
each x observation and yw is the weight of each y observation.

Put x and xw into a dataframe.
Sort by x.
Calculate cumulative x weights, normalized to total 1.

Put y and yw into a dataframe.
Sort by y.
Calculate cumulative y weights, normalized to total 1.

Plot x and y against their cumulative normalized weights, on the same
axes. If x and y come from the same distribution, the two lines should
have similar shapes (to the eye); if they do not, the distributions are
"different".

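The steps above could be sketched roughly as follows. This is only an
illustration: weighted_cdf is a hypothetical helper name (not an existing
R function), the data are made up, and the weights are drawn with runif()
so that they are all positive.

```r
# Made-up example data; weights are kept positive on purpose.
set.seed(1)
x  <- rnorm(100)
y  <- rnorm(10)
xw <- runif(100)   # weight of each x observation
yw <- runif(10)    # weight of each y observation

# Hypothetical helper: put values and weights into a data frame,
# sort by value, and accumulate the weights normalized to total 1.
weighted_cdf <- function(value, weight) {
  d <- data.frame(value = value, weight = weight)
  d <- d[order(d$value), ]
  d$cumw <- cumsum(d$weight) / sum(d$weight)
  d
}

dx <- weighted_cdf(x, xw)
dy <- weighted_cdf(y, yw)

# Draw both weighted CDFs as step lines on the same axes.
plot(dx$value, dx$cumw, type = "s", xlim = range(x, y), ylim = c(0, 1),
     xlab = "value", ylab = "cumulative normalized weight")
lines(dy$value, dy$cumw, type = "s", col = "red")
legend("topleft", legend = c("x", "y"), col = c("black", "red"), lty = 1)
```

Observations with missing values or weights would need to be dropped
before calling the helper; the sketch does not handle them.
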
Vivek

On 3/15/06, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
> On 3/15/2006 8:31 AM, Vivek Satsangi wrote:
> > Folks,
> > Normally, in a data frame, one observation counts as one observation
> > of the distribution. Thus one can easily produce a CDF and (in S-PLUS
> > at least) use cdf.compare to compare the CDFs. (BTW: what is the R
> > equivalent of the S-PLUS cdf.compare() function, if any?)
> >
> > However, if each point should not count equally, how can I weight the
> > points before comparing the distributions? I was thinking of somehow
> > creating multiple observations for each actual observation based on
> > weights and creating a new dataframe etc. -- but that seems excessive.
> > Surely there is a simpler way?
> >
> >> x <- rnorm(100)
> >> y <- rnorm(10)
> >> xw <- rnorm(100) * 1.73 # The weights. These won't add up to 1 or N or anything because of missing values.
> >> yw <- rnorm(10) * 6.23 # The weights. These won't add up to 1 or to the same number as xw.
> >> # The question to answer is, how can I create a qq plot or cdf compare of x vs. y, weighted by their weights, xw and yw (to eventually figure out if y comes from the population x, similar to Kolmogorov-Smirnov GOF)?
> >> qqplot(x,y) # What now?
>
> qqplot doesn't support weights, but it's a simple enough function that
> you could write a version that did.  Look at the cases where length(x)
> is not equal to length(y):  e.g. if length(y) < length(x), qqplot
> constructs a linear approximation to a function mapping 1:nx onto the
> sorted x values, then takes length(y) evenly spaced values from that
> function.  You want to do the same sort of thing, except that instead of
> even spacing, you want to look at the cumulative sums of the weights.
>
> You might want to use some kind of graphical indicator of whether points
> are heavily weighted or not, but I don't know what to recommend for that.
>
> By the way, your example above will give negative weights in xw and yw;
> you probably won't like the results if you do that.
>
> Duncan Murdoch
>
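
For reference, a rough sketch of the weighted qqplot idea described in the
quoted reply might look like the code below. It is not a modification of
the real qqplot source. weighted_qq is a hypothetical name, the weights are
assumed to be strictly positive, and placing each observation at the
midpoint of its share of the cumulative weight is my own assumption.
Unlike qqplot, it always interpolates x at the cumulative weights of y,
whichever sample is longer.

```r
# Hypothetical helper: weighted quantile-quantile points for x versus y.
weighted_qq <- function(x, y, xw, yw) {
  stopifnot(all(xw > 0), all(yw > 0))   # assumption: positive weights only
  ox <- order(x)
  oy <- order(y)
  sx <- x[ox]
  sy <- y[oy]
  # Cumulative weight at the midpoint of each observation's own weight,
  # normalized so the values lie strictly between 0 and 1.
  px <- (cumsum(xw[ox]) - xw[ox] / 2) / sum(xw)
  py <- (cumsum(yw[oy]) - yw[oy] / 2) / sum(yw)
  # Linear approximation to the weighted quantile function of x,
  # evaluated at the cumulative weights of the sorted y values.
  xq <- approx(px, sx, xout = py, rule = 2)$y
  list(x = xq, y = sy)
}

# Made-up example data with positive weights.
set.seed(1)
x  <- rnorm(100)
y  <- rnorm(10)
xw <- runif(100)
yw <- runif(10)

qq <- weighted_qq(x, y, xw, yw)
plot(qq$x, qq$y, xlab = "weighted quantiles of x", ylab = "sorted y")
abline(0, 1, lty = 2)   # points near this line suggest similar distributions
```

With equal weights the cumulative positions reduce to (i - 0.5)/n, so the
result should be close to an ordinary qqplot.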


--
-- Vivek Satsangi
Student, Rochester, NY USA

Life is short, the art long, opportunity fleeting, experiment
treacherous, judgement difficult.



