[R] OT: a weighted rank-based, non-paired test statistic ?

Dylan Beaudette debeaudette at ucdavis.edu
Tue Jun 9 20:45:39 CEST 2009


On Tuesday 09 June 2009, Torsten Hothorn wrote:
> > Date: Fri, 5 Jun 2009 16:09:42 -0700 (PDT)
> > From: Thomas Lumley <tlumley at u.washington.edu>
> > To: dylan.beaudette at gmail.com
> > Cc: "'r-help at stat.math.ethz.ch'" <r-help at stat.math.ethz.ch>
> > Subject: Re: [R] OT: a weighted rank-based, non-paired test statistic ?
> >
> > On Fri, 5 Jun 2009, Dylan Beaudette wrote:
> >> Is anyone aware of a rank-based, non-paired test, such as the
> >> Kruskal-Wallis test, that can accommodate weights?
> >
> > You don't say what sort of weights, but basically, no.
> >
> > Whether you have precision weights or sampling weights, the test will no
> > longer be distribution-free.
> >
> >> Alternatively, would it make sense to simulate a dataset by duplicating
> >> observations in proportion to their weight, and then to apply the
> >> Kruskal-Wallis test?
> >
> > No.
>
> Well, if you have case weights, i.e., w[i] == 5 means there are five
> observations that look exactly like observation i, then there are several
> ways to do it:
> > library("coin")
> >
> > set.seed(29)
> > x <- gl(3, 10)
> > y <- rnorm(length(x), mean = c(0, 0, 1)[x])
> > d <- data.frame(y = y, x = x)
> > w <- rep(2, nrow(d)) ### double each obs
> >
> > ### all the same
> > kruskal_test(y ~ x, data = rbind(d, d))
>
>  	Asymptotic Kruskal-Wallis Test
>
> data:  y by x (1, 2, 3)
> chi-squared = 12.1176, df = 2, p-value = 0.002337
>
> > kruskal_test(y ~ x, data = d[rep(1:nrow(d), w),])
>
>  	Asymptotic Kruskal-Wallis Test
>
> data:  y by x (1, 2, 3)
> chi-squared = 12.1176, df = 2, p-value = 0.002337
>
> > kruskal_test(y ~ x, data = d, weights = ~ w)
>
>  	Asymptotic Kruskal-Wallis Test
>
> data:  y by x (1, 2, 3)
> chi-squared = 12.1176, df = 2, p-value = 0.002337
>
> The first two work by duplicating the data; the last one is more memory
> efficient, since it computes the weighted statistics (and their
> distribution) directly.
>
> However, as Thomas pointed out, other forms of weights are more difficult
> to deal with.
>
> Best wishes,
>
> Torsten
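
With integer case weights, every copy of an observation shares one midrank, and that midrank depends only on how much total weight is tied at each value. The usual tie-corrected Kruskal-Wallis statistic can therefore be computed from weighted rank sums without building the expanded data. The base-R sketch below illustrates this idea; `weighted_kw()` is a hypothetical helper, not coin's internal implementation, and it assumes `w` holds non-negative integer case weights in Torsten's sense. It checks itself against `kruskal.test()` run on the duplicated rows.

```r
## Hypothetical helper, not part of coin or stats: Kruskal-Wallis H for
## integer case weights, where w[i] counts identical copies of row i.
weighted_kw <- function(y, g, w) {
  g <- factor(g)
  N <- sum(w)                                  # size of the expanded sample
  u <- sort(unique(y))                         # distinct response values
  idx <- match(y, u)                           # distinct value of each row
  tw <- as.numeric(tapply(w, idx, sum))        # copies tied at each value
  mid <- cumsum(tw) - tw + (tw + 1) / 2        # midrank shared by those copies
  r <- mid[idx]                                # midrank of each original row
  Rj <- as.numeric(tapply(w * r, g, sum))      # weighted rank sum per group
  nj <- as.numeric(tapply(w, g, sum))          # weighted group size
  H <- 12 / (N * (N + 1)) * sum(Rj^2 / nj) - 3 * (N + 1)
  H / (1 - sum(tw^3 - tw) / (N^3 - N))         # standard correction for ties
}

## Same toy data as in Torsten's example
set.seed(29)
x <- gl(3, 10)
y <- rnorm(length(x), mean = c(0, 0, 1)[x])

## Compare with the statistic from the explicitly duplicated data
for (w in list(rep(2, length(y)), rep(1:3, 10))) {
  h_weighted <- weighted_kw(y, x, w)
  h_expanded <- unname(kruskal.test(rep(y, w), rep(x, w))$statistic)
  print(all.equal(h_weighted, h_expanded))  # expected: TRUE
}
```

An asymptotic p-value would follow from `pchisq(H, df = nlevels(x) - 1, lower.tail = FALSE)`. This only covers case weights; for precision or sampling weights, Thomas's caveat still applies.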

Thanks Torsten. This looks like the solution I was after.

Cheers,
Dylan

> > 	-thomas
> >
> > Thomas Lumley			Assoc. Professor, Biostatistics
> > tlumley at u.washington.edu	University of Washington, Seattle
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html and provide commented,
> > minimal, self-contained, reproducible code.
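
Thomas's "No" to duplicating observations in proportion to weights that are not case weights can be illustrated with a small simulation. The setup is an assumption chosen for illustration: three groups of 10 standard normal values with no true group difference, and a hypothetical weight of 2 on every observation that does not mean "two identical observations". Doubling the rows treats 60 values as independent when only 30 were measured, so the test should reject a true null far more often than the nominal 5%.

```r
## Type I error of the Kruskal-Wallis test at the 5% level under H0,
## with and without duplicating each observation (hypothetical weight 2).
set.seed(1)
g <- gl(3, 10)
n_sim <- 2000
p_raw <- p_dup <- numeric(n_sim)
for (s in seq_len(n_sim)) {
  y <- rnorm(length(g))                                   # H0 is true
  p_raw[s] <- kruskal.test(y, g)$p.value                  # the 30 real values
  p_dup[s] <- kruskal.test(rep(y, 2), rep(g, 2))$p.value  # each row copied
}
rate_raw <- mean(p_raw < 0.05)  # should be close to the nominal 0.05
rate_dup <- mean(p_dup < 0.05)  # inflated: the copies are not independent
c(raw = rate_raw, duplicated = rate_dup)
```

Doubling every row roughly doubles the statistic, so the duplicated analysis rejects several times more often than it should. The simulation shows the problem; it does not supply a valid weighted test.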



-- 
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341



