[R] Wilcoxon signed rank test and its requirements

Fri Jun 25 20:55:38 CEST 2010

Let me see if I understand. You actually have data for the whole population (the entire piece), and you have some predefined sections that you want to check for differences from the population or, more meaningfully, from a randomly selected set of measures. Is that correct?

If so, since you have the entire population of interest, you can construct the actual sampling distribution (or a good approximation of it). Take random samples from the population of the given size (matching the subset you are interested in) and calculate the mean (or other statistic of interest) of each; 10,000 to 1,000,000 samples is probably enough. Then compare the value from your predefined subset with the set of random values you generated to see whether it falls in the tail.
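A minimal Python sketch of this resampling procedure follows. It assumes the population is available as a flat list of numbers and that the predefined subset is compared against random subsets of individual values of the same size, drawn without replacement. The function name `subset_mean_tail_probability`, its parameters, and the add-one correction in the p-value estimates are illustrative choices, not part of the advice above.

```python
import random


def subset_mean_tail_probability(population, subset, n_draws=10_000, seed=1):
    """Compare the mean of a predefined subset with the means of random
    subsets of the same size drawn (without replacement) from the population.

    Returns (observed mean, upper-tail p, two-sided p).
    """
    rng = random.Random(seed)
    k = len(subset)
    observed = sum(subset) / k
    pop_mean = sum(population) / len(population)

    # Approximate sampling distribution of the mean of k values.
    draws = [sum(rng.sample(population, k)) / k for _ in range(n_draws)]

    # Upper tail: how often a random subset mean is at least as large.
    # The +1 terms (a common convention) keep the estimate above zero.
    upper = (1 + sum(d >= observed for d in draws)) / (n_draws + 1)

    # Two-sided: how often a random mean is at least as far from the
    # population mean as the observed one.
    gap = abs(observed - pop_mean)
    two_sided = (1 + sum(abs(d - pop_mean) >= gap for d in draws)) / (n_draws + 1)
    return observed, upper, two_sided


# Hypothetical toy data, not the tonal-distance values from this thread.
population = list(range(100))
subset = list(range(90, 100))
print(subset_mean_tail_probability(population, subset))
```

With the numbers from this thread, `population` would hold the 2,418 values and `subset` the 250-300 values from the chosen regions; `n_draws` can be raised toward 1,000,000 if run time allows.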

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Atte Tenkanen
> Sent: Thursday, June 24, 2010 11:04 PM
> To: David Winsemius
> Cc: R mailing list
> Subject: Re: [R] Wilcoxon signed rank test and its requirements
> 
> The values come from the following process:
> The musical composition is segmented into so-called 'pitch-class
> segments', and these segments are compared with one reference set using
> a distance function. Only a limited number of distinct distance values
> are possible. These distance values can be averaged over bars, which
> produces a smoother distribution and makes the 'comparison curve' (the
> distances from the reference set plotted through the piece) more
> readable (see e.g. http://users.utu.fi/attenka/with6.jpg), but I would
> prefer to use the original values.
> 
> Then I want to pick only some regions of the piece and test whether
> the values in those regions are higher than the mean of all values.
> 
> Atte
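A minimal sketch of the per-bar averaging and region selection described above, assuming each pitch-class segment has already been assigned to a single bar and given a distance value (the segmentation and the distance function themselves are not shown in the thread). `average_by_bar` and `region_values` are hypothetical helper names; the second collects the original, unaveraged values for a range of bars, which are the values Atte says he would prefer to use.

```python
from collections import defaultdict


def average_by_bar(segments):
    """Average the distance values of all segments in each bar.

    segments: iterable of (bar, distance) pairs, one per pitch-class segment
    (hypothetical layout; each segment is assumed to belong to one bar).
    Returns a dict {bar: mean distance}, ordered by bar number.
    """
    by_bar = defaultdict(list)
    for bar, distance in segments:
        by_bar[bar].append(distance)
    return {bar: sum(d) / len(d) for bar, d in sorted(by_bar.items())}


def region_values(segments, first_bar, last_bar):
    """Original (unaveraged) distances of the segments in bars first_bar..last_bar."""
    return [distance for bar, distance in segments if first_bar <= bar <= last_bar]


# Toy example with made-up values.
segments = [(1, 2), (1, 4), (2, 5), (3, 1), (3, 2), (3, 6)]
print(average_by_bar(segments))        # {1: 3.0, 2: 5.0, 3: 3.0}
print(region_values(segments, 2, 3))   # [5, 1, 2, 6]
```

The region values are what would be compared against the mean of all values, for example with the random-subset comparison Greg describes.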
> 
> > On Jun 24, 2010, at 6:58 PM, Atte Tenkanen wrote:
> >
> > > Is there anything for me?
> > >
> > > There is a lot of data (n = 2418), but there are also a lot of ties.
> > > My sample has n ≈ 250-300.
> > >
> >
> > I do not understand why there should be so many ties. You have not
> > described the measurement process or units (... although you offer a
> > glimpse without much background later).
> >
> > > I would like to test whether the mean of the sample differs
> > > significantly from the population mean.
> >
> > Why? What is the purpose of this investigation? Why should the mean of
> > a sample be that important?
> >
> > >
> > > The histogram of the population looks like the attached one. What
> > > test should I use? Are there no options?
> > >
> > > This distribution comes from a musical piece and the values are
> > > 'tonal distances'.
> > >
> > > http://users.utu.fi/attenka/Hist.png
> >
> > That picture does not offer much insight into the features of that
> > measurement. It appears to have much more structure than I would
> > expect for a sample from a smooth unimodal underlying population.
> >
> > --
> > David.
> >
> > >
> > > Atte
> > >
> > >> On 06/24/2010 12:40 PM, David Winsemius wrote:
> > >>>
> > >>> On Jun 23, 2010, at 9:58 PM, Atte Tenkanen wrote:
> > >>>
> > >>>> Thanks. What I have been meaning to ask is this:
> > >>>>
> > >>>> How do you test whether the data are symmetric enough?
> > >>>> If they are not, is it OK to use some data transformation?
> > >>>>
> > >>>> This is in light of the following statement:
> > >>>>
> > >>>> "The Wilcoxon signed rank test does not assume that the data are
> > >>>> sampled from a Gaussian distribution. However it does assume that
> > >>>> the data are distributed symmetrically around the median. If the
> > >>>> distribution is asymmetrical, the P value will not tell you much
> > >>>> about whether the median is different than the hypothetical value."
> > >>>
> > >>> You are being misled. Simply finding a statement on a statistics
> > >>> software website, even one as reputable as GraphPad (???), does not
> > >>> mean that it is necessarily true. My understanding (confirmed by
> > >>> reviewing "Nonparametric statistical methods for complete and
> > >>> censored data" by M. M. Desu and Damaraju Raghavarao) is that the
> > >>> Wilcoxon signed-rank test does not require that the underlying
> > >>> distributions be symmetric. The above quotation is highly inaccurate.
> > >>>
> > >>
> > >> To add to what David and others have said, look at the kernel that
> > >> the U-statistic associated with the WSR test uses: the indicator (0/1)
> > >> of x_i + x_j > 0. So WSR tests H0: p = 0.5, where p = the probability
> > >> that the average of a randomly chosen pair of values is positive. [If
> > >> there are ties, this probably needs to be worded as
> > >> P[x_i + x_j > 0] = P[x_i + x_j < 0], i ≠ j.]
> > >> Frank
> > >>
> > >> --
> > >> Frank E Harrell Jr   Professor and Chairman        School of Medicine
> > >>                      Department of Biostatistics   Vanderbilt University
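Frank Harrell's description of the kernel can be checked numerically. For data with no zeros and no tied absolute values, the Wilcoxon signed-rank statistic W+ (the sum of the ranks of |x_i| over the positive x_i) equals the number of index pairs i ≤ j whose sum x_i + x_j is positive, so the proportion of positive pairwise sums estimates p in H0: p = 0.5. This is a minimal sketch with hypothetical function names; it counts the i = j pairs because that makes the count equal W+ exactly, whereas Frank's wording for the tied case refers to pairs with i ≠ j.

```python
from itertools import combinations_with_replacement


def signed_rank_w_plus(x):
    """Sum of the ranks of |x_i| over the positive x_i.

    Assumes no zeros and no tied absolute values (no tie handling here).
    """
    # Indices ordered by absolute value; rank 1 is the smallest |x_i|.
    order = sorted(range(len(x)), key=lambda i: abs(x[i]))
    rank = {i: r for r, i in enumerate(order, start=1)}
    return sum(rank[i] for i, v in enumerate(x) if v > 0)


def walsh_counts(x):
    """Count pairwise sums x_i + x_j over all index pairs with i <= j.

    Returns (positive, negative, zero); zero sums arise only from zeros
    in the data or from tied absolute values of opposite sign.
    """
    positive = negative = zero = 0
    for a, b in combinations_with_replacement(x, 2):
        s = a + b
        if s > 0:
            positive += 1
        elif s < 0:
            negative += 1
        else:
            zero += 1
    return positive, negative, zero


x = [1.5, -0.4, 2.2, -3.1, 0.7]
pos, neg, zero = walsh_counts(x)
print(signed_rank_w_plus(x), pos, neg, zero)  # 9 9 6 0
print(pos / (pos + neg + zero))               # 0.6, an estimate of P[x_i + x_j > 0]
```

For this toy vector, both W+ and the number of positive pairwise sums are 9 out of 15 pairs.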
> >
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.