[R] Is it ok to apply the z.test this way?

Sat Apr 17 09:32:29 CEST 2010

Hi,

Thank you to everyone who answered my question. I think I learned a lot, although there are still things and concepts I have to ruminate on. As for the questions about the plots:

In this case, I have segmented music into so-called pitch-class sets and then transformed them into 'set classes', equivalence classes from musical set theory. These classes have been compared with a comparison set class (csc) using a similarity function. Suppose the csc is the diatonic scale (similar to the white keys on the piano): if there is a diatonic segment in the piece under study, that segment gets the value 1. Segments that are very different in nature (according to the similarity function) get values nearer to 0.

I can detect rhythms, tonalities, etc., and their combinations as well. I call this method Comparison Structure Analysis or C. Set A.

I have used R for many years as a programming environment in my research, but not its statistical capabilities. From now on, I have to focus more on statistics.
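
The quantile approach suggested in the reply quoted below can be sketched roughly as follows. This is only a minimal sketch under assumptions: `Distribution` is simulated here with `rbeta` as a hypothetical stand-in for the real similarity values, and the names `bounds` and `outside` are illustrative, not from the thread.

```r
# Hypothetical stand-in for the real data: 1000 values between 0 and 1.
set.seed(1)
Distribution <- rbeta(1000, shape1 = 5, shape2 = 3)

# Empirical 2.5% and 97.5% quantiles: about 95% of the observed
# values lie between these two boundaries.
bounds <- quantile(Distribution, probs = c(0.025, 0.975))

# Observed values falling outside the empirical boundaries.
outside <- Distribution[Distribution < bounds[1] | Distribution > bounds[2]]
```

Unlike boundaries built from mean(Distribution) +/- 1.96 * sd(Distribution), these always stay inside the range of the observed data and do not assume normality. Whether such boundaries answer the actual research question is a separate matter.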

Atte

> It would help if you could give more detail on what you are trying to 
> accomplish.  You can get boundaries from a dataset using the quantile 
> function, but it is not clear if that is really what you want or not.  
> Asking about a sample size of 30 implies that you want to do some 
> normal based inference using your data, but you don't say what your 
> ultimate question/goal is. (and 30 is just a rule of thumb, in some 
> cases too conservative, in others too liberal).
> 
> -- 
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.snow at imail.org
> 801.408.8111
> 
> 
> > -----Original Message-----
> > From: Atte Tenkanen [mailto:attenka at utu.fi]
> > Sent: Friday, April 16, 2010 1:22 PM
> > To: Greg Snow
> > Cc: r-help at r-project.org
> > Subject: Re: RE: [R] Is it ok to apply the z.test this way?
> > 
> > Thanks,
> > 
> > OK. My question is whether there is any reasonable way to find p=0.05
> > boundaries for such a random distribution? Unfortunately I'm not a
> > statistician and thus I'm not sure if this question even makes
> > sense... Should we always consider samples of, say, more than 30
> > individuals?
> > 
> > Atte Tenkanen
> > University of Turku, Finland
> > Department of Musicology
> > +35823335278
> > http://users.utu.fi/attenka/
> > 
> > ----- Original Message -----
> > From: Greg Snow <Greg.Snow at imail.org>
> > Date: Friday, April 16, 2010 10:07 pm
> > Subject: RE: [R] Is it ok to apply the z.test this way?
> > To: Atte Tenkanen <attenka at utu.fi>, "r-help at r-project.org" <r-help at r-project.org>
> > 
> > > Several points:
> > >
> > > 1. The Shapiro test does not tell you that something is normal or
> > > highly normal, only that you don't have enough evidence to disprove
> > > that the data came from a normal population (powered for a certain
> > > type of deviation from normality).
> > >
> > > 2. The z.test function is intended to be used as a stepping stone in
> > > learning for students, a simple test with unrealistic assumptions to
> > > get the ideas, then relax the assumptions and learn about t tests and
> > > others.
> > >
> > > 3.  The z test is only used when the population standard deviation is
> > > known, you calculate the sd from the data, that is what t tests are
> > > for.
> > >
> > > 4.  Calculating the hypothesized mean from the data is backwards.
> > >
> > > 5.  using a sample size of 1 is questionable, doing this 1,000 times
> > > without correction is even more questionable.
> > >
> > > 6.  Your code is equivalent to:
> > >
> > > tmp <- seq(0,1, by=0.001)
> > > tmp2 <- tmp[ abs(tmp-mean(Distribution))/sd(Distribution) > 1.96 ]
> > >
> > > just slower and less memory efficient.
> > >
> > > 7. None of this establishes what is from an unknown distribution.
> > >
> > > If you can tell us what your real question is, then maybe we can help
> > > with a real solution.
> > >
> > > So to answer your question of if it is ok to use z.test in that way:
> > > Legally the license says you can use it any way you want,
> > > ethically/morally/aesthetically/or following the intent of the author,
> > > No!
> > >
> > > --
> > > Gregory (Greg) L. Snow Ph.D.
> > > Statistical Data Center
> > > Intermountain Healthcare
> > > greg.snow at imail.org
> > > 801.408.8111
> > >
> > >
> > > > -----Original Message-----
> > > > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Atte Tenkanen
> > > > Sent: Friday, April 16, 2010 10:11 AM
> > > > To: r-help at r-project.org
> > > > Subject: [R] Is it ok to apply the z.test this way?
> > > >
> > > > Dear R-users,
> > > >
> > > > I want to check if certain values are from random distribution, that
> > > > includes values between 0-1. So, it is not really normal even though
> > > > shapiro.test says it is highly normal... Can I do something like this
> > > > and think that the values given are right. z.test is from package
> > > > TeachingDemos.
> > > > ------------------------------------------------------------------------------
> > > > SelectedVals=c()
> > > > for(i in seq(0,1,by=0.001))
> > > > {
> > > > 	if((z.test(i, mu=mean(Distribution), stdev=sd(Distribution))$p.value)<=0.05)
> > > > 		SelectedVals=c(SelectedVals,i)
> > > > }
> > > >
> > > > ------------------------------------------------------------------------------
> > > > I have marked the border values given by this script to the histogram
> > > > of the original random distribution:
> > > >
> > > > http://www.ag.fimug.fi/~Atte/62Hist100410.pdf
> > > >
> > > > Atte Tenkanen
> > > > University of Turku, Finland
> > > > Department of Musicology
> > > > +35823335278
> > > > http://users.utu.fi/attenka/