[R] Is it ok to apply the z.test this way?

Greg Snow Greg.Snow at imail.org
Fri Apr 16 21:35:19 CEST 2010


It would help if you could give more detail on what you are trying to accomplish.  You can get boundaries from a dataset using the quantile function, but it is not clear whether that is really what you want.  Asking about a sample size of 30 implies that you want to do some normal-based inference using your data, but you don't say what your ultimate question/goal is. (And 30 is just a rule of thumb: in some cases too conservative, in others too liberal.)
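
A minimal sketch of the quantile approach, for illustration only. The original Distribution vector is not shown in this thread, so the data here are simulated; the Beta(5, 5) shape, the seed, and the names bounds, normal_bounds, grid and outside are all assumptions made for the example, not anything taken from the poster's analysis.

```r
# Hypothetical stand-in for the poster's data: values in (0, 1).
# The Beta(5, 5) shape is an assumption made only for illustration.
set.seed(1)
Distribution <- rbeta(1000, shape1 = 5, shape2 = 5)

# Empirical boundaries that cut off 2.5% in each tail (5% in total).
bounds <- quantile(Distribution, probs = c(0.025, 0.975))

# For comparison, the normal-theory boundaries that the z.test loop
# effectively computes: mean +/- 1.96 standard deviations.
normal_bounds <- mean(Distribution) + c(-1.96, 1.96) * sd(Distribution)

# Candidate values on the same grid that fall outside the empirical boundaries.
grid <- seq(0, 1, by = 0.001)
outside <- grid[grid < bounds[1] | grid > bounds[2]]

print(bounds)
print(normal_bounds)
```

The empirical quantiles make no normality assumption, which may matter for data confined to 0-1. Whether either set of boundaries answers the real question still depends on what that question is.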

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111


> -----Original Message-----
> From: Atte Tenkanen [mailto:attenka at utu.fi]
> Sent: Friday, April 16, 2010 1:22 PM
> To: Greg Snow
> Cc: r-help at r-project.org
> Subject: Re: RE: [R] Is it ok to apply the z.test this way?
> 
> Thanks,
> 
> OK. My question is whether there is any reasonable way to find p=0.05
> boundaries for such a random distribution. Unfortunately I'm not a
> statistician and thus I'm not sure whether this question even makes
> sense... Should we always consider samples of, say, more than 30
> individuals?
> 
> Atte Tenkanen
> University of Turku, Finland
> Department of Musicology
> +35823335278
> http://users.utu.fi/attenka/
> 
> ----- Original Message -----
> From: Greg Snow <Greg.Snow at imail.org>
> Date: Friday, April 16, 2010 10:07 pm
> Subject: RE: [R] Is it ok to apply the z.test this way?
> To: Atte Tenkanen <attenka at utu.fi>, "r-help at r-project.org" <r-help at r-project.org>
> 
> > Several points:
> >
> > 1. The Shapiro test does not tell you that something is normal or
> > highly normal, only that you don't have enough evidence to disprove
> > that the data came from a normal population (powered for a certain
> > type of deviation from normality).
> >
> > 2. The z.test function is intended to be used as a stepping stone in
> > learning for students, a simple test with unrealistic assumptions to
> > get the ideas, then relax the assumptions and learn about t tests and
> > others.
> >
> > 3.  The z test is only used when the population standard deviation is
> > known; you calculate the sd from the data, and that is what t tests
> > are for.
> >
> > 4.  Calculating the hypothesized mean from the data is backwards.
> >
> > 5.  Using a sample size of 1 is questionable; doing this 1,000 times
> > without correction is even more questionable.
> >
> > 6.  Your code is equivalent to:
> >
> > tmp <- seq(0,1, by=0.001)
> > tmp2 <- tmp[ abs(tmp-mean(Distribution))/sd(Distribution) > 1.96 ]
> >
> > just slower and less memory efficient.
> >
> > 7. None of this establishes what is from an unknown distribution.
> >
> > If you can tell us what your real question is, then maybe we can help
> > with a real solution.
> >
> > So to answer your question of whether it is OK to use z.test in that way:
> > Legally, the license says you can use it any way you want;
> > ethically/morally/aesthetically/or following the intent of the author,
> > No!
> >
> > --
> > Gregory (Greg) L. Snow Ph.D.
> > Statistical Data Center
> > Intermountain Healthcare
> > greg.snow at imail.org
> > 801.408.8111
> >
> >
> > > -----Original Message-----
> > > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> > > On Behalf Of Atte Tenkanen
> > > Sent: Friday, April 16, 2010 10:11 AM
> > > To: r-help at r-project.org
> > > Subject: [R] Is it ok to apply the z.test this way?
> > >
> > > Dear R-users,
> > >
> > > I want to check whether certain values come from a random distribution
> > > that includes values between 0 and 1. So, it is not really normal even
> > > though shapiro.test says it is highly normal... Can I do something like
> > > this and trust that the values given are right? z.test is from the
> > > package TeachingDemos.
> > > -------------------------------------------------------------------
> > > SelectedVals=c()
> > > for(i in seq(0,1,by=0.001))
> > > {
> > > 	if((z.test(i, mu=mean(Distribution), stdev=sd(Distribution))$p.value)<=0.05)
> > > 		SelectedVals=c(SelectedVals,i)
> > > }
> > >
> > > -------------------------------------------------------------------
> > > I have marked the border values given by this script on the histogram
> > > of the original random distribution:
> > >
> > > http://www.ag.fimug.fi/~Atte/62Hist100410.pdf
> > >
> > > Atte Tenkanen
> > > University of Turku, Finland
> > > Department of Musicology
> > > +35823335278
> > > http://users.utu.fi/attenka/
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.


