[R] population variance and sample variance

Bert Gunter gunter.berton at gene.com
Thu Feb 4 22:11:38 CET 2010


Well, a perverse view: ;-)

If using n vs n-1 makes a difference in the results, then you have too
little data (more properly, error df) to say much about the variance anyway:
n vs n-1 is the least of your problems. 
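
To put rough numbers on this: the two versions of the variance differ only
by the factor (n-1)/n, whatever the data are. A minimal sketch, with sample
sizes picked arbitrarily for illustration:

```r
# Ratio of the divide-by-n variance to the divide-by-(n-1) variance.
# It depends only on the sample size n, not on the data: (n - 1) / n.
n <- c(4, 10, 100, 1000)   # illustrative sample sizes (an assumption)
ratio <- (n - 1) / n
data.frame(n = n, ratio = ratio)
# ratio is 0.75, 0.9, 0.99, 0.999 for these n
```

At n = 4 the choice changes the answer by a quarter; at n = 1000 by a tenth
of a percent.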

Otherwise, choose whichever you're in the mood for. Just state which for
reproducibility's sake.

In other words, why waste any time or energy on such a pointless discussion?


Bert Gunter
Genentech Nonclinical Statistics


-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Greg Snow
Sent: Thursday, February 04, 2010 9:59 AM
To: Ista Zahn; Peng Yu
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] population variance and sample variance

Probably not a typo, but a different textbook used originally.  Statistics
is still a relatively young science, so we have not settled on a single set
of notation/symbols/jargon yet (look at intro textbooks: is p the population
proportion (with p-hat the sample) or is p the sample proportion (with pi as
the population)?).

I originally learned that dividing by n gives the 'population' variance
since if you have the entire population then mu is known exactly and you do
not need to correct for unknown mu.  You should only divide by n when you
have the entire population.  When you have a sample you need to divide by
n-1 to adjust for using the sample mean.

So from that I learned: population-divide by n; sample-divide by n-1.
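
The correction for using the sample mean can be checked by simulation. The
following is only a sketch under assumed settings (standard normal data,
n = 5, 20000 replicates, an arbitrary seed); none of these values come from
the thread:

```r
set.seed(1)       # arbitrary seed, only for reproducibility
n    <- 5         # illustrative sample size
mu   <- 0         # true mean, treated as known in the first estimator
reps <- 20000     # number of simulated samples

sims <- replicate(reps, {
  x <- rnorm(n, mean = mu, sd = 1)                     # true variance is 1
  c(known_mu   = sum((x - mu)^2) / n,                  # mu known, divide by n
    sample_n   = sum((x - mean(x))^2) / n,             # sample mean, divide by n
    sample_nm1 = sum((x - mean(x))^2) / (n - 1))       # sample mean, divide by n-1
})
avg_var <- rowMeans(sims)   # average of each estimator over all replicates
avg_var
```

With mu known, dividing by n averages out close to the true variance of 1.
With the sample mean plugged in, dividing by n averages close to
(n-1)/n = 0.8, and dividing by n-1 brings it back to about 1.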

But I have seen others take the approach that dividing a sample sum of
squares by n gives the variance of the sample data, while dividing by n-1
gives the estimate of the population variance.

So from that thinking: population-divide by n-1; sample-divide by n.

Both make sense, so to be clear it is best to just state the divisor rather
than using terms like population and sample and expecting to be unambiguous.
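
One way to state the divisor in code is a small wrapper that takes it as an
argument. The function below is a hypothetical helper (the name
var_with_divisor is made up here and is not part of base R), offered as a
sketch rather than a recommended implementation:

```r
# Hypothetical helper (not part of base R): variance with an explicit divisor.
var_with_divisor <- function(x, divisor = c("n-1", "n")) {
  divisor <- match.arg(divisor)      # defaults to "n-1", matching var()
  n  <- length(x)
  ss <- sum((x - mean(x))^2)         # sum of squared deviations from the sample mean
  if (divisor == "n") ss / n else ss / (n - 1)
}

set.seed(0)
x <- rnorm(4)                        # same setup as the example quoted below
var_with_divisor(x, "n-1")           # agrees with var(x)
var_with_divisor(x, "n")             # agrees with (3/4) * var(x)
```

Naming the argument after the divisor avoids having to decide which of the
two is the "population" version.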

I have also seen them referred to as unbiased (n-1) and maximum likelihood
(n), but these are not perfect descriptors once you start talking about
standard deviations rather than variances.
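
The reason is that the square root does not preserve unbiasedness: the
n-1 variance is unbiased, but its square root underestimates the standard
deviation on average. (The square root of the n version is still the
maximum likelihood estimate of the standard deviation for normal data.) A
sketch by simulation, again with assumed settings (standard normal data,
n = 5, 20000 replicates, arbitrary seed):

```r
set.seed(2)       # arbitrary seed, only for reproducibility
n    <- 5         # illustrative sample size
reps <- 20000     # number of simulated samples

sims_sd <- replicate(reps, {
  x <- rnorm(n)                      # true variance and true sd are both 1
  v <- var(x)                        # variance with divisor n - 1
  c(var_nm1 = v, sd_nm1 = sqrt(v))   # sqrt(v) is what sd(x) returns
})
avg_sd <- rowMeans(sims_sd)          # average of each quantity over replicates
avg_sd
```

The average variance comes out near 1, while the average standard deviation
comes out noticeably below 1.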

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Ista Zahn
> Sent: Tuesday, February 02, 2010 12:03 PM
> To: Peng Yu
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] population variance and sample variance
> 
> Probably a simple typo, but just to keep things straight: you want to
> divide by n when describing the standard deviation of a sample, and
> divide by n-1 when estimating a population standard deviation (your
> initial description had it backwards I think).
> 
> On Tue, Feb 2, 2010 at 5:25 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
> > On Mon, Oct 19, 2009 at 12:53 PM, Kingsford Jones
> > <kingsfordjones at gmail.com> wrote:
> >>> sum((x-mean(x))^2)/(n)
> >> [1] 0.4894708
> >>> ((n-1)/n) * var(x)
> >> [1] 0.4894708
> >
> > But there is no built-in function in R to do so, right?
> >
> >> hth,
> >> Kingsford
> >>
> >> On Mon, Oct 19, 2009 at 9:30 AM, Peng Yu <pengyu.ut at gmail.com> wrote:
> >>> It seems that var() computes sample variance. It is straightforward
> >>> to compute population variance from sample variance. However, I feel
> >>> that it is still convenient to have a function that can compute
> >>> population variance. Is there a population variance function available
> >>> in R?
> >>>
> >>> $ Rscript var.R
> >>>> set.seed(0)
> >>>> n = 4
> >>>> x = rnorm(n)
> >>>> var(x)
> >>> [1] 0.6526278
> >>>> sum((x-mean(x))^2)/(n-1)
> >>> [1] 0.6526278
> >>>>
> >>>
> >>> ______________________________________________
> >>> R-help at r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>>
> >>
> >
> 
> 
> 
> --
> Ista Zahn
> Graduate student
> University of Rochester
> Department of Clinical and Social Psychology
> http://yourpsyche.org
> 



