[R] Use of geometric mean .. in good data analysis

John Fox j|ox @end|ng |rom mcm@@ter@c@
Mon Jan 22 18:36:40 CET 2024


Dear Martin,

Helpful general advice, although it's perhaps worth mentioning that the 
geometric mean, defined e.g. naively as prod(x)^(1/length(x)), is 
necessarily 0 if there are any 0 values in x. That is, the geometric 
mean "works" in this case but isn't really informative.
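
A minimal R sketch of the point above, assuming made-up data. The helper
names geo_mean_naive() and geo_mean_log() are hypothetical and are not part
of base R. It compares the naive product formula with the log-based form
exp(mean(log(x))), and shows what each returns when x contains a zero.

```r
# Hypothetical helpers for illustration only; neither is a base R function.
geo_mean_naive <- function(x) prod(x)^(1 / length(x))  # product formula
geo_mean_log   <- function(x) exp(mean(log(x)))        # via the log transform

y <- c(1, 2, 4, 8)  # made-up, strictly positive values

# mean(log(y)) equals log(geometric mean) up to floating-point error,
# so compare with all.equal() rather than '=='.
identity_holds <- isTRUE(all.equal(mean(log(y)), log(geo_mean_naive(y))))

# A single zero forces the geometric mean to 0, whatever the other values are.
y0 <- c(0, 2, 4, 8)
gm0_naive <- geo_mean_naive(y0)  # prod(y0) is 0, so the result is 0
gm0_log   <- geo_mean_log(y0)    # log(0) is -Inf, and exp(-Inf) is 0

# The product formula can also overflow where the log form does not.
big <- c(1e200, 1e200)
gm_big_naive <- geo_mean_naive(big)  # prod(big) overflows to Inf
gm_big_log   <- geo_mean_log(big)    # stays finite, close to 1e200
```

Both forms return 0 for y0, which is the "works but is not informative"
case: the result says nothing about the nonzero values.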

Best,
  John
-- 
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://www.john-fox.ca/

On 2024-01-22 12:18 p.m., Martin Maechler wrote:
> 
>>>>>> Rich Shepard
>>>>>>      on Mon, 22 Jan 2024 07:45:31 -0800 (PST) writes:
> 
>      > A statistical question, not specific to R.  I'm asking for
>      > a pointer for a source of definitive descriptions of what
>      > types of data are best summarized by the arithmetic,
>      > geometric, and harmonic means.
> 
> Although this is off-topic:
> 
> I think it is a good question, not really only about
> geo-chemistry, but about statistics in applied sciences (and
> engineering for that matter).
> 
> This is something I am sure good applied statisticians in the 1980s
> and 1990s would all have known the answer to:
> 
> Using the geometric mean instead of the arithmetic mean
> is basically *equivalent* to first log-transforming the data
> and then working with the transformed data,
> not just for computing an average, but for more relevant modelling,
> inference, etc.
> 
> John W Tukey (and several others of the greats of the time)
> had the log transform among the "First aid transformations":
> 
> If the data for a continuous variable must all be positive it is
> also typically the case that the distribution is considerably
> skewed to the right.
> In such a case, behave like a good person who sees another person in
> health distress: apply First Aid, i.e., do the things you learned to
> do quickly without too much thought, because things must happen
> fast to hopefully save the other's life.
> 
> Here: log-transform all such variables without further ado,
> and only afterwards start your (exploratory and further) data analysis.
> 
> Now,  mean(log(y)) = log(geometricmean(y)),
> where mean() is the arithmetic mean as in R
> {mathematically; on the computer you need all.equal(), not '==' !!}
> 
> I.e., according to Tukey and all the other experienced applied
> statisticians of the past, the geometric mean is the "best thing"
> to do for such positive right-skewed data, in the same sense
> that the log transform is the best "a priori" transformation for
> such data. It even has one advantage: you need to fiddle
> with zeroes when log-transforming, whereas the geometric mean
> already works for zeroes.
> 
> Martin
> 
> 
>      > As an aquatic ecologist I see regulators apply the
>      > geometric mean to geochemical concentrations rather than
>      > using the arithmetic mean. I want to know whether the
>      > geometric mean of a set of chemical concentrations (e.g.,
>      > in mg/L) is an appropriate representation of the expected
>      > value. If not, I want to explain this to non-technical
>      > decision-makers; if so, I want to understand why my
>      > assumption is wrong.
> 
>      > TIA,
> 
>      > Rich
> 
>      > ______________________________________________
>      > R-help using r-project.org mailing list -- To UNSUBSCRIBE and
>      > more, see https://stat.ethz.ch/mailman/listinfo/r-help
>      > PLEASE do read the posting guide
>      > http://www.R-project.org/posting-guide.html and provide
>      > commented, minimal, self-contained, reproducible code.
> 


