[R] Use of geometric mean .. in good data analysis

Martin Maechler
Mon Jan 22 18:18:36 CET 2024


>>>>> Rich Shepard 
>>>>>     on Mon, 22 Jan 2024 07:45:31 -0800 (PST) writes:

    > A statistical question, not specific to R.  I'm asking for
    > a pointer for a source of definitive descriptions of what
    > types of data are best summarized by the arithmetic,
    > geometric, and harmonic means.

Even though this is off-topic:

I think it is a good question, not really only about
geo-chemistry, but about statistics in applied sciences (and
engineering for that matter).

This is something I am sure good applied statisticians of the 1980s and
1990s would all have known the answer to:

Using the geometric mean instead of the arithmetic mean
is basically  *equivalent* to  first log-transforming the data
and then working with the transformed data:
not just for computing an average, but for more relevant modelling,
inference, etc.

John W. Tukey (and several other greats of the time)
had the log transform among the  "First aid transformations":

If the data for a continuous variable must all be positive, it is
also typically the case that the distribution is considerably
skewed to the right.
In such a case, behave as a good human who sees another human in
health distress: apply First Aid -- do the things you learned to
do quickly without too much thought, because things must happen
fast, to hopefully save the other's life.

Here: do log-transform all such variables without further ado,
and only afterwards start your (exploratory and further) data analysis.
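
A minimal sketch in R of what this first-aid step does, assuming made-up
lognormal data (the vector y below is purely illustrative, not from any
real measurement): on the raw scale the mean is pulled well above the
median, while on the log scale the two agree.

```r
# Illustrative data only: evenly spaced quantiles of a standard lognormal,
# i.e. positive and strongly right-skewed, with no randomness involved.
y <- qlnorm(ppoints(999))

# Raw scale: the long right tail pulls the mean above the median.
mean(y)
median(y)

# Log scale: the values are symmetric around 0, so mean and median agree.
ly <- log(y)
mean(ly)
median(ly)
```

With real data the log-scale symmetry will only be approximate, of course.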

Now,  mean(log(y)) = log(geometricmean(y)), 
where mean() is the arithmetic mean as in R
{mathematically; on the computer you need all.equal(), not '==' !!}
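
The identity can be checked with a short sketch. Note that
geometricmean() is a hypothetical helper defined here for illustration;
it is not a base R function, and the values in y are made up.

```r
# Hypothetical helper (not in base R): the n-th root of the product.
geometricmean <- function(y) prod(y)^(1 / length(y))

y <- c(1, 10, 100)            # made-up positive values

lhs <- mean(log(y))           # arithmetic mean on the log scale
rhs <- log(geometricmean(y))  # log of the geometric mean

# Compare with a numerical tolerance; '==' may fail from rounding alone.
all.equal(lhs, rhs)
```

For long vectors prod(y) can overflow or underflow, so in practice
exp(mean(log(y))) is usually the safer way to compute the geometric mean.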

I.e., according to Tukey and all the other experienced applied
statisticians of the past, the geometric mean is the "best thing"
to do for such positive right-skewed data, in the same sense
that the log-transform is the best "a priori" transformation for
such data -- with even the one advantage that, whereas you need to
fiddle with zeroes when log-transforming, the geometric mean
already works for zeroes.
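
A small sketch of the zero case, again with made-up numbers: the log of
zero is -Inf, which is what forces the fiddling on the log scale, while
the geometric mean itself is still defined. Note, though, that a single
zero makes it exactly zero.

```r
y0 <- c(0, 2, 8)                        # made-up data containing one zero

log_y0 <- log(y0)                       # first element is -Inf
any(is.infinite(log_y0))                # the log scale needs special handling

gm_prod <- prod(y0)^(1 / length(y0))    # product form of the geometric mean
gm_log  <- exp(mean(log_y0))            # log form: exp(-Inf)

c(gm_prod, gm_log)                      # both are 0
```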

Martin


    > As an aquatic ecologist I see regulators apply the
    > geometric mean to geochemical concentrations rather than
    > using the arithmetic mean. I want to know whether the
    > geometric mean of a set of chemical concentrations (e.g.,
    > in mg/L) is an appropriate representation of the expected
    > value. If not, I want to explain this to non-technical
    > decision-makers; if so, I want to understand why my
    > assumption is wrong.

    > TIA,

    > Rich
