[R] mode

(Ted Harding) Ted.Harding at nessie.mcc.ac.uk
Sat Dec 13 13:07:20 CET 2003


Douglas Bates wrote:
> "Christian Mora" <christian_mora at vtr.net> writes:
>>How can I get the mode (most frequent value) from a dataset
>>(continuous variables)? I can obtain it when the data is discrete
>>(by making a table and looking at the highest frequency) but I
>>don't know how to obtain it from, for example, a density plot of the
>>data. Does anyone know how to do it? Thanks
> 
> 
> I don't think the mode of a sample from a continuous random variable
> is well defined.

Indeed it is not (except for categorical variables and even then may
not be unique). All other suggestions for finding a mode from a sample
from a continuous distribution encounter the issues that

a) each sample value occurs only once, so every sample value is a mode!

b) the mode of a distribution is the location of the maximum of the
   probability density function, and this is not defined until you
   have stated what the underlying measure is, with respect to
   which you derive the density.

   This issue has its counterpart in that "mode" is not invariant
   under functional transformation, even if you have stated the
   base measure: in other words, the mode of X^2 is not the same
   as (the mode of X)^2.
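
   A small numerical check of this non-invariance, as a sketch only.
   The lognormal distribution is an assumption chosen for illustration
   (it is not from the thread): if X is lognormal with meanlog 0 and
   sdlog 1, then X^2 is lognormal with meanlog 0 and sdlog 2, and the
   mode of a lognormal is exp(meanlog - sdlog^2).

```r
# Assumed example distribution: X ~ lognormal(meanlog = 0, sdlog = 1),
# so that X^2 ~ lognormal(meanlog = 0, sdlog = 2).
# Locate each mode by maximising the density numerically.
mode_X  <- optimize(function(x) dlnorm(x, 0, 1), c(0, 5),
                    maximum = TRUE, tol = 1e-10)$maximum
mode_X2 <- optimize(function(y) dlnorm(y, 0, 2), c(0, 5),
                    maximum = TRUE, tol = 1e-10)$maximum

mode_X      # close to exp(-1), about 0.368
mode_X^2    # close to exp(-2), about 0.135
mode_X2     # close to exp(-4), about 0.0183: not equal to mode_X^2
```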

The implication is that any suggestion for finding a mode from a
sample amounts to computing a density function from the sample
(and a histogram is a kind of discretised density estimate), so
several approaches are open:

1) Choose a class of distributions (e.g. normal), estimate the
   parameters, and find the maximum of the estimated density function.
2) Adopt a "local" suggestion such as Murray Jorgensen's "nearest
   neighbour" idea.
3) Go down the road of more general distribution-free density
   estimation, for which one approach is "kernel density
   estimation" and another could be a "spline" density estimate.
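
Approach (1) might be sketched as follows. Everything specific here
is an assumption for illustration: the gamma family, the simulated
data, and the use of method-of-moments estimates in place of maximum
likelihood. For a gamma density with shape > 1 the mode is
(shape - 1)/rate.

```r
set.seed(1)                              # arbitrary seed, for reproducibility
x <- rgamma(2000, shape = 3, rate = 2)   # simulated data; true mode is (3 - 1)/2 = 1

# Method-of-moments estimates: mean = shape/rate, variance = shape/rate^2
m <- mean(x)
v <- var(x)
shape_hat <- m^2 / v
rate_hat  <- m / v

# Mode of the fitted gamma density (the density peaks at 0 when shape <= 1)
mode_hat <- if (shape_hat > 1) (shape_hat - 1) / rate_hat else 0
mode_hat
```

The answer is only as good as the chosen family: if the data are not
close to gamma, the fitted mode can be far from any visible peak.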

Example of kernel density estimation:

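  X <- c(rnorm(200), 2 + 0.5*rnorm(300))       # mixture of two normals
  hist(X, freq=FALSE, breaks=(-4) + 0.2*(0:50))
  S <- density(X, from=(-4), to=5, bw=0.2)     # kernel density estimate
  N <- length(S$y)
  # a grid point is a local mode if its density exceeds both neighbours
  V1 <- S$y[1:(N-2)]; V2 <- S$y[2:(N-1)]; V3 <- S$y[3:N]
  ix <- 1 + which((V1 < V2) & (V2 > V3))
  lines(S$x, S$y, col="red")
  points(S$x[ix], S$y[ix], col="blue")         # mark the local modes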

where the index ix identifies all the local modes of the fitted
kernel density estimate S. These include the global mode[s]:

  S$x[which(S$y==max(S$y))]
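
The same search can be written more compactly, as a sketch. The seed
is an arbitrary assumption added for reproducibility; which.max()
returns only the first maximum, so it does not report ties.

```r
set.seed(1)                                    # arbitrary seed
X <- c(rnorm(200), 2 + 0.5 * rnorm(300))       # same mixture as in the example
S <- density(X, from = -4, to = 5, bw = 0.2)   # kernel density estimate

# Global mode: the grid point with the largest estimated density
global_mode <- S$x[which.max(S$y)]

# Local modes: points where the slope changes from positive to negative
turn <- diff(sign(diff(S$y)))
local_modes <- S$x[which(turn == -2) + 1]

global_mode
local_modes
```

The modes found depend on the bandwidth bw: a small value tends to
produce spurious bumps, while a large one can merge the two
components into a single peak.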

This seems in fact to come back to Christian Mora's original
question, and I hope it helps to answer it.

(And I hope it helps answer Murray's concern that the thread was
getting off-topic, since we're now back to R ... !)

Best wishes to all,
Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 167 1972
Date: 13-Dec-03                                       Time: 12:07:20
------------------------------ XFMail ------------------------------



