[R] Diff btw percentile and quantile

Wed Mar 4 18:29:52 CET 2009

On 04-Mar-09 16:56:14, Wacek Kusnierczyk wrote:
> (Ted Harding) wrote:
> <snip>
>> So, with reference to your original question
>>  "Excel has percentile() function. R function quantile() does the
>>   same thing. Is there any significant difference btw percentile
>>   and quantile?"
>> the answer is that they in effect give the same results, though
>> differ with respect to how they are to be fed (quantile eats
>> probabilities, percentile eats percentages). [Though (since I am
>> not familiar with Excel) I cannot rule out that Excel's percentile()
>> function also eats probabilities; in which case its name would be
>> an example of sloppy nomenclature on Excel's part; which I cannot
>> rule out on general grounds either].
> 
> i am not familiar enough with excel to prove or disprove what you say
> above, but in general such claims should be grounded in the respective
> documentations. 
> 
> there are a number of ways to compute empirical quantiles (see, e.g.,
> [1]), and it's possible that the one used by r's quantile by default
> (see ?quantile) is not the one used by excel (where you probably have
> no choice;  help in oocalc does not specify the method, and i guess
> that excel's does not either).
> 
> have you actually confirmed that excel's percentile() does the same as
> r's quantile() (modulo the scaling)?
> vQ

I have now googled around a bit. All references to the Excel
percentile() function say that you feed it the fractional value
corresponding to the percentage. So, for example, to get the
80-th percentile you would give it 0.8. Hence Excel should call
it "quantile"!

As to the algorithm, Wikipedia states the following (translated
into R syntax):

  Many software packages, such as Microsoft Excel, use the
  following method recommended by NIST[4] to estimate the
  value, vp, of the pth percentile of an ascending ordered
  dataset containing N elements with values v[1],v[2],...,v[N]:

    n = (p/100)*(N-1) + 1

  n is then split into its integer component, k and decimal
  component, d, such that n = k + d.
  If k = 1, then the value for that percentile, vp, is the
  first member of the ordered dataset, v[1].
  If k = N, then the value for that percentile, vp, is the
  Nth member of the ordered dataset, v[N].
  Otherwise, 1 < k < N and vp = v[k] + d*(v[k + 1] - v[k]).

Note that the Wikipedia article uses the "%" interpretation of
"p-th percentile", i.e. the point which is (p/100) of the way
along the distribution.

It looks as though R's quantile with type=4 might be the same,
since it is explained as "linear interpolation of the empirical
cdf", which is what the above description of Excel's method does.
However, R's default type is 7, which is different.

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 04-Mar-09                                       Time: 17:29:50
------------------------------ XFMail ------------------------------