# [R] Rank and extract data from a series

James Brown jdb33 at hermes.cam.ac.uk
Tue Sep 23 13:50:43 CEST 2003

```
I would like to rank a time series of data, extract the top ten data items
from this series, determine the corresponding row number for each value
in the sample, and take the mean of these *row numbers* (not the data).

If possible, I would like to do this in R rather than pre-process the data
on the UNIX command line, as I need to calculate other statistics for the
series.

I understand that I can use 'sort' to order the data, but I am not aware
of a function in R that would allow me to extract a given number of these
data and then determine their positions within the original time series.

e.g.

Time series:

1.0 (row 1)
4.5 (row 2)
2.3 (row 3)
1.0 (row 4)
7.3 (row 5)

Sort would give me:

1.0
1.0
2.3
4.5
7.3

I would then like to extract the top two data items:

4.5
7.3

and determine their positions within the original (unsorted) time series:

4.5 = row 2
7.3 = row 5

then take a mean:

mean of rows 2 and 5 = (2 + 5) / 2 = 3.5

James Brown

___________________________________________

James Brown

Cambridge Coastal Research Unit (CCRU)
Department of Geography
University of Cambridge
Downing Place
Cambridge
CB2 3EN, UK

Telephone: +44 (0)1223 339776
Mobile: 07929 817546
Fax: +44 (0)1223 355674

E-mail: jdb33 at cam.ac.uk
E-mail: james_510 at hotmail.com

http://www.geog.cam.ac.uk/ccru/CCRU.html
___________________________________________

On Wed, 10 Sep 2003, Jerome Asselin wrote:

> On September 10, 2003 04:03 pm, Kevin S. Van Horn wrote:
> >
> > Your method looks like a naive reimplementation of integration, and
> > won't work so well for distributions that have the great majority of the
> > probability mass concentrated in a small fraction of the sample space.
> >  I was hoping for something that would retain the adaptability of
> > integrate().
>
> Yesterday I suggested using approxfun(). Did you consider my
> suggestion? Below is an example.
>
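> N <- 500
> x <- rexp(N)                  # sample from an Exp(1) distribution
> y <- rank(x)/(N+1)            # plotting positions in (0, 1)
> empCDF <- approxfun(x,y)      # piecewise-linear empirical CDF; NA outside range(x)
> xvals <- seq(0,4,.01)
> plot(xvals,empCDF(xvals),type="l",
>      xlab="Quantile",ylab="Cumulative Distribution Function")
> lines(xvals,pexp(xvals),lty=2)  # exact Exp(1) CDF for comparison
> legend(2,.4,c("Empirical CDF","Exact CDF"),lty=1:2)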
>
>
> It's possible to tune some of approxfun()'s parameters (e.g. rule,
> method, ties) to better match your personal preferences. Have a look at
> help(approxfun) for details.
>
> HTH,
> Jerome Asselin
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>

```
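A minimal base-R sketch of the steps described in the question, assuming the series is held in a plain numeric vector (here `x`, filled with the five example values; for a data frame column you would pass something like `df$value`, where `df` and `value` are placeholder names). Rather than sorting and then looking the values back up, it uses `order()`, which returns row positions directly, so repeated values such as the two 1.0 entries cause no ambiguity.

```r
# Example series from the question (rows 1 to 5)
x <- c(1.0, 4.5, 2.3, 1.0, 7.3)

# Number of top values to keep; use 10 for the real series
k <- 2

# order() returns the indices that would sort x; with decreasing = TRUE
# the first k indices are the rows holding the k largest values.
# head() also copes with a series shorter than k.
top_rows <- head(order(x, decreasing = TRUE), k)

top_rows          # [1] 5 2   (row numbers, largest value first)
x[top_rows]       # [1] 7.3 4.5
mean(top_rows)    # [1] 3.5   (mean of the row numbers, not of the data)

# If the k-th largest value is tied, order() keeps only enough rows to
# reach k. To keep every row at or above the cutoff value instead:
cutoff <- sort(x, decreasing = TRUE)[k]
tied_rows <- which(x >= cutoff)   # rows in their original order
mean(tied_rows)
```

Set `k <- 10` for the top ten. The `which()` variant can return more than `k` rows when several values tie at the cutoff; which behaviour is appropriate depends on the analysis.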
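The quoted reply mentions that approxfun() has parameters worth tuning. A small sketch of a few of them, reusing the simulated Exp(1) sample from the quoted code (the `set.seed()` call is an addition, only for reproducibility): `rule` and `yleft`/`yright` control what is returned outside the range of the data, and `method = "constant"` gives a step function instead of linear interpolation. In current R, `stats::ecdf()` also builds a step-function empirical CDF directly.

```r
set.seed(1)  # added only so the sketch is reproducible
N <- 500
x <- rexp(N)
y <- rank(x) / (N + 1)

# Default rule = 1: returns NA outside range(x)
cdf_na <- approxfun(x, y)

# rule = 2: outside range(x), return the y value at the nearest end of the data
cdf_clamped <- approxfun(x, y, rule = 2)

# Explicit values to the left and right of the data, as for a true CDF
cdf_01 <- approxfun(x, y, yleft = 0, yright = 1)

# Step function instead of linear interpolation
cdf_step <- approxfun(x, y, method = "constant", yleft = 0, yright = 1)

cdf_na(-1)       # NA, since rexp() never returns negative values
cdf_clamped(-1)  # 1/(N + 1), the smallest plotting position
cdf_01(-1)       # 0
```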