[R] Rank and extract data from a series

Liaw, Andy andy_liaw at merck.com
Tue Sep 23 14:13:59 CEST 2003

```
Here's one way.  Suppose your "time series" is in a vector called "x".

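# Ten largest values, then the mean of their positions in x.
# (Ties with the tenth value are all matched by %in%.)
top10 <- sort(x, decreasing=TRUE)[1:10]
mean.index <- mean(which(x %in% top10))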

HTH,
Andy

> -----Original Message-----
> From: James Brown [mailto:jdb33 at hermes.cam.ac.uk]
> Sent: Tuesday, September 23, 2003 7:51 AM
> To: r-help at stat.math.ethz.ch
> Subject: [R] Rank and extract data from a series
>
>
>
> I would like to rank a time-series of data, extract the top
> ten data items from this series, determine the corresponding
> row numbers for each value in the sample, and take a mean of
> these *row numbers* (not the data).
>
> I would like to do this in R, rather than pre-process the
> data on the UNIX command line if possible, as I need to
> calculate other statistics for the series.
>
> I understand that I can use 'sort' to order the data, but I
> am not aware of a function in R that would allow me to
> extract a given number of these data and then determine their
> positions within the original time series.
>
> e.g.
>
> Time series:
>
> 1.0 (row 1)
> 4.5 (row 2)
> 2.3 (row 3)
> 1.0 (row 4)
> 7.3 (row 5)
>
> Sort would give me:
>
> 1.0
> 1.0
> 2.3
> 4.5
> 7.3
>
> I would then like to extract the top two data items:
>
> 4.5
> 7.3
>
> and determine their positions within the original (unsorted)
> time series:
>
> 4.5 = row 2
> 7.3 = row 5
>
> then take a mean:
>
> 2 and 5 = 3.5
>
>
> James Brown
>
> ___________________________________________
>
> James Brown
>
> Cambridge Coastal Research Unit (CCRU)
> Department of Geography
> University of Cambridge
> Downing Place
> Cambridge
> CB2 3EN, UK
>
> Telephone: +44 (0)1223 339776
> Mobile: 07929 817546
> Fax: +44 (0)1223 355674
>
> E-mail: jdb33 at cam.ac.uk
> E-mail: james_510 at hotmail.com
>
> http://www.geog.cam.ac.uk/ccru/CCRU.html
> ___________________________________________
>
> On Wed, 10 Sep 2003, Jerome Asselin wrote:
>
> > On September 10, 2003 04:03 pm, Kevin S. Van Horn wrote:
> > >
> > > Your method looks like a naive reimplementation of integration, and
> > > won't work so well for distributions that have the great majority of
> > > the probability mass concentrated in a small fraction of the sample
> > > space.  I was hoping for something that would retain the
> > > adaptability of integrate().
> >
> > Yesterday, I suggested using approxfun(). Did you consider my
> > suggestion? Below is an example.
> >
> > N <- 500
> > x <- rexp(N)
> > y <- rank(x)/(N+1)
> > empCDF <- approxfun(x,y)
> > xvals <- seq(0,4,.01)
> > plot(xvals,empCDF(xvals),type="l",
> > xlab="Quantile",ylab="Cumulative Distribution Function")
> > lines(xvals,pexp(xvals),lty=2)
> > legend(2,.4,c("Empirical CDF","Exact CDF"),lty=1:2)
> >
> >
> > It's possible to tune some of the parameters of approxfun() to better
> > match your personal preferences. Have a look at help(approxfun) for
> > details.
> >
> > HTH,
> > Jerome Asselin
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> >
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>

```
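
For what it's worth, here is a minimal sketch of a variant that gets the row numbers directly with order() rather than matching values back against the series. It is run on the five-value example from the question, keeping the top two items. The names `n` and `top.rows` are made up for illustration and do not appear in the thread.

```r
# Example data from the question (rows 1 to 5).
x <- c(1.0, 4.5, 2.3, 1.0, 7.3)
n <- 2  # how many of the largest values to keep (illustrative name)

# order() returns positions rather than values, so the row numbers of the
# n largest items come out directly, and exactly n of them even with ties.
top.rows <- order(x, decreasing = TRUE)[seq_len(n)]

top.rows        # row numbers of the n largest values
x[top.rows]     # the values themselves
mean(top.rows)  # mean of the row numbers
```

On this data the selected rows are 5 and 2, and their mean is 3.5, which matches the result asked for in the question. Whether tied values at the cut-off should all be counted, or only enough to make up `n`, depends on what the statistic is meant to capture; this sketch assumes the latter.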