[R] Rank and extract data from a series

Tue Sep 23 14:13:59 CEST 2003

Here's one way.  Suppose your "time series" is in a vector called "x".

# the ten largest values in x
top10 <- sort(x, decreasing=TRUE)[1:10]
# mean of the positions in x at which those values occur
mean.index <- mean(which(x %in% top10))
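
One caveat: if the value in tenth place is tied with others, x %in% top10
matches every tied position, so which() can return more than ten row
numbers. A minimal sketch that takes the positions straight from order()
instead, so exactly n of them come back (which of several tied values is
kept depends on how order() breaks the tie). The vector below is just the
five-value example from your message with n = 2; the names "n",
"top.index" and "top.values" are made up for illustration.

```r
x <- c(1.0, 4.5, 2.3, 1.0, 7.3)   # example series from the original question
n <- 2                             # how many of the largest values to keep

# order() returns positions in x, here arranged from largest to smallest value
top.index <- order(x, decreasing = TRUE)[1:n]

top.values <- x[top.index]         # the values themselves: 7.3 4.5
mean.index <- mean(top.index)      # mean of the row numbers 5 and 2: 3.5
```

For the real series, set n <- 10 and drop the first line.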

HTH,
Andy

> -----Original Message-----
> From: James Brown [mailto:jdb33 at hermes.cam.ac.uk] 
> Sent: Tuesday, September 23, 2003 7:51 AM
> To: r-help at stat.math.ethz.ch
> Subject: [R] Rank and extract data from a series
> 
> 
> 
> I would like to rank a time-series of data, extract the top 
> ten data items from this series, determine the corresponding 
> row numbers for each value in the sample, and take a mean of 
> these *row numbers* (not the data).
> 
> I would like to do this in R, rather than pre-process the 
> data on the UNIX command line if possible, as I need to 
> calculate other statistics for the series.
> 
> I understand that I can use 'sort' to order the data, but I 
> am not aware of a function in R that would allow me to 
> extract a given number of these data and then determine their 
> positions within the original time series.
> 
> e.g.
> 
> Time series:
> 
> 1.0 (row 1)
> 4.5 (row 2)
> 2.3 (row 3)
> 1.0 (row 4)
> 7.3 (row 5)
> 
> Sort would give me:
> 
> 1.0
> 1.0
> 2.3
> 4.5
> 7.3
> 
> I would then like to extract the top two data items:
> 
> 4.5
> 7.3
> 
> and determine their positions within the original (unsorted) 
> time series:
> 
> 4.5 = row 2
> 7.3 = row 5
> 
> then take a mean:
> 
> mean of 2 and 5 = 3.5
> 
> Thanks in advance.
> 
> James Brown
> 
> ___________________________________________
> 
> James Brown
> 
> Cambridge Coastal Research Unit (CCRU)
> Department of Geography
> University of Cambridge
> Downing Place
> Cambridge
> CB2 3EN, UK
> 
> Telephone: +44 (0)1223 339776
> Mobile: 07929 817546
> Fax: +44 (0)1223 355674
> 
> E-mail: jdb33 at cam.ac.uk
> E-mail: james_510 at hotmail.com
> 
> http://www.geog.cam.ac.uk/ccru/CCRU.html
> ___________________________________________
> 
> On Wed, 10 Sep 2003, Jerome Asselin wrote:
> 
> > On September 10, 2003 04:03 pm, Kevin S. Van Horn wrote:
> > >
> > > Your method looks like a naive reimplementation of integration, and
> > > won't work so well for distributions that have the great majority of
> > > the probability mass concentrated in a small fraction of the sample
> > > space.  I was hoping for something that would retain the
> > > adaptability of integrate().
> >
> > Yesterday, I suggested using approxfun(). Did you consider my 
> > suggestion? Below is an example.
> >
> > N <- 500
> > x <- rexp(N)
> > y <- rank(x)/(N+1)
> > empCDF <- approxfun(x,y)
> > xvals <- seq(0,4,.01)
> > plot(xvals,empCDF(xvals),type="l",
> > xlab="Quantile",ylab="Cumulative Distribution Function")
> > lines(xvals,pexp(xvals),lty=2)
> > legend(2,.4,c("Empirical CDF","Exact CDF"),lty=1:2)
> >
> >
> > It's possible to tune some of the parameters of approxfun() to better 
> > match your personal preferences. Have a look at help(approxfun) for 
> > details.
> >
> > HTH,
> > Jerome Asselin
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list 
> > https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> >
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list 
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>