AW: [R] Rank and extract data from a series

Tue Sep 23 19:44:09 CEST 2003

Using Thomas Unternährer's handy example, one could also do:

 > X <- c(1, 4.5, 2.3, 1, 7.3)
 > mean(order(X, decreasing=TRUE)[1:2])
[1] 3.5
 >

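I think this will give the same results as Thomas Unternährer's suggested
code unless tied values fall within the top N (match() returns the first
position for every tied value, whereas order() returns distinct positions),
and it is perhaps more concise and direct (provided that you don't actually
need the values of the top items).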
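
As a small illustration of where the two approaches can differ, here is a
made-up vector (not from the original question) in which two tied values
both land in the top two. With order() the two row numbers are distinct;
with match() the first matching row is returned twice:

```r
# Two tied values (3 and 3), both within the top two
X <- c(3, 1, 3, 2)

byOrder <- order(X, decreasing = TRUE)[1:2]            # rows 1 and 3
byMatch <- match(sort(X, decreasing = TRUE)[1:2], X)   # row 1, twice

mean(byOrder)   # 2
mean(byMatch)   # 1
```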

(Of course, you have to change the 1:2 to 1:10 to get the top ten.)
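
If N needs to vary, the one-liner can be wrapped in a small function. This
is only a sketch: the name meanTopRows and its arguments are my own
invention for illustration (not part of base R), and it simply inherits
whatever order() does with ties.

```r
# Hypothetical helper: mean of the row numbers of the n largest values of x.
meanTopRows <- function(x, n = 10) {
    n <- min(n, length(x))    # avoid NA indices when x has fewer than n items
    idx <- order(x, decreasing = TRUE)[seq_len(n)]
    mean(idx)
}

X <- c(1, 4.5, 2.3, 1, 7.3)
meanTopRows(X, n = 2)   # 3.5, the mean of rows 5 and 2
```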

Note that this question gets tricky if there are ties such that there is no
unique set of row numbers that identifies the N "top" items.

For example, consider the following data:

 > X <- c(1,3,2,3,4)

Taking "top two", should the answer be 3.5 (avg of row numbers 2 and 5), 
4.5 (avg of row numbers 4 and 5), or 3.666667 (avg of row numbers 2,4 and 5)?

 > mean(order(X, decreasing=TRUE)[1:2])
[1] 3.5
 > order(X, decreasing=TRUE)[1:2]
[1] 5 2
 > # Andy Liaw's suggestion:
 > mean(which(X %in% sort(X, decreasing=TRUE)[1:2]))
[1] 3.666667
 > which(X %in% sort(X, decreasing=TRUE)[1:2])
[1] 2 4 5
 > # Thomas Unternährer's suggestion:
 > mean(match(sort(X, decreasing=TRUE)[1:2], X))
[1] 3.5
 > match(sort(X, decreasing=TRUE)[1:2], X)
[1] 5 2
 >
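
One way to take the guesswork out is to state the tie policy explicitly.
A rough sketch follows; topRows() and its 'ties' argument are hypothetical
names made up for this example (not existing R functions), and n is assumed
to be no larger than length(x).

```r
# Illustrative only: return the row numbers of the "top n" values of x
# under an explicit tie policy.
topRows <- function(x, n, ties = c("first", "all")) {
    ties <- match.arg(ties)
    if (ties == "first") {
        # exactly n rows; among tied values the earlier row is taken
        sort(order(x, decreasing = TRUE)[seq_len(n)])
    } else {
        # every row whose value is at least the n-th largest;
        # this can return more than n rows
        cutoff <- sort(x, decreasing = TRUE)[n]
        which(x >= cutoff)
    }
}

X <- c(1, 3, 2, 3, 4)
topRows(X, 2, "first")        # rows 2 and 5
topRows(X, 2, "all")          # rows 2, 4 and 5
mean(topRows(X, 2, "all"))    # 3.666667
```

With "first" you always average exactly n row numbers; with "all" every
tied row counts, so the number of rows averaged can exceed n.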

hope this helps,

Tony Plate

At Tuesday 02:23 PM 9/23/2003 +0200, Unternährer Thomas, uth wrote:

>Hi,
>
> >I would like to rank a time-series of data, extract the top ten data
> >items from this series, determine the corresponding row numbers for
> >each value in the sample, and take a mean of these *row numbers* (not
> >the data).
>
> >I would like to do this in R, rather than pre-process the data on the
> >UNIX command line if possible, as I need to calculate other statistics
> >for the series.
>
> >I understand that I can use 'sort' to order the data, but I am not
> >aware of a function in R that would allow me to extract a given number
> >of these data and then determine their positions within the original
> >time series.
>
> >e.g.
>
> >Time series:
>
> >1.0 (row 1)
> >4.5 (row 2)
> >2.3 (row 3)
> >1.0 (row 4)
> >7.3 (row 5)
>
> >Sort would give me:
>
> >1.0
> >1.0
> >2.3
> >4.5
> >7.3
>
> >I would then like to extract the top two data items:
>
> >4.5
> >7.3
>
> >and determine their positions within the original (unsorted) time series:
>
> >4.5 = row 2
> >7.3 = row 5
>
> >then take a mean:
>
> >mean of 2 and 5 = 3.5
>
> >Thanks in advance.
>
> >James Brown
>
>X <- c(1, 4.5, 2.3, 1, 7.3)
>X1 <- sort(X, decreasing=TRUE)[1:2]
>X2 <- match(X1, X)
>mean(X2)
>
>
>
>Hope this helps
>
>Thomas
>
>
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> >

Tony Plate   tplate at acm.org