AW: [R] Rank and extract data from a series

"Unternährer Thomas, uth" uth at zhwin.ch
Tue Sep 23 14:23:48 CEST 2003


Hi,



>I would like to rank a time-series of data, extract the top ten data items from this series, determine the 
>corresponding row numbers for each value in the sample, and take a mean of these *row numbers* (not the data).

>I would like to do this in R, rather than pre-process the data on the UNIX command line if possible, as I need to
>calculate other statistics for the series.

>I understand that I can use 'sort' to order the data, but I am not aware of a function in R that would allow me 
>to extract a given number of these data and then determine their positions within the original time series.

>e.g.

>Time series:

>1.0 (row 1)
>4.5 (row 2)
>2.3 (row 3)
>1.0 (row 4)
>7.3 (row 5)

>Sort would give me:

>1.0
>1.0
>2.3
>4.5
>7.3

>I would then like to extract the top two data items:

>4.5
>7.3

>and determine their positions within the original (unsorted) time series:

>4.5 = row 2
>7.3 = row 5

>then take a mean:

>2 and 5 = 3.5

>Thanks in advance.

>James Brown

X <- c(1, 4.5, 2.3, 1, 7.3)
X1 <- sort(X, decreasing=TRUE)[1:2]  # the two largest values
X2 <- match(X1, X)                   # their positions in the original series
mean(X2)                             # mean of the row numbers: 3.5
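
One caveat with the lines above: match() returns only the first position of each value, so if the top values contain ties, the same row number comes back more than once. A possible alternative, sketched here under the assumption that tied values should each count with their own row, is to take the row numbers directly from order(). The names n, top_rows, top_vals, mean_row, y, via_match and via_order are only illustrative.

```r
# Sketch: row numbers of the n largest values, taken directly from order()
x <- c(1, 4.5, 2.3, 1, 7.3)
n <- 2                                               # how many top values to keep
top_rows <- order(x, decreasing = TRUE)[seq_len(n)]  # positions in the original series
top_vals <- x[top_rows]                              # the corresponding data values
mean_row <- mean(top_rows)                           # mean of the row numbers

# With ties, match() repeats the first position while order() keeps each row
y <- c(3, 7, 7, 1)
via_match <- match(sort(y, decreasing = TRUE)[1:2], y)  # first match only, so row 2 twice
via_order <- order(y, decreasing = TRUE)[1:2]           # rows 2 and 3
```

For the example series both approaches give the same mean row number; they differ only when the top values are not unique.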



Hope this helps

Thomas


On Wed, 10 Sep 2003, Jerome Asselin wrote:

> On September 10, 2003 04:03 pm, Kevin S. Van Horn wrote:
> >
> > Your method looks like a naive reimplementation of integration, and 
> > won't work so well for distributions that have the great majority of 
> > the probability mass concentrated in a small fraction of the sample 
> > space.  I was hoping for something that would retain the 
> > adaptability of integrate().
>
> Yesterday I suggested using approxfun(). Did you consider my 
> suggestion? Below is an example.
>
> N <- 500
> x <- rexp(N)
> y <- rank(x)/(N+1)
> empCDF <- approxfun(x,y)
> xvals <- seq(0,4,.01)
> plot(xvals,empCDF(xvals),type="l",
> xlab="Quantile",ylab="Cumulative Distribution Function")
> lines(xvals,pexp(xvals),lty=2)
> legend(2,.4,c("Empirical CDF","Exact CDF"),lty=1:2)
>
>
> It's possible to tune some of the parameters of approxfun() to better 
> match your preferences. Have a look at help(approxfun) for 
> details.
>
> HTH,
> Jerome Asselin
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list 
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>
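
As a rough illustration of the tuning mentioned in the quoted message, the sketch below sets a few documented approxfun() arguments: yleft and yright fix the values returned outside the range of the data, and method = "constant" gives a step function instead of linear interpolation. The seed and the names empCDF_lin and empCDF_step are assumptions made for this example, not part of the original suggestion.

```r
# Sketch: two ways of tuning approxfun() for an empirical CDF
set.seed(1)                      # fixed seed so the sketch is reproducible
N <- 500
x <- rexp(N)
y <- rank(x) / (N + 1)

# Linear interpolation; return 0 below min(x) and 1 above max(x) instead of NA
empCDF_lin <- approxfun(x, y, yleft = 0, yright = 1)

# Step function: method = "constant" with f = 0 holds the left-hand value
empCDF_step <- approxfun(x, y, method = "constant", f = 0,
                         yleft = 0, yright = 1)

empCDF_lin(c(0, 1, 100))         # defined outside the data range as well
```

With the default rule = 1 and no yleft/yright, evaluating at 0 returns NA here, because 0 lies below the smallest sampled value.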
