[R] sorting without order

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue Nov 23 14:46:10 CET 2004


Try a `very long' vector as originally specified:

> v <- sample(1:25000, 1e6, TRUE)
> system.time(ix <- sort.list(v, method="radix"), gcFirst=TRUE)
[1] 0.14 0.01 0.15 0.00 0.00
> system.time(x <- sort(v), gcFirst=TRUE)
[1] 0.42 0.02 0.44 0.00 0.00
> system.time(x <- sort(v, method="quick", index.return=TRUE), gcFirst=TRUE)
[1] 0.27 0.03 0.30 0.00 0.00
> system.time(ix <- unlist(split(seq(along=v), v), use.names=FALSE),
+ gcFirst=TRUE)
[1] 1.18 0.11 1.30 0.00 0.00

so sort() can be beaten quite easily, even on a level playing field: here 
sort.list(method="radix") took about a third of the time of plain sort().
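
For reference, a minimal sketch of how the radix index might be applied to 
the small example from the original question. It assumes the values are 
stored as integers; the names `grouped` and `ix_split` are only 
illustrative. The groups come out in increasing order rather than in order 
of first appearance, which should not matter if only grouping is needed.

```r
# Small example from the original question, stored as integers
v <- c(10L, 1L, 10L, 100L, 1L, 10L)

# sort.list() returns the ordering permutation (the equivalent of
# index.return); the radix method is intended for integer-like data
ix <- sort.list(v, method = "radix")

# Indexing by the permutation brings equal values together
grouped <- v[ix]

# The split()-based suggestion from the thread, for comparison
ix_split <- unlist(split(seq(along = v), v), use.names = FALSE)

print(ix)
print(grouped)
print(identical(ix, ix_split))
```

Within each group the original positions stay in increasing order, since 
both approaches keep ties in their input order.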


On Tue, 23 Nov 2004, Dimitris Rizopoulos wrote:

> Hi Marc,
>
> continuing on Prof. Dalgaard's proposal, you could use:
>
> ix <- unlist(split(seq(along=v), v), use.names=FALSE)
>
> but even with this, `sort()' seems faster if you are interested only in 
> grouping:
>
> v <- sample(1:25000, 50000, TRUE)
> ######
> system.time(ix <- do.call("c",split(seq(along=v),v)), gcFirst=TRUE)
> [1] 0.13 0.00 0.13   NA   NA
>
> system.time(ix <- unlist(split(seq(along=v), v), use.names=FALSE), 
> gcFirst=TRUE)
> [1] 0.06 0.00 0.07   NA   NA
>
> system.time(x <- sort(v), gcFirst=TRUE)
> [1] 0.01 0.00 0.02   NA   NA
>
>
> I hope it helps.
>
> Best,
> Dimitris
>
> ----
> Dimitris Rizopoulos
> Ph.D. Student
> Biostatistical Centre
> School of Public Health
> Catholic University of Leuven
>
> Address: Kapucijnenvoer 35, Leuven, Belgium
> Tel: +32/16/336899
> Fax: +32/16/337015
> Web: http://www.med.kuleuven.ac.be/biostat
>    http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm
>
>
>
> ----- Original Message ----- From: "Marc Mamin" <M.Mamin at intershop.de>
> To: <r-help at stat.math.ethz.ch>
> Sent: Tuesday, November 23, 2004 10:58 AM
> Subject: [R] sorting without order
>
>
>> Hello,
>> 
>> 
>> In order to increase the performance of a script I'd like to sort very 
>> large vectors containing repeated integer values.
>> I'm not interested in having the values sorted, but only grouped.
>> I also need the equivalent of index.return from the standard "sort" 
>> function:
>> 
>>  f(c(10,1,10,100,1,10))
>> 
>>  =>
>> 
>>  grouped: c(10,10,10,1,1,100)
>>  ix:   c(1,3,6,2,5,4)
>> 
>> 
>> Is there a way to achieve this which would be faster than the standard sort 
>> function?
>> 
>> Thanks for any hints,
>> 
>> Marc Mamin
>> 
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide! 
>> http://www.R-project.org/posting-guide.html
>> 
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



