[R] Subset and order

arun smartpink111 at yahoo.com
Sun Jul 7 16:51:05 CEST 2013


Hi,
You could also try the data.table package (see ?data.table):
x<- read.table(text="a    b    c
1    2    3
3    3    4
2    4    5
1    3    4
",sep="",header=TRUE)

library(data.table)

xt <- data.table(x)   # convert the data frame x (xt did not exist yet)
setkey(xt, a)         # sort by a and mark it as the key
subset(xt, b == 3)
#   a b c
#1: 1 3 4
#2: 3 3 4



# Base R equivalent, using a precomputed ordering index:
iord <- order(x$a)
subset(x[iord, ], b == 3)
#  a b c
#4 1 3 4
#2 3 3 4
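
For a single query it may be cheaper to filter first and then sort only the
matching rows, since order() then works on far fewer values. A minimal sketch
on the same toy data (the name hits is only illustrative, not from the thread):

```r
x <- read.table(text = "a b c
1 2 3
3 3 4
2 4 5
1 3 4", header = TRUE)

# Keep only the rows of interest, then sort that smaller data frame by a.
hits <- x[x$b == 3, ]
res  <- hits[order(hits$a), ]
res
```

This returns the same two rows as the order-then-subset version above.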


Speed comparison:
set.seed(12345)
dat1<- as.data.frame(matrix(sample(1:10,3*1e7,replace=TRUE),ncol=3))
colnames(dat1)<-letters[1:3]
system.time({
iord <- order(dat1$a)
res1<-subset(dat1[iord, ], b == 3)
})
#  user  system elapsed 
#  6.888   0.296   7.202 

dt1<- data.table(dat1)
system.time({
setkey(dt1, a)
resdt1 <- subset(dt1, b == 3)
})
# user  system elapsed 
#   0.72    0.06    0.78 

head(resdt1)
#   a b  c
#1: 1 3  6
#2: 1 3  4
#3: 1 3 10
#4: 1 3  2
#5: 1 3  9
#6: 1 3  8
 head(res1)
#    a b  c
#75  1 3  6
#93  1 3  4
#300 1 3 10
#301 1 3  2
#437 1 3  9
#672 1 3  8
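
If many different values of b will be queried, another option (not timed here,
so treat it as a sketch) is to key the table on both columns. The sort is then
paid once, each lookup is a binary search on b, and the matching rows come back
already ordered by a. This assumes a data.table version in which .() is
available as an alias for J():

```r
library(data.table)

x <- read.table(text = "a b c
1 2 3
3 3 4
2 4 5
1 3 4", header = TRUE)

dt <- data.table(x)
setkey(dt, b, a)   # one-time sort by b, then by a within each value of b
res <- dt[.(3L)]   # keyed lookup of b == 3; rows are already ordered by a
res
```

Further queries are just dt[.(v)] for other values v, with no re-sorting.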

A.K.
----- Original Message -----
From: Rui Barradas <ruipbarradas at sapo.pt>
To: Noah Silverman <noahsilverman at ucla.edu>
Cc: "R-help at r-project.org" <r-help at r-project.org>
Sent: Friday, July 5, 2013 3:51 PM
Subject: Re: [R] Subset and order

Hello,

If time is one of the problems, precompute an ordered index, and use it 
every time you want the df sorted. But that would mean you can't do it 
in a single operation.

iord <- order(x$a)
subset(x[iord, ], b == 3)
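
To make the reuse explicit, here is a sketch that sorts once and then runs
several queries against the sorted copy. The names x_sorted, queries and
results, and the particular values of b, are only illustrative:

```r
x <- read.table(text = "a b c
1 2 3
3 3 4
2 4 5
1 3 4", header = TRUE)

# Sort once and keep the sorted copy.
x_sorted <- x[order(x$a), ]

# Each query is now a plain filter; filtering preserves the row order,
# so every result is still sorted by a.
queries <- c(2, 3, 4)   # hypothetical values of b to look up
results <- lapply(queries, function(v) x_sorted[x_sorted$b == v, ])
names(results) <- queries

results[["3"]]
```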


Rui Barradas

Em 05-07-2013 20:47, Noah Silverman escreveu:
> That would work, but is painfully slow.  It forces a new sort of the data with every query.  I have 200,000 rows and need almost a hundred queries.
>
> Thanks,
>
> -N
>
>
> On Jul 5, 2013, at 12:43 PM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>
>> Hello,
>>
>> Maybe like this?
>>
>> subset(x[order(x$a), ], b == 3)
>>
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>> Em 05-07-2013 20:33, Noah Silverman escreveu:
>>> Hello,
>>>
>>> I have a data frame with several columns.
>>>
>>> I'd like to select some subset *and* order by another field at the same time.
>>>
>>> Example:
>>>
>>> a    b    c
>>> 1    2    3
>>> 3    3    4
>>> 2    4    5
>>> 1    3    4
>>> etc…
>>>
>>>
>>> I want to select all rows where b=3 and then order by a.
>>>
>>> To subset is easy:  x[x$b==3,]
>>> To order is easy: x[order(x$a),]
>>>
>>> Is there a way to do both in a single efficient statement?
>>>
>>> Thanks,
>>>
>>>
>>>
>>> --
>>> Noah Silverman, M.S., C.Phil
>>> UCLA Department of Statistics
>>> 8117 Math Sciences Building
>>> Los Angeles, CA 90095
>>>
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>