[R] How to remove similar successive objects from a vector?

Dimitris Rizopoulos dimitris.rizopoulos at med.kuleuven.be
Wed Aug 16 14:09:48 CEST 2006


----- Original Message ----- 
From: "Gavin Simpson" <gavin.simpson at ucl.ac.uk>
To: <attenka at utu.fi>
Cc: <r-help at stat.math.ethz.ch>
Sent: Wednesday, August 16, 2006 1:21 PM
Subject: Re: [R] How to remove similar successive objects from a vector?


> On Wed, 2006-08-16 at 13:01 +0300, Atte Tenkanen wrote:
>> Thanks to all respondents!
>>
>> I wasn't precise enough when I enclosed my example. In fact, I need a
>> version which works with all kinds of symbolic data, not only with
>> numbers. So these versions
>>
>> rle(VECTOR)$values
>>
>> and
>>
>> VECTOR=c(3,2,2,3,4,4,5,5,5,3,3,3,5,1,6,6)
>> NEWVECTOR <- ifelse(VECTOR[-length(VECTOR)]==VECTOR[-1],NA,VECTOR)
>> NEWVECTOR[!is.na(NEWVECTOR)]
>
> Note that the above does not give the same answer as
> rle(VECTOR)$values; it drops the final element:
>
>> VECTOR=c(3,2,2,3,4,4,5,5,5,3,3,3,5,1,6,6)
>> NEWVECTOR <- ifelse(VECTOR[-length(VECTOR)]==VECTOR[-1],NA,VECTOR)
>> NEWVECTOR[!is.na(NEWVECTOR)]
> [1] 3 2 3 4 5 3 5 1
>> rle(VECTOR)$values
> [1] 3 2 3 4 5 3 5 1 6
>> all.equal(NEWVECTOR[!is.na(NEWVECTOR)], rle(VECTOR)$values)
> [1] "Numeric: lengths (8, 9) differ"
>
> So make sure you use the rle solution.
>
> G
>
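
To make the length mismatch above concrete, here is a minimal sketch. The names v, test and keep are illustrative only and do not appear in the thread. It assumes a vector without NAs; the point is that the comparison is one element shorter than the data, so the ifelse() result is too.

```r
# Sketch only: 'v', 'test' and 'keep' are illustrative names.
v <- c(3, 2, 2, 3, 4, 4, 5, 5, 5, 3, 3, 3, 5, 1, 6, 6)

# The comparison has length(v) - 1 elements, and ifelse() returns a
# result shaped like its test, so the last element of v is never kept.
test <- v[-length(v)] == v[-1]
length(test)

# Prepending TRUE keeps the first element and flags every later element
# that differs from its predecessor, giving a full-length index.
keep <- c(TRUE, v[-1] != v[-length(v)])
v[keep]
identical(v[keep], rle(v)$values)
```

With the full-length index the trailing run of 6s is represented, which is the element missing from the ifelse() result.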


Interestingly, if speed matters, the 2nd and 3rd solutions below run in
roughly half the time of rle() on this test vector (both rely on
subtraction, so they apply to numeric vectors only):

> x <- rep(c(3,2,2,3,4,4,5,5,5,3,3,3,5,1,6,6), 5000)
>
>
> system.time(for(i in 1:1000) out1 <- rle(x)$values)
[1] 55.44  2.08 57.89    NA    NA
>
>
> system.time(for(i in 1:1000) {
+     nx <- length(x)
+     ind <- c(TRUE, (x[1:(nx-1)] - x[2:nx]) != 0)
+     out2 <- x[ind]
+ })
[1] 27.69  2.28 30.36    NA    NA
>
>
> system.time(for(i in 1:1000) out3 <- x[diff(x) != 0])
[1] 22.30  2.32 24.62    NA    NA
>
>
> all.equal(out1, out2)
[1] TRUE
> all.equal(out1, out3)
[1] TRUE
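
One caveat about the diff() version seems worth a sketch. diff(x) is one element shorter than x, so the logical index is recycled, and the last position reuses the first comparison. On the test vector above that first comparison happens to be TRUE, so the results agree. The small vector y below is a made-up example, not from the thread, where the first two elements are equal and the last value is lost.

```r
# Illustrative vector whose first two elements are equal.
y <- c(2, 2, 3, 3, 1)

# diff(y) has length(y) - 1 elements; the logical index is recycled, so
# the last position reuses the first comparison (FALSE here).
short <- y[diff(y) != 0]

# A full-length index does not depend on recycling.
full <- y[c(TRUE, diff(y) != 0)]

short
full
rle(y)$values
```

Here short is 2 3 while full and rle(y)$values are both 2 3 1, so prepending TRUE looks like the safer form of the same idea.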

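Since the original request was for symbolic data as well as numbers, the same indexing idea can be written with != instead of subtraction. This is only a sketch: drop_repeats is a hypothetical helper name, and it assumes an atomic vector without NAs (an NA comparison would put NA into the index).

```r
# Hypothetical helper, not from the thread: drop successive duplicates
# from an atomic vector (numeric, character, logical), assuming no NAs.
drop_repeats <- function(x) {
  n <- length(x)
  if (n < 2L) return(x)            # nothing to compare
  x[c(TRUE, x[-1L] != x[-n])]      # keep the first element of every run
}

s <- c("a", "b", "b", "a", "c", "c", "c")
drop_repeats(s)
identical(drop_repeats(s), rle(s)$values)
```

For character input this agrees with rle(s)$values, which also works on character vectors, so the helper is mainly of interest if the speed difference matters.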

Best,
Dimitris


----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm


>> answered my needs.
>>
>> I ran a test and the first version was 2.5x faster with my data, but
>> both are fast enough.
>>
>> Atte
>>
>> On Wed, 2006-08-16 at 08:58 +0100, Patrick Burns wrote:
>> > I think
>> >
>> > rle(VECTOR)$values
>> >
>> > will get you what you want.
>> >
>> > Patrick Burns
>> > patrick at burns-stat.com
>> > +44 (0)20 8525 0696
>> > http://www.burns-stat.com
>> > (home of S Poetry and "A Guide for the Unwilling S User")
>> >
>> > Atte Tenkanen wrote:
>> >
>> > >Is there some (much) more efficient way to do this?
>> > >
>> > >VECTOR=c(3,2,4,5,5,3,3,5,1,6,6);
>> > >NEWVECTOR=VECTOR[1];
>> > >
>> > >for(i in 1:(length(VECTOR)-1))
>> > >{
>> > > if((identical(VECTOR[i], VECTOR[i+1]))==FALSE){
>> > > NEWVECTOR=c(NEWVECTOR,VECTOR[i+1])}
>> > >}
>> > >
>> > >
>> > >> VECTOR
>> > > [1] 3 2 4 5 5 3 3 5 1 6 6
>> > >
>> > >> NEWVECTOR
>> > > [1] 3 2 4 5 3 5 1 6
>> > >
>> > >_______________________________
>> > >Atte Tenkanen
>> > >University of Turku, Finland
>> > >
>> > >______________________________________________
>> > >R-help at stat.math.ethz.ch mailing list
>> > >https://stat.ethz.ch/mailman/listinfo/r-help
>> > >PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > >and provide commented, minimal, self-contained, reproducible code.
>>
> -- 
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> Gavin Simpson                 [t] +44 (0)20 7679 0522
> ECRC & ENSIS, UCL Geography,  [f] +44 (0)20 7679 0565
> Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
> Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/cv/
> UK. WC1E 6BT.                 [w] http://www.ucl.ac.uk/~ucfagls/
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>




