[R] duplicate values

Sun Nov 16 19:43:03 CET 2008

Is the question 'duplicated next to each other' or 'duplicated anywhere 
later'?  I read it as the latter, so would use

dup <- duplicated(x$dt)

or

dup <- duplicated(x[c("Date", "time")])
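
To make the difference between the two readings concrete, here is a small sketch on made-up data (the vector dt is hypothetical, not the poster's data frame), in which one time stamp recurs later without being adjacent to its first occurrence:

```r
# Hypothetical time stamps: 03:00 occurs at positions 1, 2 and 4
dt <- as.POSIXct(c("2008-06-01 03:00:00", "2008-06-01 03:00:00",
                   "2008-06-01 04:00:00", "2008-06-01 03:00:00"),
                 tz = "UTC")

# 'duplicated anywhere later': flags every repeat of an earlier value
dup_any <- duplicated(dt)

# 'duplicated next to each other': flags only a repeat of the previous value
dup_adj <- c(FALSE, diff(dt) == 0)

print(dup_any)  # FALSE  TRUE FALSE  TRUE
print(dup_adj)  # FALSE  TRUE FALSE FALSE
```

Only position 4 separates the two approaches, so on data that is sorted by time they should give the same answer.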

Also, be very careful: date-time values like these can look duplicated yet 
refer to different times on days when DST ends.  E.g. there are both

"2008-10-26 02:30:00 CEST"
"2008-10-26 02:30:00 CET"

in the timezone of Germany (at least with the names my system gives me in 
English).
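
A rough illustration of that pitfall, assuming the system's time-zone database knows the name "Europe/Berlin" (the object names are made up for the example): two instants one hour apart give the same wall-clock string once the zone abbreviation is dropped, so duplicated() on the strings flags a row that duplicated() on the POSIXct values does not.

```r
# Two distinct instants, one hour apart, straddling the end of DST in Germany
t_utc <- as.POSIXct(c("2008-10-26 00:30:00", "2008-10-26 01:30:00"),
                    tz = "UTC")

# Local wall-clock time without the zone abbreviation
wall <- format(t_utc, "%Y-%m-%d %H:%M:%S", tz = "Europe/Berlin")

print(wall)               # both "2008-10-26 02:30:00"
print(duplicated(wall))   # FALSE  TRUE : the strings collide
print(duplicated(t_utc))  # FALSE FALSE : the instants do not

# Zone abbreviations for the two instants (the names are system-dependent)
print(format(t_utc, "%Z", tz = "Europe/Berlin"))
```

So if the data were recorded in local time without a zone indicator, the two 02:30 readings cannot be told apart from the strings alone.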

On Sun, 16 Nov 2008, jim holtman wrote:

> This should do it for you:
>
>> x <- read.table(textConnection(         "Date time                      Temperature
> + 1        2008-6-1 00:00:00      5
> + 2        2008-6-1 02:00:00      5
> + 3        2008-6-1 03:00:00      6
> + 4        2008-6-1 03:00:00      0
> + 5        2008-6-1 04:00:00      6
> + 6        2008-6-1 04:00:00      0
> + 7        2008-6-1 05:00:00      7
> + 8        2008-6-1 06:00:00      7"), header=TRUE)
>> closeAllConnections()
>> # create datetime
>> x$dt <- as.POSIXct(paste(x$Date, x$time))
>> # create list of duplicate values next to each other
>> dup <- c(FALSE, diff(x$dt) == 0)
>> # remove
>> x[!dup,]
>      Date     time Temperature                  dt
> 1 2008-6-1 00:00:00           5 2008-06-01 00:00:00
> 2 2008-6-1 02:00:00           5 2008-06-01 02:00:00
> 3 2008-6-1 03:00:00           6 2008-06-01 03:00:00
> 5 2008-6-1 04:00:00           6 2008-06-01 04:00:00
> 7 2008-6-1 05:00:00           7 2008-06-01 05:00:00
> 8 2008-6-1 06:00:00           7 2008-06-01 06:00:00
>
>
> On Sun, Nov 16, 2008 at 1:10 PM, Antje Nöthlich <antno at web.de> wrote:
>> Hei R Users,
>>
>> I have the following data frame:
>>
>>          Datetime                      Temperature             and many more columns
>> 1        2008-6-1 00:00:00      5
>> 2        2008-6-1 02:00:00      5
>> 3        2008-6-1 03:00:00      6
>> 4        2008-6-1 03:00:00      0
>> 5        2008-6-1 04:00:00      6
>> 6        2008-6-1 04:00:00      0
>> 7        2008-6-1 05:00:00      7
>> 8        2008-6-1 06:00:00      7
>> .            .                                .
>> .            .                                .
>> .            .                                .
>> 3000  2008-8-31 00:00:00    3
>>
>>
>> The problem is that rows 3 & 4 and rows 5 & 6 have the same "Datetime" value but differ in the values of the "Temperature" column.
>> Now, for the whole data frame, I would like to delete rows that have the same "Datetime" value as the prior row.
>> I have tried unique(dataframe), but it does not work here because the rows are not real duplicates of each other.
>> Thanks in advance for your help!
>>
>> Antje
>
>
> -- 
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595