[R] Undesired result

Val v@|kremk @end|ng |rom gm@||@com
Wed Feb 17 19:45:54 CET 2021


Very helpful and thank you so much!


On Wed, Feb 17, 2021 at 12:50 PM Duncan Murdoch
<murdoch.duncan using gmail.com> wrote:
>
> On 17/02/2021 9:50 a.m., Val wrote:
> > Hi All,
> >
> > I am reading a data file that contains several date formats. I wanted to
> > standardize them to one format and used the anytime library, but got
> > undesired results, as shown below. It gave me the year 2093 instead of 1993.
> >
> >
> > library(anytime)
> > DFX<-read.table(text="name ddate
> >    A  19-10-02
> >    D  11/19/2006
> >    F  9/9/2011
> >    G1  12/29/2010
> >    AA   10/18/93 ",header=TRUE)
> > getFormats()
> > addFormats(c("%d-%m-%y"))
> > addFormats(c("%m-%d-%y"))
> > addFormats(c("%Y/%d/%m"))
> > addFormats(c("%m/%d/%y"))
> >
> > DFX$anew=anydate(DFX$ddate)
> >
> > Output
> >   name      ddate       anew
> > 1    A   19-10-02 2002-10-19
> > 2    D 11/19/2006 2020-11-19
> > 3    F   9/9/2011 2011-09-09
> > 4   G1 12/29/2010 2020-12-29
> > 5   AA   10/18/93 2093-10-18
> >
> > The problem is in the last row. It should be 1993-10-18 instead of 2093-10-18.
> >
> > How do I correct this?
>
> This looks a little tricky.  The basic idea is that the %y format has to
> guess at the century, but the guess depends on things specific to your
> system.  So what would be nice is to say "two-digit years should be
> assumed to fall between 1922 and 2021", but there's no way to do that
> directly.
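
The century guess described above can be seen without any extra packages. The sketch below uses only base R's as.Date(); the input strings are made up for illustration. As far as I understand ?strptime, base R documents that on input %y maps 00 to 68 onto 20xx and 69 to 99 onto 19xx. Other parsers may guess differently, so this says nothing definite about what anydate() does by default.

```r
# Illustrative two-digit-year inputs (not taken from the data in this thread)
x <- c("10/18/93", "10/18/68", "10/18/69", "10/18/02")

# Base R's own %y rule picks the century for each value
parsed <- as.Date(x, format = "%m/%d/%y")

format(parsed, "%Y")
#> [1] "1993" "2068" "1969" "2002"
```

The cut-off between 68 and 69 is fixed, which is why a window such as 1922 to 2021 cannot be requested through the format string alone.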
>
> What you could do is recognize when you have a two digit year, and then
> force the result into the range you want.  Here's a function that does
> that, but it's not really tested much at all, so be careful if you use
> it.  (One thing:  I recommend the 'useR = TRUE' option to anydate(); it
> worked better in my tests than the default.)
>
> adjustCentury <- function(inputString,
>                            outputDate = anydate(inputString, useR = TRUE),
>                            start = "1922-01-01") {
>
>    # Two-digit years are forced into the 100 years beginning at 'start'
>    start <- as.Date(start)
>
>    # Inputs with no run of four digits are assumed to have a two-digit year
>    twodigityear <- !grepl("[[:digit:]]{4}", inputString)
>
>    # Move dates that fall before 'start' forward a century at a time
>    while (length(bad <- which(twodigityear & outputDate < start))) {
>      for (i in bad) {
>        longdate <- as.POSIXlt(outputDate[i])
>        longdate$year <- longdate$year + 100
>        outputDate[i] <- as.Date(longdate)
>      }
>    }
>    # 'finish' is exactly 100 years after 'start'
>    longdate <- as.POSIXlt(start)
>    longdate$year <- longdate$year + 100
>    finish <- as.Date(longdate)
>
>    # Move dates that fall on or after 'finish' back a century at a time
>    while (length(bad <- which(twodigityear & outputDate >= finish))) {
>      for (i in bad) {
>        longdate <- as.POSIXlt(outputDate[i])
>        longdate$year <- longdate$year - 100
>        outputDate[i] <- as.Date(longdate)
>      }
>    }
>    outputDate
> }
>
> library(anytime)
> DFX<-read.table(text="name ddate
>    A  19-10-02
>    D  11/19/2006
>    F  9/9/2011
>    G1  12/29/2010
>    AA   10/18/93
>    BB   10/18/1893
>    CC   10/18/2093",header=TRUE)
>
> addFormats(c("%d-%m-%y"))
> addFormats(c("%m-%d-%y"))
> addFormats(c("%Y/%d/%m"))
> addFormats(c("%m/%d/%y"))
>
> DFX$anew=adjustCentury(DFX$ddate, start = "1921-01-01")
> DFX
> #>   name      ddate       anew
> #> 1    A   19-10-02 2019-10-02
> #> 2    D 11/19/2006 2006-11-19
> #> 3    F   9/9/2011 2011-09-09
> #> 4   G1 12/29/2010 2010-12-29
> #> 5   AA   10/18/93 1993-10-18
> #> 6   BB 10/18/1893 1893-10-18
> #> 7   CC 10/18/2093 2093-10-18
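
The windowing idea in the quoted function can also be tried on its own, without anytime. What follows is only a minimal sketch under stated assumptions: shift_years() and window_century() are hypothetical helper names, the inputs are made up, and parsing is done with base R's as.Date() rather than anydate(). It is not a replacement for the function above and has had no more testing than the short example shown.

```r
# Hypothetical helper: move Date values by n calendar years
shift_years <- function(d, n) {
  lt <- as.POSIXlt(d)
  lt$year <- lt$year + n
  as.Date(lt)
}

# Hypothetical helper: force dates that came from two-digit years into the
# 100-year window [start, start + 100 years); other dates are left alone
window_century <- function(dates, two_digit, start = as.Date("1922-01-01")) {
  finish <- shift_years(start, 100)
  repeat {
    low  <- two_digit & !is.na(dates) & dates < start
    high <- two_digit & !is.na(dates) & dates >= finish
    if (!any(low) && !any(high)) break
    if (any(low))  dates[low]  <- shift_years(dates[low], 100)
    if (any(high)) dates[high] <- shift_years(dates[high], -100)
  }
  dates
}

# Illustrative inputs; base R parses "68" as 2068, which is outside the window
x <- c("10/18/93", "10/18/02", "10/18/68")
parsed <- as.Date(x, format = "%m/%d/%y")

# Same two-digit-year test as in the quoted function
two_digit <- !grepl("[[:digit:]]{4}", x)

fixed <- window_century(parsed, two_digit, start = as.Date("1922-01-01"))
format(fixed)
#> [1] "1993-10-18" "2002-10-18" "1968-10-18"
```

Whole logical vectors are shifted at once here instead of looping over single elements; a date such as 29 February may land on 1 March when the target year is not a leap year.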


