[R] data frame of strings formatted

Peter Langfelder peter.langfelder at gmail.com
Fri Mar 2 05:29:19 CET 2012


On Thu, Mar 1, 2012 at 8:05 PM, Ben quant <ccquant at gmail.com> wrote:
> Hello,
>
> I have another question....
>
> I have a data frame that looks like this:
>            a          b
> 2007-03-31 "20070514" "20070410"
> 2007-06-30 "20070814" "20070709"
> 2007-09-30 "20071115" "20071009"
> 2007-12-31 "20080213" "20080109"
> 2008-03-31 "20080514" "20080407"
> 2008-06-30 "20080814" "--"
> 2008-09-30 "20081114" "20081007"
> 2008-12-31 "20090217" "20090112"
> 2009-03-31 "--"       "20090407"
> 2009-06-30 "20090817" "20090708"
> 2009-09-30 "20091113" "--"
> 2009-12-31 "20100212" "20100111"
> 2010-03-31 "20100517" "20100412"
> 2010-06-30 "20100816" "20100712"
> 2010-09-30 "20101112" "20101007"
> 2010-12-31 "20110214" "20110110"
> 2011-03-31 "20110513" "20110411"
> 2011-06-30 "20110815" "20110711"
> 2011-09-30 "20111115" "20111011"
>
> (actually it has about 10,00 columns)
>
> I'd like all of the strings to be formatted like 2011-11-15, 2011-10-11,
> etc. as a data frame of the same dimensions and all of the dimnames
> intact. They don't have to be of date format. "--" can be NA or left the
> same. It does have to be fast though...

There may be a ready-made function for this, but if not, substring and
paste are your friends. Look them up.
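
As a minimal sketch of what those two functions do on a single value (the sample string is taken from the question; the variable names are only illustrative):

```r
# One value in the poster's YYYYMMDD layout
x <- "20111115"

year <- substring(x, 1, 4)   # characters 1 to 4
mo   <- substring(x, 5, 6)   # characters 5 to 6
day  <- substring(x, 7, 8)   # characters 7 to 8

# Glue the pieces back together with "-" as the separator
iso <- paste(year, mo, day, sep = "-")
print(iso)
```

Both functions are vectorized, so the same calls work unchanged on a whole column of strings.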

Here's how I would do it:

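fix = function(x)
{
  # Split each YYYYMMDD string into its parts (vectorized over x)
  year = substring(x, 1, 4)
  mo = substring(x, 5, 6)
  day = substring(x, 7, 8)
  # "--" entries become a real missing value rather than the string "NA"
  ifelse(year == "--", NA_character_, paste(year, mo, day, sep = "-"))
}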

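fixed = apply(YourDataFrame, 2, fix)   # apply() returns a character matrix
dimnames(fixed) = dimnames(YourDataFrame)
# Convert back, since a data frame was requested
fixed = as.data.frame(fixed, stringsAsFactors = FALSE)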

Since you didn't provide a reproducible example, I can't test it
exhaustively, but it seems to work for me.
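
For a quick reproducible check, here is a sketch built on three rows copied from the question. It swaps in a different technique from the substring/paste approach: a single regular-expression substitution over the whole character matrix. The names `df`, `m` and `fixed_df` are hypothetical, and it assumes every column holds character strings rather than factors.

```r
# Three rows from the question, including the "--" placeholders
df <- data.frame(
  a = c("20070514", "20080814", "--"),
  b = c("20070410", "--", "20090407"),
  row.names = c("2007-03-31", "2008-06-30", "2009-03-31"),
  stringsAsFactors = FALSE
)

# Convert once to a character matrix; row and column names are carried along
m <- as.matrix(df)

# Insert the dashes in one vectorized call; entries that do not match
# the 8-digit pattern (such as "--") are left unchanged.
# Assigning into m[] keeps the dimensions and dimnames of m.
m[] <- sub("^([0-9]{4})([0-9]{2})([0-9]{2})$", "\\1-\\2-\\3", m)

# Optional: turn the "--" placeholders into real missing values
m[m == "--"] <- NA

# Back to a data frame with the original dimnames
fixed_df <- as.data.frame(m, stringsAsFactors = FALSE)
print(fixed_df)
```

This avoids calling a function once per column, but whether it is actually faster on a data frame with many columns would have to be timed on the real data.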

Peter


