[R] Date-Time-Stamp input method for user-specific formats

Gabor Grothendieck ggrothendieck at gmail.com
Tue Oct 6 00:50:42 CEST 2009


Try this.  First we read a line at a time into L except for the
header.  Then we use strapply to match on the given pattern.  It
passes the backreferences (the portions within parentheses in the
pattern) to the function (defined via a formula) whose implicit
arguments are x, y and z.  That function returns two columns which are
in the required form so that in the next statement we convert one to
chron and the other to numeric.  See R News 4/1 for more about dates
and times.

library(gsubfn) # strapply
library(chron) # as.chron
Lines <- "DATETIME        FREQ
01/09/2009      59.036
01/09/2009 00:00:01     58.035
01/09/2009 00:00:02     53.035
01/09/2009 00:00:03     47.033
01/09/2009 00:00:04     52.03
01/09/2009 00:00:05     55.025"
L <- readLines(Lines)[-1]

pat <- "(../../....) (..:..:..){0,1} *([0-9.]+)"
s <- strapply(L, pat, ~ c(paste(x, y, "00:00:00"), z), simplify = rbind)

fmt <- "%m/%d/%Y %H:%M:%S"
DF <- data.frame(Time = as.chron(s[,1], fmt), Freq = as.numeric(s[,2]))

DF

The final output looks like this:

> DF
                 Time   Freq
1 (01/09/09 00:00:00) 59.036
2 (01/09/09 00:00:01) 58.035
3 (01/09/09 00:00:02) 53.035
4 (01/09/09 00:00:03) 47.033
5 (01/09/09 00:00:04) 52.030
6 (01/09/09 00:00:05) 55.025

If the times are unique you could consider making a zoo object out of
it by replacing the DF<- statement with:

library(zoo)
z <- zoo(as.numeric(s[,2]), as.chron(s[,1], fmt))

See the three vignettes in the zoo package.


On Mon, Oct 5, 2009 at 5:14 PM, esp <davidgaryesp at gmail.com> wrote:
>
> Date-Time-Stamp input method to correctly interpret user-specific
> formats:coding is  90% there - based on exmple at
> http://tolstoy.newcastle.edu.au/R/help/05/02/12003.html
> ...anyone got the last 10% please?
>
> CONTEXT:
>
> Data is received where one of the columns is a datetimestamp.  At midnight,
> the value represented as text in this column consists of just the date part,
> e.g. "01/09/2009".  At other times, the value in the column contains both
> date and time e.g. "01/09/2009 00:00:01".  The goal is to read it into R as
> an appropriate data type, where for example date arithmetic can be
> performed.  As far as I can tell, the most appropriate such data type is
> POSIXct.  The trick then is to read in the datetimestamps in the data as
> this type.
>
> PROBLEM:
>
> POSIXct defaults to a text representation almost but not quite like my
> received data.  The main difference is that the POSIXct date part is in
> reverse order, e.g. "2009-09-01".  It is possible to define a different
> format where date and time parts look like my data but when encountering
> datetimestamps where only the the date part is present (as in the case of my
> midnight data) then this is interpreted as NA i.e. undefined.
>
> SOLUTION (ALMOST):
>
> There is a workaround (based on example at
> http://tolstoy.newcastle.edu.au/R/help/05/02/12003.html).  It is possible to
> define a class then read the data in as this class.  For such a class it is
> possible to define a class method, in terms of a function, for translating a
> text (character string) representation into a value. In that function, one
> can use a conditional expression to treat midnight datetimestamps
> differently from those at other times of day.  The example below does that.
> In order to apply this function over all of the datetimestamp values in the
> column, it is necessary to use something like R's 'sapply' function.
>
> SNAG:
>
> The function below implements this approach.  A datetimestamp with only the
> date part, including leading zeroes, is always length 10 (characters).   It
> correctly interprets the datetimestamp values, but unfortunately translates
> them into what appear to be numeric type.  I am actually uncertain precisely
> what is happening, as I am very new to R and have most certainly stretched
> myself in writing this code.  I think perhaps it returns a list and
> something associated with this aspect makes it "forget" the data type is
> POSIXct or at least how such a type should be displayed as text or what to
> do about it.
>
> PLEA:
>
> Please, can anyone give any help whatsoever, however tenuous?
>
> CODE, DATA & RESULTS:
>
> Function to Read required data, intended to make the datetime column of the
> data (example given further below) into POSIXct values:
> <<<
> spot_frequency_readin <- function(file,nrows=-1) {
>
> # create temp class
> setClass("t_class2_", representation("character"))
> setAs("character", "t_class2_", function(from) {sapply(from, function(x) {
>  if (nchar(x)==10) {
> as.POSIXct(strptime(x,format="%d/%m/%Y"))
> }
> else {
> as.POSIXct(strptime(x,format="%d/%m/%Y %H:%M:%S"))
> }
> }
> )
> }
> )
>
> #(for format symbols, see "R Reference Card")
>
> # read the file (TSV)
> file <- read.delim(file, header=TRUE, comment.char = "", nrows=nrows,
> as.is=FALSE, col.names=c("DATETIME", "FREQ"), colClasses=c("t_class2_",
> "numeric") )
>
> # remove it now that we are done with it
> removeClass("t_class2_")
>
> return(file)
> }
>>>>
> This appears to work apart as regards processing each row of data correctly,
> but the values returned look like numeric equivalents of POSIXct, as opposed
> to the expected character-based (string) equivalents:
>
>
> Example Data:
> <<<
> DATETIME        FREQ
> 01/09/2009      59.036
> 01/09/2009 00:00:01     58.035
> 01/09/2009 00:00:02     53.035
> 01/09/2009 00:00:03     47.033
> 01/09/2009 00:00:04     52.03
> 01/09/2009 00:00:05     55.025
>>>>
>
>
> Example Function Call:
> <<<
>> spot = spot_frequency_readin("mydatafile.txt",4)
>>>>
>
>
> Result of Example Function Call:
> <<<
>> spot[1]
>    DATETIME
>
> 1 1251759600
> 2 1251759601
> 3 1251759602
> 4 1251759603
>>>>
>
>
> What I ideally wanted to see (whether or not the time part of the
> datetimestamp at midnight was displayed):
> <<<
>> spot[1]
>    DATETIME
>
> 01/09/2009 00:00:00
> 01/09/2009 00:00:01
> 01/09/2009 00:00:02
> 01/09/2009 00:00:03
> 01/09/2009 00:00:04
>>>>
>
>
> For the function as defined above using 'sapply'
>> spot[,1]
>         01/09/2009 01/09/2009 00:00:01 01/09/2009 00:00:02 01/09/2009
> 00:00:03
>         1251759600          1251759601          1251759602
> 1251759603
>
> This was unexpected - it seems to have displayed the datetimestamp values
> both as per my defined character-string representation and as numeric
> values.
>
> Alternatively ifI replace the 'sapply' by a 'lapply' then I get something
> closer to what I expect.  It is at least what looks like R's default text
> representation for POSIXct datetimes, even if it is not in my preferred
> format.
> <<<
>> spot[,1]
>
> [[1]]
> [1] "2009-09-01 BST"
>
> [[2]]
> [1] "2009-09-01 00:00:01 BST"
>
> [[3]]
> [1] "2009-09-01 00:00:02 BST"
>
> [[4]]
> [1] "2009-09-01 00:00:03 BST"
>>>>
>
> --
> View this message in context: http://www.nabble.com/Date-Time-Stamp-input-method-for-user-specific-formats-tp25757018p25757018.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>




More information about the R-help mailing list