[R] Date-Time-Stamp input method for user-specific formats

esp davidgaryesp at gmail.com
Mon Oct 5 23:14:35 CEST 2009


Date-Time-Stamp input method to correctly interpret user-specific
formats:coding is  90% there - based on exmple at
http://tolstoy.newcastle.edu.au/R/help/05/02/12003.html
...anyone got the last 10% please?  

CONTEXT:

Data is received where one of the columns is a datetimestamp.  At midnight,
the value represented as text in this column consists of just the date part,
e.g. "01/09/2009".  At other times, the value in the column contains both
date and time e.g. "01/09/2009 00:00:01".  The goal is to read it into R as
an appropriate data type, where for example date arithmetic can be
performed.  As far as I can tell, the most appropriate such data type is
POSIXct.  The trick then is to read in the datetimestamps in the data as
this type.

PROBLEM:

POSIXct defaults to a text representation almost but not quite like my
received data.  The main difference is that the POSIXct date part is in
reverse order, e.g. "2009-09-01".  It is possible to define a different
format where date and time parts look like my data but when encountering
datetimestamps where only the the date part is present (as in the case of my
midnight data) then this is interpreted as NA i.e. undefined.

SOLUTION (ALMOST):

There is a workaround (based on example at
http://tolstoy.newcastle.edu.au/R/help/05/02/12003.html).  It is possible to
define a class then read the data in as this class.  For such a class it is
possible to define a class method, in terms of a function, for translating a
text (character string) representation into a value. In that function, one
can use a conditional expression to treat midnight datetimestamps
differently from those at other times of day.  The example below does that. 
In order to apply this function over all of the datetimestamp values in the
column, it is necessary to use something like R's 'sapply' function.

SNAG:

The function below implements this approach.  A datetimestamp with only the
date part, including leading zeroes, is always length 10 (characters).   It
correctly interprets the datetimestamp values, but unfortunately translates
them into what appear to be numeric type.  I am actually uncertain precisely
what is happening, as I am very new to R and have most certainly stretched
myself in writing this code.  I think perhaps it returns a list and
something associated with this aspect makes it "forget" the data type is
POSIXct or at least how such a type should be displayed as text or what to
do about it.

PLEA:

Please, can anyone give any help whatsoever, however tenuous?

CODE, DATA & RESULTS:

Function to Read required data, intended to make the datetime column of the
data (example given further below) into POSIXct values:
<<<
spot_frequency_readin <- function(file,nrows=-1) {

# create temp class
setClass("t_class2_", representation("character"))
setAs("character", "t_class2_", function(from) {sapply(from, function(x) {
  if (nchar(x)==10) {
as.POSIXct(strptime(x,format="%d/%m/%Y"))
}
else {
as.POSIXct(strptime(x,format="%d/%m/%Y %H:%M:%S"))
}
}
)
}
)

#(for format symbols, see "R Reference Card")

# read the file (TSV)
file <- read.delim(file, header=TRUE, comment.char = "", nrows=nrows,
as.is=FALSE, col.names=c("DATETIME", "FREQ"), colClasses=c("t_class2_",
"numeric") )

# remove it now that we are done with it
removeClass("t_class2_")

return(file)
}
>>>
This appears to work apart as regards processing each row of data correctly,
but the values returned look like numeric equivalents of POSIXct, as opposed
to the expected character-based (string) equivalents:


Example Data:
<<<
DATETIME	FREQ
01/09/2009	59.036
01/09/2009 00:00:01	58.035
01/09/2009 00:00:02	53.035
01/09/2009 00:00:03	47.033
01/09/2009 00:00:04	52.03
01/09/2009 00:00:05	55.025
>>>


Example Function Call:
<<<
> spot = spot_frequency_readin("mydatafile.txt",4)
>>>


Result of Example Function Call:
<<<
> spot[1]
    DATETIME

1 1251759600
2 1251759601
3 1251759602
4 1251759603
>>>


What I ideally wanted to see (whether or not the time part of the
datetimestamp at midnight was displayed):
<<<
> spot[1]
    DATETIME

01/09/2009 00:00:00
01/09/2009 00:00:01
01/09/2009 00:00:02
01/09/2009 00:00:03
01/09/2009 00:00:04
>>>


For the function as defined above using 'sapply'
> spot[,1]
         01/09/2009 01/09/2009 00:00:01 01/09/2009 00:00:02 01/09/2009
00:00:03 
         1251759600          1251759601          1251759602         
1251759603

This was unexpected - it seems to have displayed the datetimestamp values
both as per my defined character-string representation and as numeric
values.  

Alternatively ifI replace the 'sapply' by a 'lapply' then I get something
closer to what I expect.  It is at least what looks like R's default text
representation for POSIXct datetimes, even if it is not in my preferred
format.
<<<
> spot[,1]

[[1]]
[1] "2009-09-01 BST"

[[2]]
[1] "2009-09-01 00:00:01 BST"

[[3]]
[1] "2009-09-01 00:00:02 BST"

[[4]]
[1] "2009-09-01 00:00:03 BST"
>>>

-- 
View this message in context: http://www.nabble.com/Date-Time-Stamp-input-method-for-user-specific-formats-tp25757018p25757018.html
Sent from the R help mailing list archive at Nabble.com.




More information about the R-help mailing list