[R] Date-Time-Stamp input method for user-specific formats

esp davidgaryesp at gmail.com
Tue Oct 6 17:29:28 CEST 2009


Another solution, a fix to my original algorithm, was found by a colleague
(Matthew Roberts). While he makes no great claims for its elegance, it does
seem to work. The fix is based on the 'pmax' function, the "parallel"
variant of 'max' (maximum): instead of collapsing all of its inputs into a
single value, it returns a vector of element-wise maxima. Example:
max(1:3,4:8) == 8 but pmax(1:3,4:6) == 4 5 6. Because of this, the
conversion gives an appropriate result for each row of the data.
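The difference between 'max' and 'pmax', and the effect of "na.rm=TRUE" that the fix relies on, can be checked directly at the R prompt. A minimal sketch using plain numeric vectors (the values are arbitrary):

```r
# max() collapses all of its arguments into a single number
max(1:3, 4:8)                                  # 8
# pmax() compares position by position and returns a vector
pmax(1:3, 4:6)                                 # 4 5 6
# an NA at a position makes that position's result NA ...
pmax(c(1, NA, 3), c(NA, 5, 2))                 # NA NA 3
# ... unless na.rm = TRUE, which keeps the non-missing value instead
pmax(c(1, NA, 3), c(NA, 5, 2), na.rm = TRUE)   # 1 5 3
```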

The code tries two interpretations of each datetimestamp, midnight and
non-midnight, each implemented by a 'strptime' call. A midnight
datetimestamp (one with no time-of-day part) can only be parsed by the
midnight format, so the non-midnight conversion returns NA; thanks to the
"na.rm=TRUE" option, 'pmax' ignores that NA and returns the proper value. A
non-midnight datetimestamp is parsed by both formats, because 'strptime'
ignores any trailing characters its format does not consume, but the
midnight conversion drops the time of day. Only the non-midnight conversion
gives a result later than midnight, and that is what 'pmax' returns.
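The two parses can also be tried on their own, outside read.delim. A minimal sketch, assuming (as above) that a midnight stamp is written as a bare date in %d/%m/%Y form; the sample stamps are hypothetical, and unlike the posted code this converts each parse to POSIXct before calling 'pmax':

```r
# Hypothetical sample stamps: a midnight stamp written as a bare date,
# followed by an ordinary stamp one second later
stamps <- c("01/09/2009", "01/09/2009 00:00:01")

# date-only parse: succeeds for both, but drops the time of the second
date_only <- as.POSIXct(strptime(stamps, format = "%d/%m/%Y", tz = "GMT"))
# full parse: NA for the bare date, exact time for the second stamp
date_time <- as.POSIXct(strptime(stamps, format = "%d/%m/%Y %H:%M:%S",
                                 tz = "GMT"))

print(date_only)
print(date_time)

# element-wise maximum, skipping the NA from the failed parse
combined <- pmax(date_only, date_time, na.rm = TRUE)
format(combined, format = "%Y-%m-%d %H:%M:%S")
```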

The code is as follows:

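spot_frequency_readin <- function(file, nrows = -1) {

  # create a temporary class; read.delim converts any column whose
  # colClasses entry is "t_class2_" using the coercion method below
  setClass("t_class2_", representation("character"))
  setAs("character", "t_class2_", function(from) {
    # try both formats (midnight stamps are date only); pmax keeps the
    # later of the two parses and skips the NA from a failed parse
    # (for format symbols, see "R Reference Card" or ?strptime)
    as.POSIXct(pmax(strptime(from, format = "%d/%m/%Y", tz = "GMT"),
                    strptime(from, format = "%d/%m/%Y %H:%M:%S", tz = "GMT"),
                    na.rm = TRUE), tz = "GMT")
  })
  # remove the class when the function exits, even if reading fails
  on.exit(removeClass("t_class2_"))

  # read the file (TSV)
  dat <- read.delim(file, header = TRUE, comment.char = "", nrows = nrows,
                    as.is = FALSE, col.names = c("DATETIME", "FREQ"),
                    colClasses = c("t_class2_", "numeric"))

  return(dat)
}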
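For comparison, the same two-format parse can be applied after reading, which avoids defining and removing a temporary S4 class. This is a swapped-in variant, not the posted method: 'parse_stamps' and 'spot_alt' are hypothetical names, the tab-separated input is made up for illustration, and it assumes midnight is written as a bare date.

```r
# Hypothetical helper (not part of the posted code): parse a column of
# stamps where midnight is written as a bare date
parse_stamps <- function(x) {
  date_only <- as.POSIXct(strptime(x, format = "%d/%m/%Y", tz = "GMT"))
  date_time <- as.POSIXct(strptime(x, format = "%d/%m/%Y %H:%M:%S",
                                   tz = "GMT"))
  # element-wise maximum; the NA from a failed parse is skipped
  pmax(date_only, date_time, na.rm = TRUE)
}

# Made-up tab-separated input with a header row, one element per line
tsv <- c("DATETIME\tFREQ",
         "01/09/2009\t50.036",
         "01/09/2009 00:00:01\t50.035")

# read the stamps as plain character, then convert them afterwards
spot_alt <- read.delim(text = tsv, comment.char = "",
                       col.names = c("DATETIME", "FREQ"),
                       colClasses = c("character", "numeric"))
spot_alt$DATETIME <- parse_stamps(spot_alt$DATETIME)
str(spot_alt)
```

The trade-off is small: the S4 route converts during the read itself, while this variant keeps the reading step standard and puts the conversion in an ordinary function that can be tested separately.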


The result:
> spot
             DATETIME   FREQ
1 2009-09-01 00:00:00 50.036
2 2009-09-01 00:00:01 50.035
3 2009-09-01 00:00:02 50.035
4 2009-09-01 00:00:03 50.033


Confirm the nature of the result:
> str(spot)
'data.frame':   4 obs. of  2 variables:
 $ DATETIME: POSIXct, format: "2009-09-01 00:00:00" "2009-09-01 00:00:01" "2009-09-01 00:00:02" "2009-09-01 00:00:03"
 $ FREQ    : num  50 50 50 50


(Note: 'str' means "Compactly display the internal structure of an R
object". I can claim from experience that this and 'ls.str' are functions
the novice R user can benefit hugely from knowing about.)
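For readers new to these functions, a minimal sketch of both on a small, made-up data frame ('freq' is a hypothetical name):

```r
# a small, made-up data frame with a POSIXct column
freq <- data.frame(
  DATETIME = as.POSIXct(c("2009-09-01 00:00:00", "2009-09-01 00:00:01"),
                        tz = "GMT"),
  FREQ = c(50.036, 50.035)
)

# str(): one line per column, showing its class and first values
str(freq)

# ls.str(): runs str() on every object in the calling environment
# (here the workspace), so its listing includes 'freq'
ls.str()
```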
-- 
View this message in context: http://www.nabble.com/Date-Time-Stamp-input-method-for-user-specific-formats-tp25757018p25770983.html
Sent from the R help mailing list archive at Nabble.com.



