[R] Question on creating Date variable

David Winsemius dwinsemius at comcast.net
Tue Jan 1 19:33:03 CET 2013


On Jan 1, 2013, at 9:02 AM, jim holtman wrote:

>> # can use sprintf to convert to a number with 2 digit fractions

A useful way to prevent loss of the trailing ".00", but just to
clarify: sprintf never returns a numeric object. It returns a
character representation of one (which is an appropriate class to
pass to `as.POSIXct`).
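
A minimal sketch of that distinction, using two illustrative values (not
the poster's data). The `tz = "UTC"` argument is an assumption added only
to make the result reproducible; it is not part of the code in this thread.

```r
# Illustrative values; both lose digits under as.character()
x <- c(10.30, 11)

as.character(x)            # "10.3" "11": trailing zeros are dropped
s <- sprintf("%.2f", x)    # "10.30" "11.00": always two decimals

class(s)                   # "character", not "numeric"
is.numeric(s)              # FALSE

# The zero-padded strings match the "%H.%M" input format.
# tz = "UTC" is an assumption, used only for reproducible output.
as.POSIXct(s, format = "%H.%M", tz = "UTC")
```

The zero-padding matters because "%H.%M" needs a minutes field after the
dot, and "11" on its own does not supply one.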


>> x <- c(10.30, 11, 11.01, 11.09, 11.15, 11.59, 12, 13)
>> as.POSIXct(sprintf("%.2f", x), format = "%H.%M")
> [1] "2013-01-01 10:30:00 EST" "2013-01-01 11:00:00 EST"
> [3] "2013-01-01 11:01:00 EST" "2013-01-01 11:09:00 EST"
> [5] "2013-01-01 11:15:00 EST" "2013-01-01 11:59:00 EST"
> [7] "2013-01-01 12:00:00 EST" "2013-01-01 13:00:00 EST"

Likewise, those printed values are character representations of what
is internally a numeric vector, even though it will not admit to being
such until coerced:

 > Sys.Date()
[1] "2013-01-01"
 > is.numeric(Sys.Date())
[1] FALSE
 > as.numeric(Sys.Date())
[1] 15706
 > as.numeric(as.POSIXct(Sys.Date()))
[1] 1356998400
 >
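
The same distinction explains the "non-numeric argument to binary
operator" error quoted further down: subtraction works on the POSIXct
vector, not on the character vector that format() returns. A hedged
sketch, reusing the x from Jim's example and again assuming tz = "UTC"
purely for reproducibility (the names `mins` and `labels` are
illustrative):

```r
x <- c(10.30, 11, 11.01, 11.09, 11.15, 11.59, 12, 13)

# Keep the date-time object for calculations
y <- as.POSIXct(sprintf("%.2f", x), format = "%H.%M", tz = "UTC")

# Arithmetic on POSIXct yields a difftime; ask for explicit units
mins <- as.numeric(difftime(y[3], y[2], units = "mins"))
mins

# Format only at the point of display; the result is character,
# so no further arithmetic is possible on `labels`
labels <- format(y, format = "%I:%M %p")
class(labels)
```

The design choice is simply to store the POSIXct vector and derive the
AM/PM strings from it when needed, rather than storing the strings.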

-- 
David.
>>
>
>
> On Tue, Jan 1, 2013 at 11:09 AM, arun <smartpink111 at yahoo.com> wrote:
>> Hi,
>>
>> Just by taking David's solution:
>> y <- as.POSIXct(paste( floor(x), round(60*(x-floor(x))) ),  
>> format="%H %M")
>> y1<-data.frame(y,AM_PM=format(y,format="%p"))
>> y1[3,1]-y1[4,1]
>> #Time difference of -40 mins
>> y1[5,1]-y1[3,1]
>> #Time difference of -13 mins
>> head(y1,2)
>> #                    y AM_PM
>> #1 2013-01-01 11:00:00    AM
>> #2 2013-01-01 11:15:00    AM
>> A.K.
>>
>>
>>
>>
>>
>> ----- Original Message -----
>> From: Christofer Bogaso <bogaso.christofer at gmail.com>
>> To: David Winsemius <dwinsemius at comcast.net>; David L Carlson <dcarlson at tamu.edu>
>> Cc: r-help at r-project.org
>> Sent: Tuesday, January 1, 2013 12:40 AM
>> Subject: Re: [R] Question on creating Date variable
>>
>> On 01 January 2013 03:00:18, David Winsemius wrote:
>>>
>>> On Dec 31, 2012, at 11:57 AM, David Winsemius wrote:
>>>
>>>>
>>>> On Dec 31, 2012, at 11:54 AM, Christofer Bogaso wrote:
>>>>
>>>>> On 01 January 2013 01:29:53, David Winsemius wrote:
>>>>>>
>>>>>> On Dec 31, 2012, at 11:35 AM, Christofer Bogaso wrote:
>>>>>>
>>>>>>> On 01 January 2013 00:17:50, David Winsemius wrote:
>>>>>>>>
>>>>>>>> On Dec 31, 2012, at 9:12 AM, Christofer Bogaso wrote:
>>>>>>>>
>>>>>>>>> Hello all,
>>>>>>>>>
>>>>>>>>> Let's say I have the following (numeric) vector:
>>>>>>>>>
>>>>>>>>>> x
>>>>>>>>>  [1] 11.00 11.25 11.35 12.01 11.14 13.00 13.25 13.35 14.01 13.14
>>>>>>>>> [11] 14.50 14.75 14.85 15.51 14.64
>>>>>>>>>
>>>>>>>>> Now, I want to create a 'Date' variable (i.e. I should be able to
>>>>>>>>> do all calculations pertaining to date/time and also time-series
>>>>>>>>> plotting etc.) like
>>>>>>>>>
>>>>>>>>> 2012-12-31 11:00:00 AM, 2012-12-31 11:25:00 AM,
>>>>>>>>> 2012-12-31 11:35:00 AM, 2012-12-31 12:01:00 PM, . . . .
>>>>>>>>>
>>>>>>>>
>>>>>>>> Those _times_ (_not_ Dates) cannot possibly be in "%M.%S" format,
>>>>>>>> given the number of items to the right of the decimal point that
>>>>>>>> are greater than 60. So I will proceed on the arguably more likely
>>>>>>>> assumption that they are in fractional minutes. To recover from
>>>>>>>> that problem, one might consider:
>>>>>>>>
>>>>>>>>> as.POSIXct(paste( floor(x), round(60*(x-floor(x))) ),
>>>>>>>> format="%M %S")
>>>>>>>> [1] "2012-12-31 00:11:00 PST" "2012-12-31 00:11:15 PST"
>>>>>>>> [3] "2012-12-31 00:11:21 PST" "2012-12-31 00:12:01 PST"
>>>>>>>> [5] "2012-12-31 00:11:08 PST" "2012-12-31 00:13:00 PST"
>>>>>>>> [7] "2012-12-31 00:13:15 PST" "2012-12-31 00:13:21 PST"
>>>>>>>> [9] "2012-12-31 00:14:01 PST" "2012-12-31 00:13:08 PST"
>>>>>>>> [11] "2012-12-31 00:14:30 PST" "2012-12-31 00:14:45 PST"
>>>>>>>> [13] "2012-12-31 00:14:51 PST" "2012-12-31 00:15:31 PST"
>>>>>>>> [15] "2012-12-31 00:14:38 PST"
>>>>>>>>
>>>>>>>
>>>>>>> I understand that some of those elements are not "dates". However,
>>>>>>> what I want is the ***"PM/AM" suffix*** on those elements which
>>>>>>> are considered as Dates.
>>>>>>>
>>>>>>> ***Getting those suffixes*** and doing calculations on those
>>>>>>> changed variables is my primary concern.
>>>>>>
>>>>>> That's the first time that AM/PM has been mentioned, and I suppose
>>>>>> if those were fractional hours rather than my guess of fractional
>>>>>> minutes, there might be representatives of both in the numeric
>>>>>> data you offered. Why don't you clarify what these numbers do in
>>>>>> fact represent? And what problem you are trying to solve?
>>>>>>
>>>>>
>>>>> Basically those are artificial data! Actually I do not have the
>>>>> right to give out the original data in any public forum. So I
>>>>> created those artificial data so that I can get the fundamental
>>>>> idea ...........
>>>>>
>>>>> Each element (assuming they are legitimate times) represents the
>>>>> time on a particular day when some event pops up, like 11AM,
>>>>> 11.30AM, 12.05PM etc. I could work with something like 11.00,
>>>>> 11.30, 12.05, 15.00 etc.; however, I believe adding an AM/PM suffix
>>>>> will make my report more eye-catching.
>>>>>
>>>>> Please let me know if you need more clarification.
>>>>
>>>> So what's with the values above 59 in the minutes?
>>>
>>> Failing an answer to that question, this code shows how to input
>>> date-time vectors from character vectors and then output it from
>>> date-time class to character class:
>>>
>>> # This will come in as a numeric vector
>>> x <- scan(text="11.00 11.25 11.35 12.01 11.14 13.00 13.25 13.35 14.01
>>> 13.14 14.50 14.75 14.85 15.51 14.64")
>>>
>>> ?strptime     # for the available format specifications
>>> format( as.POSIXct(as.character(x), format="%H.%M"),  # the input format
>>>             format="%I.%M %p")     # the output format
>>> [1] NA         "11.25 AM" "11.35 AM" "12.01 PM" "11.14 AM" NA
>>> [7] "01.25 PM" "01.35 PM" "02.01 PM" "01.14 PM" "02.05 PM" NA
>>> [13] NA         "03.51 PM" NA
>>>
>>> I suspect that the NA when minutes are ".00" comes from the implicit
>>> loss of the trailing digits:
>>>
>>>> as.character(0.00)
>>> [1] "0"
>>>
>>> The claim that this data is proprietary and cannot be presented in
>>> its original form sounds somewhat ridiculous.  Simply post:
>>>
>>> dput(head(dfrm$time_data_column_name, 20))
>>>
>>> How could that represent any disclosure of proprietary information
>>> if presented with no context?
>>>
>>
>> 'How could that represent any disclosure of proprietary information
>> if presented with no context?' I must agree with you. But I just don't
>> want to take any risk! (The job scenario in my country is not very
>> optimistic and I want to give my boss minimal chance/reason to fire me!)
>>
>> And secondly, with your approach I can't do any calculation. Take the
>> following example:
>>
>> y <- format( as.POSIXct(as.character(x), format="%H.%M"),  # the input format
>>             format="%I.%M %p")
>>
>> y[3] - y[2]
>>
>> This gives me the following error:
>>
>> Error in y[3] - y[2] : non-numeric argument to binary operator
>>
>> I get the same error with David's approach as well:
>>
>>> y <- as.POSIXct(paste( floor(x), round(60*(x-floor(x))) ),  
>>> format="%H %M")
>>> z <- format(y, format="%Y-%m-%d %I:%M %p")
>>> z[2] - z[1]
>> Error in z[2] - z[1] : non-numeric argument to binary operator.
>>
>> Thanks and regards,
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
>
>
> -- 
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>

David Winsemius, MD
Alameda, CA, USA



