[R] Add a new row based on test set predicted values and time stamps

Jeff Newmiller jdnewm|| @end|ng |rom dcn@d@v|@@c@@u@
Tue Apr 13 20:23:38 CEST 2021


The date you get using as.Date on a POSIXct value may not be the calendar date in that value's timezone. That is, by default as.Date only pays attention to the underlying UTC seconds-since-epoch value, so it ignores the timezone, which is unexpected for most people.

TL;DR: as.Date( dtm ) is not the same as as.POSIXct( trunc( dtm, units="days" ) ) unless you are using GMT.
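
As a minimal sketch of the difference (the timestamp and the "America/Los_Angeles" zone are made up for illustration, they are not from the thread; the names d_utc, d_local and day_start are likewise hypothetical):

```r
# Hypothetical timestamp: 11 PM Pacific time on 13 April 2021.
# In UTC this instant already falls on 14 April.
dtm <- as.POSIXct("2021-04-13 23:00:00", tz = "America/Los_Angeles")

# as.Date() on a POSIXct value uses tz = "UTC" unless told otherwise.
d_utc <- as.Date(dtm)

# Passing the zone explicitly gives the local calendar date.
d_local <- as.Date(dtm, tz = "America/Los_Angeles")

# trunc() works in the value's own zone, so this is local midnight.
day_start <- as.POSIXct(trunc(dtm, units = "days"))

print(d_utc)
print(d_local)
print(day_start)
```

With the default tz the first result should land on 14 April, while the other two stay on 13 April. If the calendar date in a particular zone is what you want, passing tz to as.Date explicitly avoids the surprise.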

On April 13, 2021 10:55:04 AM PDT, Bert Gunter <bgunter.4567 using gmail.com> wrote:
>(Revealing my ignorance):
>
>Simpler still than the as.POSIXct() idiom is just to use the as.Date
>version:
>
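>## an explicit format is needed for dates like "1/1/2020"; month/day/year is assumed
>out <- with(out, out[order(Group, id, as.Date(Date, format = "%m/%d/%Y")), ])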
>
>## all else the same...
>
>Bert Gunter
>
>"The trouble with having an open mind is that people keep coming along and
>sticking things into it."
>-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
>On Tue, Apr 13, 2021 at 10:47 AM Bert Gunter <bgunter.4567 using gmail.com>
>wrote:
>
>> It may not be necessary to insert the rows in that order -- R can identify
>> and use the information from the rows in most cases without it.
>> So to combine the results as you described (the code you sent got garbled
>> a bit btw -- you should proofread more carefully in future), all you would
>> need to do is:
>>
>> ## with train and test your train and test data frames of course
>> out <- na.omit(rbind(train, cbind(test[,c(1,3,4)], Value = test[,"value"])))
>> ## Note that the cbind() stuff is needed to create the correct "Value"
>> ## column for rbind(). See ?rbind for details
>>
>> If you insist that you need the row ordering as you specified, then follow
>> this by:
>>
>> ## the format below assumes month/day/year dates such as "4/1/2020"
>> out <- with(out, out[order(Group, id, as.POSIXct(Date, format = "%m/%d/%Y")), ])
>>
>> What this does is to first convert your text date column to POSIXct (see
>> ?DateTimeClasses for details), which gives the values the desired calendar
>> ordering. The order() function (see ?order for details) then gives the
>> permutation ordering them from early to late within groups and ids, which
>> is then used as the row subscripts to reorder the rows in the data frame.
>>
>> DO NOTE: For this to work reliably, your Date column must be consistent
>> and correct in its formatting!
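
A minimal sketch of the combine-then-sort steps above, using the data from the original post with the empty trailing rows left out. The month/day/year reading of the dates is an assumption, and the helper names new_rows and parsed are made up for illustration. It uses as.Date with an explicit format rather than as.POSIXct, since there is no time of day in the data.

```r
# Data from the original post, minus the empty trailing rows.
train <- data.frame(
  Date  = rep(c("1/1/2020", "2/1/2020", "3/1/2020"), times = 3),
  Value = c(3.5, 2.7, 4, 2.5, 3.7, 0, 3, 0, 1),
  Group = rep(c("A", "B", "C"), each = 3),
  id    = rep(c(1L, 101L, 100L), each = 3)
)
test <- data.frame(
  Date  = "4/1/2020",
  Value = c(3.5, 2.5, 3),
  Group = c("A", "B", "C"),
  id    = c(1L, 101L, 100L),
  value = c(0.2, 0.7, 0.9)   # the predictions
)

# Put the predictions in a column named "Value" so that rbind()
# can match the columns of the two data frames by name.
new_rows <- data.frame(Date = test$Date, Value = test$value,
                       Group = test$Group, id = test$id)
out <- rbind(train, new_rows)

# Parse the text dates once. ASSUMPTION: month/day/year.
parsed <- as.Date(out$Date, format = "%m/%d/%Y")
stopifnot(!anyNA(parsed))   # fail early if any date did not parse

# Reorder rows by group, id, then calendar date.
out <- out[order(out$Group, out$id, parsed), ]
rownames(out) <- NULL
print(out)
```

The stopifnot() line is one way to act on the formatting warning: a date that does not match the format becomes NA, and the check stops before the ordering silently goes wrong.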
>>
>> Other note: It probably makes more sense to convert your Date column to
>> POSIXct or POSIXlt dates from the beginning, as this will make things like
>> plotting in date order straightforward. There are also date-time packages
>> (in the "tidyverse" suite, I think, as well as others) that simplify such
>> things. I am pretty ignorant about date-time stuff, so I can't really be
>> more specific. https://cran.r-project.org/web/views/TimeSeries.html will
>> have lots of info on this if you need it. As well as searching, of course.
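
A short sketch of converting the column up front. The data frame df is hypothetical, the month/day/year format is an assumption, and Date class is swapped in here for the POSIXct/POSIXlt classes named above because the data carry no time of day.

```r
# Hypothetical small data frame in the thread's layout.
df <- data.frame(
  Date  = c("3/1/2020", "1/1/2020", "2/1/2020"),
  Value = c(4, 3.5, 2.7)
)

# Convert once. ASSUMPTION: the text is month/day/year.
df$Date <- as.Date(df$Date, format = "%m/%d/%Y")

# After conversion, sorting and date arithmetic need no re-parsing.
df <- df[order(df$Date), ]
print(range(df$Date))   # earliest and latest date
print(diff(df$Date))    # gaps between consecutive dates, in days
```

Once the column has a date class, plot(df$Date, df$Value) also places the points in calendar order without extra work.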
>>
>> HTH
>>
>> Bert Gunter
>>
>>
>> On Tue, Apr 13, 2021 at 3:26 AM Elahe chalabi via R-help <
>> r-help using r-project.org> wrote:
>>
>>> Hi all,
>>>
>>> I have predictions for my test set, which are the forecasted Value for
>>> "4/1/2020" for each combination of "id" and "Group". I would like to add a
>>> fourth row to each (Group, id) group in my train set, and the values for
>>> this row should come from the test set:
>>>
>>> my train set:
>>>
>>>     structure(list(Date = c("1/1/2020", "2/1/2020", "3/1/2020", "1/1/2020",
>>>     "2/1/2020", "3/1/2020", "1/1/2020", "2/1/2020", "3/1/2020", ""
>>>     ), Value = c(3.5, 2.7, 4, 2.5, 3.7, 0, 3, 0, 1, NA), Group = c("A",
>>>     "A", "A", "B", "B", "B", "C", "C", "C", ""), id = c(1L, 1L, 1L,
>>>     101L, 101L, 101L, 100L, 100L, 100L, NA)), class = "data.frame",
>>>     row.names = c(NA, -10L))
>>>
>>> test set:
>>>
>>>     structure(list(Date = c("4/1/2020", "4/1/2020", "4/1/2020", ""
>>>     ), Value = c(3.5, 2.5, 3, NA), Group = c("A", "B", "C", ""),
>>>     id = c(1L, 101L, 100L, NA), value = c(0.2, 0.7, 0.9, NA)),
>>>     class = "data.frame", row.names = c(NA, -4L))
>>>
>>> desired output:
>>>
>>>     structure(list(Date = c("1/1/2020", "2/1/2020", "3/1/2020", "4/1/2020",
>>>     "1/1/2020", "2/1/2020", "3/1/2020", "4/1/2020", "1/1/2020", "2/1/2020",
>>>     "3/1/2020", "4/1/2020"), Value = c(3.5, 2.7, 4, 0.2, 2.5, 3.7,
>>>     0, 0.7, 3, 0, 1, 0.9), Group = c("A", "A", "A", "A", "B", "B",
>>>     "B", "B", "C", "C", "C", "C"), id = c(1L, 1L, 1L, 1L, 101L, 101L,
>>>     101L, 101L, 100L, 100L, 100L, 100L)), class = "data.frame",
>>>     row.names = c(NA, -12L))
>>>
>>> The data is dummy; I have millions of records in the original data set.
>>>
>>> Thanks for any help.
>>> Elahe
>>>
>>>
>>> ______________________________________________
>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>

-- 
Sent from my phone. Please excuse my brevity.


