[R] Add a new row based on test set predicted values and time stamps

Bert Gunter bgunter@4567 @end|ng |rom gm@||@com
Tue Apr 13 19:55:04 CEST 2021


(Revealing my ignorance):

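Simpler still than the as.POSIXct() idiom is just to use the as.Date
version, with an explicit format because the dates are not in
year-month-day form (month/day/year is assumed here):

out <- with(out, out[order(Group, id, as.Date(Date, format = "%m/%d/%Y")), ])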

## all else the same...
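
To illustrate why the conversion matters at all, here is a small sketch
comparing a plain text sort with a date sort. The vector is a toy example,
not the poster's data, and the "%m/%d/%Y" layout is an assumption about how
the strings are written:

```r
## Toy dates stored as text; month/day/year layout is assumed
x <- c("10/1/2020", "2/1/2020", "1/1/2020")

## Sorting the strings compares characters, so "10/..." lands before "2/..."
by_text <- sort(x, method = "radix")
by_text

## Converting to Date first gives calendar order
d <- as.Date(x, format = "%m/%d/%Y")
by_date <- x[order(d)]
by_date
```

The same reasoning applies to order() inside the with() call: it should be
given the converted dates, not the original character column.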

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Apr 13, 2021 at 10:47 AM Bert Gunter <bgunter.4567 using gmail.com> wrote:

> It may not be necessary to insert the rows in that order -- R can identify
> and use the information from the rows in most cases without it.
> So to combine the results as you described (the code you sent got garbled
> a bit btw -- you should proofread more carefully in future), all you would
> need to do is:
>
> ## with train and test your train and test data frames of course
> out <- na.omit(rbind(train, cbind(test[,c(1,3,4)], Value = test[,"value"])))
> ## Note that the cbind() stuff is needed to create the correct "Value"
> ## column for rbind(). See ?rbind for details
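
A self-contained sketch of the combining step, using compact versions of the
posted train and test sets. Selecting the test columns by name instead of by
position is a variation on the code above, not a requirement; new_rows is a
name made up for this illustration:

```r
## Compact reconstruction of the posted data, including the trailing empty rows
train <- data.frame(
  Date  = c(rep(c("1/1/2020", "2/1/2020", "3/1/2020"), 3), ""),
  Value = c(3.5, 2.7, 4, 2.5, 3.7, 0, 3, 0, 1, NA),
  Group = c(rep(c("A", "B", "C"), each = 3), ""),
  id    = c(rep(c(1L, 101L, 100L), each = 3), NA)
)
test <- data.frame(
  Date  = c(rep("4/1/2020", 3), ""),
  Value = c(3.5, 2.5, 3, NA),
  Group = c("A", "B", "C", ""),
  id    = c(1L, 101L, 100L, NA),
  value = c(0.2, 0.7, 0.9, NA)
)

## Keep the key columns and rename the prediction column "value" to "Value"
new_rows <- data.frame(test[, c("Date", "Group", "id")], Value = test$value)

## rbind() on data frames matches columns by name, so their order may differ;
## na.omit() then drops the empty rows that contain NA
out <- na.omit(rbind(train, new_rows))
nrow(out)
```

The result has the three original rows plus one predicted row for each
(Group, id) pair, still in append order.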
>
> If you insist that you need the row ordering as you specified, then follow
> this by:
>
> out <- with(out, out[order(Group, id, as.POSIXct(Date, format = "%m/%d/%Y")), ])
>
> What this does is to first convert your text date column to POSIXct (see
> ?DateTimeClasses for details), which gives the dates the desired calendar
> ordering. The order() function (see ?order for details) then gives the
> permutation ordering the rows from early to late within groups and ids, which
> is then used as the row subscripts to reorder the rows in the data frame.
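
As a small sketch of what "permutation" means here: order() returns row
positions, not sorted values. The vectors below are toy data, and both the
"%m/%d/%Y" format and the UTC time zone are assumptions made for the
illustration:

```r
## Toy columns, deliberately out of order
Group <- c("B", "A", "B", "A")
id    <- c(101L, 1L, 101L, 1L)
Date  <- c("2/1/2020", "3/1/2020", "1/1/2020", "1/1/2020")

## Convert the text dates so they compare in calendar order
when <- as.POSIXct(Date, format = "%m/%d/%Y", tz = "UTC")

## order() returns the row positions that sort by Group, then id, then date
idx <- order(Group, id, when)
idx

## Using those positions as row subscripts reorders the data frame
sorted <- data.frame(Group, id, Date)[idx, ]
sorted
```

Here idx is 4 2 3 1: row 4 (A, 1/1/2020) comes first, then row 2, and so on.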
>
> DO NOTE: For this to work reliably, your Date column must be consistent
> and correct in its formatting!
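
One possible way to check the formatting before sorting is to look for
entries that fail to parse. This is only a sketch on a toy vector, again
assuming the "%m/%d/%Y" layout:

```r
## Toy column with one entry in a different layout and one empty string
Date <- c("1/1/2020", "2/1/2020", "2020-03-01", "")

## Entries that do not match the format come back as NA
parsed <- as.Date(Date, format = "%m/%d/%Y")

## Non-empty strings that failed to parse are the suspicious ones
bad <- which(is.na(parsed) & nzchar(Date))
Date[bad]
```

A check like this only catches strings that do not match the format at all.
It cannot detect dates that parse cleanly but were entered as day/month
instead of month/day.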
>
> Other note: It probably makes more sense to convert your Date column to
> POSIXct or POSIXlt dates from the beginning, as this will make things like
> plotting in date order straightforward. There are also date-time packages
> (in the "tidyverse" suite, I think, as well as others) that simplify such
> things. I am pretty ignorant about date-time stuff, so I can't really be
> more specific. https://cran.r-project.org/web/views/TimeSeries.html will
> have lots of info on this if you need it, as will searching, of course.
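
A minimal sketch of converting up front, using the Date class in place of
POSIXct since the data carry no time of day. The data frame is a toy example
and the "%m/%d/%Y" format is an assumption:

```r
## Toy data with dates stored as text
train <- data.frame(
  Date  = c("3/1/2020", "1/1/2020", "2/1/2020"),
  Value = c(4, 3.5, 2.7)
)

## Convert once, right after reading the data
train$Date <- as.Date(train$Date, format = "%m/%d/%Y")

## Later steps can then use the column directly
train <- train[order(train$Date), ]
range(train$Date)
diff(train$Date)   # gaps between consecutive dates, in days
```

With the column converted once, sorting, ranges and differences need no
further format arguments.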
>
> HTH
>
> Bert Gunter
>
>
> On Tue, Apr 13, 2021 at 3:26 AM Elahe chalabi via R-help <
> r-help using r-project.org> wrote:
>
>> Hi all,
>>
>> I have the predictions for my test set, which are forecasted values for
>> "4/1/2020" for each combination of "id" and "Group". I would like to add a
>> fourth row to each (Group, id) group in my train set, and the values for
>> this row should come from the test set:
>>
>> my train set:
>>
>>      structure(list(Date = c("1/1/2020", "2/1/2020", "3/1/2020",
>> "1/1/2020",
>>      "2/1/2020", "3/1/2020", "1/1/2020", "2/1/2020", "3/1/2020", ""
>>      ), Value = c(3.5, 2.7, 4, 2.5, 3.7, 0, 3, 0, 1, NA), Group = c("A",
>>     "A", "A", "B", "B", "B", "C", "C", "C", ""), id = c(1L, 1L, 1L,
>>     101L, 101L, 101L, 100L, 100L, 100L, NA)), class = "data.frame",
>> row.names = c(NA,
>>     -10L))
>>
>> test set:
>>
>>     structure(list(Date = c("4/1/2020", "4/1/2020", "4/1/2020", ""
>>       ), Value = c(3.5, 2.5, 3, NA), Group = c("A", "B", "C", ""),
>>     id = c(1L, 101L, 100L, NA), value = c(0.2, 0.7, 0.9, NA)), class =
>> "data.frame", row.names = c(NA,
>>      -4L))
>>
>> desired output:
>>
>>     structure(list(Date = c("1/1/2020", "2/1/2020", "3/1/2020",
>> "4/1/2020",
>>     "1/1/2020", "2/1/2020", "3/1/2020", "4/1/2020", "1/1/2020",
>> "2/1/2020",
>>     "3/1/2020", "4/1/2020"), Value = c(3.5, 2.7, 4, 0.2, 2.5, 3.7,
>>      0, 0.7, 3, 0, 1, 0.9), Group = c("A", "A", "A", "A", "B", "B",
>>     "B", "B", "C", "C", "C", "C"), id = c(1L, 1L, 1L, 1L, 101L, 101L,
>>     101L, 101L, 100L, 100L, 100L, 100L)), class = "data.frame", row.names
>> = c(NA,
>>    -12L))
>>
>> The data are dummy; I have millions of records in the original data set.
>>
>> Thanks for any help.
>> Elahe
>>
>>
>
