[R] Using lapply in R data table

Bert Gunter bgunter.4567 at gmail.com
Mon Sep 26 22:27:05 CEST 2016


Ista:

Aha -- now I see the point. My bad. You are right. I was careless.

However, combining cut() with ifelse() might simplify the code a bit and/or
make it more readable. To be clear, this is just a matter of taste. For
example, using your data and a data frame instead of a data table (note that
the helper factor f ends up as an extra column):

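> DT <- within(DT,
        exposure <- {
          # cut() on Dates uses half-open intervals [lower, upper), so each
          # break is the first day of the following period
          f <- cut(fini,
                   as.Date(c("2000-01-01", "2006-01-01", "2006-07-01", "2007-01-01")),
                   labels = letters[1:3])
          ifelse(f == "a", 1,
                 ifelse(f == "c", .5,
                        difftime(as.Date("2007-01-01"), fini, units = "days")/365.25))
        }
       )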


> DT
  id       fini group  exposure f
1  2 2005-04-20     A 1.0000000 a
2  2 2005-04-20     A 1.0000000 a
3  2 2005-04-20     A 1.0000000 a
4  5 2006-02-19     B 0.8651608 b
5  5 2006-06-29     B 0.5092402 b
6  7 2006-10-08     A 0.5000000 c
7  7 2006-10-08     A 0.5000000 c
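
If the extra column f is not wanted, the same cut()/ifelse() idea can be wrapped in a small function. What follows is only a minimal sketch under a few assumptions: the function name exposure_from_fini and the period labels are made up for illustration, the data are copied from Ista's altered example as a plain data frame, and the breaks are placed on the first day of the following period because cut() on Dates defaults to half-open intervals [lower, upper).

```r
# Toy data copied from Ista's altered example, as a plain data frame
DT <- data.frame(
  id    = rep(c(2, 5, 7), c(3, 2, 2)),
  fini  = rep(as.Date(c("2005-04-20", "2006-02-19",
                        "2006-06-29", "2006-10-08")),
              c(3, 1, 1, 2)),
  group = rep(c("A", "B", "A"), c(3, 2, 2))
)

# Hypothetical helper (not from the thread): maps start dates to exposure
exposure_from_fini <- function(fini) {
  # cut() on Dates uses right = FALSE by default, i.e. [lower, upper),
  # so each break is the first day of the following period
  period <- cut(fini,
                breaks = as.Date(c("2000-01-01", "2006-01-01",
                                   "2006-07-01", "2007-01-01")),
                labels = c("before", "first_half", "second_half"))
  # Fraction of a year remaining until 2007-01-01, as a plain numeric
  years_left <- as.numeric(difftime(as.Date("2007-01-01"), fini,
                                    units = "days")) / 365.25
  # Dates outside all intervals give NA for period, hence NA exposure
  ifelse(period == "before", 1,
         ifelse(period == "second_half", 0.5, years_left))
}

DT$exposure <- exposure_from_fini(DT$fini)
DT
```

Because the factor lives only inside the function, DT gains a single numeric column. With a data table the same function could presumably be used as DT[, exposure := exposure_from_fini(fini)], but I have not checked that here.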
Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Mon, Sep 26, 2016 at 12:07 PM, Ista Zahn <istazahn at gmail.com> wrote:
> On Mon, Sep 26, 2016 at 2:48 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>> I thought that that was a typo from the OP, as it disagrees with his
>> example. But the labels are arbitrary, so in fact cut() will do it
>> whichever way he meant.
>
> I don't see how cut will do it, at least not conveniently. Consider
> this slightly altered example:
>
> library(data.table)
> DT <- data.table(
>   id = rep(c(2, 5, 7), c(3, 2, 2)),
>   fini = rep(as.Date(c('2005-04-20',
>                        '2006-02-19',
>                        '2006-06-29',
>                        '2006-10-08')),
>              c(3, 1, 1, 2)),
>   group = rep(c("A", "B", "A"), c(3, 2, 2))  )
>
> DT[, exposure := vector(mode = "numeric", length = .N)]
> DT[fini < as.Date("2006-01-01"), exposure := 1]
> DT[fini >= as.Date("2006-01-01") & fini <= as.Date("2006-06-30"),
>    exposure := difftime(as.Date("2007-01-01"), fini, units="days")/365.25]
> DT[fini >= as.Date("2006-07-01"), exposure := 0.5]
>
> DT
>
> ##    id       fini group  exposure
> ## 1:  2 2005-04-20     A 1.0000000
> ## 2:  2 2005-04-20     A 1.0000000
> ## 3:  2 2005-04-20     A 1.0000000
> ## 4:  5 2006-02-19     B 0.8651608
> ## 5:  5 2006-06-29     B 0.5092402
> ## 6:  7 2006-10-08     A 0.5000000
> ## 7:  7 2006-10-08     A 0.5000000
>
> Best,
> Ista
>
>>
>> -- Bert
>>
>> On Mon, Sep 26, 2016 at 11:37 AM, Ista Zahn <istazahn at gmail.com> wrote:
>>> On Mon, Sep 26, 2016 at 1:59 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>>>> This seems like a job for cut() .
>>>
>>> I thought that at first too, but the middle group shouldn't be .87 but rather
>>>
>>> "exposure" = "2007-01-01" - "fini"
>>>
>>> so, I think cut alone won't do it.
>>>
>>> Best,
>>> Ista
>>>>
>>>> (I made DT a data frame to avoid loading the data.table package. But I
>>>> assume it would work with a data table too. Check this, though!)
>>>>
>>>>> DT <- within(DT, exposure <- cut(fini,as.Date(c("2000-01-01","2006-01-01","2006-06-30","2006-12-21")), labels= c(1,.87,.5)))
>>>>
>>>>> DT
>>>>   id       fini group exposure
>>>> 1  2 2005-04-20     A        1
>>>> 2  2 2005-04-20     A        1
>>>> 3  2 2005-04-20     A        1
>>>> 4  5 2006-02-19     B     0.87
>>>> 5  5 2006-02-19     B     0.87
>>>> 6  7 2006-10-08     A      0.5
>>>> 7  7 2006-10-08     A      0.5
>>>>
>>>>
>>>> (but note that exposure is a factor, not numeric)
>>>>
>>>>
>>>> Cheers,
>>>> Bert
>>>>
>>>>
>>>> On Mon, Sep 26, 2016 at 10:05 AM, Ista Zahn <istazahn at gmail.com> wrote:
>>>>> Hi Frank,
>>>>>
>>>>> lapply(DT) iterates over each column. That doesn't seem to be what you want.
>>>>>
>>>>> There are probably better ways, but here is one approach.
>>>>>
>>>>> DT[, exposure := vector(mode = "numeric", length = .N)]
>>>>> DT[fini < as.Date("2006-01-01"), exposure := 1]
>>>>> DT[fini >= as.Date("2006-01-01") & fini <= as.Date("2006-06-30"),
>>>>>       exposure := difftime(as.Date("2007-01-01"), fini, units="days")/365.25]
>>>>> DT[fini >= as.Date("2006-07-01"), exposure := 0.5]
>>>>>
>>>>> Best,
>>>>> Ista
>>>>>
>>>>> On Mon, Sep 26, 2016 at 11:28 AM, Frank S. <f_j_rod at hotmail.com> wrote:
>>>>>> Dear all,
>>>>>>
>>>>>> I have an R data table like this:
>>>>>>
>>>>>> DT <- data.table(
>>>>>>   id = rep(c(2, 5, 7), c(3, 2, 2)),
>>>>>>   fini = rep(as.Date(c('2005-04-20', '2006-02-19', '2006-10-08')), c(3, 2, 2)),
>>>>>>   group = rep(c("A", "B", "A"), c(3, 2, 2))  )
>>>>>>
>>>>>>
>>>>>> I want to construct a new variable "exposure" defined as follows:
>>>>>>
>>>>>> 1) If "fini" earlier than 2006-01-01 --> "exposure" = 1
>>>>>> 2) If "fini" in [2006-01-01, 2006-06-30] --> "exposure" = "2007-01-01" - "fini"
>>>>>> 3) If "fini" in [2006-07-01, 2006-12-31] --> "exposure" = 0.5
>>>>>>
>>>>>>
>>>>>> So the desired output would be the following data table:
>>>>>>
>>>>>>    id       fini exposure group
>>>>>> 1:  2 2005-04-20     1.00     A
>>>>>> 2:  2 2005-04-20     1.00     A
>>>>>> 3:  2 2005-04-20     1.00     A
>>>>>> 4:  5 2006-02-19     0.87     B
>>>>>> 5:  5 2006-02-19     0.87     B
>>>>>> 6:  7 2006-10-08     0.50     A
>>>>>> 7:  7 2006-10-08     0.50     A
>>>>>>
>>>>>>
>>>>>> I have tried:
>>>>>>
>>>>>> DT <- DT[ , list(id, fini, exposure = 0, group)]
>>>>>> DT.new <- lapply(DT, function(exposure){
>>>>>>       exposure[fini < as.Date("2006-01-01")] <- 1   # 1st case
>>>>>>       exposure[fini >= as.Date("2006-01-01") & fini <= as.Date("2006-06-30")] <- difftime(as.Date("2007-01-01"), fini, units="days")/365.25 # 2nd case
>>>>>>     exposure[fini >= as.Date("2006-07-01") & fini <= as.Date("2006-12-31")] <- 0.5       # 3rd case
>>>>>>       exposure  # return value
>>>>>>   })
>>>>>>
>>>>>>
>>>>>> But I get an error message.
>>>>>>
>>>>>> Thanks for any help!!
>>>>>>
>>>>>>
>>>>>> Frank S.
>>>>>>
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.


