[R] R function to convert person-level observations to person-period observations

Muhuri, Pradip (SAMHSA/CBHSQ) Pradip.Muhuri at samhsa.hhs.gov
Sat Jan 3 19:22:59 CET 2015


Hello David,

Thank you so much for your advice. Changing the line in the function to "reve <- data[, event]" (with no change to the example data) seems to give the desired results (shown below). These 3 subjects are followed for 5 years. Subject A experienced the event in year 2 and subject C experienced the event in year 3, while subject B was censored at the end of the follow-up period (i.e., year 5). The person-period observations now seem to be consistent with the person-level observations. Do you see any issues?

Regards,

Pradip

###########################################################################################
## person-level observations                                                   
 ID dead studyyrs
1  A    1        2
2  B    0        5
3  C    1        3

## person-period observations
   ID dead studyyrs
1   A    0        1
2   A    1        2
3   B    0        1
4   B    0        2
5   B    0        3
6   B    0        4
7   B    0        5
8   C    0        1
9   C    0        2
10  C    1        3
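
The consistency between the two tables above can also be checked in code. The following is only a minimal sketch: check_pp is a hypothetical helper (not part of the UCLA function), and the column names ID, dead and studyyrs are assumed to match the example data. It tests that each subject has one row per period, that the last period equals the follow-up time, and that the event appears only in the last row when it occurred.

```r
## Person-level and person-period data, typed in from the tables above
pl <- data.frame(ID = LETTERS[1:3], dead = c(1, 0, 1), studyyrs = c(2, 5, 3))
pp <- data.frame(
  ID       = rep(pl$ID, pl$studyyrs),
  dead     = c(0, 1, 0, 0, 0, 0, 0, 0, 0, 1),
  studyyrs = c(1:2, 1:5, 1:3)
)

## Hypothetical helper: TRUE if the person-period data agree with the
## person-level data (column names are assumptions taken from the example)
check_pp <- function(pl, pp, id = "ID", period = "studyyrs", event = "dead") {
  n_rows   <- tapply(pp[[period]], pp[[id]], length)  # rows per subject
  last_per <- tapply(pp[[period]], pp[[id]], max)     # last period per subject
  n_events <- tapply(pp[[event]], pp[[id]], sum)      # events per subject
  ## event value in the last row of each subject
  last_evt <- vapply(split(pp, pp[[id]]),
                     function(x) x[[event]][which.max(x[[period]])],
                     numeric(1))
  ids <- as.character(pl[[id]])
  all(n_rows[ids]   == pl[[period]],
      last_per[ids] == pl[[period]],
      n_events[ids] == pl[[event]],
      last_evt[ids] == pl[[event]])
}

check_pp(pl, pp)
```

The check only covers the three properties listed; it does not look at any covariates that a real data set would carry along.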

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260

-----Original Message-----
From: David Barron [mailto:dnbarron at gmail.com] 
Sent: Saturday, January 03, 2015 10:19 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help at r-project.org
Subject: Re: [R] R function to convert person-level observations to person-period observations

Your data are wrong. The 'event' variable (dead in your example) needs to be 1 for cases that end in an event and 0 for spells that are
censored: yours is the other way around.  If you change the 'dead'
variable to c(1,0,1) you will get the desired result.

If you really need to reverse the behaviour of the function, change the line reve <- !data[, event] to reve <- data[, event]
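
The effect of that one line can be seen in a stripped-down sketch of the "period" branch. This is an illustration only: expand_pp and its negate argument are hypothetical names, and the column names are hard-coded to the example data rather than passed as arguments as in PLPP.

```r
df <- data.frame(ID = LETTERS[1:3], dead = c(1, 0, 1), studyyrs = c(2, 5, 3))

## Same steps as the "period" branch of PLPP, with the negation made optional
expand_pp <- function(data, negate) {
  index <- rep(seq_len(nrow(data)), data$studyyrs)  # one row per period
  idmax <- cumsum(data$studyyrs)                    # last row of each subject
  dat <- data[index, ]
  dat$studyyrs <- ave(dat$studyyrs, dat$ID, FUN = seq_along)
  dat$dead <- 0
  ## only the last row of each subject can carry the event
  dat$dead[idmax] <- if (negate) !data$dead else data$dead
  rownames(dat) <- NULL
  dat
}

expand_pp(df, negate = TRUE)$dead   # as in: reve <- !data[, event]
# 0 0 0 0 0 0 1 0 0 0
expand_pp(df, negate = FALSE)$dead  # as in: reve <- data[, event]
# 0 1 0 0 0 0 0 0 0 1
```

Note that the "level" branch of the quoted function also negates the indicator, via as.integer(!dat[, event]), so converting back with direction = "level" would flip it again unless that line is changed in the same way.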

David

On 3 January 2015 at 13:20, Muhuri, Pradip (SAMHSA/CBHSQ) <Pradip.Muhuri at samhsa.hhs.gov> wrote:
> Hello,
>
> I was trying to convert person-level observations to person-period observations using an R custom function obtained from the UCLA web site (http://www.ats.ucla.edu/stat/r/faq/person_period.htm).  Please see my reproducible example below.  The function (PLPP) in the R script takes five arguments.
>
>
> 1)  data (i.e., the data set to be converted)
>
> 2)  id (i.e., the identifier for each observation)
>
> 3)  period (i.e., the number of periods the person or observation was followed up)
>
> 4)  event (i.e., the variable that indicates whether the event occurred or whether the observation was censored, depending on which direction you are converting)
>
> 5)  direction which "indicates whether the function should go from person-level to person-period or from person-period to person-level".
>
> On my example data set, the R script ran successfully.  Based on 3 person-level observations (A died in year 2, B is censored in year 5, C died in year 3), I get 10 person-period observations, which is the correct number.  But the issue is that the value of the "dead" indicator variable is incorrect.  I have a gut feeling that the function needs to be tweaked a bit to get the desired results.
>
>
> Correct results
>   ID dead   studyyrs
> 1  A    1        2
> 2  B    0        5
> 3  C    1        3
>
> Incorrect results - the "dead" column
>
>    ID dead studyyrs
> 1   A    0        1
> 2   A    0        2
> 3   B    0        1
> 4   B    0        2
> 5   B    0        3
> 6   B    0        4
> 7   B    1        5
> 8   C    0        1
> 9   C    0        2
> 10  C    0        3
>
> Desired results
>
>    ID dead studyyrs
> 1   A    0        1
> 2   A    1        2
> 3   B    0        1
> 4   B    0        2
> 5   B    0        3
> 6   B    0        4
> 7   B    0        5
> 8   C    0        1
> 9   C    0        2
> 10  C    1        3
>
> I would appreciate receiving your help or hints for resolving the 
> issue.  Thanks,
>
>
>
> ## My reproducible code is shown below
>
> ## Below is my data frame (3 observations)
> df <- data.frame(ID = LETTERS[1:3], dead = c(1, 0, 1), studyyrs = c(2, 5, 3))
> df
>
> ## Person-Level Person-Period Converter Function
> ## Source: http://www.ats.ucla.edu/stat/r/faq/person_period.htm
> PLPP <- function(data, id, period, event, direction = c("period", "level")) {
>   ## Data Checking and Verification Steps
>   stopifnot(is.matrix(data) || is.data.frame(data))
>   stopifnot(c(id, period, event) %in% c(colnames(data), 1:ncol(data)))
>
>   if (any(is.na(data[, c(id, period, event)]))) {
>     stop("PLPP cannot currently handle missing data in the id, period, or event variables")
>   }
>
>   ## Do the conversion - Source: http://www.ats.ucla.edu/stat/r/faq/person_period.htm
>   switch(match.arg(direction),
>          period = {
>            index <- rep(1:nrow(data), data[, period])
>            idmax <- cumsum(data[, period])
>            reve <- !data[, event]
>            dat <- data[index, ]
>            dat[, period] <- ave(dat[, period], dat[, id], FUN = seq_along)
>            dat[, event] <- 0
>            dat[idmax, event] <- reve},
>          level = {
>            tmp <- cbind(data[, c(period, id)], i = 1:nrow(data))
>            index <- as.vector(by(tmp, tmp[, id],
>                                  FUN = function(x) x[which.max(x[, period]), "i"]))
>            dat <- data[index, ]
>            dat[, event] <- as.integer(!dat[, event])
>          })
>
>   rownames(dat) <- NULL
>   return(dat)
> }
>
> tpp <- PLPP(data = df, id = "ID", period = "studyyrs",
>             event = "dead", direction = "period")
> tpp
>
>
>
> Pradip K. Muhuri,
> SAMHSA/CBHSQ
>
>

