[R] Coding columns for survival analysis

jim holtman jholtman at gmail.com
Sat Apr 14 02:01:58 CEST 2012


Try this:

> x <- read.table(text = "   tree live1 live2 live3 live4 live5
+    1 tree1     0     0     0     1     1
+    2 tree2     0     0     1     1     0
+    3 tree3     0     1     1     0     0
+    4 tree4     1     1     0     0     0
+    6 tree4     1     1     1     1     0  # another test condition
+    5 tree5     1     0     0     0     0", header = TRUE)
>
> # get matrix of data columns
> z <- as.matrix(x[, -1])
> # process each row
> a <- apply(z, 1, function(.row){
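+     # running count of live sightings, shifted by 1 and capped at 3:
+     # 1 = not seen yet (NA), 2 = first sighting, 3 = seen alive again
+     found <- pmin(cumsum(.row) + 1, 3)
+     # count the 0/1 changes so far; the second change is the death
+     die <- cumsum(diff(c(0, .row)) != 0)
+     # wherever die == 2 (at and after the death), use index 4 ("mort")
+     found[die == 2] <- 4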
+     c(NA, "found", "alive", "mort")[found]
+ })
> t(a)  # result
  [,1]    [,2]    [,3]    [,4]    [,5]
1 NA      NA      NA      "found" "alive"
2 NA      NA      "found" "alive" "mort"
3 NA      "found" "alive" "mort"  "mort"
4 "found" "alive" "mort"  "mort"  "mort"
6 "found" "alive" "alive" "alive" "mort"
5 "found" "mort"  "mort"  "mort"  "mort"
>
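To see why the two cumsum() calls are enough, it may help to trace a single row by hand. The sketch below takes the tree2 history from the example; the names `state`, `changes` and `labels` are only illustrative stand-ins for `found`, `die` and the returned vector.

```r
# One census history, taken from the tree2 row of the example
.row <- c(live1 = 0, live2 = 0, live3 = 1, live4 = 1, live5 = 0)

# Running count of live sightings, shifted by 1 and capped at 3:
# 1 = no live sighting yet, 2 = exactly one so far, 3 = two or more
state <- pmin(cumsum(.row) + 1, 3)

# Number of 0/1 changes so far, with an implied 0 before the first census:
# 1 once the tree has appeared, 2 once it has gone from alive to dead
changes <- cumsum(diff(c(0, .row)) != 0)

# Every census where exactly two changes have happened becomes state 4
state[changes == 2] <- 4

labels <- c(NA, "found", "alive", "mort")[state]
print(rbind(row = .row, state = state, changes = changes))
print(labels)
```

One consequence of testing `changes == 2` only: a history such as 1, 0, 1 (a tree seen alive again after a recorded death) would come out as found, mort, alive. Whether that is the right answer depends on how such records should be treated, which the original question does not say.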

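The result above repeats "mort" for every census after the death, whereas the table requested in the question shows 0 there. If that layout is wanted, one possible variation is sketched here. `recode_row` is a made-up helper name, and the assumption is that only the first dead census should read "mort".

```r
# Example data as given in the original question
tree_live <- data.frame(
  tree  = c("tree1", "tree2", "tree3", "tree4", "tree5"),
  live1 = c(0, 0, 0, 1, 1),
  live2 = c(0, 0, 1, 1, 0),
  live3 = c(0, 1, 1, 0, 0),
  live4 = c(1, 1, 0, 0, 0),
  live5 = c(1, 0, 0, 0, 0)
)

# recode_row() is a hypothetical helper, not part of any package
recode_row <- function(.row) {
  found <- pmin(cumsum(.row) + 1, 3)
  die <- cumsum(diff(c(0, .row)) != 0)
  found[die == 2] <- 4
  out <- c(NA, "found", "alive", "mort")[found]
  # keep only the first "mort"; later censuses become "0" as in the request
  dead <- which(found == 4)
  out[dead[-1]] <- "0"
  out
}

z <- as.matrix(tree_live[, -1])
recoded <- t(apply(z, 1, recode_row))
colnames(recoded) <- colnames(z)   # apply() drops the census names
tree_live_recode <- data.frame(recoded, stringsAsFactors = FALSE)
print(tree_live_recode)
```

The columns come back as character, so the 0 entries are the string "0" rather than a number.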

On Fri, Apr 13, 2012 at 4:53 PM, Alexander Shenkin <ashenkin at ufl.edu> wrote:
> Hello Folks,
>
> I have 5 columns for thousands of tree records that record whether each
> tree was alive or dead.  I want to recode the columns so that a cell
> reads "found" when a live tree is first observed, "alive" when a tree
> that was found earlier is observed alive again, and "mort" when it was
> previously alive but is now dead.
>
> Given the following:
>
>    > tree_live = data.frame(
>          tree  = c("tree1","tree2","tree3","tree4","tree5"),
>          live1 = c(0,0,0,1,1),
>          live2 = c(0,0,1,1,0),
>          live3 = c(0,1,1,0,0),
>          live4 = c(1,1,0,0,0),
>          live5 = c(1,0,0,0,0))
>
>       tree live1 live2 live3 live4 live5
>    1 tree1     0     0     0     1     1
>    2 tree2     0     0     1     1     0
>    3 tree3     0     1     1     0     0
>    4 tree4     1     1     0     0     0
>    5 tree5     1     0     0     0     0
>
> I would like to end up with the following:
>
>    > tree_live_recode
>
>      live1 live2 live3 live4 live5
>    1    NA    NA    NA found alive
>    2    NA    NA found alive  mort
>    3    NA found alive  mort     0
>    4 found alive  mort     0     0
>    5 found  mort     0     0     0
>
> I've accomplished the recode in the past, but only by going over the
> dataset multiple times in a messy and inefficient fashion.  Is there a
> concise and efficient way of going about it?
>
> (I haven't been using the survival package for my analyses, but I'm
> starting to look into it.)
>
> Thanks,
> Allie



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


