[R] Discrepancy in the PBC data set

Terry Therneau therneau at mayo.edu
Mon Nov 24 14:39:49 CET 2008


  The pbc data set distributed with R is wrong: a quick look turned up mistakes on 2 lines. 
  
  Judging by all the "-9" codes in it, the R version appears to have been taken 
from the Appendix of Fleming and Harrington. I don't know whether the data are 
also incorrect there (someone seems to have borrowed my copy). (Note that Tom 
Fleming originally got the data from me, so I'm fairly confident in calling my 
Mayo version the authoritative one.)  I'll make sure this gets fixed.
  
  You can grab a correct data set from our department web page.  Code is below.
  
  	Terry Therneau
  	
  
library(survival)   # provides coxph() and Surv()

# Mayo copy of the PBC data; "." marks a missing value
pbcurl <- "http://mayoresearch.mayo.edu/mayo/research/biostat/upload/therneau_upload/pbc.dat"

pbc <- read.table(pbcurl, header=FALSE, 
                  col.names=c('id', 'time', 'status', 'trt',  'age', 'sex',
                              'ascites',  'hepato',  'spiders',  'edema',
                              'bili',  'chol',  'albumin',  'copper', 
                              'alk.phos',  'ast',  'trig',  'platelet',
                              'protime',  'stage'),
                  na.strings='.')
pbc$age <- pbc$age/365.25   # age is recorded in days; convert to years

# status==2 is death; other status codes are treated as censored
newfit <- coxph(Surv(time, status==2) ~ age + edema + log(bili) +
                log(protime) + log(albumin), data=pbc)

newfit
                coef exp(coef) se(coef)     z       p
age           0.0396    1.0404  0.00767  5.16 2.4e-07
edema         0.8963    2.4505  0.27141  3.30 9.6e-04
log(bili)     0.8636    2.3716  0.08294 10.41 0.0e+00
log(protime)  2.3868   10.8791  0.76851  3.11 1.9e-03
log(albumin) -2.5069    0.0815  0.65292 -3.84 1.2e-04

Likelihood ratio test=231  on 5 df, p=0  n=416 (2 observations deleted due to missingness)
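One way to locate differing lines like the ones mentioned above is to merge the two versions on `id` and compare them column by column. This is a minimal sketch, not part of R or the survival package: `recode_minus9` and `diff_ids` are hypothetical helper names, and the code assumes both copies are already loaded as data frames that share an `id` column and use matching column names (the older copy may need renaming first). It recodes -9 to NA in both copies before comparing.

```r
# Hypothetical helper: convert the -9 "missing" code to NA in numeric columns
recode_minus9 <- function(df) {
  df[] <- lapply(df, function(x) {
    if (is.numeric(x)) x[which(x == -9)] <- NA
    x
  })
  df
}

# Hypothetical helper: return the ids whose rows differ between two versions.
# Only columns present in both data frames are compared.
diff_ids <- function(old, new, id = "id") {
  old <- recode_minus9(old)
  new <- recode_minus9(new)
  cols <- intersect(names(old), names(new))
  m <- merge(old[cols], new[cols], by = id, suffixes = c(".old", ".new"))
  vars <- setdiff(cols, id)
  # For each variable, flag rows where the two versions disagree;
  # two NAs count as equal, an NA versus a value counts as a difference
  flags <- lapply(vars, function(v) {
    a <- m[[paste0(v, ".old")]]
    b <- m[[paste0(v, ".new")]]
    !((is.na(a) & is.na(b)) | (!is.na(a) & !is.na(b) & a == b))
  })
  differs <- Reduce(`|`, flags, FALSE)
  m[[id]][differs]
}
```

For example, `diff_ids(old_pbc, pbc)` (with `old_pbc` standing in for the copy shipped with R) would list the subject ids to inspect by hand. Exact equality is used because both copies are read from text, so values that agree should match exactly.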