[R] Green and Byar (1980) Prostate Cancer Data set from Andrews and Herzberg - Data

Ravi Varadhan rvaradhan at jhmi.edu
Wed Mar 25 04:51:51 CET 2009


Fine detective work, David.  Now, you can see the reasons for my frustration - multiplicity of data sets combined with non-existent documentation of the source of data in journal articles (e.g. Kay 1986; Lunn and McNeil 1995).    

Best,
Ravi.

____________________________________________________________________

Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvaradhan at jhmi.edu


----- Original Message -----
From: David Winsemius <dwinsemius at comcast.net>
Date: Tuesday, March 24, 2009 10:54 pm
Subject: Re: [R] Green and Byar (1980) Prostate Cancer Data set from Andrews and Herzberg - Data
To: Rolf Turner <r.turner at auckland.ac.nz>
Cc: R-help Forum <r-help at r-project.org>, Ravi Varadhan <rvaradhan at jhmi.edu>


>  On Mar 24, 2009, at 8:57 PM, Rolf Turner wrote:
>  
>  >
>  > On 25/03/2009, at 12:09 PM, Frank E Harrell Jr wrote:
>  >
>  > 	<snip>
>  >
>  >>> (2) Scrolling down to ``Byar and Green prostate cancer data''
>  >>> appeared to get me to the right place.  But I couldn't see any
>  >>> signs of any ``R binary files''.
>  >>
>  >> Please look again.  It's under the heading "R".  Unfortunately I used
>  >> .sav suffix for save() files in the old days.
>  >
>  > 	Ah-ha.  Oh me of little faith.  I have been hanging around (in
>  > 	my current work environment) with too many SPSS users, and the
>  > 	*.sav extension seems to be the standard for SPSS data files.
>  > 	Whence my corrupted thinking.
>  >
>  >> The .xls file opened with no problem in OpenOffice; has 506 rows.
>  >
>  > 	Hmmm.  When I opened it with Excel on the Mac I got a spread
>  > 	sheet with 503 rows --- the first row being the column names,
>  > 	so there were really 502 rows.
>  
>  The last "patnr" is "506" but there are only 502 lines of data. 471,
>  473, 475 and 488 are missing.
>  
>  And the CMU Statlib version for 2002 looks the same.
>  
>  
>  The version at this site is missing more than 25 cases:
>  
>  
>  Here are two other copies of the dataset the first of which appears to
>  have those missing cases:
>  This one has patient numbers:
>  
>  
>  This one has a description of the fields and cites the one above but
>  has not retained the patient numbers and has apparently only kept the
>  475 cases with complete data.
>  
>  
>  
>  >
>  
>  David Winsemius, MD
>  Heritage Laboratories
>  West Hartford, CT
>  
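The gap check described in the quoted message (last "patnr" of 506, only 502 data rows, with 471, 473, 475 and 488 absent) can be sketched in a few lines. This is only an illustration: the `patnr` list below is a synthetic stand-in built from the numbers reported in the thread, not the actual data file, and the helper name `missing_ids` and the assumption that patient numbering starts at 1 are hypothetical.

```python
def missing_ids(ids):
    """Return the integers in 1..max(ids) that do not appear in ids.

    Assumes numbering is meant to start at 1 and be contiguous.
    """
    present = set(ids)
    return [n for n in range(1, max(present) + 1) if n not in present]


# Synthetic stand-in for the patnr column: 1..506 minus the four
# patient numbers reported as missing in the thread.
reported_gaps = {471, 473, 475, 488}
patnr = [n for n in range(1, 507) if n not in reported_gaps]

print(len(patnr))          # number of data rows
print(missing_ids(patnr))  # patient numbers absent from the sequence
```

Running the same check against each copy of the data set would show quickly which copies have dropped cases and which have kept the original patient numbers.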
