[R] Green and Byar (1980) Prostate Cancer Data set from Andrews and Herzberg - Data

Ravi Varadhan RVaradhan at jhmi.edu
Tue Mar 24 23:12:16 CET 2009


 
Rolf,

I was basically able to reproduce your problems.  Also, when I opened the
".xls" file with Excel, I got the error message "file error: data may have
been lost".  When I saved the file as .csv and read it into R, I found that
the data set has only 502 records, whereas the original data set of Andrews
and Herzberg (from statlib) has 506 records.  Maybe this is related to
the error about "data being lost".
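
A minimal sketch of the record-count check described above.  The helper
name check_record_count() and the column name patno are hypothetical, and
a temporary file stands in for the real .csv export; only the counts 502
and 506 come from the comparison above.

```r
# Hypothetical helper: compare the number of records in a CSV export
# against the count expected from the reference source.
check_record_count <- function(csv_path, expected_n) {
  dat <- read.csv(csv_path, stringsAsFactors = FALSE)
  list(n_found    = nrow(dat),
       n_expected = expected_n,
       n_missing  = expected_n - nrow(dat))
}

# Toy demonstration: a temporary file with 502 rows stands in for the
# CSV saved from Excel (the column name "patno" is an assumption).
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(patno = 1:502), tmp, row.names = FALSE)

res <- check_record_count(tmp, expected_n = 506)
print(res$n_found)    # records actually read
print(res$n_missing)  # records short of the expected count
```

With the real export one would pass its path in place of tmp; a nonzero
n_missing only shows that the two sources disagree, not which one is right.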

Of course, I don't know what the real "original" data set is.  I am
finding it increasingly frustrating to reproduce the results reported in
journal articles because the data sets and their sources are sloppily
documented.

Ravi.


----------------------------------------------------------------------------

Ravi Varadhan, Ph.D.

Assistant Professor, The Center on Aging and Health

Division of Geriatric Medicine and Gerontology 

Johns Hopkins University

Ph: (410) 502-2619

Fax: (410) 614-9625

Email: rvaradhan at jhmi.edu

Webpage:  http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html

 

----------------------------------------------------------------------------


-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Rolf Turner
Sent: Tuesday, March 24, 2009 5:50 PM
To: R-help Forum
Subject: Re: [R] Green and Byar (1980) Prostate Cancer Data set from
Andrews and Herzberg - Data


On 25/03/2009, at 10:04 AM, Frank E Harrell Jr wrote:

> Ravi Varadhan wrote:
>> Hi,
>>
>> I am looking for a data set containing the information from a 
>> randomized trial evaluating the effect of DES (diethylstilbestrol) on 
>> multiple time-to-event endpoints, prostate cancer, CVD, and other 
>> causes.  The original source of this data is Green and Byar (1980).  
>> This is a popular competing risks problem that has subsequently been 
>> discussed in a number of statistical papers including Kay (1986).
>>
>> Does anyone have a digital version of this data set?
>>
>> This data is also presented in Andrews, D. F. and Herzberg, A. M.  
>> (1985). Data.   Does a digital version of all the data sets in A &  
>> H exist?
>>
>> Thanks very much,
>> Ravi.
>
> An R binary dataset is at http://biostat.mc.vanderbilt.edu/Datasets
>
> Note that there is something strange about the AP variable with a lot 
> of ties at some value near 1.0.  I have never been able to find any 
> documentation about this problem.  If you find any please let me know.

Out of idle curiosity I went to have a look at this data set.

I had problems.

(1) The given URL didn't work for me; when I clicked on it, I got an
error 404.  But if I went to http://biostat.mc.vanderbilt.edu I found a
link to ``Datasets'', and clicking on that got me to some data sets.

(2) Scrolling down to ``Byar and Green prostate cancer data'' appeared
to get me to the right place.  But I couldn't see any signs of any
``R binary files''.

The available formats appear to be *.sav (SPSS?), *.sdd (???), and *.xls.

(3) I downloaded the prostate.xls file O.K.  But when I tried to read it in
with the read.xls() function from the gdata package, I got the following
error:

 > X <- read.xls("prostate.xls")
Converting xls file to csv file... Done.
Reading csv file... Error in read.table(file = file, header = header, sep = sep, quote = quote,  :
   no lines available in input

I was able to ``open'' the prostate.xls file with the version of Excel
available on my Mac, save it as a *.csv file, and then read *that* in with
read.csv().
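
The workaround just described could be sketched roughly as follows.  The
wrapper name read_with_fallback() and the file names are hypothetical; the
sketch assumes a *.csv copy has already been saved by hand from Excel, and
it only tries gdata::read.xls() when the gdata package is installed and the
*.xls file is present.

```r
# Hypothetical wrapper: try read.xls() first, and fall back to a CSV
# copy exported by hand if the direct read fails or is unavailable.
read_with_fallback <- function(xls_path, csv_path) {
  if (requireNamespace("gdata", quietly = TRUE) && file.exists(xls_path)) {
    out <- tryCatch(gdata::read.xls(xls_path),
                    error = function(e) NULL)  # e.g. "no lines available in input"
    if (!is.null(out)) return(out)
  }
  read.csv(csv_path, stringsAsFactors = FALSE)
}

# Toy demonstration: the .xls path does not exist, so the CSV copy is used.
csv_copy <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, value = c(0.5, 1.0, 1.5)),
          csv_copy, row.names = FALSE)

X <- read_with_fallback("no-such-file.xls", csv_copy)
print(dim(X))
```

This does not explain why read.xls() fails on this particular file; it only
keeps the manual export as a fallback path.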

What am I missing?  *Are* there ``R binary'' files lurking about that I am
somehow not seeing?  Why won't read.xls() work on this data set?

	cheers,

		Rolf Turner

