[Rd] read.xport and lookup.xport in foreign (PR#2385)

fharrell@virginia.edu
Fri Dec 20 18:48:02 2002


Under
            
platform i686-pc-linux-gnu
arch     i686             
os       linux-gnu        
system   i686, linux-gnu  
status                    
major    1                
minor    6.1              
year     2002             
month    11               
day      01               
language R                

and using foreign 0.5-8 I am encountering errors when using read.xport.  Here's code for producing SAS transport files for testing:

libname x SASV5XPT "test.xpt";
libname y SASV5XPT "test2.xpt";
PROC FORMAT; VALUE race 1=green 2=blue 3=purple; RUN;
PROC FORMAT CNTLOUT=format;RUN;
data test;
LENGTH race 3 age 4;
age=30; label age="Age at Beginning of Study";
race=2;
d1='3mar2002'd ;
dt1='3mar2002 9:31:02'dt;
t1='11:13:45't;
output;

age=31;
race=4;
d1='3jun2002'd ;
dt1='3jun2002 9:42:07'dt;
t1='11:14:13't;
output;
format d1 mmddyy10. dt1 datetime. t1 time. race race.;
run;
PROC COPY IN=work OUT=x;SELECT test;RUN;
PROC COPY IN=work OUT=y;SELECT test format;RUN;

SAS output:

NOTE: Copying WORK.TEST to X.TEST (memtype=DATA).
NOTE: There were 2 observations read from the data set WORK.TEST.
NOTE: The data set X.TEST has 2 observations and 5 variables.
NOTE: PROCEDURE COPY used:
      real time           1.52 seconds
      cpu time            0.04 seconds
      
NOTE: Copying WORK.TEST to Y.TEST (memtype=DATA).
NOTE: There were 2 observations read from the data set WORK.TEST.
NOTE: The data set Y.TEST has 2 observations and 5 variables.
NOTE: Copying WORK.FORMAT to Y.FORMAT (memtype=DATA).
NOTE: There were 3 observations read from the data set WORK.FORMAT.
NOTE: The data set Y.FORMAT has 3 observations and 21 variables.
NOTE: PROCEDURE COPY used:

R results:

> library(foreign)
> read.xport('test.xpt')
      RACE      AGE    D1        DT1    T1
1 2.000063 30.00000 15402 1330767062 40425
2 4.000063 31.00000 15494 1338716527 40453

Note the corruption of RACE, a variable stored with a SAS length of 3 bytes: the values should be exactly 2 and 4.
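
The RACE values look like what one would get if the 3 stored bytes were not zero-padded to 8 bytes before conversion. This is only a guess; I have not checked the foreign sources. Transport files store numerics as IBM hexadecimal floating point, and a variable of length 3 keeps only the leading 3 bytes. A minimal R sketch of the idea follows, where ibm_to_double is a hypothetical helper (not part of foreign) and the trailing bytes 42 1E are an assumed example of leftover buffer contents:

```r
# Hypothetical helper: decode an IBM hexadecimal float of 1 to 8 bytes.
# Short inputs are right-padded with zero bytes, which is what a
# truncated SAS numeric needs.
ibm_to_double <- function(bytes) {
  b <- c(as.integer(bytes), integer(8))[1:8]   # right-pad with zeros
  if (all(b == 0)) return(0)
  sgn  <- if (b[1] >= 128) -1 else 1           # top bit of byte 1 is the sign
  expo <- b[1] %% 128 - 64                     # base-16 exponent, excess 64
  frac <- sum(b[2:8] * 256^(-(1:7)))           # 56-bit fraction
  sgn * frac * 16^expo
}

# RACE = 2 stored in 3 bytes: 41 20 00
clean <- ibm_to_double(as.raw(c(0x41, 0x20, 0x00)))

# Same 3 bytes followed by assumed stale bytes instead of zeros
garbled <- ibm_to_double(as.raw(c(0x41, 0x20, 0x00, 0x42, 0x1E, 0x00, 0x00, 0x00)))

print(clean)                      # 2
print(format(garbled, digits = 7))
```

With zero padding the value comes out as exactly 2; with nonzero trailing bytes it comes out slightly above 2, similar in size to the error shown above.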

> read.xport('test2.xpt')
            RACE           AGE            D1           DT1            T1
1   2.000063e+00  3.000000e+01  1.540200e+04  1.330767e+09  4.042500e+04
2   4.000063e+00  3.100000e+01  1.549400e+04  1.338717e+09  4.045300e+04
3   3.687825e-40  3.687825e-40  3.687825e-40  3.687896e-40  5.962240e+20
...
124 3.835229e-93  6.434447e-86            NA  3.687825e-40  3.687825e-40

Note the corrupted data when reading a SAS transport file that contains more than one SAS dataset.  According to the documentation, read.xport should handle this case and return a list of data frames.

> names(lookup.xport('test2.xpt'))
[1] "TEST"

Note that only one of the two datasets (TEST and FORMAT) is listed.


Also, I would greatly benefit from having lookup.xport return all of the SAS variable attributes, especially the variable label and format name.  I could then write a little function for the community that makes read.xport as comprehensive as read.spss in terms of creating factor variables and variable labels, if the user exports the PROC FORMAT CNTLOUT= dataset.
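
A rough sketch of the kind of function I have in mind, assuming the format catalog has already been read into a data frame with the usual CNTLOUT columns FMTNAME, START and LABEL. The name apply_sas_formats and the var_formats argument are hypothetical; the variable-to-format mapping has to be supplied by hand here because lookup.xport does not currently return format names:

```r
# Hypothetical helper: turn coded variables into factors using a
# PROC FORMAT CNTLOUT= catalog held in a data frame.
# var_formats is a named character vector: names are variables in
# `data`, values are the SAS format names to apply.
apply_sas_formats <- function(data, catalog, var_formats) {
  for (v in names(var_formats)) {
    rows  <- catalog[catalog$FMTNAME == var_formats[[v]], , drop = FALSE]
    codes <- as.numeric(as.character(rows$START))   # START is character in CNTLOUT
    data[[v]] <- factor(data[[v]], levels = codes,
                        labels = as.character(rows$LABEL))
  }
  data
}

# Stand-in data mirroring the test dataset and the race format
test <- data.frame(RACE = c(2, 4), AGE = c(30, 31))
catalog <- data.frame(FMTNAME = "RACE",
                      START   = c("1", "2", "3"),
                      LABEL   = c("green", "blue", "purple"),
                      stringsAsFactors = FALSE)

out <- apply_sas_formats(test, catalog, c(RACE = "RACE"))
print(out)
```

In this sketch a code with no matching format entry (race=4 in the test data) becomes NA; a real version would need to decide how to handle such values and format ranges.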

Thanks.
-- 
Frank E Harrell Jr              Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat