[R] Reading sas7bdat files directly

Roger DeAngelis(xlr82sas) rdeangel at amgen.com
Tue Mar 2 01:38:08 CET 2010


Hi All,

 The hack below might help R users get going with Chris's DSREAD. I have not
had a chance to look at  Monday's version of DSREAD, can't wait.

Note Duncan Murdoch was most gracious to supply me with a R function to
translate floats in 16 char hex to R floats.

Your utility solves the 200 byte, 8 char name and potential precision 
errors with other methods of transfering SAS datasets to perl and R. 
Thanks. 

Importing SAS datasets(sas7bdat) into R 
(32 bit windows 2000, 32 bit SAS 9.2 and 
32 bit R version 2.9.0 (2009-04-17) 

 Here is what I want to accomplish, the double floats below show data 
from SAS to R. 
 They are exactly the same in R and SAS memory, bit for bit. 


  R Internal         SAS Internal 
  16 Byte Float      16 byte Float 


3FFAAAAAAAAAAAAB  3FFAAAAAAAAAAAAB 
4002AAAAAAAAAAAB  4002AAAAAAAAAAAB 
400D555555555555  400D555555555555 
3FF6666666666666  3FF6666666666666 
3FFCCCCCCCCCCCCD  3FFCCCCCCCCCCCCD 
400199999999999A  400199999999999A 
4004CCCCCCCCCCCD  4004CCCCCCCCCCCD 
3FF4924924924925  3FF4924924924925 
3FF9249249249249  3FF9249249249249 
3FFDB6DB6DB6DB6E  3FFDB6DB6DB6DB6E 
4001249249249249  4001249249249249 
3FF2E8BA2E8BA2E9  3FF2E8BA2E8BA2E9 
3FF5D1745D1745D1  3FF5D1745D1745D1 
3FF8BA2E8BA2E8BA  3FF8BA2E8BA2E8BA 
3FFBA2E8BA2E8BA3  3FFBA2E8BA2E8BA3 
3FF2762762762762  3FF2762762762762 
3FF4EC4EC4EC4EC5  3FF4EC4EC4EC4EC5 
3FF7627627627627  3FF7627627627627 
3FF9D89D89D89D8A  3FF9D89D89D89D8A 
1.7976931348623E  1.7976931348623E 
0010000000000000  0010000000000000 


I don't believe this high accuracy transfer is possible  with any 
other method except ODBC, 
but SAS ODBC is unsatisfactory for me. If you use CSV with the maximum 
assured decimal 
precision(15 significant digits?). The CSV decimal numbers will only 
approximate the double floats. 

I consider the Csv to be corrupt if the relative of absolute 
difference using the decimal 
Csv numbers and the memory floats is greater than 10^-12.  There are 
two sources of error first 
the SAS floats are decimally rounded and converted to decimal then the 
rounded decimal 
approximations are  converted into R floats. 


Status of     R Internal            CSV 
Csv           16 Byte Float 


Csv corrupt 3FFAAAAAAAAAAAAB   1.66666666666667    >10^-12 different 
Csv corrupt 4002AAAAAAAAAAAB   2.33333333333333 
Csv corrupt 400D555555555555   3.66666666666667 
Csv OK      3FF6666666666666   1.4 
Csv OK      3FFCCCCCCCCCCCCD   1.8 
Csv OK      400199999999999A   2.2 
Csv OK      4004CCCCCCCCCCCD   2.6 
Csv corrupt 3FF4924924924925   1.28571428571429 
Csv corrupt 3FF9249249249249   1.57142857142857 
Csv corrupt 3FFDB6DB6DB6DB6E   1.85714285714286 
Csv corrupt 4001249249249249   2.14285714285714 
Csv corrupt 3FF2E8BA2E8BA2E9   1.18181818181818 
Csv corrupt 3FF5D1745D1745D1   1.36363636363636 
Csv corrupt 3FF8BA2E8BA2E8BA   1.54545454545455 
Csv corrupt 3FFBA2E8BA2E8BA3   1.72727272727273 
Csv corrupt 3FF2762762762762   1.15384615384615 
Csv corrupt 3FF4EC4EC4EC4EC5   1.30769230769231 
Csv corrupt 3FF7627627627627   1.46153846153846 
Csv corrupt 3FF9D89D89D89D8A   1.61538461538462 
Csv corrupt 1.7976931348623E   1.7976931348623E+308 
Csv corrupt 0010000000000000   2.2250738585072E-308 


Bacground 


  1. Provide absolutely loss less transfer 
     of character(max 32756 bytes per character variable)  and numeric 
data from SAS to R 
     Since SAS has only two datatypes so this code should be 
exhaustive. 


  2. This code is useful because: 
     a. The SAS ODBC driver requires the user to not only have 
        SAS but the user must bring up a SAS session and 
        the session has to be closed manually. (SAS issue not a 
foreign issue) 
     b. The foreign package also requires interaction with SAS. (SAS 
issue) 
     c. SASxport only supports 8 character SAS names and a max of 
        200 byte character values. (This is a SAS issue not a SASxport 
issue) 
     d. SASxport creates floating point doubles that have an 8 bit 
exponent 
        and 56 bit mantissa while IEEE is 11 bit exponent and 53 bit 
mantissa 
        (sometimes defined slightly differently depending of where you 
consider 
        the sign bits). This results is the loss of some very small 
and 
        very large numbers. ( SAS issue not a SASxport issue) 


  3. How this code overcomes the issues above for import only. 


     You need the dsread exec in the previous mesage. Also the input 
SAS dataset must have 
16 byte character representations for the floats. I am working with 
the developer to see what we 
can do about this.. 
He will make it an option on the invocation to do the hex conversion 
for numerics. 


Here is the R code run inside a SAS datastep. Actually I can interact 
with the output of the R code 
in the same dataqstep. It is also possible to run perl, SAS procs and 
other SAS languages in the same datastep. 
Note the input pipe, no physical CSV file is produced). 


If there is interest I can provide the code that executes R. 


data _null_; 
  length pgm $1250; 
  pgm=compbl(" 
  library (SASxport); 
  library (foreign); 
  hexdigits <- function(s) {; 
      digits <- 0:15; 
      names(digits) <- c(0:9, LETTERS[1:6]); 
      digits[strsplit(s, '')[[1]]]; 
  }; 
  bytes <- function(s) {; 
      digits <- matrix(hexdigits(s), ncol=2, byrow=TRUE); 
      digits; 
      as.raw(digits %*% c(16,1)); 
  }; 
  todouble <- function(bytes) {; 
      con <- rawConnection(bytes); 
      val <- readBin(con, 'double', endian='big'); 
      close(con); 
      val; 
  }; 
  x <-c(1:21); 
  rc<-c(1:21); 
  ln<-c(1:21); 
  z<-read.table(pipe('C:\\tip\\dsread.exe -v C:\\tip\ 
\fix.sas7bdat'),header=TRUE,sep=',',colClasses='character'); 
  st<-z$STR; 
  lin<-z$LIN; 
  d<-as.numeric(z$DECIMAL_REPRESENTATION); 
  h<-as.character(z$HEXIDECIMAL_REPRESENTATION); 
  for ( i in 1:21 ) {; 
    x[i]  <- todouble(bytes(h[i])); 
    rc[i] <- if (((abs( x[i] - d[i] )       > 1E-12 )) || ; 
             (abs((x[i] - d[i])/x[i] ) > 1E-12 )) 0 else 1; 
    ln[i] <- nchar(st[i], type = 'bytes'); 
  }; 
  R_ntrnl    <-h ; 
  SASntrnl   <-h ; 
  R_deciml   <-sprintf('%.14e',x); 
  SAS_deciml <-sprintf('%.14e',x); 
  Csv_stmat  <-z$DECIMAL_UNTOUCHED; 
  Corrupt    <-rc; 
  datfrm     <- 
data.frame(R_ntrnl ,SASntrnl ,R_deciml ,SAS_deciml ,Csv_stmat
,Corrupt,ln,lin); 
  write.xport(datfrm,file='C:\\utl\ 
\datfrm.xpt',autogen.formats=FALSE); 
  "); 
  call rxeq(pgm); 
  call getxpt('datfrm'); 
run; 


SAS code to create fix.sas7bdat 


options xsync xwait;run; 
%let fac=1000; 
data "c:\tip\fix.sas7bdat"(drop=prime nonprime byt); 
  retain byt 0  str; 
  length str $%eval(&fac * 32); 
  do prime=3,5,7,11,13; 
    do nonprime=2,4,6,8; 
      byt+&fac; 
      str=repeat(byte(64+byt/&fac),byt); 
      decimal_representation    =nonprime/prime+1; 
      hexidecimal_representation=put(decimal_representation,hex16.); 
      decimal_untouched         =cats(put(round(decimal_representation, 
1e-14),best32.)); 
      lin=length(str); 
      if decimal_representation ne 3 then output; 
    end; 
  end; 
  decimal_representation    =constant('big'); 
  hexidecimal_representation=put(constant('big'),e20.); 
  decimal_untouched         =cats(put(decimal_representation,e20.)); 
  str=repeat('@',%eval(&fac * 30)); 
  lin=length(str); 
  output; 
  decimal_representation    =constant('small'); 
  hexidecimal_representation=put(constant('small'),hex16.); 
  decimal_untouched         =cats(put(decimal_representation,e20.)); 
  str=repeat('@',%eval(&fac * 32)); 
  lin=length(str); 
  output; 
  format _numeric_ e20.; 
run; 



-- 
View this message in context: http://n4.nabble.com/Reading-sas7bdat-files-directly-tp1469515p1574458.html
Sent from the R help mailing list archive at Nabble.com.



More information about the R-help mailing list