[Rd] Problem with read.xport() from foreign package (PR#7389)

Prof Brian Ripley ripley at stats.ox.ac.uk
Thu Dec 9 15:42:35 CET 2004


Have you looked at the latest version of foreign, 0.8-2?  The issue has 
already been resolved, AFAIK.

On Thu, 9 Dec 2004, Werner Engl wrote:

> Dear R-devel list,
>
> This is to confirm Prof. Ripley's analysis of the
> read.xport issue.
>
> The section on missing data in TS140 is pertinent
> to numeric variables only. In SAS, character
> variables are of fixed length (between 1 and 200
> for the xport format). Shorter strings are padded
> with trailing blanks when assigned to a variable.
>
> An uninitialized character variable is stored as
> all blanks in the xport format file. This is the
> only representation of 'missing' data for SAS
> character variables. 'Special missing' codes
> (.A to .Z and ._) are available for numeric
> variables only.
>
> Please find enclosed a patch to the
> R-2.0.1/src/library/Recommended/foreign/SASxport.c
> file and an xport file that I used for testing. The
> xport file was created by SAS V8.2 on Linux, but
> should be platform and version independent (except
> for the header information). I have simply commented
> out the code lines that try to detect missing character
> values.
>
> The code in SASxport.c already does a good job in
> removing trailing blanks from character values.
> For missing character data (all blanks) the result
> is the empty string (""), which is fine for me.
> There is no equivalent to the R missing character
> representation in SAS (as far as I know).
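
The trailing-blank handling described above can be sketched in C as follows. This is only an illustration under stated assumptions, not the code in SASxport.c: the helper name trim_xport_char and its signature are hypothetical. It shows why an all-blank field of fixed width ends up as the empty string once the padding is stripped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not taken from SASxport.c): copy a fixed-width,
 * blank-padded character field into a NUL-terminated buffer, dropping
 * trailing blanks only. dst must have room for width + 1 bytes.
 * Returns the length of the trimmed string. */
static size_t trim_xport_char(const char *field, size_t width, char *dst)
{
    size_t len = width;

    assert(field != NULL && dst != NULL);

    /* Walk back from the end of the field over the padding blanks. */
    while (len > 0 && field[len - 1] == ' ')
        len--;

    /* Leading and embedded blanks are kept as they are. */
    memcpy(dst, field, len);
    dst[len] = '\0';
    return len;
}
```

With a field width of 8, "abc" followed by five blanks comes back as "abc", while a field of eight blanks comes back with length 0, i.e. "", which is the result described above for an uninitialized SAS character variable.
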
>
> The enclosed gzipped tar file contains:
>
> diff_SASxport_c.txt                diff for SASxport.c
> xptchar1.xpt                       test file in xport format
> xptchar.sas                        trivial SAS program used to
>                                    generate xptchar1.xpt
> xptchar_SAS_System_Viewer9_1.csv   xptchar1.xpt converted to a
>                                    comma-separated file using SAS
>                                    System Viewer 9.1 (on Win XP)
>
> With the patch applied, read.xport produces the same
> data frame from xptchar1.xpt as read.csv does from
> xptchar_SAS_System_Viewer9_1.csv (tested on i386 Linux
> with R Version 2.0.1) except that read.csv converts empty
> strings to NAs. As explained above, the empty string is
> closer to the meaning of an all-blanks value in SAS.
>
> There is renewed interest in this old data format in
> the pharmaceutical industry, because the US Food and
> Drug Administration requests clinical and
> pre-clinical data to be submitted in this format. I
> spent some time analyzing the xport file format to
> be sure of what is actually submitted to FDA with
> these files.
>
> Thank you for considering this patch (and for the
> great R system, of course)!
>
>
> Best regards,
>
> Werner Engl
>
>
>
> _____________________________________
> Werner Engl, PhD, CStat
> Senior Manager, Biostatistics
> Baxter AG, Vienna, Austria
> e-mail: werner_engl at baxter.com

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

