[R] read.spss question warning compression bias

Marc Schwartz MSchwartz at medanalytics.com
Wed Dec 17 17:01:15 CET 2003


Greetings all,

In follow-up to this thread (I am copying all participants), I want to
provide some additional data.

To review, Peter Flom, the original poster, received the following
warning message when using read.spss() to import a .SAV format SPSS data
set into R:

Warning message: 
c:\NDRI\cvar\data\cvar2rev3.sav: Compression bias (0) is not the usual
value of 100. 

That warning message is generated in file sfm-read.c, which is a part of
the foreign package. The code in that file to read SPSS datasets was
provided by Ben Pfaff, who has authored an open source version of SPSS,
called PSPP (http://www.gnu.org/software/pspp/pspp.html).

The bias setting is part of the routine that transforms data byte codes
in compressed .SAV files. This value is stored in the SPSS data file
header along with a compression TRUE/FALSE flag. The bias setting is not
used in non-compressed .SAV files.
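
For readers curious about what the bias actually does, here is a minimal
Python sketch. It assumes the compressed-data encoding described in the
PSPP notes on the system file format, where a data byte code between 1
and 251 stands for the number (code - bias). That rule is not stated in
this thread, and the function name decode_code is hypothetical. The other
codes (padding, end of file, literal values, system-missing) are
deliberately left out.

```python
def decode_code(code: int, bias: float = 100.0) -> float:
    """Decode one compressed-data byte code into a numeric value.

    Assumption (from the PSPP format notes, not from this thread):
    codes 1..251 encode the number code - bias. Other codes have
    special meanings and are not handled in this sketch.
    """
    if not 1 <= code <= 251:
        raise ValueError(f"code {code} is not a bias-encoded number")
    return code - bias


if __name__ == "__main__":
    # Show how a few byte codes map to values under the usual bias of 100.
    for code in (1, 100, 101, 251):
        print(code, "->", decode_code(code))
```

Under that assumption, a bias of 100 lets a single byte cover the
integers -99 through 151, while a bias of 0 would cover 1 through 251.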

During offlist exchanges with Peter, he indicated that the SPSS data
file in question was created via the use of DBMS/Copy rather than via
SPSS itself.  In this case, a SAS dataset was converted into the SPSS
dataset via DBMS/Copy. Peter was then attempting to import the SPSS .SAV
file into R using read.spss().

For those unfamiliar, DBMS/Copy (http://www.dataflux.com/dbms/copy.asp)
is a file transformation application that can take input files from one
format and generate output files in alternate formats. There is at least
one other similar data mapping/transformation application that I am
familiar with called DataJunction
(http://pervasive.datajunction.com/djcosmos).

DBMS/Copy was originally published by a company called Conceptual, which
sold the product to SAS in 2002. It is now sold via Dataflux, a SAS
subsidiary.

Last week, I communicated with the Dataflux/SAS tech support folks to
try to pursue a better understanding of the etiology of the problem. It
turns out that the original author of DBMS/Copy is now employed at SAS
and was available to review this issue.

The bottom line is that in DBMS/Copy, the default is to generate a
non-compressed SPSS format file. Thus, the author's code sets the bias
value to 0 by default. In the case of a user generating a compressed
.SAV file, the bias setting is set to 100.

It is unclear at this time whether this use of the bias value is part of
any formal SPSS specification. However, from all available documentation,
there is no indication that the bias value can be otherwise adjusted by a
user, either directly or indirectly. Thus, to my knowledge at this point,
it can take only two values, 0 and 100. If accurate, it would seem to be
redundant with the compression TRUE/FALSE flag.

In the case of SPSS itself, the bias value of 100 is set by default,
whether the .SAV file is compressed or not. Therefore, if using
read.spss() on a .SAV file that was generated by SPSS natively, the
warning that Peter experienced would not be issued.
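
The behaviour described in the last few paragraphs can be summarised in
a small Python sketch. The function name explain_bias and its messages
are hypothetical; the logic only restates what is reported above (the
bias is not used in non-compressed files, and 100 is the usual value). It
is not the code in sfm-read.c.

```python
def explain_bias(compressed: bool, bias: float) -> str:
    """Describe a header's bias value in the terms used in this thread."""
    if bias == 100:
        # What SPSS itself writes, compressed or not: no warning expected.
        return "usual bias of 100"
    if not compressed:
        # The DBMS/Copy default case: the bias is present but never used.
        return f"unusual bias ({bias:g}), unused because file is not compressed"
    # Anything else is unexplained here; the imported data deserve a check.
    return f"unusual bias ({bias:g}) in a compressed file; verify the data"


if __name__ == "__main__":
    print(explain_bias(compressed=False, bias=0))    # DBMS/Copy default
    print(explain_bias(compressed=True, bias=100))   # compressed export
    print(explain_bias(compressed=False, bias=100))  # native SPSS, uncompressed
```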

I hope that this information is of help to folks. With this confirmation
in hand, I would like to reiterate my suggestion to add a note to the
help for read.spss(), which could read as follows:


"NOTE: You may receive the following message:

 Warning message: 
 FileName: Compression bias (X) is not the usual value of 100.

Where 'FileName' will be the SPSS file that you are reading and 'X' will
be a numeric value, possibly 0. This may be the result of reading an
UNCOMPRESSED SPSS file that was not generated via SPSS natively (i.e., via
a third party application such as DBMS/Copy). As the exact meaning of
this cannot be confirmed in all cases, it is recommended that you verify
the integrity of your imported SPSS data after using read.spss()."


As an aside, the Dataflux folks indicate that DBMS/Copy, at this time,
cannot read SPSS version 11 files. Thus it would seem that there has
been some change in the native .SAV file structure of unknown scope.
Presumably, this could have an impact on read.spss().

Best regards,

Marc Schwartz


P.S. to Thomas. It would seem worthy of consideration to forward this
information to Ben Pfaff. Let me know if you want me to do this or if
you would prefer otherwise.



