[R] read.spss question warning compression bias

Mulholland, Tom Tom.Mulholland at health.wa.gov.au
Mon Dec 15 01:50:51 CET 2003


>So it would appear that if the above is correct, there is no user adjustment to the bias value.
>The only scenario that I can envision is if the user SAVEs the ".sav" file in an uncompressed
>format, in which case the bias value **might** be set to 0.

>Perhaps an R-help reader with access to current SPSS manuals can confirm the above.


The manual for the Windows version, 11.5.0, reads the same (I assume the negative sign on -99 was somehow dropped from your copy):

COMPRESSED and UNCOMPRESSED Subcommands

COMPRESSED saves the file in compressed form. UNCOMPRESSED saves the file in uncompressed form.
In a compressed file, small integers (from −99 to 155) are stored in one byte instead of the
eight bytes used in an uncompressed file.

The only specification is the keyword COMPRESSED or UNCOMPRESSED. There are no additional specifications.

Compressed data files occupy less disk space than do uncompressed data files.

Compressed data files take longer to read than do uncompressed data files.

The GET command, which reads SPSS-format data files, does not need to specify whether the files it reads are compressed or uncompressed.

Only one of the subcommands COMPRESSED or UNCOMPRESSED can be specified per SAVE command. COMPRESSED is usually the default, though UNCOMPRESSED may be the default on some systems.

Ciao, Tom

_________________________________________________
 
Tom Mulholland
Senior Policy Officer
WA Country Health Service
Tel: (08) 9222 4062
 


-----Original Message-----
From: Marc Schwartz [mailto:MSchwartz at medanalytics.com] 
Sent: Friday, 12 December 2003 3:56 AM
To: Thomas Lumley
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] read.spss question warning compression bias


On Thu, 2003-12-11 at 12:32, Thomas Lumley wrote:
> On Thu, 11 Dec 2003, Marc Schwartz wrote:
> >
> > An additional question might be, if the file is not compressed, what 
> > is the default bias value set by SPSS? If it is 0, then the check is 
> > meaningless. On the other hand, if the default value is 100, whether 
> > or not the file is compressed, then the warning message would serve 
> > a purpose in flagging the possibility of other issues. Reasonably, 
> > that setting may be SPSS version specific.
> >
> 
> I think the issue is that the format is not documented, so the author 
> of the code (Ben Pfaff) didn't know what a change in the value would 
> imply. If the file is apparently read correctly it seems that it 
> doesn't imply anything.
> 
> 	-thomas



Thanks for the clarification Thomas.

I did some searching of the PSPP site and found the following:

http://www.gnu.org/software/pspp/manual/pspp_18.html#SEC170

The compression bias is defined as:

flt64 bias;
        Compression bias. Always set to 100. The significance of this
        value is that only numbers between (1 - bias) and (251 - bias)
        can be compressed.
        

So the bias would seem to matter only to how the compressed data are encoded, and hence only when compression is used.
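To make the PSPP description concrete, here is a minimal Python sketch of how a bias maps values to one-byte codes. It assumes, as the PSPP text says, that an integer value v is compressible when 1 - bias <= v <= 251 - bias, and it further assumes that such a value is stored as the code v + bias. The function names are hypothetical and are not taken from PSPP or read.spss. With the usual bias of 100 this rule gives -99 to 151, whereas the SPSS manual text quoted in this thread says 155, so treat the exact upper bound as unconfirmed.

```python
# Sketch only: follows the PSPP description of the compression bias.
# Function names are hypothetical, not taken from PSPP or read.spss.


def compressible_range(bias: float = 100.0) -> tuple[int, int]:
    """Return the lowest and highest integers that fit in one code byte."""
    b = int(bias)
    return 1 - b, 251 - b


def encode_value(value: float, bias: float = 100.0):
    """Return the one-byte code value + bias, or None when the value
    would instead have to be stored as a full 8-byte double."""
    lo, hi = compressible_range(bias)
    # is_integer() is False for fractions, NaN and infinities.
    if float(value).is_integer() and lo <= value <= hi:
        return int(value) + int(bias)
    return None


def decode_code(code: int, bias: float = 100.0) -> float:
    """Map a code in 1..251 back to the value it represents."""
    if not 1 <= code <= 251:
        raise ValueError("only codes 1..251 carry a compressed value")
    return float(code - int(bias))


# Usage: with the usual bias of 100 the window is -99..151.
print(compressible_range(100))  # (-99, 151)
print(encode_value(-99), encode_value(151), encode_value(152), encode_value(2.5))
# prints: 1 251 None None
```

Calling compressible_range(0) shows that, under this reading, a bias of 0 simply shifts the window to 1..251. That only affects which values get the one-byte encoding, so it would not matter at all for a file written without compression.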

I am not sure whether "Always set to 100" reflects SPSS itself or just how Ben elected to do things in PSPP. If the value really is always 100, even in SPSS, one might reasonably wonder why the field exists at all if it never varies.

It leaves unclear the circumstances under which this value would change.

I did some Googling and found the following text snippet from a presumably dated SPSS manual for the syntax of the SAVE command:


SAVE OUTFILE=file

 [/VERSION={3**}]
           {2  }

 [/UNSELECTED=[{RETAIN}]
               {DELETE}

 [/KEEP={ALL** }]  [/DROP=varlist]
        {varlist}

 [/RENAME=(old varlist=new varlist)...]

 [/MAP]

 [/{COMPRESSED  }]
   {UNCOMPRESSED}

**Default if the subcommand is omitted.


COMPRESSED and UNCOMPRESSED Subcommands 

COMPRESSED saves the file in compressed form. UNCOMPRESSED saves the file in uncompressed form. In a compressed file, small integers (from 99 to 155) are stored in one byte instead of the eight bytes used in an uncompressed file.

The only specification is the keyword COMPRESSED or UNCOMPRESSED. There are no additional specifications. 

Compressed data files occupy less disk space than do uncompressed data files. 

Compressed data files take longer to read than do uncompressed data files. 

The GET command, which reads SPSS-format data files, does not need to specify whether the files it reads are compressed or uncompressed. 

Only one of the subcommands COMPRESSED or UNCOMPRESSED can be specified per SAVE command. COMPRESSED is usually the default, though UNCOMPRESSED may be the default on some systems.




So it would appear that if the above is correct, there is no user adjustment to the bias value. The only scenario that I can envision is if the user SAVEs the ".sav" file in an uncompressed format, in which case the bias value **might** be set to 0.

Perhaps an R-help reader with access to current SPSS manuals can confirm the above.

Until demonstrated otherwise, it seems reasonable to keep this message as a warning (as opposed to an error), though it might help users to add a note on it to the read.spss help file for clarification. The text might read:

"NOTE: You may receive the following message:

 Warning message: 
 FileName: Compression bias (X) is not the usual value of 100.

Here 'FileName' is the file you are reading and 'X' is a numeric value, possibly 0. This *may* be the result of reading an UNCOMPRESSED SPSS file. If you receive this warning, it is recommended that you verify the integrity of the SPSS data imported by read.spss()."
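As a rough illustration of where such a warning could come from, the Python sketch below reads the compression flag and the bias from a system-file header and warns when the bias is not 100. The header layout (a "$FL2" tag, a 60-byte product name, five 32-bit integers, then the 64-bit bias at byte offset 84) is an assumption based on the PSPP file-format documentation linked above, not on read.spss's actual code, and read_header_fields and check_bias are hypothetical names. Returning both values lets you see whether an unusual bias coincides with an uncompressed file, as guessed above.

```python
import struct
import warnings

# Assumed header layout, taken from the PSPP system-file documentation
# (not from read.spss's source): 4-byte "$FL2" tag, 60-byte product name,
# five int32 fields (layout_code, nominal_case_size, compressed,
# weight_index, ncases), then the flt64 compression bias.
LAYOUT_OFFSET = 64
COMPRESSED_OFFSET = 72
BIAS_OFFSET = 84
HEADER_PREFIX_LEN = 92  # enough bytes to reach the end of the bias field
USUAL_BIAS = 100.0


def read_header_fields(header: bytes) -> tuple[int, float]:
    """Return (compressed flag, bias) from the start of a .sav header."""
    if len(header) < HEADER_PREFIX_LEN or header[:4] != b"$FL2":
        raise ValueError("not an SPSS system file header")
    # layout_code should read as 2 or 3 in the file's own byte order,
    # so try little-endian first and fall back to big-endian.
    for endian in ("<", ">"):
        (layout_code,) = struct.unpack_from(endian + "i", header, LAYOUT_OFFSET)
        if layout_code in (2, 3):
            break
    else:
        raise ValueError("cannot determine byte order")
    (compressed,) = struct.unpack_from(endian + "i", header, COMPRESSED_OFFSET)
    (bias,) = struct.unpack_from(endian + "d", header, BIAS_OFFSET)
    return compressed, bias


def check_bias(filename: str, header: bytes) -> tuple[int, float]:
    """Hypothetical check that warns, like the message quoted above,
    when the bias is not 100; returns (compressed flag, bias)."""
    compressed, bias = read_header_fields(header)
    if bias != USUAL_BIAS:
        warnings.warn(
            f"{filename}: Compression bias ({bias:g}) is not the usual value of 100"
        )
    return compressed, bias


# Usage with a synthetic little-endian header: uncompressed, bias 0.
demo = struct.pack("<4s60s5id", b"$FL2", b"demo", 2, 1, 0, 0, -1, 0.0)
print(check_bias("demo.sav", demo))  # warns, then prints (0, 0.0)
```

To try it on a real file (again assuming the layout above, with a hypothetical file name): open "mydata.sav" in binary mode and pass its first 92 bytes to check_bias("mydata.sav", ...).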


The wording is subject to change and of course, the integrity check should be done under any circumstances... :-)

HTH,

Marc Schwartz

______________________________________________
R-help at stat.math.ethz.ch mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help



