[R] Possible bug in foreign library import of Stata datasets

(Ted Harding) Ted.Harding at nessie.mcc.ac.uk
Wed Apr 28 10:14:43 CEST 2004


On 28-Apr-04 Paul Johnson wrote:
> The negative valued observations get mixed up in R:
> 
>  > library(foreign)
>  > dat2 <- read.dta("table2.dta")
>  > table(deml)
> deml
>     0    1    2    3    4    5    6    7    8    9   10  246  247
>    94  103  169  108  404  634  154  281  923  258 2352  826 3829
>   248  249  250  251  252  253  254  255
>  2161 6847  541  451  152  306  145  252
> 
> The read.dta has translated the negative values as (256-deml).
> 
> Is this the kind of thing that is a bug, or have I missed something in 
> the documentation about the handling of negative numbers?  Should a 
> formal bug report be filed?

This observation suggests a fairly clear diagnostic: the original
negative numbers (tabulated as "-10.00" etc) are stored as what C
would call "signed char" -- non-negative for N=0 to 127, negative
(N-256) for N=128 to 255 -- but are being interpreted as unsigned
integers in the range 0 to 255. An unusual though feasible storage type.
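
To make the arithmetic concrete, here is a minimal sketch in R of the
mapping just described. The vector `misread` is made up for illustration
(it simply lists the codes that appear in the table), and the helper
`as_signed_byte` is a hypothetical name. This is only an after-the-fact
workaround, under the assumption that the diagnosis above is right; it
is not a description of what read.dta does internally.

```r
# Hypothetical example values: the codes that appear in the table above.
misread <- c(0:10, 246:255)

# Reinterpret each value as a signed byte: 0..127 stay as they are,
# 128..255 become N - 256.
as_signed_byte <- function(n) ifelse(n > 127, n - 256, n)

recovered <- as_signed_byte(misread)
print(recovered)
```

If the diagnosis holds, 246 to 255 should come back as -10 to -1, which
would match a variable running from -10 to 10.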

The question is where this is occurring. The Stata tabulation
represents them as apparent reals; but the storage in the .dta file
may be 1-byte for economy of space. If so, then whether or not this
is a bug in read.dta may depend on whether the .dta file includes a
"flag" for such 1-byte data that they really are intended to represent
signed values (and possibly on whether there is a further flag for
real versus integer types). If not, then signed 1-byte data will not
be distinguishable from unsigned 1-byte integers (C's "unsigned char"),
and read.dta can hardly be blamed for getting the wrong impression.
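
The ambiguity can be seen in R itself, since readBin() takes a `signed`
argument for 1-byte integers. The following is only a sketch of the
general point, using a single hand-made byte rather than a real .dta
file, and it makes no claim about how read.dta actually reads its input.

```r
# One byte with bit pattern 0xF6 (decimal 246), built by hand for illustration.
b <- as.raw(0xF6)

# The same byte read two ways: nothing in the byte itself says which is meant.
as_unsigned <- readBin(b, what = "integer", size = 1, signed = FALSE)
as_signed   <- readBin(b, what = "integer", size = 1, signed = TRUE)

print(c(unsigned = as_unsigned, signed = as_signed))
```

The reader has to be told, by the file format or by convention, which
of the two interpretations applies.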

Since I'm not familiar with Stata data file formats, I can't
comment further!

Ted.



