[Rd] foreign generates bad Stata data files (PR#13820)

peter.muhlberger at gmail.com peter.muhlberger at gmail.com
Sat Jul 11 22:05:11 CEST 2009


Full_Name: peter muhlberger
Version: 2.7.1
OS: Ubuntu x86_64 dual core
Submission from: (NULL) (70.238.206.13)


I've spent half a day generating .dta files with write.dta, only to have them
crash my copy of Stata.  I eventually discovered that removing a string variable
with a maximum observed length of 280 characters lets Stata read the file
without problems.  Stata does not allow a string variable to be longer than 244
characters.  R gives no warning about this, and I assume it does not truncate
the strings either.  The following code creates a .dta file that makes my copy
of Stata suddenly disappear when I try to open the file:

x=c("XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX")
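library(foreign)  # provides write.dta
x <- as.data.frame(x, stringsAsFactors = FALSE)
names(x)[1] <- "x"  # set the column name explicitly (see the note below)
write.dta(x, version = 10, file = "/home/peterm/Desktop/x.dta")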
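
Until write.dta warns about this itself, a check along the following lines could
be run before exporting.  This is only a sketch: long_string_columns is a
hypothetical helper, not part of foreign, the 244 limit is the one quoted above,
and measuring length in bytes rather than characters is an assumption on my part
(it is the more conservative choice).

```r
# Hypothetical helper, not part of the foreign package: return the longest
# string length (in bytes) of every character column that exceeds `limit`.
long_string_columns <- function(df, limit = 244L) {
  is_chr <- vapply(df, is.character, logical(1))
  max_len <- vapply(df[is_chr], function(col) {
    lens <- nchar(col[!is.na(col)], type = "bytes")  # ignore missing values
    if (length(lens) == 0L) 0L else max(lens)
  }, integer(1))
  max_len[max_len > limit]
}

# Illustrative data: one 280-character string column named "x"
x <- data.frame(x = paste(rep("X", 280), collapse = ""),
                stringsAsFactors = FALSE)

too_long <- long_string_columns(x)
if (length(too_long) > 0) {
  warning("string columns longer than 244: ",
          paste(names(too_long), collapse = ", "))
}
```

Whether to drop, split, or truncate such a column is a separate decision; the
sketch only reports which columns would be affected.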

That's the main problem I wanted to report.  There are several others you might
want to look into.  In the code above, leave out the names(x)[1] assignment and
look at the name of the variable in the data frame: it's a line of code.  Also,
write.dta is supposed to turn factors into variable labels in Stata, but I get
no variable labels in Stata (starting with data that has factors).  Finally,
when I try to read an SPSS dataset created by Sawtooth into R using read.spss,
I get a multitude of variables that aren't in the original dataset and have
nothing in them.
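
On the variable-name issue, one way to avoid depending on how as.data.frame
derives a name from its argument is to name the column when the data frame is
built.  A minimal sketch in base R; long_string is just an illustrative name
and the 280-character value is made up to match the length mentioned above:

```r
# Illustrative value only: a 280-character string
long_string <- paste(rep("X", 280), collapse = "")

# Naming the column inside data.frame() fixes the variable name up front
x <- data.frame(x = long_string, stringsAsFactors = FALSE)

names(x)    # "x"
nchar(x$x)  # 280
```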


Below, my sessionInfo:

> sessionInfo()
R version 2.7.1 (2008-06-23) 
x86_64-pc-linux-gnu 

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=en_US.UTF-8;LC_ADDRESS=en_US.UTF-8;LC_TELEPHONE=en_US.UTF-8;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] foreign_0.8-36 JGR_1.6-7      iplots_1.1-3   JavaGD_0.5-2   rJava_0.6-3    boot_1.2-37

Hope this helps!

Cheers,
Peter


