[R] Ever see a Stata import problem like this?

Paul Johnson pauljohn at ku.edu
Wed Sep 22 00:34:32 CEST 2004


Greetings Everybody:

I generated a 1.2 MB .dta file from the General Social Survey with
Stata 8 for Linux. Stata re-opens the file without trouble, but when I
read it into R, most of the variables come back with every value missing.

This dataset is called "morgen.dta", and I have posted a copy online in
case you are interested:

http://www.ku.edu/~pauljohn/R/morgen.dta

Here is how it looks to R (I tried various options of the read.dta command):

 > myDat <- read.dta("morgen.dta")
 > summary(myDat)
      CASEID              year            id         hrs1         hrs2
Min.   :   19721   Min.   :1972   Min.   :   1   NAP :    0   NAP :    0
1st Qu.: 1983475   1st Qu.:1978   1st Qu.: 445   DK  :    0   DK  :    0
Median : 1996808   Median :1987   Median : 905   NA  :    0   NA  :    0
Mean   : 9963040   Mean   :1986   Mean   : 990   NA's:40933   NA's:40933
3rd Qu.:19872187   3rd Qu.:1994   3rd Qu.:1358
Max.   :20002817   Max.   :2000   Max.   :3247

       prestige      agewed        age          educ        paeduc
  DK,NA,NAP:    0   NAP :    0   DK  :    0   NAP :    0   NAP :    0
  NA's     :40933   DK  :    0   NA  :    0   DK  :    0   DK  :    0
                    NA  :    0   NA's:40933   NA  :    0   NA  :    0
                    NA's:40933                NA's:40933   NA's:40933



   maeduc       speduc                 income
  NAP :    0   NAP :    0   $25000 OR MORE:14525
  DK  :    0   DK  :    0   $10000 - 14999: 5022
  NA  :    0   NA  :    0   $15000 - 19999: 3869
  NA's:40933   NA's:40933   $20000 - 24999: 3664
                            REFUSED       : 1877
                            (Other)       : 8523
                            NA's          : 3453
 >


Here is what Stata reports when I load the same file:

summarize, detail

                  Case identification number
-------------------------------------------------------------
      Percentiles      Smallest
 1%       197432          19721
 5%       199649          19722
10%      1974116          19723       Obs               40933
25%      1983475          19724       Sum of Wgt.       40933

50%      1996808                      Mean            9963040
                        Largest       Std. Dev.       9006352
75%     1.99e+07       2.00e+07
90%     2.00e+07       2.00e+07       Variance       8.11e+13
95%     2.00e+07       2.00e+07       Skewness         .18931
99%     2.00e+07       2.00e+07       Kurtosis       1.045409

                 GSS YEAR FOR THIS RESPONDENT
-------------------------------------------------------------
      Percentiles      Smallest
 1%         1972           1972
 5%         1973           1972
10%         1974           1972       Obs               40933
25%         1978           1972       Sum of Wgt.       40933

50%         1987                      Mean           1986.421
                        Largest       Std. Dev.       8.61136
75%         1994           2000
90%         1998           2000       Variance       74.15552
95%         2000           2000       Skewness      -.0789223
99%         2000           2000       Kurtosis       1.799939

                     RESPONDENT ID NUMBER
-------------------------------------------------------------
      Percentiles      Smallest
 1%           18              1
 5%           89              1
10%          178              1       Obs               40933
25%          445              1       Sum of Wgt.       40933

50%          905                      Mean           989.9129
                        Largest       Std. Dev.      689.0596
75%         1358           3244
90%         2027           3245       Variance       474803.2
95%         2437           3246       Skewness       .8359211
99%         2867           3247       Kurtosis       3.311248

               NUMBER OF HOURS WORKED LAST WEEK
-------------------------------------------------------------
      Percentiles      Smallest
 1%            6              0
 5%           15              0
10%           21              0       Obs               23279
25%           37              0       Sum of Wgt.       23279

50%           40                      Mean           41.05206
                        Largest       Std. Dev.      13.95931
75%           48             89
90%           60             89       Variance       194.8624
95%           65             89       Skewness        .195045
99%           82             89       Kurtosis       4.448998

              NUMBER OF HOURS USUALLY WORK A WEEK
-------------------------------------------------------------
      Percentiles      Smallest
 1%            4              0
 5%           15              0
10%           20              1       Obs                 774
25%           38              2       Sum of Wgt.         774

50%           40                      Mean           39.79199
                        Largest       Std. Dev.      13.43383
75%           45             89
90%           55             89       Variance       180.4677
95%           60             89       Skewness      -.0002332
99%           80             89       Kurtosis       5.009869

            RS OCCUPATIONAL PRESTIGE SCORE  (1970)
-------------------------------------------------------------
      Percentiles      Smallest
 1%           14             12
 5%           17             12
10%           20             12       Obs               24267
25%           30             12       Sum of Wgt.       24267

50%           39                      Mean           39.35645
                        Largest       Std. Dev.      14.03712
75%           48             82
90%           60             82       Variance       197.0407
95%           62             82       Skewness       .2927414
99%           76             82       Kurtosis       2.775553

                    AGE WHEN FIRST MARRIED
-------------------------------------------------------------
      Percentiles      Smallest
 1%           15             12
 5%           17             12
10%           17             12       Obs               25382
25%           19             12       Sum of Wgt.       25382

50%           21                      Mean           22.09609
                        Largest       Std. Dev.      4.813944
75%           24             63
90%           28             68       Variance       23.17405
95%           31             73       Skewness       2.002265
99%           39             73       Kurtosis       11.28279

                       AGE OF RESPONDENT
-------------------------------------------------------------
      Percentiles      Smallest
 1%           19             18
 5%           21             18
10%           24             18       Obs               40790
25%           30             18       Sum of Wgt.       40790

50%           42                      Mean           45.14798
                        Largest       Std. Dev.      17.53519
75%           58             89
90%           71             89       Variance       307.4828
95%           77             89       Skewness       .4774907
99%           86             89       Kurtosis       2.239618

               HIGHEST YEAR OF SCHOOL COMPLETED
-------------------------------------------------------------
      Percentiles      Smallest
 1%            3              0
 5%            7              0
10%            8              0       Obs               40806
25%           11              0       Sum of Wgt.       40806

50%           12                      Mean           12.48152
                        Largest       Std. Dev.      3.176226
75%           14             20
90%           16             20       Variance       10.08841
95%           18             20       Skewness      -.3389303
99%           20             20       Kurtosis       3.960311

             HIGHEST YEAR SCHOOL COMPLETED, FATHER
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            3              0
10%            4              0       Obs               29347
25%            8              0       Sum of Wgt.       29347

50%           11                      Mean           10.20994
                        Largest       Std. Dev.      4.342143
75%           12             20
90%           16             20       Variance       18.85421
95%           17             20       Skewness      -.1628909
99%           20             20       Kurtosis       2.826482

             HIGHEST YEAR SCHOOL COMPLETED, MOTHER
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            3              0
10%            6              0       Obs               34151
25%            8              0       Sum of Wgt.       34151

50%           12                      Mean           10.41478
                        Largest       Std. Dev.      3.709352
75%           12             20
90%           14             20       Variance       13.75929
95%           16             20       Skewness      -.6324499
99%           18             20       Kurtosis       3.605715

             HIGHEST YEAR SCHOOL COMPLETED, SPOUSE
-------------------------------------------------------------
      Percentiles      Smallest
 1%            4              0
 5%            7              0
10%            8              0       Obs               22780
25%           12              0       Sum of Wgt.       22780

50%           12                      Mean           12.53095
                        Largest       Std. Dev.      3.103418
75%           14             20
90%           16             20       Variance       9.631203
95%           18             20       Skewness       -.287755
99%           20             20       Kurtosis       4.051822

                      TOTAL FAMILY INCOME
-------------------------------------------------------------
      Percentiles      Smallest
 1%            1              1
 5%            3              1
10%            5              1       Obs               37480
25%            9              1       Sum of Wgt.       37480

50%           11                      Mean            9.75619
                        Largest       Std. Dev.      2.994967
75%           12             13
90%           12             13       Variance       8.969825
95%           13             13       Skewness       -1.29205
99%           13             13       Kurtosis       3.759778

.
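One mechanism would produce exactly this pattern. It is offered as an assumption, not a confirmed diagnosis. Suppose `read.dta` turns every variable that carries a Stata value label into a factor. Suppose also that the label for a variable such as hrs1 names only its missing-data codes (NAP, DK, NA). Then no substantive value has a matching factor level, so every one of them becomes NA. That would also explain why income survives the import, since its codes appear to be fully labelled. The minimal sketch below illustrates this. The vector `hrs` and the codes -1, 98 and 99 are hypothetical stand-ins, not values read from morgen.dta.

```r
# Hypothetical stand-in for a GSS hours variable: substantive answers
# plus two made-up missing-data codes (98 = "DK", 99 = "NA").
hrs <- c(40, 37, 98, 0, 99, 60)

# Assumption: the Stata value label attached to this variable names only
# the missing-data codes, not the substantive values.
lab <- c(NAP = -1, DK = 98, "NA" = 99)

# Build a factor whose levels are only the labelled codes: every
# unlabelled value (40, 37, 0, 60) has no matching level and becomes NA.
hrs_factor <- factor(hrs, levels = lab, labels = names(lab))
print(summary(hrs_factor))

# The raw numeric codes, by contrast, keep every observation.
print(summary(hrs))
```

If this is what is happening, reading the file with `read.dta("morgen.dta", convert.factors = FALSE)` should keep the numeric codes. `convert.factors` is an existing argument of `read.dta` in the foreign package. The trade-off is that the value labels would then have to be applied by hand.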


-- 
Paul E. Johnson                       email: pauljohn at ku.edu
Dept. of Political Science            http://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas                  Office: (785) 864-9086
Lawrence, Kansas 66044-3177           FAX: (785) 864-5700



