[R] Getting information encoded in a SAS, SPSS or Stata command file into R.

andrewH ahoerner at rprogress.org
Wed Nov 14 07:33:37 CET 2012


Wow! After reading Jan's post, I said "Great, I'll do that," because it was
the closest to what I originally had in mind. Then I read Ista's post, and
said "I think I'll try that first," because it got me back on track
following the directions in the R Data Import/Export manual. Then I read
Anthony's post. Now I am not so thrilled about going the database route,
because frankly I have hardly ever used databases before, and this would
make an already complex project take longer.

But I know that I will need to use the sample survey package for what I am
trying to do. So I think I am going to try to get the data into SQLite
format, and just hope the effort builds character. Anthony, I have not used
your packages yet, but they look great!

It will probably be more than a week before I get all this worked out and
implemented. Given how much work this will be, I do not want to do it twice,
so I think I will go back to IPUMS, get the rest of the variables, and break
the file into smaller chunks at the same time, so that I really have the
whole thing and it is easier to work with. The IPUMS version of the file is
rectangular (it duplicates the household data in each individual record),
and IPUMS has done a lot of valuable work cleaning the data and harmonizing
variable names and definitions that have changed over the history of the
CPS. Annoyingly, however, they have not linked the cross-sections between
years. Each CPS household is interviewed in two sets of four consecutive
months, eight months apart, so the March Supplement always consists half of
people who were interviewed the previous year and half of people who will
be interviewed the following year (barring turnover).
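To make the rotation arithmetic concrete, here is a minimal sketch, assuming only the 4-8-4 pattern described above and ignoring turnover and replacement households. "Month in sample" (MIS, numbering a household's eight interviews 1 to 8) is the usual CPS term; the names OFFSETS, march_overlap and mis_twelve_months_later are hypothetical, chosen just for this illustration.

```python
# Hypothetical sketch of the CPS 4-8-4 rotation (ignores turnover and
# replacement households). MIS = month in sample, 1..8.

# Months after the first interview at which each interview (MIS 1..8) happens:
# four consecutive months, eight months out, then four more consecutive months.
OFFSETS = [0, 1, 2, 3] + [12, 13, 14, 15]


def march_overlap(mis):
    """Classify a household interviewed in March at month-in-sample `mis`."""
    t = OFFSETS[mis - 1]
    if t + 12 in OFFSETS:
        return "next"      # will be interviewed again the following March
    if t - 12 in OFFSETS:
        return "previous"  # was already interviewed the previous March
    return "none"


def mis_twelve_months_later(mis):
    """Return the household's MIS twelve months later, or None if it has rotated out."""
    t = OFFSETS[mis - 1] + 12
    return OFFSETS.index(t) + 1 if t in OFFSETS else None


if __name__ == "__main__":
    for mis in range(1, 9):
        print(mis, march_overlap(mis), mis_twelve_months_later(mis))
```

Under these assumptions, MIS 1-4 households reappear the next March as MIS 5-8, and MIS 5-8 households were already interviewed the previous March, which is the half-and-half overlap that would let consecutive March files be linked.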

Anyway, when I have figured out my route to import I will report back here.
In the meantime, I have three more questions that one of you may be able to
answer:
1.   Anthony, does the read.SAScii.sqlite function preserve the label names
for factors in a data frame it imports into SQLite, when those labels are
coded in the command file?
2.   If I want to make the resulting SQLite database available to the R
community, is there a good place for me to put it? Assume it is 10-20 GB
in size. Ideally, it would be set up so that it could be queried remotely
and extracts downloaded. Setting this up is beyond my competence today, but
maybe not in a couple of months. (I'd like to do the same thing with the 30
years of Consumer Expenditure Survey data I have. I no longer have access to
SAS, but I converted it all to flat files while I still did. Currently the
BLS only makes the 2011 microdata available for free; earlier years cost
$200 per year on CD. But they have told me that they have no objection to my
making them available.)
3.   I have not yet been able to determine whether CPS microdata from the
period 1940-1961 exist. Does anyone know? They are not on
http://thedataweb.rm.census.gov/ftp/cps_ftp.html, and IPUMS and NBER
(http://www.nber.org/data/current-population-survey-data.html) both only
provide data back to 1962. I wrote to Census a week ago but have not heard
back, and in the past they have not been very helpful about historical
microdata.

Thanks to all! Andrew



