[R] Getting information encoded in a SAS, SPSS or Stata command file into R.

andrewH ahoerner at rprogress.org
Wed Nov 14 23:33:40 CET 2012


Dear Anthony,

On closer examination, what I am talking about is not factor levels but
something analogous. All of the categorical data is stored as integer
codes, so the file is entirely numeric. The SAS PROC FORMAT block then
gives a text label for each code of each categorical variable, like this:

value REGION_f
  11 = "New England Division"
  12 = "Middle Atlantic Division"
  21 = "East North Central Division" 
  22 = "West North Central Division"
  31 = "South Atlantic Division"
  32 = "East South Central Division"
  33 = "West South Central Division"
  41 = "Mountain Division"
  42 = "Pacific Division"
  97 = "State not identified"

So it would make sense to have a lookup table that links these codes to
their labels for each variable. I’m not sure if it makes more sense to have
that table live in R or in the database. For R purposes, I imagine it would
make sense to convert these integer-valued variables into factors.
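
A minimal sketch of that idea in R, assuming the value block has been read
in as plain text lines. The names fmt_lines, region_lookup and region_codes
are made up for illustration, and the five data values are invented, not
taken from the real file:

```r
# Text of one PROC FORMAT value block, copied from the example above.
fmt_lines <- c(
  'value REGION_f',
  '  11 = "New England Division"',
  '  12 = "Middle Atlantic Division"',
  '  21 = "East North Central Division" ',
  '  22 = "West North Central Division"',
  '  31 = "South Atlantic Division"',
  '  32 = "East South Central Division"',
  '  33 = "West South Central Division"',
  '  41 = "Mountain Division"',
  '  42 = "Pacific Division"',
  '  97 = "State not identified"'
)

# Match lines of the form: <integer> = "<label>"
pattern <- '^[[:space:]]*([0-9]+)[[:space:]]*=[[:space:]]*"([^"]*)"[[:space:]]*$'
pairs <- fmt_lines[grepl(pattern, fmt_lines)]

# Lookup table: one row per code, with its text label.
region_lookup <- data.frame(
  code  = as.integer(sub(pattern, "\\1", pairs)),
  label = sub(pattern, "\\2", pairs),
  stringsAsFactors = FALSE
)

# Hypothetical integer-coded column standing in for the real data.
region_codes <- c(11L, 42L, 97L, 21L, 11L)

# Convert the integer codes to a factor whose levels carry the labels.
region <- factor(region_codes,
                 levels = region_lookup$code,
                 labels = region_lookup$label)
```

Keeping the lookup table as a separate data frame means the integer codes
stay available in the data, and the same table could just as well be written
to the database instead of living only in R.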

What I do not understand is how SAS knows where the variables begin and end.
I managed to break off a small chunk from the beginning of my data file and
look at it in an editor, and it is all digits with no obvious delimiters. Is
the delimiter a particular numeric string? I thought the SAS command file
would contain the starting location of each fixed-length field, but I do not
see anything in it that could be interpreted that way. There is just a
little wrapper code and then a long list of variable names, each followed by
triplets of a code, an equals sign, and a text string, and terminated by a
semicolon.
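
For reference, a minimal sketch of how a delimiter-free file can be read in
R when it is fixed-width, assuming the field widths are known from some
layout description. The two records, the widths and the column names below
are all invented for illustration and do not describe the actual file:

```r
# Two made-up records with no delimiters. Assumed layout (hypothetical):
# columns 1-2 = region code, 3-5 = age, 6-11 = income.
raw_lines <- c("11034052000", "42061118500")

# Write them to a temporary file so read.fwf() has something to read.
tmp <- tempfile(fileext = ".txt")
writeLines(raw_lines, tmp)

# read.fwf() splits each line by character position, using the widths given.
dat <- read.fwf(tmp,
                widths    = c(2, 3, 6),
                col.names = c("REGION", "AGE", "INCOME"))
unlink(tmp)
```

In a fixed-width file there is no delimiter at all: each field is defined
purely by its position in the line, so the widths have to come from outside
the data file.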

I’m sorry if I am being obtuse. When I said before that I had saved the SAS
files as flat files, what I really meant was that I had an intern do it.
When I was doing my own analysis, I mainly used TSP, before I switched to R
about a year ago. I’ve never used SAS. 

I find your data project very interesting. Very. It is not actually
necessary to wait for BLS to release the older CEX files, if you can lay
your hands on the CDs. I spoke to the BLS data products office about 2
years ago, and they have no problem with people republishing purchased data
in any format they like, including simple duplication. In fact, they seemed
to like the idea. I think the sale of data was forced on them by some kind
of mandate from above.

I'll be playing with your code (which is a model of readability, and a
lesson to me on same, BTW) and keep you posted on my progress. 

Warmly, Andrew



