[R] Getting information encoded in a SAS, SPSS or Stata command file into R.

Daniel Nordlund djnordlund at frontier.com
Thu Nov 15 02:52:05 CET 2012


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of andrewH
> Sent: Wednesday, November 14, 2012 2:34 PM
> To: r-help at r-project.org
> Subject: Re: [R] Getting information encoded in a SAS, SPSS or Stata
> command file into R.
> 
> Dear Anthony –
> 
> On closer examination, what I am talking about is not factor levels, but
> something different (but analogous). The categorical data all have integer
> codes, so the file is entirely numeric. The SAS proc format then gives a
> text string for each code of each categorical variable. Like this:
> 
> value REGION_f
>   11 = "New England Division"
>   12 = "Middle Atlantic Division"
>   21 = "East North Central Division"
>   22 = "West North Central Division"
>   31 = "South Atlantic Division"
>   32 = "East South Central Division"
>   33 = "West South Central Division"
>   41 = "Mountain Division"
>   42 = "Pacific Division"
>   97 = "State not identified"
> 
> So it would make sense to have a lookup table of these codes linked to the
> variables. I’m not sure if it makes more sense to have that table live in R
> or in the database. For R purposes, I imagine it would make sense to convert
> these integer-valued variables into factors.
> 
> What I do not understand is how SAS knows where the variables begin and end.
> I managed to break off a little hunk of the beginning of my file and look at
> it in an editor, and it is numbers without any obvious delimiters. Is the
> delimiter a particular numeric string? I thought the SAS command file would
> contain the starting location for each of the fixed-length fields, but I do
> not see anything in the file that could be interpreted that way – just a
> little wraparound code and then a long list of variable names followed by
> triplets of a code, an equals sign, and a text string, terminating with a
> semicolon.
> 
> I’m sorry if I am being obtuse. When I said before that I had saved the SAS
> files as flat files, what I really meant was that I had an intern do it.
> When I was doing my own analysis, I mainly used TSP, before I switched to R
> about a year ago. I’ve never used SAS.
> 
> I find your data project very interesting. Very. It is not actually
> necessary to wait for BLS to release the older CEX files, if you can lay
> your hands on the CDs. I spoke to the BLS data products office about 2
> years ago, and they have no problem with people republishing purchased data
> in any format they like, including simple duplication. In fact, they seemed
> to like the idea. I think the sale of data was forced on them by some kind
> of mandate from above.
> 
> I'll be playing with your code (which is a model of readability, and a
> lesson to me on same, BTW) and keep you posted on my progress.
> 
> Warmly, Andrew
> 
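
For the R side of the lookup-table question above, a minimal sketch of turning an integer-coded column into a factor, using the REGION_f codes and labels quoted in the message. The vector `region` is a made-up stand-in for the imported column, and the names `region_codes`, `region_labels` and `region_f` are illustrative only.

```r
# Codes and labels transcribed from the REGION_f value block quoted above
region_codes <- c(11, 12, 21, 22, 31, 32, 33, 41, 42, 97)
region_labels <- c(
  "New England Division",
  "Middle Atlantic Division",
  "East North Central Division",
  "West North Central Division",
  "South Atlantic Division",
  "East South Central Division",
  "West South Central Division",
  "Mountain Division",
  "Pacific Division",
  "State not identified"
)

# Hypothetical numeric column, standing in for the imported data
region <- c(11, 42, 97, 21, 11)

# levels = the integer codes, labels = the text shown for each code;
# any code missing from region_codes would become NA
region_f <- factor(region, levels = region_codes, labels = region_labels)

table(region_f)
```

Keeping the codes and labels as two parallel vectors (or a two-column data frame) means the same table could also be stored in a database and joined there, if that turns out to be the better home for it.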

Andrew,

R-help is not really the venue for discussing SAS programming or how the SAS data step reads fixed-width files. If you want to email me (off-list) the SAS program/script for reading the data, I would be willing to explain what it is doing.
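
On the R side, a fixed-width file with no delimiters can be read with `read.fwf()` from the utils package once the field widths are known. The sketch below is only an illustration: the three-field layout, the column names and the sample lines are all invented, since the real column positions would have to come from the SAS program that reads the data, which is not shown in this thread.

```r
# Invented sample lines: 2-digit region code, 2-digit filler, 3-digit amount
fwf_lines <- c("1100123", "4200456", "9700789")

# Write the sample to a temporary file so the example is self-contained
tmp <- tempfile(fileext = ".txt")
writeLines(fwf_lines, tmp)

# widths gives the number of characters in each field, in order;
# these widths and names are assumptions, not the real file layout
dat <- read.fwf(tmp,
                widths = c(2, 2, 3),
                col.names = c("REGION", "FILLER", "AMOUNT"))

str(dat)
unlink(tmp)
```

With the real widths in hand, the same call would apply to the actual flat file, and the resulting integer columns could then be converted to factors.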

Dan

Daniel Nordlund
Bothell, WA USA
 



