[R] Getting codebook data into R

Douglas Bates bates at stat.wisc.edu
Thu Feb 9 22:47:45 CET 2012


On Thu, Feb 9, 2012 at 2:51 PM, barny <garyb.davies at btinternet.com> wrote:
> I've been trying to get some data from the National Survey for Family Growth
> into R - however, the data is in a .dat file and the data I need doesn't
> have any spaces or commas separating fields - rather you have to look in
> the codebook to find which character positions on each line hold the data
> you need. The data I want are the following, where 1,12,int means that the
> data I'm interested in starts in column 1, finishes in column 12 and is an
> integer.
>
>            ('caseid', 1, 12, int),
>            ('nbrnaliv', 22, 22, int),
>            ('babysex', 56, 56, int),
>            ('birthwgt_lb', 57, 58, int),
>            ('birthwgt_oz', 59, 60, int),
>            ('prglength', 275, 276, int),
>            ('outcome', 277, 277, int),
>            ('birthord', 278, 279, int),
>            ('agepreg', 284, 287, int),
>            ('finalwgt', 423, 440, float)
>
> How can I do this using R? I've written a python programme which basically
> does it but it'd be nicer if I could skip the Python bit and just do it
> using R. Cheers for any help.

?read.fwf
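
A minimal sketch of the read.fwf route, assuming the column positions
quoted above are correct.  read.fwf takes field widths rather than
start/end positions, and a negative width skips that many columns, so
the widths are derived from the codebook entries.  The one-record file
and the names spec, path and preg are made up for illustration; point
path at the real .dat file instead.

```r
# Column positions copied from the codebook excerpt in the question.
spec <- data.frame(
  name  = c("caseid", "nbrnaliv", "babysex", "birthwgt_lb", "birthwgt_oz",
            "prglength", "outcome", "birthord", "agepreg", "finalwgt"),
  start = c(1, 22, 56, 57, 59, 275, 277, 278, 284, 423),
  end   = c(12, 22, 56, 58, 60, 276, 277, 279, 287, 440),
  stringsAsFactors = FALSE
)

# Turn start/end positions into widths; negative widths are skipped columns.
gap    <- spec$start - c(0, head(spec$end, -1)) - 1  # unused columns before each field
width  <- spec$end - spec$start + 1
widths <- c(rbind(-gap, width))   # interleave: skip, field, skip, field, ...
widths <- widths[widths != 0]     # drop empty skips between adjacent fields

# Synthetic 440-character record (invented values) so the sketch runs on its own.
values <- c(sprintf("%12d", 1L), "1", "1", " 8", "13", "39", "1", " 1",
            "3316", sprintf("%18.6f", 8567.549))
record <- paste(rep(" ", 440), collapse = "")
for (i in seq_along(values)) {
  substring(record, spec$start[i], spec$end[i]) <- values[i]
}
path <- tempfile(fileext = ".dat")   # stand-in for the real data file
writeLines(record, path)

preg <- read.fwf(path, widths = widths, col.names = spec$name)
str(preg)
```

Deriving the widths from the start/end table keeps the codebook as the
single place to edit if more fields are needed.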

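You should realize that read.fwf is not overly smart about how it does
things: it splits each line at the given widths, writes the pieces to a
temporary delimited file and then calls read.table on that file.  For a
large file you may want to consider readLines to read each line as a
text string and then use substring to pull out the fields.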
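
A rough sketch of the readLines/substring approach, again assuming the
column positions from the question.  substring is vectorized over the
lines, so each field is extracted for all records in one call.  The
helper make_record and the two records it builds are invented so that
the example is self-contained; with real data only the readLines part
onwards is needed.  Everything is converted with as.numeric here
because a 12-digit caseid may not fit in an R integer.

```r
# Column positions copied from the codebook excerpt in the question.
spec <- data.frame(
  name  = c("caseid", "nbrnaliv", "babysex", "birthwgt_lb", "birthwgt_oz",
            "prglength", "outcome", "birthord", "agepreg", "finalwgt"),
  start = c(1, 22, 56, 57, 59, 275, 277, 278, 284, 423),
  end   = c(12, 22, 56, 58, 60, 276, 277, 279, 287, 440),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build one 440-character record with right-justified
# values in the codebook positions (test data only).
make_record <- function(values) {
  rec <- paste(rep(" ", 440), collapse = "")
  for (i in seq_along(values)) {
    w <- spec$end[i] - spec$start[i] + 1
    substring(rec, spec$start[i], spec$end[i]) <- formatC(values[i], width = w)
  }
  rec
}

path <- tempfile(fileext = ".dat")   # stand-in for the real data file
writeLines(c(
  make_record(c("1", "1", "1", "8", "13", "39", "1", "1", "3316", "8567.549")),
  make_record(c("2", "1", "2", "7", "14", "39", "1", "2", "1433", "7146.412"))
), path)

# Read every record as one string, then cut out each field by position.
lines <- readLines(path)
cols <- lapply(seq_len(nrow(spec)), function(i) {
  as.numeric(substring(lines, spec$start[i], spec$end[i]))
})
names(cols) <- spec$name
preg <- as.data.frame(cols)
print(preg)
```

Blank fields come back as NA from as.numeric, which is usually what
you want for missing survey responses.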

It's amazing how these old habits of storing data like this persist.
The reason for fixed-format records was that you couldn't read free
format in a Fortran program in a standard way before Fortran-77.  And
35 years afterwards we are still jumping through hoops to read
fixed-format records.  Sigh.


