[R] Read variable column width data

Duncan Murdoch murdoch.duncan at gmail.com
Mon Aug 15 17:54:22 CEST 2011


On 15/08/2011 11:47 AM, R Saba wrote:
> Reading data with variable column widths.
> Here are several lines of a txt data set I would like to read.
> The number of variables is fixed at 13. The problem is how to read the
> first variable when it can contain blank spaces, for example "Alabama
> (Seasonally Adjusted)", "St. Clair", etc.

I assume those commas are thousands separators.

I'd do it in several steps:
1.  Use readLines to read the data into a character variable, without 
parsing the lines.
2.  Remove all the commas.
3.  Replace the last 12 spaces with some unique separator (e.g. a comma, 
which is safe to use now that the original commas are all gone).

This step is the hardest.  There's likely a regular expression that does 
that; another way to do it would be to replace the last space, 12 
times.  The regular expression for the last space, with the rest of the 
line matched as well, is " ([^ ]*)$".  So this should work:

for (i in 1:12)
   lines <- sub(" ([^ ]*)$", ",\\1", lines)

4.  Now read the text strings using read.csv() or whatever.
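
Putting the four steps together, here is a minimal sketch rather than a tested solution. It assumes each record sits on a single line in the real file (the sample below is wrapped by the mailer), and it uses two of the quoted records inline in place of readLines(). The file name in the comment, the column names "area" and "v1" to "v12", and the final percent-stripping step are my own assumptions, not part of the original question.

```r
# Step 1: in practice, lines <- readLines("data.txt")  # hypothetical file name
# Here two records from the quoted sample stand in for the file contents.
lines <- c(
  "Alabama (Seasonally Adjusted) 2,168,870 2,162,604 2,122,787 1,954,895 1,956,026 1,925,007 213,975 206,578 197,780 9.9% 9.6% 9.3%",
  "St. Clair 36,821 36,139 35,964 33,233 33,021 32,540 3,588 3,118 3,424 9.7% 8.6% 9.5%"
)

# Step 2: remove the thousands separators
lines <- gsub(",", "", lines, fixed = TRUE)

# Step 3: replace the last remaining space with a comma, 12 times,
# so only the spaces inside the first field survive
for (i in 1:12)
  lines <- sub(" ([^ ]*)$", ",\\1", lines)

# Step 4: parse the now comma-separated strings
# (column names are made up for illustration)
dat <- read.csv(text = lines, header = FALSE, stringsAsFactors = FALSE,
                col.names = c("area", paste0("v", 1:12)))

# Assumed extra step: the last three columns are still text like "9.9%";
# drop the percent sign and convert them to numbers
dat[11:13] <- lapply(dat[11:13],
                     function(x) as.numeric(sub("%", "", x, fixed = TRUE)))
```

If your version of read.csv() has no text= argument, passing textConnection(lines) as the file should do the same job.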

Duncan Murdoch

> Alabama (Seasonally Adjusted) 2,168,870 2,162,604 2,122,787 1,954,895
> 1,956,026 1,925,007 213,975 206,578 197,780 9.9% 9.6% 9.3%
> Alabama (Not Seasonally Adjusted) 2,185,690 2,155,322 2,135,467 1,955,512
> 1,951,696 1,930,257 230,178 203,626 205,210 10.5% 9.4% 9.6%
> Autauga 24,743 24,472 24,234 22,355 22,373 22,394 2,388 2,099 1,840 9.7%
> 8.6% 7.6%
> Baldwin 86,185 84,039 83,698 78,160 76,934 76,736 8,025 7,105 6,962 9.3%
> 8.5% 8.3%
> Barbour 9,954 9,706 9,737 8,611 8,546 8,588 1,343 1,160 1,149 13.5% 12.0%
> 11.8%
> ......
> St. Clair 36,821 36,139 35,964 33,233 33,021 32,540 3,588 3,118 3,424 9.7%
> 8.6% 9.5%
> .......
> Winston 9,150 8,986 9,295 7,779 7,717 7,933 1,371 1,269 1,362 15.0% 14.1%
> 14.7%
> United States (Seasonally Adjusted) 153,421,000 153,693,000 153,684,000
> 139,334,000 139,779,000 139,092,000 14,087,000 13,914,000 14,593,000 9.2%
> 9.1% 9.5%
> United States (Not Seasonally Adj.) 154,538,000 153,449,000 154,767,000
> 140,129,000 140,028,000 139,882,000 14,409,000 13,421,000 14,885,000 9.3%
> 8.7% 9.6%
>
> Thanks,
> Richard Saba


