[R] Reading text file with fortran format

Nordlund, Dan (DSHS/RDA) NordlDJ at dshs.wa.gov
Wed Oct 1 00:18:32 CEST 2014


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> project.org] On Behalf Of Steven Yen
> Sent: Tuesday, September 30, 2014 2:04 PM
> To: r-help
> Subject: [R] Reading text file with fortran format
> 
> Hello
> 
> I read data with fortran format:
> mydata<-read.fortran('foo.txt',
>                       c("4F10.4","F8.3","3F3.0","20F2.0"))
> colnames(mydata)<-c("q1","q2","q3","q4","income","hhsize",
>   "weekend","dietk","quart1","quart2","quart3","male","age35",
>   "age50","age65","midwest","south","west","nonmetro",
>   "suburb","black","asian","other","hispan","hhtype1",
>   "hhtype2","hhtype3","emp_stat")
> dstat(mydata,digits=6)
> 
> I produced the following sample statistics for the first 4
> variables (q1,q2,q3,q4):
> 
>               Mean  Std.dev Min       Max  Obs
> q1       0.000923 0.002509   0  0.035245 5649
> q2       0.000698 0.001681   0  0.038330 5649
> q3       0.000766 0.002138   0  0.040100 5649
> q4       0.000373 0.001140   0  0.026374 5649
> 
> The correct sample statistics are:
> Variable|       Mean       Std.Dev.     Minimum      Maximum
> --------+----------------------------------------------------
>        Q1|     9.227632     25.09311          0.0     352.4508
>        Q2|     6.983078     16.80984          0.0     383.2995
>        Q3|     7.657381     21.38337          0.0     400.9950
>        Q4|     3.727952     11.40446          0.0     263.7398
>    INCOME|     16.01603     13.70296          0.0        100.0
>    HHSIZE|     2.586475     1.464282          1.0         16.0
> 
> In other words, values for q1-q4 were scaled down by a factor of
> 10,000.
> My raw data look like (with proper format)
> 
>      0.0000    0.0000    0.0000    0.0000  48.108...
>      0.0000    0.0000    0.0000    0.0000  11.640...
>     35.3450    0.0000   95.7656    0.0000   4.667...
>      0.0000    0.0000    0.0000    0.0000   9.000...
>     84.0000    4.8038    0.0000    3.1886   2.923...
>      0.0000    0.0000    0.0000    1.1636  10.000...
>      0.0000   10.7818  109.7884    0.0000  17.000...
>      0.0000    7.9528    0.0000    4.7829  35.000...
> 
> True that the data here are space delimited. But I need to read data
> elsewhere where data are not space delimited.
> 
> Any idea/suggestion would be appreciated.
> 

The read.fortran function appears to behave differently from FORTRAN when the numbers being read already contain decimal points.  If memory serves, FORTRAN ignores the decimal portion of the format when it finds an explicit decimal point in the field.  The read.fortran function, by contrast, appears to read the number 'as is' and then multiply it by 10^-d, where d is the number of decimal places in the format.  With F10.4 that is a factor of 10^-4, which matches the scaling by 10,000 you are seeing in q1-q4.  Since your data already contain decimal points, you should specify the formats with 0 decimal places, i.e.

mydata<-read.fortran('foo.txt',
                      c("4F10.0","F8.0","3F3.0","20F2.0"))
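
Here is a minimal sketch of the zero-decimals approach, in case it helps to test it on a small file first.  The temporary file, the two sample rows, and the shortened format vector are made up for illustration (modelled on the first five fields of your data); this is not your actual file layout beyond those fields.

```r
# Build two fixed-width records: four 10-character fields and one
# 8-character field, each already containing an explicit decimal point.
# (Hypothetical sample rows, modelled on the data shown in the question.)
rows <- c(
  sprintf("%10.4f%10.4f%10.4f%10.4f%8.3f", 35.3450, 0, 95.7656, 0, 4.667),
  sprintf("%10.4f%10.4f%10.4f%10.4f%8.3f", 84.0000, 4.8038, 0, 3.1886, 2.923)
)

tmp <- tempfile(fileext = ".txt")
writeLines(rows, tmp)

# Declare 0 decimal places so that no 10^-d scaling is applied;
# the decimal points already present in the fields are kept as read.
dat <- read.fortran(tmp, c("4F10.0", "F8.0"))
colnames(dat) <- c("q1", "q2", "q3", "q4", "income")

print(dat)
unlink(tmp)
```

The fields are located by their widths, not by delimiters, so the same call should also work for files where the columns run together without spaces.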


Hope this is helpful,

Dan


Daniel J. Nordlund, PhD
Research and Data Analysis Division
Services & Enterprise Support Administration
Washington State Department of Social and Health Services



