[R] Reading a tab delimited file of varying length using read.table

Rolf Turner r.turner at auckland.ac.nz
Mon Jan 18 00:01:27 CET 2016


On 18/01/16 10:48, Uwe Ligges wrote:
> This is not a tab delimited file (as you apparently assume given the
> code), but a fixed width format, hence I'd try:
>
> url <- "http://data.princeton.edu/wws509/datasets/divorce.dat"
> widths <- c(9, 13, 10, 8, 10, 6)
> f5 <- read.fwf(url, widths = widths, skip = 1, strip.white = TRUE)
>
> names(f5) <- as.character(unlist(read.fwf(url, widths = widths,
> strip.white=TRUE, n=1)))
>
> Not sure why reading it simply with header=TRUE does not work, but no
> time to investigate this now.

Dear Uwe,

I have fiddled around a bit, and the situation looks to me like a bug in
read.fwf.  It would seem that for header=TRUE to work, the entries of the
header need to be separated by the sep delimiter, which defaults to "\t".
In the case in question the entries are separated by blanks, so presumably
the header gets read in as a single field rather than six, leading to a
mismatch between the length of the header and the number of columns.
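
The mismatch described above can be reproduced on a small scale. The
following is only a sketch with made-up data: the widths, the format
string and the column names are assumptions for illustration and are not
taken from the divorce file. As far as I can tell, ?read.fwf does state
that header names must be delimited by sep, which would be consistent
with what is seen here.

```r
# Made-up fixed-width data: three columns of widths 4, 6 and 5.
widths <- c(4, 6, 5)
fmt    <- "%-4s%-6s%5s"
body   <- sprintf(fmt, c("1", "2"), c("Ann", "Bob"), c("31", "45"))

# File 1: the header is padded with blanks, exactly like the body.
blank_hdr <- tempfile()
writeLines(c(sprintf(fmt, "id", "name", "score"), body), blank_hdr)

# File 2: same body, but the header entries are separated by tabs.
tab_hdr <- tempfile()
writeLines(c("id\tname\tscore", body), tab_hdr)

# Blank-padded header: the header line is seen as one field while the
# body has three, so read.fwf stops with an error (caught here).
res_blank <- tryCatch(
  read.fwf(blank_hdr, widths = widths, header = TRUE, strip.white = TRUE),
  error = function(e) e
)

# Tab-separated header: three names for three columns.
res_tab <- read.fwf(tab_hdr, widths = widths, header = TRUE,
                    strip.white = TRUE)

print(class(res_blank))
print(names(res_tab))
unlink(c(blank_hdr, tab_hdr))
```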

It seems that the specified widths get ignored when the header line is 
dealt with.

It also seems that if one specifies sep="" then the header gets read
correctly, but strings of blanks are then interpreted as field separators
throughout, so blanks within a field result in the wrong number of
columns.
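
The effect of whitespace splitting can be seen without read.fwf at all.
The sketch below uses count.fields on two made-up lines (the data are an
assumption, not from the divorce file); with sep="" any run of blanks
separates fields, so a value containing a blank counts as two fields.

```r
# Two made-up records; the second has a blank inside its middle field.
tf <- tempfile()
writeLines(c("1   Ann        31",
             "2   New York   45"), tf)

# sep = "" means "split on any whitespace".
n_fields <- count.fields(tf, sep = "")
print(n_fields)   # 3 fields on the first line, 4 on the second
unlink(tf)
```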

I think that the code of read.fwf is easy enough to fix; a slight 
adjustment will make the header get treated the same way as the body of 
the file.

I don't see any problems or drawbacks with doing so, and experimenting with
my modified function resulted in the divorce data being read in with
header=TRUE without any trouble.
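
For illustration only: the modified read.fwf mentioned above was not
posted, so the following is NOT that function. It is a minimal sketch of
the same idea done from the outside, in the spirit of Uwe's workaround:
cut the header line at the same widths as the body. The wrapper name
read_fwf_hdr and the test data are hypothetical.

```r
# Hypothetical wrapper: treat the header like a body record.
# 'file' must be a path or URL, since it is read twice.
read_fwf_hdr <- function(file, widths, ...) {
  # First record only, split at 'widths' and kept as character strings.
  hdr <- read.fwf(file, widths = widths, n = 1, strip.white = TRUE,
                  colClasses = "character")
  # Everything after the header line.
  dat <- read.fwf(file, widths = widths, skip = 1, strip.white = TRUE, ...)
  names(dat) <- unlist(hdr, use.names = FALSE)
  dat
}

# Made-up data with a blank-padded header (widths 4, 6, 5).
fmt <- "%-4s%-6s%5s"
tf  <- tempfile()
writeLines(c(sprintf(fmt, "id", "name", "score"),
             sprintf(fmt, c("1", "2"), c("Ann", "Bob"), c("31", "45"))), tf)
dat <- read_fwf_hdr(tf, widths = c(4, 6, 5))
unlink(tf)
print(dat)
```

Reading the file twice is wasteful for large inputs; a change inside
read.fwf itself, as proposed above, would avoid that.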

If this mod is made, I see no reason to keep the "sep" argument in 
read.fwf --- except maybe for backward compatibility issues, and I don't 
think there would be any since it never worked properly anyhow.

cheers,

Rolf

P. S. I can send you my modified version of read.fwf off-list if this 
would be of any use to you.

R.

-- 
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276


