[R] read.table with different row lengths

Marc Schwartz marc_schwartz at comcast.net
Wed Dec 10 20:33:32 CET 2008


on 12/10/2008 12:50 PM Chris Poliquin wrote:
> Hi,
> 
> I need to read in a series of text files with a time series on each
> row.  The series are of different lengths and I'd like to just use the
> first row as the length and have R ignore extra values in rows that go
> over this length.
> 
> For example:
> 
> 1 0 3 4 5
> 1 3 5 6 8 7 7
> 2 1 1 1 4 7 7 7
> 
> So the 7s would be ignored and I would have a 5x3 matrix.  I tried
> creating a series of colClasses with NULLs for the extra values by using
> max(count.fields(file)) - min(count.fields(file)) but this didn't work
> and would be too time consuming for lots of files.
> 
> fill=T doesn't seem to be working either.  When I use fill=T I get extra
> rows for some reason in the table.  R doesn't seem to just be appending
> NAs to the end of the short rows.
> 
> Any way to accomplish this?
> 
> - Chris

Not sure why you had issues with 'fill = TRUE'.
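
One possible explanation for the extra rows, offered as a guess since
the actual files are not shown: read.table() decides how many columns to
use from the first five lines of input only. If a longer row turns up
after that, 'fill = TRUE' wraps its surplus fields onto an additional
row. The sketch below reproduces this with made-up data in a temporary
file, then supplies 'col.names' sized from count.fields() as one way
around it. The file contents and object names are illustrative only.

```r
# Hypothetical data: five short rows followed by one longer row
tmp <- tempfile(fileext = ".txt")
writeLines(c(rep("1 2 3", 5), "1 2 3 4 5"), tmp)

# Column count is taken from the first five lines, so the surplus
# fields of line 6 spill onto an extra row
bad <- read.table(tmp, fill = TRUE)
dim(bad)

# Telling read.table() the widest row up front avoids the wrapping
n.max <- max(count.fields(tmp))
good <- read.table(tmp, fill = TRUE,
                   col.names = paste0("V", seq_len(n.max)))
dim(good)
```

With the three-line example in your post the whole file is inspected, so
this would only show up in files longer than five lines.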

Presuming that you do not know 'a priori' the resultant matrix size, you
could do something like the following.

Essentially, use read.table() with 'fill = TRUE' to pad the short rows
with NA, and 'na.strings' to convert the 7's (the only surplus values in
your example) to NA as well:

DF <- read.table("clipboard", fill = TRUE, na.strings = "7")

> DF
  V1 V2 V3 V4 V5 V6 V7 V8
1  1  0  3  4  5 NA NA NA
2  1  3  5  6  8 NA NA NA
3  2  1  1  1  4 NA NA NA

We can then use complete.cases() on the transposed data frame to get a
logical vector that is TRUE for the columns containing no NAs:

> complete.cases(t(DF))
[1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE

Thus:

> DF[, complete.cases(t(DF))]
  V1 V2 V3 V4 V5
1  1  0  3  4  5
2  1  3  5  6  8
3  2  1  1  1  4
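
The 'na.strings' step above depends on the surplus values all being 7.
If that does not hold for the real files, a rough alternative is to read
the lines yourself and keep only as many fields per row as the first row
has. This is a sketch under the assumption that the files are
whitespace-delimited and purely numeric; the temporary file and the
object names are made up for illustration.

```r
# Example data from the original post, written to a temporary file
tmp <- tempfile(fileext = ".txt")
writeLines(c("1 0 3 4 5",
             "1 3 5 6 8 7 7",
             "2 1 1 1 4 7 7 7"), tmp)

# Split each line into its whitespace-separated fields
fields <- strsplit(trimws(readLines(tmp)), "[[:space:]]+")

# The first row sets the number of values to keep
n.keep <- length(fields[[1]])

# Truncate every row to n.keep values and bind the rows into a matrix;
# rows shorter than the first are padded with NA by the indexing
mat <- do.call(rbind,
               lapply(fields, function(x) as.numeric(x[seq_len(n.keep)])))
mat
```

Wrapping this in a function of the file name and calling it via lapply()
over the file names would handle the whole series of files.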


HTH,

Marc Schwartz


