[R] Reading in a table with unequal columns

Sundar Dorai-Raj sundar.dorai-raj at pdf.com
Tue Nov 15 16:44:37 CET 2005



Mike Jones wrote:
> Hi, 
> 
> Wasn't sure how to explain this problem succinctly in a title.  I am
> trying to read in a text file that looks like:
> 
> 0   1000  175  1  2  3
> 1   1000  58   0  2  9
> 2   1000  35   0  1  3 10
> 3   1000  300  0  2  4  5  10  11  18
> 4   1000  150  3  5  6
> 5   1000  100 3  4  6  7  18
> 6   1000   50  4  5  7  8
> 7   1000  155  5  6  8  19
> 8   1000  255  6  7 19
> 9   1000  200  1 10 12
> 10  1000  52   2  3  9  11  12  13
> 11  1000  70  3  10 14 15  16  17  18  19
> 12  1000  250 9  10 13
> 13  1000  40 10 12 14
> 14  1000  235 11 13 15
> 15  1000  127 11 14 16 17
> 16  1000  177 11 15 17
> 17  1000  358 11 15 16
> 18  1000  296 3  5  11  19
> 19  1000  120 7  8  11  18
> 
> The problem with this is that the 12th row (row with 11 in the first
> column) doesn't get read in correctly.  To read into R, I'm using a
> command like:
> 
> matrix(unlist(read.table(datafile, sep="",fill=T)),
>              ncol=max(count.fields(datafile, sep="")),byrow=F)
> 
> but that gives
> 
>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
>  [1,]    0   19 1000  358   11   14   15   NA   NA    NA    18
>  [2,]    1 1000 1000  296   11   15   16   NA   NA    NA    NA
>  [3,]    2 1000  175  120    3   15   17   17   NA    NA    NA
>  [4,]    3 1000   58    1    7    5   16   NA   NA    NA    NA
>  [5,]    4 1000   35    0    2    8   11   NA   NA    NA    NA
>  [6,]    5 1000  300    0    2    3   11   19   NA    NA    NA
>  [7,]    6 1000  150    0    1    9   NA   18   NA    NA    NA
>  [8,]    7 1000  100    3    2    3   NA   NA   NA    NA    NA
>  [9,]    8 1000   50    3    5    4   10   NA   NA    NA    NA
> [10,]    9 1000  155    4    4    6    5   NA   NA    NA    NA
> [11,]   10 1000  255    5    5    6   NA   10   NA    NA     0
> [12,]   11 1000  200    6    6    7    7   NA   11    NA     1
> [13,]   19 1000   52    1    7    8    8   18   NA    18     2
> [14,]   12   NA   70    2   10   19   19   NA   NA    NA     3
> [15,]   13 1000   NA    3    3   12   NA   NA   NA    NA     4
> [16,]   14 1000  250   NA   10    9   NA   NA   NA    NA     5
> [17,]   15 1000   40    9   NA   14   11   NA   NA    NA     6
> [18,]   16 1000  235   10   10   NA   15   12   NA    NA     7
> [19,]   17 1000  127   11   12   13   NA   16   13    NA     8
> [20,]   18 1000  177   11   13   14   NA   NA   17    NA     9
> 
> I've tried other things, but this is as close as I've been able to get
> and I'm at a loss at this point.  Any input would be
> helpful...thanks...mj
> 


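read.table decides how many columns there are by looking at the first 
five lines of the file (see ?read.table). None of those has more than 10 
fields, so the 11 fields of your 12th row spill over onto an extra row. 
There are two ways that I know of to get around this; I'm sure there are 
others: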

## read the file line by line to find the maximum number of fields
x <- scan("file.txt", what = "", sep = "\n")  # one string per line
x <- strsplit(x, "[ \t]+")                    # split each line on white space
max.col <- max(sapply(x, length))             # width of the widest row

## option 1
## give read.table enough col.names for the widest row, as ?read.table suggests
cn <- paste("V", 1:max.col, sep = "")
z1 <- read.table("file.txt", fill = TRUE, col.names = cn)

## option 2
## parse `x' yourself, pad each short row with NA and construct a matrix
z2 <- t(sapply(x, function(i) {
   y <- rep(NA, max.col)             # full-width row of NAs
   y[seq_along(i)] <- as.numeric(i)  # fill in the fields that are present
   y
}))
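
If you want to check that the two options agree before running them on the 
real file, here is a small self-contained sketch. It assumes a cut-down copy 
of your data (seven of the rows above) written to a temporary file; the names 
`lines' and `f' are only for illustration. It also gets the width from 
count.fields, as in your own attempt, instead of scan/strsplit:

```r
## hypothetical, cut-down copy of the data, written to a temporary file
lines <- c("0   1000  175  1  2  3",
           "1   1000  58   0  2  9",
           "2   1000  35   0  1  3 10",
           "3   1000  300  0  2  4  5  10  11  18",
           "4   1000  150  3  5  6",
           "5   1000  100 3  4  6  7  18",
           "11  1000  70  3  10 14 15  16  17  18  19")
f <- tempfile(fileext = ".txt")
writeLines(lines, f)

## widest row, counted over the whole file rather than the first five lines
max.col <- max(count.fields(f, sep = ""))

## option 1: col.names long enough for the widest row
cn <- paste("V", 1:max.col, sep = "")
z1 <- read.table(f, fill = TRUE, col.names = cn)

## option 2: split each line and pad short rows with NA
x <- strsplit(readLines(f), "[ \t]+")
z2 <- t(sapply(x, function(i) {
   y <- rep(NA, max.col)
   y[seq_along(i)] <- as.numeric(i)
   y
}))

dim(z1)                                   # rows and columns actually read
all.equal(unname(as.matrix(z1)), z2)      # do the two options agree?
```

Note that z1 is a data frame and z2 is a matrix, so use as.matrix(z1) if you 
need a matrix from option 1.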



