[R] stacking imported data

Gabor Grothendieck ggrothendieck at myway.com
Tue Nov 2 02:31:27 CET 2004


Sundar Dorai-Raj <sundar.dorai-raj <at> pdf.com> writes:

: 
: Hi all,
:    I have a question that I don't have a good answer for (note the word 
: "good"; I have an answer, but I consider it not "good"). Take the 
: following data in a single tab-delimited text file:
: 
: <text>
: 
: A
: Labels	Value	SE	2.5%	97.5%
: R90	0.231787	1.148044	0.035074	1.531779
: R0	0.500861	0.604406	0.185336	1.353552
: 
: B
: Labels	Value	SE	2.5%	97.5%
: (Intercept)	1.367514	0.036431	1.287975	1.451964
: </text>
: 
: (Note: the <text> tags are not present and are added here only to show 
: blank lines.)
: 
: I would like to read the data into a single data.frame which looks like
: 
: Labels	Value	SE	2.5%	97.5%
: A.R90	0.231787	1.148044	0.035074	1.531779
: A.R0	0.500861	0.604406	0.185336	1.353552
: B.(Intercept)	1.367514	0.036431	1.287975	1.451964
: 
: A few rules:
: 
: 1. the number of rows in "A" and "B" will vary from 1 to ???. Here "A" 
: has 2 rows (excluding header) and "B" has 1 row (excluding header).
: 2. the number of columns in "A" and "B" will always be the same.
: 3. the headers for "A" and "B" will always be the same.
: 4. there is always an empty line at the beginning of the file and in 
: between "A" and "B".
: 

Read the lines into vector z, one line per element.

Define a grouping variable, g, which is 1 for the lines
starting at the first blank line and 2 for the lines
starting at the 2nd.  Define a function f which accepts such
a group of lines and creates the appropriate data frame from
them.  tapply the lines, grouped by g, and bind the rows of
the data frame produced from each group together into one
large data frame.  
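
To make the grouping step concrete, here is a small sketch of what g
looks like for the sample data in the question. The lines are typed in
by hand as a stand-in for the readLines result (an assumption made so
the snippet runs without a file), and the name blocks is only for
illustration.

```r
# Sample lines standing in for readLines("file.dat"); fields are tab-separated.
z <- c("",
       "A",
       "Labels\tValue\tSE\t2.5%\t97.5%",
       "R90\t0.231787\t1.148044\t0.035074\t1.531779",
       "R0\t0.500861\t0.604406\t0.185336\t1.353552",
       "",
       "B",
       "Labels\tValue\tSE\t2.5%\t97.5%",
       "(Intercept)\t1.367514\t0.036431\t1.287975\t1.451964")

# Each blank line bumps the running count, so every block gets its own id.
g <- cumsum(nchar(z) == 0)
g
# g is 1 1 1 1 1 2 2 2 2

# split() shows the lines that each call of f would receive:
# element 1 is the blank line, element 2 the block name, element 3 the header.
blocks <- split(z, g)
sapply(blocks, "[", 2)
```

Within each group the block name is therefore always the second
element, which is what f relies on.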

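z <- readLines("file.dat")

# strip leading and trailing whitespace (used by f below)
trim <- function(x) gsub("^[[:space:]]+|[[:space:]]+$", "", x)

# g increases by one at each blank line, so each block gets its own value
g <- cumsum(nchar(z) == 0)

f <- function(x) {
	# x[1] is the blank line, x[2] the block name, x[3] the header
	x[-(1:3)] <- paste(trim(x[2]), x[-(1:3)], sep = ".")
	read.table(textConnection(x[-(1:2)]), header = TRUE)
}
do.call("rbind", tapply(z, g, f))


Note: if the blank lines or the A and B lines contain
whitespace, nchar(z) == 0 will not detect the blank lines,
so trim the whitespace off first.  That is, insert this
line right after the definition of trim:

z <- trim(z)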
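
For trying this out without a file, here is a self-contained sketch
of the same idea run on the sample data from the question. Several
things in it are assumptions and not part of the code above: the
lines are typed in by hand, lapply over split is used in place of
tapply, and the read.table arguments sep = "\t", check.names = FALSE
and stringsAsFactors = FALSE are added so that labels and column
names such as "2.5%" come through unchanged (by default read.table
rewrites such names into syntactically valid ones).

```r
# Sample lines standing in for readLines("file.dat"); fields are tab-separated.
z <- c("",
       "A",
       "Labels\tValue\tSE\t2.5%\t97.5%",
       "R90\t0.231787\t1.148044\t0.035074\t1.531779",
       "R0\t0.500861\t0.604406\t0.185336\t1.353552",
       "",
       "B",
       "Labels\tValue\tSE\t2.5%\t97.5%",
       "(Intercept)\t1.367514\t0.036431\t1.287975\t1.451964")

# Strip leading and trailing whitespace so blank lines are truly empty.
trim <- function(x) gsub("^[[:space:]]+|[[:space:]]+$", "", x)
z <- trim(z)

# One group id per block, incremented at every blank line.
g <- cumsum(nchar(z) == 0)

f <- function(x) {
    # Prefix each data line with the block name, e.g. "A.R90".
    x[-(1:3)] <- paste(x[2], x[-(1:3)], sep = ".")
    # Drop the blank line and the block name, then parse header plus data.
    con <- textConnection(x[-(1:2)])
    on.exit(close(con))
    read.table(con, header = TRUE, sep = "\t",
               check.names = FALSE, stringsAsFactors = FALSE)
}

# Build one data frame per block and stack them.
result <- do.call("rbind", lapply(split(z, g), f))
rownames(result) <- NULL
result
```

Dropping check.names = FALSE gives the behaviour of the original
call, where the last two columns get syntactically valid names
instead of "2.5%" and "97.5%".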



