[R] stacking imported data

Gabor Grothendieck ggrothendieck at myway.com
Tue Nov 2 02:41:34 CET 2004


Gabor Grothendieck <ggrothendieck <at> myway.com> writes:

: 
: Sundar Dorai-Raj <sundar.dorai-raj <at> pdf.com> writes:
: 
: : 
: : Hi all,
: :    I have a question that I don't have a good answer for (note the word 
: : "good"; I have an answer, but I consider it not "good"). Take the 
: : following data in a single tab-delimited text file:
: : 
: : <text>
: : 
: : A
: : Labels	Value	SE	2.5%	97.5%
: : R90	0.231787	1.148044	0.035074	1.531779
: : R0	0.500861	0.604406	0.185336	1.353552
: : 
: : B
: : Labels	Value	SE	2.5%	97.5%
: : (Intercept)	1.367514	0.036431	1.287975	1.451964
: : </text>
: : 
: : (Note: the <text> tags are not present and are added here only to show 
: : blank lines.)
: : 
: : I would like to read the data into a single data.frame which looks like
: : 
: : Labels	Value	SE	2.5%	97.5%
: : A.R90	0.231787	1.148044	0.035074	1.531779
: : A.R0	0.500861	0.604406	0.185336	1.353552
: : B.(Intercept)	1.367514	0.036431	1.287975	1.451964
: : 
: : A few rules:
: : 
: : 1. the number of rows in "A" and "B" will vary from 1 to ???. Here "A" 
: : has 2 rows (excluding header) and "B" has 1 row (excluding header).
: : 2. the number of columns in "A" and "B" will always be the same.
: : 3. the headers for "A" and "B" will always be the same.
: : 4. there is always an empty line at the beginning of the file and in 
: : between "A" and "B".
: : 
: 
: Read the lines into vector z, one line per element.
: 
: Define a grouping variable, g, which is 1 for the lines
: starting at the first blank line and 2 for the lines
: starting at the 2nd.  Define a function f which accepts such
: a group of lines and creates the appropriate data frame from
: them.  tapply the lines, grouped by g, and bind the rows of
: the data frame produced from each group together into one
: large data frame.  
: 
: z <- readLines("file.dat")
: 
: g <- cumsum(nchar(z) == 0)
: f <- function(x) {
: 	x[-(1:3)] <- paste(trim(x[2]), x[-(1:3)], sep = ".")
: 	read.table(textConnection(x[-(1:2)]), header = TRUE)
: }
: do.call("rbind", tapply(z, cumsum(nchar(z) == 0), f))

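A correction: f should not call trim, which exists only if the optional
lines in the note below have been added.  The corrected code:

 z <- readLines("file.dat")
 
 # the group number increases by one at every blank line
 g <- cumsum(nchar(z) == 0)
 f <- function(x) {
 	# x[1] is the blank line, x[2] the block name, x[3] the header
 	x[-(1:3)] <- paste(x[2], x[-(1:3)], sep = ".")
 	read.table(textConnection(x[-(1:2)]), header = TRUE)
 }
 do.call("rbind", tapply(z, g, f))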
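
For anyone who wants to try this without creating file.dat, here is a
minimal sketch that runs the same steps on the sample data from the
question.  The vector lines_in is only my stand-in for the result of
readLines.  The sep = "\t" and check.names = FALSE arguments are my own
additions, assumed here so that the 2.5% and 97.5% column names survive
unchanged.  They are not part of the code above.

```r
# Sample data from the question; a stand-in for readLines("file.dat").
lines_in <- c(
  "",
  "A",
  "Labels\tValue\tSE\t2.5%\t97.5%",
  "R90\t0.231787\t1.148044\t0.035074\t1.531779",
  "R0\t0.500861\t0.604406\t0.185336\t1.353552",
  "",
  "B",
  "Labels\tValue\tSE\t2.5%\t97.5%",
  "(Intercept)\t1.367514\t0.036431\t1.287975\t1.451964"
)
z <- lines_in

# Each blank line starts a new group, so g is 1 1 1 1 1 2 2 2 2 here.
g <- cumsum(nchar(z) == 0)

f <- function(x) {
  # x[1] is the blank line, x[2] the block name, x[3] the header.
  # Prefix every data line with the block name, e.g. "A.R90".
  x[-(1:3)] <- paste(x[2], x[-(1:3)], sep = ".")
  con <- textConnection(x[-(1:2)])
  on.exit(close(con))
  # Assumption: tab-separated input and unmodified column names.
  read.table(con, header = TRUE, sep = "\t", check.names = FALSE)
}

# One data frame per group, stacked into a single data frame.
result <- do.call("rbind", tapply(z, g, f))
print(result)
```

The row names of the result come from rbind and the group numbers, so
they can be reset with rownames(result) <- NULL if they are not wanted.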


: 
: Note: if the blank lines or the A and B lines contain
: whitespace trim this off first.  That is, insert these
: two lines after the readLines statement:
: 
: trim <- function(x) gsub("^[[:space:]]+|[[:space:]]+$", "", x)
: z <- trim(z)
: 
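
To see why the trimming matters, here is a small sketch with made-up
lines (the vector raw is hypothetical, not taken from the original file)
in which the "blank" lines hold spaces and the block names carry stray
whitespace.  Without trim no line has zero length, so every line falls
into a single group 0.

```r
trim <- function(x) gsub("^[[:space:]]+|[[:space:]]+$", "", x)

# Hypothetical input: blank lines contain spaces, names have stray whitespace.
raw <- c("  ", "A \t", "Labels\tValue", "R90\t0.231787",
         " ",  " B",   "Labels\tValue", "(Intercept)\t1.367514")

# Without trimming, no line counts as blank and the grouping is all zeros.
g_raw <- cumsum(nchar(raw) == 0)

# After trimming, the blank lines are empty strings and the names are clean.
z <- trim(raw)
g <- cumsum(nchar(z) == 0)

print(g_raw)
print(g)
print(z[c(2, 6)])
```

Note that trim is applied to every line, so it also strips trailing tabs
from data rows.  That only matters if the last column can be empty.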



