[R] Help parsing from .txt

Wed Oct 23 06:50:47 CEST 2013

Hi,
You may try:
?list.files()
# "." is a regex wildcard, so escape it and anchor the match at the end of the name
nm1 <- list.files(pattern="\\.txt$")

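res <- lapply(nm1, function(x) {
    ln1 <- readLines(x)
    indx1 <- grep("DATE PROCESSED", ln1)
    # lines containing capital letters are treated as column headings
    indx2 <- grep("[A-Z]", ln1)
    # keep everything up to the heading that follows DATE PROCESSED, if there is one
    ln2 <- if (max(indx2) == indx1) ln1 else ln1[1:(indx2[match(indx1, indx2) + 1] - 1)]
    ln2 <- ln2[ln2 != ""]
    indx3 <- grepl("[A-Z]", ln2)
    # consecutive value lines get the same group number
    indx4 <- cumsum(c(TRUE, diff(which(!indx3)) > 1))
    # one column per group; cbind() recycles shorter columns
    mat1 <- do.call(cbind, split(ln2[!indx3], indx4))
    # the first heading (TITLE) has no values below it and becomes the output file name
    colnames(mat1) <- ln2[indx3][-1]
    write.table(mat1, paste0(ln2[indx3][1], ".txt"), row.names=FALSE, quote=FALSE, sep="\t")
})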
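
To see what the grouping step does, here is a minimal sketch that applies the same idea to the sample record from the question, without reading any files. The helper name parse_record, the NA padding, and the way headings are looked up are my own assumptions, not part of the code above. Padding is one way to avoid the recycling that cbind() does when sections have different lengths.

```r
# Hypothetical helper: turn one record (a character vector of lines) into a
# character matrix with one column per heading that has values below it.
# Assumes the record starts with a heading and that value lines contain
# no capital letters.
parse_record <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[lines != ""]
  is_head <- grepl("[A-Z]", lines)
  val_pos <- which(!is_head)
  # consecutive value lines share a group id; a gap in position starts a new group
  grp <- cumsum(c(TRUE, diff(val_pos) > 1))
  cols <- split(lines[val_pos], grp)
  n <- max(lengths(cols))
  # pad shorter columns with NA instead of recycling them
  cols <- lapply(cols, function(v) { length(v) <- n; v })
  mat <- do.call(cbind, cols)
  # the heading of each group is the line just before its first value
  colnames(mat) <- lines[val_pos[!duplicated(grp)] - 1]
  mat
}

sample_lines <- c("TITLE", "EXAMPLE", "example 1", "example 2",
                  "RELATED TITLE", "related title 1",
                  "DATE PROCESSED", "06/12/2011")
m <- parse_record(sample_lines)
print(m)

# The question asked for .csv output; a temporary file is used here as a placeholder
out_file <- file.path(tempdir(), "TITLE.csv")
write.csv(m, out_file, row.names = FALSE, na = "")
```

For the sample record this gives a matrix with two rows and the columns EXAMPLE, RELATED TITLE and DATE PROCESSED, with NA in the second row of the two shorter columns.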

A.K.

I have 1,200 .txt files from which I need to parse several 
pieces of information.  Each file looks like this when read into R: 

TITLE 
EXAMPLE 
example 1 
example 2 
RELATED TITLE 
related title 1 
DATE PROCESSED 
06/12/2011 

Some of the files have examples 1-4, others 1-12 and beyond.   

How can I create a script that will grab the information from 
the different .txt files, put it in a matrix, and write it out to a .csv 
file with appropriately named columns? (The column titles are in CAPS 
above; the information that will go in each column is in lower case.) 

Thanks in advance.