[R] Help parsing from .txt

arun smartpink111 at yahoo.com
Wed Oct 23 06:50:47 CEST 2013


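Hi,
You may try the following (see ?list.files):

nm1 <- list.files(pattern="\\.txt$")  # escape the dot and anchor it, so only *.txt files match

res <- lapply(nm1, function(x) {
    ln1 <- readLines(x)
    indx1 <- grep("DATE PROCESSED", ln1)  # position of the DATE PROCESSED header
    indx2 <- grep("[A-Z]", ln1)           # positions of all header (upper-case) lines
    # keep everything up to the header that follows DATE PROCESSED, if there is one
    ln2 <- if(max(indx2)==indx1) ln1 else ln1[1:(indx2[match(indx1,indx2)+1]-1)]
    ln2 <- ln2[ln2!=""]                   # drop empty lines
    indx3 <- grepl("[A-Z]", ln2)          # TRUE for header lines
    # number each run of consecutive value lines; one run per column
    indx4 <- cumsum(c(TRUE, diff(which(!indx3))>1))
    mat1 <- do.call(cbind, split(ln2[!indx3], indx4))  # shorter columns are recycled
    colnames(mat1) <- ln2[indx3][-1]      # every header except the first names a column
    # the first header is used as the output file name
    write.table(mat1, paste0(ln2[indx3][1], ".txt"), row.names=FALSE, quote=FALSE, sep="\t")
})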
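
A variation on the code above, offered only as a minimal sketch: the question asks for a .csv, and the files hold different numbers of examples, so this version pads shorter columns with NA instead of recycling their values and writes with write.csv. The helper name parse_block and the demo lines are hypothetical. As above, it assumes that header lines are the only lines containing capital letters and that the first header is the record title.

```r
# Hypothetical helper: turn the lines of one file into a character matrix.
# Assumes header lines are the only lines containing capital letters.
# Headers with no value lines beneath them (such as the title) produce no column.
parse_block <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]            # drop empty lines
  is_hdr <- grepl("[A-Z]", lines)          # TRUE for header lines
  hdr <- lines[is_hdr]
  grp <- cumsum(is_hdr)[!is_hdr]           # index of the header above each value line
  cols <- split(lines[!is_hdr], grp)
  n <- max(lengths(cols))
  # pad shorter columns with NA rather than recycling their values
  mat <- sapply(cols, function(v) c(v, rep(NA_character_, n - length(v))))
  mat <- matrix(mat, nrow = n)             # keep a matrix even when n == 1
  colnames(mat) <- hdr[as.integer(names(cols))]
  list(title = hdr[1], data = mat)
}

# Demo input copied from the example in the question
demo <- c("TITLE", "EXAMPLE", "example 1", "example 2",
          "RELATED TITLE", "related title 1",
          "DATE PROCESSED", "06/12/2011")
out <- parse_block(demo)
out$data

# One .csv per record, written to a temporary directory here;
# na = "" leaves the padded cells empty in the file
write.csv(out$data, file.path(tempdir(), paste0(out$title, ".csv")),
          row.names = FALSE, na = "")
```

To run it over the real files, something like lapply(nm1, function(x) parse_block(readLines(x))) should work, writing each result to a separate output folder so the results are not picked up by list.files() on a later run.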



A.K.


I have a number of .txt files (1,200) from which I need to parse a 
number of pieces of information.  The files are read into R as such: 

TITLE 
EXAMPLE 
example 1 
example 2 
RELATED TITLE 
related title 1 
DATE PROCESSED 
06/12/2011 

Some of the files have examples 1-4, others 1-12 and beyond.   

How can I create a script that will grab the information from 
the different .txt files, put it in a matrix, and write it out to a .csv 
file with appropriately named columns? (The column titles are in CAPS 
above; the information that will go in each column is in lower case.) 

Thanks in advance.


