[R] using filter while Reading files -

jim holtman jholtman at gmail.com
Thu Sep 17 02:15:17 CEST 2009


Here is one way to create a list of data frames, using the TABLE lines as the names:

> x <- readLines('/tempxx.txt')  # read in your data
> # assume that 'x' was read in with readLines
> input <- textConnection(x)
> # find the "TABLE" lines and use as the names of the dataframes to read
> indx <- c(grep("^TABLE", x), length(x) + 1)  # add index for end of data
> indx.diff <- diff(indx)  # sizes of each section
> # assume first line is a TABLE
> result <- list()  # initialize output list of dataframes
> for (i in seq_along(indx.diff)){
+ df.name <- readLines(input, n=1)  # read in the name
+ # rows to read = section size minus the name line and the header line
+ result[[df.name]] <- read.table(input, header=TRUE, nrows=indx.diff[i] - 2,
+         colClasses=rep('numeric', 6))
+ }
> close(input)
> result
$`TABLE NO. 1: Gold `
  R1     T1     T2      T3     T4      T5
1  0 36.800 1410.0 4940.00 23.300 49.0000
2 43 37.787 2462.6 4442.27 23.139 48.4272
3 -1 36.787 1462.6 4442.27 23.139 48.4271

$`TABLE NO. 2: Silver `
  R1     T1     T2      T3     T4      T5
1  0 36.800 1416.6 4540.00 28.900 49.0000
2 56 36.787 5462.6 4942.27 24.239 48.4272
3 -1 86.787 9462.6 4942.27 23.139 48.4271
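
The same idea can also be written without the explicit index arithmetic. This is only a rough sketch, not a replacement for the loop above: it assumes a reasonably recent R (read.table() with the text= argument, and trimws()), that the first line is a TABLE line, and it uses an inline copy of the sample data so that it runs without a file. The names is.name and sections are made up for the illustration.

```r
# Inline copy of the sample data, so the sketch runs without a file
x <- c(
  "TABLE NO. 1: Gold",
  " R1           T1           T2           T3           T4           T5",
  "            0  3.68000E+01  1.41000E+03  4.94000E+03  2.33000E+01  4.90000E+01",
  "           43  3.77870E+01  2.46260E+03  4.44227E+03  2.31390E+01  4.84272E+01",
  "           -1  3.67870E+01  1.46260E+03  4.44227E+03  2.31390E+01  4.84271E+01",
  "TABLE NO. 2: Silver",
  " R1           T1           T2           T3           T4           T5",
  "            0  3.68000E+01  1.41660E+03  4.54000E+03  2.89000E+01  4.90000E+01",
  "           56  3.67870E+01  5.46260E+03  4.94227E+03  2.42390E+01  4.84272E+01",
  "           -1  8.67870E+01  9.46260E+03  4.94227E+03  2.31390E+01  4.84271E+01"
)

# TRUE on each "TABLE" line; cumsum() then assigns a section number to every line
is.name <- grepl("^TABLE", x)
sections <- split(x, cumsum(is.name))

# In each section the first line is the name, the rest is header + data.
# A single colClasses value is recycled across however many columns there are.
result <- lapply(sections, function(s)
  read.table(text = s[-1], header = TRUE, colClasses = "numeric"))
names(result) <- vapply(sections, function(s) trimws(s[1]), character(1),
                        USE.NAMES = FALSE)
```

Because colClasses is recycled, this version does not hard-code the number of columns, at the cost of assuming every column is numeric.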


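On the point in the question about the number of columns not being constant: count.fields() reports the number of whitespace-separated fields on each line, so the width of each table can be taken from its header line instead of being hard-coded. A small sketch, assuming the header line directly follows each TABLE line as in the sample data; the vector x here is a made-up two-table example with different widths, not the poster's file, and hdr, n.fields and n.cols are names chosen for the illustration.

```r
# Hypothetical input: two tables with different numbers of columns
x <- c(
  "TABLE NO. 1: Gold",
  " R1  T1  T2",
  "  0  3.68000E+01  1.41000E+03",
  "TABLE NO. 2: Silver",
  " R1  T1  T2  T3",
  "  0  3.68000E+01  1.41660E+03  4.54000E+03"
)

con <- textConnection(x)
n.fields <- count.fields(con)   # fields per line, split on whitespace
close(con)

hdr <- grep("^TABLE", x) + 1    # the header line follows each TABLE line
n.cols <- n.fields[hdr]         # number of columns in each table
```

n.cols could then be used per table, for example as colClasses=rep('numeric', n.cols[i]) inside a loop over the sections.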
On Wed, Sep 16, 2009 at 6:30 PM, Santosh <santosh2005 at gmail.com> wrote:
> Hi R'sians
> As the experts here suggested, I am using "scan" and "readLines" to read
> text files. I notice that read.table takes a long time to read and process
> character vectors of 30000+ rows.
>
> How do I separate out the columns in the resulting character vector? The
> function "read.fwf" appears to be a bit cumbersome to use, as the number of
> columns in the text files is not constant and some preprocessing is needed
> to obtain the number of columns.
>
> Would really appreciate your ideas!!
>
> Below is the embedded data from the attached text file
> "TABLE NO. 1: Gold"
> " R1           T1           T2           T3           T4           T5          "
> "            0  3.68000E+01  1.41000E+03  4.94000E+03  2.33000E+01  4.90000E+01"
> "           43  3.77870E+01  2.46260E+03  4.44227E+03  2.31390E+01  4.84272E+01"
> "           -1  3.67870E+01  1.46260E+03  4.44227E+03  2.31390E+01  4.84271E+01"
> "TABLE NO. 2: Silver"
> " R1           T1           T2           T3           T4           T5          "
> "            0  3.68000E+01  1.41660E+03  4.54000E+03  2.89000E+01  4.90000E+01"
> "           56  3.67870E+01  5.46260E+03  4.94227E+03  2.42390E+01  4.84272E+01"
> "           -1  8.67870E+01  9.46260E+03  4.94227E+03  2.31390E+01  4.84271E+01"
>
> Thanks,
> Santosh
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



