[R] Gobbling up a repeating, irregular list of data

MacQueen, Don macqueen1 at llnl.gov
Fri Nov 11 16:58:44 CET 2016


Like Peter, I too will assume that all the white space consists of space
characters, not tabs.

In that case, I would probably start with read.fwf().
I would expect that to get me a data frame with lots of NA in the first
four columns. Then (also like Peter says) you'll have to figure out how to
fill the empty cells.
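To make the read.fwf() idea concrete, here is a minimal sketch on two lines
laid out like the example data. The column widths c(3, 12, 10, 10, 11, 10)
are my assumption from counting characters in the posted sample, so check
them against the real model output before relying on them. The two lines
are built with paste0() so that they match those widths exactly.

```r
# Two stand-in lines: one full record and one continuation line.
# The widths are an assumption taken from the posted sample, not a documented format.
full <- paste0("  1", "    1.00E+00", "  1.24E+03", "  7.79E+00",
               "  1.925E-01", "  1.88E-01")
cont <- paste0(strrep(" ", 35), "  3.850E-01", "  1.88E-01")

con <- textConnection(c(full, cont))
fw <- read.fwf(con, widths = c(3, 12, 10, 10, 11, 10))
close(con)

# Blank fields in the continuation line come back as NA in columns 1 to 4,
# while columns 5 and 6 hold the depth / water content pair on every row.
print(fw)
```

With fixed widths the pairs already sit in columns 5 and 6, so the only
step left is filling the NA cells from the row above.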

By the way, I wouldn't worry too much about using "bad form." If it works,
is reasonably easy for someone else looking at your code to understand
(or for you to understand 5 years from now), and runs fast enough,
that's good enough. But I do appreciate the satisfaction of doing
something "the R way."


Here's another way:

dat <- scan(textConnection("  1    1.00E+00  1.24E+03  7.79E+00  1.925E-01  1.88E-01
                                     3.850E-01  1.88E-01
                                     5.775E-01  1.88E-01
                                     7.700E-01  1.88E-01
                                     9.626E-01  1.88E-01
                                     1.155E+00  1.88E-01
                                     1.347E+00  1.88E-01
  1    2.00E+00  1.26E+03  7.80E+00  1.925E-01  2.80E-01
                                     1.732E+00  2.80E-01
                                     1.925E+00  2.80E-01
                                     2.310E+00  2.93E-01
                                     2.502E+00  2.22E-01
                                     2.695E+00  1.88E-01
                                     2.887E+00  1.88E-01
  1    3.00E+00  1.28E+03  7.70E+00  1.925E-01  1.03E-01
                                     3.850E-01  1.30E-01
                                     5.775E-01  1.48E-01
                                     7.701E-01  1.61E-01
                                     9.626E-01  1.72E-01
                                     1.155E+00  1.86E-01
                                     1.347E+00  1.93E-01
  1    4.00E+00  1.29E+03  7.60E+00  1.901E-01  1.80E-01
                                     3.803E-01  1.80E-01
                                     5.705E-01  1.38E-01
                                     7.607E-01  1.32E-01
                                     2.282E+00  1.86E-01
                                     2.472E+00  1.98E-01
                                     2.662E+00  2.00E-01"),
  what = list(0, 0, 0, 0, 0, 0),  # six numeric fields per record
  fill = TRUE)                    # pad lines that have fewer fields with NA
datf <- do.call(cbind, dat)

In datf, each short row ends up with its two values in columns 1 and 2 and
NA in columns 3 through 6. So you just have to move the first two columns
over to be the last two in those rows, and then fill in the first four
columns from the non-missing values above them.
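A minimal sketch of those two steps, using a small stand-in matrix shaped
like datf rather than the full data. The names short and last_full are just
illustrative, and the sketch assumes the first row is always a full 6-value
row, as it is in the example.

```r
# Stand-in for datf: full rows have 6 values; short rows have their
# two values in columns 1:2 and NA in columns 3:6.
datf <- rbind(
  c(1,      1,     1240, 7.79, 0.1925, 0.188),
  c(0.3850, 0.188, NA,   NA,   NA,     NA),
  c(0.5775, 0.188, NA,   NA,   NA,     NA),
  c(1,      2,     1260, 7.80, 0.1925, 0.280),
  c(1.732,  0.280, NA,   NA,   NA,     NA)
)

# Rows that held only the depth / water content pair
short <- is.na(datf[, 3])

# Step 1: move the pair from columns 1:2 to columns 5:6
datf[short, 5:6] <- datf[short, 1:2]

# Step 2: for every row, find the index of the most recent full row,
# then copy that row's first four columns down
last_full <- cummax(ifelse(short, 0L, seq_len(nrow(datf))))
datf[, 1:4] <- datf[last_full, 1:4]

print(datf)
```

After this, every row carries its time step (column 2), depth and thickness
(columns 3 and 4), so something like split() on column 2 would give one
piece per time step.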



-Don

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 11/10/16, 8:26 PM, "R-help on behalf of Morway, Eric"
<r-help-bounces at r-project.org on behalf of emorway at usgs.gov> wrote:

>What would be the sophisticated R method for reading the data shown below
>into a list?  The data is output from a numerical model.  Pasting the
>second block of example R commands (at the end of the message) results in
>a failure ("Error in scan...line 2 did not have 6 elements").  I no doubt
>could cobble together some script for reading line-by-line using for
>loops, and then appending vectors with values from each line, but this
>strikes me as bad form.
>
>One final note, the lines with 6 values contain important values that
>should somehow remain associated with the data appearing in columns 5 & 6
>(the continuous data).  The first value, which is always 1, can be
>discarded, but the second value on these lines contains the time step
>number ("1.00E+00", "2.00E+00", etc.), and the 3rd and 4th values contain a
>depth and thickness, respectively. Columns 5 & 6 are a depth and water
>content pairing and should be associated with the time steps.
>
>Thanks, Eric
>
>Start of example output data (Use of an R script to read in this data
>below)
>
>  1    1.00E+00  1.24E+03  7.79E+00  1.925E-01  1.88E-01
>                                     3.850E-01  1.88E-01
>                                     5.775E-01  1.88E-01
>                                     7.700E-01  1.88E-01
>                                     9.626E-01  1.88E-01
>                                     1.155E+00  1.88E-01
>                                     1.347E+00  1.88E-01
>  1    2.00E+00  1.26E+03  7.80E+00  1.925E-01  2.80E-01
>                                     1.732E+00  2.80E-01
>                                     1.925E+00  2.80E-01
>                                     2.310E+00  2.93E-01
>                                     2.502E+00  2.22E-01
>                                     2.695E+00  1.88E-01
>                                     2.887E+00  1.88E-01
>  1    3.00E+00  1.28E+03  7.70E+00  1.925E-01  1.03E-01
>                                     3.850E-01  1.30E-01
>                                     5.775E-01  1.48E-01
>                                     7.701E-01  1.61E-01
>                                     9.626E-01  1.72E-01
>                                     1.155E+00  1.86E-01
>                                     1.347E+00  1.93E-01
>  1    4.00E+00  1.29E+03  7.60E+00  1.901E-01  1.80E-01
>                                     3.803E-01  1.80E-01
>                                     5.705E-01  1.38E-01
>                                     7.607E-01  1.32E-01
>                                     2.282E+00  1.86E-01
>                                     2.472E+00  1.98E-01
>                                     2.662E+00  2.00E-01
>
>Same data as above, but scan function fails.
>
>dat <- read.table(textConnection("  1    1.00E+00  1.24E+03  7.79E+00  1.925E-01  1.88E-01
>                                     3.850E-01  1.88E-01
>                                     5.775E-01  1.88E-01
>                                     7.700E-01  1.88E-01
>                                     9.626E-01  1.88E-01
>                                     1.155E+00  1.88E-01
>                                     1.347E+00  1.88E-01
>  1    2.00E+00  1.26E+03  7.80E+00  1.925E-01  2.80E-01
>                                     1.732E+00  2.80E-01
>                                     1.925E+00  2.80E-01
>                                     2.310E+00  2.93E-01
>                                     2.502E+00  2.22E-01
>                                     2.695E+00  1.88E-01
>                                     2.887E+00  1.88E-01
>  1    3.00E+00  1.28E+03  7.70E+00  1.925E-01  1.03E-01
>                                     3.850E-01  1.30E-01
>                                     5.775E-01  1.48E-01
>                                     7.701E-01  1.61E-01
>                                     9.626E-01  1.72E-01
>                                     1.155E+00  1.86E-01
>                                     1.347E+00  1.93E-01
>  1    4.00E+00  1.29E+03  7.60E+00  1.901E-01  1.80E-01
>                                     3.803E-01  1.80E-01
>                                     5.705E-01  1.38E-01
>                                     7.607E-01  1.32E-01
>                                     2.282E+00  1.86E-01
>                                     2.472E+00  1.98E-01
>                                     2.662E+00  2.00E-01"),header=FALSE)