[R] Read multiple files into dataframe?

jim holtman jholtman at gmail.com
Tue Sep 1 23:32:49 CEST 2009


I would put the data into a 'long' instead of a 'wide' format, since you
say you have files of different lengths.  I took your data, replicated
it 3 times, and changed the file name to vary the duration:

> fileNames <- Sys.glob('/da_zone*')  # files to process
> result <- lapply(fileNames, function(.file){
+     # read in data after skipping 11 lines
+     .input <- read.csv(.file, skip=11)
+     # extract the duration from file name
+     .dur <- sub(".*_([[:digit:]]+)hr_.*", "\\1", .file, perl=TRUE)
+     # add to the data frame
+     .input$dur <- .dur
+     .input
+ })
> # put into a single data.frame
> do.call(rbind, result)
   avgppt areasqmi dur
1    7.67        0  15
2    7.60        1  15
3    7.52        5  15
4    7.32       10  15
5    6.91       20  15
6    5.90       50  15
7    5.02      100  15
8    4.09      200  15
9    3.55      300  15
10   2.96      500  15
11   2.27     1000  15
12   1.64     2000  15
13   0.82     5000  15
14   0.77     5360  15
15   7.67        0   1
16   7.60        1   1
17   7.52        5   1
18   7.32       10   1
19   6.91       20   1
20   5.90       50   1
21   5.02      100   1
22   4.09      200   1
23   3.55      300   1
24   2.96      500   1
25   2.27     1000   1
26   1.64     2000   1
27   0.82     5000   1
28   0.77     5360   1
29   7.67        0   3
30   7.60        1   3
31   7.52        5   3
32   7.32       10   3
33   6.91       20   3
34   5.90       50   3
35   5.02      100   3
36   4.09      200   3
37   3.55      300   3
38   2.96      500   3
39   2.27     1000   3
40   1.64     2000   3
41   0.82     5000   3
42   0.77     5360   3
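
Once the data are in long format, per-duration QC statistics and plots
follow directly, and it does not matter that the durations have different
numbers of records.  A minimal sketch, assuming the combined data frame is
kept in a variable (called allData here, a name that is not in the code
above); the small data frame below is only an illustrative stand-in for the
real result:

```r
# stand-in for do.call(rbind, result), with unequal group sizes on purpose
allData <- data.frame(
    avgppt   = c(7.67, 7.60, 7.52, 7.67, 7.60, 7.67),
    areasqmi = c(0, 1, 5, 0, 1, 0),
    dur      = c("15", "15", "15", "1", "1", "3"),
    stringsAsFactors = FALSE
)
# sub() returns character, so make the duration numeric for sorting
allData$dur <- as.numeric(allData$dur)
# one summary value per duration
qc <- aggregate(avgppt ~ dur, data = allData, FUN = mean)
# list of data frames, one per duration, e.g. to plot each set in a loop
byDur <- split(allData, allData$dur)
nRecords <- sapply(byDur, nrow)
```

Each element of byDur can then be passed to plot() on its own, for example
plot(avgppt ~ areasqmi, data = byDur[["15"]]).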

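If the file names are not reliable, the duration could also be taken from
the sixth header line ("Duration window (hours): 15").  A sketch, assuming
every file has the same 11-line header as your example; readDuration is a
hypothetical helper name, and the temporary file only stands in for a real
.stdsummary file:

```r
# hypothetical helper: read the duration from the 6th header line
readDuration <- function(.file) {
    .hdr <- readLines(.file, n = 6)
    # drop everything up to the last colon and any following blanks
    as.numeric(sub(".*:[[:space:]]*", "", .hdr[6]))
}

# write a small stand-in file with the header layout from the example
.tmp <- tempfile(fileext = ".stdsummary")
writeLines(c(
    "Storm number: 1166",
    "Zone number: 1 (ALL zones)",
    "Number of stations: 172",
    "Total analyzed area (sq mi):     5360.8",
    "Average station density (stns per 1000 sq mi):   na",
    "Duration window (hours): 15",
    "CPP beg hour index: 1",
    "CPP end hour index: 15",
    "Ishohyet interval step (inches): 0.2",
    "Standard area size summary",
    "Begin run date/time: Tue Aug 25 01:17:43 2009",
    "avgppt,  areasqmi",
    "00007.67,0000000.00",
    "00007.60,0000001.00"
), .tmp)

.dur <- readDuration(.tmp)
# same read as above: skip the 11 header lines, then tag rows with duration
.input <- read.csv(.tmp, skip = 11)
.input$dur <- .dur
```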

On Tue, Sep 1, 2009 at 4:24 PM, Douglas M. Hultstrand <dmhultst at metstat.com> wrote:
> Hello,
>
> I am fairly new to R programming and am stuck with the following problem.
>
> I am trying to read in multiple files (see attached file or at end of
> email), the files all have the same general header information and different
> precipitation (avgppt) and area (areasqmi) values.  Sometimes the number of
> records differs between files.
>
> I want to read in all files (.stdsummary), and create a dataframe that
> contains the area and precipitation for each file (files are different
> duration), and supply a header name that represents the duration (sixth line
> down in header information or extracted from data file
> "da_zone1_15hr_1166.stdsummary").
> For example, this is what the final dataframe would look like for 1hr, 3hr,
> and 15hr datafiles:
> 1hrppt    1hrarea    3hrppt    3hrarea    15hrppt    15hrarea
> 3.8    0    6.86    0    7.67    0
> 3.71    1    6.78    1    7.6    1
> 3.69    5    6.72    5    7.52    5
> 3.56    10    6.55    10    7.32    10
> 3.33    20    6.17    20    6.91    20
> 2.87    50    5.25    50    5.9    50
> 2.45    100    4.35    100    5.02    100
> 1.94    200    3.34    200    4.09    200
> 1.67    300    2.78    300    3.55    300
>
> The end result is to perform QC statistics and then plot each set of data.
>  Also, is there a way to create a dataframe that has different # of records?
>
> Example data file below:
>
> Storm number: 1166
> Zone number: 1 (ALL zones)
> Number of stations: 172
> Total analyzed area (sq mi):     5360.8
> Average station density (stns per 1000 sq mi):   na
> Duration window (hours): 15
> CPP beg hour index: 1
> CPP end hour index: 15
> Ishohyet interval step (inches): 0.2
> Standard area size summary
> Begin run date/time: Tue Aug 25 01:17:43 2009
> avgppt,  areasqmi
> 00007.67,0000000.00
> 00007.60,0000001.00
> 00007.52,0000005.00
> 00007.32,0000010.00
> 00006.91,0000020.00
> 00005.90,0000050.00
> 00005.02,0000100.00
> 00004.09,0000200.00
> 00003.55,0000300.00
> 00002.96,0000500.00
> 00002.27,0001000.00
> 00001.64,0002000.00
> 00000.82,0005000.00
> 00000.77,0005360.00
>
> --
> ---------------------------------
> Douglas M. Hultstrand, MS
> Senior Hydrometeorologist
> Metstat, Inc. Windsor, Colorado
> voice: 970.686.1253
> email: dmhultst at metstat.com
> web: http://www.metstat.com
> ---------------------------------
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



