[R] how to read a group of files into one dataset?

Dennis Murphy djmuser at gmail.com
Thu Aug 25 15:05:16 CEST 2011


Hi:

In a similar vein to the other respondents, you could try something like this:

On Thu, Aug 25, 2011 at 1:17 AM, Jie TANG <totangjie at gmail.com> wrote:
> for example : I have files with the name
>  "ma01.dat","ma02.dat","ma03.dat","ma04.dat",I want to read the data in
> these files into one data.frame
>
# Your file names (assuming they are in your startup directory -
# see list.files() for a more general approach, as mentioned previously)
> flnm <- paste("obs",101:114,"_err.dat",sep="")

The following assumes that each file named in flnm contains the same
set of variables and the same number of columns.

# Method 1:  base R code

  newdata <- lapply(flnm, read.table, skip = 2)
  bigdf <- do.call(rbind, newdata)
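
To try Method 1 without your real data, here is a minimal self-contained
sketch. The temporary directory, the file names and the two junk header
lines are all made up for illustration (they only mimic the skip = 2 in
your call); nothing here is taken from your actual files.

```r
# Hypothetical demo files in a temporary directory: each has two header
# lines to skip, followed by three rows of two whitespace-separated columns.
demo_dir <- file.path(tempdir(), "method1_demo")
dir.create(demo_dir, showWarnings = FALSE)
demo_files <- file.path(demo_dir, sprintf("ma%02d.dat", 1:4))
for (i in seq_along(demo_files)) {
  writeLines(c("header line 1", "header line 2",
               paste(1:3, i * 10 + 1:3)),
             demo_files[i])
}

# Read every file into a list of data frames, then stack them row-wise.
pieces  <- lapply(demo_files, read.table, skip = 2)
stacked <- do.call(rbind, pieces)

dim(stacked)   # 4 files x 3 rows each, 2 columns
```

Since the pieces are stacked in the order of demo_files, the rows of
stacked follow the file order.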

# Method 2: Use the plyr package

library('plyr')
bdf <- ldply(mlply(flnm, read.table, skip = 2), rbind)

bigdf and bdf should have the same number of rows, but bdf will have
one more column than bigdf: its first column (X1) identifies the file
each row came from, as a numeric index rather than the file name.

The inner call, mlply(), is analogous to the lapply() call in
Method 1, and the outer call, ldply(), has an effect similar to
do.call(rbind, ...).
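
If you want a source indicator without plyr, one possible base R sketch
is to tag each piece before stacking. The demo files and the column name
'source' are my own inventions for illustration; this version stores the
file name instead of a numeric index.

```r
# Hypothetical demo files: three small CSVs with columns id and count.
demo_dir <- file.path(tempdir(), "indicator_demo")
dir.create(demo_dir, showWarnings = FALSE)
demo_files <- file.path(demo_dir, sprintf("file_%02d.csv", 1:3))
for (i in seq_along(demo_files)) {
  write.csv(data.frame(id = 1:2, count = c(10, 20) + i),
            demo_files[i], row.names = FALSE)
}

pieces <- lapply(demo_files, read.csv, header = TRUE)

# Pair each data frame with its file name and prepend a 'source' column.
tagged <- Map(function(df, f) {
  data.frame(source = basename(f), df, stringsAsFactors = FALSE)
}, pieces, demo_files)

combined <- do.call(rbind, tagged)
table(combined$source)   # number of rows contributed by each file
```

Keeping the file name rather than an index makes it easier to trace a
row back to its file if the file list is later reordered.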

Here's an example. I have ten files named file_01.csv - file_10.csv in
my startup directory; each has 20 rows and 2 columns, with the same
column names in each.

> files <- list.files(pattern = '^file')
> files
 [1] "file_01.csv" "file_02.csv" "file_03.csv" "file_04.csv" "file_05.csv"
 [6] "file_06.csv" "file_07.csv" "file_08.csv" "file_09.csv" "file_10.csv"

# Method 1:
> filelist <- lapply(files, read.csv, header = TRUE)
> bigdf <- do.call(rbind, filelist)
> dim(bigdf)
[1] 200   2
# Show this is right by returning the numbers of rows and cols
# in each list component of filelist
> sapply(filelist, nrow)
 [1] 20 20 20 20 20 20 20 20 20 20
> sapply(filelist, ncol)
 [1] 2 2 2 2 2 2 2 2 2 2

# Method 2:
> library('plyr')
> bdf <- ldply(mlply(files, read.csv, header = TRUE), rbind)
> dim(bdf)
[1] 200   3
> head(bdf, 3)
  X1 id count
1  1  1    47
2  1  2    36
3  1  3    53
> head(bigdf, 3)
  id count
1  1    47
2  2    36
3  3    53
> table(bdf$X1)

 1  2  3  4  5  6  7  8  9 10
20 20 20 20 20 20 20 20 20 20
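
As for the for loop in your original message: each pass assigns the
result of read.table() to the same object, so earlier files are
overwritten and only one file's data is left at the end. A sketch of
one way to repair the loop, again with made-up demo files, is to store
each piece in a list and bind the list afterwards:

```r
# Hypothetical demo files: two lines to skip, then two rows of data each.
demo_dir <- file.path(tempdir(), "loop_demo")
dir.create(demo_dir, showWarnings = FALSE)
demo_files <- file.path(demo_dir, sprintf("obs%d_err.dat", 101:103))
for (f in demo_files) {
  writeLines(c("skip me", "skip me too", "1 2", "3 4"), f)
}

# Overwriting the same object: only the last file read survives.
for (i in seq_along(demo_files)) {
  last_only <- read.table(demo_files[i], skip = 2)
}
nrow(last_only)

# Storing each piece in a preallocated list keeps all of them.
pieces <- vector("list", length(demo_files))
for (i in seq_along(demo_files)) {
  pieces[[i]] <- read.table(demo_files[i], skip = 2)
}
all_rows <- do.call(rbind, pieces)
nrow(all_rows)
```

This does the same job as the lapply() call in Method 1; the explicit
loop is just more typing.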

HTH,
Dennis

> newdata<-read.table(flnm,skip=2)
> data<-(flnm,skip=2)
> but the data only contains data from the flnm[1]
> I  also tried as below :
> for (i in 1:9) {
> data<-read.table(flnm[i],skip=2)
> }
>
>
> but i failed how could I modified my script?
>
> is there any advices?