[R] Combining 4th column from 30 files

Henrik Bengtsson hb at stat.berkeley.edu
Wed Jul 23 20:25:03 CEST 2008


A few things that will help you, if not now then in the future:

1) Preallocate the result object.  This allows you to avoid
cbind()/rbind(), which create a new, larger copy of the result in each
iteration.  That will eventually bite you if you have a lot of data.
In your case you know the number of files but maybe not the number of
rows; the latter can be inferred in the first iteration.
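
To make the difference concrete, here is a minimal sketch that fills
the same matrix both ways.  The sizes and the dummy column values are
made up for illustration only; they do not come from the original
question.

```r
# Hypothetical sizes, chosen only for illustration.
nbrOfRows <- 1000
nbrOfFiles <- 30

# Growing: cbind() allocates a new, larger matrix in every iteration.
grown <- NULL
for (kk in seq_len(nbrOfFiles)) {
  grown <- cbind(grown, rep(as.double(kk), nbrOfRows))
}

# Preallocating: one allocation up front, then fill column by column.
res <- matrix(NA_real_, nrow = nbrOfRows, ncol = nbrOfFiles)
for (kk in seq_len(nbrOfFiles)) {
  res[, kk] <- rep(as.double(kk), nbrOfRows)
}
```

Both loops end with the same values; only the second avoids the
repeated copying.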

2) Read only the columns you need.  This will save memory and speed up
the reading, especially for large data files. In read.table() you can
specify 'colClasses' and set it to "NULL" for unwanted columns.  If
you know the number of columns in each file, say it is 23, then do:
colClasses <- rep("NULL", 23); colClasses[4] <- "double" (if it is
doubles you are reading).
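
A small self-contained sketch of this, assuming a made-up temporary
file with 6 columns instead of 23.  It uses "numeric", the class name
that the read.table() help page lists for doubles.

```r
# Write a small hypothetical file with 3 rows and 6 columns.
pathname <- tempfile(fileext = ".txt")
data <- matrix(as.double(1:18), nrow = 3, ncol = 6)
write.table(data, file = pathname, row.names = FALSE, col.names = FALSE)

# Skip every column except the fourth.
nbrOfColumns <- 6
colClasses <- rep("NULL", nbrOfColumns)
colClasses[4] <- "numeric"

# Only one column is kept, so it is column 1 of the returned data frame.
df <- read.table(pathname, colClasses = colClasses)
values <- df[, 1]

unlink(pathname)
```

Note that the kept column is at index 1 of the result, not index 4,
because the skipped columns are never read in.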

This is how I would do it.  It works for small and rather large data sets.

pathnames <- dir(pattern="data");
nbrOfFiles <- length(pathnames);
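# 'colClasses' needs one entry per column in a file, not one per file.
nbrOfColumns <- 23;  # as in the example above; adjust to your files
colClasses <- rep("NULL", nbrOfColumns); colClasses[4] <- "double";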
res <- NULL;
for (kk in seq(length=nbrOfFiles)) {
  pathname <- pathnames[kk];
  values <- read.table(pathname, colClasses=colClasses)[,1];
  if (is.null(res)) {
     # Allocate a matrix of the same data type as the data read.
     res <- matrix(values[1], nrow=length(values), ncol=nbrOfFiles);
  }
  res[,kk] <- values;
  rm(values);
}
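
If the number of columns is not known in advance, one option is to
count the fields on the first line of each file.  The sketch below
wraps the loop in a function and tries it on three made-up temporary
files.  The function name combineColumn() and the test files are
hypothetical, and it assumes whitespace-separated files without a
header line.

```r
combineColumn <- function(pathnames, column = 4) {
  res <- NULL
  for (kk in seq_along(pathnames)) {
    pathname <- pathnames[kk]
    # Infer the column count from the first line only.
    firstLine <- scan(pathname, what = "", nlines = 1, quiet = TRUE)
    colClasses <- rep("NULL", length(firstLine))
    colClasses[column] <- "numeric"
    values <- read.table(pathname, colClasses = colClasses)[, 1]
    if (is.null(res)) {
      # Allocate once, using the row count seen in the first file.
      res <- matrix(values[1], nrow = length(values), ncol = length(pathnames))
    }
    res[, kk] <- values
  }
  res
}

# Three small temporary files with 4 rows and 5 columns each.
tmpdir <- tempfile("data")
dir.create(tmpdir)
for (ii in 1:3) {
  write.table(matrix(as.double(ii * (1:20)), nrow = 4),
              file = file.path(tmpdir, sprintf("data%d.txt", ii)),
              row.names = FALSE, col.names = FALSE)
}
pathnames <- dir(tmpdir, pattern = "data", full.names = TRUE)
res <- combineColumn(pathnames)
unlink(tmpdir, recursive = TRUE)
```

This still assumes that all files have the same number of rows, as
the loop above does.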

My $.02

/Henrik


On Wed, Jul 23, 2008 at 4:24 AM, Henrique Dallazuanna <wwwhsd at gmail.com> wrote:
> Maybe:
>
> sapply(lapply(dir(pattern="data"), read.table), '[[', 4)
>
> On Wed, Jul 23, 2008 at 5:21 AM, Daren Tan <daren76 at hotmail.com> wrote:
>>
>> Better approach than this brute force ?
>>
>> mm <- NULL
>> for (i in dir(pattern="data")) { m <- readTable(i); mm <- cbind(mm, m[,4]) }
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O
>


