[R] apply --> data.frame

William Dunlap wdunlap at tibco.com
Fri Aug 31 20:38:52 CEST 2012


It is hard to help when you don't give an example of your input data
and what you want to be computed (in a form one can source or copy
into an R session).  Is the following something like what you are doing?
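
As a sketch of what such a copy-and-paste form can look like: dput() prints R code that recreates an object.  The data frame d below is made up for illustration and is not the poster's data.

```r
# Hypothetical sample data standing in for the real input
d <- data.frame(file = c("a.score", "b.score"), lift = c(0.59, 0.621),
                stringsAsFactors = FALSE)

# dput() writes an R expression that rebuilds d when pasted into a session
dput(d)

# Round trip: capture the printed expression and evaluate it again
txt <- paste(capture.output(dput(d)), collapse = "\n")
d2 <- eval(parse(text = txt))
all.equal(d, d2)
```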

Suppose you have a function that takes a file name and
returns a list of things of various types extracted from the
file.  A toy example would be
    fileExtract <- function(fileName) {
        fi <- file.info(fileName)
        # Read the first byte only from non-empty regular files
        byte0 <- if (fi$isdir || fi$size < 1) NA_integer_
                 else readBin(fileName, what="integer", size=1, n=1)
        list(Name=basename(fileName), IsDir=fi$isdir, Size=fi$size,
             FirstByte=byte0, ModTime=fi$mtime)
    }
Then you can get the list of rows that you want converted to a data.frame
with
   rows <- lapply(dir(R.home(), full.names=TRUE), fileExtract)
E.g., I get
  > dput(rows[1:2])
  list(structure(list(Name = "bin", IsDir = TRUE, Size = 0, FirstByte = NA_integer_, 
      ModTime = structure(1343316337, class = c("POSIXct", "POSIXt"
      ))), .Names = c("Name", "IsDir", "Size", "FirstByte", "ModTime"
  )), structure(list(Name = "CHANGES", IsDir = FALSE, Size = 28204, 
      FirstByte = 87L, ModTime = structure(1340406834, class = c("POSIXct", 
      "POSIXt"))), .Names = c("Name", "IsDir", "Size", "FirstByte", 
  "ModTime")))
Note that the j'th element of each row has a fixed type.
You want a data.frame with columns named "Name", "IsDir",
"Size", "FirstByte", and "ModTime", where the i'th row contains the
data in rows[[i]].

If that is what you want, then here is a function (called f in the examples below)
that does a pretty good job of it:
f <- function (listOfRows, nItemsPerRow = unique(vapply(listOfRows, 
    length, 0)), col.names = names(rowTemplate), rowTemplate = listOfRows[[1]], 
    ...) 
{
    stopifnot(length(nItemsPerRow) == 1, nItemsPerRow == length(rowTemplate))
    if (is.null(col.names)) {
        col.names <- sprintf("V%d", seq_len(nItemsPerRow))
    }
    else {
        stopifnot(nItemsPerRow == length(col.names))
    }
    columns <- lapply(structure(seq_len(nItemsPerRow), names = col.names), 
        FUN = function(i) {
            v <- vapply(listOfRows, function(Row) Row[[i]], rowTemplate[[i]])
            if (is.matrix(v)) { # for when length(rowTemplate[[i]])>1 
                v <- t(v)
            }
            v
        })
    data.frame(columns, ...)
}
E.g.,
> str(f(rows))
'data.frame':   19 obs. of  5 variables:
 $ Name     : Factor w/ 19 levels "bin","CHANGES",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ IsDir    : logi  TRUE FALSE FALSE TRUE TRUE TRUE ...
 $ Size     : num  0 28204 18351 0 0 ...
 $ FirstByte: int  NA 87 9 NA NA NA NA 101 NA 82 ...
 $ ModTime  : num  1.34e+09 1.34e+09 1.34e+09 1.34e+09 1.34e+09 ...
Note that the POSIXct item, ModTime, got converted to numeric because
vapply returns a plain vector and drops the class attribute of its template.
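
One possible workaround, sketched here under assumptions and not part of f above, is to put the class back after vapply has run.  The rows in exampleRows are made up, and the UTC time zone is an assumption of this example.

```r
# Made-up rows in the same shape as the rows list above
exampleRows <- list(
    list(Name = "bin",
         ModTime = as.POSIXct("2012-07-26 15:25:37", tz = "UTC")),
    list(Name = "CHANGES",
         ModTime = as.POSIXct("2012-06-22 23:13:54", tz = "UTC")))
template <- exampleRows[[1]]$ModTime

# vapply() checks type and length against the template,
# but the result is a bare double vector without the class
secs <- vapply(exampleRows, function(Row) Row[["ModTime"]], template)
class(secs)

# Reattach the class; origin is the epoch that POSIXct counts from
modTime <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
class(modTime)
```

The same step could be applied inside f to any column whose template inherits from "POSIXct", but that is a design choice I have not tested beyond this toy case.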

An advantage of vapply is that it will do some type checking:
> f(list(list(a=1,b=11), list(a=2,b="Twelve")))
Error in vapply(listOfRows, function(Row) Row[[i]], rowTemplate[[i]]) : 
  values must be type 'double',
 but FUN(X[[2]]) result is type 'character'
It will also deal with things like the following, where each row
contains a few vectors and you want each vector element in its
own column:
  > str(f(list(list(1:2, 1+1i, letters[1:3]), list(11:12, 11+11i, letters[4:6]))))
  'data.frame':   2 obs. of  6 variables:
   $ V1.1: int  1 11
   $ V1.2: int  2 12
   $ V2  : cplx  1+1i 11+11i
   $ V3.1: Factor w/ 2 levels "a","d": 1 2
   $ V3.2: Factor w/ 2 levels "b","e": 1 2
   $ V3.3: Factor w/ 2 levels "c","f": 1 2
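
The is.matrix() branch in the function is what makes this work.  A small sketch, with made-up input, of why the transpose is needed: when the template has length greater than 1, vapply returns a matrix with one column per input row.

```r
# Three made-up rows, each holding one integer vector of length 2
threeRows <- list(list(1:2), list(11:12), list(21:22))

m <- vapply(threeRows, function(Row) Row[[1]], integer(2))
dim(m)      # 2 by 3: vector elements down, input rows across

# After t(), each input row is one matrix row, which is the
# orientation data.frame() needs to make one column per element
dim(t(m))   # 3 by 2
```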

There are other ways to do this, but I don't know if this is the problem
you want to solve.
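
One of those other ways, given only as a sketch: turn each row into a one-row data.frame and rbind the pieces.  The exampleRows values are copied from the dput output above, with UTC assumed as the time zone.

```r
# Two rows shaped like the output of fileExtract()
exampleRows <- list(
    list(Name = "bin", IsDir = TRUE, Size = 0, FirstByte = NA_integer_,
         ModTime = as.POSIXct(1343316337, origin = "1970-01-01", tz = "UTC")),
    list(Name = "CHANGES", IsDir = FALSE, Size = 28204, FirstByte = 87L,
         ModTime = as.POSIXct(1340406834, origin = "1970-01-01", tz = "UTC")))

# Each list becomes a data.frame with a single row ...
oneRowFrames <- lapply(exampleRows, function(Row) {
    data.frame(Row, stringsAsFactors = FALSE)
})

# ... and rbind() stacks them, keeping ordinary vector columns
d <- do.call(rbind, oneRowFrames)
str(d)
```

This keeps the POSIXct class on ModTime, but it does no type checking against a template, and building many one-row data.frames can be slow for long lists.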

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Sam Steingold
> Sent: Friday, August 31, 2012 9:11 AM
> To: r-help at r-project.org; David Winsemius
> Subject: Re: [R] apply --> data.frame
> 
> > * David Winsemius <qjvafrzvhf at pbzpnfg.arg> [2012-08-30 10:14:34 -0700]:
> >
> >> str( as.data.frame( do.call(rbind, strsplit(c("a,1","b,2","c,3"),
> > ",") ) , stringsAsFactors=FALSE) )
> > 'data.frame':	3 obs. of  2 variables:
> >  $ V1: chr  "a" "b" "c"
> >  $ V2: chr  "1" "2" "3"
> 
> do.call/rbind appeared to be TRT. I tried it and got a data frame with
> list columns (instead of vectors);
> 
> as.data.frame(do.call(rbind,lapply(list.files(...), function (name) {
>     ....
>     c(name,list(num1,num2,num3), # num* come from some calculations above
>       strsplit(sub("[^-]*(train|test)[^-]*(-(S)?pca([0-9]*))?-s([0-9]*)c([0-9.]*)\\.score",
>                    "\\1,\\3,\\4,\\5,\\6",name),",")[[1]])
>   })), stringsAsFactors = FALSE)
> 
> 'data.frame':	2 obs. of  8 variables:
>  $ file        :List of 2
>   ..$ : chr "zzz_test_0531_0630-Spca181-s0c10.score"
>   ..$ : chr "zzz_train_0531_0630-Spca181-s0c10.score"
>  $ lift.quality:List of 2
>   ..$ : num 0.59
>   ..$ : num 0.621
>  $ proficiency :List of 2
>   ..$ : num 0.0472
>   ..$ : num 0.0472
>  $ set         :List of 2
>   ..$ : chr "test"
>   ..$ : chr "train"
>  $ scale       :List of 2
>   ..$ : chr "S"
>   ..$ : chr "S"
>  $ pca         :List of 2
>   ..$ : chr "181"
>   ..$ : chr "181"
>  $ s           :List of 2
>   ..$ : chr "0"
>   ..$ : chr "0"
>  $ c           :List of 2
>   ..$ : chr "10"
>   ..$ : chr "10"
> 
> I guess the easiest way is to replace c(...list()...) with c(...) but
> that would mean converting num1,num2,num3 to string and back which I
> want to avoid for aesthetic reasons. Any better suggestions?
> 
> thanks a lot!
> 
> --
> Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
> http://www.childpsy.net/ http://jihadwatch.org http://thereligionofpeace.com
> http://palestinefacts.org http://ffii.org http://pmw.org.il
> I don't have an attitude problem. You have a perception problem.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



