[R] idiom for constructing data frame

Henrik Bengtsson henrik.bengtsson at ucsf.edu
Wed Apr 1 02:06:50 CEST 2015


I've got dataFrame() in R.utils for this purpose, e.g.

> df <- dataFrame(colClasses=c(a="integer", b="double", c="character"), nrow=10L)
> str(df)
'data.frame':   10 obs. of  3 variables:
 $ a: int  0 0 0 0 0 0 0 0 0 0
 $ b: num  0 0 0 0 0 0 0 0 0 0
 $ c: chr  "" "" "" "" ...

Related: You can use the colClasses() function to generate the
'colClasses' argument "dynamically", e.g.

> cols <- colClasses("idc")
> names(cols) <- c("a", "b", "c")
> str(cols)
 Named chr [1:3] "integer" "double" "character"
 - attr(*, "names")= chr [1:3] "a" "b" "c"

> cols <- colClasses(sprintf("c2d%di", 4))
> df <- dataFrame(colClasses=cols, nrow=10L)
str(df)
'data.frame':   10 obs. of  7 variables:
 $ : chr  "" "" "" "" ...
 $ : num  0 0 0 0 0 0 0 0 0 0
 $ : num  0 0 0 0 0 0 0 0 0 0
 $ : int  0 0 0 0 0 0 0 0 0 0
 $ : int  0 0 0 0 0 0 0 0 0 0
 $ : int  0 0 0 0 0 0 0 0 0 0
 $ : int  0 0 0 0 0 0 0 0 0 0


dataFrame() is basically implemented as:

dataFrame <- function(colClasses, nrow=1L, ...) {
  df <- vector("list", length=length(colClasses))
  names(df) <- names(colClasses)
  for (kk in seq(along=df)) {
    df[[kk]] <- vector(colClasses[kk], length=nrow)
  }
  attr(df, "row.names") <- seq(length=nrow)
  class(df) <- "data.frame"
  df
} # dataFrame()

/Henrik

On Tue, Mar 31, 2015 at 4:42 PM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
> On Tue, Mar 31, 2015 at 6:35 PM, Richard M. Heiberger <rmh at temple.edu> wrote:
>> I got rid of the extra column.
>>
>> data.frame(r=seq(8), foo=NA, bar=NA, row.names="r")
>
> Brilliant!
>
> After much fussing, including a disturbing detour into nested lapply
> statements from which I barely emerged with my sanity (arguable, I
> suppose), here is a one-liner that creates a data frame of arbitrary
> number of rows given an existing data frame as template for column
> number and name:
>
>
> n <- 8
> df1 <- data.frame(A=runif(9), B=runif(9))
>
> do.call(data.frame, setNames(c(list(seq(n), "r"), as.list(rep(NA,
> ncol(df1)))), c("r", "row.names", colnames(df1))))
>
> It's not elegant, but it is fairly R-ish. I should probably stop
> hunting for an elegant solution now.
>
> Thanks, everyone!
>
> Sarah
>
>
>> Rich
>>
>> On Tue, Mar 31, 2015 at 6:18 PM, Sven E. Templer <sven.templer at gmail.com> wrote:
>>> If you don't mind an extra column, you could use something similar to:
>>>
>>> data.frame(r=seq(8),foo=NA,bar=NA)
>>>
>>> If you do, here is another approach (see function body):
>>>
>>> empty.frame <- function (r = 1, n = 1, fill = NA_real_) {
>>>   data.frame(setNames(lapply(rep(fill, length(n)), rep, times=r), n))
>>> }
>>> empty.frame()
>>> empty.frame(, seq(3))
>>> empty.frame(8, c("foo", "bar"))
>>>
>>> I could not put it in one line either, without retyping at least one
>>> argument (n in this case).
>>> So I suggest a function is the way to go for a simplified syntax ...
>>>
>>> Thanks to all for the ideas!
>>> Sven
>>>
>>> On 31 March 2015 at 20:55, William Dunlap <wdunlap at tibco.com> wrote:
>>>
>>>> You can use structure() to attach the names to a list that is input to
>>>> data.frame.
>>>> E.g.,
>>>>
>>>> dfNames <- c("First", "Second Name")
>>>> data.frame(lapply(structure(dfNames, names=dfNames),
>>>> function(name)rep(NA_real_, 5)))
>>>>
>>>>
>>>> Bill Dunlap
>>>> TIBCO Software
>>>> wdunlap tibco.com
>>>>
>>>> On Tue, Mar 31, 2015 at 11:37 AM, Sarah Goslee <sarah.goslee at gmail.com>
>>>> wrote:
>>>>
>>>> > Hi,
>>>> >
>>>> > Duncan Murdoch suggested:
>>>> >
>>>> > > The matrix() function has a dimnames argument, so you could do this:
>>>> > >
>>>> > > names <- c("strat", "id", "pid")
>>>> > > data.frame(matrix(NA, nrow=10, ncol=3, dimnames=list(NULL, names)))
>>>> >
>>>> > That's a definite improvement, thanks. But no way to skip matrix()? It
>>>> > just seems unRlike, although since it's only full of NA values there
>>>> > are no coercion issues with column types or anything, so it doesn't
>>>> > hurt. It's just inelegant. :)
>>>> >
>>>> > Sarah
>
> --
> Sarah Goslee
> http://www.functionaldiversity.org
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list