[R] printing a data.frame that contains a list-column of S4 objects

boB Rudis bob at rudis.net
Thu Jan 14 12:26:34 CET 2016


Martin, I'm pretty sure the use of Matrix here (actually by someone
other than Dr Bryan) was to make an easy, inline, reproducible example.
The actual "ugh" column comes from using git2r. I'm assuming there's
an API call returning some pretty gnarly structures that are getting
shoehorned into a data.frame. That happens more often than I'd like in
modern API calls (really complex/nested JSON being returned).

On Thu, Jan 14, 2016 at 3:34 AM, Martin Maechler
<maechler at stat.math.ethz.ch> wrote:
>>>>>> boB Rudis <bob at rudis.net>
>>>>>>     on Tue, 12 Jan 2016 13:51:50 -0500 writes:
>
>     > I wonder if something like:
>     > format.list <- function(x, ...) {
>     > rep(class(x[[1]]), length(x))
>     > }
>
>     > would be sufficient? (prbly needs more 'if's though)
>
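
A minimal sketch of what those extra 'if's might look like. The helper
name label_elements() and the toy S4 class "Blob" are made up for
illustration; neither comes from git2r or Matrix. It deliberately does
not override format() for lists, it just builds one short label per
element:

```r
library(methods)

# Hypothetical helper: one short text label per element of a list-column.
label_elements <- function(x) {
  vapply(x, function(el) {
    if (isS4(el)) {
      # S4 objects: show only the class name
      paste0("<S4: ", class(el)[1L], ">")
    } else if (is.atomic(el) && length(el) == 1L) {
      # length-one atomic values can be shown as-is
      format(el)
    } else {
      # everything else: class and length
      paste0("<", class(el)[1L], "[", length(el), "]>")
    }
  }, character(1), USE.NAMES = FALSE)
}

# Toy S4 class, standing in for whatever an API hands back
setClass("Blob", representation(payload = "numeric"))

col <- list(new("Blob", payload = 1), 1:3, "a")
labels <- label_elements(col)
labels
```

Whether something along these lines belongs in a format method or in a
separate helper is a design question this sketch does not try to settle.
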
> Dear Jenny,
> for a different perspective (and a lot of musings), see inline below
>
>     > On Tue, Jan 12, 2016 at 12:15 PM, Jenny Bryan <jenny at stat.ubc.ca> wrote:
>     >> Is there a general problem with printing a data.frame when it has a
>     >> list-column of S4 objects? Or am I just unlucky in my life choices?
>     >>
>     >> I ran across this with objects from the git2r package but maintainer
>     >> Stefan Widgren points out this example below from Matrix as well. I note
>     >> that the offending object can be printed if sent through
>     >> dplyr::tbl_df(). I accept that that printing doesn't provide much info
>     >> on S4 objects. I'd just like those vars to not prevent data.frame-style
>     >> inspection of the entire object.
>     >>
>     >> I asked this on Stack Overflow, where a commenter provided the lead to the
>     >> workaround below. Is that the best solution?
>     >>
>     >> library(Matrix)
>     >>
>     >> m <- new("dgCMatrix")
>     >> isS4(m)
>     >> #> [1] TRUE
>     >> df <- data.frame(id = 1:2)
>     >> df$matrices <- list(m, m)
>
> This only works by accident (I think), and fails for
>
>   df <- data.frame(id = 1)
>   df$matrices <- list(m, m)
>
>     > df <- data.frame(id = 1)
>     > df$matrices <- list(m, m)
>     Error in `$<-.data.frame`(`*tmp*`, "matrices", value = list(<S4 object of class "dgCMatrix">,  :
>     replacement has 2 rows, data has 1
>     >
>
>
>     >> df
>     >> #> Error in prettyNum(.Internal(format(x, trim, digits, nsmall, width, 3L, : first argument must be atomic
>
> Hmm,
> As 'data.frame' is just an S3 class there is no formal
> definition to go with and in this sense you are of course entitled
> to all expectations. ;-)
> Even though data frames are internally coded as lists, I
> strongly believe data frames should be taught as (and thought of)
>          "generalized matrices"
> in the sense that data frames should be thought of as having n (say)
> rows and p (say) columns.
>
> The help pages  for  data.frame()  and as.data.frame()
> should make it clear that you can *not* put all kinds of entries
> into data frame columns, but I agree the documentation is vague
> and probably has to remain vague,
> because if you provide  as.data.frame()  methods for your class
> you should be able to go quite far.
>
> In addition, the data frame columns need to support certain operations, e.g.,
> subsetting (aka "indexing") and also subassignment ( df[i,j] <- v ).
>
> Now the real "problem" here is that the '$<-' and '[<-'  methods
> for data frames which you call via  df$m <- v  or  df[,co] <- V
> are too "forgiving". They only check that NROW(.) of the new
> entry corresponds to the nrow(<data.frame>).
> Currently they allow very easy construction of illegal data
> frames(*), as in your present case.
>
> --
> *) Yes, it is hard to say when a data.frame is illegal, as there
>    is no formal definition
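
The row-count check described above can be seen with plain lists, no S4
needed. This is only a small illustration using base R; the column name
'payload' is arbitrary:

```r
# Two rows, so a length-2 list passes the NROW() check in `$<-`.
df <- data.frame(id = 1:2)
df$payload <- list(1:3, letters)

dim(df)               # 'payload' is stored as a single list-column
is.list(df$payload)

# One row, same list: the row-count check now rejects the assignment.
df1 <- data.frame(id = 1)
msg <- tryCatch({
  df1$payload <- list(1:3, letters)
  "no error"
}, error = function(e) conditionMessage(e))
msg
```

Nothing in the first assignment looks at what the list elements are,
which is how objects that cannot later be formatted end up in a column.
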
>
> There is more to be said and thought about if you really want
> sparse matrices in a data frame, and as one of the 'Matrix' maintainers,
> I'm quite interested in *why* you'd want that, but I won't go there
> now.
>
> One last issue though: The idea behind allowing a 'matrix' or
> 'array' in a data frame is that each column of the matrix
> becomes a separate column of the data frame:
>
>> data.frame(D = diag(3), M = matrix(1:12, 3,4))
>   D.1 D.2 D.3 M.1 M.2 M.3 M.4
> 1   1   0   0   1   4   7  10
> 2   0   1   0   2   5   8  11
> 3   0   0   1   3   6   9  12
>
> .... and that would be quite inefficient for large sparse matrices.
>
> ---------
>
> Final recommendation as a summary:
>
> If  data.frame(.., .., ..) does not work to put entries into a
> data frame, then don't do it, but rather think about how to make
> data.frame() work with your objects -- namely by ensuring that
> as.data.frame() works .. possibly by providing an
> as.data.frame() method.
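
As a rough sketch of that recommendation: the S4 class "CommitLog" and
its slots below are invented for illustration and are not the git2r
classes. The idea is to give the class an as.data.frame() method that
flattens it into ordinary atomic columns, rather than storing the
objects themselves in a list-column:

```r
library(methods)

# Hypothetical S4 class holding parallel vectors, one entry per record
setClass("CommitLog",
         representation(sha = "character", n_files = "integer"))

# S3 method for the S4 class: flatten the slots into plain columns
as.data.frame.CommitLog <- function(x, row.names = NULL,
                                    optional = FALSE, ...) {
  data.frame(sha = x@sha,
             n_files = x@n_files,
             row.names = row.names,
             stringsAsFactors = FALSE)
}

log <- new("CommitLog",
           sha = c("a1b2c3", "d4e5f6"),
           n_files = c(3L, 1L))

flat <- as.data.frame(log)
flat
```

The result is a regular data frame with atomic columns only, so the
usual printing, subsetting and subassignment apply to it.
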
>
> Best regards,
> Martin Maechler
>


