[Rd] Corrupt internal row names when creating a data.frame with `attributes<-`

Davis Vaughan <davis using rstudio.com>
Tue Feb 16 20:50:33 CET 2021


This originally came up in this dplyr issue:
https://github.com/tidyverse/dplyr/issues/5745

There, `tibble::column_to_rownames()` failed because it eventually checks
`.row_names_info(.data) > 0L` to decide whether the data frame already has
non-automatic row names. That check is in line with the documentation that
Kevin pointed out: "type = 1 the latter with a negative sign for ‘automatic’
row names."
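
To make that check concrete, here is a minimal sketch. `has_rownames_sketch()`
is a hypothetical stand-in for the check above, not tibble's actual code. What
the second data frame returns for `type = 1L` is what was reported for the R
version in this thread, so I have left its output unannotated; other versions
may differ.

```r
# Hypothetical helper mirroring the check quoted above; not tibble's actual code.
has_rownames_sketch <- function(x) .row_names_info(x) > 0L

# Compact row names with the usual negative count.
d_neg <- structure(list(x = 1:3), class = "data.frame",
                   row.names = c(NA_integer_, -3L))

# Same data, rebuilt through `attributes<-` as in the original report.
d_att <- list(x = 1:3)
attributes(d_att) <- attributes(d_neg)

.row_names_info(d_neg, type = 1L)   # negative sign marks automatic row names
#> [1] -3
.row_names_info(d_neg, type = 2L)   # row count with the sign dropped
#> [1] 3
has_rownames_sketch(d_neg)
#> [1] FALSE

# In the report, type = 1L came back positive for this object, which makes
# the helper return TRUE even though no row names were ever assigned.
.row_names_info(d_att, type = 1L)
has_rownames_sketch(d_att)

.row_names_info(d_att, type = 2L)   # the row count is unaffected by the sign
#> [1] 3
```

Code that only needs the number of rows can use `type = 2L`, which does not
depend on the sign.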

Davis

On Tue, Feb 16, 2021 at 2:29 PM Bill Dunlap <williamwdunlap using gmail.com>
wrote:

> as.matrix.data.frame does not take the absolute value of that number:
>   > dPos <- structure(list(X=101:103,201:203),class="data.frame",row.names=c(NA_integer_,+3L))
>   > dNeg <- structure(list(X=101:103,201:203),class="data.frame",row.names=c(NA_integer_,-3L))
>   > rownames(as.matrix(dPos))
>   [1] "1" "2" "3"
>   > rownames(as.matrix(dNeg))
>   NULL
>
> -Bill
>
> On Tue, Feb 16, 2021 at 11:06 AM Kevin Ushey <kevinushey using gmail.com> wrote:
> >
> > Strictly speaking, I don't think this is a "corrupt" representation,
> > given that any APIs used to access that internal representation will
> > call abs() on the row count encoded within. At least, as far as I can
> > tell, there aren't any adverse downstream effects from having the row
> > names attribute encoded with this particular internal representation.
> >
> > On the other hand, the documentation in ?.row_names_info states, for
> > the 'type' argument:
> >
> > integer. Currently type = 0 returns the internal "row.names" attribute
> > (possibly NULL), type = 2 the number of rows implied by the attribute,
> > and type = 1 the latter with a negative sign for ‘automatic’ row
> > names.
> >
> > so one could argue that it's incorrect in light of that documentation
> > (the row names are "automatic", but the row count is not marked with a
> > negative sign). Or perhaps this is a different "type" of internal
> > automatic row name, since it was generated from an already-existing
> > integer sequence rather than "automatically" in a call to
> > data.frame().
> >
> > Kevin
> >
> > On Sun, Feb 14, 2021 at 6:51 AM Davis Vaughan <davis using rstudio.com> wrote:
> > >
> > > Hi all,
> > >
> > > I believe that the internal row names object created at this line in
> > > `row_names_gets()` should be using `-n`, not `n`.
> > >
> > > > https://github.com/wch/r-source/blob/b30641d3f58703bbeafee101f983b6b263b7f27d/src/main/attrib.c#L71
> > >
> > > This can currently generate corrupt internal row names when using
> > > `attributes<-` or `structure()`, which calls `attributes<-`.
> > >
> > > # internal row names are typically `c(NA, -n)`
> > > df <- data.frame(x = 1:3)
> > > .row_names_info(df, type = 0L)
> > > #> [1] NA -3
> > >
> > > # using `attributes()` materializes their non-internal form
> > > attrs <- attributes(df)
> > > attrs
> > > #> $names
> > > #> [1] "x"
> > > #>
> > > #> $class
> > > #> [1] "data.frame"
> > > #>
> > > #> $row.names
> > > #> [1] 1 2 3
> > >
> > > # let's make a data frame from scratch with `attributes<-`
> > > data <- list(x = 1:3)
> > > attributes(data) <- attrs
> > >
> > > # oh no!
> > > .row_names_info(data, type = 0L)
> > > #> [1] NA  3
> > >
> > > > # Note: `nrow(df) > 2` is required to demonstrate this bug; otherwise
> > > > # the C-level `row_names_gets()` does not attempt to create internal
> > > > # row names
> > >
> > > Thanks,
> > > Davis
> > >
> > >         [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-devel using r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-devel