[Rd] Corrupt internal row names when creating a data.frame with `attributes<-`

Bill Dunlap
Tue Feb 16 20:29:48 CET 2021


as.matrix.data.frame does not take the absolute value of the row count
stored in the internal row names, so the sign changes its result:
  > dPos <- structure(list(X = 101:103, 201:203), class = "data.frame", row.names = c(NA_integer_, +3L))
  > dNeg <- structure(list(X = 101:103, 201:203), class = "data.frame", row.names = c(NA_integer_, -3L))
  > rownames(as.matrix(dPos))
  [1] "1" "2" "3"
  > rownames(as.matrix(dNeg))
  NULL
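
For comparison, here is a minimal sketch (an editorial addition, not part of the original message) that tabulates what `.row_names_info()` and `nrow()` report for the same two objects. It relies only on the `type` values described in ?.row_names_info; the name `info` is introduced here purely for illustration.

```r
# Sketch only: the same two data frames as above, differing solely in the
# sign of the compact row count stored in the "row.names" attribute.
dPos <- structure(list(X = 101:103, 201:203), class = "data.frame",
                  row.names = c(NA_integer_, +3L))
dNeg <- structure(list(X = 101:103, 201:203), class = "data.frame",
                  row.names = c(NA_integer_, -3L))

# type = 1L: row count, negative for 'automatic' row names
# type = 2L: number of rows implied by the attribute
# nrow():    what most user-level code sees
info <- sapply(list(dPos = dPos, dNeg = dNeg), function(d)
    c(type1 = .row_names_info(d, type = 1L),
      type2 = .row_names_info(d, type = 2L),
      nrow  = nrow(d)))
print(info)
```

If only the `type1` row differs between the two columns, that would be consistent with the point below that most accessors apply abs(), with as.matrix.data.frame being an exception.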

-Bill

On Tue, Feb 16, 2021 at 11:06 AM Kevin Ushey <kevinushey using gmail.com> wrote:
>
> Strictly speaking, I don't think this is a "corrupt" representation,
> given that any APIs used to access that internal representation will
> call abs() on the row count encoded within. At least, as far as I can
> tell, there aren't any adverse downstream effects from having the row
> names attribute encoded with this particular internal representation.
>
> On the other hand, the documentation in ?.row_names_info states, for
> the 'type' argument:
>
> integer. Currently type = 0 returns the internal "row.names" attribute
> (possibly NULL), type = 2 the number of rows implied by the attribute,
> and type = 1 the latter with a negative sign for ‘automatic’ row
> names.
>
> so one could argue that it's incorrect in light of that documentation
> (the row names are "automatic", but the row count is not marked with a
> negative sign). Or perhaps this is a different "type" of internal
> automatic row name, since it was generated from an already-existing
> integer sequence rather than "automatically" in a call to
> data.frame().
>
> Kevin
>
> On Sun, Feb 14, 2021 at 6:51 AM Davis Vaughan <davis using rstudio.com> wrote:
> >
> > Hi all,
> >
> > I believe that the internal row names object created at this line in
> > `row_names_gets()` should be using `-n`, not `n`.
> > https://github.com/wch/r-source/blob/b30641d3f58703bbeafee101f983b6b263b7f27d/src/main/attrib.c#L71
> >
> > This can currently generate corrupt internal row names when using
> > `attributes<-` or `structure()`, which calls `attributes<-`.
> >
> > # internal row names are typically `c(NA, -n)`
> > df <- data.frame(x = 1:3)
> > .row_names_info(df, type = 0L)
> > #> [1] NA -3
> >
> > # using `attributes()` materializes their non-internal form
> > attrs <- attributes(df)
> > attrs
> > #> $names
> > #> [1] "x"
> > #>
> > #> $class
> > #> [1] "data.frame"
> > #>
> > #> $row.names
> > #> [1] 1 2 3
> >
> > # let's make a data frame from scratch with `attributes<-`
> > data <- list(x = 1:3)
> > attributes(data) <- attrs
> >
> > # oh no!
> > .row_names_info(data, type = 0L)
> > #> [1] NA  3
> >
> > # Note: this requires `nrow(df) > 2`; otherwise the C-level
> > # `row_names_gets()` does not attempt to create internal row names
> >
> > Thanks,
> > Davis
> >
> > ______________________________________________
> > R-devel using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel