[Rd] Change 77844 breaking pkgs [Re: dimnames incoherence?]

William Dunlap wdunlap at tibco.com
Sat Feb 22 23:18:20 CET 2020


> but then, it seems people want to perpetuate the
> claim of R to be slow

More charitably, I think the reasoning may have been that, since x[[i]]
gives you one element of x, they should use x[[i]] <- value, for scalar i,
to stick in one element.
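A minimal sketch of that reasoning, using illustrative vectors `x1` and `x2` (not from the thread): for an existing atomic vector and a scalar index, `[[<-` and `[<-` store the same element, so the two forms look interchangeable. The difference discussed in this thread only appears when the target starts out as NULL.

```r
x1 <- c("a", "b", "c")
x2 <- x1
x1[[2]] <- "B"  # "stick in one element" with [[<-
x2[2] <- "B"    # the same with [<-
identical(x1, x2)  # TRUE for an existing atomic vector
```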

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Sat, Feb 22, 2020 at 12:44 PM Martin Maechler <maechler at stat.math.ethz.ch> wrote:

> >>>>> Martin Maechler
> >>>>>     on Sat, 22 Feb 2020 20:20:49 +0100 writes:
>
> >>>>> William Dunlap
> >>>>>     on Fri, 21 Feb 2020 14:05:49 -0800 writes:
>
>     >> If we change the behavior  NULL--[[--assignment from
>
>     >> `[[<-`(NULL, 1, "a" ) # gives  "a"  (*not* a list)
>
>     >> to
>
>     >> `[[<-`(NULL, 1, "a" ) # gives  list("a")
>
>     >> then we have more consistency there *and* your bug is fixed too.
>     >> Of course, in other situations back-compatibility would be
>     >> broken as well.
>
>     >> Would that change the result of
>     >> L <- list(One=1) ; L$Two[[1]] <- 2
>     >> from the current list(One=1,Two=2) to list(One=1, Two=list(2))
>
>     >> and the result of
>     >> F <- 1L ; levels(F)[[1]] <- "one"
>     >> from structure(1L, levels="one") to structure(1L, levels=list("one"))?
>
>     > Yes (twice).
>
>     > This is indeed what happens in current R-devel, as I had
>     > committed the proposition above yesterday.
>     > So R-devel (svn rev >= 77844) does this:
>
>     >> L <- list(One=1) ; L$Two[[1]] <- 2 ; dput(L)
>     > list(One = 1, Two = list(2))
>     >> F <- 1L ; levels(F)[[1]] <- "one" ; dput(F)
>     > structure(1L, .Label = list("one"))
>     >>
>
>     > but I find that still considerably more logical than current
>     > (pre R-devel) R's
>
>     >> L <- list(One=1) ; L$Two[[1]] <- 2 ; dput(L)
>     > list(One = 1, Two = 2)
>     >> L <- list(One=1) ; L$Two[[1]] <- 2:3 ; dput(L)
>     > list(One = 1, Two = list(2:3))
>     >>
>     >> F <- 1L ; levels(F)[[1]] <- "one" ; dput(F)
>     > structure(1L, .Label = "one")
>     >> F <- 1L ; levels(F)[[1]] <- c("one", "TWO") ; dput(F)
>     > structure(1L, .Label = list(c("one", "TWO")))
>     >>
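A small probe, assuming only base R, that shows which of the two behaviours above the running R uses. What it prints depends on the R version it runs under, so no particular result is claimed here; the variable names are illustrative.

```r
# Assign into NULL with [[<- and inspect the type of the result.
# Per the examples above: pre-r77844 R gives a character vector for a
# length-1 value; R-devel >= r77844 gives list("one").
one <- `[[<-`(NULL, 1, "one")
# A length-2 value gives a list in both versions.
two <- `[[<-`(NULL, 1, c("one", "TWO"))
str(one)
str(two)
cat("length-1 value stored as a list:", is.list(one), "\n")
```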
>
>
>     >> This change would make L$Name[[1]] <- value act like L$Name$one <- value
>     >> in cases when L did not have a component named "Name" and value
>     >> had length 1.
>
>     > (I don't entirely get what you mean, but)
>     > indeed, the [[<- assignments will be closer to corresponding $<-
>     > assignments... which I thought would be another good thing about the change.
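Martin's point that `[[<-` moves closer to `$<-` can be checked with base R alone: `$<-` applied to a missing (NULL) component already creates a list, whatever the value's length. A minimal sketch reusing the `L`, `Name` and `one` names from Bill's example:

```r
L <- list(One = 1)
# L$Name does not exist (is NULL), and $<- on NULL creates a list,
# so L$Name becomes list(one = "a") even though the value has length 1.
L$Name$one <- "a"
str(L)
is.list(L$Name)  # TRUE
```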
>
>     >> I have seen users use [[<- where [<- is more appropriate in cases like
>     >> this.  Should there be a way to generate warnings about the change in
>     >> behavior as you've done with other syntax changes?
>
>     > Well, good question.
>     > I'd guess one would get such warnings "all over the place", and
>     > if a warning is given only once per session it may not be
>     > effective ... also the warning would be confusing to the 99.9% of R users
>     > who don't even get what we are talking about here ;-)
>
>     > Thank you for your comments. I did not get too many.
>
> Well, there's one situation where semi-experienced package
> authors are bitten by the new R-devel behavior...
>
> I'm seeing a few dozen CRAN packages breaking in R-devel >= r77844.
>
> One case is exactly as you (Bill) mention above: people using
> dd[[.]] <- ..   where they should use single [.].
>
> In one package, I see an inefficient for loop over all rows of a
> data frame 'dd':
>
> for(i in 1:nrow(dd)) {
>
>  ...
>
>  dd$<nonexisting_column>[[i]] <-  <one character string>
>
> }
>
> This used to work -- quite inefficiently, as said:
> for i=1 it created the **full** data frame column and then,
> once the column existed, it presumably assigned one entry
> after the other...
>
> Now this code breaks (later!) in the package, because the
> new column ends up as a *list* of strings instead of a vector
> of strings.
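A minimal, self-contained reproduction of the pattern, assuming a three-row data frame and a hypothetical new column `label` (the original package code is not shown in the thread). `seq_len(nrow(dd))` is used instead of `1:nrow(dd)` so the loop also behaves for zero-row data frames. The `[i]` loop yields a character column in any R version; the class of the `[[i]]` column is what changed.

```r
dd <- data.frame(id = 1:3)

# The pattern from the package: grow a new column element by element with [[i]]
d_dbl <- dd
for (i in seq_len(nrow(d_dbl))) d_dbl$label[[i]] <- paste0("row", i)

# The same loop with single-bracket [i]
d_sgl <- dd
for (i in seq_len(nrow(d_sgl))) d_sgl$label[i] <- paste0("row", i)

class(d_dbl$label)  # "character" before r77844, "list" with the R-devel change
class(d_sgl$label)  # "character" either way
```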
>
> I think there are quite a few such cases also in other CRAN
> packages which now break with the latest R-devel.
>
> Coming back to Bill Dunlap's question: should we not warn here?
> In this case, where our top-level list is a data frame, maybe we
> should indeed warn, if we can easily limit ourselves to such "bizarre"
> ways of growing a data frame ...
>
>
>   dd $ foo [[i]] <- vv
>
> <==>
>
>   `*tmp*` <- dd
>   dd <- `$<-`(`*tmp*`, foo, value = `[[<-`(`*tmp*`$foo, i, vv))
>   rm(`*tmp*`)
>
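The expansion can be run directly to check that it matches the sugar form. This sketch assumes a small illustrative data frame; `$<-` takes the component name (`foo`) as its second argument. Both routes call the same `[[<-`, so they agree whether that `[[<-` returns an atomic vector or a list.

```r
dd <- data.frame(x = 1:3)
i <- 1
vv <- "a"

# Sugar form
d1 <- dd
d1$foo[[i]] <- vv

# Hand expansion of the same assignment
`*tmp*` <- dd
d2 <- `$<-`(`*tmp*`, foo, value = `[[<-`(`*tmp*`$foo, i, vv))
rm(`*tmp*`)

identical(d1, d2)  # TRUE: both routes go through the same `[[<-`(NULL, i, vv)
```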
> but then really we have the same problem as previously: The
>  `[[<-`(NULL, i, vv)  part does not "know" anything about the
> fact that we are in a data frame column creation context.
>
> If the R package author had used  '[i]' instead of '[[i]]'
> he|she would have been safe
>
> (as they would be if they worked more efficiently and created
> the whole variable as a vector and only then added it to the
> data frame ... but then, it seems people want to perpetuate the
> claim of R to be slow ... even if it's them who make R run
> slowly ... ;-))
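A sketch of the more efficient route described above: compute the whole column as a vector first, then attach it to the data frame once. `label_for()` is a hypothetical per-row helper standing in for whatever the package computes; when the computation is already vectorized, as `paste0()` is here, no loop or `vapply()` is needed at all.

```r
dd <- data.frame(id = 1:4)

# Hypothetical per-row helper returning one string
label_for <- function(id) paste0("row", id)

# Compute the whole column first, then add it to the data frame once
dd$label <- vapply(dd$id, label_for, character(1))

# When the computation is vectorized, even vapply() is unnecessary
dd$label2 <- paste0("row", dd$id)

str(dd)
```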
>
>




More information about the R-devel mailing list