[Rd] Invisible names problem

Duncan Murdoch murdoch@dunc@n @end|ng |rom gm@||@com
Wed Jul 22 23:29:38 CEST 2020


On 22/07/2020 3:29 p.m., Pan Domu wrote:
> I ran into strange behavior when removing names.
> 
> Two ways of removing names:
> 
>      i <- rep(1:4, length.out=20000)
>      k <- c(a=1, b=2, c=3, d=4)
> 
>      x1 <- unname(k[i])
>      x2 <- k[i]
>      x2 <- unname(x2)
> 
> Are they identical?
> 
>      identical(x1,x2) # TRUE
> 
> but no
> 
>      identical(serialize(x1,NULL),serialize(x2,NULL)) # FALSE
> 
> But problem is with serialization type 3, cause:
> 
>      identical(serialize(x1,NULL,version = 2),serialize(x2,NULL,version =
> 2)) # TRUE
> 
> It seems that the second one keeps names somewhere invisibly.
> 
> Some function can lost them, e.g. head:
> 
>      identical(serialize(head(x1, 20001),NULL),serialize(head(x2,
> 20001),NULL)) # TRUE
> 
> But not saveRDS (so files are bigger), tibble family keeps them but base
> data.frame seems to drop them.
> 
>  From my test invisible names are in following cases:
> 
>     x1 <- k[i] %>% unname()
>     x3 <- k[i]; x3 <- unname(x3)
>     x5 <- k[i]; x5 <- `names<-`(x5, NULL)
>     x6 <- k[i]; x6 <- unname(x6)
> 
> but not in this one
>     x2 <- unname(k[i])
>     x4 <- k[i]; names(x4) <- NULL
> 
> What kind of magick is that?
> 
> It hits us when we upgrade from 3.5 (when serialization changed) and had
> impact on parallelization (cause serialized objects were bigger).

You can use .Internal(inspect(x1)) and .Internal(inspect(x2)) to see 
that the two objects are not identical:

 > .Internal(inspect(x1))
@1116b7000 14 REALSXP g0c7 [REF(2)] (len=20000, tl=0) 1,2,3,4,1,...
 > .Internal(inspect(x2))
@7f9c77664ce8 14 REALSXP g0c0 [REF(2)]  wrapper [srt=-2147483648,no_na=0]
   @10e7b7000 14 REALSXP g0c7 [REF(6),ATT] (len=20000, tl=0) 1,2,3,4,1,...
   ATTRIB:
     @7f9c77664738 02 LISTSXP g0c0 [REF(1)]
       TAG: @7f9c6c027890 01 SYMSXP g1c0 [MARK,REF(65535),LCK,gp=0x4000] 
"names" (has value)
       @10e3ac000 16 STRSXP g0c7 [REF(65535)] (len=20000, tl=0)
	@7f9c6ab531c8 09 CHARSXP g1c1 [MARK,REF(10066),gp=0x61] [ASCII] 
[cached] "a"
	@7f9c6ae9a678 09 CHARSXP g1c1 [MARK,REF(10013),gp=0x61] [ASCII] 
[cached] "b"
	@7f9c6c0496c0 09 CHARSXP g1c1 [MARK,REF(10568),gp=0x61,ATT] [ASCII] 
[cached] "c"
	@7f9c6ad3df40 09 CHARSXP g1c1 [MARK,REF(10029),gp=0x61,ATT] [ASCII] 
[cached] "d"
	@7f9c6ab531c8 09 CHARSXP g1c1 [MARK,REF(10066),gp=0x61] [ASCII] 
[cached] "a"
	...


It looks as though x2 is a tiny ALTREP object acting as a wrapper on the 
original k[i], but I might be misinterpreting those displays.  I don't 
know how to force ALTREP objects to standard representation: 
unserializing the serialized x2 gives something like x2, not like x1. 
Maybe you want to look at one of the contributed low level packages. 
The stringfish package has a "materialize" function that is advertised 
to convert anything to standard format, but it doesn't change x2.

Duncan Murdoch



More information about the R-devel mailing list