[Rd] [External] Clearing attributes returns ALTREP, serialize still saves them

luke-tierney using uiowa.edu
Sat Jul 3 15:40:41 CEST 2021


Please do not cross-post. You have already raised this on Bugzilla. I
will follow up there later today.

luke

On Sat, 3 Jul 2021, Zafer Barutcuoglu wrote:

> Hi all,
>
> Clearing names/dimnames (setting them to NULL) on vectors/matrices of length >= 64 returns an ALTREP wrapper that internally still contains the old names/dimnames, and calling base::serialize on the result writes them out. They are unserialized the same way, with the names/dimnames hidden in the ALTREP wrapper, so the problem is not obvious except as wasted time, bandwidth, or disk space.
>
> Example:
>   v1 <- setNames(rnorm(64), paste("element name", 1:64))
>   v2 <- unname(v1)
>   names(v2)
>   # NULL
>   length(serialize(v1, NULL))
>   # [1] 2039
>   length(serialize(v2, NULL))
>   # [1] 2132
>   length(serialize(v2[TRUE], NULL))
>   # [1] 543
>
>   con <- rawConnection(raw(), "w")
>   serialize(v2, con)
>   v3 <- unserialize(rawConnectionValue(con))
>   names(v3)
>   # NULL
>   length(serialize(v3, NULL))
>   # [1] 2132
>
>   # Similarly for matrices:
>   m1 <- matrix(rnorm(64), 8, 8, dimnames=list(paste("row name", 1:8), paste("col name", 1:8)))
>   m2 <- unname(m1)
>   dimnames(m2)
>   # NULL
>   length(serialize(m1, NULL))
>   # [1] 918
>   length(serialize(m2, NULL))
>   # [1] 1035
>   length(serialize(m2[TRUE, TRUE], NULL))
>   # [1] 582
>
> Previously discussed here, too:
> https://r.789695.n4.nabble.com/Invisible-names-problem-td4764688.html
>
> This happens with other attributes as well, but less predictably:
>   x1 <- structure(rnorm(100), data=rnorm(1000000))
>   x2 <- structure(x1, data=NULL)
>   length(serialize(x1, NULL))
>   # [1] 8000952
>   length(serialize(x2, NULL))
>   # [1] 924
>
>   x1b <- rnorm(100)
>   attr(x1b, "data") <- rnorm(1000000)
>   x2b <- x1b
>   attr(x2b, "data") <- NULL
>   length(serialize(x1b, NULL))
>   # [1] 8000863
>   length(serialize(x2b, NULL))
>   # [1] 8000956
>
> This is pretty severe: tracking down why serializing a small object kills the network means working out which large attributes it once had, at any point in its lifetime around the codebase, that are still secretly tagging along.
>
> Is there a plan to resolve this? Any suggestions for maybe a C++ workaround until then? Or an alternative performant serialization solution?
>
> Best,
> --
> Zafer
>

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   luke-tierney using uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu


