[Rd] problem in levels<- and other inconsistencies

Wed Sep 28 10:50:47 CEST 2016

Hervé,

Good point, but easy to solve:

Since

list[i] # always is.list

deleting a list element with

list[i] <- NULL # !is.list(NULL)

does not lead to a contradiction,

whereas 

list[[i]] <- NULL 

should do the same as 

list[i] <- list(NULL)
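
For reference, a minimal sketch of how current R treats these assignments. The variable names and sample values are illustrative only and are not taken from the thread. It suggests that both `[<-` and `[[<-` delete the element when given NULL, while wrapping NULL in a list keeps the slot, which is the behaviour proposed above for `[[<-`:

```r
# Sample list; names and values are illustrative assumptions.
x <- list(3:-1, "b", NULL)

# Single-bracket subsetting always returns a list, even for a NULL element.
stopifnot(is.list(x[3]), is.null(x[[3]]))

# Assigning NULL with `[<-` deletes the element.
y1 <- x
y1[2] <- NULL
length(y1)        # 2

# In current R, `[[<-` with NULL deletes the element as well.
y2 <- x
y2[[2]] <- NULL
length(y2)        # 2

# Wrapping NULL in a list keeps the slot and stores NULL in it.
y3 <- x
y3[2] <- list(NULL)
length(y3)        # 3
is.null(y3[[2]])  # TRUE
```

With the single-bracket form, x[i] <- x[i] is a no-op even when the element is NULL, because both sides are lists.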

YES, I know that this would be a major change, but NO, that is no justification for not fixing a mistake in a language. Unless one has given up on fixing the language, in which case we should all switch to another one (Julia, ...).

Jens

Sent: Tuesday, 27 September 2016, 23:20
From: "Hervé Pagès" <hpages at fredhutch.org>
To: "Dr. Jens Oehlschlägel" <Jens.Oehlschlaegel at truecluster.com>, r-devel at r-project.org
Subject: Re: [Rd] problem in levels<- and other inconsistencies
Hi,

I totally agree that having foo(x) <- foo(x) behave like a no-op
is a must. This is something I try to be careful about when I design
my own objects and their getters and setters.
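
As an illustration of that rule, here is a minimal sketch of a getter and a replacement function for which the round trip is a no-op. The accessor name `tag` and the attribute it stores are hypothetical, chosen only for this example:

```r
# Hypothetical getter: reads the "tag" attribute of an object.
tag <- function(x) attr(x, "tag")

# Hypothetical setter (replacement function): stores the value unchanged.
`tag<-` <- function(x, value) {
  attr(x, "tag") <- value
  x
}

x <- c(1.5, 2.5)
tag(x) <- "cm"

# Round trip: assigning the getter's result back should change nothing.
y <- x
tag(y) <- tag(y)
identical(x, y)    # TRUE

# The same holds when the attribute is absent (the getter returns NULL).
z0 <- c(1.5, 2.5)
z <- z0
tag(z) <- tag(z)
identical(z, z0)   # TRUE
```

The setter stores exactly what the getter returns and applies no filtering to the value, which is what keeps the round trip idempotent.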

Just wanted to mention, though, that there is a notorious violation of
this:

x <- list(3:-1, NULL)
x[[2]] <- x[[2]]
x
# [[1]]
# [1]  3  2  1  0 -1

Of course, the existence of a precedent doesn't mean the factor
API shouldn't be improved.

Cheers,
H.

On 09/27/2016 12:33 PM, Dr. Jens Oehlschlägel wrote:
> # A couple of years ago
> # I helped make R's character NA handling more consistent
> # Today I report an issue with R's factor NA handling
> # The core problem is that
> # levels(g) <- levels(g)
> # can change the levels of g
> # more details below
> # Kind regards
> # Jens Oehlschlägel
>
> # Say I have an NA element in a vector or list
>
> x <- c("a","b",NA)
>
> # then using split() it gets lost
>
> split(x, x)
>
> # as it is (somewhat) when converting to a default factor
>
> table(as.factor(x))
>
> # for table the workaround is
>
> table(as.factor(x), exclude=NULL)
>
> # but for split we need
>
> f <- factor(x, exclude=NULL)
>
> split(x, f)
>
> # conclusion: we MUST use an NA level
>
> # so far so good
>
> g <- f
> levels(g)
>
> # but re-assigning the levels changes them
>
> levels(g) <- levels(g)
> levels(g)
>
> # which I consider a severe problem.
> # Yes, I read the help page of levels<-
> # about removing levels by assigning NAs to them
> # but that implies: we MUST NOT use an NA level
>
> # If a language suggests
> # that we MUST and we MUST NOT use an NA level
> # the language has limited usefulness
> # (and a user who depends on the language
> # is put into a DOUBLE BIND)
> # SUGGESTION: ensure the above assignment does not change levels
>
> # trying to apply the levels of f to new data also fails
>
> g <- factor(x, levels=levels(f))
> g
>
> # and supplying both levels= and labels= even stops with an error
>
> h <- factor(x, levels=levels(f), labels=levels(f))
>
> # I do understand that exclude= is meaningful
> # when the levels are determined automatically, but
> # SUGGESTION: with explicit levels= exclude= should be ignored.
>
> # SUGGESTION: give split(x, y, exclude=NA) an exclude= argument,
> # which when set to NULL will prevent dropping NA levels
> # when coercing y to factor
> # (it remains open which should take priority
> # if y is a factor with an NA level and exclude=NA)
>
> table(f, exclude=NA)
>
> # here existing levels win over exclude=
> # which is consistent with my suggestion for factor(, levels=, exclude=)
>

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319