[Rd] [External] Re: 1954 from NA

Adrian Dușa du@@@@dr|@n @end|ng |rom gm@||@com
Wed May 26 19:09:50 CEST 2021


Yes, that is even better.
Best,
Adrian

On Wed, May 26, 2021 at 7:05 PM Duncan Murdoch <murdoch.duncan using gmail.com>
wrote:

> After 5 minutes more thought:
>
> - code non-missing as missingKind = NA, not 0, so that missingKind could
> be a character vector, or missingKind = 0 could be supported.
>
> - print methods should return the main argument, so mine should be
>
> print.MultiMissing <- function(x, ...) {
>    vals <- as.character(x)
>    if (!is.character(x) || inherits(x, "noquote"))
>      print(noquote(vals))
>    else
>      print(vals)
>    invisible(x)
> }
>
> This still needs a lot of improvement to be a good print method, but
> I'll leave that to you.
>
> Duncan Murdoch
>
> On 26/05/2021 11:43 a.m., Duncan Murdoch wrote:
> > On 26/05/2021 10:22 a.m., Adrian Dușa wrote:
> >> Dear Duncan,
> >>
> >> On Wed, May 26, 2021 at 2:27 AM Duncan Murdoch <
> murdoch.duncan using gmail.com
> >> <mailto:murdoch.duncan using gmail.com>> wrote:
> >>
> >>      You've already been told how to solve this:  just add attributes
> to the
> >>      objects. Use the standard NA to indicate that there is some kind of
> >>      missingness, and the attribute to describe exactly what it is.
> Stick a
> >>      class on those objects and define methods so that subsetting and
> >>      arithmetic preserves the extra info you've added. If you do some
> >>      operation that turns those NAs into NaNs, big deal:  the attribute
> will
> >>      still be there, and is.na <http://is.na>(NaN) still returns TRUE.
> >>
> >>
> >> I've already tried the attributes way, it is not so easy.
> >
> > If you have specific operations that are needed but that you can't get
> > to work, post the issue here.
> >
> >> In the best case scenario, it unnecessarily triples the size of the
> >> data, but perhaps this is the only way forward.
> >
> > I don't see how it could triple the size.  Surely an integer has enough
> > values to cover all possible kinds of missingness.  So on integer or
> > factor data you'd double the size, on real or character data you'd
> > increase it by 50%.  (This is assuming you're on a 64 bit platform with
> > 32 bit integers and 64 bit reals and pointers.)
> >
> > Here's a tiny implementation to show what I'm talking about:
> >
> > asMultiMissing <- function(x) {
> >     if (isMultiMissing(x))
> >       return(x)
> >     missingKind <- ifelse(is.na(x), 1, 0)
> >     structure(x,
> >               missingKind = missingKind,
> >               class = c("MultiMissing", class(x)))
> > }
> >
> > isMultiMissing <- function(x)
> >     inherits(x, "MultiMissing")
> >
> > missingKind <- function(x) {
> >     if (isMultiMissing(x))
> >       attr(x, "missingKind")
> >     else
> >       ifelse(is.na(x), 1, 0)
> > }
> >
> > `missingKind<-` <- function(x, value) {
> >     class(x) <- setdiff(class(x), "MultiMissing")
> >     x[value != 0] <- NA
> >     x <- asMultiMissing(x)
> >     attr(x, "missingKind") <- value
> >     x
> > }
> >
> > `[.MultiMissing` <- function(x, i, ...) {
> >     missings <- missingKind(x)
> >     x <- NextMethod()
> >     missings <- missings[i]
> >     missingKind(x) <- missings
> >     x
> > }
> >
> > print.MultiMissing <- function(x, ...) {
> >     vals <- as.character(x)
> >     if (!is.character(x) || inherits(x, "noquote"))
> >       print(noquote(vals))
> >     else
> >       print(vals)
> > }
> >
> > `[<-.MultiMissing` <- function(x, i, value, ...) {
> >     missings <- missingKind(x)
> >     class(x) <- setdiff(class(x), "MultiMissing")
> >     x[i] <- value
> >     missings[i] <- missingKind(value)
> >     missingKind(x) <- missings
> >     x
> > }
> >
> > as.character.MultiMissing <- function(x, ...) {
> >     missings <- missingKind(x)
> >     result <- NextMethod()
> >     ifelse(missings != 0,
> >            paste0("NA.", missings), result)
> >
> > }
> >
> > This is incomplete.  It doesn't do printing very well, and it doesn't
> > handle the case of assigning a MultiMissing value to a regular vector at
> > all.  (I think you'd need an S4 implementation if you want to support
> > that.)  But it does the basics:
> >
> >   > x <- 1:10
> >   > missingKind(x)[4] <- 23
> >   > x
> >    [1] 1     2     3     NA.23 5     6     7     8     9
> > [10] 10
> >   > is.na(x)
> >    [1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
> > [10] FALSE
> >   > missingKind(x)
> >    [1]  0  0  0 23  0  0  0  0  0  0
> >   >
> >
> > Duncan Murdoch
> >
> >>
> >>      Base R doesn't need anything else.
> >>
> >>      You complained that users shouldn't need to know about attributes,
> and
> >>      they won't:  you, as the author of the package that does this, will
> >>      handle all those details.  Working in your subject area you know
> all
> >>      the
> >>      different kinds of NAs that people care about, and how they code
> >>      them in
> >>      input data, so you can make it all totally transparent.  If you do
> it
> >>      well, someone in some other subject area with a completely
> different
> >>      set
> >>      of kinds of missingness will be able to adapt your code to their
> use.
> >>
> >>
> >> But that is the whole point: the package author does not define possible
> >> NAs (the possibilities are infinite), users do that.
> >> The package should only provide a simple method to achieve that.
> >>
> >>
> >>      I imagine this has all been done in one of the thousands of
> packages on
> >>      CRAN, but if it hasn't been done well enough for you, do it better.
> >>
> >>
> >> If it were, I would have found it by now...
> >>
> >> Best wishes,
> >> Adrian
> >
>
>

	[[alternative HTML version deleted]]



More information about the R-devel mailing list