[Rd] unlist errors on a nested list of empty lists

Martin Maechler maechler at stat.math.ethz.ch
Thu May 10 09:33:23 CEST 2018


>>>>> Steven Nydick <swnydick at gmail.com>
>>>>>     on Wed, 9 May 2018 13:25:11 +0000 writes:

    > I do not have access to the bug reporting system. If somebody can get me
    > access, I can create a formal bug report.

    > The latter issues seem like duplicates of:
    > https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=12572 (with slightly
    > different output), but as that bug was reported nearly 10 years ago, it
    > might be worth creating an update under R version 3. I could not find the
    > first issue when searching the bug reports (which I ran into when trying to
    > parse JSON files), which is why I posted on r-devel.

Indeed, thanks a lot Steven (and Duncan!),  I've found the
following:

1. The first issue is a new bug, present in R "only" since version
   3.4.0, i.e., it worked up to R 3.3.3.
   Duncan's patch basically fixes it.
   I've found that the C code there can be simplified and
   deconvoluted, and after that, I will commit what is basically
   Duncan Murdoch's bug fix.

2. The second issues are indeed an entirely different bug, and I
   would say they actually point to a "design problem" of the whole thing.
   The C code in islistfactor() talks about arbitrary trees with
   all leaves factors, whereas the R code -- in the case where
   islistfactor() is TRUE -- actually only deals correctly with
   simple trees, namely those of depth exactly 1. Those are the ones you
   typically get from e.g. lapply(), and so this old design bug triggers
   relatively rarely.
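
For readers following along, here is a minimal standalone sketch of the
first issue, based on Duncan's analysis quoted further down. It is not
the code in src/main/apply.c: the Node type, TOY_NA_LOGICAL, and the
function names are hypothetical stand-ins for SEXP, NA_LOGICAL, and
islistfactor(), chosen so that it compiles without the R headers. It
assumes only that NA_LOGICAL is INT_MIN, hence nonzero and "true" in a
C-style test.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Stand-in for R's NA_LOGICAL (INT_MIN); an assumption of this sketch. */
#define TOY_NA_LOGICAL INT_MIN

/* Hypothetical toy tree replacing SEXP: a node is a list or a leaf. */
typedef struct Node {
    int is_list;               /* 1 = list (think VECSXP), 0 = leaf   */
    int is_factor;             /* only meaningful for leaves          */
    size_t n;                  /* number of children of a list        */
    const struct Node **kids;  /* children of a list, NULL if n == 0  */
} Node;

/* Same logic as the quoted original: the child result is tested
   C-style, so an NA from an empty list is treated as true. */
static int islistfactor_old(const Node *x)
{
    if (x->is_list) {
        if (x->n == 0) return TOY_NA_LOGICAL;
        for (size_t i = 0; i < x->n; i++)
            if (!islistfactor_old(x->kids[i])) return 0;
        return 1;
    }
    return x->is_factor;
}

/* Same logic as the quoted fix: compare explicitly against FALSE and
   TRUE, and stay NA unless some child is definitely TRUE. */
static int islistfactor_new(const Node *x)
{
    if (x->is_list) {
        int result = TOY_NA_LOGICAL;
        for (size_t i = 0; i < x->n; i++) {
            int child = islistfactor_new(x->kids[i]);
            if (child == 0) return 0;
            else if (child == 1) result = 1;
        }
        return result;
    }
    return x->is_factor;
}

/* Toy equivalent of list(list(list(), list())). */
static const Node empty_a = {1, 0, 0, NULL};
static const Node empty_b = {1, 0, 0, NULL};
static const Node *mid_kids[] = {&empty_a, &empty_b};
static const Node mid = {1, 0, 2, mid_kids};
static const Node *top_kids[] = {&mid};
static const Node nested_empty = {1, 0, 1, top_kids};

/* use_fix = 0 runs the old logic, use_fix = 1 the fixed logic. */
int nested_empty_result(int use_fix)
{
    assert(use_fix == 0 || use_fix == 1);
    return use_fix ? islistfactor_new(&nested_empty)
                   : islistfactor_old(&nested_empty);
}
```

In this sketch nested_empty_result(0) gives 1, i.e. the spurious TRUE
that sends unlist() down the factor branch, while nested_empty_result(1)
gives TOY_NA_LOGICAL.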

Last but not least: I have created an account for you, Steven,
on the bugzilla site.

Given we have holidays till the weekend and private duties of
mine, I won't get to more for now.

Best
Martin Maechler

    > On Tue, May 8, 2018 at 7:51 PM Duncan Murdoch <murdoch.duncan at gmail.com>
    > wrote:

    >> On 08/05/2018 4:50 PM, Steven Nydick wrote:
    >> > It also does the same thing if the factor is not on the first level of
    >> > the list, which seems to be due to the fact that the islistfactor is
    >> > recursive, but if a list is a list-factor, the first level lists are
    >> > coerced into character strings.
    >> >
    >> >  > x <- list(list(factor(LETTERS[1])))
    >> >  > unlist(x)
    >> > Error in as.character.factor(x) : malformed factor
    >> >
    >> > However, if one of the factors is at the top level, and one is nested,
    >> > then the result is:
    >> >
    >> >  > x <- list(list(factor(LETTERS[1])), factor(LETTERS[2]))
    >> >  > unlist(x)
    >> >
    >> > [1] <NA> B
    >> > Levels: B
    >> >
    >> > ... which does not seem to me to be desired behavior.
    >> 
    >> The patch I suggested doesn't help with either of these.  I'd suggest
    >> collecting examples, and posting a bug report to bugs.r-project.org.
    >> 
    >> Duncan Murdoch
    >> 
    >> 
    >> >
    >> >
    >> > On Tue, May 8, 2018 at 2:22 PM Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
    >> >
    >> >     On 08/05/2018 2:58 PM, Duncan Murdoch wrote:
    >> >      > On 08/05/2018 1:48 PM, Steven Nydick wrote:
    >> >      >> Reproducible example:
    >> >      >>
    >> >      >> x <- list(list(list(), list()))
    >> >      >> unlist(x)
    >> >      >>
    >> >      >> Error in as.character.factor(x) : malformed factor
    >> >      >
    >> >      > The error comes from the line
    >> >      >
    >> >      > structure(res, levels = lv, names = nm, class = "factor")
    >> >      >
    >> >      > which is called because unlist() thinks that some entry is a factor,
    >> >      > with NULL levels and NULL names.  It's not legal for a factor to have
    >> >      > NULL levels.  Probably it should never get here; the earlier test
    >> >      >
    >> >      > if (.Internal(islistfactor(x, recursive))) {
    >> >      >
    >> >      > should have been false, and then the result would have been
    >> >      >
    >> >      > .Internal(unlist(x, recursive, use.names))
    >> >      >
    >> >      > (with both recursive and use.names being TRUE), which returns NULL.
    >> >
    >> >     And the problem is in the islistfactor function in src/main/apply.c,
    >> >     which looks like this:
    >> >
    >> >     static Rboolean islistfactor(SEXP X)
    >> >     {
    >> >           int i, n = length(X);
    >> >
    >> >           switch(TYPEOF(X)) {
    >> >           case VECSXP:
    >> >           case EXPRSXP:
    >> >               if(n == 0) return NA_LOGICAL;
    >> >               for(i = 0; i < LENGTH(X); i++)
    >> >                   if(!islistfactor(VECTOR_ELT(X, i))) return FALSE;
    >> >               return TRUE;
    >> >               break;
    >> >           }
    >> >           return isFactor(X);
    >> >     }
    >> >
    >> >     One of those deeply nested lists is length 0, so at the lowest level it
    >> >     returns NA_LOGICAL.  But then it does C-style logical testing on the
    >> >     results.  I think to C NA_LOGICAL counts as true, so at the next level
    >> >     up we get the wrong answer.
    >> >
    >> >     A fix would be to rewrite it like this:
    >> >
    >> >     static Rboolean islistfactor(SEXP X)
    >> >     {
    >> >           int i, n = length(X);
    >> >           Rboolean result = NA_LOGICAL, childresult;
    >> >           switch(TYPEOF(X)) {
    >> >           case VECSXP:
    >> >           case EXPRSXP:
    >> >               for(i = 0; i < LENGTH(X); i++) {
    >> >                   childresult = islistfactor(VECTOR_ELT(X, i));
    >> >                   if(childresult == FALSE) return FALSE;
    >> >                   else if(childresult == TRUE) result = TRUE;
    >> >               }
    >> >               return result;
    >> >               break;
    >> >           }
    >> >           return isFactor(X);
    >> >     }
    >> >
    >> >
    >> >
    >> > --
    >> > Steven Nydick
    >> > PhD, Quantitative Psychology
    >> > M.A., Psychology
    >> > M.S., Statistics
    >> > --
    >> > "Beware of the man who works hard to learn something, learns it, and
    >> > finds himself no wiser than before, Bokonon tells us. He is full of
    >> > murderous resentment of people who are ignorant without having come by
    >> > their ignorance the hard way."
    >> > -Kurt Vonnegut
    >> 
    >> 

