[Rd] Proposed Patch for poly.Rd

Marc Schwartz marc_schwartz at me.com
Fri Jul 14 13:57:26 CEST 2017


> On Jul 13, 2017, at 5:07 PM, Marc Schwartz <marc_schwartz at me.com> wrote:
> 
> 
>> On Jul 13, 2017, at 3:37 PM, Marc Schwartz <marc_schwartz at me.com> wrote:
>> 
>> 
>>> On Jul 13, 2017, at 3:22 PM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>>> 
>>> On 13/07/2017 4:08 PM, Marc Schwartz wrote:
>>>> Hi All,
>>>> 
>>>> As per the discussion today on R-Help:
>>>> 
>>>> https://stat.ethz.ch/pipermail/r-help/2017-July/448132.html
>>>> 
>>>> I am attaching a proposed patch for poly.Rd to provide clarifying wording relative to naming the 'degree' argument explicitly, in the case where the 'x' argument is a matrix, rather than a vector.
>>>> 
>>>> This is based upon the svn trunk version of poly.Rd.
>>> 
>>> I don't think this is the right fix.  The use of the unnamed 2nd arg as degree happens whether the first arg is a matrix or not.
>>> 
>>> I didn't read the whole thread in detail, but it appears there's a bug somewhere, in the report or in the poly() code or in the plsr() code. That bug should be reported on the bug list if it turns out to be in base R, and to the package maintainer if it is in plsr().
>>> 
>>> Duncan Murdoch
>> 
>> 
>> Duncan,
>> 
>> Thanks for your reply. You only really need to read that last post in the thread linked to above.
>> 
>> I won't deny the possibility of a bug in poly(), relative to the handling of 'x' as a matrix. The behavior occurs in the poly() function on its own, in a purely standalone fashion, without the need for plsr():
>> 
>> x1 <- runif(20)
>> x2 <- runif(20)
>> mx <- cbind(x1, x2)
>> 
> 
> <snip>
> 
> Duncan,
> 
> Tracing through the code for poly() using debug once with:
> 
>  poly(mx, 2)
> 
> and then with:
> 
>  poly(mx, degree = 2)
> 
> there is a difference in the transformation of 'mx' internally by the use of:
> 
> if (is.matrix(x)) {
>    m <- unclass(as.data.frame(cbind(x, ...)))
>    return(do.call(polym, c(m, degree = degree, raw = raw, list(coefs = coefs))))
> }
> 
> 
> In the first case, 'mx' ends up being transformed to:
> 
> Browse[2]> m
> $x1
> [1] 0.99056941 0.13953093 0.38965567 0.35353514 0.90838486 0.97552474
> [7] 0.01135743 0.06537047 0.56207834 0.50554056 0.96653391 0.69533973
> [13] 0.31333549 0.97488211 0.54952630 0.71747157 0.31164777 0.81694822
> [19] 0.58641410 0.08858699
> 
> $x2
> [1] 0.6628658 0.9221436 0.3162418 0.8494452 0.4665010 0.3403719
> [7] 0.4040692 0.4916650 0.9091161 0.2956006 0.3454689 0.3331070
> [13] 0.8788974 0.5614636 0.7794396 0.2304009 0.6566537 0.6875646
> [19] 0.5110733 0.4122336
> 
> $V3
> [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
> 
> attr(,"row.names")
> [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
> 
> Thus, when do.call() is used, m$V3 is passed as the 'x' argument on the third iteration, essentially resulting in:
> 
> > polym(rep(2, 20), degree = 2)
> Error in poly(dots[[1L]], degree, raw = raw, simple = raw && nd > 1) : 
> 'degree' must be less than number of unique points
> 
> 
> Note also that in this case, 'dots', which is the result of using list(...) on the initial call, is:
> 
> Browse[2]> dots
> [[1]]
> [1] 2
> 
> 
> In the second case:
> 
> Browse[2]> m
> $x1
> [1] 0.99056941 0.13953093 0.38965567 0.35353514 0.90838486 0.97552474
> [7] 0.01135743 0.06537047 0.56207834 0.50554056 0.96653391 0.69533973
> [13] 0.31333549 0.97488211 0.54952630 0.71747157 0.31164777 0.81694822
> [19] 0.58641410 0.08858699
> 
> $x2
> [1] 0.6628658 0.9221436 0.3162418 0.8494452 0.4665010 0.3403719
> [7] 0.4040692 0.4916650 0.9091161 0.2956006 0.3454689 0.3331070
> [13] 0.8788974 0.5614636 0.7794396 0.2304009 0.6566537 0.6875646
> [19] 0.5110733 0.4122336
> 
> attr(,"row.names")
> [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
> 
> 
> So, there is no m$V3.
> 
> Note also that 'dots' ends up being:
> 
> Browse[2]> dots
> list()
> 
> 
> In both cases, 'degree' is indeed 2, but the result of 'list(...)' on the initial function call is quite different.
> 
> So, I may be hypo-caffeinated, but if there is a bug here, it may be due to the way in which cbind() is being called in the code above, where the three dots are being used?
> 
> I can replicate the presumably correct behavior by using:
> 
>  m <- unclass(as.data.frame(cbind(x)))
> 
> instead of:
> 
>  m <- unclass(as.data.frame(cbind(x, ...)))
> 
> But I am not sure whether removing the three dots from the cbind() call would have other unintended consequences.
> 
> Regards,
> 
> Marc


Duncan,

Some additional information here.

Reviewing the source code for the function in SVN:

  https://svn.r-project.org/R/trunk/src/library/stats/R/contr.poly.R

there is a relevant comment in the code:

    if(is.matrix(x)) { ## FIXME: fails when combined with 'unnamed degree' above
        m <- unclass(as.data.frame(cbind(x, ...)))
        return(do.call(polym, c(m, degree = degree, raw = raw,
                                list(coefs=coefs))))
    }
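
To make the mechanism concrete, here is a minimal sketch. The function toy_poly_head() is hypothetical and not part of stats; it only mimics the handling of an unnamed scalar in '...' followed by the cbind() expression quoted above, and it assumes nothing else about the rest of poly():

```r
# Hypothetical helper (not part of stats): mimics only the argument handling
# and the cbind() expression quoted above, not the rest of poly().
toy_poly_head <- function(x, ..., degree = 1) {
  dots <- list(...)
  # An unnamed scalar second argument is matched to '...', not to 'degree'
  if (length(dots) == 1L && length(dots[[1L]]) == 1L)
    degree <- dots[[1L]]
  # Same expression as in the matrix branch: '...' is still passed on
  m <- unclass(as.data.frame(cbind(x, ...)))
  list(n_dots = length(dots), degree = degree, columns = names(m))
}

set.seed(42)  # arbitrary seed, only for reproducibility
mx <- cbind(x1 = runif(20), x2 = runif(20))

unnamed <- toy_poly_head(mx, 2)
named   <- toy_poly_head(mx, degree = 2)

unnamed$columns  # "x1" "x2" "V3"
named$columns    # "x1" "x2"
```

In both calls 'degree' ends up as 2, but only the unnamed form leaves the 2 in '...', where cbind() recycles it into an extra column.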


A review of the revision history suggests that the above comment was added to the code back in 2015.

So it would appear that the behavior being discussed here is known.
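
Until that is resolved, naming the argument appears to be the safe form. The following is only a sketch of how one might check both forms on a given installation; the unnamed call is wrapped in tryCatch() because its outcome may depend on the R version in use:

```r
set.seed(42)  # arbitrary seed, only for reproducibility
x1 <- runif(20)
x2 <- runif(20)
mx <- cbind(x1, x2)

# Naming 'degree' keeps it out of '...', so cbind() sees only the matrix
p_named <- poly(mx, degree = 2)
dim(p_named)

# Unnamed form: capture the result or the error message rather than assume one
p_unnamed <- tryCatch(poly(mx, 2), error = function(e) conditionMessage(e))
```

If p_unnamed comes back as a character string, it holds the error message, which on the version discussed in this thread is the 'degree' must be less than number of unique points error.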

I am still confused by the need for the '...' in the call to cbind(), which, as far as I can tell, has been in the code since at least 2003, when the poly() code was split from base. I am not sure why one would want to pass other '...' arguments on to cbind(), but I am presumably missing something here.
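
For what it is worth, the effect of that expression in isolation can be sketched as below. bind_cols() is a hypothetical wrapper, not something in stats, and the sketch does not establish whether poly() ever reaches that line with additional vectors in '...':

```r
# Hypothetical wrapper around the expression used in the matrix branch
bind_cols <- function(x, ...) unclass(as.data.frame(cbind(x, ...)))

set.seed(42)  # arbitrary seed, only for reproducibility
mx <- cbind(x1 = runif(20), x2 = runif(20))
x3 <- runif(20)

cols_plain <- names(bind_cols(mx))           # "x1" "x2"
cols_extra <- names(bind_cols(mx, x3 = x3))  # "x1" "x2" "x3"
```

So anything left in '...' is simply appended as further columns, which is harmless for a genuine extra variable but not for a stray scalar meant as 'degree'.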

Regards,

Marc


