[R] User defined function within a formula

Fri Jul 17 01:51:00 CEST 2015

This might do what you want:

OPoly <- function(x, degree=1, weight=1, coefs=NULL, rangeX=NULL){
  weight <- round(weight,0)# weight need to be integer
  if(length(weight)!=length(x)) {
    weight <- rep(1,length(x))
  }
  if (is.null(rangeX)) {
      rangeX <- range(x)
  }
  p <- poly(4*(rep(x,weight)-mean(rangeX))/diff(rangeX), degree=degree,
coefs=coefs)
  # why t(t(...))?  That strips the attributes.
  Z <- t( t(p[cumsum(weight),]) * sqrt(attr(p,"coefs")$norm2[-seq(2)]) )[,
degree, drop=FALSE]
  class(Z) <- "OPoly"
  attr(Z, "coefs") <- attr(p, "coefs")
  attr(Z, "rangeX") <- rangeX
  Z
}

makepredictcall.OPoly<-function(var,call)
{
  if (is.call(call)) {
    if (identical(call[[1]], quote(OPoly))) {
      if (!is.null(tmp <- attr(var, "coefs"))) {
        call$coefs <- tmp
      }
      if (!is.null(tmp <- attr(var, "rangeX"))) {
        call$rangeX <- tmp
      }
      call$weight <- 1 # weight not relevant in predictions
    }
  }
  call
}

d <- data.frame(Y=1:8, X=log(1:8), Weight=1:8)
fit <- lm(data=d, Y ~ OPoly(X, degree=2, weight=Weight))
predict(fit)[c(3,8)]
predict(fit, newdata=data.frame(X=d$X[c(3,8)])) # same result

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, Jul 16, 2015 at 4:39 PM, William Dunlap <wdunlap at tibco.com> wrote:

> OPoly<-function(x,degree=1,weight=1){
>   weight=round(weight,0)# weight need to be integer
>   if(length(weight)!=length(x))weight=rep(1,length(x))
>   p=poly(4*(rep(x,weight)-mean(range(x)))/diff(range(x)),degree)
>   Z<-(t(t(p[cumsum(weight),])*sqrt(attr(p,"coefs")$norm2[-
> seq(2)]))[,degree])
>   class(Z)<-"OPoly";Z
> }
>
> You need to make OPoly to have optional argument(s) that give
> the original-regressor-dependent information to OPoly and then
> have it return, as attributes, the value of those arguments.
>  makepredictcall
> will take the attributes and attach them to the call in predvars so
> predict uses values derived from the original regressors, not value derived
> from the data to be predicted from.
>
> Take a look at a pair like makepredictcall.scale() and scale() for an
> example:
> scale has optional arguments 'center' and 'scale' that it returns as
> attributes
> and makepredictcall.scale adds those to the call to scale that it is given.
> Thus when you predict, the scale and center arguments come from the
> original data, not from the data you are predicting from.
>
>
>
>
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Thu, Jul 16, 2015 at 3:43 PM, Kunshan Yin <yinkunshan at gmail.com> wrote:
>
>> Thanks Bill for your quick reply.
>>
>> I tried your solution and it did work for the simple user defined
>> function xploly. But when I try with other function, it gave me error again:
>>
>> OPoly<-function(x,degree=1,weight=1){
>>   weight=round(weight,0)# weight need to be integer
>>   if(length(weight)!=length(x))weight=rep(1,length(x))
>>   p=poly(4*(rep(x,weight)-mean(range(x)))/diff(range(x)),degree)
>>
>> Z<-(t(t(p[cumsum(weight),])*sqrt(attr(p,"coefs")$norm2[-seq(2)]))[,degree])
>>   class(Z)<-"OPoly";Z
>> }
>>
>> ##this OPoly is an FORTRAN orthogonal polynomial routine, it first maps
>> the x to range[-2,2] then do QR, then return the results with sqrt(norm2).
>> Comparing with poly, this transformation will make the model coefficients
>> within a similar range as other variables, the R poly routine will usually
>> give you a very large coefficients. I did not find such routine in R, so I
>> have to define this as user defined function.
>> #######
>>
>> I  also have following function as you suggested:
>>
>> makepredictcall.OPoly<-function(var,call)
>> {
>>   if (is.call(call)) {
>>     if (identical(call[[1]], quote(OPoly))) {
>>       if (!is.null(tmp <- attr(var, "coefs"))) {
>>         call$coefs <- tmp
>>       }
>>     }
>>   }
>>   call
>> }
>>
>>
>> But I still got error for following:
>>
>> > g3=glm(lot1 ~ log(u) + OPoly(u,1), data = clotting, family = Gamma)
>>
>> > predict(g3,dc)Error in poly(4 * (rep(x, weight) - mean(range(x)))/diff(range(x)), degree) :
>>   missing values are not allowed in 'poly'
>>
>> I thought it might be due to the /diff(range(x) in the function.  But
>> even I remove that part, it will still give me error. Any idea?
>>
>> Many thanks in advance.
>>
>> Alex
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Thu, Jul 16, 2015 at 2:09 PM, William Dunlap <wdunlap at tibco.com>
>> wrote:
>>
>>> Read about the 'makepredictcall' generic function.  There is a method,
>>> makepredictcall.poly(), for poly() that attaches the polynomial
>>> coefficients
>>> used during the fitting procedure to the call to poly() that predict()
>>> makes.
>>> You ought to supply a similar method for your xpoly(), and xpoly() needs
>>> to return an object of a a new class that will cause that method to be used.
>>>
>>> E.g.,
>>>
>>> xpoly <- function(x,degree=1,...){ ret <- poly(x,degree=degree,...);
>>> class(ret) <- "xpoly" ; ret }
>>> makepredictcall.xpoly <- function (var, call)
>>> {
>>>     if (is.call(call)) {
>>>         if (identical(call[[1]], quote(xpoly))) {
>>>             if (!is.null(tmp <- attr(var, "coefs"))) {
>>>                 call$coefs <- tmp
>>>             }
>>>         }
>>>     }
>>>     call
>>> }
>>>
>>> g2 <- glm(lot1 ~ log(u) + xpoly(u,1), data = clotting, family = Gamma)
>>> predict(g2,dc)
>>> #             1              2              3              4
>>>  5
>>> #-0.01398928608 -0.01398928608 -0.01398928608 -0.01398928608
>>> #-0.01398928608
>>> #             6              7              8              9
>>> #-0.01398928608 -0.01398928608 -0.01398928608 -0.01398928608
>>>
>>> You can see the effects of makepredictcall() in the 'terms' component
>>> of glm's output.  The 'variables' attribute of it gives the original
>>> function
>>> calls and the 'predvars' attribute gives the calls to be used for
>>> prediction:
>>>    > attr(g2$terms, "variables")
>>>    list(lot1, log(u), xpoly(u, 1))
>>>   > attr(g2$terms, "predvars")
>>>   list(lot1, log(u), xpoly(u, 1, coefs = list(alpha = 40, norm2 = c(1,
>>>   9, 8850))))
>>>
>>>
>>>
>>> Bill Dunlap
>>> TIBCO Software
>>> wdunlap tibco.com
>>>
>>> On Thu, Jul 16, 2015 at 12:35 PM, Kunshan Yin <yinkunshan at gmail.com>
>>> wrote:
>>>
>>>> Hello, I have a question about the formula and the user defined
>>>> function:
>>>>
>>>> I can do following:
>>>> ###Case 1:
>>>> > clotting <- data.frame(
>>>> +     u = c(5,10,15,20,30,40,60,80,100),
>>>> +     lot1 = c(118,58,42,35,27,25,21,19,18),
>>>> +     lot2 = c(69,35,26,21,18,16,13,12,12))
>>>> > g1=glm(lot1 ~ log(u) + poly(u,1), data = clotting, family = Gamma)
>>>> > dc=clotting
>>>> > dc$u=1
>>>> > predict(g1,dc)
>>>>           1           2           3           4           5
>>>> 6           7           8           9
>>>> -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929
>>>> -0.01398929 -0.01398929 -0.01398929
>>>>
>>>> However, if I just simply wrap the poly as a user defined function ( in
>>>> reality I would have my own more complex function)  then I will get
>>>> error:
>>>> ###Case 2:
>>>> > xpoly<-function(x,degree=1){poly(x,degree)}
>>>> > g2=glm(lot1 ~ log(u) + xpoly(u,1), data = clotting, family = Gamma)
>>>> > predict(g2,dc)
>>>> Error in poly(x, degree) :
>>>>   'degree' must be less than number of unique points
>>>>
>>>> It seems that the predict always treat the user defined function in the
>>>> formula with I().  My question is how can I get the  results for Case2
>>>> same
>>>> as case1?
>>>>
>>>> Anyone can have any idea about this?
>>>>
>>>> Thank you very much.
>>>>
>>>> Alex
>>>>
>>>>         [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>>
>>
>

	[[alternative HTML version deleted]]