[R] how to suppress the intercept in an lm()-like formula method?

Kevin E. Thorpe kevin.thorpe at utoronto.ca
Tue Jan 29 15:18:08 CET 2013


For lm() the intercept can be removed by adding a "- 1" to the RHS of 
the formula.  Does that not work in your case?

Kevin

On 01/29/2013 09:14 AM, Michael Friendly wrote:
> To partly answer my own question:  It wasn't that hard to hack the
> result of model.matrix() to remove the intercept,
>
> remove.intercept <- function(x) {
>      if (colnames(x)[1] == "(Intercept)") {
>          x <- x[,-1]
>          attr(x, "assign") <- attr(x, "assign")[-1]
>      }
>      x
> }
>
> However, the model frame and therefore the model terms stored in the
> object are wrong, still including the intercept:
>
> Browse[1]> str(mt)
> Classes 'terms', 'formula' length 3 cbind(SAT, PPVT, Raven) ~ n + s + ns
> + na + ss
>    ..- attr(*, "variables")= language list(cbind(SAT, PPVT, Raven), n,
> s, ns, na, ss)
>    ..- attr(*, "factors")= int [1:6, 1:5] 0 1 0 0 0 0 0 0 1 0 ...
>    .. ..- attr(*, "dimnames")=List of 2
>    .. .. ..$ : chr [1:6] "cbind(SAT, PPVT, Raven)" "n" "s" "ns" ...
>    .. .. ..$ : chr [1:5] "n" "s" "ns" "na" ...
>    ..- attr(*, "term.labels")= chr [1:5] "n" "s" "ns" "na" ...
>    ..- attr(*, "order")= int [1:5] 1 1 1 1 1
>    ..- attr(*, "intercept")= int 1
>    ..- attr(*, "response")= int 1
>    ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
>    ..- attr(*, "predvars")= language list(cbind(SAT, PPVT, Raven), n, s,
> ns, na, ss)
>    ..- attr(*, "dataClasses")= Named chr [1:6] "nmatrix.3" "numeric"
> "numeric" "numeric" ...
>    .. ..- attr(*, "names")= chr [1:6] "cbind(SAT, PPVT, Raven)" "n" "s"
> "ns" ...
> Browse[1]>
>
>
> On 1/29/2013 8:44 AM, Michael Friendly wrote:
>> I'm trying to write a formula method for canonical correlation analysis,
>> that could be called similarly to lm() for
>> a multivariate response:
>>
>> cancor(cbind(y1,y2,y3) ~ x1+x2+x3+x4, data=, ...)
>> or perhaps more naturally,
>> cancor(cbind(y1,y2,y3) ~ cbind(x1,x2,x3,x4), data=, ...)
>>
>> I've adapted the code from lm() to my case, but in this situation, it
>> doesn't make sense to
>> include an intercept, since X & Y are mean centered by default in the
>> computation.
>>
>> In the code below, I can't see where the intercept gets included in the
>> model matrix and therefore
>> how to suppress it.  There is a test case at the end, showing that the
>> method fails when called
>> normally, but works if I explicitly use -1 in the formula.  I could hack
>> the result of model.matrix(),
>> but maybe there's an easier way?
>>
>> cancor <- function(x, ...) {
>>      UseMethod("cancor", x)
>> }
>>
>> cancor.default <- candisc:::cancor
>>
>> # TODO: make cancisc::cancor() use x, y, not X, Y
>> cancor.formula <- function(formula, data, subset, weights,
>>          na.action,
>>          method = "qr",
>>      model = TRUE,
>>      x = FALSE, y = FALSE, qr = TRUE,
>>      contrasts = NULL, ...) {
>>
>>      cl <- match.call()
>>      mf <- match.call(expand.dots = FALSE)
>>      m <- match(c("formula", "data", "subset", "weights", "na.action"),
>> names(mf), 0L)
>>      mf <- mf[c(1L, m)]
>>
>>      mf[[1L]] <- as.name("model.frame")
>>      mf <- eval(mf, parent.frame())
>>
>>      mt <- attr(mf, "terms")
>>      y <- model.response(mf, "numeric")
>>      w <- as.vector(model.weights(mf))
>>      if (!is.null(w) && !is.numeric(w))
>>          stop("'weights' must be a numeric vector")
>>
>>      x <- model.matrix(mt, mf, contrasts)
>>      # fixup to remove intercept???
>>      z <- if (is.null(w))
>>          cancor.default(x, y,  ...)
>>      else stop("weights are not yet implemented")  # lm.wfit(x, y, w,
>> ...)
>>
>>      z$call <- cl
>>      z$terms <- mt
>>          z
>> }
>>
>> TESTME <- FALSE
>> if (TESTME) {
>>
>> # need to get latest version, 0.6-1 from R-Forge
>> install.packages("candisc", repo="http://R-Forge.R-project.org")
>> library(candisc)
>>
>> data(Rohwer)
>>
>> # this bombs: needs intercept removed
>> cc <- cancor.formula(cbind(SAT, PPVT, Raven) ~  n + s + ns + na + ss,
>> data=Rohwer)
>> ## Error in chol.default(Rxx) :
>> ##  the leading minor of order 1 is not positive definite
>>
>> #this works as is
>> cc <- cancor.formula(cbind(SAT, PPVT, Raven) ~  -1 + n + s + ns + na +
>> ss, data=Rohwer)
>> cc
>> ## Canonical correlation analysis of:
>> ##       5   X  variables:  n, s, ns, na, ss
>> ##   with        3   Y  variables:  SAT, PPVT, Raven
>> ##
>> ##     CanR  CanRSQ   Eigen percent    cum
>> ## 1 0.6703 0.44934 0.81599   77.30  77.30
>> ## 2 0.3837 0.14719 0.17260   16.35  93.65
>> ## 3 0.2506 0.06282 0.06704    6.35 100.00
>> ##
>> ## Test of H0: The canonical correlations in the
>> ## current row and all that follow are zero
>> ##
>>    ...
>> }
>>
>>
>
>


-- 
Kevin E. Thorpe
Head of Biostatistics,  Applied Health Research Centre (AHRC)
Li Ka Shing Knowledge Institute of St. Michael's
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.thorpe at utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016



More information about the R-help mailing list