[R] Using MASS::boxcox for a single variable gives different results than the original paper

Tal Galili tal.galili at gmail.com
Mon Oct 12 16:30:42 CEST 2015


Peter and John - thank you both for the answers.


Tal




----------------Contact
Details:-------------------------------------------------------
Contact me: Tal.Galili at gmail.com |
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
----------------------------------------------------------------------------------------------


On Mon, Oct 12, 2015 at 4:45 PM, Fox, John <jfox at mcmaster.ca> wrote:

> Dear Tal,
>
> MASS:boxcox() evaluates the pseudo-log-likelihood at a pre-specified
> vector of values of the transformation parameter lambda. In your example,
>
> > head(a$x)
> [1] -2.000000 -1.959596 -1.919192 -1.878788 -1.838384 -1.797980
>
> Which accounts, I think, for the small difference in the answer.
>
> I hope this helps,
>  John
>
> -----------------------------
> John Fox, Professor
> McMaster University
> Hamilton, Ontario
> Canada L8S 4M4
> Web: socserv.mcmaster.ca/jfox
>
>
>
> > -----Original Message-----
> > From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Tal
> Galili
> > Sent: October 12, 2015 9:32 AM
> > To: r-help at r-project.org
> > Subject: Re: [R] Using MASS::boxcox for a single variable gives
> different results
> > than the original paper
> >
> > After trying this with the function "estimateTransform" from {car}, it
> returns
> > values similar to my solution rather than the one from MASS::boxcox:
> >
> >
> > # Toy data
> > ################
> > set.seed(13241089)
> > x <- rnorm(1000, 10)
> > x2 <- x**2 # we want to transform x2 to something more normal
> >
> >
> >
> > # using MASS::boxcox
> > ################
> >
> > mle <- function(BC) {
> > with(BC, x[which.max(y)])
> > }
> >
> > ONES <- rep(1, length(x2))
> > a <- MASS::boxcox(lm(x2 ~ ONES))
> > mle(a)
> > # lambda:
> > # 0.42424
> >
> >
> >
> > # using estimateTransform from car
> > ################
> >
> > # Same result as the paper: !
> > library(car)
> > ONES <- rep(1, length(x2))
> > estimateTransform(X=data.frame(x = ONES), Y = x2) # lambda:
> > # 0.40782
> >
> > (just as with my own function in the previous email)
> >
> >
> >
> > What am I missing?
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > ----------------Contact
> > Details:-------------------------------------------------------
> > Contact me: Tal.Galili at gmail.com |
> > Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
> > www.r-statistics.com (English)
> >
> ----------------------------------------------------------------------------------------------
> >
> >
> > On Mon, Oct 12, 2015 at 12:17 PM, Tal Galili <tal.galili at gmail.com>
> wrote:
> >
> > > Hello all,
> > >
> > > Given a set of observations, I would like to find lambda for a boxcox
> > > transformation so to get a symmetric/normal result as possible.
> > >
> > > I try to use MASS::boxcox, but get different results than when using
> > > the formula from the original box-cox paper (link
> > > <http://www.jstor.org/stable/2984418?seq=1#page_scan_tab_contents>).
> > >
> > > I probably have made an error somewhere, but I can't figure out where.
> > >
> > > Here is an example in which the lambda by MASS::boxcox is 0.42424,
> > > while by the formula from the paper I get 0.40782.
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > # Toy data
> > > ################
> > > set.seed(13241089)
> > > x <- rnorm(1000, 10)
> > > x2 <- x**2 # we want to transform x2 to something more normal
> > > plot(density(x2))
> > >
> > > # using MASS::boxcox
> > > ################
> > >
> > > zpoints <- function(y) {
> > > n <- length(y)
> > > qnorm(ppoints(n))[order(order(y))]
> > > }
> > > mle <- function(BC) {
> > > with(BC, x[which.max(y)])
> > > }
> > >
> > > a <- MASS::boxcox(x2 ~ zpoints(x2))
> > > mle(a)
> > > # lambda:
> > > # 0.42424
> > >
> > >
> > >
> > > # using formula from the paper
> > > ################
> > >
> > > loglik_lambda <- function(l, y) {
> > > GM <- exp(mean(log(y)))
> > > if(l==0) x <- log(y)*GM else x <- (y^l-1)/ (l * GM^(l-1) ) #  if(l==0)
> > > x <- log(y) else x <- (y^l-1)/ (l )
> > > sd(x)
> > > }
> > > fo <- function(l) loglik_lambda(l, y = x2) V_fo <- Vectorize(fo)
> > > V_fo(2)
> > > curve(V_fo, -.5,1.5)
> > > optimize(V_fo, c(-3,3))
> > > # lambda:
> > > # 0.40782
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> >
> >       [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>

	[[alternative HTML version deleted]]



More information about the R-help mailing list