[Rd] discrepancy between lm and MASS:rlm

William Dunlap wdunlap at tibco.com
Mon Mar 14 19:38:34 CET 2011


> -----Original Message-----
> From: r-devel-bounces at r-project.org 
> [mailto:r-devel-bounces at r-project.org] On Behalf Of Vadim Ogranovich
> Sent: Monday, March 14, 2011 10:37 AM
> To: 'r-devel at r-project.org'
> Subject: [Rd] discrepancy between lm and MASS:rlm
> 
> Dear R-devel,
> 
> There seems to be a discrepancy in the order in which lm and 
> rlm evaluate their arguments. This causes rlm to sometimes 
> produce an error where lm is just fine.

It may not be a problem with the order of evaluation.  rlm()
might not be calling model.frame() with drop.unused.levels=TRUE.
I've made that mistake before with similar symptoms.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> 
> Here is a little script that illustrate the issue:
> 
> > library(MASS)
> > ## create data
> > n <- 100
> > dat <- data.frame(x=rep(c(-1,0,1), n), y=rnorm(3*n))
> >
> > ## call lm, works fine
> > summary(lm(y ~ as.factor(x), data=dat, subset=x!=0))
> 
> Call:
> lm(formula = y ~ as.factor(x), data = dat, subset = x != 0)
> 
> Residuals:
>      Min       1Q   Median       3Q      Max
> -2.60619 -0.82160  0.06307  0.65501  2.56677
> 
> Coefficients:
>               Estimate Std. Error t value Pr(>|t|)
> (Intercept)   0.061010   0.100027   0.610    0.543
> as.factor(x)1 0.001332   0.141459   0.009    0.992
> 
> Residual standard error: 1 on 198 degrees of freedom
> Multiple R-squared: 4.479e-07,  Adjusted R-squared: -0.00505
> F-statistic: 8.868e-05 on 1 and 198 DF,  p-value: 0.9925
> 
> > ## call rlm, error
> > summary(rlm(y ~ as.factor(x), data=dat, subset=x!=0))
> Error in rlm.default(x, y, weights, method = method, 
> wt.method = wt.method,  :
>   'x' is singular: singular fits are not implemented in rlm
> >
> 
> 
> My guess is that rlm first converts x to a factor, which 
> becomes a three-level factor, then subsets on x!=0, which 
> effectively eliminates a level, and then creates a 
> "regression" matrix, which becomes singular due to the 
> absence of data for a level.
> 
> Is there a simple way to work around it. The simplest I could 
> think of is
> with(subset(dat, x!=0), rlm(y ~ as.factor(x))
> which is ok, but most of my scripts make use of data arg to 
> regressions and I'd like to stay consistent as much as practical.
> 
> Thanks,
> Vadim
> 
> 
> Note: This email is for the confidential use of the named 
> addressee(s) only and may contain proprietary, confidential 
> or privileged information. If you are not the intended 
> recipient, you are hereby notified that any review, 
> dissemination or copying of this email is strictly 
> prohibited, and to please notify the sender immediately and 
> destroy this email and any attachments.  Email transmission 
> cannot be guaranteed to be secure or error-free.  Jump 
> Trading, therefore, does not make any guarantees as to the 
> completeness or accuracy of this email or any attachments.  
> This email is for informational purposes only and does not 
> constitute a recommendation, offer, request or solicitation 
> of any kind to buy, sell, subscribe, redeem or perform any 
> type of transaction of a financial product.
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 



More information about the R-devel mailing list