[R] A comment about R:

John Fox jfox at mcmaster.ca
Thu Jan 5 17:17:52 CET 2006


Dear Peter,

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch 
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Peter 
> Muhlberger
> Sent: Wednesday, January 04, 2006 2:43 PM
> To: rhelp
> Subject: [R] A comment about R:
> 

. . .
 
> Ex. 1)  Wald tests of linear hypotheses after max. likelihood 
> or even after a regression.  "Wald" does not even appear in 
> my standard R package on a search.  There's no comment in the 
> lm help or optim help about what function to use for 
> hypothesis tests.  I know that statisticians prefer 
> likelihood ratio tests, but Wald tests are still useful and 
> indeed crucial for first-pass analysis.  After searching with 
> Google for some time, I found several Wald functions in 
> various contributed R packages I did not have installed.  One 
> confusion was which one would be relevant to my needs.  This 
> took some time to resolve.  I concluded, perhaps on 
> insufficient evidence, that package car's Wald test would be 
> most helpful.  To use it, however, one has to put together a 
> matrix for the hypotheses, which can be arduous for a 
> many-term regression or a complex hypothesis.  
> In comparison, 
> in Stata one simply states the hypothesis in symbolic terms.  
> I also don't know for certain that this function in car will 
> work or work properly w/ various kinds of output, say from lm 
> or from optim.  To be sure, I'd need to run time-consuming 
> tests comparing it with Stata output or examine the 
> function's code.  In Stata the test is easy to find, and 
> there's no uncertainty about where it can be run or its 
> accuracy.  Simply having a comment or "see also" in lm help 
> or mle or optim help pointing the user to the right Wald 
> function would be of enormous help.
> 


The reference, I believe, is to the linear.hypothesis() function, which has
methods for lm and glm objects. [To see what kinds of objects
linear.hypothesis is suitable for, use the command
methods(linear.hypothesis).] For lm objects, you get an F-test by default.
Note that the Anova() function, also in car, can more conveniently compute
Wald tests for certain kinds of hypotheses. More generally, however, I'd be
interested in your suggestions for an alternative method of specifying
linear hypotheses. There is currently no method for mle objects, but adding
one is a good idea, and I'll do that when I have a chance. (In the meantime,
it's very easy to compute Wald tests from the coefficients and the
hypothesis and coefficient-covariance matrices. Writing a small function to
do so, without the bells and whistles of something like linear.hypothesis(),
should not be hard. Indeed, the ability to do this kind of thing easily is
what I see as the primary advantage of working in a statistical computing
environment like R -- or Stata.)
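
For concreteness, here is a minimal sketch of such a function (not the car
implementation, just an illustration of the idea): it computes the usual Wald
statistic W = (Lb - rhs)' [L V L']^{-1} (Lb - rhs), which is chi-squared with
rank(L) degrees of freedom under the null hypothesis L beta = rhs. The model
and hypothesis below use the Prestige data set that ships with car, purely as
an example.

wald.test <- function(b, V, L, rhs=rep(0, nrow(L))){
	## H0: L %*% beta == rhs, where b is the estimated coefficient
	## vector and V its estimated covariance matrix
	discrep <- L %*% b - rhs
	W <- as.vector(t(discrep) %*% solve(L %*% V %*% t(L)) %*% discrep)
	df <- qr(L)$rank
	c(W=W, df=df, p=pchisq(W, df, lower.tail=FALSE))
}

library(car)
mod <- lm(prestige ~ income + education, data=Prestige)
L <- matrix(c(0, 1, -1), nrow=1)  # H0: income and education coefficients equal
wald.test(coef(mod), vcov(mod), L)
linear.hypothesis(mod, L)         # car's function (F-test by default), for comparison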

> Ex. 2) Getting neat output of a regression with Huberized 
> variance matrix.
> I frequently have to run regressions w/ robust variances.  In 
> Stata, one simply adds the word "robust" to the end of the 
> command or "cluster(cluster.variable)" for a cluster-robust 
> error.  In R, there are two functions, robcov and hccm.  I 
> had to run tests to figure out what the relationship is 
> between them and between them and Stata (robcov w/o cluster 
> gives hccm's hc0; hccm's hc1 is equivalent to Stata's 
> 'robust' w/o cluster; etc.).  A single sentence in hccm's 
> help saying something to the effect that statisticians prefer 
> hc3 for most types of data might save me from having to 
> scramble through the statistical literature to try to figure 
> out which of these I should be using.  A few sentences on 
> what the differences are between these methods would be even 
> better.  Then, there's the problem of output.  Given that hc1 
> or hc3 are preferred for non-clustered data, I'd need to be 
> able to get regression output of the form summary(lm) out of 
> hccm, for any practical use.  Getting this, however, would 
> require programming my own function.  Huberized t-stats for 
> regressions are a commonplace need; an R oriented a little 
> more toward everyday needs would not require programming 
> them.  Also, I'm not sure yet how well any of the 
> existing functions handle missing data.
> 

I think that we have a philosophical difference here: I don't like giving
advice in documentation. An egregious extended example of this, in my
opinion, is the SPSS documentation. The hccm() function uses hc3 as the
default, which is an implicit recommendation, but, more usefully in my view,
its help page points to Long and Ervin's American Statistician paper on the
subject, which does give advice and which is quite accessible. As well, and
more generally,
the car package is associated with a book (my R and S-PLUS Companion to
Applied Regression), which gives advice, though, admittedly, tersely in this
case.
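
To make the default concrete, here is a small illustration of hccm()'s type
argument (the Prestige data, which comes with car, is again used purely as an
example); each call returns the estimated coefficient-covariance matrix:

library(car)
mod <- lm(prestige ~ income + education, data=Prestige)
hccm(mod)                # HC3, the default
hccm(mod, type="hc0")    # White's original estimator
hccm(mod, type="hc1")    # the version you note matches Stata's 'robust'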

The Anova() function with argument white=TRUE will give you F-tests
corresponding to the t-tests to which you refer (though it will combine df
for multiple-df terms in the model). To get the kind of summary you
describe, you could use something like

mysummary <- function(model){
	## coefficient table with heteroscedasticity-consistent standard
	## errors; hccm() is from the car package (HC3 by default)
	b <- coef(model)
	se <- sqrt(diag(hccm(model)))    # robust standard errors
	t <- b/se
	p <- 2*pt(abs(t), df=model$df.residual, lower.tail=FALSE)
	table <- cbind(b, se, t, p)
	rownames(table) <- names(b)
	colnames(table) <- c("Estimate", "Std. Error", "t value", "Pr(>|t|)")
	table
}
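
For example (assuming car has been loaded for hccm(), and again using the
Prestige data purely as an illustration):

mod <- lm(prestige ~ income + education, data=Prestige)
mysummary(mod)   # coefficient table with HC3 standard errors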

Again, it's not time-consuming to write simple functions like this for one's
own use, and the ability to do so is a strength of R, in my view. 

I'm not sure what you mean about handling missing data: functions like
hccm(), linear.hypothesis(), and Anova() start with a model object for which
missing data have already been handled.

Regards,
 John



