[R] printCoefmat() and zap.ind

Martin Maechler m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Fri Jul 7 18:12:24 CEST 2023


>>>>> Shu Fai Cheung 
>>>>>     on Thu, 6 Jul 2023 17:14:27 +0800 writes:

    > Hi All,

    > I would like to ask two questions about printCoefmat().

Good... this function, originally named print.coefmat(),
is 25 years old (in R) now:

  --------------------------------------------------------------------
  r1902 | maechler | 1998-08-14 19:19:05 +0200 (Fri, 14 Aug 1998) |
  Changed paths:
     M R-0-62-patches/CHANGES
     M R-0-62-patches/src/library/base/R/anova.R
     M R-0-62-patches/src/library/base/R/glm.R
     M R-0-62-patches/src/library/base/R/lm.R
     M R-0-62-patches/src/library/base/R/print.R

  print.coefmat(.) about ok
  --------------------------------------------------------------------

  (yes, at the time, the 'stats' package did not exist yet ..)

so it may be a good time to look at it.


    > First, I found a behavior of printCoefmat() that looks strange to me,
    > but I am not sure whether this is an intended behavior:

    > ``` r
    > set.seed(5689417)
    > n <- 10000
    > x1 <- rnorm(n)
    > x2 <- rnorm(n)
    > y <- .5 * x1 + .6 * x2 + rnorm(n, -0.0002366, .2)
    > dat <- data.frame(x1, x2, y)
    > out <- lm(y ~ x1 + x2, dat)
    > out_summary <- summary(out)
    > printCoefmat(out_summary$coefficients)
    > #>               Estimate Std. Error t value Pr(>|t|)
    > #> (Intercept) 1.7228e-08 1.9908e-03    0.00        1
    > #> x1          5.0212e-01 1.9715e-03  254.70   <2e-16 ***
    > #> x2          6.0016e-01 1.9924e-03  301.23   <2e-16 ***
    > #> ---
    > #> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    > printCoefmat(out_summary$coefficients,
    > zap.ind = 1,
    > digits = 4)
    > #>             Estimate Std. Error t value Pr(>|t|)
    > #> (Intercept) 0.000000   0.001991     0.0        1
    > #> x1          0.502100   0.001971   254.7   <2e-16 ***
    > #> x2          0.600200   0.001992   301.2   <2e-16 ***
    > #> ---
    > #> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    > ```

    > With zap.ind = 1, the values in "Estimate" were correctly
    > zapped using digits = 4. However, by default, "Estimate"
    > and "Std. Error" are formatted together. Because the
    > standard errors are small, with digits = 4, zeros were added
    > to values in "Estimate", resulting in "0.502100" and
    > "0.600200", which are misleading because, if rounded to
    > the 6th decimal place, the values to be displayed should
    > be "0.502122" and "0.600162".

    > Is this behavior of printCoefmat() intended/normal?

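Yes, this is "normal" in the sense that zapsmall() is used.
I'm not even sure anymore whether I was fully aware in 1998 of what exactly
the simple zapsmall() function does: it rounds *all* values to a number of
decimals determined by 'digits' and the largest absolute value, rather than
only setting the tiny values to zero.
That is not what you want here (nor what one actually *typically* wants
when formatting numbers for display, plotting, etc.).
What you "really want" here and in such situations is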

  zapOnlysmall <- function(x, dig) {
      x[abs(x) <= 10^-dig] <- 0
      x
  }

and I think I'd replace the use of zapsmall() inside
printCoefmat() with something like zapOnlysmall() above.

This will indeed nicely solve your problem.
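
As a small illustration (a sketch only; the numbers are copied from your
printed output, and `est` is just an illustrative name), compare what the
two functions do to the "Estimate" column with 4 digits:

```r
# Estimates as reported in the post (the intercept is essentially zero)
est <- c("(Intercept)" = 1.7228e-08, x1 = 0.502122, x2 = 0.600162)

# Helper proposed above: only values with |x| <= 10^-dig become 0
zapOnlysmall <- function(x, dig) {
    x[abs(x) <= 10^-dig] <- 0
    x
}

zapped <- zapsmall(est, digits = 4)   # rounds *every* value
kept   <- zapOnlysmall(est, dig = 4)  # zeroes the intercept only

print(zapped, digits = 7)
print(kept, digits = 7)
```

Here zapsmall() turns 0.502122 into 0.5021, which is where the misleading
"0.502100" comes from, whereas zapOnlysmall() leaves the two slopes untouched.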


    > Second, how can I use zap without this behavior?
    > In cases like the one above, I need to use zap such that
    > the intercept will not be displayed in scientific notation.
    > Disabling scientific notation cannot achieve the desired
    > goal.


    > I tried adding cs.ind = 1:

Well, according to the help page ?printCoefmat,
cs.ind is really about the [ind]ices of [c]oefficient + [s]cale or [s]td.err.
So, for lm() you should not have to set cs.ind but rather keep
it at its smart default of cs.ind = 1:2 .
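
Until printCoefmat() itself is changed, one possible user-side workaround
(a sketch, not what printCoefmat() does internally) is to set the tiny
estimate to zero in the matrix yourself and then print with the default
cs.ind. The matrix `cm` below is hand-built from the numbers in your output
(the two tiny p-values are simply entered as 0), and zapOnlysmall() is the
helper proposed above:

```r
# Hand-built coefficient matrix, values copied from the printed output
cm <- matrix(c(1.7228e-08, 0.502122,  0.600162,    # Estimate
               0.0019908,  0.0019715, 0.0019924,   # Std. Error
               0.00,       254.70,    301.23,      # t value
               1,          0,         0),          # Pr(>|t|), placeholders
             nrow = 3,
             dimnames = list(c("(Intercept)", "x1", "x2"),
                             c("Estimate", "Std. Error",
                               "t value", "Pr(>|t|)")))

# Only values with |x| <= 10^-dig are set to 0; nothing else is rounded
zapOnlysmall <- function(x, dig) {
    x[abs(x) <= 10^-dig] <- 0
    x
}

cm0 <- cm
cm0[, "Estimate"] <- zapOnlysmall(cm0[, "Estimate"], dig = 4)

# Default cs.ind = 1:2, no zap.ind: estimates and standard errors
# are still formatted jointly
printCoefmat(cm0, digits = 4)
```

With the intercept exactly zero it should no longer be shown in scientific
notation, and the slopes are not rounded to 4 decimals before formatting.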


    > ```r
    > printCoefmat(out_summary$coefficients,
    > zap.ind = 1,
    > digits = 4,
    > cs.ind = 1)
    > #>             Estimate Std. Error t value Pr(>|t|)
    > #> (Intercept)   0.0000   0.001991     0.0        1
    > #> x1            0.5021   0.001971   254.7   <2e-16 ***
    > #> x2            0.6002   0.001992   301.2   <2e-16 ***
    > #> ---
    > #> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    > ```

    > However, this solution is not ideal because the numbers
    > of decimal places of "Estimate" and "Std. Error" are
    > different. How can I get the output like this one?


    > ```r
    > #>             Estimate Std. Error t value Pr(>|t|)
    > #> (Intercept)   0.0000   0.0020     0.0        1
    > #> x1            0.5021   0.0020   254.7   <2e-16 ***
    > #> x2            0.6002   0.0020   301.2   <2e-16 ***
    > ```

    > Thanks for your attention.

    > Regards,
    > Shu Fai Cheung

Thank you, Shu Fai,
for your careful and thoughtful report!

Best regards,
Martin


