[R] Comparison: glm() vs. bigglm()
    Benilton Carvalho 
    bcarvalh at jhsph.edu
       
    Fri Jun 29 09:14:12 CEST 2007
    
    
  
Hi,
Until now, I thought that the results of glm() and bigglm() would  
coincide. Probably a naive assumption?
Anyway, I've been using bigglm() on some datasets I have available.  
One of the sets has >15M observations.
I have 3 continuous predictors (A, B, C) and a binary outcome (Y),  
and I tried the following:
m1 <- bigglm(Y~A+B+C, family=binomial(), data=dataset1, chunksize=10e6)
m2 <- bigglm(Y~A*B+C, family=binomial(), data=dataset1, chunksize=10e6)
imp <- m1$deviance-m2$deviance
To my surprise, "imp" was negative, i.e. the larger model (with the
A:B interaction) had the higher deviance.
I then tried the same models, using glm() instead... and as I  
expected, "imp" was positive.
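For reference, the glm() comparison looks roughly like the sketch below. The data are simulated stand-ins (the name "sim", the sample size and the coefficients are made up for illustration, not taken from dataset1). Since the first model is nested in the second, the deviance difference should not be negative once both fits have converged.

```r
# Simulated stand-in data (hypothetical; NOT the original dataset1)
set.seed(1)
n <- 1000
sim <- data.frame(A = rnorm(n), B = rnorm(n), C = rnorm(n))
eta <- -0.5 + 0.8 * sim$A - 0.4 * sim$B + 0.3 * sim$A * sim$B + 0.2 * sim$C
sim$Y <- rbinom(n, 1, plogis(eta))

# Same two nested models as above, fitted with glm()
g1 <- glm(Y ~ A + B + C, family = binomial(), data = sim)
g2 <- glm(Y ~ A * B + C, family = binomial(), data = sim)

# Deviance of the smaller model minus deviance of the larger model
imp_glm <- deviance(g1) - deviance(g2)

# Both fits should report convergence, and imp_glm should be non-negative
print(c(g1 = g1$converged, g2 = g2$converged))
print(imp_glm)
```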
I also noticed differences in the coefficients estimated by glm() and  
bigglm() - small differences, though, and the CIs for a given  
coefficient overlap across the two methods.
Are such discrepancies expected? What can I use to check for  
convergence with bigglm(), as a lack of convergence might be one  
plausible cause for a negative difference in the deviances?
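To make the question concrete, this is the kind of check I would do with glm(), and I am looking for the bigglm() equivalent. It is only a sketch on made-up data ("sim2" and the slope 1.5 are hypothetical); it shows that a fit stopped after one IRLS iteration is flagged as not converged and has a higher deviance than the fully iterated fit of the same model.

```r
# Simulated stand-in data (hypothetical; NOT the original dataset1)
set.seed(2)
n <- 500
sim2 <- data.frame(A = rnorm(n))
sim2$Y <- rbinom(n, 1, plogis(1.5 * sim2$A))

# Default fit: iterates until the relative change in deviance is small
full <- glm(Y ~ A, family = binomial(), data = sim2)

# Same model, deliberately stopped after a single iteration;
# glm() warns about non-convergence, which is suppressed here
short <- suppressWarnings(
  glm(Y ~ A, family = binomial(), data = sim2,
      control = glm.control(maxit = 1))
)

# Convergence flag and number of iterations used by each fit
print(c(full = full$converged, short = short$converged))
print(c(full = full$iter, short = short$iter))

# The truncated fit has not reached the minimum deviance yet
print(deviance(short) - deviance(full))
```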
Thank you very much,
-benilton
 > sessionInfo()
R version 2.5.0 (2007-04-23)
x86_64-unknown-linux-gnu
locale:
LC_CTYPE=en_US.iso885915;LC_NUMERIC=C;LC_TIME=en_US.iso885915;LC_COLLATE=en_US.iso885915;LC_MONETARY=en_US.iso885915;LC_MESSAGES=en_US.iso885915;LC_PAPER=en_US.iso885915;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.iso885915;LC_IDENTIFICATION=C
attached base packages:
[1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"
[7] "base"
other attached packages:
biglm
"0.4"
    
    