summary() of lm() problem (PR#135)

fears@roycastle.liv.ac.uk
Tue, 9 Mar 1999 14:20:59 +0100


Debuggers,

I wrote to r-help about this and was appropriately told off by Peter
Dalgaard. I append that mail in case you have not seen it.

Following Peter's advice I have attempted to simplify the problem.

First, note that the following does *not* fail (by which I mean crash,
i.e. generate a memory access violation):

> tmp<-matrix(c(1,0,0,1,1,1),2,3)
> dimnames(tmp)<-list(NULL,c('yvar','x1','x2'))
> lm(tmp[,'yvar']~tmp[,'x1']+tmp[,'x2'])
> summary(.Last.value)

I tried to cut my original data set down to just the first ten rows to
make it manageable to transmit. Of course, with only ten rows, lm() then
returned NA estimates, so I wasn't totally surprised that summary() had
trouble. But, unlike the example above, it crashes fatally.
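Why ten rows produce NA estimates: with n = 10 observations, lm() can estimate at most 10 coefficients, so a model with an intercept and 27 predictors leaves at least 18 of them aliased and reported as NA. The sketch below illustrates this under stated assumptions: it uses made-up random data (the names `v1`..`v27` and `y` are hypothetical, not the sasch2 columns) and assumes a current R, where `paste0()` is available.

```r
# Hypothetical data: 10 rows of random numbers, NOT the sasch2 data.
set.seed(1)
n <- 10   # observations, as in the truncated data set
p <- 27   # predictors, as in the failing model
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("v", 1:p)))
d <- data.frame(y = rnorm(n), X)

# lm() accepts a rank-deficient design by default (singular.ok = TRUE):
# it keeps at most n columns of the model matrix and reports the
# remaining (aliased) coefficients as NA.
fit <- lm(y ~ ., data = d)
est <- coef(fit)

fit$rank                  # number of coefficients actually estimated
sum(is.na(est))           # number of coefficients reported as NA
names(est)[is.na(est)]    # which terms were aliased
```

With real data such as sasch2, which has many all-zero columns, the rank can be lower still, so "at most 10" is an upper bound rather than an exact count.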

To reproduce this very simply, I used the following (sorry, quick and
dirty; I know there's a way to use paste() to build the model formula):

> tmp<-matrix(c(1,0,0,1,rep(1,56)),2,30)
> dimnames(tmp)<-list(NULL,paste('x',1:30,sep=''))
> lm(tmp[, "x1"] ~ tmp[, "x2"] + tmp[, "x3"] + tmp[, "x4"] + tmp[, "x4"] +
+    tmp[, "x5"] + tmp[, "x6"] + tmp[, "x7"] + tmp[, "x8"] + tmp[, "x9"] +
+    tmp[, "x10"] + tmp[, "x11"] + tmp[, "x12"] + tmp[, "x13"] +
+    tmp[, "x14"] + tmp[, "x15"] + tmp[, "x16"] + tmp[, "x17"] +
+    tmp[, "x18"] + tmp[, "x19"] + tmp[, "x20"] + tmp[, "x21"] +
+    tmp[, "x22"] + tmp[, "x23"] + tmp[, "x24"] + tmp[, "x25"] +
+    tmp[, "x26"] + tmp[, "x27"] + tmp[, "x28"] + tmp[, "x29"] +
+    tmp[, "x30"])
> summary(.Last.value)

But this runs without any problem. So neither the singularity of X nor
the length of the model formula seems to be at fault (my problem data
has 27 variables).
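The paste() approach alluded to above might look like the following in a current R. It is a sketch, not the code used in the report; it passes the columns as a data frame so the formula can name them directly, and, unlike the transcript, it lists x4 only once.

```r
# Rebuild the 2 x 30 test matrix from the transcript above.
tmp <- matrix(c(1, 0, 0, 1, rep(1, 56)), 2, 30)
dimnames(tmp) <- list(NULL, paste("x", 1:30, sep = ""))

# Paste the column names into "x1 ~ x2 + x3 + ... + x30" rather than
# typing each tmp[, "xk"] term by hand.
rhs <- paste(colnames(tmp)[-1], collapse = " + ")
fml <- as.formula(paste("x1 ~", rhs))

# The formula now refers to variables by name, so supply them as a data frame.
fit <- lm(fml, data = as.data.frame(tmp))
coef(fit)
```

In current R, `reformulate(colnames(tmp)[-1], response = "x1")` builds the same formula in one call.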

What follows is what *does* give a fault. The data (in sasch2) are
truncated to the first 10 rows, and I kept the name sasch2 for the
truncated data set so that I could cut and paste the exact same lm() call:

> tmp<-sasch2[1:10,]
> holdsasch2<-sasch2
> sasch2<-tmp
> dump('sasch2','bugs.dump')
> lm(sasch2[, "ddiff"] ~ sasch2[, "td30"] + sasch2[, "td60"] +
+    sasch2[, "td90"] + sasch2[, "td120"] + +sasch2[, "td180"] +
+    sasch2[, "td240"] + sasch2[, "td300"] + sasch2[, "td360"] +
+    sasch2[, "td420"] + sasch2[, "td480"] + sasch2[, "db1"] +
+    sasch2[, "db1.5"] + sasch2[, "db2"] + sasch2[, "db2.5"] +
+    +sasch2[, "db3.5"] + sasch2[, "db4"] + sasch2[, "db4.5"] +
+    sasch2[, "db5"] + sasch2[, "db5.5"] + +sasch2[, "db6"] +
+    sasch2[, "db6.5"] + sasch2[, "db7"] + sasch2[, "db7.5"] +
+    sasch2[, "db8"] + +sasch2[, "db8.5"] + sasch2[, "db9"] +
+    sasch2[, "db9.5"])
> summary(.Last.value)

Dr Watson tells me the access violation is 0xc0000005 at address
0x2020200 (whatever that means; I last used machine code on a PDP-8).

The data as dumped now follows (though it is anonymised, please continue
to treat the data as confidential). After that is my original report to
r-help.

I'd be very happy to provide any other information you might need. PD
suggested running it on Unix, but my version there has been hopelessly
out of date since I started using NT. I hope I've given you enough here
to try it yourself on the latest Unix version without too much bother.

"sasch2" <-
structure(c(1, 1, 1, 2, 2, 3, 4, 5, 5, 6, 3, 2, 0.5, 0, 2, 4, 
6, 1.5, 6, 5, 225, 175, 180, 255, 140, 140, 236, 315, 90, 190, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 
0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 2, 5, 7, 3, 3, 5, 4, 2.5, 4, 3, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 611231, 611231, 611231, 611041, 611041, 
611553, 605829, 604881, 604881, 612966, 4, 4, 4, 4, 4, 4, 3, 
3, 3, 2, 25, 25, 25, 29, 29, 29, 31, 28, 28, 18, 289, 289, 289, 
279, 279, 289, 296, 268, 268, 281, 1, 1, 1, 2, 2, 2, 2, 1, 1, 
1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
655, 655, 655, 780, 780, 180, 347, 295, 295, 240, 1, 1, 1, 1, 
1, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 
0, 0, 0, 0, 0, 5, 5, 5, 4, 4, 4, 4, 1, 1, 1, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 2, 2, 2, 3, 3, 1, 2, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 5, 5, 3, 4, 3, 3, 
2, 234, 234, 234, 24, 24, 12, 17, 127, 127, 27, 1, 1, 1, 1, 1, 
3, 3, 1, 1, 1, 2, 2, 2, 2, 2, 4, 4, 2, 2, 0, 300, 300, 300, 100, 
100, 200, 500, 200, 200, 400, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 
1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 809782, 
809782, 809782, 809790, 809790, 809797, 809813, 809832, 809832, 
809840, 3600, 3600, 3600, 3740, 3740, 3280, 3090, 3100, 3100, 
4110, 4, 4, 4, 9, 9, 9, 9, 10, 10, 9, 9, 9, 9, 10, 10, 10, 9, 
10, 10, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7.38, 7.38, 7.38, 7.34, 
7.34, 7.27, 7.33, 7.3, 7.3, 7.29, NA, NA, NA, -3.4, -3.4, -5.8, 
-6.4, -2, -2, -7.2, 2, 2, 2, 3, 3, 5, 4, 2.5, 2.5, 3, 225, 225, 
225, 255, 255, 140, 236, 315, 315, 190, 5, 5, 5, 3, 3, 9, 10, 
4, 4, 8, 400, 400, 400, 395, 395, NA, NA, 405, 405, 210, 7, 7, 
7, 5, 5, NA, NA, 10, 10, 10, 580, 580, 580, NA, NA, NA, NA, NA, 
NA, NA, 7.5, 7.5, 7.5, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
315, 315, 190, NA, NA, NA, NA, NA, NA, NA, 4, 4, 8, 440, 440, 
440, 390, 390, NA, NA, NA, NA, NA, NA, NA, NA, 3, 3, NA, NA, 
NA, NA, NA), .Dim = c(10, 83), .Dimnames = list(NULL, c("pxid", 
"ddiff", "tdiff", "td30", "td60", "td90", "td120", "td180", "td240", 
"td300", "td360", "td420", "td480", "td2000", "dbase", "db1", 
"db1.5", "db2", "db2.5", "db3", "db3.5", "db4", "db4.5", "db5", 
"db5.5", "db6", "db6.5", "db7", "db7.5", "db8", "db8.5", "db9", 
"db9.5", "lwh.num", "partogram", "age", "gestation", "cervix", 
"effaced", "membranes", "rd.interval", "action.line", "synto", 
"synto.time", "rom", "blood", "ctg", "fbs", "fbs.no", "iupc", 
"amnioinf", "ves", "anaesth", "delivery.mode", "perineum", "blood.loss",
"transfus", "drugs", "retained.pl", "baby.no", "weight", "apgar1", 
"apgar5", "scbu", "ph.ven", "be.ven", "dilat1", "time2", "dilat2", 
"time3", "dilat3", "time4", "dilat4", "time5", "dilat5", "time6", 
"dilat6", "time7", "dilat7", "srom.time", "srom.dilat", "epi.time", 
"epi.dilat")))


************************************************************************

Under NT 4.0, using Version 0.63.2 Beta (Jan 12, 1999):

I'm not sure whether this is a bug or a feature (forcing me to program
less clumsily), so I'll report it here rather than to bugs.

With a medium-size data set (1700 observations, 70 explanatory variables)
and plenty of memory, specifically

> gc()
          free   total
Ncells  886738 1000000
Vcells 7912909 8388608

I get a fatal error when attempting summary() on the fit of an lm() on a
large-ish set of dummy variables (stored in a matrix):

Call:
lm(formula = sasch2[, "ddiff"] ~ sasch2[, "td30"] + sasch2[, "td60"] +
    sasch2[, "td90"] + sasch2[, "td120"] + +sasch2[, "td180"] +
    sasch2[, "td240"] + sasch2[, "td300"] + sasch2[, "td360"] +
    sasch2[, "td420"] + sasch2[, "td480"] + sasch2[, "db1"] +
    sasch2[, "db1.5"] + sasch2[, "db2"] + sasch2[, "db2.5"] +
    +sasch2[, "db3.5"] + sasch2[, "db4"] + sasch2[, "db4.5"] +
    sasch2[, "db5"] + sasch2[, "db5.5"] + +sasch2[, "db6"] +
    sasch2[, "db6.5"] + sasch2[, "db7"] + sasch2[, "db7.5"] +
    sasch2[, "db8"] + +sasch2[, "db8.5"] + sasch2[, "db9"] +
    sasch2[, "db9.5"])

I get estimates OK, but summary() collapses. However, if I do the same
thing less clumsily, by writing all the relevant variables to a new data
frame, and then calling

Call:
lm(formula = ddiff ~ ., data = dtmp)

I not only get the estimates but can also run summary() with no problem.
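The data-frame workaround described above can be sketched as follows. This is an illustration under assumptions, not the author's code: `fit_ddiff()` is a hypothetical helper, it assumes a current R (for `paste0()`), and the predictor list is copied from the failing call (note that it omits td2000, dbase and db3, as that call does).

```r
# Predictor names copied from the failing lm() call in the report.
td_vars <- paste0("td", c(30, 60, 90, 120, 180, 240, 300, 360, 420, 480))
db_vars <- paste0("db", c(1, 1.5, 2, 2.5, 3.5, 4, 4.5, 5, 5.5, 6,
                          6.5, 7, 7.5, 8, 8.5, 9, 9.5))

# Hypothetical helper: copy the response and the 27 predictors out of the
# numeric matrix into a data frame, then fit with the short formula
# ddiff ~ . (the dot expands to every column except the response).
fit_ddiff <- function(mat) {
  dtmp <- as.data.frame(mat[, c("ddiff", td_vars, db_vars)])
  lm(ddiff ~ ., data = dtmp)
}
```

For example, `summary(fit_ddiff(sasch2))` would fit the same 27-predictor model as the long call, with the coefficients labelled by column name instead of by `sasch2[, "..."]`.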

Any ideas why? It seems to be memory-related, because I can run lm() and
summary() on the matrix versions when using only the sasch2[,'td*'] or
only the db* variable sets.

Simon Fear

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._