summary() of lm() problem (PR#135)
Tue, 9 Mar 1999 14:20:59 +0100


I wrote to r-help about this and was appropriately told off by Peter
Dalgaard. I append that mail in case you have not seen it.

Following Peter's advice I have attempted to simplify the problem.

First note that the following does *not* fail (by which I mean crash, as
in generate a memory access violation):

> tmp<-matrix(c(1,0,0,1,1,1),2,3)
> dimnames(tmp)<-list(NULL,c('yvar','x1','x2'))
> lm(tmp[,'yvar']~tmp[,'x1']+tmp[,'x2'])
> summary(.Last.value)

I tried to cut down my original data set to just the first ten rows to
make it manageable to transmit. Of course then when I ran lm() there
were NA estimates. Thus I wasn't totally surprised that summary() would
have trouble. But, unlike the above, it crashes fatally.
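
For reference, NA estimates of this kind can be spotted before summary()
is ever called. A minimal sketch, using the small 2 x 3 matrix from the
first example rather than the real data (the names fit and na.coef are
illustrative only):

```r
# Small example from above: 2 observations, intercept plus 2 regressors,
# so at most 2 coefficients can be estimated and the rest come back NA.
tmp <- matrix(c(1, 0, 0, 1, 1, 1), 2, 3)
dimnames(tmp) <- list(NULL, c("yvar", "x1", "x2"))
fit <- lm(tmp[, "yvar"] ~ tmp[, "x1"] + tmp[, "x2"])

# Coefficients that could not be estimated are reported as NA.
na.coef <- is.na(coef(fit))
names(coef(fit))[na.coef]

# Rank of the fitted model versus the number of coefficients requested.
fit$rank
length(coef(fit))
```

A rank smaller than the number of coefficients means the design matrix
is singular, which is the situation described here.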

Hoping to reproduce this very simply, I used the following (sorry, quick
and dirty; I know there's a way to use paste() to build the model formula):

> tmp<-matrix(c(1,0,0,1,rep(1,56)),2,30)
> dimnames(tmp)<-list(NULL,paste('x',1:30,sep=''))
> lm(tmp[, "x1"] ~ tmp[, "x2"] + tmp[, "x3"] + tmp[,     "x4"] + tmp[,
"x4"] + tmp[, "x5"] + tmp[, "x6"] + tmp[, "x7"] +     tmp[, "x8"] +
tmp[, "x9"] + tmp[, "x10"] + tmp[, "x11"] +     tmp[, "x12"] + tmp[,
"x13"] + tmp[, "x14"] + tmp[, "x15"] +     tmp[, "x16"] + tmp[, "x17"] +
tmp[, "x18"] + tmp[, "x19"] +     tmp[, "x20"]+tmp[, "x21"]+tmp[,
"x22"]+tmp[, "x23"]+tmp[, "x24"]+tmp[, "x25"]+tmp[, "x26"]+tmp[,
"x27"]+tmp[, "x28"]+tmp[, "x29"]+tmp[, "x30"])
> summary(.Last.value)

But this runs without any problem. So neither singularity of X nor the
length of the model formula seems to be at fault (my problem data has 27
explanatory variables).
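
On building the formula with paste(): a minimal sketch of one way to do
it for the 2 x 30 example, assuming the same matrix tmp. The names rhs,
fml and fit are illustrative only, and the generated formula includes
each column exactly once:

```r
# Same 2 x 30 matrix as in the example; columns are named x1 .. x30.
tmp <- matrix(c(1, 0, 0, 1, rep(1, 56)), 2, 30)
dimnames(tmp) <- list(NULL, paste("x", 1:30, sep = ""))

# Build the right-hand side 'tmp[, "x2"] + ... + tmp[, "x30"]' as text.
rhs <- paste("tmp[, \"", colnames(tmp)[-1], "\"]", sep = "",
             collapse = " + ")

# Turn the text into a formula with x1 as the response, then fit.
fml <- as.formula(paste("tmp[, \"x1\"] ~", rhs))
fit <- lm(fml)
summary(fit)
```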

What follows now is what *does* give a fault. The data is truncated to
just the first 10 rows. I kept the name sasch2 for the truncated dataset
so that I could cut and paste the exact same lm() call:

> tmp<-sasch2[1:10,]
> holdsasch2<-sasch2
> sasch2<-tmp
> dump('sasch2','bugs.dump')
> lm(sasch2[, "ddiff"] ~ sasch2[, "td30"] + sasch2[,     "td60"] +
sasch2[, "td90"] + sasch2[, 
+ "td120"] + +sasch2[,     "td180"] + sasch2[, "td240"] + sasch2[,
"td300"] + sasch2[,     "td360"] + 
+ sasch2[, "td420"] + sasch2[, "td480"] + sasch2[,     "db1"] + sasch2[,
"db1.5"] + sasch2[, "db2"] + 
+ sasch2[, "db2.5"] +     +sasch2[, "db3.5"] + sasch2[, "db4"] +
sasch2[, "db4.5"] +     sasch2[, "db5"] + 
+ sasch2[, "db5.5"] + +sasch2[, "db6"] +     sasch2[, "db6.5"] +
sasch2[, "db7"] + sasch2[, "db7.5"] +     
+ sasch2[, "db8"] + +sasch2[, "db8.5"] + sasch2[, "db9"] +     sasch2[,
"db9.5"])
> summary(.Last.value)

Dr Watson tells me the access violation is 0xc0000005 at address
0x2020200 (whatever that means; I last used machine code on a PDP 8).

The data as dumped now follows (though it is anonymised, please continue
to treat the data as confidential). After that is my original report to
r-help.
I'd be very happy to provide any other info you might need to know. PD
suggested running on Unix but my version there is hopelessly out of date
since I started using NT. I hope I've given you enough here to try it
for yourself on the latest Unix version without too much bother.

"sasch2" <-
structure(c(1, 1, 1, 2, 2, 3, 4, 5, 5, 6, 3, 2, 0.5, 0, 2, 4, 
6, 1.5, 6, 5, 225, 175, 180, 255, 140, 140, 236, 315, 90, 190, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 
0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 2, 5, 7, 3, 3, 5, 4, 2.5, 4, 3, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 611231, 611231, 611231, 611041, 611041, 
611553, 605829, 604881, 604881, 612966, 4, 4, 4, 4, 4, 4, 3, 
3, 3, 2, 25, 25, 25, 29, 29, 29, 31, 28, 28, 18, 289, 289, 289, 
279, 279, 289, 296, 268, 268, 281, 1, 1, 1, 2, 2, 2, 2, 1, 1, 
1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
655, 655, 655, 780, 780, 180, 347, 295, 295, 240, 1, 1, 1, 1, 
1, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 
0, 0, 0, 0, 0, 5, 5, 5, 4, 4, 4, 4, 1, 1, 1, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 2, 2, 2, 3, 3, 1, 2, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 5, 5, 3, 4, 3, 3, 
2, 234, 234, 234, 24, 24, 12, 17, 127, 127, 27, 1, 1, 1, 1, 1, 
3, 3, 1, 1, 1, 2, 2, 2, 2, 2, 4, 4, 2, 2, 0, 300, 300, 300, 100, 
100, 200, 500, 200, 200, 400, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 
1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 809782, 
809782, 809782, 809790, 809790, 809797, 809813, 809832, 809832, 
809840, 3600, 3600, 3600, 3740, 3740, 3280, 3090, 3100, 3100, 
4110, 4, 4, 4, 9, 9, 9, 9, 10, 10, 9, 9, 9, 9, 10, 10, 10, 9, 
10, 10, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7.38, 7.38, 7.38, 7.34, 
7.34, 7.27, 7.33, 7.3, 7.3, 7.29, NA, NA, NA, -3.4, -3.4, -5.8, 
-6.4, -2, -2, -7.2, 2, 2, 2, 3, 3, 5, 4, 2.5, 2.5, 3, 225, 225, 
225, 255, 255, 140, 236, 315, 315, 190, 5, 5, 5, 3, 3, 9, 10, 
4, 4, 8, 400, 400, 400, 395, 395, NA, NA, 405, 405, 210, 7, 7, 
7, 5, 5, NA, NA, 10, 10, 10, 580, 580, 580, NA, NA, NA, NA, NA, 
NA, NA, 7.5, 7.5, 7.5, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
315, 315, 190, NA, NA, NA, NA, NA, NA, NA, 4, 4, 8, 440, 440, 
440, 390, 390, NA, NA, NA, NA, NA, NA, NA, NA, 3, 3, NA, NA, 
NA, NA, NA), .Dim = c(10, 83), .Dimnames = list(NULL, c("pxid", 
"ddiff", "tdiff", "td30", "td60", "td90", "td120", "td180", "td240", 
"td300", "td360", "td420", "td480", "td2000", "dbase", "db1", 
"db1.5", "db2", "db2.5", "db3", "db3.5", "db4", "db4.5", "db5", 
"db5.5", "db6", "db6.5", "db7", "db7.5", "db8", "db8.5", "db9", 
"db9.5", "lwh.num", "partogram", "age", "gestation", "cervix", 
"effaced", "membranes", "rd.interval", "action.line", "synto", 
"synto.time", "rom", "blood", "ctg", "fbs", "", "iupc", 
"amnioinf", "ves", "anaesth", "delivery.mode", "perineum", "blood.loss",

"transfus", "drugs", "", "", "weight", "apgar1", 
"apgar5", "scbu", "ph.ven", "be.ven", "dilat1", "time2", "dilat2", 
"time3", "dilat3", "time4", "dilat4", "time5", "dilat5", "time6", 
"dilat6", "time7", "dilat7", "srom.time", "srom.dilat", "epi.time", 


Under NT 4.0, using Version 0.63.2 Beta (Jan 12, 1999):

Not sure if this is a bug or a feature (forcing me to program less
clumsily) so I'll report it here rather than to bugs.

With a medium size data set (1700 observations, 70 explanatory variables)
and plenty of memory, specifically

> gc()
          free   total
Ncells  886738 1000000
Vcells 7912909 8388608

I get a fatal error when attempting summary() on the fit of an lm() on a
large-ish set of dummy variables (stored in a matrix):

lm(formula = sasch2[, "ddiff"] ~ sasch2[, "td30"] + sasch2[,     "td60"]
+ sasch2[, "td90"] + sasch2[,  "td120"] + +sasch2[,     "td180"] +
sasch2[, "td240"] + sasch2[, "td300"] + sasch2[,     "td360"] +
sasch2[, "td420"] + sasch2[, "td480"] + sasch2[,     "db1"] + sasch2[,
"db1.5"] + sasch2[, "db2"] +  sasch2[, "db2.5"] +     +sasch2[, "db3.5"]
+ sasch2[, "db4"] + sasch2[, "db4.5"] +     sasch2[, "db5"] +  sasch2[,
"db5.5"] + +sasch2[, "db6"] +     sasch2[, "db6.5"] + sasch2[, "db7"] +
sasch2[, "db7.5"] +      sasch2[, "db8"] + +sasch2[, "db8.5"] + sasch2[,
"db9"] +     sasch2[, "db9.5"])

I get estimates OK, but summary() collapses. However, if I do the same
thing less clumsily, by writing all the relevant variables to a new data
frame, and then calling

lm(formula = ddiff ~ ., data = dtmp)

I get not only the estimates but can also summary() with no problem.
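
The data-frame route can be sketched as follows. The matrix m is a small
made-up stand-in for sasch2 (its values and the choice of three
regressors are assumptions for illustration only), and vars is a
hypothetical vector naming the columns wanted in the model:

```r
# Made-up stand-in for sasch2: a numeric matrix with named columns.
m <- cbind(ddiff = c(3, 2, 0.5, 0, 2, 4, 6, 1.5),
           td30  = c(0, 1, 0, 1, 0, 1, 0, 1),
           td60  = c(0, 0, 1, 1, 0, 0, 1, 1),
           db1   = c(0, 0, 0, 0, 1, 1, 1, 1))

# Names of the explanatory columns to carry over (hypothetical choice).
vars <- c("td30", "td60", "db1")

# Copy the response and the chosen regressors into a new data frame.
dtmp <- as.data.frame(m[, c("ddiff", vars)])

# 'ddiff ~ .' regresses ddiff on every other column of dtmp.
fit <- lm(ddiff ~ ., data = dtmp)
summary(fit)
```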

Any ideas why? Seems to be memory-linked, because I can lm() and
summary() the matrix versions using only the sasch2[,'td*'] or db*
variable sets.

Simon Fear

r-devel mailing list