[Rd] Using response variable in interaction as explanatory variable in glm crashes R

Scott Kostyshak skostyshak at ufl.edu
Tue Oct 10 19:24:56 CEST 2017


On Mon, Oct 09, 2017 at 03:52:43PM +0000, Martin Maechler wrote:
> >>>>> Jan van der Laan <rhelp at eoos.dds.nl>
> >>>>>     on Fri, 6 Oct 2017 12:13:39 +0200 writes:
> 
>     > It is actually model.matrix that crashes, not glm. Same
>     > crash occurs with e.g. lm.
> 
>     > model.matrix(dob_mon ~ dob_day*dob_mon, data = tab)
> 
>     > also crashes R.
> 
> Yes, segmentation fault.
> 
> It only happens when these are *logical* variables, not, e.g., when
> transformed to integer.
> 
> The C code in src/library/stats/src/model.c  tries to eliminate
> occurrences of the LHS of the formula from the RHS when building
> the model matrix and it does work fine in the integer case.
> 
> Part of the culprit code may be this (from line 717),
> with the  isLogical(.) which in our case, shifts the pointer by
> 1  in the call to firstfactor() :
> 
> 			int adj = isLogical(var_i)?1:0;
> 			// avoid overflow of jstart * nn PR#15578
> 			firstfactor(&rx[jstart * nn], n, jnext - jstart,
> 				    REAL(contrast), nrows(contrast),
> 				    ncols(contrast), INTEGER(var_i)+adj);
> 
> then in firstfactor(), we see the segfault (when running R with
> '-d gdb') :
> 
>     > model.matrix(dob_mon ~ dob_day*dob_mon, data = tab)
> 
>   Program received signal SIGSEGV, Segmentation fault.
>   0x00007fffeafa76b5 in firstfactor (ncx=0, v=0x5c3b37c, ncc=1, nrc=2, c=0x5c90008, 
>    nrx=8, x=0x5cbf150) at ../../../../../R/src/library/stats/src/model.c:252
>     252		    else xj[i] = cj[v[i]-1];
>     Missing separate debuginfos, .................
>     (gdb) list
>     247	    for (int j = 0; j < ncc; j++) {
>     248		xj = &x[j * (R_xlen_t)nrx];
>     249		cj = &c[j * (R_xlen_t)nrc];
>     250		for (int i = 0; i < nrx; i++)
>     251		    if(v[i] == NA_INTEGER) xj[i] = NA_REAL;
>     252		    else xj[i] = cj[v[i]-1];
>     253	    }
>     254	}
>     255	
> 
> and indeed in the debugger,  i=7  and  v[i] is "outside", v[]
> being of length 7, hence indexed 0:6.
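
To illustrate the off-by-one described above, here is a minimal sketch in C. It is not code from R: the helper name first_out_of_bounds and its parameters are hypothetical, and the sizes in the usage note (a buffer of 8 ints, nrx = 8) are an assumption based on the gdb frame. It only does the index arithmetic and never reads out of bounds itself.

```c
#include <stddef.h>

/* Hypothetical helper, not part of R's model.c.
 *
 * Models a loop like the one in firstfactor() that reads v[0..nrx-1],
 * where v points `adj` elements into a buffer holding `len` ints.
 * Returns the first loop index i whose read would land past the end
 * of the buffer, or -1 if every read stays in bounds. */
int first_out_of_bounds(size_t len, size_t adj, int nrx)
{
    for (int i = 0; i < nrx; i++) {
        /* v[i] is really buffer[adj + i]; valid only below len */
        if (adj + (size_t)i >= len)
            return i;
    }
    return -1;
}
```

Under those assumed sizes, first_out_of_bounds(8, 0, 8) returns -1 (no shift, all reads valid), while first_out_of_bounds(8, 1, 8) returns 7, which matches the i=7 seen in the debugger once the pointer has been shifted by 1.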

Dear Martin,

I just wanted to thank you for providing details on your approach to
debugging. Often I see bug fixes and I wonder "how the heck did they
figure that out?" so I am very excited when I see details like these on
the process (and not just the end result), so that I can learn.

Best,

Scott


-- 
Scott Kostyshak
Assistant Professor of Economics
University of Florida
https://people.clas.ufl.edu/skostyshak/


