[R] Query on constrained regressions using -mgcv- and -pcls-

Tue Nov 3 06:28:09 CET 2020

As an addendum / erratum to my original post, the second block of code
should read for completeness:

set.seed(02102020)
N=500
M=10
rater=rep(1:M, each = N)
lead_n=as.factor(rep(1:N,M))
a=rep(rnorm(N),M)
z=rep(round(25+2*rnorm(N)+.2*a))
x=a+rnorm(N*M)
y=.5*x+5*a-.5*z+2*rnorm(N*M)
x_cl=rep(aggregate(x,list(lead_n) mean)[,2],M)
model=lm(y~x+x_cl+z)
summary(model)
y=1+1.5*x+4.6*x_cl-0.5*z
x.mat=cbind(rep(1,length(y)),x,x_cl,z)
ls.print(lsfit(x.mat,y,intercept=FALSE))
M=list(y=y,
w=rep(1, length(y)),
X=x.mat,
C=matrix(0,0,0),
p=rep(1, ncol(x.mat)),
off=array(0,0),
S=list(),
sp=array(0,0),
Ain=diag(ncol(x.mat)),
bin=rep(0, ncol(x.mat)) )
pcls(M)

However, all my questions stand.

Ta, Clive

On Tue, 3 Nov 2020 at 01:14, Clive Nicholas <clivelists using googlemail.com>
wrote:

> Hello all,
>
> I'll level with you: I'm puzzled!
>
> How is it that this constrained regression routine using -pcls- runs
> satisfactorily (courtesy of Tian Zheng):
>
> library(mgcv)
> options(digits=3)
> x.1=rnorm(100, 0, 1)
> x.2=rnorm(100, 0, 1)
> x.3=rnorm(100, 0, 1)
> x.4=rnorm(100, 0, 1)
> y=1+0.5*x.1-0.2*x.2+0.3*x.3+0.1*x.4+rnorm(100, 0, 0.01)
> x.mat=cbind(rep(1, length(y)), x.1, x.2, x.3, x.4)
> ls.print(lsfit(x.mat, y, intercept=FALSE))
> M=list(y=y,
> w=rep(1, length(y)),
> X=x.mat,
> C=matrix(0,0,0),
> p=rep(1, ncol(x.mat)),
> off=array(0,0),
> S=list(),
> sp=array(0,0),
> Ain=diag(ncol(x.mat)),
> bin=rep(0, ncol(x.mat)) )
> pcls(M)
> Residual Standard Error=0.0095
> R-Square=1
> F-statistic (df=5, 95)=314735
> p-value=0
>
>     Estimate Std.Err t-value Pr(>|t|)
>        1.000  0.0010  1043.9        0
> x.1    0.501  0.0010   512.6        0
> x.2   -0.202  0.0009  -231.6        0
> x.3    0.298  0.0010   297.8        0
> x.4    0.103  0.0011    94.8        0
>
> but this one does not for a panel dataset:
>
> set.seed(02102020)
> N=500
> M=10
> rater=rep(1:M, each = N)
> lead_n=as.factor(rep(1:N,M))
> a=rep(rnorm(N),M)
> z=rep(round(25+2*rnorm(N)+.2*a))
> x=a+rnorm(N*M)
> y=.5*x+5*a-.5*z+2*rnorm(N*M)
> x_cl=rep(aggregate(x,list(lead_n) mean)[,2],M)
> model=lm(y~x+x_cl+z)
> summary(model)
> y=1+1.5*x+4.6*x_cl-0.5*z
> x.mat=cbind(rep(1,length(y)),x,x_cl,z)
> ls.print(lsfit(x.mat,y,intercept=FALSE))
>
> Residual Standard Error=0
> R-Square=1
> F-statistic (df=4, 4996)=5.06e+30
> p-value=0
>
>      Estimate Std.Err   t-value Pr(>|t|)
>           1.0       0  2.89e+13        0
> x         0.8       0  2.71e+14        0
> x_cl      4.6       0  1.18e+15        0
> z        -0.5       0 -3.63e+14        0
>
> ?
>
> There shouldn't be anything wrong with the second set of data, unless I've
> missed something obvious (that constraints don't work for panel data? Seems
> unlikely to me)!
>
> Also:
>
> (1) I'm ultimately looking just to constrain ONE coefficient whilst
> allowing the other coefficients to be unconstrained (I tried this with the
> first dataset by setting
>
> y=1+0.5*x.1-x.2+x.3+x.4
>
> in the call, but got similar-looking output to what I got in the second
> dataset); and
>
> (2) it would be really useful to have the call to -pcls(M)- produce more
> informative output (SEs, t-values, fit stats, etc).
>
> Many thanks in anticipation of your expert help and being told what a
> clueless berk I am,
> Clive
>
> --
> Clive Nicholas
>
> "My colleagues in the social sciences talk a great deal about methodology.
> I prefer to call it style." -- Freeman J. Dyson
>

-- 
Clive Nicholas

"My colleagues in the social sciences talk a great deal about methodology.
I prefer to call it style." -- Freeman J. Dyson

	[[alternative HTML version deleted]]