[R] Query on constrained regressions using -mgcv- and -pcls-

Tue Nov 3 07:18:40 CET 2020

Warning: I did *not* attempt to follow your query(original or addendum) in
detail. But as you have not yet received a reply, it may be because your
post seems mostly about statistical issues, which are generally off topic
here. This list is primarily about R programming issues. If statistical
issues are your primary focus, SO may be a better place to post:
https://stats.stackexchange.com/

Otherwise, I guess you'll just have to continue waiting.

Incidentally, suggestions for improvements in nonstandard packages should
generally be sent to the package maintainer (?maintainer) rather than
posted here. Maintainers may not even check this list.

Finally, this is a plain text list. HTML posts often get mangled by the
server.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Mon, Nov 2, 2020 at 9:28 PM Clive Nicholas via R-help <
r-help using r-project.org> wrote:

> As an addendum / erratum to my original post, the second block of code
> should read for completeness:
>
> set.seed(02102020)
> N=500
> M=10
> rater=rep(1:M, each = N)
> lead_n=as.factor(rep(1:N,M))
> a=rep(rnorm(N),M)
> z=rep(round(25+2*rnorm(N)+.2*a))
> x=a+rnorm(N*M)
> y=.5*x+5*a-.5*z+2*rnorm(N*M)
> x_cl=rep(aggregate(x,list(lead_n) mean)[,2],M)
> model=lm(y~x+x_cl+z)
> summary(model)
> y=1+1.5*x+4.6*x_cl-0.5*z
> x.mat=cbind(rep(1,length(y)),x,x_cl,z)
> ls.print(lsfit(x.mat,y,intercept=FALSE))
> M=list(y=y,
> w=rep(1, length(y)),
> X=x.mat,
> C=matrix(0,0,0),
> p=rep(1, ncol(x.mat)),
> off=array(0,0),
> S=list(),
> sp=array(0,0),
> Ain=diag(ncol(x.mat)),
> bin=rep(0, ncol(x.mat)) )
> pcls(M)
>
> However, all my questions stand.
>
> Ta, Clive
>
> On Tue, 3 Nov 2020 at 01:14, Clive Nicholas <clivelists using googlemail.com>
> wrote:
>
> > Hello all,
> >
> > I'll level with you: I'm puzzled!
> >
> > How is it that this constrained regression routine using -pcls- runs
> > satisfactorily (courtesy of Tian Zheng):
> >
> > library(mgcv)
> > options(digits=3)
> > x.1=rnorm(100, 0, 1)
> > x.2=rnorm(100, 0, 1)
> > x.3=rnorm(100, 0, 1)
> > x.4=rnorm(100, 0, 1)
> > y=1+0.5*x.1-0.2*x.2+0.3*x.3+0.1*x.4+rnorm(100, 0, 0.01)
> > x.mat=cbind(rep(1, length(y)), x.1, x.2, x.3, x.4)
> > ls.print(lsfit(x.mat, y, intercept=FALSE))
> > M=list(y=y,
> > w=rep(1, length(y)),
> > X=x.mat,
> > C=matrix(0,0,0),
> > p=rep(1, ncol(x.mat)),
> > off=array(0,0),
> > S=list(),
> > sp=array(0,0),
> > Ain=diag(ncol(x.mat)),
> > bin=rep(0, ncol(x.mat)) )
> > pcls(M)
> > Residual Standard Error=0.0095
> > R-Square=1
> > F-statistic (df=5, 95)=314735
> > p-value=0
> >
> >     Estimate Std.Err t-value Pr(>|t|)
> >        1.000  0.0010  1043.9        0
> > x.1    0.501  0.0010   512.6        0
> > x.2   -0.202  0.0009  -231.6        0
> > x.3    0.298  0.0010   297.8        0
> > x.4    0.103  0.0011    94.8        0
> >
> > but this one does not for a panel dataset:
> >
> > set.seed(02102020)
> > N=500
> > M=10
> > rater=rep(1:M, each = N)
> > lead_n=as.factor(rep(1:N,M))
> > a=rep(rnorm(N),M)
> > z=rep(round(25+2*rnorm(N)+.2*a))
> > x=a+rnorm(N*M)
> > y=.5*x+5*a-.5*z+2*rnorm(N*M)
> > x_cl=rep(aggregate(x,list(lead_n) mean)[,2],M)
> > model=lm(y~x+x_cl+z)
> > summary(model)
> > y=1+1.5*x+4.6*x_cl-0.5*z
> > x.mat=cbind(rep(1,length(y)),x,x_cl,z)
> > ls.print(lsfit(x.mat,y,intercept=FALSE))
> >
> > Residual Standard Error=0
> > R-Square=1
> > F-statistic (df=4, 4996)=5.06e+30
> > p-value=0
> >
> >      Estimate Std.Err   t-value Pr(>|t|)
> >           1.0       0  2.89e+13        0
> > x         0.8       0  2.71e+14        0
> > x_cl      4.6       0  1.18e+15        0
> > z        -0.5       0 -3.63e+14        0
> >
> > ?
> >
> > There shouldn't be anything wrong with the second set of data, unless
> I've
> > missed something obvious (that constraints don't work for panel data?
> Seems
> > unlikely to me)!
> >
> > Also:
> >
> > (1) I'm ultimately looking just to constrain ONE coefficient whilst
> > allowing the other coefficients to be unconstrained (I tried this with
> the
> > first dataset by setting
> >
> > y=1+0.5*x.1-x.2+x.3+x.4
> >
> > in the call, but got similar-looking output to what I got in the second
> > dataset); and
> >
> > (2) it would be really useful to have the call to -pcls(M)- produce more
> > informative output (SEs, t-values, fit stats, etc).
> >
> > Many thanks in anticipation of your expert help and being told what a
> > clueless berk I am,
> > Clive
> >
> > --
> > Clive Nicholas
> >
> > "My colleagues in the social sciences talk a great deal about
> methodology.
> > I prefer to call it style." -- Freeman J. Dyson
> >
>
>
> --
> Clive Nicholas
>
> "My colleagues in the social sciences talk a great deal about methodology.
> I prefer to call it style." -- Freeman J. Dyson
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

	[[alternative HTML version deleted]]