--- title: "Generalized Maximum Entropy framework" author: "Jorge Cabral" output: rmarkdown::html_vignette: toc: true toc_depth: 4 link-citations: yes bibliography: references.bib csl: american-medical-association-brackets.csl description: | An overview of the GME framework. vignette: > %\VignetteIndexEntry{Generalized Maximum Entropy framework} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} editor_options: markdown: wrap: 72 --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ```
![](GCEstim_logo.png)
## Introduction

One of the most widely used data analysis techniques under fairly general regularity conditions is general linear regression analysis. It models the relation between a dependent variable and a set of independent variables through a suitable mathematical model, in order to assess whether the independent variables predict the dependent variable and how well they explain the empirical data being modeled. In the modelling process, several assumptions must be imposed @Greene2008. In practice, however, data are frequently limited and some of these conditions are usually not satisfied.

> [...] the available information is most often insufficient to provide
> a unique answer or solution for most interesting decisions or
> inferences we wish to make. In fact, insufficient information -
> including limited, incomplete, complex, noisy and uncertain
> information - is the norm for most problems across all disciplines.
> --- Golan @Golan2017

Given this, regression models may become ill-posed, which implies that the application of traditional estimation methods, such as the Ordinary Least Squares (OLS) estimator, may lead to non-unique or highly unstable solutions, unless simplifying assumptions/procedures are imposed to generate seemingly well-posed statistical models @Golan1996. One of the major concerns in univariate multiple linear regression is the collinearity problem, which is one of the factors responsible for inflating the variance of the regression coefficient estimates and, in general, for affecting the signs of the estimates, as well as statistical inference [@Belsley1980; @Stewart1987].

Supported by the maximum entropy principle @Jaynes1957, which advocates evaluating event probabilities using the distribution that maximizes entropy among those satisfying certain expectation constraints, Golan, Judge and Miller (henceforth referred to as GJM) @Golan1996 proposed a generalization to the regression framework and called it the generalized maximum entropy (GME) method. GME is a flexible and widely used method that requires no parametric assumptions and provides a global measure of fit.

## General Linear Model

Consider the general linear model given by
\begin{align}
\qquad \qquad \qquad \qquad \qquad \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}, \qquad \qquad \qquad \qquad \qquad (1)
\end{align}
where $\mathbf{y}$ denotes a $(N \times 1)$ vector of noisy observations, $N \in \mathbb{N}$, $\boldsymbol{\beta}$ is a $((K+1) \times 1)$ vector of unknown parameters or coefficients, $K \in \mathbb{N}$, $\mathbf{X}$ is a known $(N \times (K+1))$ design matrix and $\boldsymbol{\epsilon}$ is a $(N \times 1)$ vector of random disturbances (errors), usually assumed to have a conditional expected value of zero and to represent spherical disturbances, i.e., $E\left[ \boldsymbol{\epsilon} | \mathbf{X}\right]=\mathbf{0}$ and $E[\boldsymbol{\epsilon \epsilon}'|\mathbf{X}]=\sigma^2\mathbf{I}$, where $\mathbf{I}$ is a ($N \times N$) identity matrix and $\sigma^2$ is the error variance.

## Generalized Maximum Entropy estimator {#GMEestimator}

GJM @Golan1996 generalized the Maximum Entropy formalism @Jaynes1957 to linear inverse problems with noise, expressed in model (1).
The idea is to treat each $\beta_k,\space k\in\left\lbrace 0,\dots,K\right\rbrace$, as a discrete random variable with a compact support and $2 \leq M < \infty$ possible outcomes, and each $\epsilon_n, \space n\in\left\lbrace 1,\dots,N\right\rbrace$, as a finite and discrete random variable with $2 \leq J < \infty$ possible outcomes. Assuming that both the unknown parameters and the unknown error terms may be bounded *a priori*, the linear model (1) can be rewritten as
\begin{equation}
\mathbf{y}=\mathbf{XZp} + \mathbf{Vw},
\end{equation}
where
\begin{equation}
\boldsymbol{\beta}=\mathbf{Zp}=
\left[ \begin{array}{cccc}
\mathbf{z}'_0 & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \mathbf{z}'_1 & \cdots & \mathbf{0} \\
\vdots & \vdots & \ddots & \vdots\\
\mathbf{0} & \mathbf{0} & \cdots & \mathbf{z}'_K
\end{array}\right]
\left[ \begin{array}{c}
\mathbf{p}_0 \\
\mathbf{p}_1 \\
\vdots\\
\mathbf{p}_K
\end{array}\right],
\end{equation}
with $\mathbf{Z}$ a $((K+1) \times (K+1)M)$ matrix of support values and $\mathbf{p}$ a $((K+1)M \times 1)$ vector of unknown probabilities, and
\begin{equation}
\boldsymbol{\epsilon}=\mathbf{Vw}=
\left[ \begin{array}{cccc}
\mathbf{v}'_1 & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \mathbf{v}'_2 & \cdots & \mathbf{0} \\
\vdots & \vdots & \ddots & \vdots\\
\mathbf{0} & \mathbf{0} & \cdots & \mathbf{v}'_N
\end{array}\right]
\left[ \begin{array}{c}
\mathbf{w}_1 \\
\mathbf{w}_2 \\
\vdots\\
\mathbf{w}_N
\end{array}\right],
\end{equation}
with $\mathbf{V}$ a $(N \times NJ)$ matrix of support values and $\mathbf{w}$ a $(NJ \times 1)$ vector of unknown probabilities.

For the linear regression model specified in (1), the Generalized Maximum Entropy (GME) estimator is given by
\begin{equation}
\hat{\boldsymbol{\beta}}^{GME}(\mathbf{Z},\mathbf{V}) = \underset{\mathbf{p},\mathbf{w}}{\operatorname{argmax}} \left\{-\mathbf{p}' \ln \mathbf{p} - \mathbf{w}' \ln \mathbf{w} \right\},
\end{equation}
subject to the model constraint
\begin{equation}
\mathbf{y}=\mathbf{XZp} + \mathbf{Vw},
\end{equation}
and the additivity constraints for $\mathbf{p}$ and $\mathbf{w}$, respectively,
\begin{equation}
\begin{array}{c}
\mathbf{1}_{K+1}=(\mathbf{I}_{K+1} \otimes \mathbf{1}'_M)\mathbf{p},\\
\mathbf{1}_N=(\mathbf{I}_N \otimes \mathbf{1}'_J)\mathbf{w},
\end{array}
\end{equation}
where $\otimes$ represents the Kronecker product, $\mathbf{1}$ is a column vector of ones with a specific dimension, $\mathbf{I}$ is an identity matrix with a specific dimension, $\mathbf{Z}$ and $\mathbf{V}$ are the matrices of supports, and $\mathbf{p}>\mathbf{0}$ and $\mathbf{w}>\mathbf{0}$ are probability vectors to be estimated.

In summation (scalar) notation, the optimization problem can be written as
\begin{align}
&\text{maximize} & H(\mathbf{p},\mathbf{w}) &=-\sum_{k=0}^{K}\sum_{m=1}^M p_{km}\ln(p_{km}) -\sum_{n=1}^N\sum_{j=1}^J w_{nj}\ln(w_{nj}) \\
&\text{subject to} & y_n &= \sum_{k=0}^{K}\sum_{m=1}^M x_{nk}z_{km}p_{km} + \sum_{j=1}^J v_{nj}w_{nj}, \forall n, \\
& & \sum_{m=1}^M p_{km} &= 1, \forall k,\\
& & \sum_{j=1}^J w_{nj} &= 1, \forall n.
\end{align}

The Lagrangian
\begin{equation}
\mathcal{L}=-\mathbf{p}' \ln \mathbf{p} - \mathbf{w}' \ln \mathbf{w} + \boldsymbol{\lambda}' \left( \mathbf{y} - \mathbf{XZp} - \mathbf{Vw} \right) + \boldsymbol{\theta}'\left( \mathbf{1}_{K+1}-(\mathbf{I}_{K+1} \otimes \mathbf{1}'_M)\mathbf{p} \right) + \boldsymbol{\tau}'\left( \mathbf{1}_N-(\mathbf{I}_N \otimes \mathbf{1}'_J)\mathbf{w}\right)
\end{equation}
can be used to find the interior solution, where $\boldsymbol{\lambda}$, $\boldsymbol{\theta}$ and $\boldsymbol{\tau}$ are the associated $(N\times 1)$, $((K+1)\times 1)$ and $(N\times 1)$ vectors of Lagrange multipliers, respectively. Taking the gradient of the Lagrangian and solving the first-order conditions yields the solutions for $\mathbf{\hat p}$ and $\mathbf{\hat w}$,
\begin{equation}
\hat p_{km} = \frac{\exp(-z_{km}\sum_{n=1}^N \hat\lambda_n x_{nk})}{\sum_{m=1}^M \exp(-z_{km}\sum_{n=1}^N \hat\lambda_n x_{nk})}
\end{equation}
and
\begin{equation}
\hat w_{nj} = \frac{\exp(-\hat\lambda_n v_{nj})}{\sum_{j=1}^J \exp(-\hat\lambda_n v_{nj})}.
\end{equation}
The GME estimator generates the probability vectors $\hat{\mathbf{p}}$ and $\hat{\mathbf{w}}$, which can be used to form point estimates of the unknown parameters and of the unknown random errors through the reparameterizations above. Since the objective function is strictly concave in the interior of the additivity constraint set, a unique solution for the GME estimator is guaranteed if the intersection of the model and the additivity constraint sets is non-empty @Golan1996. However, there is no closed-form solution for $\hat\lambda_n$, and hence no closed-form solution for $\hat p_{km}$ and $\hat w_{nj}$, so numerical optimization techniques must be used.

## Selecting support spaces

### Common approach

The supports in matrices $\mathbf{Z}$ and $\mathbf{V}$ are defined as closed and bounded intervals within which each parameter or error is restricted to lie, implying that researchers need to provide exogenous information. Unfortunately, researchers generally do not know the true values of the coefficients they are attempting to estimate. This is considered the main weakness of the GME estimator @Caputo2008. GJM @Golan1996 discuss these issues in the case of minimal prior information: for the unknown parameters, the authors recommend the use of "wide bounds" $\left(z_k^{lower},z_k^{upper}\right)$ for the supports in $\mathbf{Z}$, without extreme risk consequences. Furthermore, in order to reflect that the analyst has essentially no *a priori* information about the true values of the coefficients and errors, they assume that the support interval of each coefficient is symmetrically centered about the origin,
\begin{eqnarray}
\mathbf{z}_k = \left[
\begin{array}{c}
-z_k^{upper} \\
z_k^{upper}\frac{2\times 1 - \left( M-1 \right)} {M-1} \\
\vdots\\
0\\
\vdots\\
z_k^{upper}\frac{2\times \left( M-2 \right) - \left( M-1 \right)} {M-1} \\
z_k^{upper}
\end{array}\right].
\end{eqnarray}
The number of points $M$ in the supports is less controversial: between $3$ and $7$ points are usually used in the literature, since it is claimed that using more points in the supports is unlikely to yield a significant improvement in the estimation @Golan1996.
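As an illustration, the zero-centered, evenly spaced support vector $\mathbf{z}_k$ described above can be built directly in base R. The sketch below uses hypothetical values ($z_k^{upper}=100$, $M=5$, $K=5$) and also shows the block-diagonal structure of $\mathbf{Z}$: with uniform probabilities, the implied prior point estimate $\mathbf{Zp}$ is the zero vector.

```{r,echo=TRUE,eval=FALSE}
# Hypothetical values: upper bound 100, M = 5 support points, K + 1 = 6 coefficients
z.upper <- 100
M <- 5
z.k <- seq(-z.upper, z.upper, length.out = M)  # evenly spaced, symmetric about 0
z.k

K <- 5
Z <- kronecker(diag(K + 1), t(z.k))  # ((K+1) x (K+1)M) block-diagonal matrix of supports
p.uniform <- rep(1 / M, (K + 1) * M) # uniform (maximum entropy) probabilities
drop(Z %*% p.uniform)                # implied coefficients Zp: all zero
```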
For the unknown errors, the authors suggest the use of the three-sigma rule with a sample scale parameter @Pukelsheim1994 and $J=3$ points in the supports, also symmetrically centered about the origin,
\begin{eqnarray}
\mathbf{v}_n = \left[
\begin{array}{c}
-3 \hat\sigma_\mathbf{y} \\
3 \hat\sigma_\mathbf{y} \frac{2\times 1 - \left( J-1 \right)} {J-1} \\
\vdots\\
0\\
\vdots\\
3\hat\sigma_\mathbf{y}\frac{2\times \left( J-2 \right) - \left( J-1 \right)} {J-1} \\
3\hat\sigma_\mathbf{y}
\end{array}\right],
\end{eqnarray}
where $\hat\sigma_\mathbf{y}$ is the sample standard deviation of $\mathbf{y}$. In what follows, the parameters will also be referred to as the signal and the errors as the noise.

#### Examples {#Examples}

##### Without *a priori* information

Using `fngendata`, a data set will be generated under certain conditions. To define an ill-conditioned design matrix $\mathbf{X}$ with a specific condition number, namely $cn=50$, the traditional singular value decomposition is obtained and the singular values in $\mathbf{S}$, a diagonal matrix with the same dimensions as $\mathbf{X}$, are modified such that $cond(\mathbf{X})=cond(\mathbf{USV}')=cn$, where $\mathbf{U}$ and $\mathbf{V}$ are square unitary matrices, and hence $cond(\mathbf{X'X})=cn^2$. Errors are drawn from a normal distribution, $e_i \sim N(0,\sigma^2)$, where $\sigma = \sqrt{var(\mathbf{X}\boldsymbol{\beta}) / snr}$ and $snr$ is the signal-to-noise ratio.

```{r,echo=FALSE,eval=TRUE}
load("GCEstim_GME.RData")
```

```{r,echo=TRUE,eval=TRUE}
library(GCEstim)
```

```{r,echo=TRUE,eval=FALSE}
coef.dataGCE <- c(1, 0, 0, 3, 6, 9)
```

The following matrix shows the Pearson correlation coefficients between the generated variables.

```{r,echo=FALSE,eval=TRUE}
cor(dataGCE)
```
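The `dataGCE` data set used below (loaded above together with precomputed results) was generated with `fngendata`. Since that function's interface is not detailed in this vignette, the following is only a conceptual sketch, with assumed values ($N=100$, $snr=5$), of the construction just described: the singular values of a random matrix are rescaled so that the design matrix has condition number $cn$, and Gaussian noise is added with variance set through the signal-to-noise ratio.

```{r,echo=TRUE,eval=FALSE}
# Conceptual sketch only (not the fngendata implementation); N and snr are assumed values
set.seed(230676)
N <- 100; K <- 5; cn <- 50
X0 <- matrix(rnorm(N * K), N, K)
udv <- svd(X0)
d <- seq(cn, 1, length.out = K)          # new singular values: max/min ratio equals cn
X <- udv$u %*% diag(d) %*% t(udv$v)
kappa(X, exact = TRUE)                   # condition number of X, approximately cn

beta <- c(0, 0, 3, 6, 9)                 # slopes of coef.dataGCE (intercept = 1)
snr <- 5                                 # assumed signal-to-noise ratio
signal <- drop(X %*% beta)
sigma <- sqrt(var(signal) / snr)
y <- 1 + signal + rnorm(N, sd = sigma)
```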
Although obviously questionable, under a "no *a priori* information" scenario one can assume that $z_k^{upper}=100000$, $k\in\left\lbrace 0,\dots,5\right\rbrace$, is a "wide upper bound" for the signal support space.

Using `lmgce`, a model can be fitted within the GME framework. The `formula` argument is defined as in `stats::lm`. `support.signal` is used to define $\mathbf{Z}$. If the same lower limit (LL) and upper limit (UL) of the signal support space is assumed for each $\beta_k$, `support.signal = c(LL, UL)` should be defined. `support.signal.points` can be an integer and in that case it corresponds to $M$. The noise support space is set by `support.noise`. If one wants to apply the three-sigma rule, then the default `support.noise = NULL` should be used. `support.noise.points` can be an integer and in that case it corresponds to $J$. The optimization method `primal.solnp` uses the augmented Lagrange multiplier method with a Sequential Quadratic Programming (SQP) interior algorithm (see [Rsolnp](https://cran.r-project.org/package=Rsolnp)) and tries to solve the primal version of the optimization problem previously defined.

```{r,echo=TRUE,eval=FALSE}
res.lmgce.100000 <-
  GCEstim::lmgce(
    y ~ .,
    data = dataGCE,
    support.signal = c(-100000, 100000),
    support.signal.points = 5,
    support.noise = NULL,
    support.noise.points = 3,
    twosteps.n = 0,
    method = "primal.solnp"
  )
```

The estimated GME coefficients are $\widehat{\boldsymbol{\beta}}^{GME_{(100000)}}=$ `r paste0("(", paste(round(coef(res.lmgce.100000), 3), collapse = ", "), ")")`.

```{r,echo=TRUE,eval=TRUE}
(coef.res.lmgce.100000 <- coef(res.lmgce.100000))
```

They can be compared with the OLS estimates given by $\widehat{\boldsymbol{\beta}}^{OLS}=(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$. The estimated OLS coefficients are $\widehat{\boldsymbol{\beta}}^{OLS}=$ `r paste0("(", paste(round(coef(lm(y ~ ., data = dataGCE)), 3), collapse = ", "), ")")`. Those results can be obtained using `stats::lm`

```{r,echo=TRUE,eval=TRUE}
res.lm <- lm(y ~ ., data = dataGCE)
(coef.res.lm <- coef(res.lm))
```

or setting `OLS = TRUE` in `lmgce`, which is the default.

```{r,echo=TRUE,eval=FALSE}
res.lmgce.100000 <-
  GCEstim::lmgce(
    y ~ .,
    data = dataGCE,
    support.signal = c(-100000, 100000),
    support.signal.points = 5,
    support.noise = NULL,
    support.noise.points = 3,
    twosteps.n = 0,
    method = "primal.solnp",
    OLS = TRUE
  )
```

```{r,echo=TRUE,eval=TRUE}
coef(res.lmgce.100000$results$OLS$res)
```

Consider the root mean squared error as a measure of prediction accuracy ($RMSE_{\mathbf{\hat y}}$), defined as
\begin{equation}
RMSE_{\mathbf{\hat y}} = \sqrt{\frac{\sum_{n=1}^N {\left( y_n - \hat y_n \right)^2}}{N}},
\end{equation}
where $\mathbf{\hat y}=\mathbf{X}\boldsymbol{\hat\beta}$. The prediction errors are approximately equal: $RMSE_{\mathbf{\hat y}}^{GME_{(100000)}} \approx$ `r round(GCEstim::accmeasure(fitted(res.lmgce.100000), dataGCE$y, which = "RMSE"), 3)` and $RMSE_{\mathbf{\hat y}}^{OLS} \approx$ `r round(GCEstim::accmeasure(fitted(res.lm), dataGCE$y, which = "RMSE"), 3)`.
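The same prediction errors can be obtained directly from the definition of $RMSE_{\mathbf{\hat y}}$; a minimal check, assuming the fitted objects above are available:

```{r,echo=TRUE,eval=FALSE}
# Prediction RMSE computed from its definition; matches accmeasure(..., which = "RMSE")
sqrt(mean((dataGCE$y - fitted(res.lmgce.100000))^2))
sqrt(mean((dataGCE$y - fitted(res.lm))^2))
```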
```{r,echo=TRUE,eval=TRUE}
(RMSE_y.lmgce.100000 <- GCEstim::accmeasure(fitted(res.lmgce.100000), dataGCE$y, which = "RMSE"))
# or
# res.lmgce.100000$error.measure
(RMSE_y.lm <- GCEstim::accmeasure(fitted(res.lm), dataGCE$y, which = "RMSE"))
```

To provide an estimate of how well the model generalizes to an independent data set, one can compute the root mean squared error using cross-validation, which involves randomly partitioning the data set into $\kappa$ folds of approximately equal size, training the model on $\kappa-1$ folds, and validating it on the remaining one.

Let
\begin{eqnarray}
\mathbf{X} = \left[
\begin{array}{c}
\mathbf{X_{(1)}} \\
\mathbf{X_{(2)}} \\
\vdots\\
\mathbf{X_{(\kappa)}}
\end{array}\right],
\end{eqnarray}
where $\mathbf{X_{(cv)}}$ is a $(N_{cv} \times (K+1))$ design matrix such that $\sum_{cv=1}^\kappa N_{cv}=N$ and $N_{cv}\approx \frac N \kappa$, $cv\in\left\lbrace 1,\dots,\kappa\right\rbrace$.

The prediction $\kappa$-fold cross-validation root mean squared error can be defined as
\begin{equation}
CV\text{-}RMSE_{\mathbf{\hat y}}=\frac{\sum_{cv=1}^\kappa RMSE_{\mathbf{\hat y_{(cv)}}}}{\kappa},
\end{equation}
where $\mathbf{\hat y_{(cv)}}=\mathbf{X_{(cv)}}\boldsymbol{\hat\beta_{(-cv)}}$ and $\boldsymbol{\hat\beta_{(-cv)}}$ was estimated from the remaining folds,
\begin{eqnarray}
\mathbf{X_{(-cv)}} = \left[
\begin{array}{c}
\mathbf{X_{(1)}} \\
\vdots\\
\mathbf{X_{(cv-1)}} \\
\mathbf{X_{(cv+1)}} \\
\vdots\\
\mathbf{X_{(\kappa)}}
\end{array}\right]
\end{eqnarray}
(a minimal manual illustration of this computation is sketched below, after the precision results).

To perform $\kappa$-fold cross-validation with `lmgce`, which is the default behaviour, one should set `cv = TRUE` and `cv.nfolds` to the desired number of folds. The default value for $\kappa$ is $5$. For reproducibility a `seed` should also be defined: the default is `seed = 230676`.

```{r,echo=TRUE,eval=FALSE}
res.lmgce.100000 <-
  GCEstim::lmgce(
    y ~ .,
    data = dataGCE,
    cv = TRUE,
    cv.nfolds = 5,
    support.signal = c(-100000, 100000),
    support.signal.points = 5,
    support.noise = NULL,
    support.noise.points = 3,
    twosteps.n = 0,
    method = "primal.solnp",
    OLS = TRUE,
    seed = 230676
  )
```

The prediction cross-validation errors are again approximately equal: $CV\text{-}RMSE_{\mathbf{\hat y}}^{GME_{(100000)}} \approx$ `r round(res.lmgce.100000$error.measure.cv.mean, 3)` and $CV\text{-}RMSE_{\mathbf{\hat y}}^{OLS} \approx$ `r round(mean(res.lmgce.100000$results$OLS$error), 3)`.

```{r,echo=TRUE,eval=TRUE}
(CV_RMSE_y.lmgce.100000 <- res.lmgce.100000$error.measure.cv.mean)
(CV_RMSE_y.lm <- mean(res.lmgce.100000$results$OLS$error))
```

In the given example, the "true" value of $\boldsymbol{\beta}$ is known, so the root mean squared error can also be used as a measure of precision. Let $RMSE_{\boldsymbol{\hat\beta}}$ be
\begin{equation}
RMSE_{\boldsymbol{\hat\beta}} = \sqrt{\frac{\sum_{k=0}^K {\left( \beta_k - \hat\beta_k \right)^2}}{K+1}}.
\end{equation}
The precision errors are similar in magnitude, with a slight advantage for the GME approach: $RMSE_{\boldsymbol{\hat\beta}}^{GME_{(100000)}} \approx$ `r round(GCEstim::accmeasure(coef.res.lmgce.100000, coef.dataGCE, which = "RMSE"), 3)` and $RMSE_{\boldsymbol{\hat\beta}}^{OLS} \approx$ `r round(GCEstim::accmeasure(coef.res.lm, coef.dataGCE, which = "RMSE"), 3)`.

```{r,echo=TRUE,eval=TRUE}
(RMSE_beta.lmgce.100000 <- GCEstim::accmeasure(coef.res.lmgce.100000, coef.dataGCE, which = "RMSE"))
(RMSE_beta.lm <- GCEstim::accmeasure(coef.res.lm, coef.dataGCE, which = "RMSE"))
```

From this example, OLS and GME seem to have similar performance.
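For reference, the $\kappa$-fold $CV\text{-}RMSE_{\mathbf{\hat y}}$ defined above can be reproduced by hand. Below is a minimal sketch for the OLS fit, refitting `lm` on each training fold; the fold assignment is arbitrary and is not meant to reproduce the folds used internally by `lmgce`.

```{r,echo=TRUE,eval=FALSE}
# Manual 5-fold CV-RMSE for the OLS fit (illustrative only)
set.seed(230676)
kfolds <- 5
fold.id <- sample(rep(seq_len(kfolds), length.out = nrow(dataGCE)))
rmse.cv <- sapply(seq_len(kfolds), function(cv) {
  fit <- lm(y ~ ., data = dataGCE[fold.id != cv, ])          # train on the other folds
  pred <- predict(fit, newdata = dataGCE[fold.id == cv, ])   # predict the held-out fold
  sqrt(mean((dataGCE$y[fold.id == cv] - pred)^2))
})
mean(rmse.cv)
```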
But there has been a great deal of concern about the sensitivity of the GME coefficient estimates to the choice of their support intervals [@Leon1999; @Paris1998; @Caputo2008]. Some authors state that the estimates are not invariant to the design of the support intervals @Leon1999.

Suppose that one had assumed instead that $z_k^{upper}=100$, $k\in\left\lbrace 0,\dots,5\right\rbrace$, was a "wide upper bound" for the signal support space.

```{r,echo=TRUE,eval=FALSE}
res.lmgce.100 <-
  GCEstim::lmgce(
    y ~ .,
    data = dataGCE,
    cv = TRUE,
    cv.nfolds = 5,
    support.signal = c(-100, 100),
    support.signal.points = 5,
    support.noise = NULL,
    support.noise.points = 3,
    twosteps.n = 0,
    method = "primal.solnp",
    OLS = TRUE,
    seed = 230676
  )
```

The estimated GME coefficients under these conditions are $\widehat{\boldsymbol{\beta}}^{GME_{(100)}}=$ `r paste0("(", paste(round(coef(res.lmgce.100), 3), collapse = ", "), ")")`.

```{r,echo=FALSE,eval=TRUE}
coef.res.lmgce.100 <- coef(res.lmgce.100)
```

The prediction errors are still approximately the same, $RMSE_{\mathbf{\hat y}}^{GME_{(100)}} \approx$ `r round(GCEstim::accmeasure(fitted(res.lmgce.100), dataGCE$y, which = "RMSE"), 3)` and $CV\text{-}RMSE_{\mathbf{\hat y}}^{GME_{(100)}} \approx$ `r round(res.lmgce.100$error.measure.cv.mean, 3)`, but a significant decrease was found for the precision error, $RMSE_{\boldsymbol{\hat\beta}}^{GME_{(100)}} \approx$ `r round(GCEstim::accmeasure(coef.res.lmgce.100, coef.dataGCE, which = "RMSE"), 3)`.

```{r,echo=FALSE,eval=TRUE}
RMSE_y.lmgce.100 <- GCEstim::accmeasure(fitted(res.lmgce.100), dataGCE$y, which = "RMSE")
RMSE_beta.lmgce.100 <- GCEstim::accmeasure(coef.res.lmgce.100, coef.dataGCE, which = "RMSE")
CV_RMSE_y.lmgce.100 <- res.lmgce.100$error.measure.cv.mean
```

```{r,echo=FALSE,eval=FALSE}
res.lmgce.50 <-
  GCEstim::lmgce(
    y ~ .,
    data = dataGCE,
    cv = TRUE,
    cv.nfolds = 5,
    support.signal = c(-50, 50),
    support.signal.points = 5,
    support.noise = NULL,
    support.noise.points = 3,
    twosteps.n = 0,
    method = "primal.solnp",
    OLS = TRUE,
    seed = 230676
  )
```

```{r,echo=FALSE,eval=TRUE}
coef.res.lmgce.50 <- coef(res.lmgce.50)
RMSE_y.lmgce.50 <- GCEstim::accmeasure(fitted(res.lmgce.50), dataGCE$y, which = "RMSE")
RMSE_beta.lmgce.50 <- GCEstim::accmeasure(coef.res.lmgce.50, coef.dataGCE, which = "RMSE")
CV_RMSE_y.lmgce.50 <- res.lmgce.50$error.measure.cv.mean
```

If $z_k^{upper}=50$, then $\widehat{\boldsymbol{\beta}}^{GME_{(50)}}=$ `r paste0("(", paste(round(coef(res.lmgce.50), 3), collapse = ", "), ")")`, $RMSE_{\mathbf{\hat y}}^{GME_{(50)}} \approx$ `r round(GCEstim::accmeasure(fitted(res.lmgce.50), dataGCE$y, which = "RMSE"), 3)`, $CV\text{-}RMSE_{\mathbf{\hat y}}^{GME_{(50)}} \approx$ `r round(res.lmgce.50$error.measure.cv.mean, 3)` and $RMSE_{\boldsymbol{\hat\beta}}^{GME_{(50)}} \approx$ `r round(GCEstim::accmeasure(coef.res.lmgce.50, coef.dataGCE, which = "RMSE"), 3)`.

##### With *a priori* information

As stated before, the "true" value of $\boldsymbol{\beta}$ was predefined, so *a priori* information is available. That information can be incorporated in the GME estimation. If one wants to keep the supports symmetrically centered about zero, it is possible to define, for instance, the following: $\mathbf{z}_0'= \left[ -5, -2.5, 0, 2.5, 5\right]$, $\mathbf{z}_1' = \mathbf{z}_2'= \left[ -2, -1, 0, 1, 2\right]$, $\mathbf{z}_3'= \left[ -6, -3, 0, 3, 6\right]$, $\mathbf{z}_4'= \left[ -10, -5, 0, 5, 10\right]$ and $\mathbf{z}_5'= \left[ -15, -7.5, 0, 7.5, 15\right]$.
```{r,echo=TRUE,eval=FALSE}
res.lmgce.apriori.centered.zero <-
  GCEstim::lmgce(
    y ~ .,
    data = dataGCE,
    support.signal = matrix(c(-5, 5,
                              -2, 2,
                              -2, 2,
                              -6, 6,
                              -10, 10,
                              -15, 15),
                            ncol = 2,
                            byrow = TRUE),
    support.signal.points = 5,
    support.noise = NULL,
    support.noise.points = 3,
    twosteps.n = 0,
    method = "primal.solnp",
    OLS = TRUE,
    seed = 230676
  )
```

The estimated GME coefficients with *a priori* information and zero-centered signal supports are $\widehat{\boldsymbol{\beta}}^{GME_{(apriori.centered.zero)}}=$ `r paste0("(", paste(round(coef(res.lmgce.apriori.centered.zero), 3), collapse = ", "), ")")`.

```{r,echo=FALSE,eval=TRUE}
coef.lmgce.apriori.centered.zero <- coef(res.lmgce.apriori.centered.zero)
```

The prediction error is slightly higher, $RMSE_{\mathbf{\hat y}}^{GME_{(apriori.centered.zero)}} \approx$ `r round(GCEstim::accmeasure(fitted(res.lmgce.apriori.centered.zero), dataGCE$y, which = "RMSE"), 3)`, as is the cross-validation prediction error, $CV\text{-}RMSE_{\mathbf{\hat y}}^{GME_{(apriori.centered.zero)}} \approx$ `r round(res.lmgce.apriori.centered.zero$error.measure.cv.mean, 3)`, but again a significant decrease was found for the precision error, $RMSE_{\boldsymbol{\hat\beta}}^{GME_{(apriori.centered.zero)}} \approx$ `r round(GCEstim::accmeasure(coef.lmgce.apriori.centered.zero, coef.dataGCE, which = "RMSE"), 3)`.

```{r,echo=FALSE,eval=TRUE}
RMSE_y.lmgce.apriori.centered.zero <- GCEstim::accmeasure(fitted(res.lmgce.apriori.centered.zero), dataGCE$y, which = "RMSE")
RMSE_beta.lmgce.apriori.centered.zero <- GCEstim::accmeasure(coef.lmgce.apriori.centered.zero, coef.dataGCE, which = "RMSE")
CV_RMSE_y.lmgce.apriori.centered.zero <- res.lmgce.apriori.centered.zero$error.measure.cv.mean
```

Some authors state that a narrow support around this *a priori* knowledge will provide a far better fit than its OLS counterpart, particularly for small sample sizes @Mittelhammer2013. A different approach could therefore be to center the signal support spaces at the "true" values of $\boldsymbol{\beta}$ and to set, for instance, a range equal to $4$ for each support space: $\mathbf{z}_0'= \left[ -1, 0, 1, 2, 3\right]$, $\mathbf{z}_1' = \mathbf{z}_2'= \left[ -2, -1, 0, 1, 2\right]$, $\mathbf{z}_3'= \left[ 1, 2, 3, 4, 5\right]$, $\mathbf{z}_4'= \left[ 4, 5, 6, 7, 8\right]$ and $\mathbf{z}_5'= \left[ 7, 8, 9, 10, 11\right]$.

```{r,echo=TRUE,eval=FALSE}
res.lmgce.apriori.centered.beta <-
  GCEstim::lmgce(
    y ~ .,
    data = dataGCE,
    support.signal = matrix(c(-1, 3,
                              -2, 2,
                              -2, 2,
                              1, 5,
                              4, 8,
                              7, 11),
                            ncol = 2,
                            byrow = TRUE),
    support.signal.points = 5,
    support.noise = NULL,
    support.noise.points = 3,
    twosteps.n = 0,
    method = "primal.solnp",
    OLS = TRUE,
    seed = 230676
  )
```

The estimated GME coefficients when the supports are centered at $\boldsymbol\beta$ are $\widehat{\boldsymbol{\beta}}^{GME_{(apriori.centered.beta)}}=$ `r paste0("(", paste(round(coef(res.lmgce.apriori.centered.beta), 3), collapse = ", "), ")")`.
```{r,echo=FALSE,eval=TRUE}
coef.lmgce.apriori.centered.beta <- coef(res.lmgce.apriori.centered.beta)
```

The prediction error ($RMSE_{\mathbf{\hat y}}^{GME_{(apriori.centered.beta)}} \approx$ `r round(GCEstim::accmeasure(fitted(res.lmgce.apriori.centered.beta), dataGCE$y, which = "RMSE"), 3)`), the cross-validation prediction error ($CV\text{-}RMSE_{\mathbf{\hat y}}^{GME_{(apriori.centered.beta)}} \approx$ `r round(res.lmgce.apriori.centered.beta$error.measure.cv.mean, 3)`) and the precision error ($RMSE_{\boldsymbol{\hat\beta}}^{GME_{(apriori.centered.beta)}} \approx$ `r round(GCEstim::accmeasure(coef.lmgce.apriori.centered.beta, coef.dataGCE, which = "RMSE"), 3)`) all improved when compared with the zero-centered approach.

```{r,echo=FALSE,eval=TRUE}
RMSE_y.lmgce.apriori.centered.beta <- GCEstim::accmeasure(fitted(res.lmgce.apriori.centered.beta), dataGCE$y, which = "RMSE")
RMSE_beta.lmgce.apriori.centered.beta <- GCEstim::accmeasure(coef.lmgce.apriori.centered.beta, coef.dataGCE, which = "RMSE")
CV_RMSE_y.lmgce.apriori.centered.beta <- res.lmgce.apriori.centered.beta$error.measure.cv.mean
```

The following table summarizes the previous results.

```{r, echo=FALSE,eval=TRUE}
res.all <-
  data.frame(
    OLS = c(RMSE_y.lm, CV_RMSE_y.lm, RMSE_beta.lm),
    GCE_100000 = c(RMSE_y.lmgce.100000, CV_RMSE_y.lmgce.100000, RMSE_beta.lmgce.100000),
    GCE_100 = c(RMSE_y.lmgce.100, CV_RMSE_y.lmgce.100, RMSE_beta.lmgce.100),
    GCE_50 = c(RMSE_y.lmgce.50, CV_RMSE_y.lmgce.50, RMSE_beta.lmgce.50),
    GCE_apriori.centered.zero = c(RMSE_y.lmgce.apriori.centered.zero,
                                  CV_RMSE_y.lmgce.apriori.centered.zero,
                                  RMSE_beta.lmgce.apriori.centered.zero),
    GCE_apriori.centered.beta = c(RMSE_y.lmgce.apriori.centered.beta,
                                  CV_RMSE_y.lmgce.apriori.centered.beta,
                                  RMSE_beta.lmgce.apriori.centered.beta),
    row.names = c("Prediction RMSE", "Prediction CV-RMSE", "Precision RMSE")
  )
```

```{r, echo=FALSE,eval=TRUE,results = 'asis'}
kableExtra::kable(
  res.all,
  digits = 3,
  align = c(rep('c', times = 6)),
  col.names = c("$OLS$",
                "$GME_{(100000)}$",
                "$GME_{(100)}$",
                "$GME_{(50)}$",
                "$GME_{(apriori.centered.zero)}$",
                "$GME_{(apriori.centered.beta)}$"),
  row.names = TRUE,
  booktabs = FALSE)
```

## Conclusion

Since the true values of the coefficients are generally unknown in applied work, it is important to establish a general procedure for defining the supports.

## References

::: {#refs}
:::

## Acknowledgements

This work was supported by Fundação para a Ciência e Tecnologia (FCT) through CIDMA and projects and .