The R package `specs`

implements the single-equation penalized error-correction selector proposed in Smeekes and Wijler (2018a) as an automated approach towards sparse single-equation cointegration modelling. In addition, the package contains the dataset used in Smeekes and Wijler (2018a) to predict Dutch unemployment rates based on Google Trends.

The development version of the `specs`

package can be installed from GitHub using

```
# install.package("devtools")
devtools::install_github("wijler/specs")
```

After installation, the package can be loaded in the standard way:

`library(specs)`

Loading `specs`

provides access to the dataset `Unempl_GT`

which contains the monthly unemployment rates (x1000) in the Netherlands and a set of 87 related Google Trends aggregated to a monthly frequency. The data covers the period from 1 January 2014 to December 2017 and is used in the empirical application in Smeekes and Wijler (2018a).

The package `specs`

enables sparse and automated estimation of single-equation cointegration models. To estimate a conditional error-correction model (CECM) on the untransformed levels of a collection of time series, use the function `specs`

. Conversion of the data matrix in levels to a CECM representation is performed automatically within the function. For example,

```
#library(specs)
unemployment <- Unempl_GT[,1] #Extract the Dutch unemployment levels (x1,000)
GT <- Unempl_GT[,2:11] #Select the first ten Google Trends
my_specs <- specs(unemployment,GT,p=1) #Estimate specs
my_coefs <- my_specs$gammas #store the coefficients
y_d <- my_specs$y_d #Transformed dependent variable
z_l <- my_specs$v #Transformed independent variables
```

estimates a regularized CECM on Dutch unemployment rates and ten Google Trends by appropriately differencing and/or lagging the levels of the time series, see Smeekes and Wijler (2018a, eq. 7) for the full model specification. `my_coefs`

contains 1,000 column, each column being the solution for a unique combination of the group and individual penalty (see the **penalty** section for more details).

The short-run dynamics in a CECM are modeled via the inclusion of lagged differences of the data. By default, `specs`

includes once lagged differences, but the user is free to specify the desired number via the input `p`

. Continuing the above example,

`my_specs <- specs(unemployment,GT,p=0) #Estimate specs without any lagged differences`

The inclusion of deterministic terms is often desired for correct model specification. When it is believed that a constant and/or trend should be included, it is advisable to estimate the model without imposing regularization on these deterministic components. Accordingly, the input `deterministics = c("constant","trend","both","none")`

allows the user to choose the deterministic specification without penalizing the deterministic terms. For example,

```
my_specs <- specs(unemployment,GT,deterministics = "both") #Estimate specs with a constant and trend included
my_coefs <- my_specs$gammas #Store the new coefficients
my_deterministics <- my_specs$thetas #Store the coefficients of the deterministic component
```

There are cases in which it may not be of interest to explicitly model cointegration. In this instance, the lagged levels can be omitted from the model by setting `ADL = TRUE`

. This estimates a penalized autoregressive distributed lag (ADL) model on the differenced data:

```
my_adl <- specs(unemployment,GT,ADL=TRUE) #Estimate and ADL model
my_coefs <- my_specs$gammas #Store the coefficients (smaller matrix than before)
```

The matrix of coefficients stored in `my_coefs`

does not contain the contribution of `z_l`

anymore and, consequently, has a smaller row dimension than before. `my_coefs`

now contains only 100 columns, as there is no more group penalty included. The reduced number of penalties to estimate a solution far and some algorithmic change (see **Algorithm and Implementation**) speed up the estimation procedure considerably.

Alternatively, the user may choose to pre-transform the data into the form of a CECM/ADL, for example to save on computation time in rolling window forecast exercises. In this case, the function `specs_tr`

can be used instead. This function operates entirely analogous to `specs`

, with the exception of requiring a differenced dependent variable (`y_d`

), the lagged levels of the time series (`z_l`

) and the required differences of the data (`w`

) as inputs. Since `w`

is directly provided to the function, the option to set the lag length `p`

is omitted. When `ADL=TRUE`

, the user may omit `z_l`

as an input. For example:

```
z <- cbind(unemployment,GT)
y_d <- diff(unemployment) #Difference the dependent variable
z_l <- GT[-nrow(GT),] #Lagged levels of the data
w <- diff(GT) #Contemporaneous differences (corresponding to p=0)
my_specs <- specs_tr(y_d,z_l,w) #Estimate a CECM on pre-transformed data
my_adl <- specs_tr(y_d,NULL,w,ADL=TRUE) #Estimate an ADL model on pre-transformed data
```

Finally, a word on the naming of objects. Within this package the coefficients corresponding to `z_l`

are referred to as `delta`

, those corresponding to `w`

as `pi`

, and the numeric object that stacks both `delta`

and `pi`

is referred to as `gamma`

. Therefore, the solutions of `specs`

are referred to as `gammas`

. The deterministic terms are passed to the function outputs as `D`

, with there coefficients being referred to as `thetas`

. As seen in the above examples, when `ADL=TRUE`

, `delta`

is omitted from the numeric object `gamma`

in the output. The naming of objects here is congruent with Smeekes and Wijler (2018a), which may serve as a helpful guideline for implementation.

**Penalty**

The functions `specs`

, or equivalently `specs_tr`

, estimate the model by variants of penalized regression, customized to the error-correction framework. In its most general form, specs penalizes each individual coefficient via `lambda_i`

, and adds a group penalty on `delta`

via `lambda_g`

. Unless sequences of positive numbers for `lambda_i`

and/or `lambda_g`

are supplied, sequences are generated automatically within the function. A grid of 100 values is generated for `lambda_i`

and a grid of 10 values for `lambda_g`

. The largest value in each grid corresponds to the smallest value that sets all the coefficients that it penalizes equal to zero (with the other penalty set equal to zero). The smallest penalty in the grid is chosen as 1e-4 times the largest value in that grid. As an important special case, the user may set `lambda_g = 0`

, in which case the function will estimate the model by (weighted) *L*_{1}-penalized regression, i.e. the (adaptive) lasso:

`my_specs <- specs(unemployment,GT,lambda_g=0) #Estimate a CECM without group penalty`

In practice, one typically requires a single choice of penalties that provides the optimal solution for the model building exercise at hand. To facilitate selection of such an optimal penalty, the functions `specs_opt`

and `specs_tr_opt`

, being the equivalents of `specs`

and `specs_tr`

, respectively, come with the added functionality of automated selection of optimal values for `lambda_i`

and `lambda_g`

. The selection criteria can be set via the input `rule`

, with the possible choices being `BIC`

, `AIC`

or time series cross-validation (`TSCV`

). The first two are information criteria in which the degrees of freedom is approximated by the number of non-zero coefficients for a particular solution, whereas the latter is a form of cross-validation that respects the time series structure of the data. The implementation details for TSCV can be found in Smeekes and Wijler (2018b, p. 411). The full matrix of solutions, as well as the solution corresponding to the optimal penalty choice are included in the output.

```
my_specs <- specs_opt(unemployment,GT,rule="BIC") #Estimate a CECM with the optimal penalty chosen by BIC
coefs_opt <- my_specs$gamma_opt #Extract the optimal coefficients
my_adl <- specs_opt(unemployment,GT,rule="AIC",ADL=TRUE) #Estimate an ADL model with the optimal penalty chosen by BIC
my_specs <- specs_opt(unemployment,GT,rule="TSCV",CV_cutoff=4/5) #Estimate a CECM with the optimal penalty chosen by TSCV
#Training sample is 4/5 of the total sample
my_specs <- specs_tr_opt(y_d,z_l,w,rule="BIC") #Estimate a CECM based on pre-transformed data, penalty chosen by BIC
```

**Weights**

Finally, specs can be estimated with the use of adaptively weighted penalization. Automatically generated weighting schemes are available via the input `weights = c("ridge","ols","none")`

. The default option, `"ridge"`

constructs the weights via the use of initial estimates obtained by ridge regression. In detail, the weights for *δ*_{i} and *π*_{j} are constructed as |*δ̂*_{i}|^{ − kδ} and |*π̂*_{j}|^{ − kπ}, respectively. The penalty parameter for the ridge regression is automatically chosen by TSCV. Alternative options are to automatically generate weights via initial ols estimates (`weights = "ols"`

) or to refrain from adaptive weighting altogether (`weights = "none"`

). Alternatively, it is also possible to supply a sequence of positive weights directly. Finally, the values for *k*_{δ} and *k*_{π} can be chosen by the user via the equivalently named input options `k_delta`

and `k_pi`

. The optimal values for these parameters are case-dependent, although some theoretical guidance is provided in table 1 of Smeekes and Wijler (2018a).

`my_specs <- specs(unemployment,GT,weights="ols",k_delta=2,k_pi=1) #Estimate specs with OLS and variable weight exponents`

**Algorithm and Implementation**

The package `specs`

combines accelerated generalized gradient descent for the estimation of *δ* with coordinatewise descent for the estimation of *π*. Since *δ* is penalized by both an *L*_{1}- and *L*_{2}-penalty, its estimation fits into the framework of the so-called sparse group lasso, for which numerous computational procedures have been proposed. This package adopts the algorithm of Simon et al. (2013), as the use of accelerated gradient descent via Nesterov updates greatly improves computational time. However, since *π* is regularized via an *L*_{1}-penalty, and is separable from the penalty on *δ*, the optimal solution for *π* is calculated via the coordinate-wise descent procedure proposed in Friedman et al. (2009). Essentially, `specs`

iterates between optimizing for *δ* and *π*, where within each iteration one of the aforementioned two algorithms is repeated until numerical convergence. All calculations are performed in C++, with the help of the Rcpp and Armadillo packages.

- Friedman, J., Hastie, T., and Tibshirani, R. (2009). glmnet: Lasso and elastic-net regularized generalized linear models. R package version, 1(4).
- Simon, N., Friedman, J., Hastie, T., and Tibshirani, R. (2013). A sparse-group lasso. Journal of computational and graphical statistics, 22(2), 231-245.
- Smeekes, S., and Wijler, E. (2018a). An Automated Approach Towards Sparse Single-Equation Cointegration Modelling. arXiv preprint arXiv:1809.08889.
- Smeekes, S., & Wijler, E. (2018b). Macroeconomic forecasting using penalized regression methods. International journal of forecasting, 34(3), 408-430.