Sufficient dimension reduction (SDR) reduces the dimensionality of predictors \(\mathbf{X}\) while preserving their relationship with a response \(Y\). SDR estimates a basis \(\mathbf{B}\) of the central subspace \(\mathcal{S}_{Y\mid\mathbf{X}}\) defined by \[ Y \perp \mathbf{X} \mid \mathbf{B}^\top \mathbf{X}. \] In high dimensions, sparse SDR improves interpretability and accuracy by driving many rows of \(\mathbf{B}\) to zero, which is achieved by adding a sparsity-inducing penalty to the SDR optimization problem.
The ppmSDR package implements a unified framework
for sparse SDR based on the penalized principal machine (\(\mathrm{P}^2\mathrm{M}\)). A single
front-end, ppm(), dispatches to ten loss-specific
estimators, all fitted by one group coordinate descent (GCD) engine, and
ppm_tune() selects the sparsity parameter by
cross-validation.
Given data \((y_i, \mathbf{x}_i) \in \mathbb{R} \times \mathbb{R}^p\) with centered predictors, and a sequence of cutoffs \(r_1 < \cdots < r_h\), the sample principal machine solves \[ (\beta_{0k}, \boldsymbol{\beta}_k) = \arg\min_{\beta_0, \boldsymbol{\beta}} \boldsymbol{\beta}^\top \hat{\Sigma} \boldsymbol{\beta} + \frac{c}{n} \sum_{i=1}^n L_k\!\left(\tilde{y}_{ik}, \beta_0 + \boldsymbol{\beta}^\top \mathbf{x}_i\right), \quad k = 1, \ldots, h, \] where \(\hat{\Sigma} = \sum_i \mathbf{x}_i \mathbf{x}_i^\top / n\). The basis \(\hat{\mathbf{B}}\) is estimated by the leading \(d\) eigenvectors of \(\sum_{k=1}^h \hat{\boldsymbol{\beta}}_k \hat{\boldsymbol{\beta}}_k^\top\).
Under the sparsity assumption the slopes \(\boldsymbol{\beta}_k\) share a common support across \(k\), leading to the row-group penalized objective \[ \sum_{k=1}^{h} \left[ \boldsymbol{\beta}_k^\top \hat{\Sigma} \boldsymbol{\beta}_k + \frac{c}{n} \sum_{i=1}^n L_k(\tilde{y}_{ik}, \beta_{0k} + \boldsymbol{\beta}_k^\top \mathbf{x}_i) \right] + \sum_{j=1}^{p} p_{\lambda}\!\left(\|\boldsymbol{\beta}_{(j)}\|_2\right), \] where \(\boldsymbol{\beta}_{(j)} = (\beta_{1j}, \ldots, \beta_{hj})^\top\) and \(p_\lambda(\cdot)\) is the group LASSO, SCAD or MCP penalty. The penalty acts group-wise on all coefficients of predictor \(j\), so variable selection corresponds to identifying the predictors that form a sparse basis.
| Machine | Response | Type | Loss \(L_k(\tilde y_k, f)\) | Algorithm |
|---|---|---|---|---|
| \(\mathrm{P}^2\mathrm{LSM}\) | Continuous | RPM | \((1 - \tilde y_k f)^2\) | GCD |
| \(\mathrm{P}^2\mathrm{WLSM}\) | Binary | LPM | \(w_k(1 - y f)^2\) | GCD |
| \(\mathrm{P}^2\mathrm{LR}\) | Continuous | RPM | \(\log(1 + e^{-\tilde y_k f})\) | Iterative GCD |
| \(\mathrm{P}^2\mathrm{WLR}\) | Binary | LPM | \(w_k \log(1 + e^{-y f})\) | Iterative GCD |
| \(\mathrm{P}^2\mathrm{AR}\) | Both | LPM | \((y-f)^2(\rho_k\mathbb{I}\{y \ge f\} + (1-\rho_k)\mathbb{I}\{y<f\})\) | Iterative GCD |
| \(\mathrm{P}^2\mathrm{L2M}\) | Continuous | RPM | \([\max\{0, 1 - \tilde y_k f\}]^2\) | Iterative GCD |
| \(\mathrm{P}^2\mathrm{WL2M}\) | Binary | LPM | \(w_k[\max\{0, 1 - y f\}]^2\) | Iterative GCD |
| \(\mathrm{P}^2\mathrm{SVM}\) | Continuous | RPM | \(\max\{0, 1 - \tilde y_k f\}\) | MM-GCD |
| \(\mathrm{P}^2\mathrm{WSVM}\) | Binary | LPM | \(w_k \max\{0, 1 - y f\}\) | MM-GCD |
| \(\mathrm{P}^2\mathrm{QR}\) | Both | LPM | \((y-f)(\rho_k - \mathbb{I}\{y<f\})\) | MM-GCD |
The squared loss yields an exact group-penalized least-squares problem solved directly by GCD. Smooth losses (logistic, asymmetric least squares, L2-hinge) are reduced to a sequence of penalized least-squares problems via quadratic approximation (iterative GCD). Non-differentiable losses (hinge, check) are handled by quadratic majorization within a majorization-minimization scheme (MM-GCD).
The argument loss selects the estimator;
penalty is one of "grSCAD",
"grLasso" or "grMCP"; lambda
controls sparsity and is best chosen by cross-validation (see
below).
set.seed(1)
n <- 1000; p <- 10
B <- matrix(0, p, 2); B[1, 1] <- B[2, 2] <- 1
x <- matrix(rnorm(n * p), n, p)
y <- (x %*% B[, 1]) / (0.5 + (x %*% B[, 2] + 1)^2) + 0.2 * rnorm(n)
## penalized principal least-squares SVM (P^2LSM)
fit <- ppm(x, y, H = 10, C = 1, loss = "lssvm", penalty = "grSCAD", lambda = 0.01)
round(fit$evectors[, 1:2], 3)
#> [,1] [,2]
#> [1,] 1 0
#> [2,] 0 1
#> [3,] 0 0
#> [4,] 0 0
#> [5,] 0 0
#> [6,] 0 0
#> [7,] 0 0
#> [8,] 0 0
#> [9,] 0 0
#> [10,] 0 0
summary(fit, d = 2)
#> Penalized Principal Machine (P2M) summary
#> Loss: lssvm Penalty: grSCAD lambda = 0.01
#> Working dimension d = 2
#>
#> Estimated basis of the central subspace:
#> Dir1 Dir2
#> x1 1e+00 -1e-04
#> x2 1e-04 1e+00
#> x3 0e+00 0e+00
#> x4 0e+00 0e+00
#> x5 0e+00 0e+00
#> x6 0e+00 0e+00
#> x7 0e+00 0e+00
#> x8 0e+00 0e+00
#> x9 0e+00 0e+00
#> x10 0e+00 0e+00
#>
#> Selected variables (2 of 10): x1, x2For a binary response the weighted losses code the classes internally as \(\{-1, +1\}\).
y.binary <- sign(y)
## penalized principal weighted least-squares SVM (P^2WLSM)
fit2 <- ppm(x, y.binary, H = 10, C = 1, loss = "asls",
penalty = "grSCAD", lambda = 0.03)
round(fit2$evectors[, 1:2], 3)
#> [,1] [,2]
#> [1,] 1.000 -0.005
#> [2,] 0.005 1.000
#> [3,] 0.000 0.000
#> [4,] 0.000 0.000
#> [5,] 0.005 0.004
#> [6,] 0.000 0.000
#> [7,] 0.006 0.001
#> [8,] 0.000 0.000
#> [9,] 0.000 0.000
#> [10,] 0.000 0.000lambda by cross-validationppm_tune() performs \(K\)-fold cross-validation, selecting the
lambda that maximizes the held-out distance correlation
between the response and the estimated sufficient predictors. Call
set.seed() beforehand for reproducible folds.
set.seed(1)
cv <- ppm_tune(x, y, loss = "lssvm", d = 2, n.fold = 5,
nlambda = 10, lambda.max = 0.02)
cv$opt.lambda
#> [1] 0.009283178
summary(cv$fit, d = 2)
#> Penalized Principal Machine (P2M) summary
#> Loss: lssvm Penalty: grSCAD lambda = 0.00928318
#> Working dimension d = 2
#>
#> Estimated basis of the central subspace:
#> Dir1 Dir2
#> x1 1e+00 -3e-04
#> x2 3e-04 1e+00
#> x3 0e+00 0e+00
#> x4 0e+00 0e+00
#> x5 0e+00 0e+00
#> x6 0e+00 0e+00
#> x7 0e+00 0e+00
#> x8 0e+00 0e+00
#> x9 0e+00 0e+00
#> x10 0e+00 0e+00
#>
#> Selected variables (2 of 10): x1, x2We illustrate a weighted estimator on the Wisconsin Diagnostic Breast
Cancer (WDBC) data, coding the diagnosis as malignant = +1
and benign = -1. After fitting the penalized principal
weighted L2-SVM (P2WL2M), the predictors are projected onto the
estimated two-dimensional central subspace and plotted by class.
data(wdbc)
x <- scale(as.matrix(wdbc[, -1]))
y <- ifelse(wdbc$diagnosis == "M", 1, -1)
fit <- ppm(x, y, loss = "wl2svm", penalty = "grSCAD", lambda = 0.3)
summary(fit, d = 2)
#> Penalized Principal Machine (P2M) summary
#> Loss: wl2svm Penalty: grSCAD lambda = 0.3
#> Working dimension d = 2
#>
#> Estimated basis of the central subspace:
#> Dir1 Dir2
#> radius_mean -0.2733 -0.3393
#> texture_mean 0.0000 0.0000
#> perimeter_mean -0.2929 -0.3362
#> area_mean -0.2650 0.0599
#> smoothness_mean 0.0000 0.0000
#> compactness_mean -0.0819 -0.0947
#> concavity_mean -0.2452 0.0401
#> concave_points_mean -0.3706 0.2268
#> symmetry_mean 0.0000 0.0000
#> fractal_dimension_mean 0.0000 0.0000
#> radius_se -0.1022 0.3825
#> texture_se 0.0000 0.0000
#> perimeter_se -0.0773 0.2433
#> area_se -0.0670 0.2180
#> smoothness_se 0.0000 0.0000
#> compactness_se 0.0000 0.0000
#> concavity_se 0.0000 0.0000
#> concave_points_se 0.0000 0.0000
#> symmetry_se 0.0000 0.0000
#> fractal_dimension_se 0.0000 0.0000
#> radius_worst -0.3641 0.0832
#> texture_worst 0.0000 0.0000
#> perimeter_worst -0.3715 0.0384
#> area_worst -0.3223 0.4892
#> smoothness_worst 0.0000 0.0000
#> compactness_worst -0.0743 -0.0829
#> concavity_worst -0.1698 -0.2341
#> concave_points_worst -0.3679 -0.3794
#> symmetry_worst 0.0000 0.0000
#> fractal_dimension_worst 0.0000 0.0000
#>
#> Selected variables (15 of 30): radius_mean, perimeter_mean, area_mean, compactness_mean, concavity_mean, concave_points_mean, radius_se, perimeter_se, area_se, radius_worst, perimeter_worst, area_worst, compactness_worst, concavity_worst, concave_points_worst
B <- fit$evectors[, 1:2]
scores <- x %*% B
plot(scores[, 1], scores[, 2],
col = ifelse(y == 1, "red", "blue"),
pch = ifelse(y == 1, 17, 1),
xlab = "1st SDR direction", ylab = "2nd SDR direction")
legend("topright", legend = c("malignant (+1)", "benign (-1)"),
col = c("red", "blue"), pch = c(17, 1))The Boston housing data (data(boston)) provide a
continuous-response counterpart for the response-based estimators such
as "lssvm".