A unified and computationally efficient R package for sparse sufficient dimension reduction (SDR) using Penalized Principal Machines (P²M) and Group Coordinate Descent (GCD).
The ppmSDR package provides a unified interface and efficient algorithms for sparse sufficient dimension reduction (SDR) in regression and classification. It implements the Penalized Principal Machine (P²M) family, which generalizes the principal support vector machine (PSVM) by allowing a wide range of convex loss functions together with modern sparsity-inducing penalties. Efficient computation is achieved via the Group Coordinate Descent (GCD) and MM-GCD algorithms, making the package scalable to large and high-dimensional data.

- `ppm()`, a unified front-end that fits any of ten penalized principal machine estimators, selected through the `loss` argument, for both regression and binary classification.
- `ppm_tune()` for cross-validation of the sparsity parameter.
- Bundled datasets (`boston`, `wdbc`) and `summary()`/`print()` methods for inspecting the estimated basis and selected variables.

```r
# Install from GitHub (requires devtools)
devtools::install_github("c16267/ppmSDR")
```

```
ppmSDR/
├── R/
│   ├── ppm.R              # ppm(): unified front-end + input validation + dispatch
│   ├── ppm_tune.R         # ppm_tune(): dCor-based K-fold cross-validation
│   ├── estimators.R       # ten internal P2M solvers (selected via loss=)
│   ├── methods.R          # S3 methods: print.ppm, summary.ppm, print.ppm_tune
│   ├── utils-internal.R   # internal GCD helpers (thresholding, block-diagonal algebra)
│   ├── data.R             # documentation of the bundled datasets
│   └── ppmSDR-package.R   # package-level documentation and imports
├── data/                  # boston.rda, wdbc.rda
├── man/                   # generated by roxygen2
├── vignettes/             # ppmSDR.Rmd
└── tests/                 # unit tests (testthat)
```

The loss-specific solvers are internal; users access
them through the loss argument of ppm() rather
than calling them directly.

| Function | Description |
|---|---|
| `ppm()` | Unified wrapper to fit any penalized PM via the `loss` argument |
| `ppm_tune()` | K-fold cross-validation that selects the sparsity parameter `lambda` |
| `summary()` | Estimated basis of the central subspace and the selected variables |
| `print()` | Compact summary of a fitted `ppm` / `ppm_tune` object |
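
The file listing describes `ppm_tune()` as dCor-based, but this README does not say how the criterion is computed or applied. As a generic illustration only, the base-R sketch below computes the sample distance correlation between two sets of observations; `double_center()` and `dcor_sample()` are hypothetical names, not functions exported by ppmSDR.

```r
# Double-center a distance matrix: subtract row and column means, add the grand mean.
double_center <- function(D) {
  sweep(sweep(D, 1, rowMeans(D)), 2, colMeans(D)) + mean(D)
}

# Sample distance correlation between x and y (vectors or matrices with n rows).
# Hypothetical helper for illustration; not part of the ppmSDR API.
dcor_sample <- function(x, y) {
  A <- double_center(as.matrix(dist(x)))
  B <- double_center(as.matrix(dist(y)))
  dcov2 <- mean(A * B)                      # squared sample distance covariance
  dvar2 <- sqrt(mean(A * A) * mean(B * B))  # product of distance standard deviations
  if (dvar2 == 0) return(0)                 # a constant input carries no dependence
  sqrt(max(dcov2, 0) / dvar2)
}

set.seed(1)
u <- rnorm(200)
dcor_sample(u, u^2)         # nonlinear dependence that Pearson correlation misses
dcor_sample(u, rnorm(200))  # independent noise
```

In a cross-validation setting, one would presumably evaluate such a criterion between the held-out response and the held-out predictors projected onto the estimated basis, and prefer the `lambda` with the largest average value; that reading is an assumption, not a description of the package internals.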

The `loss` argument of `ppm()` accepts the following values:

| `loss` | Method | Response | Algorithm |
|---|---|---|---|
| `"lssvm"` | P²LSM | continuous | GCD |
| `"wlssvm"` | P²WLSM | binary | GCD |
| `"logit"` | P²LR | continuous | Iterative GCD |
| `"wlogit"` | P²WLR | binary | Iterative GCD |
| `"asls"` | P²AR | both | Iterative GCD |
| `"l2svm"` | P²L2M | continuous | Iterative GCD |
| `"wl2svm"` | P²WL2M | binary | Iterative GCD |
| `"svm"` | P²SVM | continuous | MM-GCD |
| `"wsvm"` | P²WSVM | binary | MM-GCD |
| `"qr"` | P²QR | both | MM-GCD |
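
The `loss` values in the table correspond to familiar convex losses. As a rough illustration, the base-R sketch below writes out their conventional textbook forms as functions of a margin `m` (pseudo-label times decision value) or a residual `r`. The function names are hypothetical, and the package's internal scaling and parametrization may differ.

```r
# Margin-based losses, m = y * f(x) with y in {-1, +1}.
loss_ls       <- function(m) (1 - m)^2            # least squares SVM   ("lssvm")
loss_logistic <- function(m) log(1 + exp(-m))     # logistic regression ("logit")
loss_sq_hinge <- function(m) pmax(0, 1 - m)^2     # L2-SVM              ("l2svm")
loss_hinge    <- function(m) pmax(0, 1 - m)       # SVM                 ("svm")

# Residual-based losses with asymmetry level tau in (0, 1).
loss_asls  <- function(r, tau) abs(tau - (r < 0)) * r^2  # asymmetric least squares ("asls")
loss_check <- function(r, tau) r * (tau - (r < 0))       # quantile check loss      ("qr")

m <- c(-1, 0, 1, 2)
cbind(m,
      ls       = loss_ls(m),
      logistic = loss_logistic(m),
      sq_hinge = loss_sq_hinge(m),
      hinge    = loss_hinge(m))
```

The least squares loss is quadratic; the logistic, asymmetric least squares, and squared hinge losses are smooth but not quadratic; the hinge and check losses are non-smooth. That grouping is consistent with the Algorithm column (GCD, iterative GCD, and MM-GCD respectively), although the README does not spell out the reasoning.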

Each method supports the `"grSCAD"`, `"grMCP"`, and `"grLasso"` penalties.
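
To make the three penalties concrete, here is a minimal sketch of the standard group thresholding operators that group coordinate descent applies to one coefficient group at a time, assuming an orthonormalized group and a unit step size. The function names and the `gamma` defaults are illustrative choices, not taken from the package, whose internal helpers may be organized differently.

```r
# Group lasso: multivariate soft-thresholding shrinks the Euclidean norm of z by lambda.
group_soft <- function(z, lambda) {
  nz <- sqrt(sum(z^2))
  if (nz <= lambda) return(rep(0, length(z)))
  (1 - lambda / nz) * z
}

# Group MCP ("firm" thresholding), gamma > 1: no shrinkage once the norm exceeds gamma * lambda.
group_mcp <- function(z, lambda, gamma = 3) {
  nz <- sqrt(sum(z^2))
  if (nz <= gamma * lambda) gamma / (gamma - 1) * group_soft(z, lambda) else z
}

# Group SCAD, gamma > 2: soft-threshold small groups, leave large groups untouched.
group_scad <- function(z, lambda, gamma = 3.7) {
  nz <- sqrt(sum(z^2))
  if (nz <= 2 * lambda) {
    group_soft(z, lambda)
  } else if (nz <= gamma * lambda) {
    (gamma - 1) / (gamma - 2) * group_soft(z, gamma * lambda / (gamma - 1))
  } else {
    z
  }
}

z_small <- c(0.3, 0.4)  # norm 0.5
z_large <- c(3, 4)      # norm 5
group_soft(z_small, lambda = 1)  # the whole group is set to zero
group_soft(z_large, lambda = 1)  # shrunk toward zero
group_scad(z_large, lambda = 1)  # left unshrunk
```

All three operators zero out an entire group when its norm is small, which is how a predictor gets dropped from every direction at once. The SCAD and MCP versions stop shrinking large groups, which reduces the bias that the group lasso introduces.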

```r
library(ppmSDR)

## Generate data: the first two predictors form the central subspace
set.seed(1)
n <- 1000; p <- 10
B <- matrix(0, p, 2); B[1, 1] <- B[2, 2] <- 1
x <- MASS::mvrnorm(n, rep(0, p), diag(1, p))
y <- (x %*% B[, 1] / (0.5 + (x %*% B[, 2] + 1)^2)) + 0.2 * rnorm(n)
y.binary <- sign(y)

## Penalized principal least-squares SVM (P2LSM) via the unified interface
fit <- ppm(x, y, H = 10, C = 1, loss = "lssvm", penalty = "grSCAD", lambda = 0.01)
round(fit$evectors[, 1:2], 3)
summary(fit, d = 2)

## Penalized principal SVM (P2SVM); hinge loss benefits from a larger cost C
fit_svm <- ppm(x, y, H = 10, C = 100, loss = "svm", penalty = "grSCAD", lambda = 3e-5)
round(fit_svm$evectors[, 1:2], 3)

## Binary classification: penalized principal weighted least-squares SVM (P2WLSM)
fit_w <- ppm(x, y.binary, H = 10, C = 1, loss = "wlssvm", penalty = "grSCAD", lambda = 8e-4)
round(fit_w$evectors[, 1:2], 3)

## Select lambda by cross-validation (set the seed for reproducible folds)
set.seed(1)
cv <- ppm_tune(x, y, loss = "lssvm", d = 2, n.fold = 5, nlambda = 20)
cv$opt.lambda
summary(cv$fit, d = 2)
```

The bundled datasets load with `data()`:

```r
data(boston)  # Boston housing (continuous response: medv)
data(wdbc)    # Wisconsin Diagnostic Breast Cancer (binary: diagnosis B/M)
```

See the package vignette for a worked walk-through:

```r
browseVignettes("ppmSDR")
```
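
In a simulation the true basis `B` is known, so an estimate can be scored by comparing the subspaces that the two bases span. The sketch below uses the Frobenius distance between projection matrices, a common choice in the SDR literature; `proj()` and `subspace_dist()` are hypothetical helpers, not part of ppmSDR.

```r
# Orthogonal projection matrix onto the column space of B.
proj <- function(B) B %*% solve(crossprod(B), t(B))

# Frobenius distance between the projections onto span(B1) and span(B2).
# It is 0 when the spans coincide, regardless of column order, sign, or scale.
subspace_dist <- function(B1, B2) norm(proj(B1) - proj(B2), type = "F")

# Toy check with p = 4 and a two-dimensional subspace.
B_true <- cbind(c(1, 0, 0, 0), c(0, 1, 0, 0))
B_same <- 3 * B_true[, 2:1]                      # same span: columns swapped and rescaled
B_off  <- cbind(c(1, 0, 0, 0), c(0, 0, 1, 0))    # one direction wrong

subspace_dist(B_true, B_same)
subspace_dist(B_true, B_off)
```

With a fit from the simulated example, the same helper could be applied as `subspace_dist(B, fit$evectors[, 1:2])`, assuming the first two columns of `evectors` are the estimated directions as in the calls shown earlier.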