ppmSDR: Penalized Principal Machine for Sparse Sufficient Dimension Reduction

A unified and computationally efficient R package for sparse sufficient dimension reduction (SDR) using Penalized Principal Machines (P²M) and Group Coordinate Descent (GCD).

Overview

The ppmSDR package provides a unified interface and efficient algorithms for sparse sufficient dimension reduction (SDR) in regression and classification. It implements the Penalized Principal Machine (P²M) family. P²M generalizes the principal support vector machine (PSVM; Li, Artemiou and Li, 2011) by allowing a wide range of convex loss functions together with modern sparsity-inducing penalties such as group SCAD. Computation relies on Group Coordinate Descent (GCD; Breheny and Huang, 2015) and on MM-GCD, which combines GCD with a majorization-minimization (MM) step (Hunter and Lange, 2004). Both algorithms are designed to scale to large, high-dimensional data.
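The base-R sketch below illustrates the principal-machine idea that P²M builds on by fitting an unpenalized, linear principal least-squares SVM. It assumes the objective ψᵀΣ̂ψ + (C/n) Σᵢ [1 - ỹᵢ(a + ψᵀ(xᵢ - x̄))]². Here ỹᵢ = ±1 dichotomizes the response at one of H - 1 sample quantiles. Because ỹᵢ² = 1, the slope ψ has a closed form for each cut point. The leading eigenvectors of Σₕ ψₕψₕᵀ then estimate a basis of the central subspace. The function `plssvm_sketch()` is hypothetical and does not reflect how ppmSDR is implemented. `ppm()` additionally applies a sparsity penalty and uses the GCD-type algorithms described above.

```r
# Hypothetical helper, NOT part of ppmSDR: an unpenalized, linear principal
# least-squares SVM, written only to illustrate the principal-machine idea.
plssvm_sketch <- function(x, y, H = 10, C = 1) {
  n <- nrow(x)
  z <- scale(x, center = TRUE, scale = FALSE)      # centred predictors
  Sigma <- crossprod(z) / n                         # sample covariance
  cuts <- quantile(y, probs = seq_len(H - 1) / H)   # H - 1 interior cut points
  M <- matrix(0, ncol(x), ncol(x))
  for (q in cuts) {
    ytil <- ifelse(y > q, 1, -1)                    # dichotomized response
    # With ytil^2 = 1 the loss equals (ytil - a - psi'z)^2, so the slope
    # solves the linear system (1 + C) Sigma psi = C * z'ytil / n.
    psi <- solve((1 + C) * Sigma, C * crossprod(z, ytil) / n)
    M <- M + tcrossprod(psi)                        # accumulate psi psi'
  }
  eigen(M, symmetric = TRUE)$vectors                # leading columns estimate the basis
}

# Same model as in the Example Usage section, with identity predictor covariance
set.seed(1)
n <- 5000; p <- 10
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] / (0.5 + (x[, 2] + 1)^2) + 0.2 * rnorm(n)
V <- plssvm_sketch(x, y, H = 10, C = 1)
round(V[, 1:2], 3)
```

With an identity predictor covariance, the first column of `V` should load mainly on the first predictor and the second column mainly on the second. No sparsity is imposed, so the remaining loadings are small but not exactly zero.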

Key Features

Installation

# Install from GitHub (requires devtools)
devtools::install_github("c16267/ppmSDR")

Repository Structure

ppmSDR/
├── R/
│   ├── ppm.R              # ppm(): unified front-end + input validation + dispatch
│   ├── ppm_tune.R         # ppm_tune(): dCor-based K-fold cross-validation
│   ├── estimators.R       # ten internal P2M solvers (selected via loss=)
│   ├── methods.R          # S3 methods: print.ppm, summary.ppm, print.ppm_tune
│   ├── utils-internal.R   # internal GCD helpers (thresholding, block-diagonal algebra)
│   ├── data.R             # documentation of the bundled datasets
│   └── ppmSDR-package.R   # package-level documentation and imports
├── data/                  # boston.rda, wdbc.rda
├── man/                   # generated by roxygen2
├── vignettes/             # ppmSDR.Rmd
└── tests/                 # unit tests (testthat)

The loss-specific solvers are internal; users access them through the loss argument of ppm() rather than calling them directly.
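GCD updates one coefficient group at a time by applying a thresholding rule to that group's unpenalized update. The function below sketches the group SCAD rule in the form given by Breheny and Huang (2015). It assumes an orthonormalized group, unit curvature, and γ = 3.7, the value recommended by Fan and Li (2001). The rule applies group-lasso shrinkage to small groups and reduced shrinkage in the middle range. Groups whose norm exceeds γλ are not shrunk. `group_scad_threshold()` is a hypothetical name, and ppmSDR's helpers in utils-internal.R may be organized differently.

```r
# Hypothetical helper (not ppmSDR's API): group SCAD thresholding of one group's
# unpenalized update z, assuming an orthonormalized group and unit curvature.
group_scad_threshold <- function(z, lambda, gamma = 3.7) {
  stopifnot(gamma > 2, lambda >= 0)
  r <- sqrt(sum(z^2))                          # Euclidean norm of the group
  if (r == 0) return(z)
  soft <- function(t, thr) max(t - thr, 0)     # soft-thresholding of a norm
  new_r <- if (r <= 2 * lambda) {
    soft(r, lambda)                            # group-lasso region
  } else if (r <= gamma * lambda) {
    (gamma - 1) / (gamma - 2) * soft(r, gamma * lambda / (gamma - 1))
  } else {
    r                                          # large groups are not shrunk
  }
  z * (new_r / r)                              # rescale the whole group jointly
}

group_scad_threshold(c(0.3, 0.4), lambda = 1)  # norm 0.5 <= lambda: group set to zero
group_scad_threshold(c(1.5, 2), lambda = 1)    # middle region: partial shrinkage
group_scad_threshold(c(3, 4), lambda = 1)      # norm 5 > 3.7 * lambda: unchanged
```

Because the whole group is rescaled by one factor, its direction is preserved. The group is therefore either selected as a unit or dropped as a unit.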

Main Functions

Function	Description
`ppm()`	Unified wrapper that fits any penalized principal machine (P²M), selected via the `loss` argument
`ppm_tune()`	dCor-based K-fold cross-validation that selects the sparsity parameter `lambda`
`summary()`	Reports the estimated basis of the central subspace and the selected variables
`print()`	Prints a compact summary of a fitted `ppm` or `ppm_tune` object
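According to the repository layout, `ppm_tune()` scores candidate values of `lambda` with distance correlation (dCor). This README does not spell out the exact criterion. The base-R sketch below therefore only shows how the sample distance correlation is computed from double-centred distance matrices. One natural use would be pairing held-out reduced predictors with held-out responses, but that is an assumption rather than a description of `ppm_tune()`. `dcor_sketch()` is a hypothetical helper.

```r
# Hypothetical helper: sample distance correlation between two data sets with
# the same number of rows (vectors are treated as one-column matrices).
dcor_sketch <- function(u, v) {
  centre <- function(a) {
    d <- as.matrix(dist(as.matrix(a)))        # pairwise Euclidean distances
    # double centring: subtract row and column means, add back the grand mean
    sweep(sweep(d, 1, rowMeans(d)), 2, colMeans(d)) + mean(d)
  }
  A <- centre(u)
  B <- centre(v)
  dcov2_uv <- max(mean(A * B), 0)             # squared distance covariance
  denom <- sqrt(mean(A * A) * mean(B * B))    # product of distance std. devs
  if (denom == 0) return(0)
  sqrt(dcov2_uv / denom)
}

set.seed(2)
u <- rnorm(200)
dcor_sketch(u, 3 * u + 1)   # 1 (up to rounding) for an affine relationship
dcor_sketch(u, u^2)         # clearly positive, although the Pearson correlation is near 0
```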


Supported losses (the loss argument of ppm())

`loss`	Method	Response	Algorithm
`"lssvm"`	P²LSM	continuous	GCD
`"wlssvm"`	P²WLSM	binary	GCD
`"logit"`	P²LR	continuous	Iterative GCD
`"wlogit"`	P²WLR	binary	Iterative GCD
`"asls"`	P²AR	continuous or binary	Iterative GCD
`"l2svm"`	P²L2M	continuous	Iterative GCD
`"wl2svm"`	P²WL2M	binary	Iterative GCD
`"svm"`	P²SVM	continuous	MM-GCD
`"wsvm"`	P²WSVM	binary	MM-GCD
`"qr"`	P²QR	continuous or binary	MM-GCD
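The base-R sketch below writes out textbook forms of these losses. The least-squares, logistic, squared-hinge and hinge losses are written as functions of the margin m = ỹ f(x). The asymmetric least-squares (expectile) and quantile (check) losses are written as functions of the residual r = y - f(x). These are standard definitions rather than code taken from ppmSDR, and `tau` is an assumed asymmetry level. Principal weighted machines such as those of Kim and Shin (2019) and Jang, Shin and Artemiou (2023) additionally scale each observation's loss by a class-dependent weight. The weighted variants (`wlssvm`, `wlogit`, `wl2svm`, `wsvm`) presumably follow the same pattern; the weights are omitted here.

```r
# Textbook loss functions (illustrative only; not ppmSDR internals).
# Margin-based losses take m = ytilde * f(x); residual-based ones take r = y - f(x).
loss_sketch <- list(
  lssvm = function(m) (1 - m)^2,                             # least squares
  logit = function(m) log1p(exp(-m)),                        # logistic
  l2svm = function(m) pmax(0, 1 - m)^2,                      # squared hinge
  svm   = function(m) pmax(0, 1 - m),                        # hinge (non-smooth)
  asls  = function(r, tau = 0.5) abs(tau - (r < 0)) * r^2,   # expectile
  qr    = function(r, tau = 0.5) r * (tau - (r < 0))         # check loss (non-smooth)
)

m <- c(-1, 0, 0.5, 2)
sapply(loss_sketch[c("lssvm", "l2svm", "svm")], function(f) f(m))
loss_sketch$qr(c(-1, 1), tau = 0.25)   # 0.75 0.25
```

The hinge and check losses have a kink and are not differentiable there. This is presumably why the table pairs `svm`, `wsvm` and `qr` with MM-GCD. An MM step (Hunter and Lange, 2004) replaces the non-smooth loss with a smooth surrogate at each iteration.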

Example Usage

library(ppmSDR)

## Generate data (requires the MASS package): the first two predictors span the central subspace
set.seed(1)
n <- 1000; p <- 10
B <- matrix(0, p, 2); B[1, 1] <- B[2, 2] <- 1
x <- MASS::mvrnorm(n, rep(0, p), diag(1, p))
y <- (x %*% B[, 1] / (0.5 + (x %*% B[, 2] + 1)^2)) + 0.2 * rnorm(n)
y.binary <- sign(y)

## Penalized principal least-squares SVM (P2LSM) via the unified interface
fit <- ppm(x, y, H = 10, C = 1, loss = "lssvm", penalty = "grSCAD", lambda = 0.01)
round(fit$evectors[, 1:2], 3)
summary(fit, d = 2)

## Penalized principal SVM (P2SVM); hinge loss benefits from a larger cost C
fit_svm <- ppm(x, y, H = 10, C = 100, loss = "svm", penalty = "grSCAD", lambda = 3e-5)
round(fit_svm$evectors[, 1:2], 3)

## Binary classification: penalized principal weighted least-squares SVM (P2WLSM)
fit_w <- ppm(x, y.binary, H = 10, C = 1, loss = "wlssvm", penalty = "grSCAD", lambda = 8e-4)
round(fit_w$evectors[, 1:2], 3)

## Select lambda by cross-validation (set the seed for reproducible folds)
set.seed(1)
cv <- ppm_tune(x, y, loss = "lssvm", d = 2, n.fold = 5, nlambda = 20)
cv$opt.lambda
summary(cv$fit, d = 2)
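The example above compares estimated and true bases only by eye. One common numerical measure is the Frobenius distance between the orthogonal projections onto the two column spaces. This distance does not depend on which basis is used for each space. The helpers `proj()` and `subspace_dist()` below are hypothetical base-R functions, not part of ppmSDR. Assuming `fit$evectors` stores basis vectors column-wise, as the indexing above suggests, they could be applied as `subspace_dist(B, fit$evectors[, 1:2])`.

```r
# Hypothetical helpers: Frobenius distance ||P_A - P_B||_F between column spaces,
# where P_A = A (A'A)^{-1} A' is the orthogonal projection onto span(A).
proj <- function(A) A %*% solve(crossprod(A), t(A))
subspace_dist <- function(A, B) norm(proj(A) - proj(B), type = "F")

p <- 10
B_true <- matrix(0, p, 2); B_true[1, 1] <- B_true[2, 2] <- 1
B_rot  <- B_true %*% matrix(c(1, 1, -1, 1), 2, 2)  # same span, different basis
B_off  <- matrix(0, p, 2); B_off[3, 1] <- B_off[4, 2] <- 1

subspace_dist(B_true, B_rot)   # 0 (up to rounding)
subspace_dist(B_true, B_off)   # 2
```

For two-dimensional subspaces the distance lies between 0 (identical spans) and 2 (mutually orthogonal spans).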

Bundled datasets

data(boston)   # Boston housing (continuous response: medv)
data(wdbc)     # Wisconsin Diagnostic Breast Cancer (binary: diagnosis B/M)

References

Artemiou, A. and Dong, Y. (2016). Sufficient dimension reduction via principal Lq support vector machine, Electronic Journal of Statistics, 10: 783–805.

Artemiou, A., Dong, Y. and Shin, S. J. (2021). Real-time sufficient dimension reduction through principal least squares support vector machines, Pattern Recognition, 112: 107768.

Breheny, P. and Huang, J. (2015). Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors, Statistics and Computing, 25: 173–187.

Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties, Journal of the American Statistical Association, 96: 1348–1360.

Hunter, D. R. and Lange, K. (2004). A tutorial on MM algorithms, The American Statistician, 58(1): 30–37.

Jang, H. J., Shin, S. J. and Artemiou, A. (2023). Principal weighted least square support vector machine: An online dimension-reduction tool for binary classification, Computational Statistics & Data Analysis, 187: 107818.

Kim, B. and Shin, S. J. (2019). Principal weighted logistic regression for sufficient dimension reduction in binary classification, Journal of the Korean Statistical Society, 48(2): 194–206.

Li, B., Artemiou, A. and Li, L. (2011). Principal support vector machines for linear and nonlinear sufficient dimension reduction, Annals of Statistics, 39(6): 3182–3210.

Shin, J. and Shin, S. J. (2024). A concise overview of principal support vector machines and its generalization, Communications for Statistical Applications and Methods, 31(2): 235–246.

Shin, J., Shin, S. J. and Artemiou, A. (2024). The R package psvmSDR: A unified algorithm for sufficient dimension reduction via principal machines, arXiv preprint arXiv:2409.01547.

Shin, S. J. and Artemiou, A. (2017). Penalized principal logistic regression for sparse sufficient dimension reduction, Computational Statistics & Data Analysis, 111: 48–58.
