ppmSDR: Penalized Principal Machine for Sparse Sufficient Dimension Reduction

A unified and computationally efficient R package for sparse sufficient dimension reduction (SDR) using Penalized Principal Machines (P²M) and Group Coordinate Descent (GCD).

Overview

The ppmSDR package provides a unified interface and efficient algorithms for sparse sufficient dimension reduction (SDR) in regression and classification. It implements the Penalized Principal Machine (P²M) family, which generalizes the principal support vector machine (PSVM) by allowing a wide range of convex loss functions together with modern sparsity-inducing penalties. Efficient computation is achieved via the Group Coordinate Descent (GCD) and MM-GCD algorithms, making the package scalable to large and high-dimensional data.
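
For orientation, the kind of objective described above can be sketched as follows. This is a hedged reconstruction from the usual principal machine formulation, not a formula taken from the package: the notation (slice index h, dichotomized response ỹ, group penalty p_λ) is illustrative, and scaling constants may differ from what ppmSDR actually implements.

```latex
% Assumed notation (not from the package documentation):
%   x_i \in \mathbb{R}^p        predictors, \bar{x} their sample mean
%   \hat{\Sigma}                 sample covariance of the predictors
%   \tilde{y}_{i,h} \in \{-1,1\} response dichotomized at the h-th cut point
%   L(\cdot,\cdot)               convex loss chosen by the loss argument
%   p_\lambda(\cdot)             group penalty (grLasso, grMCP or grSCAD)
\[
\min_{\{\alpha_h,\beta_h\}_{h=1}^{H}}
\sum_{h=1}^{H}\Big\{
  \beta_h^{\top}\hat{\Sigma}\,\beta_h
  + \frac{C}{n}\sum_{i=1}^{n}
    L\big(\tilde{y}_{i,h},\ \alpha_h + \beta_h^{\top}(x_i-\bar{x})\big)
\Big\}
+ \sum_{j=1}^{p} p_{\lambda}\big(\lVert \beta_{\cdot j}\rVert_2\big),
\qquad
\beta_{\cdot j} = (\beta_{1j},\dots,\beta_{Hj})^{\top}.
\]
% The H fitted directions are then pooled, and the leading d eigenvectors of
\[
\hat{M} = \sum_{h=1}^{H}\hat{\beta}_h\hat{\beta}_h^{\top}
\]
% serve as the estimated basis of the central subspace.
```

Under this reading, the group penalty acts on all H coefficients of predictor j at once, so a predictor is either kept in every direction or dropped from all of them, which is what makes the estimated basis sparse in rows.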

Key Features

Installation

# Install from GitHub (requires the devtools package)
# install.packages("devtools")  # uncomment if devtools is not installed
devtools::install_github("c16267/ppmSDR")

Repository Structure

ppmSDR/
├── R/
│   ├── ppm.R              # ppm(): unified front-end + input validation + dispatch
│   ├── ppm_tune.R         # ppm_tune(): dCor-based K-fold cross-validation
│   ├── estimators.R       # ten internal P2M solvers (selected via loss=)
│   ├── methods.R          # S3 methods: print.ppm, summary.ppm, print.ppm_tune
│   ├── utils-internal.R   # internal GCD helpers (thresholding, block-diagonal algebra)
│   ├── data.R             # documentation of the bundled datasets
│   └── ppmSDR-package.R   # package-level documentation and imports
├── data/                  # boston.rda, wdbc.rda
├── man/                   # generated by roxygen2
├── vignettes/             # ppmSDR.Rmd
└── tests/                 # unit tests (testthat)

The loss-specific solvers are internal; users access them through the loss argument of ppm() rather than calling them directly.

Main Functions

Function      Description
ppm()         Unified wrapper that fits any penalized principal machine, selected via the loss argument
ppm_tune()    K-fold cross-validation that selects the sparsity parameter lambda
summary()     Estimated basis of the central subspace and the selected variables
print()       Compact summary of a fitted ppm or ppm_tune object
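
The repository listing describes ppm_tune() as dCor-based, so it may help to see what a distance-correlation score looks like. The sketch below is a base-R illustration only: dcor_base() and double_center() are hypothetical helper names, and how ppm_tune() actually combines such a score across folds is an assumption, not something documented here.

```r
# Double-center a distance matrix: subtract row and column means, add the grand mean.
double_center <- function(D) {
  sweep(sweep(D, 1, rowMeans(D)), 2, colMeans(D)) + mean(D)
}

# Sample distance correlation between u and v (vectors or matrices with n rows).
# Hypothetical helper for illustration; not part of ppmSDR.
dcor_base <- function(u, v) {
  A <- double_center(as.matrix(dist(u)))
  B <- double_center(as.matrix(dist(v)))
  dcov2 <- mean(A * B)                       # squared distance covariance
  denom <- sqrt(mean(A * A) * mean(B * B))   # product of distance standard deviations
  if (denom == 0) return(0)                  # a constant input carries no dependence
  sqrt(max(dcov2, 0) / denom)
}

# Toy held-out fold: only the first two predictors drive the response.
set.seed(1)
n <- 200
x <- matrix(rnorm(n * 4), n, 4)
y <- x[, 1] / (0.5 + (x[, 2] + 1)^2) + 0.2 * rnorm(n)

dcor_base(x[, 1:2], y)  # score for a projection onto the informative directions
dcor_base(x[, 3:4], y)  # score for a projection onto pure-noise directions
```

A cross-validation criterion of this type would presumably prefer the lambda whose estimated basis gives projected predictors with the highest dependence on the held-out response; distance correlation is attractive for that purpose because it detects nonlinear dependence as well as linear dependence.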

Supported losses (the loss argument of ppm())

loss        Method    Response      Algorithm
"lssvm"     P²LSM     continuous    GCD
"wlssvm"    P²WLSM    binary        GCD
"logit"     P²LR      continuous    Iterative GCD
"wlogit"    P²WLR     binary        Iterative GCD
"asls"      P²AR      both          Iterative GCD
"l2svm"     P²L2M     continuous    Iterative GCD
"wl2svm"    P²WL2M    binary        Iterative GCD
"svm"       P²SVM     continuous    MM-GCD
"wsvm"      P²WSVM    binary        MM-GCD
"qr"        P²QR      both          MM-GCD

Each method supports three group penalties, chosen through the penalty argument of ppm(): "grSCAD", "grMCP" and "grLasso".
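
To make the three penalties concrete, here is a minimal sketch of the group thresholding rules commonly associated with them in group coordinate descent. It is not the package's internal code: group_threshold(), shrink_norm() and soft() are hypothetical names, the default gamma = 3.7 is an assumption, and the rules assume a unit-step update on standardized groups.

```r
# Scalar soft-thresholding: shrink z toward zero by lambda.
soft <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

# Shrink the Euclidean norm r >= 0 of a group under each penalty.
# Hypothetical helper; gamma must exceed 1 for grMCP and 2 for grSCAD.
shrink_norm <- function(r, lambda,
                        penalty = c("grLasso", "grMCP", "grSCAD"),
                        gamma = 3.7) {
  penalty <- match.arg(penalty)
  switch(penalty,
    grLasso = soft(r, lambda),
    grMCP = if (r <= gamma * lambda) {
      gamma / (gamma - 1) * soft(r, lambda)   # firm thresholding
    } else {
      r                                       # large groups are left unshrunk
    },
    grSCAD = if (r <= 2 * lambda) {
      soft(r, lambda)
    } else if (r <= gamma * lambda) {
      (gamma - 1) / (gamma - 2) * soft(r, gamma * lambda / (gamma - 1))
    } else {
      r                                       # large groups are left unshrunk
    }
  )
}

# Group thresholding: rescale the whole vector so its norm equals the shrunken norm.
group_threshold <- function(z, lambda, penalty = "grLasso", gamma = 3.7) {
  r <- sqrt(sum(z^2))
  if (r == 0) return(z)
  shrink_norm(r, lambda, penalty, gamma) / r * z
}

z_small <- c(0.3, -0.4)  # norm 0.5, below lambda
z_big   <- c(3, -4)      # norm 5, above gamma * lambda

group_threshold(z_small, lambda = 1)                    # whole group set to zero
group_threshold(z_big, lambda = 1)                      # norm shrunk from 5 to 4
group_threshold(z_big, lambda = 1, penalty = "grSCAD")  # returned unchanged
```

The contrast between the last two calls is the practical reason to prefer grSCAD or grMCP over grLasso: the group lasso shrinks every surviving group by lambda, whereas the nonconvex penalties leave sufficiently large groups untouched and so reduce bias in the retained predictors.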

Example Usage

library(ppmSDR)

## Generate data: the first two predictors form the central subspace
set.seed(1)
n <- 1000; p <- 10
B <- matrix(0, p, 2); B[1, 1] <- B[2, 2] <- 1
x <- MASS::mvrnorm(n, rep(0, p), diag(1, p))
y <- (x %*% B[, 1] / (0.5 + (x %*% B[, 2] + 1)^2)) + 0.2 * rnorm(n)
y.binary <- sign(y)

## Penalized principal least-squares SVM (P2LSM) via the unified interface
fit <- ppm(x, y, H = 10, C = 1, loss = "lssvm", penalty = "grSCAD", lambda = 0.01)
round(fit$evectors[, 1:2], 3)
summary(fit, d = 2)

## Penalized principal SVM (P2SVM); hinge loss benefits from a larger cost C
fit_svm <- ppm(x, y, H = 10, C = 100, loss = "svm", penalty = "grSCAD", lambda = 3e-5)
round(fit_svm$evectors[, 1:2], 3)

## Binary classification: penalized principal weighted least-squares SVM (P2WLSM)
fit_w <- ppm(x, y.binary, H = 10, C = 1, loss = "wlssvm", penalty = "grSCAD", lambda = 8e-4)
round(fit_w$evectors[, 1:2], 3)

## Select lambda by cross-validation (set the seed for reproducible folds)
set.seed(1)
cv <- ppm_tune(x, y, loss = "lssvm", d = 2, n.fold = 5, nlambda = 20)
cv$opt.lambda
summary(cv$fit, d = 2)
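
Because the simulated example fixes the true basis B, one way to judge a fit is to compare the estimated and true subspaces rather than the raw eigenvectors, which are only identified up to rotation. The sketch below is a self-contained illustration in base R: subspace_dist() and proj() are hypothetical helpers, not package functions, and B_hat and B_off are stand-ins for an estimated basis such as fit$evectors[, 1:2].

```r
# Orthogonal projection onto the column space of A (A assumed to have full column rank).
proj <- function(A) {
  Q <- qr.Q(qr(A))
  Q %*% t(Q)
}

# Frobenius distance between two projection matrices; 0 means identical subspaces.
subspace_dist <- function(B_true, B_hat) {
  norm(proj(B_true) - proj(B_hat), type = "F")
}

# True basis from the example: the first two coordinate directions.
p <- 10
B <- matrix(0, p, 2); B[1, 1] <- B[2, 2] <- 1

# Stand-in estimate 1: a rotation of B, spanning exactly the same subspace.
theta <- 0.3
R <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
B_hat <- B %*% R
subspace_dist(B, B_hat)   # numerically zero

# Stand-in estimate 2: picks predictor 3 instead of predictor 2.
B_off <- B; B_off[2, 2] <- 0; B_off[3, 2] <- 1
subspace_dist(B, B_off)   # sqrt(2)

# Selected variables are the predictors with a nonzero row in the basis.
which(rowSums(abs(B_off)) > 0)   # 1 3
```

With a real fit, B_hat would be replaced by the leading d columns of the estimated eigenvector matrix; a distance near zero together with nonzero rows only at predictors 1 and 2 would indicate that both the subspace and the active set were recovered.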

Bundled datasets

data(boston)   # Boston housing (continuous response: medv)
data(wdbc)     # Wisconsin Diagnostic Breast Cancer (binary: diagnosis B/M)

See the package vignette for a worked walk-through:

browseVignettes("ppmSDR")

References