| Encoding: | UTF-8 |
| Type: | Package |
| Maintainer: | Peter Carbonetto <pcarbo@uchicago.edu> |
| Title: | Smoothing by Adaptive Shrinkage |
| Version: | 1.3-12 |
| Date: | 2025-12-09 |
| Description: | Fast, wavelet-based Empirical Bayes shrinkage methods for signal denoising, including smoothing Poisson-distributed data and Gaussian-distributed data with possibly heteroskedastic error. The algorithms implement the methods described in Z. Xing, P. Carbonetto & M. Stephens (2021) https://jmlr.org/papers/v22/19-042.html. |
| License: | GPL (≥ 3) |
| Copyright: | file COPYRIGHTS |
| Depends: | R (≥ 3.1.1) |
| Imports: | utils, stats, data.table, caTools, wavethresh, ashr, Rcpp (≥ 1.1.0) |
| Suggests: | knitr, rmarkdown, MASS, EbayesThresh, testthat |
| LinkingTo: | Rcpp |
| NeedsCompilation: | yes |
| LazyData: | true |
| URL: | https://github.com/stephenslab/smashr |
| BugReports: | https://github.com/stephenslab/smashr/issues |
| RoxygenNote: | 7.3.1 |
| Packaged: | 2025-12-09 16:54:02 UTC; pcarbo |
| Author: | Zhengrong Xing [aut], Matthew Stephens [aut], Kaiqian Zhang [ctb], Daniel Nachun [ctb], Guy Nason [cph], Stuart Barber [cph], Tim Downie [cph], Piotr Fryzlewicz [cph], Arne Kovac [cph], Todd Ogden [cph], Bernard Silverman [cph], Peter Carbonetto [aut, cre] |
| Repository: | CRAN |
| Date/Publication: | 2025-12-15 20:10:02 UTC |
smashr: Smoothing using Adaptive SHrinkage in R
Description
This package performs nonparametric regression on
univariate Poisson or Gaussian data using multi-scale methods. For
the Poisson case, the data x are a vector with x_j \sim
Poi(\mu_j), where the mean vector \mu is to be estimated.
For the Gaussian case, the data x are a vector with x_j
\sim N(\mu_j, \sigma^2_j), where the mean vector \mu and the
variance vector \sigma^2 are to be estimated. The primary
assumption is that \mu is spatially structured, so \mu_j
- \mu_{j+1} will often be small (that is, roughly, \mu is
smooth). In the Gaussian case \sigma is also spatially structured
(or, optionally, \sigma is constant, not depending on j).
Details
The function smash provides a minimal
interface for simple smoothing. It is a wrapper for
smash.gaus and smash.poiss, which
provide more options for advanced use. The only required input is
a vector of length 2^J for some integer J. Other options include
returning the posterior variances and specifying a
wavelet basis (the default is Haar, which generally performs well
because smash uses the translation-invariant transform).
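A minimal usage sketch (not taken from the package documentation; it assumes smashr is installed and attached):
# Simulate a noisy signal of length 2^8 and denoise it with the defaults.
library(smashr)
set.seed(1)
n      <- 2^8
mu     <- sin(2*pi*(1:n)/n)        # smooth underlying mean
x      <- mu + rnorm(n, sd = 0.3)  # Gaussian noise
mu.est <- smash(x)                 # Gaussian model is chosen automatically
plot(mu, type = "l")
lines(mu.est, col = 2)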
Author(s)
Matthew Stephens and Zhengrong Xing
See Also
Useful links:
https://github.com/stephenslab/smashr
Report bugs at https://github.com/stephenslab/smashr/issues
Model fitting using weighted least squares or a GLM approach.
Description
Fit the model specified in the documentation using either a weighted least squares approach or a generalized linear model approach, with some modifications. This function fits many "simple" logistic regressions (i.e., with zero or one covariate) simultaneously, allowing for the possibility of small sample sizes with low or zero counts. Alternatively, a weighted least squares regression can be fit in place of a logistic regression.
Usage
glm.approx(
x,
g = NULL,
minobs = 1,
pseudocounts = 0.5,
all = FALSE,
eps = 1e-08,
center = FALSE,
repara = FALSE,
forcebin = FALSE,
lm.approx = FALSE,
disp = c("add", "mult"),
bound = 0.02
)
Arguments
x |
An N by 2B matrix, where N is the number of samples and B is the number of WCs (more precisely, the number of distinct scales and locations in multi-scale space). Each pair of consecutive columns corresponds to a particular scale and location; within a pair, the first column contains the number of successes and the second the number of failures for each sample at that scale and location. |
g |
A vector of covariate values. Can be a factor (2 groups
only) or quantitative. For a 2-group categorical covariate, provide
it as a factor. |
minobs |
Minimum number of non-zero observations required for each model to be fitted (otherwise NA is returned for that model). |
pseudocounts |
A number to be added to counts when counts are zero (or possibly extremely small). |
all |
Bool; if TRUE, pseudocounts are added to all entries; if FALSE (the default), pseudocounts are added only when either the number of successes or the number of failures (but not both) is 0. |
eps |
Small positive number added to counts to improve numerical stability of the computations. |
center |
Bool, indicating whether to center |
repara |
Bool, indicating whether to reparameterize
|
forcebin |
Bool, if TRUE don't allow for
overdispersion. Defaults to TRUE if |
lm.approx |
Bool, indicating whether a WLS alternative should be used. Defaults to FALSE. |
disp |
A string, can be either "add" or "mult", indicating the
form of overdispersion assumed when |
bound |
Numeric, indicates the threshold of the success vs failure ratio below which pseudocounts will be added. |
Value
A matrix with 2 rows (or 5 rows if g is provided) and T columns, where T is the number of WCs. The rows contain alphahat (1st row), the standard error of alphahat (2nd), betahat (3rd), the standard error of betahat (4th), and the covariance between alphahat and betahat (5th) for each WC.
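A hedged sketch of a call (assuming glm.approx is exported as documented; the data layout follows the description of x above):
# 20 samples, B = 3 scale/location pairs; columns alternate successes and
# failures. With g = NULL the result has 2 rows: alphahat and its SE.
set.seed(1)
ns <- matrix(rbinom(20*3, size = 10, prob = 0.3), 20, 3)  # successes
nf <- 10 - ns                                             # failures
x  <- matrix(0, 20, 6)
x[, c(1, 3, 5)] <- ns
x[, c(2, 4, 6)] <- nf
res <- glm.approx(x)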
Reflect and extend a vector.
Description
Extends the vector to have length a power of 2 (if not already a power of 2) and then reflects it about its right end.
Usage
reflect(x)
Arguments
x |
An n-vector. |
Details
The vector x is first reflected about both its left and right ends, by (roughly) the same amount at each end, so that its length becomes a power of 2 (this step is skipped if the length of x is already a power of 2). The resulting vector is then reflected about its right end to create a vector that is circular (its left and right ends are the same) and has length a power of 2.
Value
A list with two list elements: "x" containing the
extended-and-reflected signal; and "idx" containing the indices of the
original signal.
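A small usage sketch (assuming, as the Value section describes, that "idx" indexes the original observations within the extended signal):
out <- reflect(rnorm(100))
length(out$x)           # a power of 2 after extension and reflection
orig <- out$x[out$idx]  # should recover the original 100 observations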
Reverse wavelet transform a set of probabilities in TItable format for Poisson data.
Description
Reverse wavelet transform a set of probabilities in TItable format for Poisson data.
Usage
reverse.pwave(est, lp, lq = NULL)
Arguments
est |
An n-vector. Usually a constant vector with each element equal to the estimated log(total rate). |
lp |
A J by n matrix of probabilities. |
lq |
A J by n matrix of complementary probabilities. |
Value
Reconstructed signal in the original data space.
Estimate homoskedastic standard deviation from nonparametric regression.
Description
Estimate homoskedastic standard deviation from nonparametric regression.
Usage
sd_estimate_gasser_etal(x)
Arguments
x |
The data. |
Details
Uses formula (3) from Brown and Levine (2007), Annals of Statistics, who attribute it to Gasser et al.
Value
An estimate of the standard deviation
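For reference, a sketch of the difference-based estimator attributed to Gasser et al. (an illustration of the formula, not necessarily the exact implementation used in the package):
# Pseudo-residuals: each interior point minus the average of its neighbours.
# For i.i.d. noise these have variance (3/2)*sigma^2, hence the 2/3 factor.
sd_gasser_sketch <- function (x) {
  n <- length(x)
  e <- (x[1:(n-2)] + x[3:n])/2 - x[2:(n-1)]
  sqrt(2/3 * mean(e^2))
}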
Estimate the underlying mean or intensity function from Gaussian or Poisson data, respectively.
Description
This is a wrapper function for
smash.gaus or smash.poiss as
appropriate. For details see smash.gaus and
smash.poiss.
Usage
smash(x, model = NULL, ...)
Arguments
x |
A vector of observations (assumed equally spaced). |
model |
Specifies the model (Gaussian or Poisson). Can be NULL, in which case the Poisson model is assumed if x consists of integers, and the Gaussian model is assumed otherwise. One of 'gaus' or 'poiss' can also be specified to fit a specific model. |
... |
Additional arguments passed to
|
Details
Performs nonparametric regression on univariate Poisson or
Gaussian data using wavelets. Unlike many wavelet methods,
the data do not need to have length a power of 2 or be cyclic;
the functions deal with these issues internally.
For the Poisson case, the data are
assumed to be independent observations from an underlying
inhomogeneous mean function that is "smooth". Similarly, for the
Gaussian case, the data are assumed to be independent with an
underlying smooth mean function. In the Gaussian case, the variances
are allowed to vary, but are similarly "spatially structured", as
with the mean function. The functions smash.gaus and
smash.poiss perform smoothing for Gaussian and Poisson
data, respectively.
Value
See smash.gaus or smash.poiss
for details.
Examples
# First Gaussian example
# ----------------------
# In this first example, the length of the signal is a power of 2.
#
# Create the baseline mean function. (The "spikes" function is used
# as an example here.)
set.seed(2)
n <- 2^9
t <- 1:n/n
spike.f <- function (x) (0.75 * exp(-500 * (x - 0.23)^2) +
1.5 * exp(-2000 * (x - 0.33)^2) + 3 * exp(-8000 * (x - 0.47)^2) +
2.25 * exp(-16000 * (x - 0.69)^2) + 0.5 * exp(-32000 * (x - 0.83)^2))
mu.s <- spike.f(t)
# Scale the signal to be between 0.2 and 0.8
mu.t <- (1 + mu.s)/5
plot(mu.t,type = "l")
# Create the baseline variance function. (The function V2 from Cai &
# Wang (2008) is used here.)
var.fn <- (1e-04 + 4 * (exp(-550 * (t - 0.2)^2) +
exp(-200 * (t - 0.5)^2) +
exp(-950 * (t - 0.8)^2)))/1.35
plot(var.fn,type = "l")
# Set the signal-to-noise ratio.
rsnr <- sqrt(5)
sigma.t <- sqrt(var.fn)/mean(sqrt(var.fn)) * sd(mu.t)/rsnr^2
# Simulate an example dataset.
X.s <- rnorm(n,mu.t,sigma.t)
# Run smash (Gaussian version is run since observations are not
# counts).
mu.est <- smash(X.s)
# Plot the true mean function as well as the estimated one.
plot(mu.t,type = "l")
lines(mu.est,col = 2)
# First Poisson example
# ---------------------
# Scale the signal to be non-zero and to have a low average intensity.
mu.t <- 0.01 + mu.s
# Simulate an example dataset.
X.s <- rpois(n,mu.t)
# Run smash (the Poisson version is run since observations are counts).
mu.est <- smash(X.s)
# Plot the true mean function as well as the estimated one.
plot(mu.t,type = "l")
lines(mu.est,col = 2)
# Second Gaussian example
# -----------------------
# In this second example, we demonstrate that smash also works even
# when the signal length (n) is not a power of 2.
n <- 1000
t <- 1:n/n
mu.s <- spike.f(t)
# Scale the signal to be between 0.2 and 0.8.
mu.t <- (1 + mu.s)/5
# Create the baseline variance function.
var.fn <- (1e-04 + 4 * (exp(-550 * (t - 0.2)^2) +
exp(-200 * (t - 0.5)^2) +
exp(-950 * (t - 0.8)^2)))/1.35
sigma.t <- sqrt(var.fn)/mean(sqrt(var.fn)) * sd(mu.t)/rsnr^2
# Simulate an example dataset.
X.s <- rnorm(n,mu.t,sigma.t)
# Run smash.
mu.est <- smash(X.s)
# Plot the true mean function as well as the estimated one.
plot(mu.t,type = "l")
lines(mu.est,col = 2)
# Second Poisson example
# ----------------------
# The Poisson version of smash also works with signals that are not
# exactly of length 2^J for some integer J.
#
# Scale the signal to be non-zero and to have a low average intensity.
mu.t <- 0.01 + mu.s
# Simulate an example dataset
X.s <- rpois(n,mu.t)
# Run smash (Poisson version is run since observations are counts).
mu.est <- smash(X.s)
# Plot the true mean function as well as the estimated one.
plot(mu.t,type = "l")
lines(mu.est,col = 2)
Estimate underlying mean function from noisy Gaussian data.
Description
This function performs non-parametric regression (signal denoising) using Empirical Bayes wavelet-based methods.
Usage
smash.gaus(
x,
sigma = NULL,
v.est = FALSE,
joint = FALSE,
v.basis = FALSE,
post.var = FALSE,
filter.number = 1,
family = "DaubExPhase",
return.loglr = FALSE,
jash = FALSE,
SGD = TRUE,
weight = 0.5,
min.var = 1e-08,
ashparam = list(),
homoskedastic = FALSE,
reflect = FALSE
)
Arguments
x |
A vector of observations. Reflection is done automatically
if the length of x is not a power of 2. |
sigma |
A vector of standard deviations. Can be provided if known or estimated beforehand. |
v.est |
Boolean indicating if variance estimation should be performed instead. |
joint |
Boolean indicating if results of mean and variance estimation should be returned together. |
v.basis |
Boolean indicating if the same wavelet basis should be used for variance estimation as for mean estimation. If FALSE, the Haar basis is used for variance estimation (this is much faster than other bases). |
post.var |
Boolean indicating if the posterior variance should be returned for the mean and/or variance estimates. |
filter.number |
Choice of wavelet basis to be used, as in
the wavethresh package. |
family |
Choice of wavelet basis to be used, as in
the wavethresh package. |
return.loglr |
Boolean indicating if a logLR should be returned. |
jash |
Indicates if the prior from method JASH should be used. This will often provide slightly better variance estimates (especially for nonsmooth variance functions), at the cost of computational efficiency. Defaults to FALSE. |
SGD |
Boolean indicating if stochastic gradient descent should be used in the EM. Only applicable if jash=TRUE. |
weight |
Optional parameter used in estimating overall variance. Only works for Haar basis. Defaults to 0.5. Setting this to 1 might improve variance estimation slightly. |
min.var |
The minimum positive value to be set if the variance estimates are non-positive. |
ashparam |
A list of parameters to be passed to ash in the ashr package. |
homoskedastic |
Indicates whether to assume constant variance (if v.est is TRUE). |
reflect |
A logical indicating if the signals should be reflected. |
Details
We assume that the data come from the model Y_t =
\mu_t + \epsilon_t for t = 1,...,T, where \mu_t is an
underlying mean, assumed to be spatially structured (or treated as
points sampled from a smooth continuous function), and the errors
\epsilon_t \sim N(0, \sigma_t^2) are independent. Smash
provides estimates of \mu_t and \sigma_t^2 (and their
posterior variances if desired).
Value
By default smash.gaus simply returns a vector of estimated means.
However, if more outputs are requested (e.g., if return.loglr or v.est is TRUE), then
the output is a list with one or more of the following elements:
mu.res |
A list with the mean estimate, its posterior variance
if |
If v.est is TRUE, then smash.gaus returns the
following:
var.res |
A list with the variance estimate, its posterior
variance if |
Examples
n=2^10
t=1:n/n
spike.f = function(x) (0.75*exp(-500*(x-0.23)^2) +
1.5*exp(-2000*(x-0.33)^2) + 3*exp(-8000*(x-0.47)^2) +
2.25*exp(-16000*(x-0.69)^2)+0.5*exp(-32000*(x-0.83)^2))
mu.s = spike.f(t)
# Gaussian case
mu.t = (1+mu.s)/5
plot(mu.t,type='l')
var.fn = (0.0001 + 4*(exp(-550*(t-0.2)^2) + exp(-200*(t-0.5)^2) +
exp(-950*(t-0.8)^2)))/1.35
plot(var.fn,type='l')
rsnr=sqrt(5)
sigma.t=sqrt(var.fn)/mean(sqrt(var.fn))*sd(mu.t)/rsnr^2
X.s=rnorm(n,mu.t,sigma.t)
mu.est=smash.gaus(X.s)
plot(mu.t,type='l')
lines(mu.est,col=2)
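A further hedged sketch (not part of the original examples): with v.est = TRUE and joint = TRUE, smash.gaus also estimates the variance function and returns a list containing the mu.res and var.res components described under Value.
res <- smash.gaus(X.s, v.est = TRUE, joint = TRUE)
str(res)  # inspect the returned mean and variance components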
Estimate the underlying intensity for Poisson counts.
Description
Main smoothing procedure for Poisson data. Takes a univariate inhomogeneous Poisson process and estimates its mean intensity.
Usage
smash.poiss(
x,
post.var = FALSE,
log = FALSE,
reflect = FALSE,
glm.approx.param = list(),
ashparam = list(),
cxx = TRUE,
lev = 0
)
Arguments
x |
A vector of Poisson counts (reflection is done
automatically if the length of x is not a power of 2). |
post.var |
Boolean, indicates if the posterior variance should be returned. |
log |
Bool, determines whether the smoothed signal is returned on the log scale. |
reflect |
A logical indicating if the signals should be reflected. |
glm.approx.param |
A list of parameters to be passed to
glm.approx. |
ashparam |
A list of parameters to be passed to ash in the ashr package. |
cxx |
Bool, indicates if C++ code should be used to create the TI tables. |
lev |
Integer from 0 to J-1, indicating the primary level of resolution. Should NOT be used (i.e., shrinkage should be performed at all resolutions) unless there is good reason to do so. |
Details
We assume that the data come from the model Y_t \sim
Pois(\mu_t) for t=1,...,T, where \mu_t is the
underlying intensity, assumed to be spatially structured (or
treated as points sampled from a smooth continuous function). The
Y_t are assumed to be independent. Smash provides estimates
of \mu_t (and its posterior variance if desired).
Value
smash.poiss returns the mean estimate by default,
with the posterior variance as an additional component if
post.var is TRUE.
Examples
n=2^10
t=1:n/n
spike.f = function(x) (0.75*exp(-500*(x-0.23)^2) +
1.5*exp(-2000*(x-0.33)^2) + 3*exp(-8000*(x-0.47)^2) +
2.25*exp(-16000*(x-0.69)^2)+0.5*exp(-32000*(x-0.83)^2))
mu.s=spike.f(t)
mu.t=0.01+mu.s
X.s=rpois(n,mu.t)
mu.est=smash.poiss(X.s)
plot(mu.t,type='l')
lines(mu.est,col=2)
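A hedged variation on the example above (not part of the original examples): request the posterior variances as well, in which case the return value gains an additional component as described under Value.
res <- smash.poiss(X.s, post.var = TRUE)
str(res)  # mean estimate plus its posterior variance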
TI thresholding with heteroskedastic errors.
Description
TI thresholding with heteroskedastic errors.
Usage
ti.thresh(
x,
sigma = NULL,
method = "smash",
filter.number = 1,
family = "DaubExPhase",
min.level = 3,
ashparam = list()
)
Arguments
x |
The data. Should be a vector of length a power of 2. |
sigma |
The standard deviation function. Can be provided if known or estimated beforehand. |
method |
The method to estimate the variance function. Can be 'rmad' for running MAD as described in Gao (1997), or 'smash'. |
filter.number |
The wavelet basis to be used. |
family |
The wavelet basis to be used. |
min.level |
The primary resolution level. |
ashparam |
Passed as the "ashparam" argument in the
call to smash.gaus. |
Details
The 'rmad' option effectively implements the procedure
described in Gao (1997), while the 'smash' option first estimates
the variance function using package smash and then performs
thresholding given this variance function.
Value
A vector of mean estimates.
References
Gao, Hong-Ye (1997) Wavelet shrinkage estimates for heteroscedastic regression models. MathSoft, Inc.
Examples
n=2^10
t=1:n/n
spike.f = function(x) (0.75*exp(-500*(x-0.23)^2) +
1.5*exp(-2000*(x-0.33)^2) +
3*exp(-8000*(x-0.47)^2) +
2.25*exp(-16000*(x-0.69)^2) +
0.5*exp(-32000*(x-0.83)^2))
mu.s = spike.f(t)
# Gaussian case
# -------------
mu.t=(1+mu.s)/5
plot(mu.t,type='l')
var.fn = (0.0001+4*(exp(-550*(t-0.2)^2) +
exp(-200*(t-0.5)^2) + exp(-950*(t-0.8)^2)))/1.35
plot(var.fn,type='l')
rsnr=sqrt(5)
sigma.t=sqrt(var.fn)/mean(sqrt(var.fn))*sd(mu.t)/rsnr^2
X.s=rnorm(n,mu.t,sigma.t)
mu.est.rmad<-ti.thresh(X.s,method='rmad')
mu.est.smash<-ti.thresh(X.s,method='smash')
plot(mu.t,type='l')
lines(mu.est.rmad,col=2)
lines(mu.est.smash,col=4)
Three-month treasury bill data.
Description
Yields of the three-month Treasury Bill from the secondary market rates, on Fridays. The secondary market rates are annualised using a 360-day year of bank interest and are quoted on a discount basis. The data consist of 1735 weekly observations, from 5 January 1962 to 31 March 1995.
Usage
data(treas)
Format
Data are stored in a numeric (floating-point) vector, treas, with
1735 entries.
References
J. Fan and Q. Yao (1998). Efficient estimation of conditional variance functions in stochastic regression. Biometrika 85, 645–660.
Examples
# See smashr vignette.
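A minimal sketch of loading and plotting the data (assuming smashr is attached):
data(treas)
plot(treas, type = "l", xlab = "Week", ylab = "Yield")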