---
title: "Weibull Parallel Systems: EM Algorithm for Shape-Scale Estimation"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Weibull Parallel Systems: EM Algorithm for Shape-Scale Estimation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 6,
  fig.height = 4
)

# Heavy simulations are disabled by default for CRAN.
# Set run_long <- TRUE to reproduce the full comparison tables.
run_long <- FALSE
```

The exponential distribution is the workhorse of reliability theory, but its
constant hazard rate is a strong assumption. Real components exhibit
*increasing* hazard (wear-out), *decreasing* hazard (early-life failures), or
bathtub-shaped hazard curves. The Weibull distribution captures increasing and
decreasing hazard through a single shape parameter, making it the natural next
step beyond exponential models.

This vignette shows why the exponential machinery for parallel system
estimation does not extend to Weibull components, and how the EM algorithm
in `kofn` solves the resulting estimation problem.

```{r load-package}
library(kofn)
library(flexhaz)
```

## Why Weibull?

The Weibull distribution with shape $\alpha$ and scale $\beta$ has hazard
function $h(t) = (\alpha / \beta)(t / \beta)^{\alpha - 1}$. The shape
controls the failure mechanism:

- $\alpha < 1$: decreasing failure rate (DFR) -- early-life failures, burn-in
- $\alpha = 1$: constant failure rate -- exponential (memoryless)
- $\alpha > 1$: increasing failure rate (IFR) -- wear-out, fatigue

```{r weibull-hazard-plot, fig.cap = "Weibull hazard functions for different shape parameters."}
t_grid <- seq(0.01, 5, length.out = 200)
alphas <- c(0.5, 1.0, 1.5, 2.5)
beta <- 2.0

plot(NULL, xlim = c(0, 5), ylim = c(0, 3),
     xlab = "Time", ylab = "Hazard rate h(t)",
     main = "Weibull hazard: shape controls failure mechanism")

cols <- c("steelblue", "grey40", "darkorange", "firebrick")
for (i in seq_along(alphas)) {
  a <- alphas[i]
  h <- (a / beta) * (t_grid / beta)^(a - 1)
  lines(t_grid, h, col = cols[i], lwd = 2)
}
legend("topright",
       legend = paste0("alpha = ", alphas),
       col = cols, lwd = 2, bty = "n")
```

A parallel system with two Weibull components -- one DFR (infant mortality)
and one IFR (wear-out) -- models a system where one component is fragile
early but the other degrades over time. Neither a pure exponential nor a
single Weibull can capture this heterogeneity.


## What breaks: from exponential to Weibull

For exponential parallel systems, the `kofn` package uses an
*inclusion-exclusion (IE) expansion* that converts the product
$\prod_{i \neq j}(1 - e^{-\lambda_i t})$ into a finite sum of
exponentials. Every integral needed for the likelihood, score, and EM
E-step has a closed form.

For Weibull components, the analogous product is
$$
  \prod_{j=1}^{m}\bigl(1 - e^{-(t/\beta_j)^{\alpha_j}}\bigr).
$$
Expanding this product yields terms like
$\exp\bigl[-\sum_{j \in S}(t/\beta_j)^{\alpha_j}\bigr]$. When the
$\alpha_j$ differ, the exponent is a sum of *different power functions*
of $t$ -- not a linear function. The resulting integrals have no closed
form, so the IE approach that works so well for exponentials is
inapplicable.

Beyond the IE failure, the exponential's memoryless property also breaks.
For exponentials, $E[T_j \mid T_j < t]$ has a simple closed form. For
Weibull, the truncated mean involves the incomplete gamma function, and
the M-step requires solving a transcendental equation for the shape.

### Direct MLE struggles

The direct MLE (`method = "mle"`) optimizes the Weibull parallel system
log-likelihood numerically via L-BFGS-B with Nelder-Mead fallback. This
works, but the likelihood surface has problematic features:

- **Shape-scale confounding**: different $(\alpha, \beta)$ combinations
  produce similar system CDFs, creating ridges in the likelihood surface.
- **Boundary violations**: the optimizer can push shape estimates toward
  zero or infinity, especially for fast-failing components whose lifetimes
  are deeply left-censored under Scheme 0.
- **Shape blow-up**: at $\alpha = 1$ (the exponential boundary), the
  direct MLE's shape RMSE can be 24 times worse than the EM's.

```{r direct-mle-demo}
# Generate data from a Weibull parallel system
set.seed(42)
model_mle <- kofn(k = 2, m = 2, component = dfr_weibull(), method = "mle")
gen <- rdata(model_mle)

true_par <- c(1.5, 2.0,   # component 1: shape=1.5, scale=2.0
              2.0, 3.0)    # component 2: shape=2.0, scale=3.0
df <- gen(true_par, n = 50)

# Fit with direct MLE
fit_mle_fn <- fit(model_mle)
result_mle <- fit_mle_fn(df)

cat("Direct MLE estimates:\n")
cat(sprintf("  Component 1: shape = %.3f, scale = %.3f\n",
            result_mle$shapes[1], result_mle$scales[1]))
cat(sprintf("  Component 2: shape = %.3f, scale = %.3f\n",
            result_mle$shapes[2], result_mle$scales[2]))
cat(sprintf("  Converged: %s\n", result_mle$converged))
```


## The EM algorithm

The EM algorithm sidesteps the difficult joint optimization by
introducing a latent variable: **which component failed last**. In a
parallel system ($k = m$), the system lifetime is $T = \max_j T_j$, so
exactly one component has $T_j = T$ (the critical component) and all
others have $T_j < T$ (left-censored). The identity
$J = \arg\max_j T_j$ is unobserved under Scheme 0.

### E-step: classification and truncated moments

Given current parameter estimates $\theta^{(s)}$, the E-step computes:

1. **Classification probabilities**: $p_j(t) = P(J = j \mid T = t)$,
   the posterior probability that component $j$ failed last. This is
   proportional to $f_j(t) \prod_{i \neq j} F_i(t)$ -- the component
   weight in the system density.

2. **Truncated power moment**: $E[T_j^{\alpha_j} \mid T_j < t]$ for
   left-censored components. Using the substitution
   $U = (T/\beta)^{\alpha} \sim \text{Exp}(1)$, this reduces to an
   incomplete gamma function:
   $$
     E[T^k \mid T < t] = \beta^k
       \frac{\gamma(k/\alpha + 1,\; (t/\beta)^{\alpha})}
            {1 - e^{-(t/\beta)^{\alpha}}}.
   $$

3. **Truncated log-moment**: $E[\log T_j \mid T_j < t]$, computed via
   numerical differentiation of the incomplete gamma function.

The `kofn` package implements these as `trunc_pow_moment()` and
`trunc_log_moment_vec()`:

```{r truncated-moments-demo}
# E[T^alpha | T < t] for T ~ Weibull(alpha=1.5, beta=2.0)
alpha <- 1.5
beta <- 2.0
t_trunc <- 3.0

# Scalar interface
pow_moment <- trunc_pow_moment(k = alpha, t = t_trunc,
                               alpha = alpha, beta = beta)
cat(sprintf("E[T^%.1f | T < %.1f] = %.4f\n", alpha, t_trunc, pow_moment))
cat(sprintf("  (compare: t^alpha = %.4f)\n", t_trunc^alpha))

# Vectorized interface (used in EM inner loop)
v_max <- (t_trunc / beta)^alpha
pow_vec <- trunc_pow_moment_vec(k = alpha, v_max_vec = v_max,
                                alpha = alpha, beta = beta)
cat(sprintf("  Vectorized result: %.4f (same)\n", pow_vec))

# Truncated log-moment
log_moment <- trunc_log_moment_vec(v_max_vec = v_max,
                                   alpha = alpha, beta = beta)
cat(sprintf("E[log T | T < %.1f] = %.4f\n", t_trunc, log_moment))
cat(sprintf("  (compare: log t = %.4f)\n", log(t_trunc)))
```

### M-step: profile optimization

Given the E-step expectations, the M-step maximizes the expected
complete-data log-likelihood. The key insight is that this decomposes
into $m$ independent Weibull MLEs -- one per component.

For each component $j$:

1. **Profile over shape**: Fix $\alpha_j$ and solve for the scale in
   closed form: $\beta_j^{\alpha_j} = B_j(\alpha_j) / n$, where
   $B_j(\alpha) = \sum_i [p_{ij} \cdot t_i^{\alpha} +
   (1 - p_{ij}) \cdot E[T_j^{\alpha} \mid T_j < t_i]]$.

2. **Optimize the profile**: Maximize $Q_j(\alpha_j)$ over $\alpha_j$
   in one dimension using `optimize()`.

This avoids the joint $2m$-dimensional optimization of the direct MLE,
replacing it with $m$ one-dimensional optimizations at each EM iteration.


## EM vs direct MLE comparison

Let us compare the two estimation methods on a concrete example.

```{r em-vs-mle}
set.seed(123)

# Create models
model_em  <- kofn(k = 2, m = 2, component = dfr_weibull(), method = "em")
model_mle <- kofn(k = 2, m = 2, component = dfr_weibull(), method = "mle")

# True parameters: shapes (1.5, 2.0), scales (2.0, 3.0)
true_par <- c(1.5, 2.0, 2.0, 3.0)

# Generate data
gen <- rdata(model_em)
df <- gen(true_par, n = 50)

# Fit with EM
fit_em_fn <- fit(model_em)
result_em <- fit_em_fn(df)

# Fit with direct MLE
fit_mle_fn <- fit(model_mle)
result_mle <- fit_mle_fn(df)

# Compare
cat("True parameters:\n")
cat(sprintf("  Component 1: shape = %.2f, scale = %.2f\n", 1.5, 2.0))
cat(sprintf("  Component 2: shape = %.2f, scale = %.2f\n", 2.0, 3.0))
cat("\nEM estimates:\n")
cat(sprintf("  Component 1: shape = %.3f, scale = %.3f\n",
            result_em$shapes[1], result_em$scales[1]))
cat(sprintf("  Component 2: shape = %.3f, scale = %.3f\n",
            result_em$shapes[2], result_em$scales[2]))
cat(sprintf("  Converged: %s, Iterations: %d\n",
            result_em$converged, result_em$iterations))
cat("\nDirect MLE estimates:\n")
cat(sprintf("  Component 1: shape = %.3f, scale = %.3f\n",
            result_mle$shapes[1], result_mle$scales[1]))
cat(sprintf("  Component 2: shape = %.3f, scale = %.3f\n",
            result_mle$shapes[2], result_mle$scales[2]))
cat(sprintf("  Converged: %s\n", result_mle$converged))
```

On a single dataset, both methods may produce similar estimates. The EM's
advantage becomes apparent across many replications, especially when
shape parameters are near 1.0 or components have very different scales.


## Shape effect on estimation difficulty

```{r shape-effect-sim, eval = run_long}
# Full simulation: vary shape, compare EM vs MLE (takes several minutes).
# Set run_long <- TRUE in setup chunk to reproduce.
set.seed(2026)
alphas <- c(0.5, 1.0, 1.5, 2.0, 3.0)
R <- 200; n <- 300

for (a in alphas) {
  true_par <- c(a, 2.0, a, 3.0)
  mdl_em  <- kofn(k = 2, m = 2, component = dfr_weibull(), method = "em")
  mdl_mle <- kofn(k = 2, m = 2, component = dfr_weibull(), method = "mle")
  gen <- rdata(mdl_em)
  f_em <- fit(mdl_em); f_mle <- fit(mdl_mle)

  em_sh <- mle_sh <- matrix(NA, R, 2)
  for (r in seq_len(R)) {
    d <- gen(true_par, n = n)
    re <- tryCatch(f_em(d), error = function(e) NULL)
    rm <- tryCatch(f_mle(d), error = function(e) NULL)
    if (!is.null(re) && re$converged) em_sh[r, ] <- sort(re$shapes)
    if (!is.null(rm) && rm$converged) mle_sh[r, ] <- sort(rm$shapes)
  }
  ok_em <- complete.cases(em_sh); ok_mle <- complete.cases(mle_sh)
  cat(sprintf("alpha=%.1f: EM RMSE=%.3f, MLE RMSE=%.3f\n", a,
    sqrt(mean((em_sh[ok_em,] - a)^2)), sqrt(mean((mle_sh[ok_mle,] - a)^2))))
}
```

The table below summarizes the shape effect results from 200 Monte Carlo
replications per configuration ($n = 300$, two-component parallel system
with scales $(\beta_1, \beta_2) = (2, 3)$, common shape $\alpha$).

| $\alpha$ | EM Conv. (%) | EM Shape RMSE | EM Scale RMSE | MLE Conv. (%) | MLE Shape RMSE | MLE Scale RMSE |
|:--------:|:----------:|:----------:|:-----------:|:----------:|:-----------:|:----------:|
| 0.5 | 98 | 0.21 | 2.06 | 100 | 0.64 | 2.21 |
| 1.0 | 100 | 0.32 | 0.82 | 100 | 7.49 | 0.91 |
| 1.5 | 100 | 0.50 | 0.53 | 100 | 2.05 | 0.55 |
| 2.0 | 100 | 0.74 | 0.40 | 100 | 0.91 | 0.40 |
| 3.0 | 100 | 0.97 | 0.24 | 100 | 1.26 | 0.26 |

Two systematic patterns emerge:

1. **Shape RMSE increases with $\alpha$** (0.21 at $\alpha = 0.5$ to
   0.97 at $\alpha = 3.0$), while **scale RMSE decreases** (2.06 to
   0.24). Higher shapes concentrate the Weibull distribution, making
   scales easier to pin down but creating a steeper likelihood surface
   where the shape is harder to identify.

2. **The EM dominates in shape recovery.** At $\alpha = 1.0$ (the
   exponential boundary), the EM achieves shape RMSE = 0.32 while the
   direct MLE explodes to 7.49 -- a 24-fold difference. The MLE's shape
   estimates for the fast component are driven to extreme values by the
   flat likelihood ridge along the shape-scale confounding direction.
   Both methods achieve nearly identical scale RMSE, confirming they find
   the same likelihood optima when they converge -- the EM simply reaches
   them more reliably.


## Mixed shapes: DFR + IFR systems

Real systems contain components with heterogeneous failure mechanisms.
A motor with early-life electrical failures (DFR, $\alpha < 1$) and
mechanical wear-out (IFR, $\alpha > 1$) is a natural example. The EM
handles this shape diversity better than direct MLE.

```{r mixed-shapes-demo}
set.seed(456)

# Mixed DFR + IFR: shape (0.8, 1.5), scale (2.0, 3.0)
model_em <- kofn(k = 2, m = 2, component = dfr_weibull(), method = "em")
model_mle <- kofn(k = 2, m = 2, component = dfr_weibull(), method = "mle")

true_par <- c(0.8, 2.0, 1.5, 3.0)
gen <- rdata(model_em)
df <- gen(true_par, n = 50)

fit_em_fn  <- fit(model_em)
fit_mle_fn <- fit(model_mle)

result_em  <- fit_em_fn(df)
result_mle <- fit_mle_fn(df)

cat("Mixed DFR + IFR system: alpha = (0.8, 1.5), beta = (2.0, 3.0)\n\n")
cat("EM estimates:\n")
cat(sprintf("  Component 1: shape = %.3f (true 0.8), scale = %.3f (true 2.0)\n",
            result_em$shapes[1], result_em$scales[1]))
cat(sprintf("  Component 2: shape = %.3f (true 1.5), scale = %.3f (true 3.0)\n",
            result_em$shapes[2], result_em$scales[2]))
cat("\nDirect MLE estimates:\n")
cat(sprintf("  Component 1: shape = %.3f (true 0.8), scale = %.3f (true 2.0)\n",
            result_mle$shapes[1], result_mle$scales[1]))
cat(sprintf("  Component 2: shape = %.3f (true 1.5), scale = %.3f (true 3.0)\n",
            result_mle$shapes[2], result_mle$scales[2]))
```

```{r mixed-shapes-sim, eval = run_long}
# Full mixed-shape simulation (set run_long <- TRUE to reproduce)
set.seed(2026)
scenarios <- list(
  mild  = list(par = c(0.8, 2.0, 1.5, 3.0), label = "Mild (0.8, 1.5)"),
  strong = list(par = c(0.5, 2.0, 3.0, 3.0), label = "Strong (0.5, 3.0)"),
  three = list(par = c(0.7, 1.5, 1.5, 2.0, 2.5, 3.0),
               label = "Three-comp (0.7, 1.5, 2.5)")
)
R <- 200; n <- 300
for (sc in scenarios) {
  m <- length(sc$par) / 2
  mdl <- kofn(k = m, m = m, component = dfr_weibull(), method = "em")
  gen <- rdata(mdl); f <- fit(mdl)
  true_sh <- sc$par[seq(1, length(sc$par), by = 2)]
  errs <- numeric(0)
  for (r in seq_len(R)) {
    res <- tryCatch(f(gen(sc$par, n = n)), error = function(e) NULL)
    if (!is.null(res) && res$converged)
      errs <- c(errs, (sort(res$shapes) - sort(true_sh))^2)
  }
  cat(sprintf("%s: EM shape RMSE = %.2f\n", sc$label, sqrt(mean(errs))))
}
```

Across replications, three scenarios illustrate the pattern:

- **Mild mix** ($\alpha = (0.8, 1.5)$): Both methods converge reliably.
  Mean shape RMSE is 0.34 (EM) -- the modest shape separation makes this
  a tractable problem.

- **Strong mix** ($\alpha = (0.5, 3.0)$): One DFR component with
  unbounded hazard at $t = 0$ and one strongly IFR component that fails
  in a narrow window. This creates a bimodal posterior over which
  component failed last. Mean shape RMSE is 0.39 (EM).

- **Three-component** ($\alpha = (0.7, 1.5, 2.5)$): The fast component
  ($\alpha = 0.7$, $\beta = 1.5$) is deeply left-censored, producing
  shape RMSE of 3.60 under EM. The EM still outperforms direct MLE:
  mean shape RMSE 1.62 vs. 5.32, with the MLE's fast-component shape
  RMSE reaching 14.7 from boundary violations.


## Base R stats integration

The `kofn` package returns `fisher_mle` objects (from `likelihood.model`),
which provide standard R methods for inference. This means `coef()`,
`vcov()`, `logLik()`, `AIC()`, `confint()`, and `summary()` all work
out of the box.

```{r stats-integration}
set.seed(789)
model_em <- kofn(k = 2, m = 2, component = dfr_weibull(), method = "em")
gen <- rdata(model_em)
true_par <- c(1.5, 2.0, 2.0, 3.0)
df <- gen(true_par, n = 50)

fit_em_fn <- fit(model_em)
result <- fit_em_fn(df)

# Coefficient extraction
cat("coef():\n")
print(coef(result))

# Variance-covariance matrix (from observed Fisher information)
cat("\nvcov():\n")
print(round(vcov(result), 5))

# Log-likelihood with degrees of freedom
cat("\nlogLik():\n")
print(logLik(result))

# Model selection criteria
cat("\nAIC:", AIC(result), "\n")
cat("BIC:", BIC(result), "\n")

# Confidence intervals (Wald-type, based on Hessian)
cat("\nconfint() (95% Wald intervals):\n")
print(confint(result))

# Full summary
cat("\nsummary():\n")
print(summary(result))
```

The confidence intervals are Wald-type, based on the observed Fisher
information (negative inverse Hessian of the log-likelihood). Under
Scheme 0, the shape parameters typically have wider intervals than the
scales, reflecting the information asymmetry: the scale of the
last-failing component is well-determined by the observed system
lifetime, but the shape is harder to identify from the system CDF alone.


## Summary

The Weibull extension to parallel system estimation is not a simple
generalization of the exponential case. Three structural properties
break:

1. **No IE closed form**: the inclusion-exclusion expansion produces
   sums of different power functions, not simple exponentials.
2. **No memoryless property**: truncated moments require incomplete
   gamma functions instead of elementary formulas.
3. **Shape-scale confounding**: the likelihood surface has ridges that
   trap gradient-based optimizers.

The EM algorithm addresses these difficulties by:

- Decomposing the $2m$-dimensional optimization into $m$ independent
  one-dimensional profile optimizations.
- Guaranteeing monotone likelihood increase at every iteration.
- Using numerically stable truncated Weibull moments via the
  $U = (T/\beta)^{\alpha} \sim \text{Exp}(1)$ substitution.

The result is substantially better shape recovery than direct MLE,
particularly at $\alpha = 1.0$ (24x improvement) and for multi-component
systems with mixed shapes (3.3x improvement). Both methods achieve
similar scale RMSE -- the EM's advantage is structural, preventing the
boundary violations and parameter blow-up that plague gradient-based
optimization on the Weibull likelihood surface.

For applications requiring even better shape estimation, the package also
supports Scheme 1 (periodic inspection), which provides component-level
interval censoring data. See `vignette("scheme1")` for details.
