% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/Kmeans.LCA.R
\name{Kmeans.LCA}
\alias{Kmeans.LCA}
\title{Initialize LCA Parameters via K-means Clustering}
\usage{
Kmeans.LCA(response, L, nrep = 10)
}
\arguments{
\item{response}{A numeric matrix of dimension \eqn{N \times I}, where \eqn{N} is the number of observations
and \eqn{I} is the number of observed categorical variables. Each column must contain nominal-scale
discrete responses (e.g., integers representing categories). Non-sequential category values are
automatically re-encoded to sequential integers starting from 1.}

\item{L}{Integer specifying the number of latent classes. Must satisfy \eqn{2 \leq L < N}.}

\item{nrep}{Integer specifying the number of random starts for the K-means algorithm
(default: 10). The solution with the lowest within-cluster sum of squares is retained.}
}
\value{
A list containing:
\describe{
\item{\code{params}}{List of initialized parameters:
\describe{
\item{\code{par}}{An \eqn{L \times I \times K_{\max}} array of initial conditional probabilities,
where \eqn{K_{\max}} is the maximum number of categories across items.
Dimension order: latent classes (1:L), items (1:I), response categories (1:K_max).}
\item{\code{P.Z}}{Numeric vector of length \eqn{L} containing initial class prior probabilities
derived from cluster proportions.}
}
}
\item{\code{P.Z.Xn}}{An \eqn{N \times L} matrix of posterior class probabilities. Contains
hard assignments (0/1 values) based on K-means cluster memberships.}
}
}
\description{
Performs hard clustering of observations using the K-means algorithm to generate
initial parameter estimates for Latent Class Analysis (LCA) models. This
provides a data-driven initialization strategy that often outperforms random
starts when the number of observed categorical variables \eqn{I} is large
(i.e., \eqn{I > 50}).
}
\details{
The function executes the following steps:
\itemize{
\item Data preprocessing: Automatically adjusts non-sequential category values
to sequential integers (e.g., categories \{1,3,5\} become \{1,2,3\}) using internal adjustment routines.
\item K-means clustering: Standardizes each variable to mean 0 and standard deviation 1
before clustering. Uses Lloyd's algorithm with Euclidean distance.
\item Parameter estimation:
\itemize{
\item For each cluster \eqn{l}, computes empirical response probabilities
\eqn{P(X_i=k|Z=l)} for all items \eqn{i} and categories \eqn{k}.
\item Handles singleton clusters by assigning near-deterministic probabilities
(e.g., \eqn{1-10^{-10}} for observed category, \eqn{10^{-10}} for others).
}
\item Posterior probabilities: Constructs hard-classification matrix where
\eqn{P(Z=l|\mathbf{X}_n)=1} for the assigned cluster and 0 otherwise.
}
}
\note{
\itemize{
\item Requires at least one observation per cluster. If a cluster has only one observation,
probabilities are set to avoid zero values (using \eqn{10^{-10}}) for numerical stability.
\item Data scaling is applied internally. Variables with zero variance are automatically
excluded from clustering.
\item This function is primarily designed as an initialization method for \code{\link{LCA}} and is not
intended for final model estimation.
}
}
\examples{
# Simulate response data
set.seed(123)
response <- matrix(sample(1:4, 200, replace = TRUE), ncol = 5)

# Generate K-means initialization for 3-class LCA
init_params <- Kmeans.LCA(response, L = 3, nrep = 5)
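
# Inspect the other documented components. The shapes in the comments follow
# the Value section and are not verified output.
dim(init_params$params$par)  # latent classes x items x response categories
head(init_params$P.Z.Xn)     # hard 0/1 assignments, one row per observation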

# Inspect initial class probabilities
print(init_params$params$P.Z)
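
# Illustrative base-R sketch of the steps listed in Details. It is an
# assumption about how such an initialization could look, not the package's
# internal code. All names below (x_demo, n_class, km, ...) are hypothetical.
x_demo <- matrix(sample(1:4, 200, replace = TRUE), ncol = 5)
n_class <- 3

# Drop constant columns, then standardize the rest to mean 0 and SD 1
keep <- apply(x_demo, 2, stats::var) > 0
km <- stats::kmeans(scale(x_demo[, keep, drop = FALSE]), centers = n_class,
                    nstart = 5, iter.max = 100, algorithm = "Lloyd")

# Class priors from cluster proportions
sketch_prior <- tabulate(km$cluster, nbins = n_class) / nrow(x_demo)

# Hard 0/1 posterior matrix: row n has a 1 in the column of its cluster
sketch_post <- diag(n_class)[km$cluster, , drop = FALSE]

# Empirical P(X_1 = k | Z = l) for the first item, one row per cluster
sketch_item1 <- prop.table(table(km$cluster, x_demo[, 1]), margin = 1)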
}
