This is an example of exploratory LCA with ordinal indicators using
`tidySEM`

, as explained in Van Lissa, C. J.,
Garnier-Villarreal, M., & Anadria, D. (2023). *Recommended
Practices in Latent Class Analysis using the Open-Source R-Package
tidySEM.* Structural Equation Modeling. https://doi.org/10.1080/10705511.2023.2250920. The
present example uses synthetic data based on a study by Maene and
colleagues. In a convenience sample of Flemish (Belgian) high-school
students with a migration background, this study set out to identify
distinct classes based on ordinal indicators of National, Regional, and
Heritage identity. Sample size was not justified.

The approach to class enumeration was semi-theory driven: The researchers expected to find profiles that were distinct on all three types of identity (national, regional, and heritage) - but the exact number of classes was not pre-specified (hypothesis 1).

Hypothesis 2 stated that adolescents who are nationally integrated would have lower depressive feelings than students from students with other combinations of identifications (hypothesis 2). Hypothesis 3 was that, for assimilated and separated adolescents, there would not be a significant effect of perceived teacher discrimination on depressive symptoms.

Use the command `?tidySEM::maene_identity`

to view the
data documentation.

To load the data, simply attach the `tidySEM`

package. For
convenience, we assign the indicator data to an object called
`df`

:

We use `tidySEM::descriptives()`

to describe the data
numerically. Because all items are categorical, we remove columns for
continuous data to de-clutter the table:

```
desc <- tidySEM::descriptives(df)
desc <- desc[, c("name", "type", "n", "missing", "unique", "mode",
"mode_value", "v")]
desc
```

Additionally, we can plot the data. The `ggplot2`

function
`geom_bar()`

is useful for ordinal data:

```
df_plot <- df
names(df_plot) <- paste0("Value.", names(df_plot))
df_plot <- reshape(df_plot, varying = names(df_plot), direction = "long")
ggplot(df_plot, aes(x = Value)) + geom_bar() + facet_wrap(~time,
scales = "free") + theme_bw()
```

As we can see, the `descriptives()`

table provides
invaluable information about the measurement level of the indicators,
which is used to specify the model correctly. If these data had not been
coded as ordinal factors, these descriptive statistics would have
revealed that each variable has only 5-10 unique values. The proportion
of missing values is reported in the `"missing"`

column. If
any variables had missing values, we would report an MCAR test with
`mice::mcar()`

and explain that missing data are accounted
for using FIML. In our example, we see that there are no missing values,
hence we proceed with our analysis. Note that the ethic identification
variables are very right-skewed and some response categories have
near-zero frequencies; this can cause problems in model specification
and convergence. One potential solution would be to merge small adjacent
categories. We will not do this here, however.

Before we fit a series of LCA models, we set a random seed using
`set.seed()`

. This is important because there is some
inherent randomness in the estimation procedure, and using a seed
ensures that we (and others) can exactly reproduce the results.

Next, we fit the LCA models. As all variables are ordinal, we can use
the convenience function `tidySEM::mx_lca()`

, which is a
wrapper for the generic function `mx_mixture()`

optimized for
LCA with ordinal data. Any mixture model can be specified through
`mx_mixture()`

. At the time of writing, there are two other
wrapper functions for special cases: `mx_profiles()`

, for
latent profile analysis, and `mx_growth_mixture()`

, for
latent growth analysis and growth mixture models. All of these functions
have arguments `data`

and number of `classes`

. All
variables in `data`

are included in the analysis, so relevant
variables must be selected first.

We here consider 1-6 class models, but note that this may be overfit, as some of the indicators have only 5 response categories.

In class enumeration, we want to compare a sequence of LCA models
fitted to the data. First, note that all models converged without
issues. If this had not been the case, it is possible to aid convergence
using `mxTryHardOrdinal()`

, which expands the search for
optimal parameter values for models with ordinal indicators. It is part
of the family of functions based on `mxTryHard()`

.

Next, we create a model fit table using `table_fit()`

and
retain relevant columns. We also determine whether any models can be
disqualified.

```
fit <- table_fit(res)
fit[, c("Name", "LL", "n", "Parameters", "BIC", "Entropy", "prob_min",
"n_min", "np_ratio", "np_local")]
```

Note that both the global and local ratio of cases to parameters is
low; for models of 3 or more classes, there are just a few observations
per parameter in the smallest class (see `np_local`

). This is
a good reason to disqualify those classes. We will eliminate classes 4-6
on those criteria, but in real data applications, the 3-class solution
might also be disqualified.

In terms of classification diagnostics, note that the entropy and the minimum classification probabilities are high for all models. This indicates that all classes are distinct.

Based on the BIC, we would prefer the 2-class model. This decision is
also validated by a scree plot of the BIC, obtained by running
`plot(fit)`

.

Despite the BIC indicating that the 2-class model is best, upon examining the 2-class and 3-class solution, it was noted that theoretically crucial distinctions between ethnic, regional (Flemish), and national (Belgian) identity were not captured by the 2-class solution but were captured by the 3-class solution. Based on this theoretical consideration, the analysis proceeded with the 3-class solution.

The 3-class model yielded classes of reasonable size; the largest
class comprised 33%, and the smallest comprised 16% of cases. The
entropy was high, \(S = .93\),
indicating good class separability. Furthermore, the posterior
classification probability ranged from \([.94,
.99]\), which means that all classes had low classification
error. We can produce a table of results using
`table_results(res_final)`

. However, the results are
thresholds, indicating quantiles of a standardized normal distribution.
These may be difficult to interpret. Therefore, we ask for the results
in probability scale:

```
tab <- table_prob(res_final)
reshape(tab, direction = "wide", v.names = "Probability", timevar = "group",
idvar = c("Variable", "Category"))
```

The results can also be interpreted by plotting the response probabilities:

Note that the first class (33%) has relatively high identification with the ethnic indicators and relatively low identification with Belgian and Flemish identity. The second class (16%) has moderate identification with Belgian and Flemish identity, and relatively low identification with ethnic identity. Finally, the third class (50%) has high identification with all identities.

Based on the probability plot, we can label class 1 as *ethic
identifiers*, class 2 as *low identifiers*, and class 3 as
*high identifiers*.

To address the remaining two hypotheses, we will perform auxiliary analyses. Hypothesis 2 stated that adolescents who are nationally integrated would have lower depressive feelings than students from students with other combinations of identifications (hypothesis 2).

To test this hypothesis, we can call the BCH function and supply the
auxiliary variable depression to the `data`

argument,
omitting the `model`

argument. Below, we estimate an
auxiliary model to compare depressive symptoms across classes:

To obtain an omnibus likelihood ratio test of the significance of
depression differences across classes, as well as pairwise comparisons
between classes, use `lr_test(aux_dep)`

. The results indicate
that there are no significant differences in depression across classes,
\(\Delta LL(5) = 4.32, p = .50\).

Hypothesis 3 was that, for assimilated and separated adolescents, there would not be a significant effect of perceived teacher discrimination on depressive symptoms. To test this hypothesis, we will compare the regression coefficient of discrimination on depressive symptoms across classes.

```
df_aux <- maene_identity[, c("vict_teacher", "depression")]
# Dummy-code vict_teacher
df_aux$vict_teacher <- (as.integer(df_aux$vict_teacher) - 1)
aux_model <- BCH(res_final, model = "depression ~ vict_teacher",
data = df_aux)
```

To view the coefficients of this model, we can use either
`coef(aux_model)`

or
`table_results(aux_model, columns = NULL)`

. As evident from
the results table, the coefficients labeled `class1.A[1,2]`

are the regression coefficients.

There are two ways to test the difference in regression coefficients
across classes: using `lr_test(aux_model, compare = "A")`

, to
compare the ‘A matrix’ (regression coefficients) across classes, or
`wald_test(aux_model, "class1.A[1,2]=class2.A[1,2]&class1.A[1,2]=class3.A[1,2]")`

.
The results indicate that there are no significant differences in the
regression coefficients across classes, \(\chi^2 (2) = 1.16, p = .56\).