The moderncor_cat() function provides a unified
interface for computing association measures between categorical
(factor) variables. All measures require the DescTools
package.

moderncor_cat() accepts two factor vectors, or character or numeric
vectors that are to be treated as categorical:
set.seed(42)
x <- factor(sample(c("A", "B", "C"), 100, replace = TRUE))
y <- factor(sample(c("X", "Y"), 100, replace = TRUE))
moderncor_cat(x, y, method = "cramers_v")
#>
#> Cramer's V
#>
#> Estimate: 0.0173
#> Statistic: 0.03
#> P-value: 0.9851
#> Sample size (n): 100

The output is an S3 object of class "moderncor_cat" with the same structure as moderncor() output:

- $estimate: the association coefficient
- $statistic: the chi-square test statistic (for nominal methods)
- $p.value: the p-value (for nominal methods; NULL for ordinal methods)
- $n: the sample size
- $method_label: human-readable method name

available_methods_cat()
#>        method                   label   package    type
#> 1   cramers_v              Cramer's V DescTools nominal
#> 2         phi         Phi Coefficient DescTools nominal
#> 3       gamma   Goodman-Kruskal Gamma DescTools ordinal
#> 4    somers_d               Somers' D DescTools ordinal
#> 5 contingency Contingency Coefficient DescTools nominal
#> 6   tschuprow           Tschuprow's T DescTools nominal

Methods fall into two categories: nominal and ordinal.
Nominal measures are appropriate when categories have no natural ordering. They are all based on the chi-square statistic and return a p-value.
Cramér’s V is the most widely used measure of nominal association. It ranges from 0 (no association) to 1 (perfect association) and is symmetric:
moderncor_cat(x, y, method = "cramers_v")
#>
#> Cramer's V
#>
#> Estimate: 0.0173
#> Statistic: 0.03
#> P-value: 0.9851
#> Sample size (n): 100

For a 2×2 table, Cramér’s V equals the absolute value of the Phi coefficient.
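As a cross-check on the definition, the statistic can be reproduced in base R without DescTools. The sketch below assumes the usual formula \(V = \sqrt{\chi^2 / (n \cdot \min(r-1, c-1))}\) with an uncorrected chi-square statistic; chisq_stat() and cramers_v_sketch() are hypothetical helper names for illustration, not functions from the package.

```r
# Pearson chi-square statistic, no continuity correction.
chisq_stat <- function(tab) {
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  sum((tab - expected)^2 / expected)
}

# Cramér's V: sqrt(chi2 / (n * min(r - 1, c - 1))).
cramers_v_sketch <- function(tab) {
  sqrt(chisq_stat(tab) / (sum(tab) * (min(dim(tab)) - 1)))
}

# 3 x 2 table: number of cylinders by transmission type.
v_cyl_am <- cramers_v_sketch(table(mtcars$cyl, mtcars$am))
round(v_cyl_am, 4)

# 2 x 2 table: V should coincide with |phi|.
tab22 <- table(mtcars$vs, mtcars$am)
phi <- (tab22[1, 1] * tab22[2, 2] - tab22[1, 2] * tab22[2, 1]) /
  sqrt(prod(rowSums(tab22)) * prod(colSums(tab22)))
v_vs_am <- cramers_v_sketch(tab22)
all.equal(v_vs_am, abs(phi))
```

The first value rounds to 0.5226, which agrees with the cyl/am entry of the mtcars association matrix shown later in this vignette. The final comparison returns TRUE, illustrating the 2×2 identity.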
The Phi coefficient is designed for 2×2 contingency tables. For larger tables it can exceed 1, so prefer Cramér’s V in that case.
The contingency coefficient (Pearson’s C) is bounded between 0 and \(\sqrt{(k-1)/k}\), where \(k\) is the smaller of the number of rows and the number of columns, so its values are not comparable across tables of different sizes.
Tschuprow’s T is similar to Cramér’s V but normalizes by the geometric mean of \(r-1\) and \(c-1\) (the numbers of rows and columns, each minus one) rather than by their minimum. It is symmetric and ranges from 0 to 1, reaching 1 only for square tables.
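The bounds quoted for the nominal measures can be checked on tables with the strongest possible association. The following is a minimal base R sketch using the textbook formulas (uncorrected chi-square); nominal_measures_sketch() is a hypothetical helper and does not call moderncor_cat() or DescTools.

```r
# Pearson chi-square statistic, no continuity correction.
chisq_stat <- function(tab) {
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  sum((tab - expected)^2 / expected)
}

# All four chi-square based measures for an r x k table.
nominal_measures_sketch <- function(tab) {
  chi2 <- chisq_stat(tab)
  n <- sum(tab)
  r <- nrow(tab)
  k <- ncol(tab)
  c(
    phi         = sqrt(chi2 / n),
    contingency = sqrt(chi2 / (chi2 + n)),
    cramers_v   = sqrt(chi2 / (n * min(r - 1, k - 1))),
    tschuprow   = sqrt(chi2 / (n * sqrt((r - 1) * (k - 1))))
  )
}

# Perfect association in a square 3 x 3 table.
square <- diag(10, 3)
# Strongest possible association in a 3 x 2 table.
rect <- rbind(c(10, 0), c(0, 10), c(0, 10))

sq <- nominal_measures_sketch(square)
re <- nominal_measures_sketch(rect)
round(rbind(square = sq, rect = re), 4)
```

On the square table Phi comes out at about 1.41, the contingency coefficient stops at \(\sqrt{2/3} \approx 0.82\), and both Cramér’s V and Tschuprow’s T equal 1. On the 3×2 table Cramér’s V is still 1, while Tschuprow’s T is only about 0.84.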
Ordinal measures are appropriate when categories have a natural ordering (e.g., Likert scales, severity grades). They do not return p-values by default.
Goodman-Kruskal Gamma (\(\gamma\)) measures the tendency for pairs of observations to be concordant (both variables increase together) vs. discordant. It ranges from −1 to 1 and is symmetric:
# Simulate ordinal survey data
set.seed(1)
quality <- factor(sample(c("Low", "Medium", "High"), 100, replace = TRUE,
                         prob = c(0.3, 0.4, 0.3)),
                  levels = c("Low", "Medium", "High"), ordered = TRUE)
satisfaction <- factor(sample(c("Dissatisfied", "Neutral", "Satisfied"), 100,
                              replace = TRUE, prob = c(0.3, 0.4, 0.3)),
                       levels = c("Dissatisfied", "Neutral", "Satisfied"), ordered = TRUE)
moderncor_cat(quality, satisfaction, method = "gamma")
#>
#> Goodman-Kruskal Gamma
#>
#> Estimate: 0.0808
#> Sample size (n): 100

Somers’ D is an asymmetric ordinal measure: it quantifies how well y can be predicted from x, but not the reverse. Values range from −1 to 1:
moderncor_cat(quality, satisfaction, method = "somers_d")
#>
#> Somers' D
#>
#> Estimate: 0.0548
#> Sample size (n): 100

Note that swapping x and y generally gives a different result.
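The asymmetry follows directly from how the measures are built from concordant and discordant pairs. The sketch below is a base R illustration on a small made-up table, with rows as x and columns as y, both in increasing order. pair_counts() and ordinal_sketch() are hypothetical helpers, and the code assumes the standard definitions rather than reproducing the package's implementation.

```r
# Count concordant (P) and discordant (Q) pairs in an ordered table.
pair_counts <- function(tab) {
  nr <- nrow(tab)
  nc <- ncol(tab)
  P <- 0
  Q <- 0
  for (i in seq_len(nr - 1)) {
    below <- tab[(i + 1):nr, , drop = FALSE]
    for (j in seq_len(nc)) {
      # Cells below and to the right are concordant with cell (i, j).
      if (j < nc) P <- P + tab[i, j] * sum(below[, (j + 1):nc])
      # Cells below and to the left are discordant with cell (i, j).
      if (j > 1) Q <- Q + tab[i, j] * sum(below[, 1:(j - 1)])
    }
  }
  c(P = P, Q = Q)
}

ordinal_sketch <- function(tab) {
  pq <- pair_counts(tab)
  n <- sum(tab)
  untied_x <- (n^2 - sum(rowSums(tab)^2)) / 2  # pairs that differ on x
  untied_y <- (n^2 - sum(colSums(tab)^2)) / 2  # pairs that differ on y
  diff <- pq[["P"]] - pq[["Q"]]
  c(
    gamma         = diff / (pq[["P"]] + pq[["Q"]]),  # ignores all ties
    somers_y_on_x = diff / untied_x,                 # y is the dependent variable
    somers_x_on_y = diff / untied_y                  # x is the dependent variable
  )
}

# Illustrative 2 x 3 table (made-up counts).
tab <- rbind(c(20, 10, 5),
             c(5, 15, 10))
res <- ordinal_sketch(tab)
round(res, 4)
```

Here P = 600 and Q = 150, so Gamma is 0.6 regardless of which variable is treated as the predictor, while the two Somers’ D values are roughly 0.43 and 0.33 because their denominators count different sets of tied pairs.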
Pass a data.frame of factor columns to compute pairwise
associations across all pairs:
df <- data.frame(
  cyl = factor(mtcars$cyl),
  gear = factor(mtcars$gear),
  am = factor(mtcars$am)
)
res_mat <- moderncor_cat(df, method = "cramers_v")
res_mat
#>
#> Cramer's V
#>
#> Association Matrix (n = 32):
#>
#>         cyl   gear     am
#> cyl  1.0000 0.5309 0.5226
#> gear 0.5309 1.0000 0.8090
#> am   0.5226 0.8090 1.0000
#>
#> P-value Matrix:
#>
#>         cyl   gear     am
#> cyl  0.0000 0.0012 0.0126
#> gear 0.0012 0.0000 0.0000
#> am   0.0126 0.0000 0.0000

The result is a matrix of association coefficients. For nominal methods, the associated p-value matrix is also stored in $p.value:
res_mat$p.value
#>              cyl         gear           am
#> cyl  0.000000000 1.214066e-03 1.264661e-02
#> gear 0.001214066 0.000000e+00 2.830889e-05
#> am   0.012646605 2.830889e-05 0.000000e+00

Use as.data.frame() to convert to tidy format.
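The columns returned by the package's as.data.frame() method are not shown in this vignette. As a rough idea of what a tidy (long) layout looks like, the base R sketch below reshapes a copy of the association matrix printed above into one row per variable pair; the column names var1, var2, and estimate are assumptions chosen for illustration.

```r
# Association matrix copied from the printed output above.
vars <- c("cyl", "gear", "am")
assoc <- matrix(
  c(1.0000, 0.5309, 0.5226,
    0.5309, 1.0000, 0.8090,
    0.5226, 0.8090, 1.0000),
  nrow = 3, byrow = TRUE, dimnames = list(vars, vars)
)

# Keep each unordered pair once by walking the upper triangle.
idx <- which(upper.tri(assoc), arr.ind = TRUE)
tidy <- data.frame(
  var1 = rownames(assoc)[idx[, "row"]],
  var2 = colnames(assoc)[idx[, "col"]],
  estimate = assoc[idx]
)
tidy
```

This yields three rows (cyl/gear, cyl/am, gear/am), which is convenient for filtering, sorting, or plotting with tools that expect one observation per row.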
The use argument controls how missing values are handled, mirroring the interface of moderncor():

- "complete.obs" (default): remove all rows with any NA before computing
- "pairwise.complete.obs": remove NAs per pair
- "everything": propagate NAs (returns NA for any pair with missing values)

| Situation | Recommended method |
|---|---|
| Two unordered categorical variables (general) | cramers_v |
| Two binary variables (2×2 table) | phi |
| Two ordered categorical (Likert) variables | gamma |
| Predicting one ordered variable from another | somers_d |
| Comparing association across different table sizes | cramers_v or tschuprow |
For continuous variables, use moderncor() instead. See
vignette("introduction") for a full overview.