The moderncor package provides a single unified
interface for computing a wide variety of classical and modern
correlation coefficients. This guide introduces the core features of the
package.
Let’s generate some synthetic data with a non-linear parabolic relationship where \(y = x^2 + \epsilon\):
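A minimal sketch of such data, assuming a uniform x on [-1, 1], Gaussian noise with standard deviation 0.1, and an arbitrary seed. None of these choices come from the package, so the estimates printed in this guide will not be reproduced exactly:

```r
# Assumed settings for illustration only: seed, range of x, and noise level
set.seed(123)
n <- 100
x <- runif(n, min = -1, max = 1)   # symmetric around zero
y <- x^2 + rnorm(n, sd = 0.1)      # parabola plus Gaussian noise

# y depends strongly on x^2 but only weakly on x itself
cor(x^2, y)
cor(x, y)
```

Because x is symmetric around zero, the positive and negative halves of the parabola cancel in the linear correlation.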
Because the relationship is non-linear and symmetric, classical Pearson correlation will fail to capture the dependence:
moderncor(x, y, method = "pearson")
#>
#> Pearson Product-Moment Correlation
#>
#> Estimate: 0.0168
#> Statistic: 0.1667
#> P-value: 0.868
#> Sample size (n): 100
With moderncor, you can compute distance correlation
(dcor) or Chatterjee’s Xi correlation (xi)
using the same interface to capture the non-linear relationship:
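What these two coefficients measure can be sketched in base R. The functions below follow the textbook definitions (Chatterjee's rank formula without ties, and the sample distance correlation from double-centred distance matrices). The names `xi_sketch` and `dcor_sketch` are hypothetical, the data settings are assumed, and the package itself delegates to XICOR and energy, so results may differ in detail:

```r
# Chatterjee's xi, assuming no ties in y (reasonable for continuous data)
xi_sketch <- function(x, y) {
  n <- length(x)
  r <- rank(y[order(x)])                 # ranks of y after sorting pairs by x
  1 - 3 * sum(abs(diff(r))) / (n^2 - 1)  # small rank jumps => xi close to 1
}

# Sample distance correlation via double-centred pairwise distance matrices
dcor_sketch <- function(x, y) {
  center <- function(v) {
    d <- as.matrix(dist(v))
    sweep(sweep(d, 1, rowMeans(d)), 2, colMeans(d)) + mean(d)
  }
  A <- center(x)
  B <- center(y)
  dcov2 <- max(mean(A * B), 0)           # guard against tiny negative rounding
  sqrt(dcov2 / sqrt(mean(A * A) * mean(B * B)))
}

# Assumed data: parabola with noise (seed and noise level are illustrative)
set.seed(123)
x <- runif(100, -1, 1)
y <- x^2 + rnorm(100, sd = 0.1)

c(pearson = cor(x, y), xi = xi_sketch(x, y), dcor = dcor_sketch(x, y))
```

With the package installed, the corresponding calls would be `moderncor(x, y, method = "xi")` and `moderncor(x, y, method = "dcor")`, using the method names listed by `available_methods()`.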
moderncor supports Pearson, Spearman, and Kendall
correlations via the same interface as base R cor():
moderncor(x, y, method = "spearman")
#>
#> Spearman Rank Correlation
#>
#> Estimate: -0.0105
#> Statistic: 168404
#> P-value: 0.9171
#> Sample size (n): 100
moderncor(x, y, method = "kendall")
#>
#> Kendall Rank Correlation
#>
#> Estimate: -0.0129
#> Statistic: -0.1906
#> P-value: 0.8488
#> Sample size (n): 100
If you pass a matrix or a data.frame to
moderncor(), it will compute the pairwise correlation
matrix of the columns:
# Compute Spearman correlation matrix for iris dataset
res_mat <- moderncor(iris[, 1:4], method = "spearman")
#> Warning in cor.test.default(x, y, method = "spearman", alternative =
#> alternative, : cannot compute exact p-value with ties
#> Warning in cor.test.default(x, y, method = "spearman", alternative =
#> alternative, : cannot compute exact p-value with ties
#> Warning in cor.test.default(x, y, method = "spearman", alternative =
#> alternative, : cannot compute exact p-value with ties
#> Warning in cor.test.default(x, y, method = "spearman", alternative =
#> alternative, : cannot compute exact p-value with ties
#> Warning in cor.test.default(x, y, method = "spearman", alternative =
#> alternative, : cannot compute exact p-value with ties
#> Warning in cor.test.default(x, y, method = "spearman", alternative =
#> alternative, : cannot compute exact p-value with ties
res_mat
#>
#> Spearman Rank Correlation
#>
#> Correlation Matrix (n = 150):
#>
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> Sepal.Length 1.0000 -0.1668 0.8819 0.8343
#> Sepal.Width -0.1668 1.0000 -0.3096 -0.2890
#> Petal.Length 0.8819 -0.3096 1.0000 0.9377
#> Petal.Width 0.8343 -0.2890 0.9377 1.0000
#>
#> P-value Matrix:
#>
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> Sepal.Length 0.0000 0.0414 0e+00 0e+00
#> Sepal.Width 0.0414 0.0000 1e-04 3e-04
#> Petal.Length 0.0000 0.0001 0e+00 0e+00
#> Petal.Width 0.0000 0.0003 0e+00 0e+00
You can convert the output of moderncor() to a tidy data
frame using as.data.frame(). This is particularly useful
for correlation matrices:
# Convert correlation matrix to tidy data frame
df <- as.data.frame(res_mat)
head(df)
#> var1 var2 r p.value
#> 1 Sepal.Width Sepal.Length -0.1667777 4.136799e-02
#> 2 Petal.Length Sepal.Length 0.8818981 3.443087e-50
#> 3 Petal.Width Sepal.Length 0.8342888 4.189447e-40
#> 4 Sepal.Length Sepal.Width -0.1667777 4.136799e-02
#> 5 Petal.Length Sepal.Width -0.3096351 1.153938e-04
#> 6 Petal.Width Sepal.Width -0.2890317 3.342981e-04
This returns a data frame containing the variables being compared
(var1 and var2), the correlation coefficient
(r), and p-values (p.value) if they were
calculated.
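The long format makes filtering and sorting straightforward. As a sketch that does not depend on moderncor, the same layout can be built from a plain base R correlation matrix; the column names `var1`, `var2`, and `r` mirror the output above, and p-values are left out because `cor()` does not compute them:

```r
# Spearman correlation matrix from base R (estimates only, no p-values)
cor_mat <- cor(iris[, 1:4], method = "spearman")

# as.table() + as.data.frame() unpivots the matrix into one row per cell
tidy <- as.data.frame(as.table(cor_mat), stringsAsFactors = FALSE)
names(tidy) <- c("var1", "var2", "r")

# Drop the diagonal, where a variable is paired with itself
tidy <- tidy[tidy$var1 != tidy$var2, ]

# Keep strongly associated pairs, strongest first
strong <- tidy[abs(tidy$r) > 0.8, ]
strong <- strong[order(-abs(strong$r)), ]
strong
```

Each pair appears twice (once in each order), matching the symmetric matrix it came from.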
For large datasets, calculating p-values for modern methods (such as
MIC, HSIC, or Mutual Information) can be slow because they rely on
permutation tests. You can disable p-value calculations by setting
p_value = FALSE for a significant speedup:
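To see where the cost comes from, here is a generic permutation test sketched in base R. It is an assumption about the general technique, not a description of moderncor's internals; `perm_pvalue` and `abs_cor` are hypothetical names and `B = 199` is an arbitrary choice:

```r
# Generic permutation p-value: the statistic is re-evaluated B + 1 times
perm_pvalue <- function(x, y, stat, B = 199) {
  observed <- stat(x, y)
  # Shuffling y breaks the pairing; each shuffle costs one full stat() call
  permuted <- replicate(B, stat(x, sample(y)))
  (1 + sum(permuted >= observed)) / (B + 1)
}

set.seed(1)
x <- rnorm(200)
y <- x + rnorm(200)

# A cheap statistic for illustration; MIC or HSIC would be far more expensive
abs_cor <- function(a, b) abs(cor(a, b))
p_val <- perm_pvalue(x, y, abs_cor)
p_val
```

Skipping the p-value means the statistic is evaluated once instead of B + 1 times, which is the saving that `p_value = FALSE` is meant to provide.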
Robust correlations are less sensitive to outliers than classical
methods. moderncor provides three robust correlation
methods.
Biweight midcorrelation down-weights observations far from the median using a biweight function. It requires no additional packages:
set.seed(42)
x_out <- c(rnorm(95), rnorm(5, mean = 10)) # 5% outliers
y_out <- c(rnorm(95), rnorm(5, mean = 10))
moderncor(x_out, y_out, method = "biweight")
#>
#> Biweight Midcorrelation
#>
#> Estimate: 0.1045
#> Statistic: 1.0405
#> P-value: 0.3007
#> Sample size (n): 100
Compare with Pearson, which is strongly influenced by outliers:
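The comparison can be sketched in base R on the same simulated data. `bicor_sketch` is a hypothetical helper that follows the standard biweight midcorrelation definition with tuning constant 9; the package's own implementation may differ in details, so the numbers need not match the output above:

```r
set.seed(42)
x_out <- c(rnorm(95), rnorm(5, mean = 10))  # 5% outliers
y_out <- c(rnorm(95), rnorm(5, mean = 10))

# Pearson: the five shared outliers dominate the covariance
pearson_r <- cor(x_out, y_out)

# Biweight midcorrelation sketch (tuning constant 9 is the usual default)
bicor_sketch <- function(x, y, tune = 9) {
  weighted_dev <- function(v) {
    dev <- v - median(v)
    u <- dev / (tune * mad(v, constant = 1))  # scaled distance from the median
    w <- (1 - u^2)^2 * (abs(u) < 1)           # weight drops to zero once |u| >= 1
    dev * w
  }
  a <- weighted_dev(x)
  b <- weighted_dev(y)
  sum(a * b) / sqrt(sum(a^2) * sum(b^2))
}
bicor_r <- bicor_sketch(x_out, y_out)

c(pearson = pearson_r, biweight = bicor_r)
```

The outliers sit far enough from the median to receive zero weight, so the biweight estimate reflects only the bulk of the data.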
Percentage bend correlation down-weights a specified proportion of the most
extreme values (requires the WRS2 package):
Ordinal correlations are designed for ordered categorical data such as Likert-scale responses. They treat each observed variable as a discretized version of an underlying continuous, normally distributed variable.
Polychoric correlation is appropriate when both variables are ordinal
with more than two categories (requires psych):
# Simulate ordinal data (e.g., Likert scale responses)
set.seed(1)
z1 <- rnorm(200)
z2 <- 0.7 * z1 + rnorm(200, sd = sqrt(1 - 0.7^2))
x_ord <- cut(z1, breaks = c(-Inf, -1, 0, 1, Inf), labels = FALSE)
y_ord <- cut(z2, breaks = c(-Inf, -1, 0, 1, Inf), labels = FALSE)
moderncor(x_ord, y_ord, method = "polychoric")
#>
#> Polychoric Correlation
#>
#> Estimate: 0.6411
#> Sample size (n): 200
Tetrachoric correlation is the special case of polychoric for binary
(0/1) data (requires psych):
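The idea can be illustrated in base R for the special case where both latent variables are cut at zero. For a standard bivariate normal pair with correlation ρ, the probability that both fall on the same side of zero is 1/2 + asin(ρ)/π, which can be inverted directly. This closed form is only a sketch under that assumption (the sample size is also enlarged to 2000 for stability); a general estimator would also have to estimate the thresholds:

```r
set.seed(1)
n <- 2000
z1 <- rnorm(n)
z2 <- 0.7 * z1 + rnorm(n, sd = sqrt(1 - 0.7^2))  # latent correlation 0.7

# Dichotomize both latent variables at zero
x_bin <- as.integer(z1 > 0)
y_bin <- as.integer(z2 > 0)

# Pearson on the 0/1 data (phi coefficient) understates the latent correlation
phi <- cor(x_bin, y_bin)

# Invert P(same side) = 1/2 + asin(rho) / pi to recover the latent correlation
p_agree <- mean(x_bin == y_bin)
r_tet_sketch <- sin(pi * (p_agree - 0.5))

c(phi = phi, tetrachoric_sketch = r_tet_sketch)
```

The gap between the two estimates is the reason to prefer a tetrachoric estimate over Pearson when binary variables are assumed to reflect continuous traits.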
Partial and semi-partial correlations measure the relationship
between two variables while controlling for one or more confounding
variables (requires ppcor).
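One way to see what "controlling for" means is to regress the confounder out and correlate the residuals. The base R sketch below uses a hypothetical helper, `partial_sketch`, on simulated data in which x and y are related only through z. It covers Pearson and, by working on ranks, Spearman; it is not the ppcor code path and does not cover Kendall:

```r
set.seed(7)
z   <- rnorm(100)
x_p <- 0.6 * z + rnorm(100, sd = 0.8)  # x correlates with z
y_p <- 0.6 * z + rnorm(100, sd = 0.8)  # y correlates with z

partial_sketch <- function(x, y, z, method = c("pearson", "spearman"),
                           semi = FALSE) {
  method <- match.arg(method)
  if (method == "spearman") {          # Spearman is Pearson applied to ranks
    x <- rank(x); y <- rank(y); z <- rank(z)
  }
  y_res <- resid(lm(y ~ z))                    # remove z from y
  x_res <- if (semi) x else resid(lm(x ~ z))   # remove z from x unless semi-partial
  cor(x_res, y_res)
}

raw_r     <- cor(x_p, y_p)
partial_r <- partial_sketch(x_p, y_p, z)
semi_r    <- partial_sketch(x_p, y_p, z, semi = TRUE)

c(raw = raw_r, partial = partial_r, semi_partial = semi_r)
```

The semi-partial estimate is never larger in magnitude than the partial one, because x keeps the variance it shares with z.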
Partial correlation removes the influence of z from
both x and y:
set.seed(7)
z <- rnorm(100)
x_p <- 0.6 * z + rnorm(100, sd = 0.8) # x correlates with z
y_p <- 0.6 * z + rnorm(100, sd = 0.8) # y correlates with z
# Raw correlation (inflated by shared z)
moderncor(x_p, y_p, method = "pearson")
#>
#> Pearson Product-Moment Correlation
#>
#> Estimate: 0.4122
#> Statistic: 4.4794
#> P-value: 2.028e-05
#> Sample size (n): 100
Semi-partial correlation removes the influence of z from
y only (also requires ppcor):
moderncor(x_p, y_p, method = "semi_partial", z = z)
#>
#> Semi-partial Correlation (Pearson)
#>
#> Estimate: 0.1097
#> Statistic: 1.0867
#> P-value: 0.2799
#> Sample size (n): 100
The method_partial argument selects which base
correlation to use ("pearson", "spearman", or
"kendall"):
Ball correlation is a nonparametric measure of dependence based on
ball covariance (requires Ball):
Bergsma-Dassios \(\tau^*\) is a
nonparametric measure of association that equals zero if and only if
x and y are independent (requires
TauStar):
To see all supported correlation methods and their required packages:
available_methods()
#>    method           label                                          package   type
#>  1 pearson          Pearson Product-Moment Correlation             stats     classic
#>  2 spearman         Spearman Rank Correlation                      stats     classic
#>  3 kendall          Kendall Rank Correlation                       stats     classic
#>  4 dcor             Distance Correlation                           energy    modern
#>  5 mic              Maximal Information Coefficient (MIC)          minerva   modern
#>  6 hsic             Hilbert-Schmidt Independence Criterion (HSIC)  dHSIC     modern
#>  7 xi               Chatterjee's Xi Correlation                    XICOR     modern
#>  8 hoeffding        Hoeffding's D                                  Hmisc     modern
#>  9 mutual_info      Mutual Information                             infotheo  information
#> 10 biweight         Biweight Midcorrelation                        built-in  robust
#> 11 percentage_bend  Percentage Bend Correlation                    WRS2      robust
#> 12 winsorized       Winsorized Correlation                         WRS2      robust
#> 13 polychoric       Polychoric Correlation                         psych     ordinal
#> 14 tetrachoric      Tetrachoric Correlation                        psych     ordinal
#> 15 partial          Partial Correlation                            ppcor     partial
#> 16 semi_partial     Semi-partial Correlation                       ppcor     partial
#> 17 ball             Ball Correlation                               Ball      other
#> 18 tau_star         Bergsma-Dassios Tau*                           TauStar   other
To get details on a specific method:
method_info("dcor")
#> $method
#> [1] "dcor"
#>
#> $label
#> [1] "Distance Correlation"
#>
#> $package
#> [1] "energy"
#>
#> $description
#> [1] "Measures both linear and nonlinear dependence. Zero if and only if independent."
#>
#> $range
#> [1] "[0, 1]"
#>
#> $assumptions
#> [1] "Continuous variables."
For categorical variables (factors or contingency tables), use
moderncor_cat(). See vignette("categorical")
for a full introduction to categorical association measures.