recor: Making Correlation Measurement More Accurate

Pearson’s \(r\) is the gold-standard measure of linear dependence. With a suitable adjustment, it can also serve as a gold-standard measure of nonlinear monotone dependence.
recor is an R package that implements the rearrangement correlation coefficient (\(r^\#\)), an adjusted version of Pearson’s correlation coefficient designed to measure arbitrary monotone dependence, whether linear or nonlinear. Based on Ai (2024), the package addresses the tendency of the traditional correlation coefficient to underestimate nonlinear monotone relationships. The rearrangement correlation is derived from an inequality tighter than the classical Cauchy-Schwarz inequality, giving it sharper bounds and a wider capture range.
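As a quick base-R illustration of why the tighter bound matters (a sketch that does not use the package): for any paired sample, the rearrangement inequality guarantees cov(x, y) <= cov(sort(x), sort(y)) <= sd(x) * sd(y), so normalizing by the rearranged covariance, as \(r^\#\) does, divides by a smaller quantity than Pearson’s \(r\) and recovers more of the monotone association.
# Sketch: compare the raw covariance with the rearrangement bound and the
# classical Cauchy-Schwarz bound on simulated nonlinear monotone data.
set.seed(1)
x <- rnorm(200)
y <- x^3 + rnorm(200, sd = 0.5)           # nonlinear but (nearly) monotone in x
c(cov            = cov(x, y),
  rearranged     = cov(sort(x), sort(y)),  # tighter (rearrangement) bound
  cauchy_schwarz = sd(x) * sd(y))          # classical bound used by Pearson's r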
recor() is used in the same way as stats::cor().
library(recor)
# Linear relationship
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)
recor(x, y)
#> [1] 1
# Nonlinear monotone relationship
x <- c(1, 2, 3, 4, 5)
y <- c(1, 8, 27, 65, 125) # approximately y = x^3
recor(x, y) # Higher value than Pearson's r
#> [1] 1
cor(x, y)
#> [1] 0.944458
# Matrix example
set.seed(123)
mat <- matrix(rnorm(100), ncol = 5)
colnames(mat) <- LETTERS[1:5]
recor(mat) # 5x5 correlation matrix
#> A B C D E
#> A 1.00000000 -0.09511994 -0.1283021 0.1243721 -0.2328551
#> B -0.09511994 1.00000000 0.1022576 0.2381745 0.3780232
#> C -0.12830211 0.10225762 1.0000000 -0.1523651 -0.3603780
#> D 0.12437205 0.23817455 -0.1523651 1.0000000 -0.1289523
#> E -0.23285513 0.37802315 -0.3603780 -0.1289523 1.0000000
# Two matrices
mat1 <- matrix(rnorm(50), ncol = 5)
mat2 <- matrix(rnorm(50), ncol = 5)
recor(mat1, mat2) # 5x5 cross-correlation matrix
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 0.0001379295 0.019273397 -0.14776094 -0.01203410 0.14712263
#> [2,] -0.0850363746 0.135125063 -0.10799623 0.35026884 0.20233183
#> [3,] -0.2825948208 -0.020383616 -0.31990514 -0.33267352 -0.48254414
#> [4,] 0.4067584970 -0.008022853 0.08223935 0.02728547 0.37567963
#> [5,] 0.5566966868 -0.059564374 0.03296252 0.22249817 -0.03009148
# data.frame
recor(iris[, 1:4])
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> Sepal.Length 1.0000000 -0.1210250 0.9156110 0.8445397
#> Sepal.Width -0.1210250 1.0000000 -0.4628225 -0.3909946
#> Petal.Length 0.9156110 -0.4628225 1.0000000 0.9694665
#> Petal.Width 0.8445397 -0.3909946 0.9694665 1.0000000
The rearrangement correlation coefficient is based on rearrangement inequality theorems that provide tighter bounds than the Cauchy-Schwarz inequality. Mathematically, for samples \(x\) and \(y\), it is defined as:
\(r^\#(x, y) = \frac{s_{x,y}}{\left| s_{x^\uparrow,\, y^\updownarrow} \right|}\)
where \(s_{x,y}\) is the sample covariance of \(x\) and \(y\), \(x^\uparrow\) denotes \(x\) sorted in increasing order, and \(y^\updownarrow\) denotes \(y\) sorted in increasing order when \(s_{x,y} \ge 0\) and in decreasing order otherwise.
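As a quick sanity check against the cubic example above: when \(x\) and \(y\) are both already in increasing order, \(x = x^\uparrow\) and \(y^\updownarrow = y^\uparrow = y\), so the numerator and denominator coincide:
\(r^\#(x, y) = \frac{s_{x,y}}{\left| s_{x^\uparrow, y^\uparrow} \right|} = \frac{s_{x,y}}{s_{x,y}} = 1\)
whereas Pearson’s \(r = s_{x,y}/(s_x s_y)\) stays below 1 unless the relationship is exactly linear, because \(s_{x^\uparrow, y^\uparrow} \le s_x s_y\) by the Cauchy-Schwarz inequality.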
\(r^\#\) can be computed in R as follows:
recor <- function(x, y = NULL) {
  # Rearrangement correlation for a single pair of vectors
  recor_vector <- function(x, y) {
    numerator <- cov(x, y)
    if (numerator >= 0) {
      # Positive association: compare with the same-order rearrangement
      denominator <- abs(cov(
        sort(x, decreasing = FALSE),
        sort(y, decreasing = FALSE)
      ))
    } else {
      # Negative association: compare with the opposite-order rearrangement
      denominator <- abs(cov(
        sort(x, decreasing = FALSE),
        sort(y, decreasing = TRUE)
      ))
    }
    numerator / denominator
  }

  if (is.matrix(x) || is.data.frame(x)) {
    x <- as.matrix(x)
    if (is.null(y)) {
      # Symmetric correlation matrix of the columns of x
      p <- ncol(x)
      result <- matrix(1, nrow = p, ncol = p)
      rownames(result) <- colnames(result) <- colnames(x)
      for (i in 1:p) {
        for (j in 1:p) {
          if (i != j) {
            result[i, j] <- result[j, i] <- recor_vector(x[, i], x[, j])
          }
        }
      }
      return(result)
    } else if (is.matrix(y) || is.data.frame(y)) {
      # Cross-correlation matrix between the columns of x and y
      y <- as.matrix(y)
      if (nrow(x) != nrow(y)) {
        stop("The number of rows of x and y must be the same")
      }
      p <- ncol(x)
      q <- ncol(y)
      result <- matrix(0, nrow = p, ncol = q)
      rownames(result) <- colnames(x)
      colnames(result) <- colnames(y)
      for (i in 1:p) {
        for (j in 1:q) {
          result[i, j] <- recor_vector(x[, i], y[, j])
        }
      }
      return(result)
    }
  }

  if (is.null(y)) {
    stop("y is needed when x is a vector")
  }
  if (length(x) != length(y)) {
    stop("x and y must have the same length")
  }
  if (length(x) < 2) {
    stop("x and y must have at least two elements")
  }
  recor_vector(x, y)
}
Note that the above R implementation is for illustrative purposes only; the actual recor package employs an optimized C++ backend for efficient computation.
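A quick way to convince yourself that the illustrative R function above matches the package (assuming the installed package exports recor(), as the examples at the top show) is to compare the two on simulated data:
# Sanity check: the pure-R reference defined above vs. the packaged,
# C++-backed version accessed explicitly via the recor:: namespace.
set.seed(2024)
x <- rnorm(30)
y <- exp(x) + rnorm(30, sd = 0.1)   # noisy monotone transform
all.equal(recor(x, y), recor::recor(x, y))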
Do we need a new monotone measure when rank-based measures such as Spearman’s \(\rho\) can already capture monotone dependence? The answer is yes, in the sense that \(r^\#\) has higher resolution and is more accurate. To take a simple example, let \(x = (4, 3, 2, 1)\) and consider the five vectors \(y_1 = (5, 4, 3, 2.00)\), \(y_2 = (5, 4, 3, 3.25)\), \(y_3 = (5, 4, 3, 3.50)\), \(y_4 = (5, 4, 3, 3.75)\), and \(y_5 = (5, 4, 3, 4.50)\).
Obviously, \(y_1\) and \(x\) behave in exactly the same way, with their values decreasing step by step. The behaviors of \(y_2\), \(y_3\), \(y_4\), and \(y_5\) depart more and more from that of \(x\). However, Spearman’s \(\rho\) assigns the same value to \(y_2\), \(y_3\), and \(y_4\). In contrast, the \(r^\#\) values reveal all of these differences.
x <- c(4, 3, 2, 1)
y_list <- list(y1 = c(5, 4, 3, 2.00),
               y2 = c(5, 4, 3, 3.25),
               y3 = c(5, 4, 3, 3.50),
               y4 = c(5, 4, 3, 3.75),
               y5 = c(5, 4, 3, 4.50))
# recor
lapply(y_list, recor, x)
#> $y1
#> [1] 1
#>
#> $y2
#> [1] 0.9259259
#>
#> $y3
#> [1] 0.8461538
#>
#> $y4
#> [1] 0.76
#>
#> $y5
#> [1] 0.3846154
# Spearman's rho
lapply(y_list, cor, x, method = "spearman")
#> $y1
#> [1] 1
#>
#> $y2
#> [1] 0.8
#>
#> $y3
#> [1] 0.8
#>
#> $y4
#> [1] 0.8
#>
#> $y5
#> [1] 0.4
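The reason Spearman’s \(\rho\) cannot separate \(y_2\), \(y_3\), and \(y_4\) is that it only sees ranks, and these three vectors have identical ranks; \(r^\#\) works on the raw values instead. A quick check in base R (not part of the package):
# y2, y3 and y4 all have ranks (4, 3, 1, 2), so any rank-based measure
# must assign them the same value against x.
sapply(y_list[c("y2", "y3", "y4")], rank)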
Ai, X. (2024). Adjust Pearson’s r to Measure Arbitrary Monotone Dependence. In Advances in Neural Information Processing Systems (Vol. 37, pp. 37385-37407).
This project is licensed under GPL-3.
If you use this package in your research, please cite our work as:
@inproceedings{NEURIPS2024_41c38a83,
author = {Ai, Xinbo},
booktitle = {Advances in Neural Information Processing Systems},
editor = {A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang},
pages = {37385--37407},
publisher = {Curran Associates, Inc.},
title = {Adjust Pearson\textquotesingle s r to Measure Arbitrary Monotone Dependence},
url = {https://proceedings.neurips.cc/paper_files/paper/2024/file/41c38a83bd97ba28505b4def82676ba5-Paper-Conference.pdf},
volume = {37},
year = {2024}
}