recor: Making Correlation Measurement More Accurate

Pearson’s \(r\) is the gold-standard measure of linear dependence. With a suitable adjustment, it can also serve as a gold-standard measure of nonlinear monotone dependence.
recor is an R package that implements the rearrangement correlation coefficient (\(r^\#\)), an adjusted version of Pearson’s correlation coefficient designed to measure arbitrary monotone dependence, whether linear or nonlinear. Based on Ai (2024), the package addresses the tendency of the traditional correlation coefficient to underestimate nonlinear monotone relationships. The rearrangement correlation is derived from an inequality tighter than the classical Cauchy-Schwarz inequality, giving it sharper bounds and a wider capture range.
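As a quick base-R illustration of why the tighter bound matters (a sketch that does not use the package): for any paired sample, the rearrangement inequality guarantees cov(x, y) <= cov(sort(x), sort(y)) <= sd(x) * sd(y), so normalizing by the rearranged covariance, as \(r^\#\) does, divides by a smaller quantity than Pearson’s \(r\) and recovers more of the monotone association.
# Sketch: compare the raw covariance with the rearrangement bound and the
# classical Cauchy-Schwarz bound on simulated nonlinear monotone data.
set.seed(1)
x <- rnorm(200)
y <- x^3 + rnorm(200, sd = 0.5)           # nonlinear but (nearly) monotone in x
c(cov            = cov(x, y),
  rearranged     = cov(sort(x), sort(y)),  # tighter (rearrangement) bound
  cauchy_schwarz = sd(x) * sd(y))          # classical bound used by Pearson's r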
recor() is used in the same way as stats::cor().
library(recor)
# Linear relationship
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)
recor(x, y)
#> [1] 1
# Nonlinear monotone relationship
x <- c(1, 2, 3, 4, 5)
y <- c(1, 8, 27, 65, 125) # approximately y = x^3
recor(x, y) # Higher value than Pearson's r
#> [1] 1
cor(x, y)
#> [1] 0.944458
# Matrix example
set.seed(123)
mat <- matrix(rnorm(100), ncol = 5)
colnames(mat) <- LETTERS[1:5]
recor(mat) # 5x5 correlation matrix
#> A B C D E
#> A 1.00000000 -0.09511994 -0.1283021 0.1243721 -0.2328551
#> B -0.09511994 1.00000000 0.1022576 0.2381745 0.3780232
#> C -0.12830211 0.10225762 1.0000000 -0.1523651 -0.3603780
#> D 0.12437205 0.23817455 -0.1523651 1.0000000 -0.1289523
#> E -0.23285513 0.37802315 -0.3603780 -0.1289523 1.0000000
# Two matrices
mat1 <- matrix(rnorm(50), ncol = 5)
mat2 <- matrix(rnorm(50), ncol = 5)
recor(mat1, mat2) # 5x5 cross-correlation matrix
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 0.0001379295 0.019273397 -0.14776094 -0.01203410 0.14712263
#> [2,] -0.0850363746 0.135125063 -0.10799623 0.35026884 0.20233183
#> [3,] -0.2825948208 -0.020383616 -0.31990514 -0.33267352 -0.48254414
#> [4,] 0.4067584970 -0.008022853 0.08223935 0.02728547 0.37567963
#> [5,] 0.5566966868 -0.059564374 0.03296252 0.22249817 -0.03009148
# data.frame
recor(iris[, 1:4])
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> Sepal.Length 1.0000000 -0.1210250 0.9156110 0.8445397
#> Sepal.Width -0.1210250 1.0000000 -0.4628225 -0.3909946
#> Petal.Length 0.9156110 -0.4628225 1.0000000 0.9694665
#> Petal.Width 0.8445397 -0.3909946 0.9694665 1.0000000
The rearrangement correlation coefficient is based on rearrangement inequality theorems that provide tighter bounds than the Cauchy-Schwarz inequality. Mathematically, for samples \(x\) and \(y\), it is defined as:
\(r^\#(x, y) = \frac{s_{x,y}}{\left| s_{x^\uparrow,\, y^\updownarrow} \right|}\)
where \(s_{x,y}\) is the sample covariance of \(x\) and \(y\), \(x^\uparrow\) denotes \(x\) sorted in increasing order, and \(y^\updownarrow\) denotes \(y\) sorted in increasing order when \(s_{x,y} \ge 0\) and in decreasing order otherwise.
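As a quick sanity check against the cubic example above: when \(x\) and \(y\) are both already in increasing order, \(x = x^\uparrow\) and \(y^\updownarrow = y^\uparrow = y\), so the numerator and denominator coincide:
\(r^\#(x, y) = \frac{s_{x,y}}{\left| s_{x^\uparrow, y^\uparrow} \right|} = \frac{s_{x,y}}{s_{x,y}} = 1\)
whereas Pearson’s \(r = s_{x,y}/(s_x s_y)\) stays below 1 unless the relationship is exactly linear, because \(s_{x^\uparrow, y^\uparrow} \le s_x s_y\) by the Cauchy-Schwarz inequality.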
\(r^\#\) can be computed in R as follows:
recor <- function(x, y = NULL) {
  # Rearrangement correlation for a single pair of vectors
  recor_vector <- function(x, y) {
    numerator <- cov(x, y)
    if (numerator >= 0) {
      # Positive association: compare with the same-order rearrangement
      denominator <- abs(cov(
        sort(x, decreasing = FALSE),
        sort(y, decreasing = FALSE)
      ))
    } else {
      # Negative association: compare with the opposite-order rearrangement
      denominator <- abs(cov(
        sort(x, decreasing = FALSE),
        sort(y, decreasing = TRUE)
      ))
    }
    numerator / denominator
  }

  if (is.matrix(x) || is.data.frame(x)) {
    x <- as.matrix(x)
    if (is.null(y)) {
      # Symmetric correlation matrix of the columns of x
      p <- ncol(x)
      result <- matrix(1, nrow = p, ncol = p)
      rownames(result) <- colnames(result) <- colnames(x)
      for (i in 1:p) {
        for (j in 1:p) {
          if (i != j) {
            result[i, j] <- result[j, i] <- recor_vector(x[, i], x[, j])
          }
        }
      }
      return(result)
    } else if (is.matrix(y) || is.data.frame(y)) {
      # Cross-correlation matrix between the columns of x and y
      y <- as.matrix(y)
      if (nrow(x) != nrow(y)) {
        stop("The number of rows of x and y must be the same")
      }
      p <- ncol(x)
      q <- ncol(y)
      result <- matrix(0, nrow = p, ncol = q)
      rownames(result) <- colnames(x)
      colnames(result) <- colnames(y)
      for (i in 1:p) {
        for (j in 1:q) {
          result[i, j] <- recor_vector(x[, i], y[, j])
        }
      }
      return(result)
    }
  }

  if (is.null(y)) {
    stop("y is needed when x is a vector")
  }
  if (length(x) != length(y)) {
    stop("x and y must have the same length")
  }
  if (length(x) < 2) {
    stop("x and y must have at least two elements")
  }
  recor_vector(x, y)
}
Note that the above R implementation is for illustrative purposes only; the actual recor package employs an optimized C++ backend for efficient computation.
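A quick way to convince yourself that the illustrative R function above matches the package (assuming the installed package exports recor(), as the examples at the top show) is to compare the two on simulated data:
# Sanity check: the pure-R reference defined above vs. the packaged,
# C++-backed version accessed explicitly via the recor:: namespace.
set.seed(2024)
x <- rnorm(30)
y <- exp(x) + rnorm(30, sd = 0.1)   # noisy monotone transform
all.equal(recor(x, y), recor::recor(x, y))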
Do we need a new monotone measure when rank-based measures such as Spearman’s \(\rho\) can already capture monotone dependence? The answer is yes, in the sense that \(r^\#\) has higher resolution and is more accurate. To take a simple example, let \(x = (4, 3, 2, 1)\) and consider the five vectors \(y_1 = (5, 4, 3, 2.00)\), \(y_2 = (5, 4, 3, 3.25)\), \(y_3 = (5, 4, 3, 3.50)\), \(y_4 = (5, 4, 3, 3.75)\), and \(y_5 = (5, 4, 3, 4.50)\).
Obviously, \(y_1\) and \(x\) behave in exactly the same way, with their values decreasing step by step. The behaviors of \(y_2\), \(y_3\), \(y_4\), and \(y_5\) depart more and more from that of \(x\). However, Spearman’s \(\rho\) assigns the same value to \(y_2\), \(y_3\), and \(y_4\). In contrast, the \(r^\#\) values reveal all of these differences.
x <- c(4, 3, 2, 1)
y_list <- list(y1 = c(5, 4, 3, 2.00),
               y2 = c(5, 4, 3, 3.25),
               y3 = c(5, 4, 3, 3.50),
               y4 = c(5, 4, 3, 3.75),
               y5 = c(5, 4, 3, 4.50))
# recor
lapply(y_list, recor, x)
#> $y1
#> [1] 1
#>
#> $y2
#> [1] 0.9259259
#>
#> $y3
#> [1] 0.8461538
#>
#> $y4
#> [1] 0.76
#>
#> $y5
#> [1] 0.3846154
# Spearman's rho
lapply(y_list, cor, x, method = "spearman")
#> $y1
#> [1] 1
#>
#> $y2
#> [1] 0.8
#>
#> $y3
#> [1] 0.8
#>
#> $y4
#> [1] 0.8
#>
#> $y5
#> [1] 0.4
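The reason Spearman’s \(\rho\) cannot separate \(y_2\), \(y_3\), and \(y_4\) is that it only sees ranks, and these three vectors have identical ranks; \(r^\#\) works on the raw values instead. A quick check in base R (not part of the package):
# y2, y3 and y4 all have ranks (4, 3, 1, 2), so any rank-based measure
# must assign them the same value against x.
sapply(y_list[c("y2", "y3", "y4")], rank)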
Ai, X. (2024). Adjust Pearson’s r to Measure Arbitrary Monotone Dependence. In Advances in Neural Information Processing Systems (Vol. 37, pp. 37385-37407).
This project is licensed under GPL-3.
If you use this package in your research, please cite our work as:
@inproceedings{NEURIPS2024_41c38a83,
author = {Ai, Xinbo},
booktitle = {Advances in Neural Information Processing Systems},
editor = {A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang},
pages = {37385--37407},
publisher = {Curran Associates, Inc.},
title = {Adjust Pearson\textquotesingle s r to Measure Arbitrary Monotone Dependence},
url = {https://proceedings.neurips.cc/paper_files/paper/2024/file/41c38a83bd97ba28505b4def82676ba5-Paper-Conference.pdf},
volume = {37},
year = {2024}
}