# Correlations

#### 2023-09-14

TOSTER includes several functions for calculating and testing correlations. They are based on Goertzen and Cribbie (2010) (z_cor_test and compare_cor) and Wilcox (2011) (boot_cor_test; see footnote 1).

# Simple Correlation Test

Simple tests of association can be performed with the z_cor_test function. It is modeled after the cor.test function, but the results may differ because z_cor_test uses Fisher’s z transformation as the basis for all significance tests (i.e., p-values), whereas cor.test uses a t-test for the Pearson correlation. The confidence intervals, however, are identical, since cor.test also builds its interval for the Pearson correlation from Fisher’s z.

library(TOSTER)
cor.test(mtcars$mpg, mtcars$qsec)
##
##  Pearson's product-moment correlation
##
## data:  mtcars$mpg and mtcars$qsec
## t = 2.5252, df = 30, p-value = 0.01708
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.08195487 0.66961864
## sample estimates:
##      cor
## 0.418684
z_cor_test(mtcars$mpg, mtcars$qsec)
##
##  Pearson's product-moment correlation
##
## data:  mtcars$mpg and mtcars$qsec
## z = 2.4023, N = 32, p-value = 0.01629
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.08195487 0.66961864
## sample estimates:
##      cor
## 0.418684
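
To see where the difference comes from, the calculation can be reproduced by hand. The following is a minimal base R sketch, assuming the standard Fisher z approach with standard error 1/sqrt(N - 3). It is not TOSTER’s source code, and the variable names are purely illustrative.

```r
# Fisher z test of a Pearson correlation, computed by hand (illustrative sketch)
x <- mtcars$mpg
y <- mtcars$qsec
n <- length(x)
r <- cor(x, y)

z_r  <- atanh(r)          # Fisher's z transformation of r
se_z <- 1 / sqrt(n - 3)   # approximate standard error on the z scale

z_stat <- z_r / se_z                # test statistic for H0: rho = 0
p_val  <- 2 * pnorm(-abs(z_stat))   # two-sided p-value

# 95% confidence interval, back-transformed to the correlation scale
ci <- tanh(z_r + c(-1, 1) * qnorm(0.975) * se_z)

round(c(z = z_stat, p = p_val, lower = ci[1], upper = ci[2]), 4)
```

The z statistic and p-value should agree with the z_cor_test output above, and the interval with the output of both functions.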

As with cor.test, the Spearman and Kendall correlation coefficients can also be estimated.

z_cor_test(mtcars$mpg, mtcars$qsec,
           method = "spear") # Don't need to spell full name
##
##  Spearman's rank correlation rho
##
## data:  mtcars$mpg and mtcars$qsec
## z = 2.6474, N = 32, p-value = 0.008111
## alternative hypothesis: true rho is not equal to 0
## 95 percent confidence interval:
##  0.1306771 0.7068501
## sample estimates:
##       rho
## 0.4669358
z_cor_test(mtcars$mpg, mtcars$qsec,
           method = "kendall")
##
##  Kendall's rank correlation tau
##
## data:  mtcars$mpg and mtcars$qsec
## z = 2.6134, N = 32, p-value = 0.008964
## alternative hypothesis: true tau is not equal to 0
## 95 percent confidence interval:
##  0.08145572 0.51634821
## sample estimates:
##       tau
## 0.3153652
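
The z statistics printed for the rank-based coefficients are consistent with applying Fisher’s z transformation with a coefficient-specific standard error: sqrt(1.06 / (N - 3)) for Spearman’s rho and sqrt(0.437 / (N - 4)) for Kendall’s tau. The sketch below assumes those approximations; this is inferred from the output above rather than taken from TOSTER’s documentation, so treat it as illustrative only.

```r
# Fisher z tests for rank correlations (illustrative sketch; the standard
# errors below are an assumption inferred from the printed output)
x <- mtcars$mpg
y <- mtcars$qsec
n <- length(x)

rho <- cor(x, y, method = "spearman")
tau <- cor(x, y, method = "kendall")

# Assumed standard errors on the Fisher z scale
se_rho <- sqrt(1.06 / (n - 3))
se_tau <- sqrt(0.437 / (n - 4))

z_rho <- atanh(rho) / se_rho
z_tau <- atanh(tau) / se_tau

# Two-sided p-values against a null of zero
p_rho <- 2 * pnorm(-abs(z_rho))
p_tau <- 2 * pnorm(-abs(z_tau))

round(c(z_rho = z_rho, p_rho = p_rho, z_tau = z_tau, p_tau = p_tau), 4)
```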

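The main advantage of z_cor_test is that it can perform equivalence testing (TOST), or any hypothesis test where the null isn’t zero. In the example below, a single null value of .4 is expanded to symmetric equivalence bounds of 0.4 and -0.4. Because the p-value is large (0.548), equivalence cannot be concluded.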

z_cor_test(mtcars$mpg, mtcars$qsec,
           alternative = "e", # e for equivalence
           null = .4)
##
##  Pearson's product-moment correlation
##
## data:  mtcars$mpg and mtcars$qsec
## z = 0.12088, N = 32, p-value = 0.5481
## alternative hypothesis: equivalence
## null values:
## correlation correlation
##         0.4        -0.4
## 90 percent confidence interval:
##  0.1397334 0.6360650
## sample estimates:
##      cor
## 0.418684
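
The equivalence result above can be sketched as two one-sided tests (TOST) on the Fisher z scale. This is a minimal base R illustration, assuming the bounds are z-transformed and that the reported p-value is the larger of the two one-sided p-values; the variable names are hypothetical and this is not TOSTER’s implementation.

```r
# TOST for a correlation on the Fisher z scale (illustrative sketch)
x <- mtcars$mpg
y <- mtcars$qsec
n <- length(x)
r <- cor(x, y)

bound <- 0.4               # symmetric equivalence bounds: -0.4 and 0.4
se_z  <- 1 / sqrt(n - 3)

# Test 1: H0: rho >= bound  vs  H1: rho < bound
z_upper <- (atanh(r) - atanh(bound)) / se_z
p_upper <- pnorm(z_upper)

# Test 2: H0: rho <= -bound  vs  H1: rho > -bound
z_lower <- (atanh(r) - atanh(-bound)) / se_z
p_lower <- pnorm(z_lower, lower.tail = FALSE)

# Equivalence is claimed only if BOTH tests reject, so report the larger p
p_tost <- max(p_upper, p_lower)
round(c(z_upper = z_upper, p_tost = p_tost), 4)
```

Here the observed correlation sits just above the upper bound, so the test against that bound cannot reject.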

## Summary Statistics

If you only have summary statistics, you can still perform the same tests with the corsum_test function. Suppose you are reviewing a study that reports a correlation of 0.121 from a sample of 105 paired observations. You could then perform an equivalence test with the following code.

corsum_test(r = .121,
            n = 105,
            alternative = "e",
            null = .4)
##
##  Pearson's product-moment correlation
##
## data:  x and y
## z = -3.0506, N = 105, p-value = 0.001142
## alternative hypothesis: equivalence
## null values:
## correlation correlation
##         0.4        -0.4
## 90 percent confidence interval:
##  -0.0412456  0.2770284
## sample estimates:
##   cor
## 0.121
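
The 90 percent interval printed above is what links the confidence interval to the equivalence test: with a 5% alpha level, equivalence is concluded when the 90% interval lies entirely inside the bounds. A minimal base R sketch of that interval from summary statistics, assuming the usual Fisher z standard error (variable names are illustrative):

```r
# 90% Fisher z confidence interval from summary statistics (illustrative sketch)
r     <- 0.121
n     <- 105
bound <- 0.4
alpha <- 0.05

se_z <- 1 / sqrt(n - 3)

# A 1 - 2 * alpha interval matches two one-sided tests at level alpha
ci90 <- tanh(atanh(r) + c(-1, 1) * qnorm(1 - alpha) * se_z)

# TRUE when the whole interval falls within (-bound, bound)
inside <- ci90[1] > -bound && ci90[2] < bound

round(ci90, 4)
inside
```

In this example the interval falls inside the bounds, which agrees with the small equivalence p-value reported by corsum_test.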

# Bootstrapped Correlation Test

If the raw data are available, I would strongly recommend the bootstrapping function, boot_cor_test, which should be more robust than the functions based on Fisher’s z. It can also estimate two additional correlations: a Winsorized correlation and the percentage bend correlation. Its inputs are similar to those of the z_cor_test function.

set.seed(993)
boot_cor_test(mtcars$mpg, mtcars$qsec,
              alternative = "e",
              null = .4)
##
##  Bootstrapped Pearson's product-moment correlation
##
## data:  mtcars$mpg and mtcars$qsec
## N = 32, p-value = 0.6088
## alternative hypothesis: equivalence
## null values:
## correlation correlation
##         0.4        -0.4
## 90 percent confidence interval:
##  0.2445273 0.5848411
## sample estimates:
##      cor
## 0.418684
boot_cor_test(mtcars$mpg, mtcars$qsec,
              method = "spear",
              alternative = "e",
              null = .4)
##
##  Bootstrapped Spearman's rank correlation rho
##
## data:  mtcars$mpg and mtcars$qsec
## N = 32, p-value = 0.6713
## alternative hypothesis: equivalence
## null values:
##  rho  rho
##  0.4 -0.4
## 90 percent confidence interval:
##  0.1983190 0.6656253
## sample estimates:
##       rho
## 0.4669358
boot_cor_test(mtcars$mpg, mtcars$qsec,
              method = "ken",
              alternative = "e",
              null = .4)
##
##  Bootstrapped Kendall's rank correlation tau
##
## data:  mtcars$mpg and mtcars$qsec
## N = 32, p-value = 0.2276
## alternative hypothesis: equivalence
## null values:
##  tau  tau
##  0.4 -0.4
## 90 percent confidence interval:
##  0.1217169 0.4864510
## sample estimates:
##       tau
## 0.3153652
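
The general idea behind the bootstrapped tests can be sketched with a plain percentile bootstrap in base R. This is a simplified illustration only: boot_cor_test may use a different resampling scheme or interval adjustment, so the numbers are not expected to match the output above. The number of resamples (n_boot) and all variable names are assumptions made for the example.

```r
# Percentile bootstrap for a Pearson correlation (illustrative sketch)
set.seed(993)
x <- mtcars$mpg
y <- mtcars$qsec
n <- length(x)
n_boot <- 2000   # assumed number of resamples

# Resample (x, y) pairs with replacement and recompute the correlation
boot_r <- replicate(n_boot, {
  idx <- sample.int(n, replace = TRUE)
  cor(x[idx], y[idx])
})

# 90% percentile interval
ci90 <- quantile(boot_r, c(0.05, 0.95))

# One-sided bootstrap p-values against each equivalence bound
bound   <- 0.4
p_upper <- mean(boot_r >= bound)    # evidence against rho < bound
p_lower <- mean(boot_r <= -bound)   # evidence against rho > -bound
p_tost  <- max(p_upper, p_lower)

ci90
p_tost
```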

Robust correlations, such as the Winsorized correlation or the percentage bend correlation, can also be tested. The tr argument sets the amount of Winsorizing and the beta argument sets the bending constant.

boot_cor_test(mtcars$mpg, mtcars$qsec,
              method = "win",
              alternative = "e",
              null = .4,
              tr = .1) # set trim
##
##  Bootstrapped Winsorized correlation wincor
##
## data:  mtcars$mpg and mtcars$qsec
## N = 32, p-value = 0.6878
## alternative hypothesis: equivalence
## null values:
## wincor wincor
##    0.4   -0.4
## 90 percent confidence interval:
##  0.2163284 0.6629980
## sample estimates:
##   wincor
## 0.464062
boot_cor_test(mtcars$mpg, mtcars$qsec,
              method = "bend",
              alternative = "e",
              null = .4,
              beta = .15) # bend argument
##
##  Bootstrapped percentage bend correlation pb
##
## data:  mtcars$mpg and mtcars$qsec
## N = 32, p-value = 0.6933
## alternative hypothesis: equivalence
## null values:
##   pb   pb
##  0.4 -0.4
## 90 percent confidence interval:
##  0.2341348 0.6455867
## sample estimates:
##        pb
## 0.4484488
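
For readers unfamiliar with the Winsorized correlation, the point estimate can be sketched in a few lines of base R. The sketch assumes the common definition in which the g = floor(tr * n) smallest and largest values of each variable are pulled in to the nearest retained value before computing Pearson’s r. The helper names winsorize and win_cor are hypothetical, not TOSTER functions.

```r
# Winsorized correlation (illustrative sketch with hypothetical helpers)
winsorize <- function(v, tr = 0.1) {
  n <- length(v)
  g <- floor(tr * n)        # number of values to pull in at each tail
  sorted <- sort(v)
  lo <- sorted[g + 1]       # smallest retained value
  hi <- sorted[n - g]       # largest retained value
  pmin(pmax(v, lo), hi)     # clamp values to [lo, hi]
}

win_cor <- function(x, y, tr = 0.1) {
  # Winsorize each variable separately, then compute Pearson's r
  cor(winsorize(x, tr), winsorize(y, tr))
}

win_cor(mtcars$mpg, mtcars$qsec, tr = 0.1)
```

With tr = 0 nothing is Winsorized and the result reduces to the ordinary Pearson correlation.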

# Compare Correlations

In some cases, researchers may want to compare two independent correlations. For example, they may compare the correlation between the same two variables in two groups (e.g., male and female subjects) or in two independent studies (e.g., an original study and its replication).

When only summary statistics are available, the compare_cor function can be used. All the user needs are the two correlations (r1 and r2) and the degrees of freedom for each correlation. In most cases, the degrees of freedom would be the number of pairs minus 2 ($df = N - 2$). Note: this function, like z_cor_test, is an approximation.

compare_cor(r1 = .8,
            df1 = 38,
            r2 = .2,
            df2 = 98)
##
##  Difference between two independent correlations (Fisher's z transform)
##
## data:  Summary Statistics
## z = 4.6364, p-value = 3.545e-06
## alternative hypothesis: true difference between correlations is not equal to 0
## sample estimates:
## difference between correlations
##                             0.6
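
The Fisher z comparison can be reproduced by hand. The following base R sketch assumes the textbook test for two independent correlations, with each sample size recovered as N = df + 2; it is meant as an illustration rather than a copy of TOSTER’s code.

```r
# Fisher z test for the difference between two independent correlations
# (illustrative sketch)
r1 <- 0.8; df1 <- 38
r2 <- 0.2; df2 <- 98

# Recover the sample sizes from the degrees of freedom (df = N - 2)
n1 <- df1 + 2
n2 <- df2 + 2

z_diff  <- atanh(r1) - atanh(r2)                  # difference on the z scale
se_diff <- sqrt(1 / (n1 - 3) + 1 / (n2 - 3))      # standard error of the difference

z_stat <- z_diff / se_diff
p_val  <- 2 * pnorm(-abs(z_stat))                 # two-sided p-value

c(z = z_stat, p = p_val)
```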

The available methods for comparing correlations are Fisher’s z transformation (“fisher”) and Kraatz’s method (“kraatz”). Both are appropriate for general significance tests, but may have low statistical power (Counsell and Cribbie 2015). Fisher’s method tests the difference between correlations on the z-transformed scale, while Kraatz’s method directly tests the difference between the correlation coefficients. My personal recommendation is Fisher’s method.

compare_cor(r1 = .8,
            df1 = 38,
            r2 = .2,
            df2 = 98,
            null = .2,
            method = "f", # Fisher
            alternative = "e") # Equivalence
##
##  Difference between two independent correlations (Fisher's z transform)
##
## data:  Summary Statistics
## z = 0.69315, p-value = 0.9998
## alternative hypothesis: equivalence
## null values:
## difference between correlations difference between correlations
##                             0.2                            -0.2
## sample estimates:
## difference between correlations
##                             0.6
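
The equivalence p-value above can be approximated with two one-sided tests on the z scale. This sketch assumes that the equivalence bound is also Fisher z-transformed, an assumption that is consistent with the printed p-value but is not confirmed by the text here; the variable names are illustrative.

```r
# TOST for the difference between two independent correlations on the
# Fisher z scale (illustrative sketch; bound transformation is an assumption)
r1 <- 0.8; n1 <- 40
r2 <- 0.2; n2 <- 100
bound <- 0.2

z_diff  <- atanh(r1) - atanh(r2)
se_diff <- sqrt(1 / (n1 - 3) + 1 / (n2 - 3))

# Test 1: H0: difference >= bound  vs  H1: difference < bound
p_upper <- pnorm((z_diff - atanh(bound)) / se_diff)

# Test 2: H0: difference <= -bound  vs  H1: difference > -bound
p_lower <- pnorm((z_diff + atanh(bound)) / se_diff, lower.tail = FALSE)

# Equivalence requires both tests to reject, so report the larger p-value
p_tost <- max(p_upper, p_lower)
round(p_tost, 4)
```

With an observed difference of 0.6 and bounds of 0.2 and -0.2, the test against the upper bound has no chance of rejecting, hence the p-value close to 1.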

## Bootstrapped

When the raw data are available for both correlations, the boot_compare_cor function can be used.

set.seed(8922)
x1 = rnorm(40)
y1 = rnorm(40)

x2 = rnorm(100)
y2 = rnorm(100)

boot_compare_cor(
  x1 = x1,
  x2 = x2,
  y1 = y1,
  y2 = y2,
  null = .2,
  alternative = "e", # Equivalence
  method = "win" # Winsorized correlation
)
##
##  Bootstrapped difference in Winsorized correlation wincor
##
## data:  x1 and y1 vs. x2 and y2
## n1 = 40, n2 = 100, p-value = 0.7739
## alternative hypothesis: true differnce in wincor is  0.2
## 90 percent confidence interval:
##  -0.2970547  0.3978333
## sample estimates:
##     wincor
## 0.06383164
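
The logic of bootstrapping a difference between two independent correlations can be sketched in base R by resampling each group separately. For simplicity this illustration uses the Pearson correlation rather than the Winsorized correlation and a plain percentile interval, so its results are not expected to match the output above. The helper boot_one and the number of resamples are assumptions made for the example.

```r
# Percentile bootstrap for the difference between two independent
# Pearson correlations (illustrative sketch with a hypothetical helper)
set.seed(8922)
x1 <- rnorm(40);  y1 <- rnorm(40)
x2 <- rnorm(100); y2 <- rnorm(100)

# Resample pairs within one group and return the correlation
boot_one <- function(x, y) {
  idx <- sample.int(length(x), replace = TRUE)
  cor(x[idx], y[idx])
}

n_boot <- 2000   # assumed number of resamples
boot_diff <- replicate(n_boot, boot_one(x1, y1) - boot_one(x2, y2))

# 90% percentile interval for the difference
ci90 <- quantile(boot_diff, c(0.05, 0.95))

# One-sided bootstrap p-values against each equivalence bound
bound  <- 0.2
p_tost <- max(mean(boot_diff >= bound), mean(boot_diff <= -bound))

ci90
p_tost
```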

1. The bootstrapped functions are based on code posted by Rand Wilcox on his website and were modified after looking at Guillaume Rousselet’s code for the bootcorci R package on GitHub: https://github.com/GRousselet