Frédéric Santos' notebook
Emacs and R tricks for anthropologists and archaeologists.
April 1, 2020

# Comparing correlation coefficients with R

In biological anthropology, a common problem is to compare correlation coefficients between two pairs of variables. Here is a motivating example, with a small artificial dataset.

Let’s say that you have measured, on 15 individuals, a given variable $$X$$ on the canine. You have also measured another variable at two different levels (more formally, at two percentages of biomechanical length: say 20% and 50%) along the diaphysis of the radius, which we will denote $$Y_{20}$$ and $$Y_{50}$$. The following dataframe contains the values you got for those variables:

dat <- structure(list(X = c(6.728, 7.766, 5.725, 8.512, 6.954, 6.183, 9.545,
                            8.543, 11.381, 16.02, 16.055, 14.772, 16.315,
                            14.593, 13.922),
                      Y20 = c(22.832, 20.928, 16.956, 30.856, 16.442, 15.066,
                              28.464, 24.815, 34.889, 42.15, 41.267, 49.669,
                              48.32, 37.916, 35.206),
                      Y50 = c(21.552, 16.731, 22.023, 19.256, 21.845, 32.57,
                              37.75, 22.717, 31.655, 59.547, 38.507, 45.775,
                              49.664, 45.003, 35.152)),
                 row.names = c(NA, -15L), class = "data.frame")
dat

Table 1: Artificial dataset generated in R.
X Y20 Y50
1 6.728 22.832 21.552
2 7.766 20.928 16.731
3 5.725 16.956 22.023
4 8.512 30.856 19.256
5 6.954 16.442 21.845
6 6.183 15.066 32.57
7 9.545 28.464 37.75
8 8.543 24.815 22.717
9 11.381 34.889 31.655
10 16.02 42.15 59.547
11 16.055 41.267 38.507
12 14.772 49.669 45.775
13 16.315 48.32 49.664
14 14.593 37.916 45.003
15 13.922 35.206 35.152

Here is the (rounded) correlation matrix between those three variables:

round(cor(dat), 3)

Table 2: Correlation matrix between the canine variable ($$X$$) and the radius variables $$(Y_{20}, Y_{50})$$.
X Y20 Y50
X 1 0.942 0.846
Y20 0.942 1 0.782
Y50 0.846 0.782 1

A natural question arises: the correlation matrix suggests that $$X$$ is more strongly correlated with $$Y_{20}$$ than with $$Y_{50}$$. But given the very limited sample size, how confident can we be in drawing a general conclusion about the covariation pattern between the canine and the radius diaphysis? Formally, we would like to test the null hypothesis $$\mathcal{H}_0: \rho_{X, Y_{20}} = \rho_{X, Y_{50}}$$ against the alternative hypothesis $$\mathcal{H}_1: \rho_{X, Y_{20}} \neq \rho_{X, Y_{50}}$$. Although common, this problem is rarely covered in introductory statistics textbooks. In this blog post, I’ll try to present a few approaches (among many others!) that may be useful to compare correlation coefficients in various contexts.
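Note, before going further, that testing each coefficient separately against zero (e.g., with cor.test) does not answer this question: both correlations can be individually "significant" and have overlapping confidence intervals, and this still tells us nothing reliable about whether they differ from each other. As a quick first look (rebuilding the small dataframe above):

```r
## Rebuild the artificial dataset shown above:
dat <- data.frame(
  X   = c(6.728, 7.766, 5.725, 8.512, 6.954, 6.183, 9.545, 8.543,
          11.381, 16.02, 16.055, 14.772, 16.315, 14.593, 13.922),
  Y20 = c(22.832, 20.928, 16.956, 30.856, 16.442, 15.066, 28.464,
          24.815, 34.889, 42.15, 41.267, 49.669, 48.32, 37.916, 35.206),
  Y50 = c(21.552, 16.731, 22.023, 19.256, 21.845, 32.57, 37.75,
          22.717, 31.655, 59.547, 38.507, 45.775, 49.664, 45.003, 35.152))
## Individual 95% confidence intervals for each coefficient:
cor.test(dat$X, dat$Y20)$conf.int
cor.test(dat$X, dat$Y50)$conf.int
```

Both individual intervals exclude zero, but what we actually need is a test for the *difference* between the two coefficients, which is the topic of the rest of this post.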

## 1 Definitions

Let’s start with a bit of terminology. When we want to compare two correlation coefficients, we are necessarily in one of those three cases [1]:

1. Dependent and overlapping correlations: we want to compare $$\rho_{X, Y}$$ and $$\rho_{X, Z}$$ (i.e., one variable $$X$$ is present in both coefficients), and the data have been collected on the same individuals for $$X$$, $$Y$$ and $$Z$$.
2. Dependent and non-overlapping correlations: we want to compare $$\rho_{X, Y}$$ and $$\rho_{W, Z}$$ (i.e., no variable is repeated), and the data have been collected on the same individuals for $$X$$, $$Y$$, $$W$$ and $$Z$$.
3. Independent correlations: we want to compare the correlation between $$X$$ and $$Y$$ in two different samples or groups.

For each of these three cases, different methods apply.

## 2 Comparing dependent, overlapping correlations

This first case corresponds to our motivating example: comparing $$\rho_{X, Y_{20}}$$ and $$\rho_{X, Y_{50}}$$, given that all the data have been collected on the same individuals and under the same experimental conditions.

### 2.1 A resampling approach

An empirical bootstrap approach may be useful to understand the underlying concepts. This approach proceeds through the following steps:

1. Resample $$n$$ individuals with replacement ($$n$$ being the original sample size)
2. Compute the two correlation coefficients $$\rho^*_{X, Y_{20}}$$ and $$\rho^*_{X, Y_{50}}$$ using the bootstrap samples
3. Compute their difference $$d^* = \rho^*_{X, Y_{20}} - \rho^*_{X, Y_{50}}$$
4. Repeat steps 1 to 3 $$B$$ times ($$B = 500$$ or $$1000$$ is generally a reasonable choice, depending on the original sample size $$n$$)
5. Use the empirical distribution of the $$B$$ values of $$d^*$$ to compute a 95% confidence interval for the difference between the two correlation coefficients.

This is a standard bootstrap procedure, recalled in many textbooks [2,8]. Here is an example of an R implementation (I don’t use the R package boot here, so as to keep the algorithm readable and transparent):

## Set some constants:
set.seed(2020)
B <- 999 # number of bootstrap replicates
n <- nrow(dat) # total sample size
## Initialize an empty vector for bootstrap statistics:
dstar <- rep(NA, B)

## Run the bootstrap procedure:
for (b in 1:B) {
  indices <- sample(x = 1:n, size = n, replace = TRUE)
  bootsample <- dat[indices, ]
  rho1 <- cor(bootsample$X, bootsample$Y20)
  rho2 <- cor(bootsample$X, bootsample$Y50)
  dstar[b] <- rho1 - rho2
}

## Plot histogram and display confidence interval:
hist(dstar, main = "Difference in correlation coefficients",
     xlab = "d*")
ci <- quantile(dstar, probs = c(0.025, 0.975)) # empirical 95% CI bounds
abline(v = ci, lty = 2, col = "blue")


As shown on figure 1, the 95% empirical confidence interval includes the null value. This bootstrap procedure does not provide sufficient evidence to reject the null hypothesis here, and we cannot conclude (or generalize) that $$Y_{20}$$ is really a better candidate than $$Y_{50}$$ to be correlated with the canine values $$X$$.

### 2.2 Using the R package cocor

There are several analytical methods that provide an alternative to the resampling approach developed in the previous section. The R package cocor (whose usage is extensively and very clearly documented in [1]) implements many of them. Furthermore, it is really easy to use, either through the command-line interface or through a dedicated website.

The main function of the package, also named cocor, can handle R formulas, as shown below. Here I ask for the results returned by two modern methods: Zou’s method [9] and Hittner’s method [4].

library(cocor)

cocor(~X + Y20 | X + Y50, data = dat,
      test = c("hittner2003", "zou2007"))



Results of a comparison of two overlapping correlations based on dependent groups

Comparison between r.jk (X, Y20) = 0.9424 and r.jh (X, Y50) = 0.8463
Difference: r.jk - r.jh = 0.0961
Related correlation: r.kh = 0.782
Data: dat: j = X, k = Y20, h = Y50
Group size: n = 15
Null hypothesis: r.jk is equal to r.jh
Alternative hypothesis: r.jk is not equal to r.jh (two-sided)
Alpha: 0.05

hittner2003: Hittner, May, and Silver's (2003) modification of Dunn and Clark's z (1969) using a backtransformed average Fisher's (1921) Z procedure
z = 1.6186, p-value = 0.1055
Null hypothesis retained

zou2007: Zou's (2007) confidence interval
95% confidence interval for r.jk - r.jh: -0.0217 0.3404
Null hypothesis retained (Interval includes 0)


Neither of these methods allows us to reject the null hypothesis at the specified alpha level (0.05), which is consistent with the bootstrap simulations.

### 2.3 But hey, do the methods necessarily give consistent results?

Well, this question has received quite enlightening answers in a thread on ResearchGate, and I could probably not give a better one!
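You can also inspect this agreement (or disagreement) for yourself: if I read the package documentation correctly, leaving the test argument of cocor at its default value ("all") prints the results of every method applicable to the design at hand. A sketch with our dataset, assuming the package is installed:

```r
library(cocor)
## Rebuild the artificial dataset shown above:
dat <- data.frame(
  X   = c(6.728, 7.766, 5.725, 8.512, 6.954, 6.183, 9.545, 8.543,
          11.381, 16.02, 16.055, 14.772, 16.315, 14.593, 13.922),
  Y20 = c(22.832, 20.928, 16.956, 30.856, 16.442, 15.066, 28.464,
          24.815, 34.889, 42.15, 41.267, 49.669, 48.32, 37.916, 35.206),
  Y50 = c(21.552, 16.731, 22.023, 19.256, 21.845, 32.57, 37.75,
          22.717, 31.655, 59.547, 38.507, 45.775, 49.664, 45.003, 35.152))
## With the default test = "all", cocor reports every method implemented
## for overlapping dependent correlations, so their (in)consistency can
## be checked directly:
cocor(~X + Y20 | X + Y50, data = dat)
```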

## 3 Comparing dependent, non-overlapping correlations

In this scenario, the researcher wants to know whether the correlation between a given pair of variables $$(X, Y)$$ is higher than the correlation between another pair of variables $$(W, Z)$$, all of them being measured on the same individuals.

Let’s say that, in the classical iris dataset included in R, we want to know whether the correlation between Sepal.Length and Sepal.Width is higher than the correlation between Petal.Length and Petal.Width for the species setosa, i.e., whether petal or sepal dimensions are more strongly correlated. Here are the first rows of this famous dataset:

data(iris)
setosa <- subset(iris, Species == "setosa")
head(setosa)

Table 3: An overview of Fisher’s iris dataset for the setosa species.
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa

The correlation for the dimensions of the sepals is:

with(setosa, cor(Sepal.Length, Sepal.Width))

[1] 0.7425467


and for the petals:

with(setosa, cor(Petal.Length, Petal.Width))

[1] 0.33163


In this case, the result seems to be clear enough, but let’s illustrate formal procedures for this scenario.

### 3.1 A resampling approach

A straightforward modification of the previous case applies: the same bootstrap procedure remains effective here. Once again, we will compute a 95% confidence interval for the difference between the two correlation coefficients under study. But this time, let’s use the R package boot for greater efficiency.

library(boot)
set.seed(2020)
## First define a function to work out the difference of coefs:
diff_corr <- function(data, indices) {
  data <- data[indices, ]
  cor1 <- cor(data$Sepal.Length, data$Sepal.Width)
  cor2 <- cor(data$Petal.Length, data$Petal.Width)
  return(cor1 - cor2)
}
## Then apply a bootstrap procedure with 999 draws:
res_boot <- boot(data = setosa,
                 R = 999,
                 statistic = diff_corr,
                 stype = "i")
## Retrieve the empirical 95% confidence interval:
boot.ci(res_boot, type = "perc", conf = 0.95)


BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 999 bootstrap replicates

CALL :
boot.ci(boot.out = res_boot, conf = 0.95, type = "perc")

Intervals :
Level     Percentile
95%   ( 0.1877,  0.6805 )
Calculations and Intervals on Original Scale


This time, the null value is not a plausible value for the difference between the two correlation coefficients, which brings strong evidence against the null hypothesis of no difference.
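As a side note, the percentile interval requested above is the simplest option, but boot.ci also implements other interval types, notably the bias-corrected and accelerated (BCa) interval, which adjusts for bias and skewness in the bootstrap distribution of $$d^*$$. A self-contained sketch reusing the same statistic:

```r
library(boot)
set.seed(2020)
setosa <- subset(iris, Species == "setosa")
## Same bootstrap statistic as above: difference of the two correlations
diff_corr <- function(data, indices) {
  data <- data[indices, ]
  cor(data$Sepal.Length, data$Sepal.Width) -
    cor(data$Petal.Length, data$Petal.Width)
}
res_boot <- boot(data = setosa, R = 999,
                 statistic = diff_corr, stype = "i")
## Ask for the BCa interval instead of the raw percentile interval:
boot.ci(res_boot, type = "bca", conf = 0.95)
```

With a reasonably symmetric bootstrap distribution, as here, both interval types should lead to the same conclusion.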

### 3.2 Using the R package cocor

The formula interface remains very easy to use in this case. Let’s try with the methods zou2007 [9] and silver2004 [5]:

cocor(~Sepal.Length + Sepal.Width | Petal.Length + Petal.Width,
      data = setosa, test = c("silver2004", "zou2007"))



Results of a comparison of two nonoverlapping correlations based on dependent groups

Comparison between r.jk (Sepal.Length, Sepal.Width) = 0.7425 and r.hm (Petal.Length, Petal.Width) = 0.3316
Difference: r.jk - r.hm = 0.4109
Related correlations: r.jh = 0.2672, r.jm = 0.2781, r.kh = 0.1777, r.km = 0.2328
Data: setosa: j = Sepal.Length, k = Sepal.Width, h = Petal.Length, m = Petal.Width
Group size: n = 50
Null hypothesis: r.jk is equal to r.hm
Alternative hypothesis: r.jk is not equal to r.hm (two-sided)
Alpha: 0.05

silver2004: Silver, Hittner, and May's (2004) modification of Dunn and Clark's z (1969) using a backtransformed average Fisher's (1921) Z procedure
z = 3.0306, p-value = 0.0024
Null hypothesis rejected

zou2007: Zou's (2007) confidence interval
95% confidence interval for r.jk - r.hm: 0.1405 0.6986
Null hypothesis rejected (Interval does not include 0)


The results from those two methods are consistent with the conclusion (and confidence interval) derived from our home-made bootstrap procedure, and bring strong evidence against the null hypothesis of no difference.

## 4 Comparing independent correlations

This time the problem is quite different: the researcher wants to know whether a given pair of variables $$(X, Y)$$ is more strongly correlated in a group A than in a group B. With the previous iris dataset, this could correspond to the following question: is the correlation between petal length and petal width different within the species setosa and versicolor? Let’s illustrate this situation by also extracting the 50 flowers belonging to the versicolor species from the original dataset:

versicolor <- subset(iris, Species == "versicolor")


We already know that the correlation between those two variables for the species setosa is approximately equal to 0.33. Let’s compute this correlation within the species versicolor:

with(versicolor, cor(Petal.Length, Petal.Width))

[1] 0.7866681


The two correlation coefficients seem to differ strongly, and we can expect that this observed difference (with a reasonable sample size of 50 individuals per group) reflects a true difference between those two species. Let’s verify it with cocor.

The most classical method in this case is Fisher’s z transformation [3], also implemented in the R package cocor. Still with the same formula interface:

cocor(~Petal.Length + Petal.Width | Petal.Length + Petal.Width,
      data = list(setosa, versicolor))



Results of a comparison of two correlations based on independent groups

Comparison between r1.jk (Petal.Length, Petal.Width) = 0.3316 and r2.hm (Petal.Length, Petal.Width) = 0.7867
Difference: r1.jk - r2.hm = -0.455
Data: list(setosa, versicolor): j = Petal.Length, k = Petal.Width, h = Petal.Length, m = Petal.Width
Group sizes: n1 = 50, n2 = 50
Null hypothesis: r1.jk is equal to r2.hm
Alternative hypothesis: r1.jk is not equal to r2.hm (two-sided)
Alpha: 0.05

fisher1925: Fisher's z (1925)
z = -3.4805, p-value = 0.0005
Null hypothesis rejected

zou2007: Zou's (2007) confidence interval
95% confidence interval for r1.jk - r2.hm: -0.7415 -0.1907
Null hypothesis rejected (Interval does not include 0)


Both methods bring strong evidence against the null hypothesis of no difference.
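For the record, the fisher1925 statistic above is easy to reproduce by hand: each coefficient is transformed with $$\operatorname{atanh}$$ (Fisher’s variance-stabilizing z transformation), the difference is standardized by its approximate standard error $$\sqrt{1/(n_1 - 3) + 1/(n_2 - 3)}$$, and the result is compared to a standard normal distribution. A minimal sketch using only base R:

```r
## Reproduce Fisher's z test for two independent correlations by hand:
setosa     <- subset(iris, Species == "setosa")
versicolor <- subset(iris, Species == "versicolor")
r1 <- with(setosa,     cor(Petal.Length, Petal.Width))
r2 <- with(versicolor, cor(Petal.Length, Petal.Width))
n1 <- nrow(setosa)
n2 <- nrow(versicolor)
## atanh() is exactly Fisher's z transformation of a correlation:
z <- (atanh(r1) - atanh(r2)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
p <- 2 * pnorm(-abs(z))
round(c(z = z, p = p), 4)
```

Up to rounding, this matches the z statistic and p-value printed by cocor above.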

## 5 Conclusive remarks

Although it was not done here, one should always begin with scatterplots, so as to detect potential outliers or non-linear relationships. In the first case (outliers), robust alternatives do exist [6,7]. In both cases, using Spearman correlation coefficients instead of the classical Pearson coefficients may be preferable.
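As a quick illustration of this advice with the setosa data used above, both steps are one-liners in base R:

```r
setosa <- subset(iris, Species == "setosa")
## Visual check first: a scatterplot matrix makes outliers and
## non-linear relationships immediately visible
pairs(setosa[, 1:4], main = "iris dataset, setosa only")
## Spearman's coefficient is rank-based, hence robust to outliers and
## to monotone non-linear relationships:
round(cor(setosa[, 1:4], method = "spearman"), 3)
```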

## References

[1] Birk Diedenhofen and Jochen Musch. cocor: A Comprehensive Solution for the Statistical Comparison of Correlations. PLOS ONE, 10(4):e0121945, 2015.
[2] Bradley Efron and Robert Tibshirani. An Introduction to the Bootstrap. Number 57 in Monographs on Statistics and Applied Probability. Chapman & Hall, New York, 1993.
[3] R. A. Fisher. Statistical Methods for Research Workers. Oliver and Boyd, Edinburgh, Scotland, 1925.
[4] James B. Hittner, Kim May, and N. Clayton Silver. A Monte Carlo Evaluation of Tests for Comparing Dependent Correlations. The Journal of General Psychology, 130(2):149--168, 2003.
[5] N. Clayton Silver, James B. Hittner, and Kim May. Testing Dependent Correlations With Nonoverlapping Variables: A Monte Carlo Simulation. The Journal of Experimental Education, 73(1):53--69, 2004.
[6] Rand R. Wilcox. Comparing Pearson Correlations: Dealing with Heteroscedasticity and Nonnormality. Communications in Statistics - Simulation and Computation, 38(10):2220--2234, 2009.
[7] Rand R. Wilcox. Comparing Dependent Robust Correlations. British Journal of Mathematical and Statistical Psychology, 69(3):215--224, 2016.
[8] Andrew Zieffler, Jeffrey Harring, and Jeffrey D. Long. Comparing Groups: Randomization and Bootstrap Methods Using R. Wiley, Hoboken, N.J., 2011.
[9] Guang Yong Zou. Toward Using Confidence Intervals to Compare Correlations. Psychological Methods, 12(4):399--413, 2007.