Comparing correlation coefficients with R
In biological anthropology, a common problem is to compare correlation coefficients between two pairs of variables. Here is a motivating example, with a small artificial dataset.
Let’s say that you have measured, on 15 individuals, a given variable \(X\) on the canine. You have also measured another variable at two different levels (or, more formally, percentages of biomechanical length: say 20% and 50%) along the diaphysis of the radius, which we will denote \(Y_{20}\) and \(Y_{50}\). The following dataframe contains the values you got for those variables:
dat <- structure(list(
  X = c(6.728, 7.766, 5.725, 8.512, 6.954, 6.183, 9.545, 8.543, 11.381,
        16.02, 16.055, 14.772, 16.315, 14.593, 13.922),
  Y20 = c(22.832, 20.928, 16.956, 30.856, 16.442, 15.066, 28.464, 24.815,
          34.889, 42.15, 41.267, 49.669, 48.32, 37.916, 35.206),
  Y50 = c(21.552, 16.731, 22.023, 19.256, 21.845, 32.57, 37.75, 22.717,
          31.655, 59.547, 38.507, 45.775, 49.664, 45.003, 35.152)),
  row.names = c(NA, -15L), class = "data.frame")
dat
 | X | Y20 | Y50 |
---|---|---|---|
1 | 6.728 | 22.832 | 21.552 |
2 | 7.766 | 20.928 | 16.731 |
3 | 5.725 | 16.956 | 22.023 |
4 | 8.512 | 30.856 | 19.256 |
5 | 6.954 | 16.442 | 21.845 |
6 | 6.183 | 15.066 | 32.57 |
7 | 9.545 | 28.464 | 37.75 |
8 | 8.543 | 24.815 | 22.717 |
9 | 11.381 | 34.889 | 31.655 |
10 | 16.02 | 42.15 | 59.547 |
11 | 16.055 | 41.267 | 38.507 |
12 | 14.772 | 49.669 | 45.775 |
13 | 16.315 | 48.32 | 49.664 |
14 | 14.593 | 37.916 | 45.003 |
15 | 13.922 | 35.206 | 35.152 |
Here is the (rounded) correlation matrix between those three variables:
round(cor(dat), 3)
 | X | Y20 | Y50 |
---|---|---|---|
X | 1 | 0.942 | 0.846 |
Y20 | 0.942 | 1 | 0.782 |
Y50 | 0.846 | 0.782 | 1 |
A natural question arises: the correlation matrix suggests that \(X\) is more strongly correlated with \(Y_{20}\) than with \(Y_{50}\). But given the very limited sample size, how confident can we be in drawing a general conclusion about the covariation pattern between the canine and the radius diaphysis? Formally, we would like to test the null hypothesis \(\mathcal{H}_0: \rho_{X, Y_{20}} = \rho_{X, Y_{50}}\) against the alternative hypothesis \(\mathcal{H}_1: \rho_{X, Y_{20}} \neq \rho_{X, Y_{50}}\). Although common, this problem is rarely covered in introductory statistics textbooks. In this blog post, I’ll try to present a few approaches (among many others!) that may be useful for comparing correlation coefficients in various contexts.
1 Definitions
Let’s start with a bit of terminology. When we want to compare two correlation coefficients, we are necessarily in one of those three cases [1]:
- Dependent and overlapping correlations: we want to compare \(\rho_{X, Y}\) and \(\rho_{X, Z}\) (i.e., one variable \(X\) is present in both coefficients), and the data have been collected on the same individuals for \(X\), \(Y\) and \(Z\).
- Dependent and non-overlapping correlations: we want to compare \(\rho_{X, Y}\) and \(\rho_{W, Z}\) (i.e., no variable is repeated), and the data have been collected on the same individuals for \(X\), \(Y\), \(W\) and \(Z\).
- Independent correlations: we want to compare the correlation between \(X\) and \(Y\) in two different samples or groups.
For each of those three cases, different methods apply.
2 Comparing dependent, overlapping correlations
This first case corresponds to our motivating example: comparing \(\rho_{X, Y_{20}}\) and \(\rho_{X, Y_{50}}\), given that all the data have been collected on the same individuals and under the same experimental conditions.
2.1 A resampling approach
An empirical bootstrap approach may be useful to understand the underlying concepts. This approach proceeds through the following steps:
- Resample \(n\) individuals with replacement (\(n\) being the original sample size)
- Compute the two correlation coefficients \(\rho^*_{X, Y_{20}}\) and \(\rho^*_{X, Y_{50}}\) using the bootstrap samples
- Compute their difference \(d^* = \rho^*_{X, Y_{20}} - \rho^*_{X, Y_{50}}\)
- Repeat steps 1 to 3 \(B\) times (\(B = 500\) or \(1000\) is generally a reasonable choice, depending on the original sample size \(n\))
- Use the empirical distribution of the \(B\) values of \(d^*\) to compute a 95% confidence interval for the difference between the two correlation coefficients.
This is a standard bootstrap procedure, recalled in many textbooks [2,8]. Here is an example of an R implementation (I don’t use the R package boot here, so as to make the algorithm more readable and transparent):
## Set some constants:
set.seed(2020)
B <- 999         # number of bootstrap replicates
n <- nrow(dat)   # total sample size

## Initialize an empty vector for bootstrap statistics:
dstar <- rep(NA, B)

## Run the bootstrap procedure:
for (b in 1:B) {
  indices <- sample(x = 1:n, size = n, replace = TRUE)
  bootsample <- dat[indices, ]
  rho1 <- cor(bootsample$X, bootsample$Y20)
  rho2 <- cor(bootsample$X, bootsample$Y50)
  dstar[b] <- rho1 - rho2
}

## Plot histogram and display the empirical 95% confidence interval:
hist(dstar, main = "Difference in correlation coefficients", xlab = "d*")
abline(v = quantile(dstar, probs = c(0.025, 0.975)), lty = 2, col = "blue")
As shown on figure 1, the 95% empirical confidence interval includes the null value. This bootstrap procedure does not provide sufficient evidence to discard the null hypothesis here, and we cannot conclude (or generalize) that \(Y_{20}\) is really a better candidate than \(Y_{50}\) to be correlated with the canine values \(X\).
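A rough two-sided p-value can also be read off the same bootstrap distribution, by doubling the smaller tail proportion around the null value. This is only a quick approximation, not one of the formal methods cited above; the snippet rebuilds the dataframe and the vector of bootstrap differences so that it is self-contained:

```r
## Approximate two-sided bootstrap p-value for the difference in
## correlations (same data and seed as above):
dat <- data.frame(
  X = c(6.728, 7.766, 5.725, 8.512, 6.954, 6.183, 9.545, 8.543, 11.381,
        16.02, 16.055, 14.772, 16.315, 14.593, 13.922),
  Y20 = c(22.832, 20.928, 16.956, 30.856, 16.442, 15.066, 28.464, 24.815,
          34.889, 42.15, 41.267, 49.669, 48.32, 37.916, 35.206),
  Y50 = c(21.552, 16.731, 22.023, 19.256, 21.845, 32.57, 37.75, 22.717,
          31.655, 59.547, 38.507, 45.775, 49.664, 45.003, 35.152))
set.seed(2020)
B <- 999
n <- nrow(dat)
dstar <- replicate(B, {
  idx <- sample(1:n, size = n, replace = TRUE)
  cor(dat$X[idx], dat$Y20[idx]) - cor(dat$X[idx], dat$Y50[idx])
})
## Double the smaller of the two tail proportions around 0:
p_approx <- 2 * min(mean(dstar <= 0), mean(dstar >= 0))
p_approx
```

Since the empirical interval above includes 0, this approximate p-value necessarily exceeds 0.05.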
2.2 Using the R package cocor
There are several exact methods that provide an alternative to the resampling approach developed in the previous section. The R package cocor (whose usage is extensively and very clearly documented in [1]) implements many of them. Furthermore, it is really easy to use, either through the command-line interface or through a dedicated website.
The main function of the package, also named cocor, can handle R formulas, as shown below. Here I ask for the results returned by two modern methods: Zou’s method [9] and Hittner’s method [4].
library(cocor)
cocor(~X + Y20 | X + Y50, data = dat,
      test = c("hittner2003", "zou2007"))
Results of a comparison of two overlapping correlations based on dependent groups

Comparison between r.jk (X, Y20) = 0.9424 and r.jh (X, Y50) = 0.8463
Difference: r.jk - r.jh = 0.0961
Related correlation: r.kh = 0.782
Data: dat: j = X, k = Y20, h = Y50
Group size: n = 15
Null hypothesis: r.jk is equal to r.jh
Alternative hypothesis: r.jk is not equal to r.jh (two-sided)
Alpha: 0.05

hittner2003: Hittner, May, and Silver's (2003) modification of Dunn and Clark's z (1969) using a backtransformed average Fisher's (1921) Z procedure
  z = 1.6186, p-value = 0.1055
  Null hypothesis retained

zou2007: Zou's (2007) confidence interval
  95% confidence interval for r.jk - r.jh: -0.0217 0.3404
  Null hypothesis retained (Interval includes 0)
None of those methods allows us to reject the null hypothesis at the specified alpha level (0.05), which is consistent with the bootstrap simulations.
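For intuition, Zou’s interval can be reproduced by hand. The sketch below follows the formulas given in [9] for the overlapping dependent case (individual Fisher-z limits for each coefficient, combined through the estimated correlation between the two sample coefficients); the variable names are mine:

```r
## Zou's (2007) 95% CI for rho(X, Y20) - rho(X, Y50), step by step:
dat <- data.frame(
  X = c(6.728, 7.766, 5.725, 8.512, 6.954, 6.183, 9.545, 8.543, 11.381,
        16.02, 16.055, 14.772, 16.315, 14.593, 13.922),
  Y20 = c(22.832, 20.928, 16.956, 30.856, 16.442, 15.066, 28.464, 24.815,
          34.889, 42.15, 41.267, 49.669, 48.32, 37.916, 35.206),
  Y50 = c(21.552, 16.731, 22.023, 19.256, 21.845, 32.57, 37.75, 22.717,
          31.655, 59.547, 38.507, 45.775, 49.664, 45.003, 35.152))
n <- nrow(dat)
r12 <- cor(dat$X, dat$Y20)    # r.jk in cocor's notation
r13 <- cor(dat$X, dat$Y50)    # r.jh
r23 <- cor(dat$Y20, dat$Y50)  # r.kh
se <- 1 / sqrt(n - 3)
zc <- qnorm(0.975)
## Fisher-z confidence limits for each coefficient separately:
l1 <- tanh(atanh(r12) - zc * se); u1 <- tanh(atanh(r12) + zc * se)
l2 <- tanh(atanh(r13) - zc * se); u2 <- tanh(atanh(r13) + zc * se)
## Estimated correlation between the two sample coefficients:
c12 <- ((r23 - 0.5 * r12 * r13) * (1 - r12^2 - r13^2 - r23^2) + r23^3) /
  ((1 - r12^2) * (1 - r13^2))
## Combine into lower and upper limits for the difference r12 - r13:
lower <- r12 - r13 - sqrt((r12 - l1)^2 + (u2 - r13)^2 -
                            2 * c12 * (r12 - l1) * (u2 - r13))
upper <- r12 - r13 + sqrt((u1 - r12)^2 + (r13 - l2)^2 -
                            2 * c12 * (u1 - r12) * (r13 - l2))
round(c(lower, upper), 4)  # close to cocor's interval (-0.0217, 0.3404)
```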
2.3 But hey, do the methods necessarily give consistent results?
Well, this question has received quite enlightening answers in a thread on ResearchGate, and I could probably not give a better one!
3 Comparing dependent, non-overlapping correlations
In this scenario, the researcher wants to know whether the correlation between a given pair of variables \((X, Y)\) is higher than the correlation between another pair of variables \((W, Z)\), all of them being measured on the same individuals.
Let’s say that, in the classical iris dataset included in R, we want to know whether the correlation between Sepal.Length and Sepal.Width is higher than the correlation between Petal.Length and Petal.Width for the species setosa, i.e. whether sepal or petal dimensions are more strongly correlated. Here are the first rows of this famous dataset:
data(iris)
setosa <- subset(iris, Species == "setosa")
head(setosa)
 | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
---|---|---|---|---|---|
1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
2 | 4.9 | 3 | 1.4 | 0.2 | setosa |
3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
5 | 5 | 3.6 | 1.4 | 0.2 | setosa |
6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
The correlation for the dimensions of the sepals is:
with(setosa, cor(Sepal.Length, Sepal.Width))
[1] 0.7425467
and for the petals:
with(setosa, cor(Petal.Length, Petal.Width))
[1] 0.33163
In this case, the result seems to be clear enough, but let’s illustrate formal procedures for this scenario.
3.1 A resampling approach
A straightforward modification of the previous case applies: the same bootstrap procedure remains valid here. Once again, we will compute a 95% confidence interval for the difference between the two correlation coefficients under study. But this time, let’s use the R package boot for greater efficiency.
library(boot)
set.seed(2020)

## First define a function to work out the difference of coefficients:
diff_corr <- function(data, indices) {
  data <- data[indices, ]
  cor1 <- cor(data$Sepal.Length, data$Sepal.Width)
  cor2 <- cor(data$Petal.Length, data$Petal.Width)
  return(cor1 - cor2)
}

## Then apply a bootstrap procedure with 999 draws:
res_boot <- boot(data = setosa, R = 999, statistic = diff_corr, stype = "i")

## Retrieve the empirical 95% confidence interval:
boot.ci(res_boot, type = "perc", conf = 0.95)
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 999 bootstrap replicates

CALL :
boot.ci(boot.out = res_boot, conf = 0.95, type = "perc")

Intervals :
Level     Percentile
95%   ( 0.1877,  0.6805 )
Calculations and Intervals on Original Scale
This time, the null value is not a plausible value for the difference between the two correlation coefficients, thus bringing strong evidence against the null hypothesis of no difference.
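Percentile intervals are the simplest option offered by boot.ci; the function can also return bias-corrected and accelerated (BCa) intervals, which are often recommended for correlation-type statistics [2]. A quick check with the same statistic (the interval endpoints sit in the last two columns of the $bca component of the returned object):

```r
library(boot)

## Same bootstrap statistic as above, BCa interval instead of percentile:
data(iris)
setosa <- subset(iris, Species == "setosa")
diff_corr <- function(data, indices) {
  data <- data[indices, ]
  cor(data$Sepal.Length, data$Sepal.Width) -
    cor(data$Petal.Length, data$Petal.Width)
}
set.seed(2020)
res_boot <- boot(data = setosa, R = 999, statistic = diff_corr, stype = "i")
ci_bca <- boot.ci(res_boot, type = "bca", conf = 0.95)
## Extract the BCa interval endpoints:
ci_bca$bca[4:5]
```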
3.2 Using the R package cocor
The formula interface remains very easy to use in this case. Let’s try with the methods zou2007 [9] and silver2004 [5]:
cocor(~Sepal.Length + Sepal.Width | Petal.Length + Petal.Width,
      data = setosa, test = c("silver2004", "zou2007"))
Results of a comparison of two nonoverlapping correlations based on dependent groups

Comparison between r.jk (Sepal.Length, Sepal.Width) = 0.7425 and r.hm (Petal.Length, Petal.Width) = 0.3316
Difference: r.jk - r.hm = 0.4109
Related correlations: r.jh = 0.2672, r.jm = 0.2781, r.kh = 0.1777, r.km = 0.2328
Data: setosa: j = Sepal.Length, k = Sepal.Width, h = Petal.Length, m = Petal.Width
Group size: n = 50
Null hypothesis: r.jk is equal to r.hm
Alternative hypothesis: r.jk is not equal to r.hm (two-sided)
Alpha: 0.05

silver2004: Silver, Hittner, and May's (2004) modification of Dunn and Clark's z (1969) using a backtransformed average Fisher's (1921) Z procedure
  z = 3.0306, p-value = 0.0024
  Null hypothesis rejected

zou2007: Zou's (2007) confidence interval
  95% confidence interval for r.jk - r.hm: 0.1405 0.6986
  Null hypothesis rejected (Interval does not include 0)
The results from those two methods are consistent with the conclusion (and confidence interval) derived from our home-made bootstrap procedure, and bring strong evidence against the null hypothesis of no difference.
4 Comparing independent correlations
This time the problem is quite different: the researcher wants to know whether a given pair of variables \((X, Y)\) is more correlated in a group A than in a group B. With the previous iris dataset, this could correspond to the following question: is the correlation between petal length and petal width different between the species setosa and versicolor? Let’s illustrate this situation by also extracting the 50 flowers belonging to the versicolor species from the original dataset:
versicolor <- subset(iris, Species == "versicolor")
We already know that the correlation between those two variables for the species setosa is approximately equal to 0.33. Let’s compute this correlation within the species versicolor:
with(versicolor, cor(Petal.Length, Petal.Width))
[1] 0.7866681
The two correlation coefficients seem to differ strongly, and we can expect that this observed difference (with a reasonable sample size of 50 individuals per group) reflects a true difference between those two species. Let’s verify it with cocor.
The most classical method in this case is Fisher’s z transformation [3], also implemented in the R package cocor. Still with the same formula interface:
cocor(~Petal.Length + Petal.Width | Petal.Length + Petal.Width,
      data = list(setosa, versicolor))
Results of a comparison of two correlations based on independent groups

Comparison between r1.jk (Petal.Length, Petal.Width) = 0.3316 and r2.hm (Petal.Length, Petal.Width) = 0.7867
Difference: r1.jk - r2.hm = -0.455
Data: list(setosa, versicolor): j = Petal.Length, k = Petal.Width, h = Petal.Length, m = Petal.Width
Group sizes: n1 = 50, n2 = 50
Null hypothesis: r1.jk is equal to r2.hm
Alternative hypothesis: r1.jk is not equal to r2.hm (two-sided)
Alpha: 0.05

fisher1925: Fisher's z (1925)
  z = -3.4805, p-value = 0.0005
  Null hypothesis rejected

zou2007: Zou's (2007) confidence interval
  95% confidence interval for r1.jk - r2.hm: -0.7415 -0.1907
  Null hypothesis rejected (Interval does not include 0)
Both methods bring strong evidence against the null hypothesis of no difference.
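Fisher’s z test is simple enough to reproduce by hand, which makes a good sanity check on the cocor output (a direct transcription of the classical formula, not the package’s code):

```r
## Fisher's z test for two independent correlations, written out by hand:
data(iris)
setosa <- subset(iris, Species == "setosa")
versicolor <- subset(iris, Species == "versicolor")
r1 <- with(setosa, cor(Petal.Length, Petal.Width))
r2 <- with(versicolor, cor(Petal.Length, Petal.Width))
n1 <- nrow(setosa)
n2 <- nrow(versicolor)
## atanh() is exactly Fisher's z transformation; on that scale the
## difference of the two coefficients is approximately normal:
z <- (atanh(r1) - atanh(r2)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
p <- 2 * pnorm(-abs(z))
round(c(z = z, p = p), 4)  # matches cocor: z = -3.4805, p = 0.0005
```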
5 Conclusive remarks
Even if it was not done here, one should always begin with scatterplots, so as to detect potential outliers or non-linear relationships. In the first case (outliers), robust alternatives do exist [6,7]. In both cases, using Spearman correlation coefficients instead of classical Pearson coefficients may be preferable.
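Switching to Spearman coefficients is a one-argument change wherever cor() is called, since it accepts a method argument, and the same substitution works inside a bootstrap statistic. A minimal sketch on the setosa data of section 3:

```r
## Spearman (rank-based) versions of the two coefficients compared in
## section 3; only the method argument of cor() changes:
data(iris)
setosa <- subset(iris, Species == "setosa")
rho_sepal <- with(setosa, cor(Sepal.Length, Sepal.Width, method = "spearman"))
rho_petal <- with(setosa, cor(Petal.Length, Petal.Width, method = "spearman"))
c(sepal = rho_sepal, petal = rho_petal, diff = rho_sepal - rho_petal)
```

Note that the exact tests presented above were derived for Pearson coefficients, so the resampling approaches are arguably the most direct route for comparing Spearman coefficients.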
References
[1] Birk Diedenhofen and Jochen Musch. cocor: A Comprehensive Solution for the Statistical Comparison of Correlations. PLOS ONE, 10(4):e0121945, 2015.
[2] Bradley Efron and Robert Tibshirani. An Introduction to the Bootstrap. Number 57 in Monographs on Statistics and Applied Probability. Chapman & Hall, New York, 1993.
[3] R. A. Fisher. Statistical Methods for Research Workers. Oliver and Boyd, Edinburgh, Scotland, 1925.
[4] James B. Hittner, Kim May, and N. Clayton Silver. A Monte Carlo Evaluation of Tests for Comparing Dependent Correlations. The Journal of General Psychology, 130(2):149--168, 2003.
[5] N. Clayton Silver, James B. Hittner, and Kim May. Testing Dependent Correlations With Nonoverlapping Variables: A Monte Carlo Simulation. The Journal of Experimental Education, 73(1):53--69, 2004.
[6] Rand R. Wilcox. Comparing Pearson Correlations: Dealing with Heteroscedasticity and Nonnormality. Communications in Statistics - Simulation and Computation, 38(10):2220--2234, 2009.
[7] Rand R. Wilcox. Comparing dependent robust correlations. British Journal of Mathematical and Statistical Psychology, 69(3):215--224, 2016.
[8] Andrew Zieffler, Jeffrey Harring, and Jeffrey D. Long. Comparing Groups: Randomization and Bootstrap Methods Using R. Wiley, Hoboken, N.J., 2011.
[9] Guang Yong Zou. Toward using confidence intervals to compare correlations. Psychological Methods, 12(4):399--413, 2007.