Note this version of the R package and tutorial addresses two errors
in the original paper: 1) The harmonic mean *p*-value for set
\(\mathcal{R}\) should be compared
against a threshold \(\alpha_L\), not
\(\alpha_{|\mathcal{R}|}\). 2) The
mechanism for calculating asymptotically exact harmonic mean
*p*-values was wrongly stated. The harmonicmeanp package version
2.0 only corrected error 1. Here both errors are addressed, and many of
the results have changed quantitatively. In particular, correcting the
errors makes the test less powerful than previously claimed for
narrowing down regions of significance. See the updated correction for
further information: https://blog.danielwilson.me.uk/2019/07/correction-harmonic-mean-p-value-for.html

The harmonic mean *p*-value (HMP) is a method for performing a
combined test of the null hypothesis that no *p*-value is
significant (Wilson, 2019). It is more robust to dependence between
*p*-values than Fisher’s (1934) method, making it more broadly
applicable. Like Bonferroni correction, the HMP procedure controls the
strong-sense family-wise error rate (ssFWER). It is more powerful than
common alternatives including Bonferroni and Simes procedures when
combining large proportions of all the *p*-values, at the cost of
slightly lower power when combining small proportions of all the
*p*-values. It rivals the power of the BH procedure (Benjamini
and Hochberg, 1995) which controls both the weak-sense family-wise error
rate (wsFWER) and the false discovery rate (FDR), in the sense that the
HMP is expected to find one or more *p*-values or groups of
*p*-values significant more often than the BH procedure finds one
or more *p*-values significant.

Method | Robust to dependence | Indicative power^{1} |
Controls | ||||
---|---|---|---|---|---|---|---|

Significance very rare | Significance uncommon | Significance common | FDR | wsFWER | ssFWER | ||

Fisher | \(\times\) | \(\bullet\) | \(\bullet\,\bullet\,\bullet\,\bullet\) | \(\bullet\,\bullet\,\bullet\,\bullet\,\bullet\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |

HMP | \(\checkmark\) | \(\bullet\,\bullet\,\circ\) | \(\bullet\,\bullet\,\bullet\) | \(\bullet\,\bullet\,\bullet\,\bullet\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |

BH | \(\checkmark\checkmark\) | \(\bullet\,\bullet\,\circ\) | \(\bullet\,\bullet\,\circ\) | \(\bullet\,\bullet\,\circ\) | \(\checkmark\) | \(\checkmark\) | \(\times\) |

Bonferroni | \(\checkmark\checkmark\checkmark\) | \(\bullet\,\bullet\,\circ\) | \(\bullet\,\bullet\) | \(\bullet\,\bullet\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |

There are two approaches to applying the HMP:

One compares \(\overset{\circ}{p}_{\mathcal{R}}\), the HMP of a set of

*p*-values \(\mathcal{R}\), to significance threshold \(\alpha_L\,w_\mathcal{R}\).Equivalently, one compares \(p_{\overset{\circ}{p}_{\mathcal{R}}}\), an asymptotically exact HMP, to the significance threshold \(\alpha\,w_\mathcal{R}\).

In the above,

\(\overset{\circ}{p}_{\mathcal{R}}=\left(\sum_{i\in\mathcal{R}}w_{i}\right)/\left(\sum_{i\in\mathcal{R}}w_{i}/p_{i}\right)\) defines the HMP for the set of

*p*-values \(\mathcal{R}\). It is computed in R by \(\texttt{hmp.stat}\).\(\alpha_L\) is a significance threshold that depends on the desired family-wise error rate \(\alpha\) and the total number of individual

*p*-values \(L\). It is computed in R by \(\texttt{qharmonicmeanp}\).\(w_1 \dots w_L\) are weights for the individual

*p*-values that must sum to one, i.e. \(\sum_{i=1}^{L}w_{i}=1\), and \(w_\mathcal{R} = \sum_{i\in\mathcal{R}} w_i\) equals the sum of weights of the individual*p*-values in set \(\mathcal{R}\). The HMP is robust to the choice of weights, so it is reasonable to start with equal weights (\(w_{i}=1/L\)).The asymptotically exact

*p*-value is computed by R using the \(\texttt{p.hmp}\) command, which takes into account the total number of individual*p*-values, \(L\).

In this tutorial, I will focus on the second approach because one
usually wishes to quote in reports a *p*-value that can be
directly compared to the usual significance threshold, e.g. \(\alpha=0.05\). When the subscript \(\mathcal{R}\) is dropped from \(\overset{\circ}{p}_{\mathcal{R}}\) and
\(p_{\overset{\circ}{p}_{\mathcal{R}}}\) it
means \(\mathcal{R}\) includes all the
*p*-values. This is the “headline” HMP.

The method may be used as follows:

The headline HMP is deemed significant when \(\overset{\circ}{p}\leq\alpha_L\) or, equivalently, \(p_{\overset{\circ}{p}}\leq\alpha\). Here significant means that we reject the null hypothesis that none of the

*p*-values are significant.If the headline HMP is not significant, neither is the HMP for any subset. If the headline HMP is significant, subsets may also be significant. The significance thresholds are all pre-determined so the number of subsets that are tested does not affect them.

The HMP for a subset of

*p*-values is deemed significant when \(\overset{\circ}{p}_{\mathcal{R}}\leq\alpha_L\,w_{\mathcal{R}}\) or, equivalently, \(p_{\overset{\circ}{p}_{\mathcal{R}}}\leq\alpha\,w_{\mathcal{R}}\). Here significant means that we reject the null hypothesis that none of the*p*-values in subset \(\mathcal{R}\) are significant.

To use this tutorial, copy and paste the R code from your web browser to the R console. In the HTML version, you can select and copy R code simply by clicking within the code snippet (as long as JQuery is enabled in your web browser and the page was compiled with Rmarkdown v2). To ensure the tutorial runs correctly, execute each code snippet once, in order. This tutorial was compiled using Rmarkdown (Allaire et al., 2018), with equations rendered by Mathjax.

If you haven’t already installed the package, type at the R console:

`install.packages("harmonicmeanp")`

Once you have installed the package, load it in the usual way:

`library(harmonicmeanp)`

`## Loading required package: FMStable`

To check you have version 3.0 installed, type

`stopifnot(packageVersion("harmonicmeanp")>=3.0)`

Download the 312457 *p*-values from chromosome 12 of the
genome-wide association study (GWAS) for neuroticism (Okbay et al.,
2016). This file is an excerpt of the original data. It took me a few
seconds to download the data excerpt. The 8 megabyte file contains rs
identifiers and SNP positions as per human genome build GRCh37/hg19 as
well as the *p*-values.

```
system.time((gwas = read.delim("https://www.danielwilson.me.uk/files/Neuroticism_ch12.txt",
header=TRUE,as.is=TRUE)))
```

```
## user system elapsed
## 0.795 0.046 1.447
```

`head(gwas)`

```
## rs pos p
## 1 rs7959779 149478 0.3034
## 2 rs4980821 149884 0.5905
## 3 rs192950336 150256 0.1125
## 4 rs61907205 151213 0.4896
## 5 rs2368809 151236 0.7066
## 6 rs4018398 151469 0.9420
```

The harmonic mean *p*-value (HMP) is a statistic with which
one can perform a combined test of the null hypothesis that none of the
*p*-values is significant even when the *p*-values are
dependent. In GWAS, *p*-values will often be dependent because of
genetic linkage. The HMP can be used to test the null hypothesis that no
SNPs on chromosome 12 are significant. Let’s do it manually by first
calculating the HMP, assuming equal weights. Note that a total of
*L*=6524432 tests were performed genome-wide, so this number must
be used to determine the weights if we are to control the genome-wide
ssFWER, even though we are only analysing the 312457 SNPs on chromosome
12 in this example.

```
L = 6524432
gwas$w = 1/L
R = 1:nrow(gwas)
(HMP.R = sum(gwas$w[R])/sum(gwas$w[R]/gwas$p[R]))
```

`## [1] 0.0008734522`

One of the remarkable properties of the HMP is that for small values
(e.g. below 0.05), the HMP can be directly interpreted as a
*p*-value *after adjusting for multiple comparisons*. That
the HMP equals \(\overset{\circ}{p}_{\mathcal{R}}=0.0008734522\)
suggests it is strongly significant *before multiple testing
correction*. To test this formally, first the HMP significance
threshold is computed. For that I will assume a false positive rate of
\(\alpha=0.05\), i.e. 5%.

```
# Specify the false positive rate
alpha = 0.05
# Compute the HMP significance threshold
(alpha.L = qharmonicmeanp(alpha, L))
```

`## [1] 0.02593083`

The multiple testing-adjusted threshold against which to evaluate the
significance of the combined test is determined by the sum of the
weights for the *p*-values being combined. The HMP for subset
\(\mathcal{R}\) is significant when
\(\overset{\circ}{p}_\mathcal{R}\leq \alpha_L
w_\mathcal{R}\).

```
# Test whether the HMP for subset R is significance
w.R = sum(gwas$w[R])
alpha.L * w.R
```

`## [1] 0.001241835`

Therefore *after adjusting for multiple comparison* we can
reject the null hypothesis of no association on chromosome 12 at level
\(\alpha=0.05\) because 0.0008734522 is
below 0.001241835.

An equivalent approach is to calculate an asymptotically exact
*p*-value based on the HMP.

```
# Use p.hmp instead to compute the HMP test statistic and
# calculate its asymptotically exact p-value in one step
# Note this line has changed because of a previous error.
w.R*pharmonicmeanp(HMP.R/w.R, L=L, lower.tail=TRUE)
```

`## [1] 0.001343897`

```
# Compare it to the multiple testing threshold
w.R*alpha
```

`## [1] 0.002394515`

The asymptotically exact *p*-value of \(p_{\overset{\circ}{p}_{\mathcal{R}}}=0.001343897\)
is close to the HMP of \(\overset{\circ}{p}_{\mathcal{R}}=0.0008734522\)
and also significant because it is below \(0.002394515\). Note however that direct
interpretation of the HMP is anti-conservative compared to the
asymptotically exact test, which is why the HMP had to be compared
directly to the more stringent threshold \(\alpha_L=0.02593083\). The asymptotically
exact *p*-value can be computed in one step:

```
# Note that the p.hmp function has been redefined to take argument L. Omitting L will issue a warning.
R = 1:nrow(gwas)
p.hmp(gwas$p[R],gwas$w[R],L)
```

```
## p.hmp
## 0.001343897
```

The combined *p*-value for chromosome 12 is useful because if
the combined *p*-value is not significant, neither is any
constituent *p*-value, after multiple testing correction, as
always. Conversely, if the combined *p*-value is significant,
there may be one or more subsets of constituent *p*-values that
are also significant. These subsets can be hunted down because another
useful property of the HMP is that the significance thresholds of these
further tests are the same no matter how many combinations of subsets of
the constituent *p*-values are tested. Specifically, for any
subset \(\mathcal{R}\) of the
*L* *p*-values, the HMP is compared against a threshold
\(\alpha_L\,w_{\mathcal{R}}\)
(equivalently, the asymptotically exact HMP is compared against a
threshold \(\alpha\,w_{\mathcal{R}}\)),
where \(w_{\mathcal{\mathcal{R}}}=\sum_{i\in\mathcal{R}}w_{i}\)
and the \(w_{i}\)s are the weights of
the individual *p*-values, constrained to sum to one. Assuming
equal weights, \(w_{i}=1/L\), meaning
that \(w_{\mathcal{R}}=\left|\mathcal{R}\right|/L\)
equals the fraction of all tests being combined. In what follows I will
mainly use the asymptotically exact *p*-values, rather than
directly interpreting the HMP.

For example, separately test the *p*-values occurring at even
and odd positions on chromosome 12:

```
R = which(gwas$pos%%2==0)
p.hmp(gwas$p[R],gwas$w[R],L)
```

```
## p.hmp
## 0.002658581
```

```
w.R = sum(gwas$w[R])
alpha*w.R
```

`## [1] 0.001200587`

```
R = which(gwas$pos%%2==1)
p.hmp(gwas$p[R],gwas$w[R],L)
```

```
## p.hmp
## 0.00230653
```

```
w.R = sum(gwas$w[R])
alpha*w.R
```

`## [1] 0.001193928`

Neither of the two tests is significant individually: for even
positions, the combined *p*-value was \(p_{\overset{\circ}{p}_{\mathcal{R}}}=0.002658581\)
which was above the significance threshold of \(\alpha\,w_{\mathcal{R}}=0.001200587\) and
for odd positions, the combined *p*-value was \(p_{\overset{\circ}{p}_{\mathcal{R}}}=0.00230653\)
which was above the significance threshold of \(\alpha\,w_{\mathcal{R}}=0.001193928\).

Comparing *p*-values with different significance thresholds
can be confusing. Instead, it is useful to calculate adjusted
*p*-values, which are compared directly to \(\alpha\), the intended strong-sense
familywise error rate. An adjusted *p*-value is simply divided by
its weight *w*. For example:

```
R = which(gwas$pos%%2==0)
p.R = p.hmp(gwas$p[R],gwas$w[R],L)
w.R = sum(gwas$w[R])
(p.R.adjust = p.R/w.R)
```

```
## p.hmp
## 0.11072
```

```
R = which(gwas$pos%%2==1)
p.R = p.hmp(gwas$p[R],gwas$w[R],L)
w.R = sum(gwas$w[R])
(p.R.adjust = p.R/w.R)
```

```
## p.hmp
## 0.09659422
```

Now it is easy to see that both tests are non-significant, assuming \(\alpha=0.05\).

Of course it makes little sense to combine *p*-values
according to whether their position is an even or odd number. Instead we
might wish to test the first 156229 SNPs on chromosome 12 separately
from the second 156228 SNPs to begin to narrow down regions of
significance.

```
R = 1:156229
p.R = p.hmp(gwas$p[R],gwas$w[R],L)
w.R = sum(gwas$w[R])
(p.R.adjust = p.R/w.R)
```

```
## p.hmp
## 1
```

```
R = 156230:312457
p.R = p.hmp(gwas$p[R],gwas$w[R],L)
w.R = sum(gwas$w[R])
(p.R.adjust = p.R/w.R)
```

```
## p.hmp
## 0.02842931
```

This is much clearer: only in the second half of the chromosome can
we reject the null hypothesis of no significant *p*-values at the
\(\alpha=0.05\) level. For the first
half of the chromosome, the adjusted *p*-value was \(p_{\overset{\circ}{p}_{\mathcal{R}}}/w_{\mathcal{R}}=1\).
By the corrected definition of asymptotically exact HMPs, the adjusted
*p*-value will not exceed 1, although in general while
*p*-values must be 1 or below, adjusted *p*-values need
not be. For the second half of the chromosome, the adjusted
*p*-value was \(p_{\overset{\circ}{p}_{\mathcal{R}}}/w_{\mathcal{R}}=0.02842931\)
which is below the standard significance threshold of \(\alpha=0.05\).

Note that it was completely irrelevant that we had already performed
tests of even- and odd-positioned SNPs: as mentioned above, the
significance thresholds are pre-determined by the \(w_{\mathcal{R}}\)’s no matter how many
subsets of *p*-values are tested and no matter in what
combinations. We can test any subset of the *p*-values without
incurring further multiple testing penalties. For example, let’s test 50
megabase windows overlaping at 10 megabase intervals. Testing
overlapping versus non-overlapping windows has no effect on the
significance thresholds, but of course it has an effect on the
resolution of our conclusions and on the computational time.

```
# Define overlapping sliding windows of 50 megabase at 10 megabase intervals
win.50M.beg = outer(0:floor(max(gwas$pos/50e6-1)),(0:4)/5,"+")*50e6
win.50M.beg = win.50M.beg[win.50M.beg+50e6<=max(gwas$pos)]
# Calculate the combined p-values for each window
system.time({
p.50M = sapply(win.50M.beg,function(beg) {
R = which(gwas$pos>=beg & gwas$pos<(beg+50e6))
p.hmp(gwas$p[R],gwas$w[R],L)
})
})
```

```
## user system elapsed
## 0.083 0.017 0.117
```

```
# Calculate sums of weights for each combined test
system.time({
w.50M = sapply(win.50M.beg,function(beg) {
R = which(gwas$pos>=beg & gwas$pos<(beg+50e6))
sum(gwas$w[R])
})
})
```

```
## user system elapsed
## 0.049 0.009 0.059
```

```
# Calculate adjusted p-value for each window
p.50M.adj = p.50M/w.50M
```

Now plot them

```
# Took a few seconds, plotting over 312k points
gwas$p.adj = gwas$p/gwas$w
plot(gwas$pos/1e6,-log10(gwas$p.adj),pch=".",xlab="Position on chromosome 12 (megabases)",
ylab="Adjusted significance (-log10 adjusted p-value)",
ylim=sort(-log10(range(gwas$p.adj,p.50M.adj,na.rm=TRUE))))
arrows(win.50M.beg/1e6,-log10(p.50M.adj),(win.50M.beg+50e6)/1e6,len=0,col="#D3C991",lwd=2)
# Superimpose the significance threshold, alpha, e.g. alpha=0.05
abline(h=-log10(0.05),col="black",lty=2)
# When using the HMP to evaluate individual p-values, the HMP threshold must be used,
# which is slightly more stringent than Bonferroni for individual tests
abline(h=-log10(qharmonicmeanp(0.05,L)),col="grey",lty=2)
# For comparison, plot the conventional GWAS threshold of 5e-8. Need to convert
# this into the adjusted p-value scale. Instead of comparing each raw p-value
# against a Bonferonni threshold of alpha/L=0.05/6524432, we would be comparing each
# against 5e-8. So the adjusted p-values p/w=p*L would be compared against
# 5e-8*L = 5e-8 * 6524432 = 0.3262216
abline(h=-log10(0.3262216),col="grey",lty=3)
```