Here, we sum up the statistical methods used in HaDeX2. Some of them are also discussed in other articles where they are relevant, but this article gathers the information in one place.
The propagation of uncertainty (Puchała et al. 2020; Weis 2021) is necessary whenever measured values are transformed. In HDX-MS, measurements are typically repeated in triplicate to estimate the uncertainty of the mass measurement. When mass measurements are transformed into deuterium uptake, this uncertainty must be propagated using the Law of Propagation of Uncertainty (Joint Committee for Guides in Metrology 2008):
\[u_{c}(y) = \sqrt{\sum_{k} \left[ \frac{\partial y}{\partial x_{k}} u(x_{k}) \right]^2}\]
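where \(y\) is the output quantity (here, the deuterium uptake), \(x_k\) are the input quantities it is calculated from (here, the measured masses), \(u(x_k)\) is the standard uncertainty of \(x_k\), and \(u_c(y)\) is the combined standard uncertainty of \(y\). This form of the law assumes that the input quantities are uncorrelated.

This generic equation relies on the partial derivatives of the function that defines \(y\). Its specific forms for deuterium uptake (which differ depending on the parameters of the calculation) are derived and described in detail in the article vignette("datafiles").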
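Once the partial derivatives are known, the law can be evaluated directly. The sketch below assumes uncorrelated inputs; the helper propagate_uncertainty() is hypothetical and not part of the HaDeX2 API, and the example uses a simplified uptake defined as a plain mass difference, \(D = m_t - m_0\), so that \(\partial D / \partial m_t = 1\) and \(\partial D / \partial m_0 = -1\).

```r
# Hypothetical helper: combined standard uncertainty from the Law of
# Propagation of Uncertainty, assuming uncorrelated input quantities.
# `partials` are the partial derivatives dy/dx_k, `u` the uncertainties u(x_k).
propagate_uncertainty <- function(partials, u) {
  stopifnot(length(partials) == length(u))
  sqrt(sum((partials * u)^2))
}

# Simplified example: uptake as a mass difference D = m_t - m_0
u_m_t <- 0.3  # hypothetical standard uncertainty of m_t (Da)
u_m_0 <- 0.4  # hypothetical standard uncertainty of m_0 (Da)
u_D <- propagate_uncertainty(c(1, -1), c(u_m_t, u_m_0))
u_D  # 0.5
```

For a difference of two masses, the derivatives are \(\pm 1\), so the result reduces to the square root of the summed variances; other uptake forms only change the `partials` vector.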
HaDeX2 offers the hybrid testing procedure (Hageman and Weis 2019) because it addresses the trade-off between false positives and loss of power in the analysis of HDX-MS data. The hybrid test applies a two-stage decision rule to identify significant differences in deuterium uptake. First, the observed difference in deuterium uptake must exceed a globally estimated uncertainty threshold derived from pooled experimental standard deviations, which ensures that the effect is larger than the expected measurement error. Only differences that pass this magnitude-based filter are then evaluated with a replicate-aware hypothesis test, implemented as Welch's t-test, to assess whether the observed difference is statistically supported given the individual variances. A difference is considered significant only when both criteria are satisfied.
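The two-stage rule can be sketched in R as follows. This is a minimal illustration, not the HaDeX2 implementation: the function hybrid_significant() is hypothetical, and the global threshold is passed in as an argument that is assumed to be precomputed from the pooled experimental standard deviations, since its exact formula is not reproduced here.

```r
# Hypothetical sketch of the two-stage hybrid decision rule.
# `global_threshold` is assumed to be precomputed elsewhere from pooled
# experimental standard deviations.
hybrid_significant <- function(st_1, st_2, global_threshold,
                               confidence_level = 0.98) {
  if (length(st_1) < 3 || length(st_2) < 3) {
    stop("The hybrid test requires at least three replicates per state.")
  }
  # Stage 1: magnitude filter against the global uncertainty threshold
  delta <- mean(st_1) - mean(st_2)
  if (abs(delta) <= global_threshold) {
    return(FALSE)
  }
  # Stage 2: Welch's t-test (unequal variances) on the replicate values
  p_value <- t.test(st_1, st_2, var.equal = FALSE)$p.value
  p_value < 1 - confidence_level
}

st_1 <- c(10.1, 10.2, 10.0)  # hypothetical uptake replicates, state 1
st_2 <- c(8.0, 8.1, 7.9)     # hypothetical uptake replicates, state 2
hybrid_significant(st_1, st_2, global_threshold = 0.5)  # TRUE
```

Checking the magnitude first means the t-test is never run on differences that are already too small to matter, which is the source of the reduced false-positive rate.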
Although the hybrid test offers greater statistical power than more classical approaches, it is more data-intensive and can therefore be performed only when the experiment includes at least three replicates.
The test is performed for the time points selected for a given plot. For example, the volcano plot can present multiple time points, so the values from all presented time points are used. The Woods plot, in contrast, presents a single time point, and only that time point is taken into account.
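The Houde interval (Houde, Berkowitz, and Engen 2011) is calculated from the uncertainty of the measurement or, more precisely, from the propagated uncertainty of the deuterium uptake (in the same form as the values presented on the plot), as described by the equation:
\[interval = \frac{\sum_{i=1}^{n} u_c(du_i)}{n} \cdot tvalue(k)\]
where \(n\) is the number of values presented on the plot, \(u_c(du_i)\) is the propagated uncertainty of the \(i\)-th deuterium uptake value, and \(k\) is the number of replicates.
\(tvalue\) is calculated with the R function qt as follows: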
\[tvalue = qt(c(alpha/2, 1-alpha/2), df = k-1)\]
where the number of degrees of freedom is the number of replicates \(k\) minus one, and \(\alpha = 1 - \textrm{confidence level}\) for the desired confidence level (usually 0.98).
In other words, we take the mean uncertainty of the deuterium uptake and scale it by the t-value to obtain the interval. Values within the interval (that is, with a magnitude below its bounds) are too small to be distinguished from the measurement uncertainty, so we do not treat them as significant.
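A minimal sketch of the interval computation in R, assuming the propagated uncertainties have already been calculated. The function houde_interval() and all example values are hypothetical; the sketch only mirrors the equations above, multiplying the mean uncertainty by both t quantiles to obtain a lower and an upper bound.

```r
# Hypothetical sketch of the Houde interval: mean propagated uncertainty
# scaled by the two-sided t quantiles with n_rep - 1 degrees of freedom.
houde_interval <- function(u_c, n_rep, confidence_level = 0.98) {
  alpha <- 1 - confidence_level
  t_value <- qt(c(alpha / 2, 1 - alpha / 2), df = n_rep - 1)
  mean(u_c) * t_value  # c(lower, upper), symmetric around zero
}

# Hypothetical differential uptake values and their propagated uncertainties
u_c <- c(0.10, 0.15, 0.20, 0.15)
du  <- c(0.05, 1.80, -2.10, 0.30)
iv <- houde_interval(u_c, n_rep = 3)

# Values outside the interval are the ones treated as significant
du < iv[1] | du > iv[2]  # FALSE TRUE TRUE FALSE
```

With three replicates there are only two degrees of freedom, so the t quantile is large (about 6.96 at the 0.98 level) and the interval is wide.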
To use the t-test, we need at least three values from each group; in differential analysis, this means at least three replicate values at a given time point for each biological state.

The test assesses whether the mean deuterium uptake differs between the two states (the case of interest) or whether the observed difference is consistent with both sets of values coming from the same distribution, in which case we do not consider it further.

We use the unpaired two-sample t-test to calculate the p-value. With the default setting var.equal = FALSE, R's t.test performs Welch's variant, which does not assume equal variances in both states. The null hypothesis is that the mean deuterium uptake is the same in both states. If the calculated p-value is lower than \(\alpha = 1 - \textrm{confidence level}\), we reject the null hypothesis and conclude that the states differ.
To calculate the p-value, we use the base R function t.test:

```r
t.test(x = st_1, y = st_2,
       paired = FALSE,
       alternative = "two.sided",
       conf.level = confidence_level)$p.value
```

where \(st_1\) is the set of values from the first state, and \(st_2\) from the second.
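The call shown above can be wrapped together with the replicate requirement and the decision rule. This is a sketch under stated assumptions: diff_uptake_test() is a hypothetical helper, not a HaDeX2 function, and the uptake values are made up for illustration.

```r
# Hypothetical helper: checks the replicate requirement, runs the unpaired
# two-sided t-test, and applies the decision rule p < alpha.
diff_uptake_test <- function(st_1, st_2, confidence_level = 0.98) {
  if (length(st_1) < 3 || length(st_2) < 3) {
    stop("At least three replicates per state are required.")
  }
  p_value <- t.test(x = st_1, y = st_2,
                    paired = FALSE,
                    alternative = "two.sided",
                    conf.level = confidence_level)$p.value
  alpha <- 1 - confidence_level
  list(p_value = p_value, significant = p_value < alpha)
}

res <- diff_uptake_test(c(10.1, 10.2, 10.0), c(10.0, 10.3, 10.1))
res$significant  # FALSE: the difference is consistent with equal means
```

Note that conf.level only affects the confidence interval returned by t.test, not the p-value, so the threshold \(\alpha\) has to be applied separately, as done in the last step of the helper.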
If this option is chosen, the p-values are adjusted for multiple comparisons using one of three methods: ‘none’, ‘BH’ (Benjamini–Hochberg) and ‘bonferroni’.
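The three method names match those accepted by the base R function p.adjust from the stats package, so a minimal sketch of the adjustment, assuming it is applied this way, looks as follows. The raw p-values are hypothetical.

```r
# Hypothetical raw p-values from several peptide-level t-tests
p_values <- c(0.01, 0.02, 0.03, 0.04)

p.adjust(p_values, method = "none")        # unchanged
p.adjust(p_values, method = "BH")          # 0.04 0.04 0.04 0.04
p.adjust(p_values, method = "bonferroni")  # 0.04 0.08 0.12 0.16
```

Bonferroni multiplies each p-value by the number of tests (capped at 1), while BH scales by the number of tests divided by each p-value's rank and is therefore less conservative.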
The p-value is usually presented as \(-\log(\textrm{p-value})\), e.g., on the volcano plot.