MPT Modeling: Basics, Identifiability, Power

Florian Wickelmaier

2024-10-11

This vignette illustrates some basic use cases of the mpt package, how to check for model identifiability, and how to perform simulation-based power analysis.

library(mpt)

The examples have been inspired by

Schmidt, O., Erdfelder, E., & Heck, D. W. (2023). How to develop, test, and extend multinomial processing tree models: A tutorial. Psychological Methods. https://doi.org/10.1037/met0000561

Basics

Age differences in episodic memory

Bayen (1990) presented 40 younger and 40 older adults with a list of 50 words to be learned. The list consisted of 20 semantic word pairs and 10 singleton words. In a later memory test, participants freely recalled the presented words. For pairs, responses were classified into four categories: both words in a pair are recalled adjacently (E1) or non-adjacently (E2), one word in a pair is recalled (E3), neither word in a pair is recalled (E4); for singletons, into two categories: word recalled (F1), word not recalled (F2).

The individual recall frequencies are available in Schmidt, Erdfelder, and Heck (2023), see ?agememory. Displayed below are the aggregate recall frequencies for pairs and singletons by age group and lag condition.

dat <- read.table(header = TRUE, text = "
  lag age resp freq
    0 young E1 90     # adjacent recall: apple, banana
    0 young E2 14     # non-adjacent recall: apple, ..., banana
    0 young E3 84     # single recall: only apple or only banana
    0 young E4 212    # non-recall: neither apple nor banana
  n/a young F1 102    # recall
  n/a young F2 298    # non-recall
    0 old   E1 42
    0 old   E2 5
    0 old   E3 63
    0 old   E4 290
  n/a old   F1 64
  n/a old   F2 336
   15 young E1 67
   15 young E2 18
   15 young E3 123
   15 young E4 192
   15 old   E1 30
   15 old   E2 13
   15 old   E3 95
   15 old   E4 262
")
dat$age <- factor(dat$age, levels = c("young", "old"))

Specifying and fitting an MPT model

Specifying and fitting an MPT model involves two steps: mptspec() specifies the model, mpt() fits it to the data.

For example, the storage-retrieval pair-clustering model for a single condition consists of two multinomial category systems (trees): one for pairs, one for singletons. The dot notation in mptspec() can be used to identify the trees.

# Pairs                    Singletons
#      
#     /r -- E1
#    c
#   / \1-r -- E4             a -- F1       c: cluster storage
#  /                        /              r: cluster retrieval
#  \       /u -- E2         \              u: non-clustered storage/retrieval
#   \     u                  1-a -- F2     a: singleton storage/retrieval
#    \   / \1-u -- E3
#     1-c
#        \   /u -- E3
#         1-u
#            \1-u -- E4
mptspec(                                                 # specify model
  E.1 = c*r,
  E.2 = (1 - c)*u^2,                                     # dot notation:
  E.3 = 2 * (1 - c)*u*(1 - u),                           #   tree.response
  E.4 = c*(1 - r) + (1 - c)*(1 - u)^2,
  F.1 = a,
  F.2 = 1 - a,
  .restr = list(a = u)
) |>
  mpt(data = dat[dat$lag != 15 & dat$age == "young", ])  # fit to data
## 
## Multinomial processing tree (MPT) models
## 
## Parameter estimates:
##      c       r       u  
## 0.4481  0.5021  0.2543  
## 
## Goodness of fit (2 log likelihood ratio):
##  G2(1) = 0.007307, p = 0.9319

Alternatively, some model structures are pre-specified and can be called by passing a string label, see ?mptspec.

m <- mpt(mptspec("SR"),
         data = dat[dat$lag != 15 & dat$age == "young", ])

Model output

Standard extractor functions work on the model object.

print(m)
## 
## Multinomial processing tree (MPT) models
## 
## Parameter estimates:
##      c       r       u  
## 0.4481  0.5021  0.2543  
## 
## Goodness of fit (2 log likelihood ratio):
##  G2(1) = 0.007307, p = 0.9319
summary(m)
## 
## Coefficients:
##   Estimate Logit Estim. Std. Error z value Pr(>|z|)    
## c   0.4481    -0.208231   0.247619  -0.841    0.400    
## r   0.5021     0.008348   0.291927   0.029    0.977    
## u   0.2543    -1.075762   0.106569 -10.094   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Likelihood ratio G2: 0.007307 on 1 df, p-value: 0.9319 
## Pearson X2: 0.007273,    AIC: 28.642
## Number of trees: 2
logLik(m)
## 'log Lik.' -11.32105 (df=3)
AIC(m)
## [1] 28.64211
BIC(m)
## [1] 26.80099
coef(m)
##         c         r         u 
## 0.4481295 0.5020870 0.2543089
confint(m, logit = FALSE)
##       2.5 %    97.5 %
## c 0.3281043 0.5681547
## r 0.3590481 0.6451259
## u 0.2146992 0.2939186

Exercise: one-high-threshold model

  1. Use mptspec() to specify the one-high-threshold model:
# Target                  Distractor
#
#   r -- Hit
#  /             } 55      b -- FA    45
# /    b -- Hit           /                  r: recognition
# \   /                   \                  b: bias
#  1-r                     1-b -- CR 765
#     \
#      1-b -- Miss 35
  1. Use mpt() to fit the model to the response frequencies (taken from Bröder and Schütz 2009, see ?recogROC).

Detour: numerical optimization by hand

There are two parameter estimation methods available in mpt(). One is an application of the EM algorithm (Hu 1999). The other (the default) uses optim()’s BFGS method with analytic gradients.

A simplified version of the latter is illustrated here. It requires two parts. First, a function that returns the negative log-likelihood of the MPT model.

nll <- function(theta, data, treeid) {   # negative log-likelihood
  c <- theta[1]
  r <- theta[2]
  u <- theta[3]
  a <- theta[3]  # a = u
  pcat <- c(
    E.1 = c*r,
    E.2 = (1 - c)*u^2,
    E.3 = 2 * (1 - c)*u*(1 - u),
    E.4 = c*(1 - r) + (1 - c)*(1 - u)^2,
    F.1 = a,
    F.2 = 1 - a
  )
  -sum(
    dmultinom(data[treeid == 1], size = sum(data[treeid == 1]),
              prob = pcat[treeid == 1], log = TRUE),
    dmultinom(data[treeid == 2], size = sum(data[treeid == 2]),
              prob = pcat[treeid == 2], log = TRUE)
  )
}

Second, a call to optim() passing the function as an argument.

optim(par = c(.5, .5, .5),                    # starting values
       fn = nll,                              # function to be minimized
     data = dat[dat$lag != 15 &
                dat$age == "young", "freq"],  # arguments passed to fn
   treeid = c(1, 1, 1, 1, 2, 2),
   method = "L-BFGS-B",                       # algorithm
    lower = rep(.001, 3),                     # boundaries
    upper = rep(.999, 3),
  hessian = TRUE)
## $par
## [1] 0.4481327 0.5020844 0.2543100
## 
## $value
## [1] 11.32105
## 
## $counts
## function gradient 
##       12       12 
## 
## $convergence
## [1] 0
## 
## $message
## [1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
## 
## $hessian
##           [,1]     [,2]      [,3]
## [1,]  772.4919 419.6633 -560.4388
## [2,]  419.6633 508.5846  278.3646
## [3,] -560.4388 278.3646 4065.7657

Exercise: numerical optimization by hand

  1. Write an R function that returns the negative log-likelihood of the one-high-threshold model.

  2. Use optim() to estimate the parameters for the example data from before.

  3. Compare the results to the output of mpt().

Model comparison

Model comparison is a two-step procedure. First, specify and fit a restricted model, where the restriction corresponds to the hypothesis being tested (here, H0: \(c = 0.5\)).

m0 <-
  update(m$spec, .restr = list(c = 0.5)) |>  # specify restriction(s)
  mpt(data = m$y) |>                         # refit
  print()
## 
## Multinomial processing tree (MPT) models
## 
## Parameter estimates:
##      r       u  
## 0.4591  0.2649  
## 
## Goodness of fit (2 log likelihood ratio):
##  G2(2) = 0.7907, p = 0.6735

Second, test the hypothesis by comparing the likelihoods of the restricted and the unrestricted models.

anova(m0, m)
## Analysis of Deviance Table
## 
## Model 1: m0
## Model 2: m
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1         2    0.79066                     
## 2         1    0.00731  1  0.78335   0.3761

Exercise: age-group model

  1. Fit a storage-retrieval pair-clustering model that jointly accounts for young and old participants (ignoring the lag-15 pairs).

  2. Test the age effect on

  1. Discuss the results.

Reparameterization: order constraints

Consider the storage-retrieval pair-clustering model for young and old participants in its original parameterization.

m1 <- mptspec("SR", .replicates = 2) |>
  mpt(data = dat[dat$lag != 15, ]) |>
  print()
## 
## Multinomial processing tree (MPT) models
## 
## Parameter estimates:
##     c1      r1      u1      c2      r2      u2  
## 0.4481  0.5021  0.2543  0.4156  0.2526  0.1579  
## 
## Goodness of fit (2 log likelihood ratio):
##  G2(2) = 0.1554, p = 0.9252

An alternative parameterization substitutes \(c_{old} = s \cdot c_{young}\), where \(s \in (0, 1)\) is a shrinkage parameter that enforces the order constraint \(c_{old} \leq c_{young}\).

update(m1$spec, .restr = list(c2 = s * c1)) |>
  mpt(data = m1$y)
## 
## Multinomial processing tree (MPT) models
## 
## Parameter estimates:
##     c1      r1      u1       s      r2      u2  
## 0.4481  0.5021  0.2543  0.9275  0.2526  0.1579  
## 
## Goodness of fit (2 log likelihood ratio):
##  G2(2) = 0.1554, p = 0.9252

Testing interactions

The idea of shrinkage parameters is useful for testing interaction hypotheses:

H0: Age decline in \(c\) and \(r\) parameters is equally strong (no interaction between memory process and age), \(s_c = s_r\).

m5 <-
  update(m1$spec, .restr = list(c2 = s_c * c1,      # two shrinkage pars
                                r2 = s_r * r1)) |>
  mpt(data = m1$y)
m6 <-
  update(m5$spec, .restr = list(s_c = s_r)) |>      # single shrinkage par
  mpt(data = m1$y)
anova(m6, m5)
## Analysis of Deviance Table
## 
## Model 1: m6
## Model 2: m5
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1         3    1.29956                     
## 2         2    0.15539  1   1.1442   0.2848

Exercise: age-lag interaction

  1. Fit a storage-retrieval pair-clustering model that jointly accounts for all observations:

The resulting model has four sets of \(c\), \(r\), and \(u\) parameters (but be careful: singletons have no lag!).

  1. Refine the previous model such that it includes only a single \(u_{young}\) and a single \(u_{old}\) parameter.

  2. Reparameterize the refined model to account for an age-decline effect on the \(r\) parameters. Include

  1. Test whether age decline is equally strong in both lag conditions. Discuss the results.

Identifiability

Jacobian matrix

A model is locally identifiable if its prediction function in the neighborhood of some parameter values is one to one, so that its Jacobian (the matrix of partial derivatives) will have full rank.

In the storage-retrieval pair-clustering model, one might assume that processing of non-clustered words in a pair differs for the first and the second word. This implies two different parameters (\(u_1\) and \(u_2\)) and thus renders the model non-identifiable.

s1 <- mptspec(                         # not identifiable: 5 parameters
  E.1 = c * r,
  E.2 = (1 - c) * u1 * u2,
  E.3 = (1 - c) * u1 * (1 - u2) + (1 - c) * u2 * (1 - u1),
  E.4 = c * (1 - r) + (1 - c) * (1 - u1) * (1 - u2),
  F.1 = a,
  F.2 = 1 - a
)

Its Jacobian at random parameter location is rank deficient: the rank is less than its number of parameters.

runif(5) |> s1$par2deriv() |> _$deriv |> print() |> qr() |> _$rank
##           E.1         E.2        E.3        E.4 F.1 F.2
## c  0.04732983 -0.10270071 -0.5632964  0.6186673   0   0
## r  0.46842641  0.00000000  0.0000000 -0.4684264   0   0
## u1 0.00000000  0.31710280 -0.1026320 -0.2144708   0   0
## u2 0.00000000  0.09151666  0.3485403 -0.4400569   0   0
## a  0.00000000  0.00000000  0.0000000  0.0000000   1  -1
## [1] 4

Setting \(u_1 = u_2\) makes the model identifiable. Its Jacobian has full rank.

s2 <- update(s1, .restr = list(u1 = u, u2 = u))  # make it identifiable
runif(4) |> s2$par2deriv() |> _$deriv |> print() |> qr() |> _$rank
##         E.1         E.2        E.3        E.4 F.1 F.2
## c 0.7256432 -0.01705329 -0.2270699 -0.4815200   0   0
## r 0.3567334  0.00000000  0.0000000 -0.3567334   0   0
## u 0.0000000  0.16800613  0.9505210 -1.1185271   0   0
## a 0.0000000  0.00000000  0.0000000  0.0000000   1  -1
## [1] 4

Simulated identifiability

Another quick check of local identifiability is to re-estimate the model from random starting values. If the model converges to the same maximum but has non-unique estimates, it possibly is not identifiable.

suppressWarnings(replicate(10,
  mpt(s1, data = dat[dat$lag != 15 & dat$age == "young", ],
      start = runif(5)) |>
  coef()
)) |> t()  # not identifiable: parameter trade-offs
##               c         r        u1        u2         a
##  [1,] 0.7423224 0.3031028 0.9425163 0.1441128 0.2550000
##  [2,] 0.5529095 0.4069382 0.4537418 0.1725297 0.2550000
##  [3,] 0.5834305 0.3856500 0.5061639 0.1659929 0.2550000
##  [4,] 0.6678801 0.3368867 0.6904356 0.1526337 0.2550000
##  [5,] 0.4422502 0.5087617 0.2669200 0.2350972 0.2550000
##  [6,] 0.6734033 0.3341237 0.7054056 0.1519208 0.2550000
##  [7,] 0.7168590 0.3138693 0.8421181 0.1467886 0.2549999
##  [8,] 0.6908334 0.3256936 0.7558943 0.1497668 0.2550001
##  [9,] 0.7507914 0.2996838 0.9802879 0.1432687 0.2550000
## [10,] 0.6101037 0.3687898 0.5569680 0.1611717 0.2550000
replicate(10,
  mpt(s2, data = dat[dat$lag != 15 & dat$age == "young", ],
      start = runif(4)) |>
  coef()
) |> t()   # identifiable: unique estimates
##               c         r         u     a
##  [1,] 0.4400000 0.5113636 0.2500000 0.255
##  [2,] 0.4400000 0.5113636 0.2500000 0.255
##  [3,] 0.4399999 0.5113636 0.2499999 0.255
##  [4,] 0.4400000 0.5113636 0.2500000 0.255
##  [5,] 0.4400000 0.5113637 0.2500000 0.255
##  [6,] 0.4400000 0.5113636 0.2500000 0.255
##  [7,] 0.4399993 0.5113643 0.2499996 0.255
##  [8,] 0.4400000 0.5113636 0.2500000 0.255
##  [9,] 0.4400000 0.5113636 0.2500000 0.255
## [10,] 0.4400000 0.5113636 0.2500000 0.255

Exercise: identifiability

  1. Specify a one-high-threshold model that is limited to target items only and check its identifiability by
  1. Fix the bias parameter to, e.g., \(b = 0.2\) and redo the identifiability checks. What do you notice?

Power analysis

Simulation-based power analysis involves four steps (e.g., Wickelmaier 2022): specifying a model that includes the effect of interest, generating data from the model, testing the null hypothesis, repeating data generation and testing. The proportion of times the test turns out significant is an estimate of its power.

Check simulation setup: parameter recovery

Set up a data generator.

s <- mptspec(
  E.1 = c * r,
  E.2 = (1 - c) * u^2,
  E.3 = 2 * (1 - c) * u * (1 - u),
  E.4 = c * (1 - r) + (1 - c) * (1 - u)^2,
  F.1 = a,
  F.2 = 1 - a
)
s$par     # before you use par2prob(), carefully check position of parameters!
##  c  r  u  a 
## NA NA NA NA
s$par2prob(c(c = 0.5, r = 0.5, u = 0.4, a = 0.6))   # evaluate model equations
##  E.1  E.2  E.3  E.4  F.1  F.2 
## 0.25 0.08 0.24 0.43 0.60 0.40
dataGen <- function(nn, d) {
  structure(list(                                            # stub mpt object
    treeid = s$treeid,
         n = setNames((nn * c(2, 1)/3)[s$treeid], s$treeid), # 2:1 ratio
      pcat = s$par2prob(c(c = 0.5, r = 0.5,
                          u = 0.5 - d/2, a = 0.5 + d/2))
  ), class = "mpt") |>
    simulate()
}
dataGen(960, 0.2)
## E.1 E.2 E.3 E.4 F.1 F.2 
## 159  52 138 291 194 126

Try to recover the parameter values from the generated responses.

replicate(20, dataGen(960, 0.2) |> mpt(s, data = _) |> coef()) |>
  t() |>
  boxplot(horizontal = TRUE)

Exercise: parameter recovery

Run your own parameter recovery:

Did you succeed in recovering the parameters?

Power simulation: goodness-of-fit test

The first example is a power analysis of the goodness-of-fit test of the storage-retrieval pair-clustering model. The model assumes identical processing of non-clustered pairs and singletons, so H0: \(a = u\).

Set up a function that calls the data generator, performs the test, and returns its p value.

testFun <- function(nn, d) {
  y <- dataGen(nn, d)                            # generate data with effect
  m1 <- mpt(s, y)
  m2 <- mpt(update(s, .restr = list(a = u)), y)
  anova(m2, m1)$"Pr(>Chi)"[2]                    # test H0, return p value
}

Repeating the data generation and testing steps for each condition may take some time, depending on the model, the number of conditions, and the number of replications.

# pwrsim1 <- expand.grid(d = seq(0, 0.5, 0.1), n = 30 * 2^(0:5))
# 
# system.time(  # about 20 min
#   pwrsim1$pval <-
#     mapply(function(nn, d) replicate(500, testFun(nn, d)),
#            nn = pwrsim1$n, d = pwrsim1$d, SIMPLIFY = FALSE)
# )
# pwrsim1$pwr <- sapply(pwrsim1$pval, function(p) mean(p < .05))
# }

To save a bit of time, the above simulation has been pre-run, see ?pwrsim, and results are reused here for illustration.

data(pwrsim)  # provides pwrsim1 and pwrsim2
if(requireNamespace("lattice")) {
  lattice::xyplot(pwr ~ d, pwrsim1, groups = n, type = c("g", "b"),
                  auto.key = list(space = "right"),
                  xlab = "Parameter difference a - u",
                  ylab = "Simulated power")
}
## Lade nötigen Namensraum: lattice

Power simulation: age differences in retrieval

The second example is a power analysis of a test of age differences in retrieval. It requires a two-group model, so the hypothesis \(r_{young} = r_{old}\) can be tested.

s <- mptspec("SR", .replicates = 2)

dataGen <- function(nn, d) {
  structure(list(
    treeid = s$treeid,
         n = setNames((nn/2 * c(2, 1, 2, 1)/3)[s$treeid], s$treeid),
      pcat = s$par2prob(c(c1 = 0.5, r1 = 0.4 + d/2, u1 = 0.3,   # young
                          c2 = 0.5, r2 = 0.4 - d/2, u2 = 0.3))  # old
  ), class = "mpt") |>
    simulate(m)
}

testFun <- function(nn, d) {
  y <- dataGen(nn, d)
  m1 <- mpt(s, y)
  m2 <- mpt(update(s, .restr = list(r1 = r2)), y)
  anova(m2, m1)$"Pr(>Chi)"[2]
}

# pwrsim2 <- expand.grid(d = seq(0, 0.4, 0.1), n = 120 * 2^(0:2))

if(requireNamespace("lattice")) {
  lattice::xyplot(pwr ~ d, pwrsim2, groups = n, type = c("g", "b"),
                  auto.key = list(space = "right"),
                  xlab = "Parameter difference r1 - r2",
                  ylab = "Simulated power")
}

Exercise: power simulation

In the storage-retrieval pair-clustering model, what sample size is required to detect a parameter difference \(r_{young} - r_{old} = 0.4\) with a power of about 0.8?

Simulate power for

References

Bayen, U. J. 1990. “Zur Lokalisation von Altersdifferenzen im episodischen Gedächtnis Erwachsener: Eine Querschnittsuntersuchung auf der Basis eines mathematischen Modells.” Berichte aus dem Psychologischen Institut der Universität Bonn 16: 1–125.
Bröder, A., and J. Schütz. 2009. “Recognition ROCs Are Curvilinear—or Are They? On Premature Arguments Against the Two-High-Threshold Model of Recognition.” Journal of Experimental Psychology: Learning, Memory, and Cognition 35: 587–606. https://doi.org/10.1037/a0024783.
Hu, X. 1999. “Multinomial Processing Tree Models: An Implementation.” Behavior Research Methods, Instruments, & Computers 31: 689–95. https://doi.org/10.3758/BF03200747.
Schmidt, O., E. Erdfelder, and D. W. Heck. 2023. “How to Develop, Test, and Extend Multinomial Processing Tree Models: A Tutorial.” Psychological Methods. https://doi.org/10.1037/met0000561.
Wickelmaier, F. 2022. “Simulating the Power of Statistical Tests: A Collection of R Examples.” ArXiv. https://doi.org/10.48550/arXiv.2110.09836.