`CIPerm`

with ‘naive’
approachWe run simple timing comparisons to show how our package’s approach with Nguyen (2009) compares against a “naive” grid-based search approach to confidence intervals from permutation methods.

We can use `CIPerm`

’s `cint(dset(x, y))`

directly, and it already wins against the naive for-loop or array-based
approaches. But since `dset`

and `cint`

both have
extra cruft built into them, for an even “fairer” comparison we recreate
the core of `cint(dset())`

here without all the extra
variables and without passing copies of dataframes.

First we re-define the internal function
`roundOrCeiling()`

, then we define a streamlined
`cint.nguyen()`

:

```
#### Define a streamlined function for Nguyen approach ####
# 1. use round(siglevel*num), if siglevel*num is basically an integer,
# to within the default numerical tolerance of all.equal();
# 2. or otherwise use ceiling(siglevel*num) instead.
<- function(x) {
roundOrCeiling ifelse(isTRUE(all.equal(round(x), x)), # is x==round(x) to numerical tolerance?
round(x),
ceiling(x))
}
<- function(x, y, nmc = 10000, conf.level = 0.95) {
cint.nguyen # Two-tailed CIs only, for now
<- 1 - conf.level
sig
# Code copied/modified from within CIPerm::dset()
<- length(x)
n <- length(y)
m <- n + m
N <- choose(N, n) # number of possible combinations
num # Form a matrix where each column contains indices in new "group1" for that comb or perm
if(nmc == 0 | num <= nmc) { # take all possible combinations
<- utils::combn(1:N, n)
dcombn1 else { # use Monte Carlo sample of permutations, not all possible combinations
} <- replicate(nmc, sample(N, n))
dcombn1 1] <- 1:n # force the 1st "combination" to be original data order
dcombn1[,<- nmc
num
}# Form the equivalent matrix for indices in new "group2"
<- apply(dcombn1, 2, function(x) setdiff(1:N, x))
dcombn2
# Form the corresponding matrices of data values, not data indices
<- c(x, y)
combined <- matrix(combined[dcombn1], nrow = n)
group1_perm <- matrix(combined[dcombn2], nrow = m)
group2_perm # For each comb or perm, compute difference in group means, k, and w_{k,d}
<- colMeans(group1_perm) - colMeans(group2_perm)
diffmean <- colSums(matrix(dcombn1 %in% ((n+1):N),
k nrow = n))
<- (diffmean[1] - diffmean) / (k * (1/n + 1/m))
wkd
# Code copied/modified from within CIPerm::cint()
# Sort wkd values and find desired quantiles
<- sort(wkd, decreasing = FALSE, na.last = FALSE)
w.i <- (1 - conf.level)/2
siglevel <- roundOrCeiling(siglevel*num) - 1
index # When dset's nmc leads us to use Monte Carlo sims,
# we may get some permutations equivalent to orig data
# i.e. we may get SEVERAL k=0 and therefore several w.i=NaN.
<- sum(k == 0)
nk0 # Start counting from (1+nk0)'th element of w.i
# (not the 1st, which will always be 'NaN' since k[1] is 0)
<- w.i[1 + nk0 + index]
LB <- w.i[(num - index)]
UB
<- c(LB, UB)
CI <- 1 - (2*(index+1) / num)
conf.achieved message(paste0("Achieved conf. level: 1-2*(", index+1, "/", num, ")"))
return(list(conf.int = CI,
conf.level.achieved = conf.achieved))
}
```

Next, we define an equivalent function to implement the “naive” approach:

- Choose a grid of possible values of delta = mu_X-mu_Y to try
- For each delta…
- subtract delta off of the x’s,
- and carry out a test of H_0: mu_X=mu_Y on the resulting data

- Our confidence interval consists of the range of delta values for which H_0 is NOT rejected

```
#### Define a function for "naive" approach with for-loop ####
<- function(x, y, deltas,
cint.naive.forloop nmc = 10000,
conf.level = 0.95) {
# Two-tailed CIs only, for now
<- 1 - conf.level
sig <- rep(1, length(deltas))
pvalmeans
# Code copied/modified from within CIPerm::dset()
<- length(x)
n <- length(y)
m <- n + m
N <- choose(N, n) # number of possible combinations
num # Form a matrix where each column contains indices in new "group1" for that comb or perm
if(nmc == 0 | num <= nmc) { # take all possible combinations
<- utils::combn(1:N, n)
dcombn1 else { # use Monte Carlo sample of permutations, not all possible combinations
} <- replicate(nmc, sample(N, n))
dcombn1 1] <- 1:n # force the 1st "combination" to be original data order
dcombn1[,<- nmc
num
}# Form the equivalent matrix for indices in new "group2"
<- apply(dcombn1, 2, function(x) setdiff(1:N, x))
dcombn2
for(dd in 1:length(deltas)) {
<- x - deltas[dd]
xtmp
# Code copied/modified from within CIPerm::dset()
<- c(xtmp, y)
combined # Form the corresponding matrices of data values, not data indices
<- matrix(combined[dcombn1], nrow = n)
group1_perm <- matrix(combined[dcombn2], nrow = m)
group2_perm # For each comb or perm, compute difference in group means
<- colMeans(group1_perm) - colMeans(group2_perm)
diffmean
# Code copied/modified from within CIPerm::pval()
<- sum(abs(diffmean - mean(diffmean)) >= abs(diffmean[1] - mean(diffmean)))/length(diffmean)
pvalmeans[dd]
}print( range(deltas[which(pvalmeans >= sig)]) )
return(list(cint = range(deltas[which(pvalmeans >= sig)]),
pvalmeans = pvalmeans,
deltas = deltas))
}
```

We also wrote another “naive” function that avoids for-loops and
takes a “vectorized” approach instead. We created large arrays: each
permutation requires a matrix, and the 3rd array dimension is over
permutations. Then `apply`

and `colSums`

work very
quickly to carry out all the per-permutations steps on this array at
once. However, we found that unless datasets were trivially small,
creating and storing the array took a lot of time and memory, and it
ended up far slower than the for-loop approach. Consequently we do not
report this function or its results here, although curious readers can
find it hidden in the vignette’s .Rmd file.

When we compare the timings, `cint.naive()`

will need to
have a reasonable search grid for values of `delta`

. On this
problem, we happen to know the correct CI endpoints are the integers
`(-21, 3)`

, so we “cheat” by using `(-22):4`

as
the grid for `cint.naive()`

. Of course in practice, if you
had only the naive approach, you would probably have to try a wider
grid, since you wouldn’t already know the answer.

```
#### Speed tests on Nguyen's tiny dataset ####
# Use 1st tiny dataset from Nguyen's paper
library(CIPerm)
<- c(19, 22, 25, 26)
x <- c(23, 33, 40)
y
# Actual CIPerm package's approach:
system.time({
print( cint(dset(x, y), conf.level = 0.95, tail = "Two") )
})#> Achieved conf. level: 1-2*(1/35)
#> $conf.int
#> [1] -21 3
#>
#> $conf.level.achieved
#> [1] 0.9428571
#> user system elapsed
#> 0 0 0
# Streamlined version of CIPerm approach:
system.time({
print( cint.nguyen(x, y, conf.level = 0.95) )
})#> Achieved conf. level: 1-2*(1/35)
#> $conf.int
#> [1] -21 3
#>
#> $conf.level.achieved
#> [1] 0.9428571
#> user system elapsed
#> 0.05 0.00 0.04
# Naive approach with for-loops:
<- ((-22):4)
deltas system.time({
<- cint.naive.forloop(x, y, deltas)$pvalmeans
pvalmeans
})#> [1] -21 3
#> user system elapsed
#> 0.05 0.00 0.05
# Sanity check to debug `cint.naive`:
# are the p-vals always higher when closer to middle of CI?
# Are they always above 0.05 inside and below it outside CI?
cbind(deltas, pvalmeans)
#> deltas pvalmeans
#> [1,] -22 0.02857143
#> [2,] -21 0.05714286
#> [3,] -20 0.05714286
#> [4,] -19 0.05714286
#> [5,] -18 0.14285714
#> [6,] -17 0.14285714
#> [7,] -16 0.20000000
#> [8,] -15 0.25714286
#> [9,] -14 0.37142857
#> [10,] -13 0.54285714
#> [11,] -12 0.60000000
#> [12,] -11 0.80000000
#> [13,] -10 0.85714286
#> [14,] -9 1.00000000
#> [15,] -8 0.88571429
#> [16,] -7 0.74285714
#> [17,] -6 0.57142857
#> [18,] -5 0.45714286
#> [19,] -4 0.34285714
#> [20,] -3 0.25714286
#> [21,] -2 0.22857143
#> [22,] -1 0.14285714
#> [23,] 0 0.11428571
#> [24,] 1 0.08571429
#> [25,] 2 0.08571429
#> [26,] 3 0.05714286
#> [27,] 4 0.02857143
plot(deltas, pvalmeans)
abline(h = 0.05)
abline(v = c(-21, 3), lty = 2)
```

`# Yes, it's as it should be :)`

Above we see timings for the original
`CIPerm::cint(dset))`

, its streamlined
version`cint.nguyen()`

, and the naive equivalent
`cint.naive()`

.

In the case of a tiny dataset like this, all 3 approaches take almost no
time. So it can be a tossup as to which one is fastest on this
example.

Try a slightly larger dataset, where the total number of permutations
is above the default `nmc=10000`

but still manageable on a
laptop.

Again, `cint.naive()`

will need to have a reasonable
search grid for values of `delta`

. This time again we will
set up a grid that just barely covers the correct endpoints from
`cint.nguyen()`

, and again we’ll try to keep it at around 20
to 25 grid points in all. Then we’ll try timing it again with a slightly
finer grid.

```
choose(18, 9) ## 48620
#> [1] 48620
set.seed(20220528)
<- rnorm(9, mean = 0, sd = 1)
x <- rnorm(9, mean = 1, sd = 1)
y
# Actual CIPerm approach
# (with nmc = 0,
# so that we use ALL of the choose(N,n) combinations
# instead of a MC sample of permutations)
system.time({
print( cint(dset(x, y, nmc = 0),
conf.level = 0.95, tail = "Two") )
})#> Achieved conf. level: 1-2*(1216/48620)
#> $conf.int
#> [1] -2.06178174 0.07679298
#>
#> $conf.level.achieved
#> [1] 0.9499794
#> user system elapsed
#> 0.90 0.03 0.93
# Streamlined version of CIPerm approach:
system.time({
print( cint.nguyen(x, y, nmc = 0, conf.level = 0.95) )
})#> Achieved conf. level: 1-2*(1216/48620)
#> $conf.int
#> [1] -2.06178174 0.07679298
#>
#> $conf.level.achieved
#> [1] 0.9499794
#> user system elapsed
#> 0.76 0.04 0.81
# Coarser grid
<- ((-21):(1))/10 # grid steps of 0.1
deltas
# Naive with for-loops:
system.time({
<- cint.naive.forloop(x, y, deltas, nmc = 0)$pvalmeans
pvalmeans
})#> [1] -2 0
#> user system elapsed
#> 1.54 0.29 1.90
# Finer grid
<- ((-21*2):(1*2))/20 # grid steps of 0.05
deltas
# Naive with for-loops:
system.time({
<- cint.naive.forloop(x, y, deltas, nmc = 0)$pvalmeans
pvalmeans
})#> [1] -2.05 0.05
#> user system elapsed
#> 3.19 0.50 3.96
```

Now, `CIPerm::cint(dset())`

and the
`cint.nguyen`

approach take about the same amount of time,
and they are noticeably faster than the `cint.naive`

approach. (And even more so when we use a finer search grid.) That
happens even with the “cheat” of using `CIPerm`

first to find
the right CI limits so that our naive search grid isn’t too wide, just
stepping over the minimal required range with a reasonable
grid-coarseness.

(For the original data where CI = (-21, 3), we stepped from -22 to 4 in integers. For latest data where CI = (-2.06, 0.08), we stepped from -2.1 to 0.1 in units of 0.1 and then 0.05.)

In practice there are probably cleverer ways for the naive method to choose grid discreteness and endpoints… but still, this seems like a more-than-fair chance for the naive approach.

Try an even larger example, where we’re OK with a smallish total
number of permutations (eg `nmc=10000`

), but the dataset
itself is “huge”, so that each individual permutation takes a longer
time.

We still don’t make it all that large, since we want this vignette to knit in a reasonable time. But we’ve seen similar results when we made even-larger datasets that took longer to run.

Once again, for `cint.naive()`

we will set up a search
grid that just barely covers the correct endpoints from
`cint.nguyen()`

, and again we’ll try to keep it at around 20
to 25 grid points in all.

```
#### Speed tests on much larger dataset ####
set.seed(20220528)
<- rnorm(5e3, mean = 0, sd = 1)
x <- rnorm(5e3, mean = 1, sd = 1)
y
# Actual CIPerm approach
# (with nmc = 2000 << choose(N,n),
# so it only takes a MC sample of permutations
# instead of running all possible combinations)
system.time({
print( cint(dset(x, y, nmc = 2e3),
conf.level = 0.95, tail = "Two") )
})#> Achieved conf. level: 1-2*(50/2000)
#> $conf.int
#> [1] -1.0227345 -0.9437299
#>
#> $conf.level.achieved
#> [1] 0.95
#> user system elapsed
#> 8.76 1.63 11.23
# Streamlined version of CIPerm approach:
system.time({
print( cint.nguyen(x, y, nmc = 2e3, conf.level = 0.95) )
})#> Achieved conf. level: 1-2*(50/2000)
#> $conf.int
#> [1] -1.0232198 -0.9435962
#>
#> $conf.level.achieved
#> [1] 0.95
#> user system elapsed
#> 8.84 1.54 11.04
# Grid of around 20ish steps
<- ((-11*10):(-9*10))/100 # grid steps of 0.01
deltas
# Naive with for-loops:
system.time({
<- cint.naive.forloop(x, y, deltas, nmc = 2e3)$pvalmeans
pvalmeans
})#> [1] -1.02 -0.95
#> user system elapsed
#> 13.83 6.75 21.71
```

Here we finally see that the overhead built into
`CIPerm::cint(dset())`

makes it slightly slower than the
streamlined `cint.nguyen()`

. However, both are
*substantially* faster than `cint.naive()`

—they run in
less than half the time of the naive approach.

`bench::mark()`

and `profvis`

The results above are from one run per method on each example dataset.

Using the last (“largest”) dataset, we also tried using
`bench::mark()`

to summarize the distribution of runtimes
over many runs, as well as memory usage.

*(Not actually run when the vignette knits, in order to avoid needing
bench as a dependency.)*

```
# bench::mark(cint.nguyen(x, y, nmc = 2e3),
# cint.naive.forloop(x, y, deltas, nmc = 2e3),
# check = FALSE, min_iterations = 10)
#> # A tibble: 2 × 13
#> expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list>
#> 1 cint.nguyen(x, y, nmc = 2000) 2.5s 2.71s 0.364 1.58GB 2.92 10 80 27.44s <NULL> <Rprofmem [32,057 × 3]> <bench_tm [10]> <tibble [10 × 3]>
#> 2 cint.naive.forloop(x, y, deltas, nmc = 2000) 7.87s 8.07s 0.120 7.4GB 4.64 10 388 1.39m <NULL> <Rprofmem [32,256 × 3]> <bench_tm [10]> <tibble [10 × 3]>
```

Finally, we also tried using the `profvis`

package to
check what steps are taking the longest time.

*(Again, not actually run when the vignette knits, in order to avoid
needing profvis as a dependency; but screenshots are
included below.)*

```
# profvis::profvis(cint.nguyen(x, y, nmc = 2e3))
# profvis::profvis(cint.naive.forloop(x, y, deltas, nmc = 2e3))
```

For Nguyen’s method, the initial setup with `combn()`

or
`sample()`

followed by `apply(setdiff())`

takes
about 80% of the time, and the per-permutation calculations only take
about 20% of the time:

`profvis::profvis(cint.nguyen(x, y, nmc = 2e3))`

For the naive method, the initial setup is identical, and each round
of per-permutation calculations is slightly faster than in Nguyen’s
method… but because you have to repeat them many times (*unless you
already know the CI endpoints, in which case you wouldn’t need to run
this at all!*), they can add up to substantially more total time
than for Nguyen’s method.

`profvis::profvis(cint.naive.forloop(x, y, deltas, nmc = 2e3))`