We consider the testing problem introduced in Jin, Ke, and Luo (2019) where one tests a DCSBM with \(K=1\) communities (model 1) against a DCSBM
with \(K \ge 2\) communities (model 2),
both having the same expected degree for every node. Model 1 can be
considered the **degree-corrected Erdős–Rényi** model.

We mostly follow the notation of Jin, Ke, and Luo (2019) with some minor deviations (e.g., we use \(\lambda\) to denote the expected average degree of the network). They consider the degree-corrected mixed membership model (DCMM) with the following expectation for the adjacency matrix: \[\begin{align*} \mathop{\mathrm{\mathbb E}}[A_{ij} \mid \theta, \pi] = \theta_i \theta_j \,\pi_i^T P \pi_j, \quad i <j, \quad \pi_i \sim F \end{align*}\] where \(\pi_i\) is a random element of \([0,1]^K\). In the special case of the DCSBM, we have \(\pi_i = e_{z_i}\) where \(z_i\) is a random label in \([K]\) and \(e_k\) is the \(k\)th basis vector of \(\mathbb R^K\). This special case is what we consider here. For identifiability, it is assumed that \(\|P \|_\infty = \max_{k\ell} |P_{k\ell}|= 1\). They also let \[h = \mathop{\mathrm{\mathbb E}}[\pi_i]\] denote the class prior.
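As a concrete illustration, the DCSBM mean matrix in this special case can be formed in a few lines of base R (a minimal sketch; the function name `dcsbm_mean` is ours, not part of `nett`):

```r
# E[A_ij | theta, z] = theta_i * theta_j * P[z_i, z_j] for i != j (DCSBM case)
dcsbm_mean = function(theta, z, P) {
  M = tcrossprod(theta) * P[z, z] # theta_i * theta_j * P[z_i, z_j]
  diag(M) = 0                     # no self-loops
  M
}

theta = runif(6, 0.2, 1)
z = c(1, 1, 2, 2, 3, 3)
P = 0.15 + diag(1 - 0.15, 3)  # satisfies ||P||_inf = 1
M = dcsbm_mean(theta, z, P)
isSymmetric(M) # TRUE, since P is symmetric
```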

For **model 1**, we have a DCSBM with \(K=1\) and node connection propensity
parameter \(\theta_i\), that is, \[
\mathop{\mathrm{\mathbb E}}[A_{ij} \mid \theta] = \theta_i \theta_j,
\quad i < j.
\] Let us denote the degree of node \(i\) by \(\mathop{\mathrm{deg}}(i)\). Under model 1,
\[\mathop{\mathrm{\mathbb E}}[\mathop{\mathrm{deg}}(i)] = \theta_i \sum_{j \neq i} \theta_j =
\theta_i (\|\theta\|_1 - \theta_i) .\]
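This identity is easy to check numerically (a quick sketch: `tcrossprod(theta)` with a zeroed diagonal is the mean adjacency matrix of model 1):

```r
# Row sums of E[A | theta] reproduce E[deg(i)] = theta_i * (||theta||_1 - theta_i)
theta = runif(5)
M = tcrossprod(theta) # theta_i * theta_j
diag(M) = 0           # keep i != j only
all.equal(rowSums(M), theta * (sum(theta) - theta)) # TRUE
```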

For **model 2**, Jin, Ke, and Luo
(2019)
propose to use random propensity parameters \(\widetilde \theta= (\widetilde \theta_j)\),
with \[
\widetilde \theta_i = d_{z_i} \theta_i
\] where \(\{d_k, k \in [K]\}\)
is to be determined. For model 2, we have \(\mathop{\mathrm{\mathbb E}}[A_{ij} \mid \widetilde
\theta, z] = \widetilde \theta_i \widetilde \theta_j \,e_{z_i}^T P
e_{z_j}\) for \(i < j\). The
expected degree of node \(i\) under
model 2 is \[
\mathop{\mathrm{\mathbb E}}[\mathop{\mathrm{deg}}(i)] = \sum_{j\ne
i}\mathop{\mathrm{\mathbb E}}[A_{ij}] = \mathop{\mathrm{\mathbb
E}}[\widetilde \theta_i e_{z_i}^T]\, \sum_{j\neq i} P
\mathop{\mathrm{\mathbb E}}[\widetilde \theta_j e_{z_j}]
\] using implicit independence assumptions. We have \[
\mathop{\mathrm{\mathbb E}}[\widetilde \theta_i e_{z_i}] =
\mathop{\mathrm{\mathbb E}}[ d_{z_i} \theta_i e_{z_i}] = \theta_i
\mathop{\mathrm{\mathbb E}}[d_{z_i} e_{z_i}] = \theta_i \sum_k h_k d_k
e_k = \theta_i D h
\] where \(D =
\mathop{\mathrm{diag}}(d_k)\). Plugging in, we have, under model
2, \[
\mathop{\mathrm{\mathbb E}}[\mathop{\mathrm{deg}}(i)] = \theta_i
\Big(\sum_{j \neq i} \theta_j \Big) h^T D P D h.
\] Since \(h^T 1_K = 1\), if
we choose \(D\) such that \(D P D h = 1_K\), then model 2 will have the
same expected degree for each node as model 1.

Since \(P\) is symmetric with nonnegative entries, under mild conditions on \(P\) and \(h\), the symmetric version of Sinkhorn's theorem guarantees the existence of a diagonal matrix \(D = \mathop{\mathrm{diag}}(d_k)\) with positive entries such that \(D P D 1_K = h\), that is, \(D P D\) has row sums \(h\). See Idel (2016) for a review.

Let us use the elementwise notation \(h /
d\) and \(h d\) to denote the
vectors with entries \(h_k / d_k\) and
\(h_k d_k\). The Sinkhorn equation can
be written as \[
P d = h / d
\] or, equivalently, as the fixed-point equation \(d = h / Pd\), which can be solved by
repeated application of the operator \(d
\mapsto h/Pd\) to an initial vector, say \(1_K\). The iteration in fact oscillates
between the two solutions \(d_1\)
and \(d_2\) of the nonsymmetric
Sinkhorn equation \(P d_2 = h/ d_1\)
(Knight 2008; Bradley
2010) and can be made to converge by averaging every two
steps. This is implemented in the `sinkhorn_knopp` function in
the `nett` package.
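The idea can be sketched in a few lines of base R; here we damp the oscillation with a geometric average of successive iterates, one simple choice of averaging scheme (an illustration only, not the `nett` implementation):

```r
# Solve d * (P %*% d) = h for d > 0, i.e., D P D 1_K = h with D = diag(d)
sinkhorn_sym = function(P, h, niter = 200) {
  d = rep(1, nrow(P))
  for (i in 1:niter) {
    d_new = h / as.vector(P %*% d) # one step of d -> h / (P d)
    d = sqrt(d * d_new)            # geometric averaging damps the oscillation
  }
  d
}

P = 0.15 + diag(1 - 0.15, 4) # the kind of P used below
h = (1:4) / sum(1:4)         # class prior
d = sinkhorn_sym(P, h)
max(abs(d * as.vector(P %*% d) - h)) # ~ 0, so D P D has row sums h
```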

To get the version we need, that is, \(D P D h = 1_K\), let us write the Sinkhorn equation with \(d\) renamed to \(r\). Then, \[ P r = h / r \iff P \big((r/h)\, h\big) = \frac{1_K}{r/h}. \] This shows that defining \(d = r/h\), that is, \(d_k = r_k / h_k\), and letting \(D = \mathop{\mathrm{diag}}(d_k)\), we have the desired scaling \(D P D h = 1_K\). That is, we solve the symmetric Sinkhorn problem with row-sum data \(h\) and rescale the output \(r\) to \(r/h\) to get the desired \(d\) for our problem.

Let us verify these ideas:

```
Ktru = 4 # number of true communities
oir = 0.15 # out-in-ratio
h = 1:Ktru
h = h / sum(h) # class prior or `pri` in our notation
P_diag = rep(1, Ktru)
P = oir + diag(P_diag - oir)
r = nett::sinkhorn_knopp(P, sums=h, sym=T)$r
d = r / h
diag(d) %*% P %*% diag(d) %*% h # verify Sinkhorn's scaling
#>      [,1]
#> [1,]    1
#> [2,]    1
#> [3,]    1
#> [4,]    1
```

The expected average degree of model 1 is \(\alpha^2 := \frac1n(\|\theta\|_1^2 - \|\theta\|_2^2)\). We can rescale \(\theta\) to \(\sqrt{\lambda} \theta / \alpha\) to get a desired expected average degree \(\lambda\):

```
n = 2000 # number of nodes
lambda = 5 # expected average degree

# Generate from the degree-corrected Erdős–Rényi model
set.seed(1234)
theta <- EnvStats::rpareto(n, 2/3, 3)
alpha2 = (sum(theta)^2 - sum(theta^2))/n
theta = theta * sqrt(lambda/alpha2)

A = nett::sample_dcer(theta) # sample the adjacency matrix
mean(Matrix::rowSums(A))
#> [1] 4.931
hist(Matrix::rowSums(A), 15, main = "Degree distribution", xlab = "degree")
```

```
z = sample(Ktru, n, replace=T, prob=h) # randomly sample "true community labels"
tht = d[z] * theta
A = nett::sample_dcsbm(z, P, tht) # sample the adjacency matrix
hist(Matrix::rowSums(A), 15, main = "Degree distribution", xlab = "degree")
```

```
nrep = 10
degs1 = degs2 = rep(0, n)
for (r in 1:nrep) {
  degs1 = degs1 + Matrix::rowSums(nett::sample_dcer(theta))
  z = sample(Ktru, n, replace=T, prob=h) # randomly sample "true community labels"
  degs2 = degs2 + Matrix::rowSums(nett::sample_dcsbm(z, P, d[z] * theta))
}
degs1 = degs1 / nrep
degs2 = degs2 / nrep
# cbind(degs1, degs2)
mean(abs(degs1 - degs2)/(degs1 + degs2)/2)
#> [1] 0.04304521
```

Let us now see how the NAC tests perform for separating these two models. We also compare with the likelihood ratio (LR) and adjusted spectral (AS) tests. Let us define the methods

```
K_H0 = 1
K_H1 = Ktru
apply_methods = function(A) {
  z0 = nett::spec_clust(A, K_H0)
  z0p = nett::spec_clust(A, K_H0+1)
  z1 = nett::spec_clust(A, K_H1)
  tibble::tribble(
    ~method, ~tstat, ~twosided,
    "NAC+", nett::nac_test(A, K_H0, z=z0, y=z0p)$stat, F,
    "SNAC+", nett::snac_test(A, K_H0, z=z0)$stat, F,
    "AS", nett::adj_spec_test(A, K_H0, z=z0), F,
    "LR", nett::eval_dcsbm_loglr(A, cbind(z0, z1), poi=T), F
  )
}
```

as well as the data generation functions for the null and alternative hypotheses:

```
gen_null_data = function() {
  nett::sample_dcer(theta)
}

gen_alt_data = function() {
  z = sample(Ktru, n, replace=T, prob=h)
  nett::sample_dcsbm(z, P, d[z] * theta)
}
```

We can now simulate the ROC:

```
# try reducing nruns for faster but cruder estimation of the ROC curves
roc_res = nett::simulate_roc(apply_methods,
                             gen_null_data = gen_null_data,
                             gen_alt_data = gen_alt_data,
                             nruns = 100, core_count = 1)
nett::printf('time = %3.3f', roc_res$elapsed_time)
#> time = 75.238
```

Here are the resulting ROCs:

```
plot_res <- roc_res$roc
plot_res$method <- factor(plot_res$method, levels = c("NAC+","SNAC+","AS", "LR"))
nett::plot_roc(plot_res)
```

The setup we have considered is quite difficult, with oir = \(0.15\) and \(\lambda = 5\). You can try changing oir and \(\lambda\) and rerunning this experiment. The following is observed:

- For oir = 0.1 and \(\lambda = 5\), NAC+ and LR are perfect, SNAC+ is almost perfect and AS is close to random guessing.
- For oir = 0.2 and \(\lambda = 5\), NAC+ is perfect, LR deteriorates to some extent. SNAC+ and AS are almost like random guessing.
- For oir = 0.1 and \(\lambda = 10\), all are perfect.

Bradley, Andrew M. 2010. “Algorithms for the Equilibration of
Matrices and Their Application to Limited-Memory Quasi-Newton
Methods.” Stanford University, CA.

Idel, Martin. 2016. “A Review of Matrix Scaling and Sinkhorn’s
Normal Form for Matrices and Positive Maps.” *arXiv Preprint
arXiv:1609.06349*.

Jin, Jiashun, Zheng Tracy Ke, and Shengming Luo. 2019. “Optimal
Adaptivity of Signed-Polygon Statistics for Network Testing.”
*arXiv Preprint*. https://arxiv.org/abs/1904.09532v2.

Knight, Philip A. 2008. “The Sinkhorn–Knopp Algorithm: Convergence
and Applications.” *SIAM Journal on Matrix Analysis and
Applications* 30 (1): 261–75.

## Comments

Theorem 3.2 of Jin, Ke, and Luo (2019) states that under the condition that \(\|\theta\|_\infty \lesssim 1\) and \(\min_k h_k \gtrsim 1\), if \[\|\theta\|\cdot |\mu_2(P)| \to 0, \] the two hypotheses are indistinguishable (in the sense that their \(\chi^2\)-distance goes to zero). Here \(\mu_2(P)\) is the second largest eigenvalue of \(P\) in magnitude. In our setup, \(\mu_2(P) \asymp 1\), so the condition is essentially \(\|\theta\| \to 0\).

Note that the expected average degree of the null model is \[\lambda \sim \frac{\|\theta\|_1^2}{n}.\] We have \(\|\theta\|_2 \ge \frac{\|\theta\|_1}{\sqrt n}\), hence, in the worst case, \(\|\theta\|_2 = \frac{\|\theta\|_1}{\sqrt n} \asymp \sqrt{\lambda}\) and the condition is equivalent to \(\sqrt{\lambda} \to 0\), that is, the expected average degree \(\lambda\) vanishes asymptotically. It is quite reasonable that most methods would fail when \(\lambda \to 0\).

In all other cases, the situation is better. In particular, the above experiment shows that even for a rather small \(\lambda = 5\) and a fairly large \(n = 2000\), the NAC tests can still be perfect for this problem.