Simulation from the bivariate negative binomial and bi- and trivariate logarithmic series distribution

The multivariate negative binomial distribution.

Recall that a random variable $$X$$ has (univariate) negative binomial law with parameters $$\kappa>0, 0<p<1$$, i.e., $$X\sim \text{NB}(\kappa, p)$$ if its probability mass function is given by $P(X=x) = {\kappa+x-1 \choose x} p^x(1-p)^{\kappa}, \quad x \in \{0,1,\ldots\}.$

A random vector $${\bf X}=(X_1,\dots,X_n)'$$ is said to follow the multivariate negative binomial distribution with parameters $$\kappa, p_1, \dots, p_n$$ if its probability mass function is given by $P({\bf X}={\bf x})=\frac{\Gamma(x_1+\cdots+x_n+\kappa)}{x_1! \cdots x_n! \Gamma(\kappa)}p_1^{x_1}\cdots p_n^{x_n}(1-p_1-\cdots-p_n)^{\kappa},$ where, for $$i=1,\dots,n$$, $$x_i\in\{0,1,\dots\}$$, $$0<p_i<1$$ such that $$\sum_{i=1}^np_i<1$$ and $$\kappa>0$$.

We note that the function stats::rnbinom can be used to simulate from the univariate negative binomial distribution. The trawl package introduces the function Bivariate_NBsim which generates samples from the bivariate negative binomial distribution. The simulation algorithm proceeds in two steps: First, we simulate $$X_1$$ from the univariate negative binomial distribution NB($$\kappa$$,$$p_1/(1-p_2)$$). Second, we simulate $$X_2|X_1=x_1$$ from the univariate negative binomial distribution NB($$\kappa+x_1,p_2$$), see for instance (Dunn 1967).

Example

set.seed(1)
kappa<- 3
p1 <- 0.1
p2 <- 0.85
p <- p1+p2
p0 <-1-p1-p2

N<- 10000

#Simulate from the bivariate negative binomial distribution
y <- trawl::Bivariate_NBsim(N,kappa,p1,p2)

#Compare the empirical and theoretical mean of the first component
base::mean(y[,1])
#>  5.9653
kappa*p1/(1-p)
#>  6

#Compare the empirical and theoretical variance of the first component
stats::var(y[,1])
#>  18.0047
kappa*p1*(1-p2)/(1-p)^2
#>  18

#Compare the empirical and theoretical mean of the second component
base::mean(y[,2])
#>  50.9742
kappa*p2/(1-p)
#>  51

#Compare the empirical and theoretical variance of the second component
stats::var(y[,2])
#>  926.3358
kappa*p2*(1-p1)/(1-p)^2
#>  918

#Compare the empirical and theoretical correlation between the two components
stats::cor(y[,1],y[,2])
#>  0.7891007
(p1*p2/(p0+p1)*(p0+p2))^(1/2)
#>  0.7141428

The multivariate logarithmic series distribution

We say that a vector $${\bf X}=(X_1,\dots,X_n)'$$ follows the multivariate logarithmic series distribution (LSD), see, e.g., (Patil and Bildikar 1967). $${\bf X} \sim \text{LSD}(p_1,\ldots,p_n)$$, where $$0<p_i<1, p:=\sum_{i=1}^np_i <1$$ if for $${\bf x}\in \mathbb{N}_0^n \setminus \{{\bf 0} \}$$, if its probability mass function is given by $P({\bf X}={\bf x})=\frac{\Gamma(x_1+\cdots+x_n)}{x_1! \cdots x_n!}\frac{p_1^{x_1}\cdots p_n^{x_n}}{\{-\ln(1-p)\}}.$ Note that each component $$X_i$$ follows the modified univariate logarithmic distribution with parameters $$\tilde p_i = p_i/(1-p+p_i)$$ and $$\delta_i= \ln(1-p+p_i)/\ln(1-p)$$, i.e., $$X_i\sim\text{ModLSD}(\tilde p_i, \delta_i)$$ with $P(X_i =x_i) = \left\{ \begin{array}{ll} \delta_i, & \text{for } x_i=0\\ (1-\delta_i) \frac{1}{x_i}\frac{\tilde p_i^{x_i}}{\{-\ln(1-\tilde p_i)\}}, & \text{for } x_i \in \mathbb{N}. \end{array} \right.$ Simulations from the univariate LSD can be carried out using the function Runuran::urlogarithmic. The trawl package implements the functions Bivariate_LSDsim and Trivariate_LSDsim to simulate from both the bivariate and the trivariate logarithmic series distribution.

Simulating from the bivariate logarithmic series distribution

The function Bivariate_NBsim can be used to simulate from the bivariate logarithmic series distribution. To this end, note that the probability mass function of a random vector $${\bf X}=(X_1,X_2)'$$ following the bivariate logarithmic series distribution with parameters $$0<p_1, p_2<1$$ with $$p:=p_1+p_2<1$$ is given by $P(X_1=x_1,X_2=x_2)=\frac{\Gamma(x_1+x_2)}{x_1!x_2!} \frac{p_1^{x_1}p_2^{x_2}}{\{-\ln(1-p)\}},$ for $$x_1,x_2=0,1,2,\dots$$ such that $$x_1+x_2>0$$.

The simulation proceeds in two steps: First, $$X_1$$ is simulated from the modified logarithmic distribution with parameters $$\tilde p_1=p_1/(1-p_2)$$ and $$\delta_1=\ln(1-p_2)/\ln(1-p)$$. Then we simulate $$X_2$$ conditional on $$X_1$$. We note that $$X_2|X_1=x_1$$ follows the logarithmic series distribution with parameter $$p_2$$ when $$x_1=0$$, and the negative binomial distribution with parameters $$(x_1,p_2)$$ when $$x_1>0$$.

Example

Next we provide an example of a simulation from the bivariate LSD and we showcase the functions ModLSD_Mean, ModLSD_Var, BivLSD_Cor and BivLSD_Cov which compute the mean and the variance of the univariate modified LSD and the correlation and covariance of the bivariate LSD, respectively.

set.seed(1)
p1<-0.15
p2<-0.3

N<-10000

#Simulate N realisations from the bivariate LSD
y<-trawl::Bivariate_LSDsim(N, p1, p2)

#Compute the empirical and theoretical mean of the first component
base::mean(y[,1])
#>  0.4575
trawl::ModLSD_Mean(base::log(1-p2)/base::log(1-p1-p2),p1/(1-p2))
#>  0.45619

#Compute the empirical and theoretical mean of the second component
base::mean(y[,2])
#>  0.8995
trawl::ModLSD_Mean(base::log(1-p1)/base::log(1-p1-p2),p2/(1-p1))
#>  0.91238

#Compute the empirical and theoretical variance of the first component
stats::var(y[,1])
#>  0.3686306
trawl::ModLSD_Var(base::log(1-p2)/base::log(1-p1-p2),p1/(1-p2))
#>  0.3724961

#Compute the empirical and theoretical variance of the second component
stats::var(y[,2])
#>  0.5690567
trawl::ModLSD_Var(base::log(1-p1)/base::log(1-p1-p2),p2/(1-p1))
#>  0.5776045

##Compute the empirical and theoretical correlation between the two components
stats::cor(y[,1],y[,2])
#>  -0.3961486
trawl::BivLSD_Cor(p1,p2)
#>  -0.3608673

##Compute the empirical and theoretical covariance between the two components
stats::cov(y[,1],y[,2])
#>  -0.1814394
trawl::BivLSD_Cov(p1,p2)
#>  -0.1673877

Simulating from the trivariate logarithmic series distribution

The function Trivariate_NBsim can be used to simulate from the trivariate logarithmic series distribution. The simulation proceeds in two steps: First, $$X_1$$ is simulated from the modified logarithmic distribution with parameters $$\tilde p_1=p_1/(1-p_2-p_3)$$ and $$\delta_1=\ln(1-p_2-p_3)/\ln(1-p)$$. Then we simulate $$(X_2,X_3)'$$ conditional on $$X_1$$. We note that $$(X_2,X_3)'|X_1=x_1$$ follows the bivariate logarithmic series distribution with paramaters $$(p_2,p_3)$$ when $$x_1=0$$, and the bivariate negative binomial distribution with parameters $$(x_1,p_2,p_3)$$ when $$x_1>0$$.

Example

set.seed(1)
p1<-0.15
p2<-0.25
p3<-0.55

N<- 10000

#Simulate N realisations from the bivariate LSD
y<-trawl::Trivariate_LSDsim(N, p1, p2, p3)

#Compute the empirical and theoretical mean of the first component
base::mean(y[,1])
#>  1.0152
trawl::ModLSD_Mean(base::log(1-p2-p3)/base::log(1-p1-p2-p3),p1/(1-p2-p3))
#>  1.001425

#Compute the empirical and theoretical mean of the second component
base::mean(y[,2])
#>  1.6857
trawl::ModLSD_Mean(base::log(1-p1-p3)/base::log(1-p1-p2-p3),p2/(1-p1-p3))
#>  1.669041

#Compute the empirical and theoretical mean of the third component
base::mean(y[,3])
#>  3.7275
trawl::ModLSD_Mean(base::log(1-p1-p2)/base::log(1-p1-p2-p3),p3/(1-p1-p2))
#>  3.67189

#Compute the empirical and theoretical variance of the first component
stats::var(y[,1])
#>  2.999069
trawl::ModLSD_Var(base::log(1-p2-p3)/base::log(1-p1-p2-p3),p1/(1-p2-p3))
#>  3.002847

#Compute the empirical and theoretical variance of the second component
stats::var(y[,2])
#>  7.255041
trawl::ModLSD_Var(base::log(1-p1-p3)/base::log(1-p1-p2-p3),p2/(1-p1-p3))
#>  7.228548

#Compute the empirical and theoretical variance of the third component
stats::var(y[,3])
#>  30.87613
trawl::ModLSD_Var(base::log(1-p1-p2)/base::log(1-p1-p2-p3),p3/(1-p1-p2))
#>  30.5799

#Computing the bivariate covariances and correlations
#Cor(X1,X2):
delta <- base::log(1-p3)/base::log(1-p1-p2-p3)
hatp1 <-p1/(1-p3)
hatp2<-p2/(1-p3)

stats::cov(y[,1],y[,2])
#>  3.316309
trawl::BivModLSD_Cov(delta,hatp1,hatp2)
#>  3.335704

stats::cor(y[,1],y[,2])
#>  0.7109545
trawl::BivModLSD_Cor(delta,hatp1,hatp2)
#>  0.6041258

#Cor(X1,X3):
delta <- log(1-p2)/log(1-p1-p2-p3)
hatp1 <-p1/(1-p2)
hatp2<-p3/(1-p2)

stats::cov(y[,1],y[,3])
#>  7.287671
trawl::BivModLSD_Cov(delta,hatp1,hatp2)
#>  7.338549

stats::cor(y[,1],y[,3])
#>  0.7573281
trawl::BivModLSD_Cor(delta,hatp1,hatp2)
#>  0.7220044

#Cor(X2,X3):
delta <- log(1-p1)/log(1-p1-p2-p3)
hatp1 <-p2/(1-p1)
hatp2<-p3/(1-p1)

stats::cov(y[,2],y[,3])
#>  12.31899
trawl::BivModLSD_Cov(delta,hatp1,hatp2)
#>  12.23092
stats::cor(y[,2],y[,3])
#>  0.8230829
trawl::BivModLSD_Cor(delta,hatp1,hatp2)
#>  0.7969081