`BFI`

## Bayesian Federated Inference

Because the available data sets are often small, especially in rare diseases, it is sometimes challenging to identify the most relevant predictive features using multivariable statistical analysis. This issue may be resolved by combining data from multiple centers in one centralized location, but doing so is difficult in practice because privacy and security concerns often prevent centers from sharing their data.

To address these challenges, we developed and implemented a Bayesian Federated Inference (BFI) framework for multicenter data. It aims to leverage the statistical power of larger (combined) data sets without requiring all the data to be aggregated in one location. In the BFI framework, each center uses its own local data to infer the optimal parameter values as well as additional features of the posterior parameter distribution, which capture information that alternative techniques do not. One benefit of BFI over alternative approaches is that only one inference cycle across the centers is required.
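The one-shot idea can be illustrated with a toy case. The following is a minimal base-R sketch, not the package's implementation. It assumes a Gaussian linear model with known error variance 1, the same zero-mean Gaussian prior with inverse covariance `Lambda` at every center and for the combined data, and a combination rule in which the local curvature matrices are summed and then corrected for the prior being counted more than once. The helper names `simulate_center()` and `local_map()` are hypothetical. Under these assumptions the combined estimate coincides with the MAP estimate computed from the pooled data, even though only the local estimates and curvature matrices leave each center.

```r
set.seed(1)
p <- 3                                  # number of covariates (without intercept)
theta_true <- c(1, 2, 2, 2)             # intercept and slopes used for simulation
Lambda <- diag(0.05, p + 1)             # assumed prior inverse covariance (same everywhere)

# Hypothetical helper: simulate one center's data (design matrix includes intercept)
simulate_center <- function(n) {
  X <- cbind(1, matrix(rnorm(n * p), n, p))
  y <- as.vector(X %*% theta_true + rnorm(n))   # error variance fixed at 1
  list(X = X, y = y)
}

# Hypothetical helper: local MAP estimate and curvature matrix of the log posterior
local_map <- function(d) {
  A <- crossprod(d$X) + Lambda                  # X'X + Lambda
  theta <- solve(A, crossprod(d$X, d$y))        # (X'X + Lambda)^{-1} X'y
  list(theta = as.vector(theta), A = A)
}

c1 <- simulate_center(30)
c2 <- simulate_center(50)
f1 <- local_map(c1)
f2 <- local_map(c2)

# Central step: uses only the local summaries, never the raw data.
# Sum the curvatures, add the combined prior once, subtract the two local priors.
A_comb <- f1$A + f2$A + Lambda - 2 * Lambda
theta_comb <- as.vector(solve(A_comb, f1$A %*% f1$theta + f2$A %*% f2$theta))

# Reference: MAP estimate from the pooled data (not available in a federated setting)
X_all <- rbind(c1$X, c2$X)
y_all <- c(c1$y, c2$y)
theta_pooled <- as.vector(solve(crossprod(X_all) + Lambda, crossprod(X_all, y_all)))

all.equal(theta_comb, theta_pooled)
```

In this special case the agreement is exact up to floating-point error; for other models a combination of this kind can in general only be expected to be approximate.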

An R package called `BFI` has been created to perform Bayesian Federated Inference. The following instructions will install the development version of the `BFI` package on a computer. Python and SAS users can also apply the BFI methodology in their respective environments. For instructions, see here for Python and here for SAS.

First, you need to install R and RStudio:

- Install R
- Install RStudio Desktop (once you have R installed)

For more details about installing R and RStudio, see this page. If you need help learning R, see RStudio Education.

## `BFI` package

In order to install the `BFI` package, invoke R or RStudio and then follow one of the following steps:

To install and load the `BFI` package directly from R, type the following (in the Console):

```
install.packages("BFI")
library(BFI)
```

To install the `BFI` package directly from GitHub, you need the **devtools** package, so type the following:

`if(!require(devtools)) {install.packages("devtools")}`

and then load it by typing:

`library(devtools)`

Next, install `BFI` as follows:

`devtools::install_github("hassanpazira/BFI", dependencies = TRUE, build_vignettes = TRUE, force = TRUE)`

The package can now be loaded into R and used by:

`library(BFI)`

The latest version of the `BFI` package is `2.0.1`. To check the current version of `BFI` installed in your R library, use:

`packageVersion("BFI")`
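The version check can also be done programmatically. Below is a small sketch using only base R and the `utils` package; the helper name `has_min_version()` is hypothetical and not part of `BFI`. It returns `FALSE` when the package is not installed, so it is safe to run before installation.

```r
# Hypothetical helper: TRUE if `pkg` is installed at version `required` or newer
has_min_version <- function(pkg, required) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)  # package is not installed at all
  }
  utils::packageVersion(pkg) >= package_version(required)
}

has_min_version("BFI", "2.0.1")
```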

The `BFI` package provides several functions, the most important of which are the following two:

- `MAP.estimation()`: should be used by the centers, and the result should be sent to a central server.
- `bfi()`: should be used by a central server.
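How the results travel from a center to the central server is not prescribed here. One possible approach, sketched below with base R only, is to write the two summaries to an `.rds` file and read them back at the server. The file name is hypothetical, and the numeric values are arbitrary placeholders standing in for the `theta_hat` and `A_hat` components returned by `MAP.estimation()`.

```r
# Arbitrary placeholder values standing in for fit1$theta_hat and fit1$A_hat
theta_hat1 <- c(1.02, 1.95, 2.10, 1.88, 1.40)
A_hat1 <- diag(5)

# At the center: bundle the two summaries and write them to a file
out_file <- file.path(tempdir(), "center1_estimates.rds")  # hypothetical file name
saveRDS(list(theta_hat = theta_hat1, A_hat = A_hat1), out_file)

# At the central server: read the file back; no raw data is involved
center1 <- readRDS(out_file)
str(center1)
```

Only the estimates and the curvature matrix are stored in the file, which matches the idea that raw patient-level data never leave the center.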

To access the R documentation for these functions, for example `bfi()`, enter the following command:

```
help(bfi, package = "BFI") # without loading the BFI package
# or, equivalently, after loading the BFI package
?bfi
```

Let's look at the following example to see how the `BFI` package can be used. For more examples and details, look at the `BFI` vignette by typing

`browseVignettes("BFI") # to see all vignettes from the BFI package in an HTML browser.`

or use `vignette("BFI")`, `vignette("SAS")` or `vignette("Python")` to see the `BFI`, `SAS` or `Python` vignettes separately in the Help tab of RStudio.

Now, we generate two independent (local) data sets from a Gaussian distribution, and then apply the package to see how it works. First apply the function `MAP.estimation()` to each local data set, and then apply the `bfi()` function to the aggregated results.

```
#-------------
# y ~ Gaussian
#-------------
# model assumptions:
p <- 3  # number of coefficients without intercept
theta <- c(1, rep(2, p), 1.5)  # regression coefficients (theta[1] is the intercept) and sigma2 = 1.5

#-----------------------------------
# Data simulation for local center 1
#-----------------------------------
n1 <- 30  # sample size of center 1
X1 <- data.frame(matrix(rnorm(n1 * p), n1, p))  # continuous variables
# linear predictor:
eta1 <- theta[1] + as.matrix(X1) %*% theta[2:4]
# inverse of the link function ( g^{-1}(\eta) = \mu ):
mu1 <- gaussian()$linkinv(eta1)
y1 <- rnorm(n1, mu1, sd = sqrt(theta[5]))

#-----------------------------------
# Data simulation for local center 2
#-----------------------------------
n2 <- 50  # sample size of center 2
X2 <- data.frame(matrix(rnorm(n2 * p), n2, p))  # continuous variables
# linear predictor:
eta2 <- theta[1] + as.matrix(X2) %*% theta[2:4]
# inverse of the link function:
mu2 <- gaussian()$linkinv(eta2)
y2 <- rnorm(n2, mu2, sd = sqrt(theta[5]))

#---------------------
# Load the BFI package
#---------------------
library(BFI)

#---------------------------
# Inverse Covariance Matrix
#---------------------------
# Creating the inverse covariance matrix for the Gaussian prior distribution:
Lambda <- inv.prior.cov(X1, lambda=0.05, family='gaussian')  # the same for both centers

#--------------------------
# MAP estimates at center 1
#--------------------------
fit1 <- MAP.estimation(y1, X1, family='gaussian', Lambda)
theta_hat1 <- fit1$theta_hat  # intercept and coefficient estimates
A_hat1 <- fit1$A_hat  # minus the curvature matrix

#--------------------------
# MAP estimates at center 2
#--------------------------
fit2 <- MAP.estimation(y2, X2, family='gaussian', Lambda)
theta_hat2 <- fit2$theta_hat
A_hat2 <- fit2$A_hat

#----------------------
# BFI at central center
#----------------------
A_hats <- list(A_hat1, A_hat2)
theta_hats <- list(theta_hat1, theta_hat2)
bfi <- bfi(theta_hats, A_hats, Lambda)
summary(bfi, cur_mat=TRUE)

#--------------------
# Stratified analysis
#--------------------
# Stratified analysis when 'intercept' varies across two centers:
newLambda1 <- inv.prior.cov(X1, lambda=c(0.1, 0.3), family='gaussian', stratified=TRUE, strat_par = 1)
# 'newLambda1' is used as the prior for the combined data and 'Lambda' as the prior for the local centers
bfi1 <- bfi(theta_hats, A_hats, list(Lambda, newLambda1), stratified=TRUE, strat_par=1)
summary(bfi1, cur_mat=TRUE)

# Stratified analysis when 'sigma2' varies across two centers:
newLambda2 <- inv.prior.cov(X1, lambda=c(0.1, 0.3), family='gaussian', stratified=TRUE, strat_par = 2)
# 'newLambda2' is used as the prior for the combined data and 'Lambda' as the prior for the local centers
bfi2 <- bfi(theta_hats, A_hats, list(Lambda, newLambda2), stratified=TRUE, strat_par=2)
summary(bfi2, cur_mat=TRUE)

# Stratified analysis when 'intercept' and 'sigma2' vary across 2 centers:
newLambda3 <- inv.prior.cov(X1, lambda=c(0.1, 0.2, 0.3), family='gaussian', stratified=TRUE, strat_par = c(1, 2))
# 'newLambda3' is used as the prior for the combined data and 'Lambda' as the prior for the local centers
bfi3 <- bfi(theta_hats, A_hats, list(Lambda, newLambda3), stratified=TRUE, strat_par=1:2)
summary(bfi3, cur_mat=TRUE)
```
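Because the example above draws its data with `rnorm()`, the estimates change from run to run. If reproducible output is wanted, one option is to fix the random seed before simulating. The sketch below uses base R only, and the seed value is arbitrary.

```r
set.seed(2024)          # arbitrary seed, set once before any simulation
draw_a <- rnorm(5)

set.seed(2024)          # resetting the same seed restarts the random stream
draw_b <- rnorm(5)

identical(draw_a, draw_b)  # TRUE
```

Placing a single `set.seed()` call before the first `rnorm()` call of the example would make the simulated data, and hence the reported estimates, repeatable.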

To cite `BFI` in publications, please use:

`citation("BFI")`

Here are some of the technical papers related to the package:

If you find any errors, have any suggestions, or would like to request that something be added, please file an issue at issue report or send an email to: hassan.pazira@radboudumc.nl.