`socialmixr` is an `R` package to derive social mixing matrices from survey data. These are particularly useful for age-structured infectious disease models. For background on age-specific mixing matrices and what data inform them, see, for example, the paper on POLYMOD by Mossong et al.

The latest stable version of the `socialmixr` package can be installed via

`install.packages('socialmixr')`

The latest development version of the `socialmixr` package can be installed via

`devtools::install_github('sbfnk/socialmixr')`

To load the package, use

`library('socialmixr')`

```
#>
#> Attaching package: 'socialmixr'
#> The following object is masked from 'package:utils':
#>
#> cite
```

At the heart of the `socialmixr` package is the `contact_matrix` function, which extracts a contact matrix from survey data. You can use the `R` help to find out about usage of the `contact_matrix` function, including a list of examples:

`?contact_matrix`

The POLYMOD data are included with the package and can be loaded using

`data(polymod)`

An example use would be

```
contact_matrix(polymod, countries = "United Kingdom", age.limits = c(0, 1, 5, 15))
#> Using POLYMOD social contact data. To cite this in a publication, use the 'cite' function
#> Removing participants that have contacts without age information. To change this behaviour, set the 'missing.contact.age' option
#> $matrix
#> contact.age.group
#> age.group [0,1) [1,5) [5,15) 15+
#> [0,1) 0.40000000 0.8000000 1.266667 5.933333
#> [1,5) 0.11250000 1.9375000 1.462500 5.450000
#> [5,15) 0.02450980 0.5049020 7.946078 6.215686
#> 15+ 0.03230337 0.3581461 1.290730 9.594101
#>
#> $participants
#> age.group participants proportion
#> 1: [0,1) 15 0.01483680
#> 2: [1,5) 80 0.07912957
#> 3: [5,15) 204 0.20178042
#> 4: 15+ 712 0.70425321
```

This generates a contact matrix from the UK part of the POLYMOD study, with age groups 0-1, 1-5, 5-15 and 15+ years. It contains the mean number of contacts that each member of an age group (row) has reported with members of the same or another age group (column).

The `contact_matrix` function requires a survey given as a list of two elements, both data frames: `participants` and `contacts`. They must be linked by an ID column that identifies the queried participants (by default `global_id`, but this can be changed using the `id.column` argument). The `participants` data frame must, as a minimum, have the ID column and a column denoting participant age (set by the `part.age.column` argument, by default `participant_age`). The `contacts` data frame, similarly, must have the ID column and a column denoting contact age (set by the `contact.age.column` argument, by default `cnt_age_mean`).

The function then either randomly samples participants (if `bootstrap` is set to `TRUE`) or takes all participants in the survey, and determines the mean number of contacts in each age group given the age group of the participant. The age groups can be set using the `age.limits` argument, which should be set to the lower limits of the age groups (e.g., `age.limits = c(0, 5, 10)` for age groups 0-5, 5-10, 10+). If these are not given, the narrowest age groups possible given survey and demographic data are used.
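To make the averaging step concrete, here is a minimal base R sketch of how lower age limits translate into half-open age groups and how a mean-contacts matrix follows from them. The toy data are invented for illustration; only the column names (`global_id`, `participant_age`, `cnt_age_mean`) are taken from the defaults described above. This is not the package's implementation, just the idea behind it.

```r
# Hypothetical toy survey, using the default column names described above
participants <- data.frame(global_id = 1:4, participant_age = c(3, 7, 12, 30))
contacts <- data.frame(
  global_id = c(1, 1, 2, 3, 3, 4),
  cnt_age_mean = c(4, 35, 8, 2, 11, 40)
)

# Lower age limits, as one would pass to age.limits; the last group is open-ended
limits <- c(0, 5, 10)
breaks <- c(limits, Inf)

# right = FALSE gives half-open groups [0,5), [5,10), [10,Inf)
participants$age.group <- cut(participants$participant_age, breaks, right = FALSE)
contacts$contact.age.group <- cut(contacts$cnt_age_mean, breaks, right = FALSE)

# Count contacts by (participant group, contact group) ...
merged <- merge(participants, contacts, by = "global_id")
counts <- table(merged$age.group, merged$contact.age.group)

# ... and divide each row by the number of participants in that group
n_part <- as.vector(table(participants$age.group))
mean_contacts <- counts / n_part
mean_contacts
```

Each row of `mean_contacts` is a participant age group and each column a contact age group, mirroring the layout of the `$matrix` element shown earlier.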

The key argument to the `contact_matrix` function is the `survey` that it is supposed to use. The `socialmixr` package includes the POLYMOD survey, which will be used if no survey is specified. It also provides access to all surveys in the Social contact data community on Zenodo. The available surveys can be listed (if an internet connection is available) with

`list_surveys()`

A survey can be downloaded using the `get_survey` command. This will get the relevant data of a survey given its Zenodo DOI (as returned by `list_surveys`). All other relevant commands in the `socialmixr` package accept a DOI, but if a survey is to be used repeatedly it is worth downloading it and storing it locally, to avoid the need for a network connection and to speed up processing.

```
peru_survey <- get_survey("https://doi.org/10.5281/zenodo.1095664")
saveRDS(peru_survey, "peru.rds")
```

This way, the `peru` data set can be loaded in the future without the need for an internet connection using

`peru_survey <- readRDS("peru.rds")`

Some surveys may contain data from multiple countries. To check this, use the `survey_countries` function:

```
survey_countries(polymod)
#> Using POLYMOD social contact data. To cite this in a publication, use the 'cite' function
#> [1] "Italy" "Germany" "Luxembourg" "Netherlands"
#> [5] "Poland" "United Kingdom" "Finland" "Belgium"
```

To get a contact matrix for one or more specific countries, pass a `countries` argument to `contact_matrix`. If this is not done, the different surveys contained in a dataset are combined as if they were one single sample (i.e., without applying any population-weighting by country or other correction).

By default, `socialmixr` uses the POLYMOD survey. A reference for any given survey can be obtained using `cite`, e.g.

```
cite(polymod)
#> Using POLYMOD social contact data. To cite this in a publication, use the 'cite' function
#>
#> To cite POLYMOD social contact data in publications use:
#>
#> Mossong J, Hens N, Jit M, Beutels P, Auranen K, Mikolajczyk R, Massari
#> M, Salmaso S, Tomba GS, Wallinga J, Heijne J, Sadkowska-Todys M,
#> Rosinska M, Edmunds WJ (2017). "POLYMOD social contact data." doi:
#> 10.5281/zenodo.1157934 (URL: https://doi.org/10.5281/zenodo.1157934),
#> Version 1.1.
#>
#> A BibTeX entry for LaTeX users is
#>
#> @Misc{,
#> title = {POLYMOD social contact data},
#> author = {Joel Mossong and Niel Hens and Mark Jit and Philippe Beutels and Kari Auranen and Rafael Mikolajczyk and Marco Massari and Stefania Salmaso and Gianpaolo Scalia Tomba and Jacco Wallinga and Janneke Heijne and Malgorzata Sadkowska-Todys and Magdalena Rosinska and W. John Edmunds},
#> doi = {https://doi.org/10.5281/zenodo.1157934},
#> note = {Version 1.1},
#> year = {2017},
#> }
```

To get an idea of the uncertainty of the contact matrices, a bootstrap can be used. If an argument `n` greater than 1 is passed to `contact_matrix`, multiple samples of contact matrices are generated. For each sample, participants are sampled (with replacement, to get the same number of participants as in the original study), and contacts are sampled from the set of all the contacts of all the participants (again, with replacement). All resulting contact matrices are returned as the `matrices` field in the returned list. From these, derived quantities can be obtained, for example the mean:

```
m <- contact_matrix(polymod, countries = "United Kingdom", age.limits = c(0, 1, 5, 15), n = 5)
#> Using POLYMOD social contact data. To cite this in a publication, use the 'cite' function
length(m$matrices)
#> [1] 5
mr <- Reduce("+", lapply(m$matrices, function(x) {x$matrix})) / length(m$matrices)
mr
#> contact.age.group
#> age.group [0,1) [1,5) [5,15) 15+
#> [0,1) 0.48757958 0.7366087 1.198042 6.252502
#> [1,5) 0.09955107 2.3929449 1.475027 5.465488
#> [5,15) 0.02275334 0.5021818 7.860182 6.159409
#> 15+ 0.03291598 0.3894526 1.330329 9.516813
```
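Beyond the mean, the same list of bootstrap matrices can be summarised element-wise, for instance with quantiles. The sketch below uses a made-up stand-in (`toy_matrices`, filled with random numbers) that only mimics the structure described above, a list whose elements each carry a `matrix`, so that it runs without the package or the survey data.

```r
set.seed(1)

# Hypothetical stand-in for m$matrices: 5 bootstrap samples of a 2 x 2 matrix
toy_matrices <- lapply(1:5, function(i) {
  list(matrix = matrix(runif(4, min = 0, max = 10), nrow = 2))
})

# Stack the matrices into a 2 x 2 x 5 array, one slice per bootstrap sample
arr <- simplify2array(lapply(toy_matrices, function(x) x$matrix))

# Element-wise summaries across the bootstrap samples
mean_matrix <- apply(arr, c(1, 2), mean)
lower <- apply(arr, c(1, 2), quantile, probs = 0.025)
upper <- apply(arr, c(1, 2), quantile, probs = 0.975)
```

With real output, replacing `toy_matrices` by `m$matrices` should give the same `mean_matrix` as the `Reduce` call above; with only a handful of samples the quantiles are of course very rough.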

Obtaining symmetric contact matrices or splitting out their components (see below) requires information about the underlying demographic composition of the survey population. This can be passed to `contact_matrix` as the `survey.pop` argument, a `data.frame` with two columns, `lower.age.limit` (denoting the lower end of the age groups) and `population` (denoting the number of people in each age group). If no `survey.pop` is given, `contact_matrix` will try to obtain the age structure of the population (as per the `countries` argument) from the World Population Prospects of the United Nations, using estimates from the year that most closely matches the year in which the contact survey was conducted.

If demographic information is used, this is returned by `contact_matrix` as the `demography` field in the results list.

Conceivably, contact matrices should be symmetric: the total number of contacts made by members of one age group with those of another should be the same as vice versa. Mathematically, if \(c_{ij}\) is the mean number of contacts made by members of age group \(i\) with members of age group \(j\), and the total number of people in age group \(i\) is \(N_i\), then

\[c_{ij} N_i = c_{ji}N_j\]

Because of variation in the sample from which the contact matrix is obtained, this relationship is usually not fulfilled exactly. In order to obtain a symmetric contact matrix that fulfills it, one can use

\[c'_{ij} = \frac{1}{2N_i} (c_{ij} N_i + c_{ji} N_j)\]

To get this version of the contact matrix, use `symmetric = TRUE` when calling the `contact_matrix` function.

```
contact_matrix(polymod, countries = "United Kingdom", age.limits = c(0, 1, 5, 15), symmetric = TRUE)
#> Using POLYMOD social contact data. To cite this in a publication, use the 'cite' function
#> Removing participants that have contacts without age information. To change this behaviour, set the 'missing.contact.age' option
#> Warning in pop_age(survey.pop, age.limits, ...): Not all age groups represented in population data (5-year age band).
#> Linearly estimating age group sizes from the 5-year bands.
#> $matrix
#> contact.age.group
#> [0,1) [1,5) [5,15) 15+
#> [1,] 0.40000000 0.6250000 0.7643524 4.122001
#> [2,] 0.15625000 1.9375000 1.4059984 5.927286
#> [3,] 0.07149388 0.5260415 7.9460784 7.425725
#> [4,] 0.05762596 0.3314560 1.1098739 9.594101
#>
#> $demography
#> age.group population proportion year
#> 1: [0,1) 690312 0.01146507 2005
#> 2: [1,5) 2761248 0.04586028 2005
#> 3: [5,15) 7380235 0.12257488 2005
#> 4: 15+ 49378217 0.82009977 2005
#>
#> $participants
#> age.group participants proportion
#> 1: [0,1) 15 0.01483680
#> 2: [1,5) 80 0.07912957
#> 3: [5,15) 204 0.20178042
#> 4: 15+ 712 0.70425321
```
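The symmetrisation formula can be checked by hand against the numbers printed in this document. The sketch below is a plain base R re-implementation of the formula (the helper name `symmetrise` is made up here, not a package function), applied to the unsymmetrised UK matrix from the first example and the population sizes from the `$demography` table above.

```r
# Unsymmetrised UK contact matrix, copied from the first example (rows: participants)
c_mat <- matrix(
  c(0.40000000, 0.8000000, 1.266667, 5.933333,
    0.11250000, 1.9375000, 1.462500, 5.450000,
    0.02450980, 0.5049020, 7.946078, 6.215686,
    0.03230337, 0.3581461, 1.290730, 9.594101),
  nrow = 4, byrow = TRUE
)

# Population sizes N_i, copied from the $demography table above
N <- c(690312, 2761248, 7380235, 49378217)

# c'_ij = (c_ij N_i + c_ji N_j) / (2 N_i)
symmetrise <- function(c_mat, N) {
  total <- c_mat * N            # row i is multiplied by N[i]
  (total + t(total)) / (2 * N)  # row i is divided by 2 * N[i]
}

sym <- symmetrise(c_mat, N)
sym[1, 2]  # approximately 0.625, as in the symmetric output above
sym[2, 1]  # approximately 0.15625

# Reciprocity now holds: c'_ij N_i equals c'_ji N_j
all.equal(sym * N, t(sym * N))
```

For the first two age groups the arithmetic is easy to follow because \(N_2 = 4 N_1\): \(c'_{12} = \frac{1}{2}(0.8 + 4 \times 0.1125) = 0.625\).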

The `contact_matrix` function contains a simple model for the elements of the contact matrix, by which it is split into a *global* component, as well as three components representing *contacts*, *assortativity* and *demography*. In other words, the elements \(c_{ij}\) of the contact matrix are modelled as

\[ c_{ij} = q d_i a_{ij} n_j \]

where \(q d_i\) is the number of contacts that a member of group \(i\) makes across age groups, \(a_{ij}\) is the assortativity component, and \(n_j\) is the proportion of the surveyed population in age group \(j\). The constant \(q\) is set to the value of the largest eigenvalue of \(c_{ij}\); if used in an infectious disease model, it can be replaced by the basic reproduction number \(R_0\).

To model the contact matrix in this way with the `contact_matrix` function, set `split = TRUE`. The components of the matrix are returned as elements `normalisation` (\(q\)), `contacts` (\(d_i\)), `matrix` (\(a_{ij}\)) and `demography` (\(n_j\)) of the resulting list.

```
contact_matrix(polymod, countries = "United Kingdom", age.limits = c(0, 1, 5, 15), split = TRUE)
#> Using POLYMOD social contact data. To cite this in a publication, use the 'cite' function
#> Removing participants that have contacts without age information. To change this behaviour, set the 'missing.contact.age' option
#> Warning in pop_age(survey.pop, age.limits, ...): Not all age groups represented in population data (5-year age band).
#> Linearly estimating age group sizes from the 5-year bands.
#> $mean.contacts
#> [1] 11.55495
#>
#> $normalisation
#> [1] 1.03915
#>
#> $contacts
#> [1] 0.6995727 0.7464190 1.2235173 0.9390331
#>
#> $matrix
#> [,1] [,2] [,3] [,4]
#> [1,] 4.1534023 2.0767011 1.2302166 0.8612967
#> [2,] 1.0948299 4.7138509 1.3312672 0.7414821
#> [3,] 0.1455146 0.7494002 4.4126024 0.5159003
#> [4,] 0.2498871 0.6926217 0.9339135 1.0375529
#>
#> $demography
#> age.group population proportion year
#> 1: [0,1) 690312 0.01146507 2005
#> 2: [1,5) 2761248 0.04586028 2005
#> 3: [5,15) 7380235 0.12257488 2005
#> 4: 15+ 49378217 0.82009977 2005
#>
#> $participants
#> age.group participants proportion
#> 1: [0,1) 15 0.01483680
#> 2: [1,5) 80 0.07912957
#> 3: [5,15) 204 0.20178042
#> 4: 15+ 712 0.70425321
```
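As a sanity check, the components can be multiplied back together using the values printed above. Judging purely from these numbers, the product reproduces the original UK contact matrix only when the `mean.contacts` element is included as an additional factor alongside `normalisation`; treat that as an observation about this particular output rather than a documented property of the package. All values below are copied from the output above; the variable names are chosen here for illustration.

```r
# Components copied from the split = TRUE output above
mean_contacts <- 11.55495                                  # $mean.contacts
q <- 1.03915                                               # $normalisation
d <- c(0.6995727, 0.7464190, 1.2235173, 0.9390331)         # $contacts (d_i)
a <- matrix(
  c(4.1534023, 2.0767011, 1.2302166, 0.8612967,
    1.0948299, 4.7138509, 1.3312672, 0.7414821,
    0.1455146, 0.7494002, 4.4126024, 0.5159003,
    0.2498871, 0.6926217, 0.9339135, 1.0375529),
  nrow = 4, byrow = TRUE
)                                                          # $matrix (a_ij)
n <- c(0.01146507, 0.04586028, 0.12257488, 0.82009977)     # $demography proportion (n_j)

# outer(d, n)[i, j] is d_i * n_j, so this is the element-wise product d_i a_ij n_j,
# scaled by the two global factors
recomposed <- mean_contacts * q * a * outer(d, n)
round(recomposed, 4)
```

Up to rounding of the printed values, `recomposed` agrees with the matrix returned in the first UK example, e.g. its first row is close to 0.4, 0.8, 1.2667, 5.9333.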

The contact matrices can be plotted, for example, using the `geom_tile` function of the `ggplot2` package.

```
library("reshape2")
library("ggplot2")
df <- melt(mr, varnames = c("age1", "age2"), value.name = "contacts")
ggplot(df, aes(x = age2, y = age1, fill = contacts)) + theme(legend.position = "bottom") + geom_tile()
```