Introduction to PHEindicatormethods

Georgina Anderson

2023-05-05

Introduction

This vignette introduces the following functions from the PHEindicatormethods package and provides basic sample code to demonstrate their execution. The code included is based on the code provided within the ‘examples’ section of the function documentation. This vignette does not explain the methods applied in detail but these can (optionally) be output alongside the statistics or for a more detailed explanation, please see the references section of the function documentation.

The following packages must be installed and loaded if not already available

library(PHEindicatormethods)
library(dplyr)

Package functions

This vignette covers the following functions available within the first release of the package (v1.0.8) but has been updated to apply to these functions in their latest release versions. If further functions are added to the package in future releases these will be explained elsewhere.

Function Type Description
phe_proportion Non-aggregate Performs a calculation on each row of data (unless data is grouped)
phe_rate Non-aggregate Performs a calculation on each row of data (unless data is grouped)
phe_mean Aggregate Performs a calculation on each grouping set
phe_dsr Aggregate, standardised Performs a calculation on each grouping set and requires additional reference inputs
calculate_ISRatio Aggregate, standardised Performs a calculation on each grouping set and requires additional reference inputs
calculate_ISRate Aggregate, standardised Performs a calculation on each grouping set and requires additional reference inputs

Non-aggregate functions

Create some test data for the non-aggregate functions

The following code chunk creates a data frame containing observed number of events and populations for 4 geographical areas over 2 time periods that is used later to demonstrate the PHEindicatormethods package functions:

df <- data.frame(
        area = rep(c("Area1","Area2","Area3","Area4"), 2),
        year = rep(2015:2016, each = 4),
        obs = sample(100, 2 * 4, replace = TRUE),
        pop = sample(100:200, 2 * 4, replace = TRUE))
df
#>    area year obs pop
#> 1 Area1 2015  37 104
#> 2 Area2 2015  96 149
#> 3 Area3 2015  10 161
#> 4 Area4 2015  36 184
#> 5 Area1 2016  19 152
#> 6 Area2 2016  51 187
#> 7 Area3 2016  21 189
#> 8 Area4 2016  81 136

Execute phe_proportion and phe_rate

INPUT: The phe_proportion and phe_rate functions take a single data frame as input with columns representing the numerators and denominators for the statistic. Any other columns present will be retained in the output.

OUTPUT: The functions output the original data frame with additional columns appended. By default the additional columns are the proportion or rate, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: The functions also accept additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.

Here are some example code chunks to demonstrate these two functions and the arguments that can optionally be specified

# default proportion
phe_proportion(df, obs, pop)
#>    area year obs pop     value    lowercl   uppercl confidence       statistic
#> 1 Area1 2015  37 104 0.3557692 0.27040433 0.4514095        95% proportion of 1
#> 2 Area2 2015  96 149 0.6442953 0.56468675 0.7166505        95% proportion of 1
#> 3 Area3 2015  10 161 0.0621118 0.03408441 0.1105482        95% proportion of 1
#> 4 Area4 2015  36 184 0.1956522 0.14480534 0.2589472        95% proportion of 1
#> 5 Area1 2016  19 152 0.1250000 0.08150359 0.1869837        95% proportion of 1
#> 6 Area2 2016  51 187 0.2727273 0.21395011 0.3406540        95% proportion of 1
#> 7 Area3 2016  21 189 0.1111111 0.07383069 0.1638851        95% proportion of 1
#> 8 Area4 2016  81 136 0.5955882 0.51157800 0.6743468        95% proportion of 1
#>   method
#> 1 Wilson
#> 2 Wilson
#> 3 Wilson
#> 4 Wilson
#> 5 Wilson
#> 6 Wilson
#> 7 Wilson
#> 8 Wilson

# specify confidence level for proportion
phe_proportion(df, obs, pop, confidence=99.8)
#>    area year obs pop     value    lowercl   uppercl confidence       statistic
#> 1 Area1 2015  37 104 0.3557692 0.22853378 0.5072643      99.8% proportion of 1
#> 2 Area2 2015  96 149 0.6442953 0.51779464 0.7534140      99.8% proportion of 1
#> 3 Area3 2015  10 161 0.0621118 0.02447767 0.1487830      99.8% proportion of 1
#> 4 Area4 2015  36 184 0.1956522 0.12128114 0.3000556      99.8% proportion of 1
#> 5 Area1 2016  19 152 0.1250000 0.06375981 0.2305743      99.8% proportion of 1
#> 6 Area2 2016  51 187 0.2727273 0.18498287 0.3825562      99.8% proportion of 1
#> 7 Area3 2016  21 189 0.1111111 0.05840020 0.2012304      99.8% proportion of 1
#> 8 Area4 2016  81 136 0.5955882 0.46345009 0.7151833      99.8% proportion of 1
#>   method
#> 1 Wilson
#> 2 Wilson
#> 3 Wilson
#> 4 Wilson
#> 5 Wilson
#> 6 Wilson
#> 7 Wilson
#> 8 Wilson

# specify to output proportions as percentages
phe_proportion(df, obs, pop, multiplier=100)
#>    area year obs pop    value   lowercl  uppercl confidence  statistic method
#> 1 Area1 2015  37 104 35.57692 27.040433 45.14095        95% percentage Wilson
#> 2 Area2 2015  96 149 64.42953 56.468675 71.66505        95% percentage Wilson
#> 3 Area3 2015  10 161  6.21118  3.408441 11.05482        95% percentage Wilson
#> 4 Area4 2015  36 184 19.56522 14.480534 25.89472        95% percentage Wilson
#> 5 Area1 2016  19 152 12.50000  8.150359 18.69837        95% percentage Wilson
#> 6 Area2 2016  51 187 27.27273 21.395011 34.06540        95% percentage Wilson
#> 7 Area3 2016  21 189 11.11111  7.383069 16.38851        95% percentage Wilson
#> 8 Area4 2016  81 136 59.55882 51.157800 67.43468        95% percentage Wilson

# specify level of detail to output for proportion
phe_proportion(df, obs, pop, confidence=99.8, multiplier=100)
#>    area year obs pop    value   lowercl  uppercl confidence  statistic method
#> 1 Area1 2015  37 104 35.57692 22.853378 50.72643      99.8% percentage Wilson
#> 2 Area2 2015  96 149 64.42953 51.779464 75.34140      99.8% percentage Wilson
#> 3 Area3 2015  10 161  6.21118  2.447767 14.87830      99.8% percentage Wilson
#> 4 Area4 2015  36 184 19.56522 12.128114 30.00556      99.8% percentage Wilson
#> 5 Area1 2016  19 152 12.50000  6.375981 23.05743      99.8% percentage Wilson
#> 6 Area2 2016  51 187 27.27273 18.498287 38.25562      99.8% percentage Wilson
#> 7 Area3 2016  21 189 11.11111  5.840020 20.12304      99.8% percentage Wilson
#> 8 Area4 2016  81 136 59.55882 46.345009 71.51833      99.8% percentage Wilson

# specify level of detail to output for proportion and remove metadata columns
phe_proportion(df, obs, pop, confidence=99.8, multiplier=100, type="standard")
#>    area year obs pop    value   lowercl  uppercl
#> 1 Area1 2015  37 104 35.57692 22.853378 50.72643
#> 2 Area2 2015  96 149 64.42953 51.779464 75.34140
#> 3 Area3 2015  10 161  6.21118  2.447767 14.87830
#> 4 Area4 2015  36 184 19.56522 12.128114 30.00556
#> 5 Area1 2016  19 152 12.50000  6.375981 23.05743
#> 6 Area2 2016  51 187 27.27273 18.498287 38.25562
#> 7 Area3 2016  21 189 11.11111  5.840020 20.12304
#> 8 Area4 2016  81 136 59.55882 46.345009 71.51833

# default rate
phe_rate(df, obs, pop)
#>    area year obs pop    value   lowercl  uppercl confidence       statistic
#> 1 Area1 2015  37 104 35576.92 25046.120 49039.52        95% rate per 100000
#> 2 Area2 2015  96 149 64429.53 52186.830 78680.26        95% rate per 100000
#> 3 Area3 2015  10 161  6211.18  2973.571 11423.27        95% rate per 100000
#> 4 Area4 2015  36 184 19565.22 13701.330 27087.31        95% rate per 100000
#> 5 Area1 2016  19 152 12500.00  7522.356 19521.28        95% rate per 100000
#> 6 Area2 2016  51 187 27272.73 20304.824 35859.34        95% rate per 100000
#> 7 Area3 2016  21 189 11111.11  6875.342 16985.31        95% rate per 100000
#> 8 Area4 2016  81 136 59558.82 47296.731 74027.08        95% rate per 100000
#>   method
#> 1  Byars
#> 2  Byars
#> 3  Byars
#> 4  Byars
#> 5  Byars
#> 6  Byars
#> 7  Byars
#> 8  Byars

# specify rate parameters
phe_rate(df, obs, pop, confidence=99.8, multiplier=100)
#>    area year obs pop    value   lowercl  uppercl confidence    statistic method
#> 1 Area1 2015  37 104 35.57692 20.170365 57.65101      99.8% rate per 100  Byars
#> 2 Area2 2015  96 149 64.42953 45.991324 87.46519      99.8% rate per 100  Byars
#> 3 Area3 2015  10 161  6.21118  1.811378 15.02716      99.8% rate per 100  Byars
#> 4 Area4 2015  36 184 19.56522 10.995526 31.90510      99.8% rate per 100  Byars
#> 5 Area1 2016  19 152 12.50000  5.440463 24.17448      99.8% rate per 100  Byars
#> 6 Area2 2016  51 187 27.27273 16.961482 41.27496      99.8% rate per 100  Byars
#> 7 Area3 2016  21 189 11.11111  5.071160 20.85551      99.8% rate per 100  Byars
#> 8 Area4 2016  81 136 59.55882 41.168127 82.99569      99.8% rate per 100  Byars

# specify rate parameters and reduce columns output and remove metadata columns
phe_rate(df, obs, pop, type="standard", confidence=99.8, multiplier=100)
#>    area year obs pop    value   lowercl  uppercl
#> 1 Area1 2015  37 104 35.57692 20.170365 57.65101
#> 2 Area2 2015  96 149 64.42953 45.991324 87.46519
#> 3 Area3 2015  10 161  6.21118  1.811378 15.02716
#> 4 Area4 2015  36 184 19.56522 10.995526 31.90510
#> 5 Area1 2016  19 152 12.50000  5.440463 24.17448
#> 6 Area2 2016  51 187 27.27273 16.961482 41.27496
#> 7 Area3 2016  21 189 11.11111  5.071160 20.85551
#> 8 Area4 2016  81 136 59.55882 41.168127 82.99569

These functions can also return aggregate data if the input dataframes are grouped:

# default proportion - grouped
df %>%
  group_by(year) %>%
  phe_proportion(obs, pop)
#> # A tibble: 2 × 9
#> # Groups:   year [2]
#>    year   obs   pop value lowercl uppercl confidence statistic       method
#>   <int> <int> <int> <dbl>   <dbl>   <dbl> <chr>      <chr>           <chr> 
#> 1  2015   179   598 0.299   0.264   0.337 95%        proportion of 1 Wilson
#> 2  2016   172   664 0.259   0.227   0.294 95%        proportion of 1 Wilson

# default rate - grouped
df %>%
  group_by(year) %>%
  phe_rate(obs, pop)
#> # A tibble: 2 × 9
#> # Groups:   year [2]
#>    year   obs   pop  value lowercl uppercl confidence statistic       method
#>   <int> <int> <int>  <dbl>   <dbl>   <dbl> <chr>      <chr>           <chr> 
#> 1  2015   179   598 29933.  25708.  34654. 95%        rate per 100000 Byars 
#> 2  2016   172   664 25904.  22177.  30077. 95%        rate per 100000 Byars



Aggregate functions

The remaining functions aggregate the rows in the input data frame to produce a single statistic. It is also possible to calculate multiple statistics in a single execution of these functions if the input data frame is grouped - for example by indicator ID, geographic area or time period (or all three). The output contains only the grouping variables and the values calculated by the function - any additional unused columns provided in the input data frame will not be retained in the output.

The df test data generated earlier can be used to demonstrate phe_mean:

Execute phe_mean

INPUT: The phe_mean function take a single data frame as input with a column representing the numbers to be averaged.

OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values (if applicable), the mean, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: The function also accepts additional arguments to specify the level of confidence and a reduced level of detail to be output.

Here are some example code chunks to demonstrate the phe_mean function and the arguments that can optionally be specified

# default mean
phe_mean(df,obs)
#>   value_sum value_count    stdev  value  lowercl  uppercl confidence statistic
#> 1       351           8 30.57748 43.875 18.31159 69.43841        95%      mean
#>                     method
#> 1 Student's t-distribution

# multiple means in a single execution with 99.8% confidence
df %>%
    group_by(year) %>%
        phe_mean(obs, confidence=0.998)
#> # A tibble: 2 × 10
#> # Groups:   year [2]
#>    year value_sum value_count stdev value lowercl uppercl confi…¹ stati…² method
#>   <int>     <int>       <int> <dbl> <dbl>   <dbl>   <dbl> <chr>   <chr>   <chr> 
#> 1  2015       179           4  36.4  44.8   -141.    231. 99.8%   mean    Stude…
#> 2  2016       172           4  29.3  43     -106.    192. 99.8%   mean    Stude…
#> # … with abbreviated variable names ¹​confidence, ²​statistic

# multiple means in a single execution with 99.8% confidence and data-only output
df %>%
    group_by(year) %>%
        phe_mean(obs, type = "standard", confidence=0.998)
#> # A tibble: 2 × 7
#> # Groups:   year [2]
#>    year value_sum value_count stdev value lowercl uppercl
#>   <int>     <int>       <int> <dbl> <dbl>   <dbl>   <dbl>
#> 1  2015       179           4  36.4  44.8   -141.    231.
#> 2  2016       172           4  29.3  43     -106.    192.

Standardised Aggregate functions

Create some test data for the standardised aggregate functions

The following code chunk creates a data frame containing observed number of events and populations by age band for 4 areas, 5 time periods and 2 sexes:

df_std <- data.frame(
            area = rep(c("Area1", "Area2", "Area3", "Area4"), each = 19 * 2 * 5),
            year = rep(2006:2010, each = 19 * 2),
            sex = rep(rep(c("Male", "Female"), each = 19), 5),
            ageband = rep(c(0, 5,10,15,20,25,30,35,40,45,
                           50,55,60,65,70,75,80,85,90), times = 10),
            obs = sample(200, 19 * 2 * 5 * 4, replace = TRUE),
            pop = sample(10000:20000, 19 * 2 * 5 * 4, replace = TRUE))
head(df_std)
#>    area year  sex ageband obs   pop
#> 1 Area1 2006 Male       0 117 13514
#> 2 Area1 2006 Male       5 168 18639
#> 3 Area1 2006 Male      10 107 12462
#> 4 Area1 2006 Male      15  21 14494
#> 5 Area1 2006 Male      20 168 11097
#> 6 Area1 2006 Male      25  57 15297

Execute phe_dsr

INPUT: The minimum input requirement for the phe_dsr function is a single data frame with columns representing the numerators and denominators for each standardisation category. This is sufficient if the data is:

The 2013 European Standard Population is provided within the package in vector form (esp2013) and is used by default by this function. Alternative standard populations can be used but must be provided by the user. When the function joins a standard population vector to the input data frame it does this by position so it is important that the data is sorted accordingly. This is a user responsibility.

The function can also accept standard populations provided as a column within the input data frame.

OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values, the total count, the total population, the dsr, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: If standard populations are being provided as a column within the input data frame then the user must specify this using the stdpoptype argument as the function expects a vector by default. The function also accepts additional arguments to specify the standard populations, the level of confidence, the multiplier and a reduced level of detail to be output.

Here are some example code chunks to demonstrate the phe_dsr function and the arguments that can optionally be specified

# calculate separate dsrs for each area, year and sex
df_std %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop)
#> # A tibble: 40 × 11
#> # Groups:   area, year, sex [40]
#>    area   year sex    total_count total_…¹ value lowercl uppercl confi…² stati…³
#>    <chr> <int> <chr>        <int>    <int> <dbl>   <dbl>   <dbl> <chr>   <chr>  
#>  1 Area1  2006 Female        1482   285827  515.    487.    544. 95%     dsr pe…
#>  2 Area1  2006 Male          2022   269373  734.    699.    770. 95%     dsr pe…
#>  3 Area1  2007 Female        1980   276436  774.    737.    812. 95%     dsr pe…
#>  4 Area1  2007 Male          1785   282487  641.    609.    674. 95%     dsr pe…
#>  5 Area1  2008 Female        1752   291891  632.    601.    664. 95%     dsr pe…
#>  6 Area1  2008 Male          2482   279738  898.    859.    938. 95%     dsr pe…
#>  7 Area1  2009 Female        2033   305778  707.    674.    741. 95%     dsr pe…
#>  8 Area1  2009 Male          1765   288388  613.    582.    645. 95%     dsr pe…
#>  9 Area1  2010 Female        1935   285218  670.    638.    703. 95%     dsr pe…
#> 10 Area1  2010 Male          1851   259257  766.    728.    805. 95%     dsr pe…
#> # … with 30 more rows, 1 more variable: method <chr>, and abbreviated variable
#> #   names ¹​total_pop, ²​confidence, ³​statistic

# calculate separate dsrs for each area, year and sex and drop metadata fields from output
df_std %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop, type="standard")
#> # A tibble: 40 × 8
#> # Groups:   area, year, sex [40]
#>    area   year sex    total_count total_pop value lowercl uppercl
#>    <chr> <int> <chr>        <int>     <int> <dbl>   <dbl>   <dbl>
#>  1 Area1  2006 Female        1482    285827  515.    487.    544.
#>  2 Area1  2006 Male          2022    269373  734.    699.    770.
#>  3 Area1  2007 Female        1980    276436  774.    737.    812.
#>  4 Area1  2007 Male          1785    282487  641.    609.    674.
#>  5 Area1  2008 Female        1752    291891  632.    601.    664.
#>  6 Area1  2008 Male          2482    279738  898.    859.    938.
#>  7 Area1  2009 Female        2033    305778  707.    674.    741.
#>  8 Area1  2009 Male          1765    288388  613.    582.    645.
#>  9 Area1  2010 Female        1935    285218  670.    638.    703.
#> 10 Area1  2010 Male          1851    259257  766.    728.    805.
#> # … with 30 more rows

# calculate same specifying standard population in vector form
df_std %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop, stdpop = esp2013)
#> # A tibble: 40 × 11
#> # Groups:   area, year, sex [40]
#>    area   year sex    total_count total_…¹ value lowercl uppercl confi…² stati…³
#>    <chr> <int> <chr>        <int>    <int> <dbl>   <dbl>   <dbl> <chr>   <chr>  
#>  1 Area1  2006 Female        1482   285827  515.    487.    544. 95%     dsr pe…
#>  2 Area1  2006 Male          2022   269373  734.    699.    770. 95%     dsr pe…
#>  3 Area1  2007 Female        1980   276436  774.    737.    812. 95%     dsr pe…
#>  4 Area1  2007 Male          1785   282487  641.    609.    674. 95%     dsr pe…
#>  5 Area1  2008 Female        1752   291891  632.    601.    664. 95%     dsr pe…
#>  6 Area1  2008 Male          2482   279738  898.    859.    938. 95%     dsr pe…
#>  7 Area1  2009 Female        2033   305778  707.    674.    741. 95%     dsr pe…
#>  8 Area1  2009 Male          1765   288388  613.    582.    645. 95%     dsr pe…
#>  9 Area1  2010 Female        1935   285218  670.    638.    703. 95%     dsr pe…
#> 10 Area1  2010 Male          1851   259257  766.    728.    805. 95%     dsr pe…
#> # … with 30 more rows, 1 more variable: method <chr>, and abbreviated variable
#> #   names ¹​total_pop, ²​confidence, ³​statistic

# calculate the same dsrs by appending the standard populations to the data frame
df_std %>%
    mutate(refpop = rep(esp2013,40)) %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs,pop, stdpop=refpop, stdpoptype="field")
#> # A tibble: 40 × 11
#> # Groups:   area, year, sex [40]
#>    area   year sex    total_count total_…¹ value lowercl uppercl confi…² stati…³
#>    <chr> <int> <chr>        <int>    <int> <dbl>   <dbl>   <dbl> <chr>   <chr>  
#>  1 Area1  2006 Female        1482   285827  515.    487.    544. 95%     dsr pe…
#>  2 Area1  2006 Male          2022   269373  734.    699.    770. 95%     dsr pe…
#>  3 Area1  2007 Female        1980   276436  774.    737.    812. 95%     dsr pe…
#>  4 Area1  2007 Male          1785   282487  641.    609.    674. 95%     dsr pe…
#>  5 Area1  2008 Female        1752   291891  632.    601.    664. 95%     dsr pe…
#>  6 Area1  2008 Male          2482   279738  898.    859.    938. 95%     dsr pe…
#>  7 Area1  2009 Female        2033   305778  707.    674.    741. 95%     dsr pe…
#>  8 Area1  2009 Male          1765   288388  613.    582.    645. 95%     dsr pe…
#>  9 Area1  2010 Female        1935   285218  670.    638.    703. 95%     dsr pe…
#> 10 Area1  2010 Male          1851   259257  766.    728.    805. 95%     dsr pe…
#> # … with 30 more rows, 1 more variable: method <chr>, and abbreviated variable
#> #   names ¹​total_pop, ²​confidence, ³​statistic

# calculate for under 75s by filtering out records for 75+ from input data frame and standard population
df_std %>%
    filter(ageband <= 70) %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop, stdpop = esp2013[1:15])
#> # A tibble: 40 × 11
#> # Groups:   area, year, sex [40]
#>    area   year sex    total_count total_…¹ value lowercl uppercl confi…² stati…³
#>    <chr> <int> <chr>        <int>    <int> <dbl>   <dbl>   <dbl> <chr>   <chr>  
#>  1 Area1  2006 Female        1189   234243  523.    493.    554. 95%     dsr pe…
#>  2 Area1  2006 Male          1451   210700  719.    681.    758. 95%     dsr pe…
#>  3 Area1  2007 Female        1422   221255  736.    697.    777. 95%     dsr pe…
#>  4 Area1  2007 Male          1233   225911  573.    541.    607. 95%     dsr pe…
#>  5 Area1  2008 Female        1433   238125  638.    604.    672. 95%     dsr pe…
#>  6 Area1  2008 Male          1822   215570  886.    844.    930. 95%     dsr pe…
#>  7 Area1  2009 Female        1662   236502  716.    681.    753. 95%     dsr pe…
#>  8 Area1  2009 Male          1240   225003  588.    555.    622. 95%     dsr pe…
#>  9 Area1  2010 Female        1522   222696  691.    656.    727. 95%     dsr pe…
#> 10 Area1  2010 Male          1410   188156  773.    732.    815. 95%     dsr pe…
#> # … with 30 more rows, 1 more variable: method <chr>, and abbreviated variable
#> #   names ¹​total_pop, ²​confidence, ³​statistic
    
# calculate separate dsrs for persons for each area and year)
df_std %>%
    group_by(area, year, ageband) %>%
    summarise(obs = sum(obs),
              pop = sum(pop),
              .groups = "drop_last") %>%
    phe_dsr(obs,pop)
#> # A tibble: 20 × 10
#> # Groups:   area, year [20]
#>    area   year total_count total_…¹ value lowercl uppercl confi…² stati…³ method
#>    <chr> <int>       <int>    <int> <dbl>   <dbl>   <dbl> <chr>   <chr>   <chr> 
#>  1 Area1  2006        3504   555200  626.    604.    649. 95%     dsr pe… Dobson
#>  2 Area1  2007        3765   558923  678.    655.    702. 95%     dsr pe… Dobson
#>  3 Area1  2008        4234   571629  737.    714.    761. 95%     dsr pe… Dobson
#>  4 Area1  2009        3798   594166  645.    623.    667. 95%     dsr pe… Dobson
#>  5 Area1  2010        3786   544475  701.    677.    726. 95%     dsr pe… Dobson
#>  6 Area2  2006        3784   584731  667.    645.    690. 95%     dsr pe… Dobson
#>  7 Area2  2007        4203   557600  755.    731.    779. 95%     dsr pe… Dobson
#>  8 Area2  2008        3562   602957  637.    615.    660. 95%     dsr pe… Dobson
#>  9 Area2  2009        3488   571701  622.    601.    645. 95%     dsr pe… Dobson
#> 10 Area2  2010        3754   560347  692.    668.    715. 95%     dsr pe… Dobson
#> 11 Area3  2006        3948   565913  712.    689.    736. 95%     dsr pe… Dobson
#> 12 Area3  2007        4088   573314  683.    661.    706. 95%     dsr pe… Dobson
#> 13 Area3  2008        3835   556410  736.    712.    761. 95%     dsr pe… Dobson
#> 14 Area3  2009        3240   590352  575.    555.    596. 95%     dsr pe… Dobson
#> 15 Area3  2010        2886   584177  496.    476.    515. 95%     dsr pe… Dobson
#> 16 Area4  2006        3926   607962  689.    666.    713. 95%     dsr pe… Dobson
#> 17 Area4  2007        3990   582280  709.    686.    733. 95%     dsr pe… Dobson
#> 18 Area4  2008        3659   572429  618.    597.    640. 95%     dsr pe… Dobson
#> 19 Area4  2009        3619   539841  680.    657.    704. 95%     dsr pe… Dobson
#> 20 Area4  2010        3565   569012  618.    596.    640. 95%     dsr pe… Dobson
#> # … with abbreviated variable names ¹​total_pop, ²​confidence, ³​statistic

Execute calculate_ISRatio and calculate_ISRate

INPUT: Unlike the phe_dsr function, there is no default standard or reference data for the calculate_ISRatio and calculate_ISRate functions. These functions take a single data frame as input, with columns representing the numerators and denominators for each standardisation category, plus reference numerators and denominators for each standardisation category.

The reference data can either be provided in a separate data frame/vectors or as columns within the input data frame:

OUTPUT: By default, the functions output one row per grouping set containing the grouping variable values, the observed and expected counts, the reference rate (ISRate only), the indirectly standardised rate or ratio, the lower 95% confidence limit, and the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: If reference data are being provided as columns within the input data frame then the user must specify this as the function expects vectors by default. The function also accepts additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.

The following code chunk creates a data frame containing the reference data - this example uses the all area data for persons in the baseline year:

df_ref <- df_std %>%
    filter(year == 2006) %>%
    group_by(ageband) %>%
    summarise(obs = sum(obs),
              pop = sum(pop),
              .groups = "drop_last")
    
head(df_ref)
#> # A tibble: 6 × 3
#>   ageband   obs    pop
#>     <dbl> <int>  <int>
#> 1       0   749 132885
#> 2       5   757 115939
#> 3      10   841 127102
#> 4      15   826 131611
#> 5      20  1056 117404
#> 6      25   861 122266

Here are some example code chunks to demonstrate the calculate_ISRatio function and the arguments that can optionally be specified

# calculate separate smrs for each area, year and sex
# standardised against the all-year, all-sex, all-area reference data
df_std %>%
    group_by(area, year, sex) %>%
    calculate_ISRatio(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 × 11
#> # Groups:   area, year, sex [40]
#>    area   year sex    observed expected value lowercl uppercl confidence stati…¹
#>    <chr> <int> <chr>     <int>    <dbl> <dbl>   <dbl>   <dbl> <chr>      <chr>  
#>  1 Area1  2006 Female     1482    1845. 0.803   0.763   0.845 95%        indire…
#>  2 Area1  2006 Male       2022    1766. 1.14    1.10    1.20  95%        indire…
#>  3 Area1  2007 Female     1980    1816. 1.09    1.04    1.14  95%        indire…
#>  4 Area1  2007 Male       1785    1860. 0.960   0.916   1.01  95%        indire…
#>  5 Area1  2008 Female     1752    1912. 0.916   0.874   0.960 95%        indire…
#>  6 Area1  2008 Male       2482    1831. 1.36    1.30    1.41  95%        indire…
#>  7 Area1  2009 Female     2033    2021. 1.01    0.963   1.05  95%        indire…
#>  8 Area1  2009 Male       1765    1892. 0.933   0.890   0.978 95%        indire…
#>  9 Area1  2010 Female     1935    1882. 1.03    0.983   1.08  95%        indire…
#> 10 Area1  2010 Male       1851    1674. 1.11    1.06    1.16  95%        indire…
#> # … with 30 more rows, 1 more variable: method <chr>, and abbreviated variable
#> #   name ¹​statistic

# calculate the same smrs by appending the reference data to the data frame
# and drop metadata columns from output
df_std %>%
    mutate(refobs = rep(df_ref$obs,40),
           refpop = rep(df_ref$pop,40)) %>%
    group_by(area, year, sex) %>%
    calculate_ISRatio(obs, pop, refobs, refpop, refpoptype="field",
                      type = "standard")
#> # A tibble: 40 × 8
#> # Groups:   area, year, sex [40]
#>    area   year sex    observed expected value lowercl uppercl
#>    <chr> <int> <chr>     <int>    <dbl> <dbl>   <dbl>   <dbl>
#>  1 Area1  2006 Female     1482    1845. 0.803   0.763   0.845
#>  2 Area1  2006 Male       2022    1766. 1.14    1.10    1.20 
#>  3 Area1  2007 Female     1980    1816. 1.09    1.04    1.14 
#>  4 Area1  2007 Male       1785    1860. 0.960   0.916   1.01 
#>  5 Area1  2008 Female     1752    1912. 0.916   0.874   0.960
#>  6 Area1  2008 Male       2482    1831. 1.36    1.30    1.41 
#>  7 Area1  2009 Female     2033    2021. 1.01    0.963   1.05 
#>  8 Area1  2009 Male       1765    1892. 0.933   0.890   0.978
#>  9 Area1  2010 Female     1935    1882. 1.03    0.983   1.08 
#> 10 Area1  2010 Male       1851    1674. 1.11    1.06    1.16 
#> # … with 30 more rows

The calculate_ISRate function works exactly the same way but instead of expressing the result as a ratio of the observed and expected rates the result is expressed as a rate and the reference rate is also provided. Here are some examples:

# calculate separate indirectly standardised rates for each area, year and sex
# standardised against the all-year, all-sex, all-area reference data
df_std %>%
    group_by(area, year, sex) %>%
    calculate_ISRate(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 × 12
#> # Groups:   area, year, sex [40]
#>    area   year sex    observed expected ref_rate value lowercl uppercl confide…¹
#>    <chr> <int> <chr>     <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl> <chr>    
#>  1 Area1  2006 Female     1482    1845.     655.  526.    500.    554. 95%      
#>  2 Area1  2006 Male       2022    1766.     655.  750.    718.    784. 95%      
#>  3 Area1  2007 Female     1980    1816.     655.  715.    683.    747. 95%      
#>  4 Area1  2007 Male       1785    1860.     655.  629.    600.    659. 95%      
#>  5 Area1  2008 Female     1752    1912.     655.  600.    573.    629. 95%      
#>  6 Area1  2008 Male       2482    1831.     655.  888.    853.    924. 95%      
#>  7 Area1  2009 Female     2033    2021.     655.  659.    631.    689. 95%      
#>  8 Area1  2009 Male       1765    1892.     655.  611.    583.    641. 95%      
#>  9 Area1  2010 Female     1935    1882.     655.  674.    644.    704. 95%      
#> 10 Area1  2010 Male       1851    1674.     655.  724.    692.    758. 95%      
#> # … with 30 more rows, 2 more variables: statistic <chr>, method <chr>, and
#> #   abbreviated variable name ¹​confidence

# calculate the same indirectly standardised rates by appending the reference data to the data frame
# and drop metadata columns from output
df_std %>%
    mutate(refobs = rep(df_ref$obs,40),
           refpop = rep(df_ref$pop,40)) %>%
    group_by(area, year, sex) %>%
    calculate_ISRate(obs, pop, refobs, refpop, refpoptype="field",
                     type = "standard")
#> # A tibble: 40 × 9
#> # Groups:   area, year, sex [40]
#>    area   year sex    observed expected ref_rate value lowercl uppercl
#>    <chr> <int> <chr>     <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl>
#>  1 Area1  2006 Female     1482    1845.     655.  526.    500.    554.
#>  2 Area1  2006 Male       2022    1766.     655.  750.    718.    784.
#>  3 Area1  2007 Female     1980    1816.     655.  715.    683.    747.
#>  4 Area1  2007 Male       1785    1860.     655.  629.    600.    659.
#>  5 Area1  2008 Female     1752    1912.     655.  600.    573.    629.
#>  6 Area1  2008 Male       2482    1831.     655.  888.    853.    924.
#>  7 Area1  2009 Female     2033    2021.     655.  659.    631.    689.
#>  8 Area1  2009 Male       1765    1892.     655.  611.    583.    641.
#>  9 Area1  2010 Female     1935    1882.     655.  674.    644.    704.
#> 10 Area1  2010 Male       1851    1674.     655.  724.    692.    758.
#> # … with 30 more rows