When working with Generalized Linear Models it is often useful to
create informative and beautiful summaries of the fitted model
coefficients. The goal of `prettyglm` is to provide a set of
functions to visualize Generalized Linear Model coefficients and
performance in interactive plots which can easily be embedded in
rmarkdown reports or separately exported and shared with stakeholders.
This document introduces `prettyglm`’s main sets of
functions, and shows you how to apply them.

Please see the prettyglm website for more detailed documentation with HTML outputs. Some of the outputs have been excluded from this documentation for publication on CRAN.

If you don’t find the function you are looking for in
`prettyglm`, consider checking out some other great packages
which help visualize the output from glms:

`tidycat`

`jtools`

You can install the latest CRAN release with:

`install.packages('prettyglm')`

To explore the functionality of `prettyglm` we will use
the Titanic data set to perform logistic regression. This data set was
sourced from Kaggle
and contains information about passengers aboard the Titanic, and a
target variable which indicates whether they survived.

```
library(dplyr)
library(prettyglm)
data('titanic')
head(titanic) %>%
  select(-c(PassengerId, Name, Ticket)) %>%
  knitr::kable(table.attr = "style='width:10%;'") %>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
```

Survived | Pclass | Sex | Age | SibSp | Parch | Fare | Cabin | Embarked | Cabintype |
---|---|---|---|---|---|---|---|---|---|
0 | 3 | male | 22 | 1 | 0 | 7.2500 | Missing | S | Missing |
1 | 1 | female | 38 | 1 | 0 | 71.2833 | C85 | C | C |
1 | 3 | female | 26 | 0 | 0 | 7.9250 | Missing | S | Missing |
1 | 1 | female | 35 | 1 | 0 | 53.1000 | C123 | S | C |
0 | 3 | male | 35 | 0 | 0 | 8.0500 | Missing | S | Missing |
0 | 3 | male | NA | 0 | 0 | 8.4583 | Missing | Q | Missing |

A critical step for this package to work is to **set all
categorical predictors as factors**.

```
# Easy way to convert multiple columns to a factor.
columns_to_factor <- c('Pclass',
                       'Sex',
                       'Cabin',
                       'Embarked',
                       'Cabintype')
# Replace missing ages with the mean age.
meanage <- base::mean(titanic$Age, na.rm = TRUE)
titanic <- titanic %>%
  dplyr::mutate_at(columns_to_factor, list(~factor(.))) %>%
  dplyr::mutate(Age = base::ifelse(is.na(Age), meanage, Age))
```
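Before fitting a model it can help to confirm which columns are actually stored as factors. The following is a minimal base R sketch on a made-up data frame (the `toy` data and its values are hypothetical and are not part of the titanic data set).

```r
# Toy data frame standing in for a modelling data set (hypothetical values).
toy <- data.frame(
  Pclass = c(1, 3, 2, 3),
  Sex    = c("male", "female", "female", "male"),
  Age    = c(22, 38, 26, 35),
  stringsAsFactors = FALSE
)

# Convert the categorical columns to factors with base R.
categorical <- c("Pclass", "Sex")
toy[categorical] <- lapply(toy[categorical], factor)

# Report, column by column, whether it is now a factor.
is_factor <- vapply(toy, is.factor, logical(1))
print(is_factor)
```

The same `vapply()` check can be run on your own data set to spot a categorical predictor that was left as a character or numeric column.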

For this vignette we will use `stats::glm()` to build a
logistic regression model. Support for `parsnip` and `workflow`
model objects which use the `glm` model engine is currently in
development.

```
survival_model <- stats::glm(Survived ~ Pclass +
                               Sex +
                               Fare +
                               Age +
                               Embarked +
                               SibSp +
                               Parch,
                             data = titanic,
                             family = binomial(link = 'logit'))
```

`pretty_coefficients()`

The function `pretty_coefficients()` allows you to create
a pretty table of model coefficients, which by default includes
categorical base levels.

The simplest way to call this function is just with the model object.

`pretty_coefficients(model_object = survival_model)`

You can also complete a type III test on the coefficients by
specifying a `type_iii` argument. Warning: `Wald`
type III tests will fail if there are aliased coefficients in the
model.
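One way to check for aliased coefficients before requesting a Wald test is to look for `NA` estimates in the fitted model. The sketch below uses base R only, on made-up data in which `x2` is an exact multiple of `x1`. The variable names are hypothetical, and this check is not part of `prettyglm`.

```r
# Toy data where x2 is an exact linear function of x1,
# so one of the two coefficients cannot be estimated.
set.seed(1)
toy <- data.frame(x1 = rnorm(50))
toy$x2 <- 2 * toy$x1
toy$y  <- rbinom(50, 1, 0.5)

fit <- stats::glm(y ~ x1 + x2, data = toy, family = binomial(link = "logit"))

# Aliased coefficients are reported as NA by coef().
aliased_terms <- names(which(is.na(stats::coef(fit))))
print(aliased_terms)
```

If `aliased_terms` is not empty, dropping or recoding the listed terms before refitting is the usual remedy.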

You can change the significance level highlighted in the table with
`significance_level`.

`pretty_coefficients(survival_model, type_iii = 'Wald', significance_level = 0.1)`

By default `pretty_coefficients` shows “model” variable
importance, but `vimethod` also accepts the “permute” and “firm”
methods from the `vip` package. Additional parameters for these methods
should also be passed into `pretty_coefficients`.

```
pretty_coefficients(model_object = survival_model,
                    type_iii = 'Wald',
                    significance_level = 0.1,
                    vimethod = 'permute',
                    target = 'Survived',
                    metric = 'auc',
                    pred_wrapper = predict.glm,
                    reference_class = 0)
```

`pretty_relativities()`

`pretty_relativities()` will create a plot of the desired
model variable. A different plot will be generated depending on the
class of the variable.

A model relativity is a transform of the model estimate. By default
`pretty_relativities()` uses ‘exp(estimate)-1’, which is
useful for GLMs which use a log or logit link function.
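To make the transform concrete, here is a minimal sketch that applies ‘exp(estimate)-1’ by hand to the coefficients of a small logistic regression fitted on the built-in `mtcars` data. The model is purely illustrative, and the sketch shows the arithmetic only, not how `pretty_relativities()` builds its plots.

```r
# Fit a small logistic regression on the built-in mtcars data.
fit <- stats::glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))

estimates <- stats::coef(fit)

# exp(estimate) is the odds ratio under a logit link.
odds_ratios <- exp(estimates)

# The relativity described above: exp(estimate) - 1.
relativities <- exp(estimates) - 1

print(data.frame(estimate   = estimates,
                 odds_ratio = odds_ratios,
                 relativity = relativities))
```

A negative relativity, as for `wt` here, corresponds to an odds ratio below one, and a relativity of zero means the term has no effect on the odds.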

The term ‘relativity’ is sometimes referred to as “odds-ratio” or
“Likelihood”. You can customize the label with the
`relativity_label` input.

For categorical variables `pretty_relativities()` creates
an interactive dual axis plot, which plots the fitted relativity on one
y axis, and the number of records in that category on the other y
axis.

```
pretty_relativities(feature_to_plot = 'Embarked',
                    model_object = survival_model,
                    relativity_label = 'Likelihood of Survival')
```

For continuous variables `pretty_relativities` will plot
the relativity over the variable’s range, and the density of that
variable on a dual axis.

If desired you can cut off the tail end of the distributions with
`upper_percentile_to_cut` or `lower_percentile_to_cut`.

```
pretty_relativities(feature_to_plot = 'Fare',
                    model_object = survival_model,
                    relativity_label = 'Likelihood of Survival',
                    upper_percentile_to_cut = 0.1)
```

To highlight some more of `prettyglm`’s functionality we
will now build a logistic regression model with some interactions.

```
survival_model2 <- stats::glm(Survived ~ Pclass:Fare +
                                Age +
                                Embarked:Sex +
                                SibSp +
                                Parch,
                              data = titanic,
                              family = binomial(link = 'logit'))
```

You can also choose to facet the plots by one of the variables.

```
pretty_relativities(feature_to_plot = 'Embarked:Sex',
                    model_object = survival_model2,
                    relativity_label = 'Likelihood of Survival',
                    iteractionplottype = 'facet',
                    facetorcolourby = 'Sex')
```

You can also choose to colour the plots by one of the variables.

```
pretty_relativities(feature_to_plot = 'Embarked:Sex',
                    model_object = survival_model2,
                    relativity_label = 'Likelihood of Survival',
                    iteractionplottype = 'colour',
                    facetorcolourby = 'Embarked')
```

You can create these relativity plots as you would for a non-interaction.

```
pretty_relativities(feature_to_plot = 'Embarked:Sex',
                    model_object = survival_model2,
                    relativity_label = 'Likelihood of Survival')
```

By default, plots of an interaction between a continuous variable and a factor will colour by the factor variable.

```
pretty_relativities(feature_to_plot = 'Pclass:Fare',
                    model_object = survival_model2,
                    relativity_label = 'Likelihood of Survival',
                    upper_percentile_to_cut = 0.03)
```

You can also facet by the factor variable.

```
pretty_relativities(feature_to_plot = 'Pclass:Fare',
                    model_object = survival_model2,
                    relativity_label = 'Likelihood of Survival',
                    iteractionplottype = 'facet',
                    upper_percentile_to_cut = 0.03,
                    height = 800)
```

To highlight some more of `prettyglm`’s functionality we
will now build a logistic regression model with a
**spline**.

`prettyglm` includes a function `splineit` to
help construct splines. This can be incorporated in the dplyr workflow
as follows.

For splines to work nicely in `prettyglm`, use the naming
convention Variable#Start#End where # represents your desired
separator.

```
titanic <- titanic %>%
  dplyr::mutate(Age_0_18 = prettyglm::splineit(Age, 0, 18),
                Age_18_35 = prettyglm::splineit(Age, 18, 35),
                Age_35_120 = prettyglm::splineit(Age, 35, 120)) %>%
  dplyr::mutate(Fare_0_55 = prettyglm::splineit(Fare, 0, 55),
                Fare_55_600 = prettyglm::splineit(Fare, 55, 600))
```
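To illustrate the idea behind splitting a variable into segments named Variable#Start#End, the sketch below builds capped linear segments with a hypothetical helper, `linear_segment()`. This is an assumption about what a piecewise linear segment looks like and may differ from how `prettyglm::splineit` is implemented. The last lines show how a column name following the naming convention can be split back into its variable, start and end.

```r
# Hypothetical helper, NOT prettyglm::splineit: the part of x that falls
# between start and end, measured from start and capped at (end - start).
linear_segment <- function(x, start, end) {
  pmax(0, pmin(x, end) - start)
}

ages <- c(5, 18, 30, 50)
segments <- data.frame(
  Age        = ages,
  Age_0_18   = linear_segment(ages, 0, 18),
  Age_18_35  = linear_segment(ages, 18, 35),
  Age_35_120 = linear_segment(ages, 35, 120)
)
print(segments)

# With '_' as the separator, a column name splits into variable, start and end.
parts <- strsplit("Age_18_35", "_", fixed = TRUE)[[1]]
print(parts)
```

Because the segments start at zero and are contiguous, they add back up to the original age in every row, which is a quick sanity check on the chosen knots.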

```
survival_model4 <- stats::glm(Survived ~ Pclass +
                                Sex:Fare_0_55 +
                                Sex:Fare_55_600 +
                                Age_0_18 +
                                Age_18_35 +
                                Age_35_120 +
                                Embarked +
                                SibSp +
                                Parch,
                              data = titanic,
                              family = binomial(link = 'logit'))
```

For interactions, variables are grouped on the left pane.

`pretty_coefficients(survival_model4, significance_level = 0.1, spline_seperator = '_')`

You also need to provide a `spline_seperator` input in
`pretty_relativities`.

```
pretty_relativities(feature_to_plot = 'Age',
                    model_object = survival_model4,
                    relativity_label = 'Likelihood of Survival',
                    spline_seperator = '_')
```

By default `pretty_relativities` will colour by the factor
variable.

```
pretty_relativities(feature_to_plot = 'Sex:Fare',
                    model_object = survival_model4,
                    relativity_label = 'Likelihood of Survival',
                    spline_seperator = '_',
                    upper_percentile_to_cut = 0.03)
```

If you prefer to facet by the factor variable, change
`iteractionplottype` to “facet”.

```
pretty_relativities(feature_to_plot = 'Sex:Fare',
                    model_object = survival_model4,
                    relativity_label = 'Likelihood of Survival',
                    spline_seperator = '_',
                    upper_percentile_to_cut = 0.03,
                    iteractionplottype = 'facet')
```

`one_way_ave()`

For continuous variables `one_way_ave` will bucket the values
into 30 buckets by default, and plot the density on a dual axis.

```
one_way_ave(feature_to_plot = 'Age',
            model_object = survival_model4,
            target_variable = 'Survived',
            data_set = titanic,
            upper_percentile_to_cut = 0.1,
            lower_percentile_to_cut = 0.1)
```
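The idea behind a one-way actual versus expected comparison can be sketched in base R: bucket a continuous variable, then compare the mean of the target with the mean prediction in each bucket. The example below uses simulated data and equal-width buckets from `cut()`. Both are assumptions made for illustration, and `one_way_ave` may bucket and summarise differently.

```r
# Simulated data: a binary outcome whose probability falls as x increases.
set.seed(42)
n <- 300
toy <- data.frame(x = runif(n, 0, 80))
toy$actual <- rbinom(n, 1, plogis(1 - 0.04 * toy$x))

# Fit a logistic regression and predict on the response scale.
fit <- stats::glm(actual ~ x, data = toy, family = binomial(link = "logit"))
toy$predicted <- stats::predict(fit, newdata = toy, type = "response")

# Split x into 30 equal-width buckets (an assumption for this sketch).
toy$bucket <- cut(toy$x, breaks = 30)

# Mean actual and mean predicted value per bucket.
ave <- stats::aggregate(cbind(actual, predicted) ~ bucket, data = toy, FUN = mean)

# Number of records per bucket, matched to the buckets kept by aggregate().
ave$records <- as.vector(table(toy$bucket)[as.character(ave$bucket)])

print(head(ave))
```

Buckets where the actual and predicted means diverge, and which also hold a reasonable number of records, are the ones worth investigating.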

```
one_way_ave(feature_to_plot = 'Cabintype',
            model_object = survival_model4,
            target_variable = 'Survived',
            data_set = titanic)
```

You can facet the `one_way_ave` plot by providing a
variable to facet by in `facetby`.

```
one_way_ave(feature_to_plot = 'Age',
            model_object = survival_model4,
            target_variable = 'Survived',
            facetby = 'Sex',
            data_set = titanic,
            upper_percentile_to_cut = 0.1,
            lower_percentile_to_cut = 0.1)
```

By default `one_way_ave` generates its own predictions from the model
object. If you would like to use `one_way_ave` with another model type
(which is not compatible with predict.glm), or provide modified
predictions, `one_way_ave` allows a custom prediction function.

This function must return a data.frame with two columns: “Actual_Values” and “Predicted_Values”.

```
# Custom predict function: returns actual and predicted values,
# with predictions capped at 0.4.
a_custom_predict_function <- function(target, model_object, dataset){
  dataset <- base::as.data.frame(dataset)
  Actual_Values <- dplyr::pull(dplyr::select(dataset, tidyselect::all_of(c(target))))
  # A factor target is converted back to its numeric values.
  if(is.factor(Actual_Values)){
    Actual_Values <- base::as.numeric(as.character(Actual_Values))
  }
  Predicted_Values <- base::as.numeric(stats::predict(model_object, dataset, type = 'response'))
  to_return <- base::data.frame(Actual_Values = Actual_Values,
                                Predicted_Values = Predicted_Values)
  # Modify the predictions: cap them at 0.4.
  to_return <- to_return %>%
    dplyr::mutate(Predicted_Values = base::ifelse(Predicted_Values > 0.4, 0.4, Predicted_Values))
  return(to_return)
}

one_way_ave(feature_to_plot = 'Age',
            model_object = survival_model4,
            target_variable = 'Survived',
            data_set = titanic,
            upper_percentile_to_cut = 0.1,
            lower_percentile_to_cut = 0.1,
            predict_function = a_custom_predict_function)
```
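When writing your own prediction function, it can be useful to check its output against the two-column contract before passing it on. The helper below, `check_prediction_frame()`, is hypothetical and not part of `prettyglm`. The requirement that both columns be numeric is an added assumption rather than something the package documents here.

```r
# Hypothetical checker, not part of prettyglm: TRUE when the object is a
# data.frame with numeric "Actual_Values" and "Predicted_Values" columns.
check_prediction_frame <- function(predictions) {
  required <- c("Actual_Values", "Predicted_Values")
  is.data.frame(predictions) &&
    all(required %in% names(predictions)) &&
    is.numeric(predictions$Actual_Values) &&
    is.numeric(predictions$Predicted_Values)
}

# A frame that follows the contract, and one with the wrong column names.
good <- data.frame(Actual_Values = c(0, 1, 1),
                   Predicted_Values = c(0.2, 0.7, 0.4))
bad  <- data.frame(actual = c(0, 1),
                   predicted = c(0.1, 0.9))

check_prediction_frame(good)  # TRUE
check_prediction_frame(bad)   # FALSE
```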

`actual_expected_bucketed()`

```
actual_expected_bucketed(target_variable = 'Survived',
                         model_object = survival_model4,
                         data_set = titanic)
```

```
actual_expected_bucketed(target_variable = 'Survived',
                         model_object = survival_model4,
                         data_set = titanic,
                         facetby = 'Sex')
```