Corset plots are a visualization technique used to visualize repeat measures at 2 time points (such as pre- and post- data). Specifically, corset plots visualize the distribution of measurements at each time point, as well as the trajectories of individual change. This method of visualization is ideal for showing the heterogeneity of data and differences by sub-groups.
The ‘ggcorset’ package relies on ‘ggplot2’ to make its visualizations, allowing for easy customization and integration with other ‘ggplot2’ objects. This vignette will introduce uses of this visualization technique, as well as how to make customizations using ‘ggplot2’. For specific functionality, the GitHub README1 or the CRAN Package Manual2 should be consulted.
The ‘ggcorset’ package provides two functions –
gg_corset()
and gg_corset_elongated()
– which
produce visualizations for wide and long data formats, respectively. The
visualizations produced by these functions are identical.
The c_var variable is a user-defined variable that
make the individual-level changes interpretable, and should be chosen
strategically. This variable can be a discrete value reflecting change
types or sub-groups. It can also be a continuous value to highlight
levels of change. As such, various c_var
strategies are
outlined in this vignette.
A corset plot consists of:
distribution at each time point using half violins (overall distributions and/ or distributions by group);
individual trajectories of change between each time point; and optionally
‘eyelets’ showing either the standard error mean (SEM), or mean +/- 1 standard deviation (SD) by group
The plot above was created using the sample ggcorset data
“drinkdays”, and by creating a variable which captures the direction of
change across the two time points. The theme_ggcorset()
was
used to polish the visualization alongside a few ggplot2 commands to
customize the titles and axes, as denoted below:
library(ggcorset)
library(ggplot2)
# LOAD "drinkdays" data from the ggcorset package
data("drinkdays")
# Magnitude of change: Subtract T1 from T2
drinkdays$change <- drinkdays$time2-drinkdays$time1
# Direction of Change
drinkdays$direction <- ifelse(drinkdays$change<0,"Decrease",
ifelse(drinkdays$change>0,"Increase","No Change"))
drinkdays$direction <- factor(drinkdays$direction,
levels = c("Increase","No Change","Decrease"))
# CORSET PLOT: Basic plot + theme_ggcorset()
plot <- gg_corset(drinkdays, y_var1 = "time1", y_var2 = "time2",
c_var = "direction", group = "id", eyelets = T) +
theme_ggcorset()
# ADDITIONAL ggplot2 commands
plot +
# Changes legend title, and selects a colour-palette
scale_colour_manual("Direction of Change",
values = MetBrewer::met.brewer("Ingres",3)) +
# Changes the plot title
ggtitle("Change in Drinking Days") +
# Changes the y-axis title
ylab("Number of Drinking Days per Week") +
# Changes the x-axis title (removes in favour of the 2 time point labels below)
xlab("") +
# Changes the labels of the 2 time points (on the x-axis)
scale_x_discrete(labels = c("Pre","Post")) +
# Makes the legend lines thicker
guides(colour = guide_legend(override.aes = list(linewidth = 3)))
Corset plots can also be faceted using the
faceted = TRUE
argument. There are 3 design options
available using the facet_design
. Eyelets can be used in
conjunction with the faceted corset plots, if desired. The default
facet_design
is “original” which simply facets the corset
plots by the c_var
.
The “group” option allows for the overall distribution to be plotted
in the same colour as the vio_fill
whilst a semi-opaque
distribution of the c_var
is situated ontop the overall
distribution.
gg_corset(drinkdays, y_var1 = "time1", y_var2 = "time2",
c_var = "direction", group = "id", faceted = T, facet_design = "group") +
theme_ggcorset() +
scale_colour_manual("Direction of Change",
values = MetBrewer::met.brewer("Demuth",3, direction = -1)) +
scale_fill_manual(values = MetBrewer::met.brewer("Demuth",3, direction = -1)) +
ggtitle("Change in Drinking Days") +
ylab("Number of Drinking Days per Week") + xlab("") +
scale_x_discrete(labels = c("Pre","Post")) +
guides(colour = guide_legend(override.aes = list(linewidth = 3)))
The “line” option displays all individual trajectories in the background of each facet. This allows for the change in distribution over the two time points by group to be the primary focus.
gg_corset(drinkdays, y_var1 = "time1", y_var2 = "time2",
c_var = "direction", group = "id", faceted = T, facet_design = "line") +
theme_ggcorset() +
scale_colour_manual("Direction of Change",
values = MetBrewer::met.brewer("Demuth",3)) +
ggtitle("Change in Drinking Days") +
ylab("Number of Drinking Days per Week") + xlab("") +
scale_x_discrete(labels = c("Pre","Post")) +
guides(colour = guide_legend(override.aes = list(linewidth = 3)))
Corset plots also lend themselves to showing continuous values of change, which can act as a heatmap of sorts, to visualize extreme values as well as overall trends.
gg_corset(drinkdays, y_var1 = "time1", y_var2 = "time2",
c_var = "change", group = "id", line_size = 0.5) +
theme_ggcorset() +
viridis::scale_colour_viridis(
option = "cividis", direction = -1,
breaks = c(-7,0,7), # can set the legend limits here (min and max)
name = "") + # can rename the legend title here or omit
ggtitle("Change in Drinking Days") +
ylab("Number of Drinking Days per Week") + xlab("") +
scale_x_discrete(labels = c("Pre","Post"))
As noted previously, corset plots are a great way to visualize
heterogeneity amongst data. Thus, the magnitude of homogeneity across
groups can be inspected by using a c_var
variable for
observed or latent sub-groups. Some examples using simulated3
data from the “drinkdays” data set are age categories and latent classes
(derived from a latent class analysis).
In the first example, individuals were categorized by young adults (29 years of age and younger) and adults (30 years of age and older). Young adults often partake in heavy episodic drinking, a phenomenon whereby individuals ‘age out’ of risky drinking behaviours over time. As seen in the faceted corset plots, those under 30 years of age are seeing decreases in their number of drinking days per week, compared to those 30 years of age and older who are more stable in their drinking days over time.
By plotting the data using corset plots, we are able to see the heterogeneity still present by age categories, notably how some <30 adults are maintaining their drinking levels and those 30+ have increased in some instances.
In the second example, individuals are categorized into three latent classes which were obtained by numerous drinking and mental health outcomes. The corset plots are visualizing distal outcomes of drinking days by these latent classes. Class 1 shows a notable decrease in drinking days (a decrease of 4 days on average); Class 2 shows an overall increase, with many individuals showing no change in their alcohol use; and Class 3 shows a less drastic decrease in drinking days than Class 1, whereby the overall decrease is 2 days per week (with many individuals in the class seeing a change in magnitude of 1 day), and no individuals consumed alcohol 7-days a week.
By plotting the distal outcome of drinking days by a latent class, we are able to see how well these latent classes were able to predict drinking outcomes - in this instance, drinking days per week.
1 https://github.com/kbelisar/ggcorset/blob/main/README.md
2 https://cran.r-project.org/package=ggcorset
3 The “drinkdays” data from the ‘ggcorset’ package are
simulated data. Age and latent classes (not available in the package)
are also simulated categories to demonstrate the possible uses of
c_var
, and are based on trends observed in the research of
substance use.