Calendar-based graphics

Earo Wang, Di Cook, Rob J Hyndman

Introduction


Calendar-based graphics are a useful tool for visually unfolding people’s daily schedules in detail, such as hourly foot traffic in the CBD or daily residential electricity demand. They arrange values according to their dates in a calendar layout; a common monthly calendar has days of the week in columns and weeks of the month in rows. The idea originates from Van Wijk and Van Selow (1999) and has been implemented in a couple of R packages (ggTimeSeries and ggcal), but these implementations are all variants of a heatmap in a temporal context. We extend calendar-based graphics to a broader range of applications using linear algebra tools. In particular, (1) they handle not only daily data but also data of higher frequencies, such as hourly data; (2) they are no longer constrained to a heatmap but can be used with other types of geoms; (3) the built-in calendars include monthly, weekly, and daily types, for comparing different temporal components. The frame_calendar() function returns the computed calendar grids as a data frame or a tibble, matching its data input, and ggplot2 then handles the plotting just as it would for any other data frame.
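The monthly layout described above can be sketched in a few lines of base R. The helper calendar_cell() below is hypothetical (it is not the package's implementation) and assumes weeks start on Monday; it assigns each date a row (week of the month) and a column (day of the week).

```r
# Hypothetical helper, not part of sugrrants: locate each date in a monthly
# calendar grid (rows = weeks of the month, columns = days of the week),
# assuming weeks start on Monday.
calendar_cell <- function(dates) {
  dates <- as.Date(dates)
  wday <- as.integer(format(dates, "%u"))       # 1 = Monday, ..., 7 = Sunday
  mday <- as.integer(format(dates, "%d"))       # day of the month
  first <- as.Date(format(dates, "%Y-%m-01"))   # first day of each month
  first_wday <- as.integer(format(first, "%u"))
  # Offset the day of month by the first day's weekday, then count weeks;
  # a partial first week counts as week 1
  week <- (mday + first_wday - 2) %/% 7 + 1
  data.frame(date = dates, row = week, col = wday)
}

jan17 <- calendar_cell(seq(as.Date("2017-01-01"), as.Date("2017-01-31"), by = "day"))
head(jan17, 2)
# 1 January 2017 is a Sunday (row 1, column 7); 2 January starts week 2
```

Under this convention January 2017 spans six rows, because the 1st falls on a Sunday and occupies a partial first week on its own.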

We use the Melbourne pedestrian data (shipped with the package as hourly_peds) as an example throughout the vignette; the data are sourced from the Melbourne Open Data Portal. The subset used here contains hourly counts of foot traffic from 7 sensors across the city of Melbourne, from January to April 2017.

library(tidyr)
library(dplyr)
library(ggplot2)
library(viridis)
library(sugrrants)
# Keep the 2017 observations only (Year is stored as a number)
pedestrian17 <- filter(hourly_peds, Year == 2017)
pedestrian17
#> # A tibble: 19,488 × 10
#>   Date_Time           Date        Year Month   Mdate Day     Time Sensor_ID
#>   <dttm>              <date>     <dbl> <ord>   <dbl> <ord>  <dbl>     <dbl>
#> 1 2017-01-01 00:00:00 2017-01-01  2017 January     1 Sunday     0        18
#> 2 2017-01-01 00:00:00 2017-01-01  2017 January     1 Sunday     0        13
#> 3 2017-01-01 00:00:00 2017-01-01  2017 January     1 Sunday     0         9
#> 4 2017-01-01 00:00:00 2017-01-01  2017 January     1 Sunday     0         6
#> 5 2017-01-01 00:00:00 2017-01-01  2017 January     1 Sunday     0        25
#> # ℹ 19,483 more rows
#> # ℹ 2 more variables: Sensor_Name <chr>, Hourly_Counts <dbl>

We start with a single sensor, Melbourne Convention Exhibition Centre, to explain the basic use of frame_calendar(). Because it is designed to fit into the tidyverse framework, its interface should be familiar to anyone who uses the tidyverse daily. The first argument is the data, so a data frame can be piped directly into the function with %>%. A variable indicating time of day is mapped to x, and the value variable of interest is mapped to y. The date argument requires a Date variable, which is used to arrange the data in the correct chronological order. See ?frame_calendar for more options. In this case, Time (hour of day) is used for x and Hourly_Counts for y. The function returns a data frame with two new columns, .Time and .Hourly_Counts, named by prefixing the original variable names with a “.”. These new columns contain the rearranged coordinates for the calendar plots that follow.

centre <- pedestrian17 %>% 
  filter(Sensor_Name == "Melbourne Convention Exhibition Centre")
centre_calendar <- centre %>%
  frame_calendar(x = Time, y = Hourly_Counts, date = Date, calendar = "monthly")
centre_calendar
#> # A tibble: 2,880 × 12
#>   Date_Time           Date        Year Month   Mdate Day     Time Sensor_ID
#>   <dttm>              <date>     <dbl> <ord>   <dbl> <ord>  <dbl>     <dbl>
#> 1 2017-01-01 00:00:00 2017-01-01  2017 January     1 Sunday     0        25
#> 2 2017-01-01 01:00:00 2017-01-01  2017 January     1 Sunday     1        25
#> 3 2017-01-01 02:00:00 2017-01-01  2017 January     1 Sunday     2        25
#> 4 2017-01-01 03:00:00 2017-01-01  2017 January     1 Sunday     3        25
#> 5 2017-01-01 04:00:00 2017-01-01  2017 January     1 Sunday     4        25
#> # ℹ 2,875 more rows
#> # ℹ 4 more variables: Sensor_Name <chr>, Hourly_Counts <dbl>, .Time <dbl>,
#> #   .Hourly_Counts <dbl>
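Conceptually, each new coordinate is an affine transformation of the original value: rescale it within its range, shrink it to fit a cell, and shift it by the cell's offset in the grid. The sketch below illustrates that idea for a single point; place_in_cell(), its .x/.y output names, and the 0.95 cell fraction are illustrative assumptions, not the internals of frame_calendar().

```r
# Hypothetical helper, not the sugrrants code: place one observation
# (x = hour of day, y = count) inside the calendar cell at (row, col).
# Each cell is 1 unit wide and tall; `width` and `height` leave a small gap
# between neighbouring cells. Rows are counted from the top, so they are
# flipped to obtain a vertical position.
place_in_cell <- function(x, y, row, col, n_rows, x_range, y_range,
                          width = 0.95, height = 0.95) {
  x01 <- (x - x_range[1]) / diff(x_range)   # rescale x to [0, 1]
  y01 <- (y - y_range[1]) / diff(y_range)   # rescale y to [0, 1]
  data.frame(
    .x = (col - 1) + x01 * width,
    .y = (n_rows - row) + y01 * height
  )
}

# Noon, 500 pedestrians, in the cell at row 2, column 3 of a 6-row calendar,
# with hours spanning 0 to 23 and counts spanning 0 to 1000
pos <- place_in_cell(x = 12, y = 500, row = 2, col = 3, n_rows = 6,
                     x_range = c(0, 23), y_range = c(0, 1000))
pos
```

The resulting .y of 4.475 is the cell's vertical offset (4) plus half the cell height; on its own it says nothing about the count of 500, which is why transformed columns like these are not meant to be read directly.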

The .Time and .Hourly_Counts columns are then mapped to the x and y axes respectively, with lines grouped by Date in geom_line(). The transformed variables no longer carry their original meanings: their values encode positions on the calendar grid rather than hours or counts, so they should not be interpreted directly.

p1 <- centre_calendar %>% 
  ggplot(aes(x = .Time, y = .Hourly_Counts, group = Date)) +
  geom_line()
p1

To make the plot more accessible and informative, we provide another function, prettify(), to go hand in hand with frame_calendar(). It takes a ggplot object and adds sensible breaks and labels. The resulting calendar-based graphic clearly depicts time-of-day and day-of-week patterns, as well as other calendar effects such as public holidays.

prettify(p1)

Scales

Scaling is controlled by the scale argument. The default, "fixed", scales all values globally; the figure above uses this global scale, which enables overall comparison. The "free" option scales each daily block individually, putting more emphasis on the shape of each day than on comparing magnitudes.

centre_calendar_free <- centre %>%
  frame_calendar(x = Time, y = Hourly_Counts, date = Date, calendar = "monthly",
    scale = "free", ncol = 4)
p2 <- ggplot(centre_calendar_free, 
        aes(x = .Time, y = .Hourly_Counts, group = Date)) +
  geom_line()
prettify(p2)
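The difference between "fixed" and "free" comes down to which range each value is rescaled against. The toy sketch below uses made-up counts and base R's ave() to mimic the two options; it illustrates the idea rather than reproducing frame_calendar().

```r
# Made-up counts for two days with the same daily shape but different
# magnitudes; this mimics scale = "fixed" vs "free", not the package code.
toy <- data.frame(
  date  = rep(as.Date(c("2017-01-02", "2017-01-03")), each = 3),
  count = c(100, 300, 200, 1000, 3000, 2000)
)
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))

# "fixed": a single global range, so the second day stays visibly larger
toy$fixed <- rescale01(toy$count)
# "free": each day rescaled by its own range, so only the shape remains
toy$free <- ave(toy$count, toy$date, FUN = rescale01)
toy
```

Under the free scale both days become 0, 1, 0.5: their shapes match exactly, while the tenfold difference in magnitude disappears.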

The other two choices are "free_wday" and "free_mday", which scale conditionally on each day of the week and each day of the month respectively. The code below scales by day of the week, so magnitudes can be compared among all Mondays, among all Tuesdays, and so on.

centre_calendar_wday <- centre %>%
  frame_calendar(x = Time, y = Hourly_Counts, date = Date, calendar = "monthly",
    scale = "free_wday", ncol = 4)
p3 <- ggplot(centre_calendar_wday, 
        aes(x = .Time, y = .Hourly_Counts, group = Date)) +
  geom_line()
prettify(p3)
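Weekday-conditional scaling amounts to rescaling within each day-of-week group. The sketch below uses made-up counts on two Mondays and two Tuesdays; grouping with base R's format() and ave() is an illustrative stand-in, not the package's code.

```r
# Made-up counts on two Mondays and two Tuesdays; this mimics the idea of
# scale = "free_wday", not the package code.
dates <- rep(as.Date(c("2017-01-02", "2017-01-03",    # Monday, Tuesday
                       "2017-01-09", "2017-01-10")),  # Monday, Tuesday
             each = 2)
count <- c(100, 200, 50, 80, 400, 800, 60, 90)
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))

# Group by day of the week ("1" = Monday, "2" = Tuesday) before rescaling
wday <- format(dates, "%u")
scaled <- ave(count, wday, FUN = rescale01)
# For "free_mday", group by day of the month instead: format(dates, "%d")
data.frame(dates, wday, count, scaled)
```

The busiest Tuesday reading (90) reaches the top of its range just as the busiest Monday reading (800) does, so comparisons are meaningful within a weekday but not between weekdays.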

Use in conjunction with group_by

We can also superimpose one sensor on top of the other. Without group_by(), both sensors share a common scale on the overlaid graph.

two_sensors <- c("Lonsdale St (South)", "Melbourne Convention Exhibition Centre")
two_sensors_df <- pedestrian17 %>%
  filter(Sensor_Name %in% two_sensors)
two_sensors_calendar <- two_sensors_df %>%
  frame_calendar(x = Time, y = Hourly_Counts, date = Date, ncol = 4)
p4 <- ggplot(two_sensors_calendar) +
  geom_line(
    data = filter(two_sensors_calendar, Sensor_Name == two_sensors[1]),
    aes(.Time, .Hourly_Counts, group = Date), colour = "#1b9e77"
  ) +
  geom_line(
    data = filter(two_sensors_calendar, Sensor_Name == two_sensors[2]),
    aes(.Time, .Hourly_Counts, group = Date), colour = "#d95f02"
  )
prettify(p4)

The frame_calendar() function can also be combined naturally with group_by(). Each group (here, each sensor) then gets its own scale, so magnitudes are no longer comparable across sensors.

grped_calendar <- two_sensors_df %>% 
  group_by(Sensor_Name) %>%
  frame_calendar(x = Time, y = Hourly_Counts, date = Date, ncol = 4)
p5 <- grped_calendar %>%
  ggplot(aes(x = .Time, y = .Hourly_Counts, group = Date)) +
  geom_line(aes(colour = Sensor_Name)) +
  facet_grid(Sensor_Name ~ .) +
  scale_color_brewer(palette = "Dark2") +
  theme(legend.position = "bottom")
prettify(p5)
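The contrast between the overlaid and the grouped plots comes down to which observations share a range when values are rescaled. The toy sketch below uses made-up counts and hypothetical sensor names, with base R's ave() standing in for grouping; it illustrates the idea rather than reproducing frame_calendar().

```r
# Made-up counts for two sensors with the same daily shape; this contrasts
# the shared scale (no group_by()) with per-group scales (group_by()).
peds <- data.frame(
  sensor = rep(c("Sensor A", "Sensor B"), each = 3),  # hypothetical names
  count  = c(100, 400, 250, 1000, 4000, 2500)
)
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))

# Without grouping: one range over both sensors, magnitudes comparable
peds$shared <- rescale01(peds$count)
# With grouping: each sensor rescaled by its own range
peds$grouped <- ave(peds$count, peds$sensor, FUN = rescale01)
peds
```

With the shared range the quieter sensor stays near the bottom of each cell; with per-group ranges both sensors fill their cells and look identical, even though one is ten times busier.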