---
title: "Grouped Tables and Side-by-Side Comparisons"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Grouped Tables and Side-by-Side Comparisons}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(tsg)
library(dplyr)
```

This vignette covers three common scenarios that go beyond a single basic table:

1. **Get a separate table for each group** — when you need one table per region, sex, or other category.
2. **Compare several indicators side by side** — when you have multiple related columns and want them all in one compact table.
3. **Export many tables at once** — when your report has dozens of tables and you want to manage them efficiently.

We will use the `person_record` sample dataset throughout.

---

## Get a separate table for each group

By default, grouping with `group_by()` produces a single merged table with the group labels in the category column. If you want **one independent table per group instead**, add `group_as_list = TRUE`.

```{r}
person_record |>
  group_by(sex) |>
  generate_frequency(marital_status, group_as_list = TRUE)
```

With **two grouping variables**, the result is automatically nested — you get a list of lists:

```{r}
person_record |>
  filter(age >= 15) |> 
  group_by(sex, employed) |>
  generate_frequency(marital_status, group_as_list = TRUE)
```

The same works with `generate_crosstab()`:

```{r}
person_record |>
  filter(age >= 15) |> 
  group_by(sex) |>
  generate_crosstab(marital_status, employed, group_as_list = TRUE)
```

---

## Add a grand total to grouped tables

When you want to include an "All groups combined" summary alongside the per-group breakdowns, use `group_as_hierarchy = TRUE`.

### Flat table with total rows inserted

Without `group_as_list`, `group_as_hierarchy = TRUE` inserts a grand-total row at each group boundary in the flat output:

```{r}
person_record |>
  group_by(sex) |>
  generate_frequency(marital_status, group_as_hierarchy = TRUE)
```

### Separate tables with a total entry per level

Combine `group_as_list = TRUE` and `group_as_hierarchy = TRUE` to get a nested list where each level includes a special total entry. The total key is labelled with the variable name and the `label_group_hierarchy` setting (default: `"All"`).

```{r}
person_record |>
  group_by(sex) |>
  generate_frequency(
    marital_status,
    group_as_list      = TRUE,
    group_as_hierarchy = TRUE
  )
```

This scales to two grouping variables for a fully nested hierarchy:

```{r}
person_record |>
  filter(age >= 15) |> 
  group_by(sex, employed) |>
  generate_frequency(
    marital_status,
    group_as_list      = TRUE,
    group_as_hierarchy = TRUE
  )
```

### Change the total label

Use `label_group_hierarchy` to rename the `"All"` label. Pass a single string to use the same label everywhere, or a **named vector** to set a different label per grouping variable:

```{r}
person_record |>
  group_by(sex) |>
  generate_frequency(
    marital_status,
    group_as_hierarchy    = TRUE,
    label_group_hierarchy = "Grand Total"
  )
```

```{r}
person_record |>
  filter(age >= 15) |>
  group_by(sex, employed) |>
  generate_frequency(
    marital_status,
    group_as_list         = TRUE,
    group_as_hierarchy    = TRUE,
    label_group_hierarchy = c(sex = "All sexes", employed = "All workers")
  )
```

The same arguments work with `generate_crosstab()`:

```{r}
person_record |>
  filter(age >= 15) |> 
  group_by(sex) |>
  generate_crosstab(
    marital_status,
    employed,
    group_as_list      = TRUE,
    group_as_hierarchy = TRUE
  )
```

---

## Compare several Yes/No indicators side by side

`multiple_columns = TRUE` lets you cross-tabulate a row variable against **multiple indicator columns at once**. Instead of a separate table for each indicator, all results appear in a single wide table — each indicator becomes its own column group.

This is particularly useful for survey modules where several questions share the same response scale. In `person_record`, the functional difficulty columns (`seeing`, `hearing`, `walking`, etc.) use a scale where **1 = No difficulty**, **2 = Some difficulty**, **3 = A lot of difficulty**, and **4 = Cannot do it at all**. The `multiple_columns_filter` argument controls which response value to count (default: `1L`).

### Basic usage

The example below counts respondents who reported **"Some difficulty"** (value `2`) in each domain, broken down by `sex`:

```{r}
person_record |>
  generate_crosstab(
    sex,
    seeing,
    hearing,
    walking,
    remembering,
    self_caring,
    communicating,
    multiple_columns        = TRUE,
    multiple_columns_filter = 2L
  )
```

### Count a different response level

Change `multiple_columns_filter` to target any response level:

```{r, eval=FALSE}
person_record |>
  generate_crosstab(
    sex,
    seeing,
    hearing,
    multiple_columns        = TRUE,
    multiple_columns_filter = 3L   # "A lot of difficulty"
  )
```

### Combining with grouping

All grouping options work with `multiple_columns`. Use `calculate_per_group = TRUE` to compute percentages independently within each group:

```{r}
person_record |>
  group_by(marital_status) |>
  generate_crosstab(
    sex,
    seeing,
    hearing,
    walking,
    multiple_columns        = TRUE,
    multiple_columns_filter = 2L,
    calculate_per_group     = TRUE
  )
```

Use `group_as_list = TRUE` to get a separate table per group:

```{r}
person_record |>
  group_by(marital_status) |>
  generate_crosstab(
    sex,
    seeing,
    hearing,
    walking,
    multiple_columns        = TRUE,
    multiple_columns_filter = 2L,
    group_as_list           = TRUE
  )
```

---

## Full comparison table with hierarchical columns

Setting `multiple_columns_type = "stacked"` changes the layout fundamentally: instead of filtering for a single response value, **every category of every column variable becomes its own column**. The column headers form a hierarchy — the first `...` variable at the top level, the second at the next level, and so on.

This mode is ideal when you want a complete cross-product view: every combination of `marital_status × sex` as separate columns, all in one table.

> `multiple_columns_filter` is ignored in stacked mode — all categories appear automatically.

### Basic stacked table

```{r}
person_record |>
  generate_crosstab(
    age,
    marital_status,
    sex,
    multiple_columns      = TRUE,
    multiple_columns_type = "stacked"
  )
```

The column structure is:
- A **subtotal column** for each top-level category (e.g., all respondents in each marital status group)
- **Leaf columns** for each combination (e.g., single males, single females, married males, …)
- All **frequency** columns come first, then all **percent** columns — this keeps the Excel column spanners clean

### Custom label separator

Use `label_separator` to control how the hierarchy levels are joined in column labels. This also determines how `write_xlsx()` splits labels into multi-row header spanners in Excel.

```{r}
person_record |>
  generate_crosstab(
    age,
    marital_status,
    sex,
    multiple_columns      = TRUE,
    multiple_columns_type = "stacked",
    label_separator       = " | ",
    add_percent           = FALSE
  )
```

### Three or more column variables

Add more column variables to create deeper hierarchies. Each additional variable adds another level of column splitting:

```{r}
person_record |>
  generate_crosstab(
    age,
    marital_status,
    sex,
    seeing,
    multiple_columns      = TRUE,
    multiple_columns_type = "stacked",
    add_percent           = FALSE
  )
```

### Combining with grouping

Stacked mode supports all grouping options:

```{r}
person_record |>
  filter(age >= 15) |>
  group_by(employed) |>
  generate_crosstab(
    marital_status,
    sex,
    seeing,
    multiple_columns      = TRUE,
    multiple_columns_type = "stacked",
    calculate_per_group   = TRUE,
    add_percent           = FALSE
  )
```

---

## Export many tables to one Excel file

When your analysis produces many tables, combine them into a named list and pass the whole list to `write_xlsx()`. Each list element becomes a separate worksheet.

### Basic multi-sheet export

```{r, eval=FALSE}
tables <- list(
  "Sex"            = person_record |> generate_frequency(sex),
  "Marital Status" = person_record |> generate_frequency(marital_status),
  "Marital × Sex"  = person_record |>
    generate_crosstab(marital_status, sex) |>
    add_table_title("Marital Status by Sex") |>
    add_table_subtitle("Row percentages") |>
    add_footnote("Missing values are excluded from the denominator.")
)

write_xlsx(tables, path = "report.xlsx")
```

### Add an index sheet

Set `include_table_list = TRUE` to prepend an auto-generated index sheet:

```{r, eval=FALSE}
write_xlsx(tables, path = "multi-sheet-indexed.xlsx", include_table_list = TRUE)
```

### Save each table to its own file

Set `separate_files = TRUE` and provide a folder path:

```{r, eval=FALSE}
write_xlsx(tables, path = "output-tables/", separate_files = TRUE)
```

### End-to-end example

Here is a realistic pipeline that builds several tables, attaches metadata, applies a style, and exports everything to a single indexed workbook:

```{r, eval=FALSE}
# 1. Build tables
freq_sex <- person_record |>
  generate_frequency(sex) |>
  add_table_title("Distribution by Sex") |>
  add_source_note("Source: person_record dataset")

crosstab_marital_sex <- person_record |>
  generate_crosstab(marital_status, sex) |>
  add_table_title("Marital Status by Sex") |>
  add_table_subtitle("Row percentages") |>
  add_footnote("Missing values are excluded from the denominator.")

difficulties_wide <- person_record |>
  generate_crosstab(
    sex,
    seeing, hearing, walking, remembering, self_caring, communicating,
    multiple_columns        = TRUE,
    multiple_columns_filter = 2L   # count "Some difficulty" responses
  ) |>
  add_table_title("Functional Difficulties by Sex (Some difficulty)")

# 2. Combine into a named list
workbook_tables <- list(
  "1. Sex"           = freq_sex,
  "2. Marital x Sex" = crosstab_marital_sex,
  "3. Difficulties"  = difficulties_wide
)

# 3. Export with a style and an index sheet
write_xlsx(
  workbook_tables,
  path               = "report.xlsx",
  facade             = get_tsg_facade("yolo"),
  include_table_list = TRUE
)
```

---

## Controlling where footnotes appear

### Left and right placement

`add_footnote()` accepts a `placement` argument (`"auto"`, `"left"`, or `"right"`) to align the footnote text. Use left alignment for source citations and right alignment for methodological notes — this mirrors APA and AAPOR conventions.

```{r}
person_record |>
  generate_frequency(sex) |>
  add_table_title("Sex distribution") |>
  add_footnote("Source: National Survey 2023.", placement = "left") |>
  add_footnote("Weighted estimates.", placement = "right")
```

### Linking a footnote to a column header (HTML / PDF)

Pass column names via `locations` to place a footnote marker in the column header. This is supported in HTML and PDF output (via `gt`); Excel and Word output include the footnote text without cell-level markers.

```{r}
person_record |>
  generate_crosstab(marital_status, sex) |>
  add_footnote(
    "Counts exclude respondents with unknown marital status.",
    locations = c("frequency_1", "frequency_2")
  )
```

### Chaining multiple footnotes

Each `add_footnote()` call appends to the list. Different footnotes can have different placements and locations:

```{r}
person_record |>
  generate_frequency(sex) |>
  add_footnote("Source: National Survey 2023.") |>
  add_footnote("Weighted estimates.", placement = "right") |>
  add_footnote("Counts may not sum to total due to rounding.",
               locations = "frequency")
```

---

## Tips

- When building large workbooks, attach `add_table_title()` and `add_table_subtitle()` to each element **before** combining into the list — metadata is preserved per sheet.
- `group_as_hierarchy = TRUE` with `group_as_list = TRUE` is most useful for hierarchical administrative data (e.g., national → regional → district breakdowns).
- `multiple_columns = TRUE` is designed for survey modules where several indicator columns share the same response scale.
- `multiple_columns_type = "stacked"` is best for producing a complete cross-product comparison table.