LOAD_PNADC

The load_pnadc function is a wrapper for get_pnadc from the package PNADcIBGE, with added identification algorithms for panel construction. For details on the identification algorithms, see vignette("BUILD_PNADC_PANEL").


Panel Structure:

The table below shows the first and last quarter (ANOtrimestre, e.g. 20121 = 2012 Q1) covered by each PNADC rotating panel:

Panel   Start   End
1       20121   20124
2       20121   20141
3       20132   20152
4       20143   20163
5       20154   20174
6       20171   20191
7       20182   20202
8       20193   20213
9       20204   20224
10      20221   20241
11      20232   20252
12      20243   20263
13      20254   20274
14      20271   20291
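
As a rough illustration of how the table can be queried, the base R sketch below stores it as a data frame and looks up which panels include a given quarter. The helper panels_covering is hypothetical and not part of the package; it relies only on the year * 10 + quarter encoding described above.

```r
# Panel coverage copied from the table above; quarters are encoded as
# year * 10 + quarter (e.g. 20121 = 2012 Q1).
panels <- data.frame(
  Panel = 1:14,
  Start = c(20121, 20121, 20132, 20143, 20154, 20171, 20182,
            20193, 20204, 20221, 20232, 20243, 20254, 20271),
  End   = c(20124, 20141, 20152, 20163, 20174, 20191, 20202,
            20213, 20224, 20241, 20252, 20263, 20274, 20291)
)

# Hypothetical helper (not part of the package): which panels include
# a given quarter according to the table?
panels_covering <- function(year, quarter) {
  code <- year * 10 + quarter
  panels$Panel[panels$Start <= code & code <= panels$End]
}

panels_covering(2013, 2)  # panels 2 and 3
```

Because the encoding is increasing in both year and quarter, a plain numeric comparison is enough to test whether a quarter falls inside a panel's range.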

Usage:

Default


load_pnadc(
  save_to = getwd(),
  years,
  quarters = 1:4,
  panel = "advanced",
  raw_data = FALSE,
  save_options = c(TRUE, TRUE),
  vars = NULL
)

To download PNADC data for all quarters of 2022 and 2023, with advanced identification, simply run

load_pnadc(
  save_to = "Directory/You/Would/like/to/save/the/files",
  years = 2022:2023
)

To download PNADC data for all of 2022, but only the first quarter of 2023, run

load_pnadc(
  save_to = "Directory/You/Would/like/to/save/the/files",
  years = 2022:2023,
  quarters = list(1:4, 1)
)
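
To make the pairing between years and quarters concrete, the sketch below expands the two arguments into one row per requested (year, quarter) combination. It is an illustration in base R of the behaviour described here, not the package's internal code, and the object name requests is made up.

```r
years    <- 2022:2023
quarters <- list(1:4, 1)

# A plain vector such as 1:4 applies to every year, so repeat it per year
if (!is.list(quarters)) {
  quarters <- rep(list(quarters), length(years))
}
stopifnot(length(quarters) == length(years))

# One row per (year, quarter) combination that would be requested
requests <- do.call(
  rbind,
  Map(function(y, q) data.frame(year = y, quarter = q), years, quarters)
)
print(requests)
```

With the values above, requests has four rows for 2022 and a single row for 2023.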

To download PNADC data without any variable treatment or identification (e.g., for all quarters of 2021), run

load_pnadc(
  save_to = "Directory/You/Would/like/to/save/the/files",
  years = 2021,
  panel = "none",
  raw_data = TRUE
)

To download PNADC data, keep the intermediate quarters Parquet file on disk, and save the panels as Parquet, run

load_pnadc(
  save_to = "Directory/You/Would/like/to/save/the/files",
  years = 2022,
  save_options = c(TRUE, FALSE)
)

To download PNADC data and save the panels as CSV, but discard the intermediate quarters Parquet file, run

load_pnadc(
  save_to = "Directory/You/Would/like/to/save/the/files",
  years = 2022,
  save_options = c(FALSE, TRUE)
)
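
The two save_options examples can be summarised in a small lookup. The sketch below is a hypothetical helper (describe_save_options is not a package function) that spells out what each element means, assuming the behaviour shown in the examples: the first element keeps or discards the quarters file, and the second chooses CSV (TRUE) or Parquet (FALSE) for the panels.

```r
# Hypothetical helper: translate save_options into readable settings
describe_save_options <- function(save_options) {
  stopifnot(
    is.logical(save_options),
    length(save_options) == 2,
    !anyNA(save_options)
  )
  list(
    keep_quarters = save_options[1],
    panel_format  = if (save_options[2]) "csv" else "parquet"
  )
}

describe_save_options(c(TRUE, FALSE))  # keep quarters file, panels as Parquet
describe_save_options(c(FALSE, TRUE))  # drop quarters file, panels as CSV
```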

To download only a specific subset of variables, for example age (V2009) and habitual income (VD4019), alongside the structural columns that PNADcIBGE always returns, run

load_pnadc(
  save_to = "Directory/You/Would/like/to/save/the/files",
  years = 2022,
  vars = c("V2009", "VD4019")
)

Note: PNADcIBGE::get_pnadc() always downloads a set of ~210 structural columns regardless of the vars argument. These include survey design weights (V1027, V1028, V1028001 to V1028200, posest, posest_sxi), deflator variables (Habitual, Efetivo), and identifiers such as UF, Estrato, V1029, V1033, and ID_DOMICILIO. The vars argument adds columns on top of those; it does not restrict them. Use vars = NULL (the default) to download all available microdata columns.

If you specify vars and also request panel identification, any columns required by the identification algorithm that are absent from vars will be added automatically and a warning will tell you which ones were added. For example, when using panel = "advanced", the columns V2007, V20082, V20081, V2008, and V2003 must be present. If you omit them from vars, the function adds them for you:

# Only V2009 and VD4019 are requested, but panel = "advanced" (the default)
# also needs V2007, V20082, V20081, V2008 and V2003. These are added
# automatically, with a warning.
load_pnadc(
  save_to = "Directory/You/Would/like/to/save/the/files",
  years = 2022,
  panel = "advanced",
  vars = c("V2009", "VD4019")
)
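
The automatic completion described above amounts to a set difference between the required columns and the requested ones. The sketch below shows that idea in base R; complete_vars and the wording of the warning are assumptions made for illustration, not the package's actual implementation.

```r
# Columns the text lists as required when panel = "advanced"
required_advanced <- c("V2007", "V20082", "V20081", "V2008", "V2003")

# Hypothetical helper, not the package's internal code
complete_vars <- function(vars, required = required_advanced) {
  missing_vars <- setdiff(required, vars)
  if (length(missing_vars) > 0) {
    warning(
      "Adding columns required for panel identification: ",
      paste(missing_vars, collapse = ", "),
      call. = FALSE
    )
  }
  # Requested columns first, then whatever had to be added
  c(vars, missing_vars)
}

vars <- complete_vars(c("V2009", "VD4019"))
```

If every required column is already in vars, the helper returns vars unchanged and emits no warning.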

Options:

  1. save_to: The directory in which to save the downloaded files. Defaults to the current working directory (getwd()).

  2. years: The years for which data will be downloaded.

  3. quarters: The quarters within those years to be downloaded. Can be either a vector such as 1:4 for consistent quarters across years, or a list of vectors, if quarters are different for each year (e.g. list(1:4, 1:2) for four quarters in the first year and two in the second).

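  4. panel: Which panel identification algorithm to apply to the data. There are three options: "none" (no panel identification is performed), "basic" (basic identification), and "advanced" (advanced identification, the default).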

  5. raw_data: A logical indicating whether to download the raw or the treated data. There are two options: TRUE (raw data, without any variable treatment) and FALSE (treated data, the default).

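  6. save_options: A logical vector of length 2 controlling file-saving behaviour. The first element sets whether the intermediate quarters Parquet file is kept on disk (TRUE) or deleted once the panels are built (FALSE). The second element sets whether the panels are saved as CSV (TRUE) or as Parquet (FALSE). The default is c(TRUE, TRUE).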

  7. vars: A character vector of additional variable names to download, following the same convention as vars in PNADcIBGE::get_pnadc(). Use NULL (the default) to download all available microdata columns. See the note above regarding the ~210 structural columns that are always returned by PNADcIBGE::get_pnadc() regardless of this argument.


Details:

The function performs the following steps:

  1. Loop over years and quarters using PNADcIBGE::get_pnadc to download the data. All quarters are collected in memory and saved together into a single pnadc_quarters.parquet file in save_to.

  2. Split the data into panels by the panel variable V1014. Data from each panel is saved depending on save_options.

  3. Read each panel file and apply the identification algorithms defined in build_pnadc_panel.

  4. If save_options[1] is FALSE, the intermediate quarters Parquet file is deleted after the panels are built.
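
Step 2 above can be pictured with base R's split(). The sketch below uses a toy data frame in place of the downloaded quarters: only the use of V1014 as the panel variable comes from the text, while the other columns and every value are made up for illustration. The package may split the data differently internally.

```r
# Toy stand-in for the downloaded quarters; only V1014 comes from the text,
# the other columns and all values are made up for illustration.
quarters_data <- data.frame(
  Ano       = c(2022, 2022, 2022, 2023),
  Trimestre = c(1, 1, 2, 1),
  V1014     = c(9, 10, 10, 10),
  V2009     = c(34, 51, 27, 45)
)

# One data frame per panel, named after the V1014 value
by_panel <- split(quarters_data, quarters_data$V1014)

names(by_panel)
vapply(by_panel, nrow, integer(1))
```

Each element of by_panel can then be written to its own file and passed to the identification step independently of the others.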