The load_pnadc function is a wrapper for get_pnadc
from the package PNADcIBGE, with added identification
algorithms for panel construction. For details on the identification
algorithms, see vignette("BUILD_PNADC_PANEL").
Panel Structure:
The table below shows the first and last quarter
(ANOtrimestre, e.g. 20121 = 2012 Q1) covered
by each PNADC rotating panel:
| Panel | Start | End |
|---|---|---|
| 1 | 20121 | 20124 |
| 2 | 20121 | 20141 |
| 3 | 20132 | 20152 |
| 4 | 20143 | 20163 |
| 5 | 20154 | 20174 |
| 6 | 20171 | 20191 |
| 7 | 20182 | 20202 |
| 8 | 20193 | 20213 |
| 9 | 20204 | 20224 |
| 10 | 20221 | 20241 |
| 11 | 20232 | 20252 |
| 12 | 20243 | 20263 |
| 13 | 20254 | 20274 |
| 14 | 20271 | 20291 |
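The table can also be queried programmatically. The sketch below copies the coverage table into a data frame and looks up which panels include a given quarter. The helper `panels_covering` and the object `panel_coverage` are hypothetical names used only for illustration; they are not part of the package.

```r
# Coverage table copied from the text above (codes are YYYYQ integers).
panel_coverage <- data.frame(
  panel = 1:14,
  start = c(20121, 20121, 20132, 20143, 20154, 20171, 20182,
            20193, 20204, 20221, 20232, 20243, 20254, 20271),
  end   = c(20124, 20141, 20152, 20163, 20174, 20191, 20202,
            20213, 20224, 20241, 20252, 20263, 20274, 20291)
)

# Hypothetical helper: which panels include a given year and quarter?
panels_covering <- function(year, quarter, coverage = panel_coverage) {
  code <- year * 10 + quarter  # e.g. year 2022, quarter 1 -> 20221
  # YYYYQ codes sort chronologically, so plain comparisons are enough
  coverage$panel[coverage$start <= code & code <= coverage$end]
}

panels_covering(2022, 1)
```

According to the table, the first quarter of 2022 falls inside panels 9 and 10, which is what this call returns.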
Usage:
Default
load_pnadc(
save_to = getwd(),
years,
quarters = 1:4,
panel = "advanced",
raw_data = FALSE,
save_options = c(TRUE, TRUE),
vars = NULL
)
To download PNADC data for all quarters of 2022 and 2023, with advanced identification, simply run
load_pnadc(
save_to = "Directory/You/Would/like/to/save/the/files",
years = 2022:2023
)
To download PNADC data for all of 2022, but only the first quarter of 2023, run
load_pnadc(
save_to = "Directory/You/Would/like/to/save/the/files",
years = 2022:2023,
quarters = list(1:4, 1)
)
To download PNADC data without any variable treatment or identification (e.g., for all quarters of 2021), run
load_pnadc(
save_to = "Directory/You/Would/like/to/save/the/files",
years = 2021,
panel = "none",
raw_data = TRUE
)
To download PNADC data, keep the quarters parquet on disk, and save panels as Parquet, run
load_pnadc(
save_to = "Directory/You/Would/like/to/save/the/files",
years = 2022,
save_options = c(TRUE, FALSE)
)
To download PNADC data and save panels as CSV but discard the intermediate quarters parquet, run
load_pnadc(
save_to = "Directory/You/Would/like/to/save/the/files",
years = 2022,
save_options = c(FALSE, TRUE)
)
To download only a specific subset of variables, for example age
(V2009) and habitual income (VD4019),
alongside the structural columns that PNADcIBGE always
returns, run
load_pnadc(
save_to = "Directory/You/Would/like/to/save/the/files",
years = 2022,
vars = c("V2009", "VD4019")
)
Note:
PNADcIBGE::get_pnadc() always downloads a set of ~210 structural columns regardless of the vars argument. These include survey design weights (V1027, V1028, V1028001–V1028200, posest, posest_sxi), deflator variables (Habitual, Efetivo), and identifiers such as UF, Estrato, V1029, V1033, and ID_DOMICILIO. The vars argument adds columns on top of those; it does not restrict them. Use vars = NULL (the default) to download all available microdata columns.
If you specify vars and also request panel
identification, any columns required by the identification algorithm
that are absent from vars will be added automatically and a
warning will tell you which ones were added. For example, when using
panel = "advanced", the columns V2007,
V20082, V20081, V2008, and
V2003 must be present. If you omit them from
vars, the function adds them for you:
# Only V2009 and VD4019 are requested, but panel = "advanced" (the default) needs
# V2007, V20082, V20081, V2008 and V2003; these are added automatically
# with a warning.
load_pnadc(
save_to = "Directory/You/Would/like/to/save/the/files",
years = 2022,
panel = "advanced",
vars = c("V2009", "VD4019")
)
Options:
save_to: The directory in which to save the downloaded files.
years: The years for which data will be downloaded.
quarters: The quarters within those years to be
downloaded. Can be either a vector such as 1:4 for
consistent quarters across years, or a list of vectors, if quarters are
different for each year (e.g. list(1:4, 1:2) for four
quarters in the first year and two in the second).
panel: Which panel algorithm to apply to this data. There are three options:
none: No panel is built. If
raw_data = TRUE, returns the original data. Otherwise,
creates some extra treated variables. The intermediate quarters parquet
is always kept when panel = "none".
basic: Performs basic identification steps for creating
household and individual identifiers for panel construction.
advanced: Performs advanced identification steps for
creating household and individual identifiers for panel
construction.
raw_data: Whether to download the raw or the treated data. There are two options:
TRUE: if you want the PNADC variables as they
come.
FALSE: if you want the treated version of the PNADC
variables.
save_options: A logical vector of length 2 controlling file-saving behaviour:
c(TRUE, TRUE) (default): keeps the intermediate
quarters parquet after the panels are built; saves panel files as
.csv.
c(FALSE, TRUE): deletes the quarters parquet after use;
saves panel files as .csv.
c(TRUE, FALSE): keeps the quarters parquet; saves panel
files as a .parquet dataset.
c(FALSE, FALSE): deletes the quarters parquet after
use; saves panel files as a .parquet dataset.
vars: A character vector of additional variable
names to download, following the same convention as vars in
PNADcIBGE::get_pnadc(). Use NULL (the default)
to download all available microdata columns. See the note above
regarding the ~210 structural columns that are always returned by
PNADcIBGE::get_pnadc() regardless of this
argument.
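To make the two accepted shapes of the quarters argument concrete, the sketch below expands a years/quarters pair into one row per year and quarter. `expand_quarters` is a hypothetical helper written for illustration only; it is not part of the package, and the package's internal handling may differ.

```r
# Hypothetical helper (not part of the package): expand `years` and `quarters`
# into one row per (year, quarter) pair.
expand_quarters <- function(years, quarters = 1:4) {
  if (!is.list(quarters)) {
    # A plain vector means "the same quarters for every year"
    quarters <- rep(list(quarters), length(years))
  }
  # A list must supply exactly one vector of quarters per year
  stopifnot(length(quarters) == length(years))
  do.call(
    rbind,
    Map(function(y, q) data.frame(year = y, quarter = q), years, quarters)
  )
}

# All of 2022, but only the first quarter of 2023
expand_quarters(2022:2023, list(1:4, 1))
```

With a plain vector such as 1:4, every year gets the same quarters; with a list, the i-th element applies to the i-th year.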
Details:
The function performs the following steps:
Loop over years and quarters using
PNADcIBGE::get_pnadc to download the data. All quarters are
collected in memory and saved together into a single
pnadc_quarters.parquet file in
save_to.
Split the data into panels by the panel variable
V1014. Data from each panel is saved depending on
save_options.
Read each panel file and apply the identification algorithms
defined in build_pnadc_panel.
If save_options[1] = FALSE, the intermediate
quarters parquet is deleted after the panels are built.
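The splitting step can be pictured with base R on toy data. This is a minimal sketch, not the package's implementation: the data frame, the output directory, and the file names are illustrative assumptions, and only the use of V1014 as the panel variable comes from the text above.

```r
# Toy stand-in for the combined quarterly data (values are made up).
quarters_df <- data.frame(
  Ano       = c(2022, 2022, 2022, 2022),
  Trimestre = c(1, 1, 2, 2),
  V1014     = c(9, 10, 9, 10),   # panel number
  V2009     = c(34, 51, 34, 51)  # age
)

# One data frame per panel, keyed by the panel number in V1014
panels <- split(quarters_df, quarters_df$V1014)

# Write each panel to its own CSV in a temporary directory
# (the "panel_<n>.csv" naming is illustrative only)
out_dir <- tempfile("pnadc_panels_")
dir.create(out_dir)
for (p in names(panels)) {
  write.csv(
    panels[[p]],
    file.path(out_dir, paste0("panel_", p, ".csv")),
    row.names = FALSE
  )
}

list.files(out_dir)
```

Each resulting file holds every quarter observed for one panel, which is the shape the identification step needs in order to follow households and individuals over time.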
The identification algorithms defined in build_pnadc_panel are
drawn from Ribas, Rafael Perez, and Sergei Suarez Dillon Soares (2008):
“Sobre o painel da Pesquisa Mensal de Emprego (PME) do IBGE”.