---
title: 'BUILD_PNADC_PANEL'
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{BUILD_PNADC_PANEL}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(datazoom.social)
```

***
**Usage:**

Basic Panel

```{r eval=TRUE}

panel_data <- build_pnadc_panel(dat = pnad_sample, panel = "basic")

```

Advanced Panel

```{r eval=TRUE}

panel_data <- build_pnadc_panel(dat = pnad_sample, panel = "advanced")

```

***
**Description**

Our `load_pnadc` function uses the internal function `build_pnadc_panel` to identify households and individuals across quarters. The method used for the identification is based on the paper of Ribas, Rafael Perez, and Sergei Suarez Dillon Soares (2008): "Sobre o painel da Pesquisa Mensal de Emprego (PME) do IBGE". 

***

## Basic Identification

The household identifier -- stored as `id_dom` -- combines the variables:

* `UPA` -- Primary Sampling Unit - PSU;

* `V1008` -- Household;

* `V1014` -- Panel Number;

In order to create a unique number for every combination of those variables.

***

The basic individual identifier -- stored as `id_ind` -- combines the household id with:

* `V2007` -- Sex;

* Date of Birth -- [`V20082` (year), `V20081` (month), `V2008` (day)];

In order to create an unique number for every combination of those variables.

***

## Advanced Identification

The advanced identifier is saved as `id_rs`. On individuals who were not matched on all interviews, we relax some assumptions to increase matching power. Under the assumption that the date of birth is often misreported, we take individuals who are either:

1. Head of the household or their partner

2. Child of the head of the household, 25 or older

For these observations, we run the basic identification again, but allowing the year of birth to be wrong. We also include the order number.

## Attrition

The tables below show the levels of attrition obtained using the basic and advanced identification algorithms, and compares them to the attrition levels obtained in the Stata `datazoom_social` package.

```{r, echo = FALSE}
knitr::kable(
  datazoom.social:::atrito,
  col.names = c(
    "Interview",
    "Percentage found (R)",
    "Percentage found (Stata)"
  ),
  caption = "Attrition for Panel 2"
)
```

Each cell is the percentage of PNADC observations that are identified by the advanced algorithm in each interview.