#### Perturbation parameters for count variables

The next task is to define parameters that are used to perturb count
variables, which can be achieved with `ck_params_cnts()`. This
function requires as input the result of either `pt_create_pParams()`,
`pt_create_pTable()` or `create_cnt_ptable()` from the **ptable**
package. Please also refer to the documentation of this package for
information on the required parameters. In this example we are going to
use, amongst others, exemplary ptables that are provided by the
**ptable** package for demonstration purposes:

```
# two different perturbation parameter sets from the ptable-pkg
# an example ptable provided directly
ptab1 <- ptable::pt_ex_cnts()
# creating a ptable by specifying parameters
para2 <- ptable::create_cnt_ptable(
D = 8, V = 3, js = 2, pstay = 0.5,
optim = 1, mono = TRUE)
```
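To give an idea of what such a count perturbation table encodes, the following base R sketch implements the lookup principle of the cell key method on a hand-made toy table. The column names (`i`, `lb`, `ub`, `v`), the probabilities and the helper `lookup_pert()` are hypothetical and do not reflect the structure of objects returned by **ptable**. For a cell with original count `i`, the cell key (a number in `[0, 1)`) falls into exactly one interval `[lb, ub)` of the row block for `i`, and the associated value `v` is added to the count. As an assumption mirroring the usual approach, counts above the largest tabulated count reuse the last row block.

```
# Hypothetical toy ptable (NOT the ptable-pkg format): for each original
# count `i`, the intervals [lb, ub) partition [0, 1) and `v` is the
# perturbation value added to the count.
toy_ptab <- data.frame(
  i  = c(1, 1, 1, 2, 2, 2),
  lb = c(0.0, 0.3, 0.8, 0.0, 0.2, 0.9),
  ub = c(0.3, 0.8, 1.0, 0.2, 0.9, 1.0),
  v  = c(-1, 0, 1, -1, 0, 1)
)

# Hypothetical helper: perturbation value for a cell with count `n`
# and cell key `ck`
lookup_pert <- function(ptab, n, ck) {
  # assumption: counts above the last tabulated count reuse its block
  n_row <- min(n, max(ptab$i))
  rows <- ptab[ptab$i == n_row, ]
  rows$v[ck >= rows$lb & ck < rows$ub]
}

lookup_pert(toy_ptab, n = 1, ck = 0.25)  # -1: the count 1 becomes 0
lookup_pert(toy_ptab, n = 2, ck = 0.50)  #  0: the count stays unchanged
lookup_pert(toy_ptab, n = 7, ck = 0.95)  # +1: uses the block for i = 2
```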

We then need to create the required inputs for the cellKey
package.

```
p_cnts1 <- ck_params_cnts(ptab = ptab1)
p_cnts2 <- ck_params_cnts(ptab = para2)
```

`ck_params_cnts()` returns objects that can be used as inputs in method
`params_cnts_set()`. In argument `v` one may specify count variables
for which the supplied perturbation parameters should be used. If `v`
is not specified, the perturbation parameters are used for all count
variables.

```
# use `p_cnts1` for variable "total" (which always exists)
tab$params_cnts_set(val = p_cnts1, v = "total")
```

`## --> setting perturbation parameters for variable 'total'`

```
# use `p_cnts2` for "cnt_highincome"
tab$params_cnts_set(val = p_cnts2, v = "cnt_highincome")
```

`## --> setting perturbation parameters for variable 'cnt_highincome'`

It is therefore entirely possible to use different parameter sets for
different variables. Modifying perturbation parameters for some
variables is easy, too: it is only required to apply the
`params_cnts_set()`-method again, which will replace any previously
defined parameters.
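The semantics of argument `v` and of re-applying the method can be mimicked with a plain named list. The helper `set_params()` below is hypothetical and is not how **cellKey** stores parameters internally; character strings stand in for the actual parameter objects. It only illustrates that omitting `v` targets every count variable and that a later call overwrites earlier settings.

```
# Hypothetical stand-in for per-variable parameter storage (NOT the
# cellKey implementation); strings stand in for parameter objects.
set_params <- function(registry, val, v = NULL) {
  # without `v`, the parameters apply to every count variable
  if (is.null(v)) v <- names(registry)
  for (var in v) registry[[var]] <- val  # overwrites any earlier entry
  registry
}

cnt_vars <- c("total", "cnt_highincome")
reg <- setNames(vector("list", length(cnt_vars)), cnt_vars)

reg <- set_params(reg, "p_cnts1")                        # all variables
reg <- set_params(reg, "p_cnts2", v = "cnt_highincome")  # replaces one entry
reg$total           # "p_cnts1"
reg$cnt_highincome  # "p_cnts2"
```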

#### Perturbation parameters for continuous variables

Setting and defining perturbation parameters for continuous variables
works similarly. The required function is `ck_params_nums()`, which
creates input objects that can be set with the `params_nums_set()`
method. Please note that by specifying the `path` argument in both
`ck_params_nums()` and `ck_params_cnts()` it is possible to
additionally save the parameters as a yaml-file. Using
`ck_read_yaml()`, these files can later be imported again.
This feature is useful for re-using parameter settings.

The underlying framework on how to perturb continuous tables differs
from the method proposed by the `ABS`. One possible approach is
based on a *“flex function”*. This approach (which is described
in deliverable `D4.2` of the project on perturbative confidentiality
methods) allows applying different magnitudes of noise to larger and
smaller cells. Users can define the required parameters for the
flex-approach with function `ck_flexparams()`. The required inputs are:

- `fp`: the flexpoint, defining at which point the underlying noise
  coefficient function should reach its desired maximum (which is
  defined by the first element of `p`)
- `p`: numeric vector of length `2` with `p[1] > p[2]`, where both
  elements specify a percentage. The first value refers to the desired
  maximum perturbation percentage for small cells (depending on `fp`),
  while the second element refers to the desired maximum perturbation
  percentage for large cells.
- `epsilon`: a numeric vector in descending order with all values in
  `[0; 1]` and with the first element forced to equal `1`. The length
  of this parameter must equal the value of `top_k` specified in
  `ck_params_nums()` (which will be discussed later).

```
# parameters for the flex-function
p_flex <- ck_flexparams(
fp = 1000,
p = c(0.3, 0.03),
epsilon = c(1, 0.5, 0.2))
```
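The documented constraints on these inputs can be restated as a small check. The function `check_flex_inputs()` is a hypothetical helper, not part of **cellKey** (which validates its inputs itself); treating the percentages in `p` as fractions in `[0, 1]` is an assumption based on the values used in the example above.

```
# Hypothetical helper restating the documented constraints on the
# flex-function inputs; cellKey performs its own checks internally.
check_flex_inputs <- function(fp, p, epsilon, top_k) {
  stopifnot(
    is.numeric(fp), length(fp) == 1,
    is.numeric(p), length(p) == 2, p[1] > p[2],
    all(p >= 0 & p <= 1),             # assumption: given as fractions
    all(epsilon >= 0 & epsilon <= 1),
    epsilon[1] == 1,
    !is.unsorted(rev(epsilon)),       # epsilon is in descending order
    length(epsilon) == top_k          # one epsilon per top-k contribution
  )
  invisible(TRUE)
}

# the values used for `p_flex` together with `top_k = 3`
check_flex_inputs(fp = 1000, p = c(0.3, 0.03),
                  epsilon = c(1, 0.5, 0.2), top_k = 3)
```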

In the **cellKey** package it is possible to select differently the
underlying data that form the base for the perturbation. In
`ck_params_nums()` the specific approach can be selected in argument
`type`. The valid choices for this argument are:

- `"top_contr"`: the `k` largest contributions to each cell are used in
  the perturbation procedure, with the number `k` required to be
  specified in argument `top_k`
- `"mean"`: weighted cell means are used as starting points
- `"range"`: the difference between the largest and smallest
  unweighted contributions of each cell is used as base for the
  perturbation procedure
- `"sum"`: weighted cell values are used as starting points for the
  perturbation
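For a single cell with hypothetical contributions and sampling weights, the four bases can be computed in base R as sketched below. This only illustrates the definitions listed above; whether **cellKey** applies weights to the top contributions is not stated here, so the sketch uses unweighted values for `"top_contr"`.

```
# Toy microdata for one cell (hypothetical values)
contrib <- c(120, 40, 300, 15, 75)  # unweighted contributions
w       <- c(2, 1, 1.5, 3, 1)       # sampling weights

k <- 3
base_top_contr <- sort(contrib, decreasing = TRUE)[seq_len(k)]  # 300 120 75
base_mean      <- weighted.mean(contrib, w)      # weighted cell mean: 100
base_range     <- max(contrib) - min(contrib)    # unweighted range: 285
base_sum       <- sum(contrib * w)               # weighted cell value: 850
```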

Another, more basic approach is to use a constant perturbation
magnitude for all cells, independent of their (weighted) values. The
required parameters can be defined with `ck_simpleparams()` as shown
below:

```
# parameters for the simple approach
p_simple <- ck_simpleparams(
p = 0.05,
epsilon = 1)
```

In this approach it is only required to specify a single percentage
value `p` and, as in the case of the flex function, a vector of
epsilons that are used in case `top_k > 1`.
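How the epsilons enter the computation can be sketched in a simplified form. The formula below is an assumption made for illustration, not the exact formula used by **cellKey** or described in `D4.2`: the `j`-th largest contribution `x_j` is assumed to receive noise `epsilon_j * p * x_j * v_j`, where `v_j` would normally be looked up in a ptable and is fixed to hypothetical values here. The point is only that later contributions are damped by smaller epsilons.

```
# Simplified illustration (assumption, NOT the cellKey formula):
# noise for the j-th largest contribution = epsilon_j * p * x_j * v_j
p       <- 0.05
epsilon <- c(1, 0.5, 0.2)
x_top   <- c(300, 120, 75)   # three largest contributions (top_k = 3)
v       <- c(1, -1, 0.5)     # hypothetical ptable lookups

noise <- sum(epsilon * p * x_top * v)
noise  # 15 - 3 + 0.375 = 12.375
```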

Further important parameters for `ck_params_nums()` are:

- `mu_c`: an extra amount of perturbation applied to sensitive cells
  (restricted to the first of the `top_k` noise components). Further
  below we demonstrate how to identify sensitive cells for numeric
  variables.
- `same_key`: a logical value specifying if the original cell key
  (`TRUE`) should be used for the lookup of the largest contributor of
  a cell or if a perturbation of the cell key itself (`FALSE`) should
  take place.
- `use_zero_rkeys`: a logical value defining if record keys of units
  not contributing to a specific numeric variable should be used
  (`TRUE`) or ignored (`FALSE`) when cell keys are computed.
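In the usual cell key method, every unit carries a random record key in `[0, 1)` and the cell key is the fractional part of the sum of the record keys of the units in that cell. The sketch below uses hypothetical data to show how including or excluding units with a zero value for a numeric variable changes the cell key; relating the two variants to `use_zero_rkeys = TRUE`/`FALSE` is our reading of the description above, not **cellKey** code.

```
# Hypothetical record keys and values for the units of one cell
set.seed(1)
rkeys  <- runif(6)
income <- c(1200, 0, 3400, 0, 560, 800)  # two units contribute zero

# cell key: fractional part of the sum of record keys
cell_key <- function(keys) sum(keys) %% 1

ck_with_zero    <- cell_key(rkeys)               # cf. use_zero_rkeys = TRUE
ck_without_zero <- cell_key(rkeys[income != 0])  # cf. use_zero_rkeys = FALSE
```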

A very important parameter is `ptab`, which actually holds the
perturbation tables in which perturbation values are looked up. For
numeric variables, this input can be specified in different ways. In
the simplest case, it is an object derived from
`ptable::pt_create_pTable(..., table = "nums")`. More advanced is to
supply a named list, where the allowed names are shown below and each
element must be the output of
`ptable::pt_create_pTable(..., table = "nums")`:

- `"all"`: this ptable will be used for all cells; if specified,
  list-elements named `"even"` or `"odd"` are ignored
- `"even"`: this perturbation table will be used to look up
  perturbation values for cells with an even number of contributors
- `"odd"`: will be used to look up perturbation values for cells with
  an odd number of contributors
- `"small_cells"`: if specified, this ptable will be used to extract
  perturbation values for very small cells
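The precedence of these list names can be summarised in a few lines of base R. The function `pick_ptab()` is hypothetical and only returns the name of the list element that would be used; since **cellKey** decides internally which cells count as very small (via a separation point), that decision is passed in as an argument here.

```
# Hypothetical helper mirroring the documented precedence of list names
pick_ptab <- function(ptab_names, n_contributors, is_small_cell) {
  if (is_small_cell && "small_cells" %in% ptab_names) return("small_cells")
  if ("all" %in% ptab_names) return("all")  # "even"/"odd" are then ignored
  # without "all", both parity tables must be present
  stopifnot(all(c("even", "odd") %in% ptab_names))
  if (n_contributors %% 2 == 0) "even" else "odd"
}

pick_ptab(c("all", "small_cells"), n_contributors = 4, is_small_cell = FALSE)  # "all"
pick_ptab(c("all", "small_cells"), n_contributors = 4, is_small_cell = TRUE)   # "small_cells"
pick_ptab(c("even", "odd"), n_contributors = 5, is_small_cell = FALSE)         # "odd"
```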

Please note that if the goal is to have different perturbation
tables for cells with an even/odd number of contributors, both
`"even"` and `"odd"` must be available in the input list. In the
chunk below we load an example perturbation table that is provided by
the **ptable** package. For details on the parameters, please look at
the documentation of the **ptable** package, especially at
`ptable::create_num_ptable()`.

```
# same ptable for all cells except for very small ones
ex_ptab1 <- ptable::pt_ex_nums(parity = TRUE, separation = TRUE)
```

We can now use these tables to finally create objects containing all
the required information to create perturbed magnitude tables using
`ck_params_nums()`. In the first case we want to use the same
perturbation table for cells with an even or odd number of
contributors, but a separate table for very small cells, as provided
by `ex_ptab1`.

```
p_nums1 <- ck_params_nums(
type = "top_contr",
top_k = 3,
ptab = ex_ptab1,
mult_params = p_flex,
mu_c = 2,
same_key = FALSE,
use_zero_rkeys = TRUE)
```

The second input we generate should use different ptables for cells
with an even or odd number of contributing units, but no specific
perturbation table for very small cells.

```
ex_ptab2 <- ptable::pt_ex_nums(parity = FALSE, separation = FALSE)
```

As above, we need to use `ck_params_nums()` to compute suitable inputs.

```
p_nums2 <- ck_params_nums(
type = "mean",
ptab = ex_ptab2,
mult_params = p_simple,
mu_c = 1.5,
same_key = FALSE,
use_zero_rkeys = TRUE)
```

The package internally computes the separation point that is used for
very small cells in case this is required. Details on this can also be
found in deliverable `D4.2`.

Now we can attach the results from `ck_params_nums()` to numeric
variables using the `params_nums_set()`-method as shown below:

```
tab$params_nums_set(v = "income", val = p_nums1)
```

`## --> setting perturbation parameters for variable 'income'`

```
tab$params_nums_set(v = "savings", val = p_nums1)
```

`## --> setting perturbation parameters for variable 'savings'`

In order to make use of parameter `mu_c`, which adds an extra amount
of protection to sensitive cells, one may identify sensitive cells
according to some rules. The following methods to identify sensitive
cells are implemented:

- `supp_p()`: identify sensitive cells based on the p%-rule
- `supp_pq()`: identify sensitive cells based on the pq%-rule
- `supp_nk()`: identify sensitive cells based on the nk-dominance rule
- `supp_freq()`: identify sensitive cells based on minimal frequencies
  for the (weighted) number of contributors
- `supp_val()`: identify sensitive cells based on (weighted) cell
  values
- `supp_cells()`: identify sensitive cells based on their “names”
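The three concentration rules have standard textbook definitions, sketched below for the contributions of a single cell: under the p%-rule a cell is sensitive if the total minus the two largest contributions is less than `p`% of the largest one; the pq-rule replaces `p/100` by `p/q`; under the nk-dominance rule a cell is sensitive if its `n` largest contributions exceed `k`% of the total. These functions are illustrations of those definitions, not the **cellKey** methods, which operate on the whole table and have their own arguments.

```
# Textbook sensitivity rules for one cell (sketches, NOT cellKey methods)
top_sorted <- function(x) sort(x, decreasing = TRUE)

is_sensitive_p <- function(x, p) {
  s <- top_sorted(x)
  remainder <- sum(s) - sum(head(s, 2))  # total minus the two largest
  remainder < p / 100 * s[1]
}

is_sensitive_pq <- function(x, p, q) {
  s <- top_sorted(x)
  remainder <- sum(s) - sum(head(s, 2))
  remainder < p / q * s[1]
}

is_sensitive_nk <- function(x, n, k) {
  s <- top_sorted(x)
  sum(head(s, n)) > k / 100 * sum(s)     # n largest exceed k% of total
}

dominated <- c(500, 60, 30, 10)   # total 600, remainder 40
balanced  <- c(100, 90, 80, 70)   # total 340, remainder 150
is_sensitive_p(dominated, p = 10)          # 40 < 50   -> TRUE
is_sensitive_pq(dominated, p = 10, q = 50) # 40 < 100  -> TRUE
is_sensitive_nk(dominated, n = 2, k = 85)  # 560 > 510 -> TRUE
is_sensitive_p(balanced, p = 10)           # 150 < 10  -> FALSE
```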

We now want to mark as sensitive all cells of variable `income` to
which fewer than `15` units contribute.

```
tab$supp_freq(v = "income", n = 15, weighted = FALSE)
```

`## freq-rule: 3 new sensitive cells (incl. duplicates) found (total: 3)`
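The effect of the `weighted` argument can be illustrated on hypothetical microdata in base R: with unweighted counts, each unit counts once, while with weighted counts, the sampling weights of the contributing units are summed. Treating every row as a contributor is a simplifying assumption of this sketch, which does not reproduce the internals of `supp_freq()`.

```
# Hypothetical microdata: cell membership and sampling weights
micro <- data.frame(
  cell   = c("a", "a", "a", "b", "b", "c"),
  weight = c(10, 12, 8, 3, 4, 20)
)
n <- 15

n_unweighted <- table(micro$cell)                      # a = 3, b = 2, c = 1
n_weighted   <- tapply(micro$weight, micro$cell, sum)  # a = 30, b = 7, c = 20

sens_unweighted <- names(n_unweighted)[n_unweighted < n]  # "a" "b" "c"
sens_weighted   <- names(n_weighted)[n_weighted < n]      # "b"
```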

To mark specific cells as sensitive independently of their values,
based only on their names, one may use the `$supp_cells()`-method.
This method requires a `data.frame` as input that contains a column
for each dimensional variable specified. Each row of this input is
considered as a cell, where `NAs` are used as placeholders that match
any characteristic of the relevant variable. Using the `data.frame`
`inp` shown below, the program would mark the following cells as
sensitive:

- `female` x `age_group1`
- `male` x `age_group3`
- `male` x any age group available in the data

```
inp <- data.frame(
"sex" = c("female", "male", "male"),
"age" = c("age_group1", "age_group3", NA)
)
```
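The placeholder semantics of `NA` can be reproduced in base R: a cell matches a row of `inp` if every non-`NA` entry of that row equals the cell's characteristic, and a cell is selected if it matches any row. The table of cells below (two sexes times three age groups) and the helper `matches_row()` are hypothetical; `inp` is recreated with `stringsAsFactors = FALSE` so that the sketch runs on its own.

```
# All cells of a hypothetical table spanned by sex and age
cells <- expand.grid(
  sex = c("female", "male"),
  age = c("age_group1", "age_group2", "age_group3"),
  stringsAsFactors = FALSE
)

# the `inp` data.frame from above
inp <- data.frame(
  sex = c("female", "male", "male"),
  age = c("age_group1", "age_group3", NA),
  stringsAsFactors = FALSE
)

# a cell matches a row if all non-NA entries of the row agree with it
matches_row <- function(cells, row) {
  ok <- rep(TRUE, nrow(cells))
  for (col in names(row)) {
    if (!is.na(row[[col]])) ok <- ok & cells[[col]] == row[[col]]
  }
  ok
}

hit <- Reduce(`|`, lapply(seq_len(nrow(inp)),
                          function(i) matches_row(cells, inp[i, , drop = FALSE])))
cells[hit, ]  # female/age_group1 plus male in all three age groups
```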