Help for package myrror

Title:

Compare Two Data Frames and Summarize Differences

Version:

0.1.2

Description:

Tools for systematic comparison of data frames, offering functionality to identify, quantify, and extract differences. Provides functions with user-friendly and interactive console output for immediate analysis, while also offering options to export differences as structured data frames that can be easily integrated into existing workflows.

License:

MIT + file LICENSE

URL:

https://pip-technical-team.github.io/myrror/, https://github.com/PIP-Technical-Team/myrror

Depends:

R (≥ 4.3.0)

Imports:

cli (≥ 3.6.2), collapse, data.table (≥ 1.15.4), digest, joyn (≥ 0.3.0), rlang, utils

Suggests:

knitr, rmarkdown, testthat (≥ 3.0.0), withr

VignetteBuilder:

knitr

Config/Needs/website:

rmarkdown, tidyverse, gapminder, DT

Config/testthat/edition:

Encoding:

UTF-8

LazyData:

true

RoxygenNote:

7.3.2

BugReports:

https://github.com/PIP-Technical-Team/myrror/issues

NeedsCompilation:

Packaged:

2025-12-19 16:35:27 UTC; wb384996

Author:

Giorgia Cecchinato [aut], R.Andres Castaneda [aut, cre], Rossana Tatulli [aut], Global Poverty and Inequality Data Team World Bank [cph]

Maintainer:

R.Andres Castaneda <acastanedaa@worldbank.org>

Repository:

CRAN

Date/Publication:

2026-01-06 11:20:02 UTC

myrror: Compare Data Frames

Description

myrror provides tools for comparing data frames, identifying differences, and extracting summary tables or lists of differences.

Usage

myrror(
  dfx,
  dfy,
  by = NULL,
  by.x = NULL,
  by.y = NULL,
  compare_type = TRUE,
  compare_values = TRUE,
  extract_diff_values = TRUE,
  factor_to_char = TRUE,
  interactive = getOption("myrror.interactive"),
  verbose = getOption("myrror.verbose"),
  tolerance = getOption("myrror.tolerance")
)

Arguments

dfx

a non-empty data.frame.

dfy

a non-empty data.frame.

by

character, key to be used for dfx and dfy.

by.x

character, key to be used for dfx.

by.y

character, key to be used for dfy.

compare_type

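logical: if TRUE, compare the types of the paired columns. Defaults to TRUE.

compare_values

logical: if TRUE, compare the values of the paired columns. Defaults to TRUE.

extract_diff_values

logical: if TRUE, extract the values that differ. Defaults to TRUE.

factor_to_char

logical: if TRUE, factors are treated as characters; if FALSE, factors are treated as factors. Defaults to TRUE.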

interactive

logical: If TRUE, the print S3 method for myrror objects displays output in chunks. If FALSE, everything is printed at once.

verbose

logical: If TRUE, print messages.

tolerance

numeric: tolerance applied when comparing numeric values. Defaults to 1e-7.

Value

Object of class "myrror" containing:

name_dfx, name_dfy: Names of input data frames
prepared_dfx, prepared_dfy: Prepared versions of input data frames
set_by.x, set_by.y: Keys used for comparison
datasets_report: Characteristics of input datasets (rows, columns)
match_type: Type of join relationship ("1:1", "1:m", "m:1")
merged_data_report: Information about matched and unmatched data
pairs: Column pairing information
compare_type: Results from type comparison (if enabled)
compare_values: Results from value comparison (if enabled)
extract_diff_values: Extracted differences (if enabled)
interactive: Whether interactive mode is enabled

Returns NULL invisibly if the two datasets are identical.

Author(s)

Maintainer: R.Andres Castaneda acastanedaa@worldbank.org

Authors:

Giorgia Cecchinato gcecchinato@worldbank.org
Rossana Tatulli rtatulli@worldbank.org

Other contributors:

Global Poverty and Inequality Data Team World Bank [copyright holder]

References

https://pip-technical-team.github.io/myrror/

Examples



# 1. Specifying by, by.x or by.y:
myrror(survey_data, survey_data_2, by=c('country', 'year'))

## These are equivalent:
myrror(survey_data, survey_data_2_cap, by.x=c('country', 'year'), by.y = c('COUNTRY', 'YEAR'))
myrror(survey_data, survey_data_2_cap, by=c('country' = 'COUNTRY', 'year' = 'YEAR'))

# 2. Turn off interactivity:
myrror(survey_data, survey_data_2, by=c('country', 'year'), interactive = FALSE)

# 3. Turn off factor_to_char (it will treat factors as factors):
myrror(survey_data, survey_data_2, by=c('country', 'year'), factor_to_char = FALSE)

# 4. Turn off compare_type:
myrror(survey_data, survey_data_2, by=c('country', 'year'), compare_type = FALSE)
## Same can be done for compare_values and extract_diff_values.

# 5. Set tolerance:
myrror(survey_data, survey_data_2, by=c('country', 'year'), tolerance = 1e-5)
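
The equivalence between a named by vector and separate by.x / by.y arguments (example 1 above) can be illustrated in base R. This is only a sketch: sketch_split_by() is a hypothetical helper written for this illustration, and it assumes that the names of by refer to dfx and the values to dfy, as the equivalent calls above suggest. It is not the package's internal code.

```r
# Hypothetical helper (not part of myrror): split a possibly named `by`
# vector into separate key vectors for dfx and dfy.
sketch_split_by <- function(by) {
  nms <- names(by)
  if (is.null(nms)) nms <- rep("", length(by))
  # Unnamed elements use the same column name on both sides.
  by.x <- ifelse(nms == "", unname(by), nms)
  list(by.x = by.x, by.y = unname(by))
}

keys <- sketch_split_by(c(country = "COUNTRY", year = "YEAR"))
keys$by.x  # key names for dfx
keys$by.y  # key names for dfy
```

With an unnamed vector such as c("country", "year"), both elements of the result are identical, which corresponds to the plain by usage.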

Check that the df arguments are valid and convert them to a data.frame if they are lists. Internal function.

Description

Checks that the df arguments are valid and converts them to a data.frame if they are lists. Internal function.

Usage

check_df(df)

Arguments

df

data frame

Check join type. Internal function.

Description

This internal function checks the join type between two data frames. It returns the type of match between the two data frames ("1:1", "1:m", "m:1", "m:m") and, optionally, the identified and non-identified rows.

Usage

check_join_type(dfx, dfy, by.x, by.y, return_match = FALSE)

Arguments

dfx

data.frame

dfy

data.frame

by.x

character vector, keys for dfx.

by.y

character vector, keys for dfy.

return_match

logical, default is FALSE.

Value

character/list depending on return_match FALSE/TRUE.
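
The match types listed above can be illustrated with a short base R sketch. It assumes that a side counts as "m" (many) when its key combination repeats, and that the left-hand symbol refers to dfx. sketch_join_type() is a hypothetical helper and not the package's implementation.

```r
# Hypothetical helper: classify the join relationship between two data frames.
# Assumption: a side is "m" when its key combination appears more than once.
sketch_join_type <- function(dfx, dfy, by.x, by.y) {
  dup_x <- anyDuplicated(dfx[, by.x, drop = FALSE]) > 0
  dup_y <- anyDuplicated(dfy[, by.y, drop = FALSE]) > 0
  paste0(if (dup_x) "m" else "1", ":", if (dup_y) "m" else "1")
}

dfx <- data.frame(id = c(1, 2, 3), v = c(10, 20, 30))
dfy <- data.frame(ID = c(1, 1, 2, 3), v = c(10, 11, 20, 30))

type_xy <- sketch_join_type(dfx, dfy, by.x = "id", by.y = "ID")  # keys repeat only in dfy
type_yx <- sketch_join_type(dfy, dfx, by.x = "ID", by.y = "id")  # keys repeat only in dfx
```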

Check that the by arguments are valid and convert them to a data.frame if they are lists. Internal function.

Description

Checks that the by arguments are valid and converts them to a data.frame if they are lists. Internal function.

Usage

check_set_by(by = NULL, by.x = NULL, by.y = NULL)

Arguments

by

character vector

by.x

character vector

by.y

character vector

Examples


#check_set_by(NULL, NULL, NULL) # rn set
#check_set_by("id", NULL, NULL) # by set
#check_set_by(NULL, "id", "id") # by.x and by.y set

Clear last myrror object. Internal Function.

Description

This function unbinds the last myrror object from the package-specific environment, effectively removing it.

Usage

clear_last_myrror_object()

Value

Invisible NULL, indicating the object was successfully cleared.

Examples

# myrror(iris, iris_var1, interactive = FALSE) # Run myrror to create myrror object.
# clear_last_myrror_object()  # Clear the environment
# rlang::env_has(.myrror_env, "last_myrror_object") # should return an error
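
The general pattern of binding an object in a dedicated environment and later unbinding it can be sketched in base R. The environment name .sketch_env and the object name last_object are illustrative assumptions, not the names used inside the package.

```r
# Hypothetical stand-in for a package-level environment (name is illustrative).
.sketch_env <- new.env(parent = emptyenv())

# Store an object, as a comparison function might do after each run.
assign("last_object", list(note = "placeholder"), envir = .sketch_env)
stored_before <- exists("last_object", envir = .sketch_env, inherits = FALSE)

# Unbind it; rm() removes the binding from the environment.
rm("last_object", envir = .sketch_env)
stored_after <- exists("last_object", envir = .sketch_env, inherits = FALSE)
```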

Function to compare type of variables of matched data frames.

Description

This function compares the types of the columns in the two data frames.

Usage

compare_type(
  dfx = NULL,
  dfy = NULL,
  myrror_object = NULL,
  by = NULL,
  by.x = NULL,
  by.y = NULL,
  output = c("full", "simple", "silent"),
  interactive = getOption("myrror.interactive"),
  verbose = getOption("myrror.verbose")
)

Arguments

dfx

a non-empty data.frame.

dfy

a non-empty data.frame.

myrror_object

myrror object from create_myrror_object

by

character, key to be used for dfx and dfy.

by.x

character, key to be used for dfx.

by.y

character, key to be used for dfy.

output

character: one of "full" (returns a myrror_object), "simple" (returns a dataframe), "silent" (invisible object returned).

interactive

logical: If TRUE, the print S3 method for myrror objects displays output in chunks. If FALSE, everything is printed at once.

verbose

logical: If TRUE additional information will be displayed.

Value

Depending on output parameter:

"full": myrror object with compare_type slot containing a data.table of column class comparisons
"simple": data.table with columns: variable, class_x, class_y, same_class
"silent": invisibly returns myrror object (same as "full")

Returns NULL if no differences are found and output = "simple".

Examples


# 1. Standard report, myrror_object output:
compare_type(survey_data, survey_data_2, by=c('country', 'year'))

# 2. Simple output, data.table output:
compare_type(survey_data, survey_data_2, by=c('country', 'year'),
             output = 'simple')

# 3. Toggle interactivity:
compare_type(survey_data, survey_data_2, by=c('country', 'year'),
             interactive = FALSE)

# 4. Different keys (see also ?myrror):
compare_type(survey_data, survey_data_2_cap,
             by.x = c('country', 'year'), by.y = c('COUNTRY', 'YEAR'))

# 5. Using existing myrror object created by myrror():
myrror(survey_data, survey_data_2, by=c('country', 'year'))
compare_type()
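
The shape of the "simple" output described in the Value section (columns variable, class_x, class_y, same_class) can be mimicked in base R. This is a sketch under the assumption that only columns with the same name are paired and that only the first class of each column is reported. sketch_compare_type() is hypothetical and returns a plain data.frame rather than a data.table.

```r
# Hypothetical helper: build a class-comparison table for shared columns.
sketch_compare_type <- function(dfx, dfy) {
  shared  <- intersect(names(dfx), names(dfy))
  class_x <- vapply(shared, function(v) class(dfx[[v]])[1], character(1))
  class_y <- vapply(shared, function(v) class(dfy[[v]])[1], character(1))
  data.frame(
    variable   = shared,
    class_x    = unname(class_x),
    class_y    = unname(class_y),
    same_class = unname(class_x == class_y),
    stringsAsFactors = FALSE
  )
}

dfx <- data.frame(id = 1:3, value = c(1.5, 2.5, 3.5))
dfy <- data.frame(id = 1:3, value = c("1.5", "2.5", "3.5"))
type_table <- sketch_compare_type(dfx, dfy)
type_table
```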

Compare type of variables, internal function.

Description

Compare type of variables, internal function.

Usage

compare_type_int(myrror_object = NULL)

Arguments

myrror_object

myrror object

Value

data.table object

Function to compare values of matched data frames.

Description

Function to compare values of matched data frames.

Usage

compare_values(
  dfx = NULL,
  dfy = NULL,
  myrror_object = NULL,
  by = NULL,
  by.x = NULL,
  by.y = NULL,
  output = c("full", "simple", "silent"),
  interactive = getOption("myrror.interactive"),
  verbose = getOption("myrror.verbose"),
  tolerance = getOption("myrror.tolerance")
)

Arguments

dfx

a non-empty data.frame.

dfy

a non-empty data.frame.

myrror_object

myrror object from create_myrror_object

by

character, key to be used for dfx and dfy.

by.x

character, key to be used for dfx.

by.y

character, key to be used for dfy.

output

character: one of "full" (returns a myrror_object), "simple" (returns a dataframe), "silent" (invisible object returned).

interactive

logical: If TRUE, the print S3 method for myrror objects displays output in chunks. If FALSE, everything is printed at once.

verbose

logical: If TRUE additional information will be displayed.

tolerance

numeric: tolerance applied when comparing numeric values. Defaults to 1e-7.

Value

Depending on output parameter:

"full": myrror object with compare_values slot containing a summary tibble of value differences
"simple": tibble with columns: variable, change_in_value, na_to_value, value_to_na (counts)
"silent": invisibly returns myrror object (same as "full")

Returns NULL if no differences are found and output = "simple".

Examples


# 1. Standard report, myrror_object output:
compare_values(survey_data, survey_data_2, by=c('country', 'year'))

# 2. Simple output, list of data.tables output:
compare_values(survey_data, survey_data_2, by=c('country', 'year'),
               output = 'simple')

# 3. Toggle tolerance:
compare_values(survey_data, survey_data_2, by=c('country', 'year'),
               tolerance = 1e-5)

# 4. Toggle interactivity:
compare_values(survey_data, survey_data_2, by=c('country', 'year'),
               interactive = FALSE)

# 5. Different keys (see also ?myrror):
compare_values(survey_data, survey_data_2_cap,
               by.x = c('country', 'year'), by.y = c('COUNTRY', 'YEAR'))

# 6. Using existing myrror object created by myrror():
myrror(survey_data, survey_data_2, by=c('country', 'year'))
compare_values()
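
The three counts reported in the "simple" output (change_in_value, na_to_value, value_to_na) can be illustrated for a single numeric variable in base R. The sketch assumes the two vectors are already matched row by row, that "na_to_value" means NA in dfx and a value in dfy, and that the tolerance is an absolute difference. sketch_count_changes() is a hypothetical helper.

```r
# Hypothetical helper: count value changes between two aligned numeric vectors.
sketch_count_changes <- function(x, y, tolerance = 1e-7) {
  both <- !is.na(x) & !is.na(y)
  c(
    change_in_value = sum(both & abs(x - y) > tolerance, na.rm = TRUE),
    na_to_value     = sum(is.na(x) & !is.na(y)),
    value_to_na     = sum(!is.na(x) & is.na(y))
  )
}

x <- c(1, 2, NA, 4, 5)
y <- c(1, 2.5, 3, NA, 5)
counts <- sketch_count_changes(x, y)
counts
```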

Creates a myrror object for comparing two data frames

Description

This function constructs a myrror object by comparing two data frames. It handles the preparation, validation, and joining of datasets, identifies matching and non-matching observations, and performs column pairing for comparison. The function supports various join types (1:1, 1:m, m:1) and provides detailed reports on the comparison results.

Usage

create_myrror_object(
  dfx,
  dfy,
  by = NULL,
  by.x = NULL,
  by.y = NULL,
  factor_to_char = TRUE,
  verbose = getOption("myrror.verbose"),
  interactive = getOption("myrror.interactive")
)

Arguments

dfx

a non-empty data.frame.

dfy

a non-empty data.frame.

by

character, key to be used for dfx and dfy.

by.x

character, key to be used for dfx.

by.y

character, key to be used for dfy.

factor_to_char

logical: if TRUE, factors are treated as characters; if FALSE, factors are treated as factors. Defaults to TRUE.

verbose

logical: If TRUE additional information will be displayed.

interactive

logical: If TRUE, the print S3 method for myrror objects displays output in chunks. If FALSE, everything is printed at once.

Value

An object of class "myrror" containing comparison results, dataset information, and various reports on matching/non-matching observations.

Examples

# convert rownames of mtcars to a column
mtcars2 <- mtcars
mtcars2$car_name <- rownames(mtcars2)
rownames(mtcars2) <- NULL
# modify mtcars2 slightly by removing one row and changing one value
mtcars3 <- mtcars2[-1, ]
mtcars3$mpg[1] <- mtcars3$mpg[1] + 1

mo <- create_myrror_object(mtcars2, mtcars3, by = "car_name")
mo

Are these two values equal with tolerance applied? This function is used to apply tolerance to the comparison of two numeric values.

Description

Are these two values equal with tolerance applied? This function is used to apply tolerance to the comparison of two numeric values.

Usage

equal_with_tolerance(x, y, tolerance = 0.0000001)

Arguments

x

numeric

y

numeric

tolerance

numeric

Value

logical
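
A tolerance-based comparison of this kind can be sketched in one line of base R. The sketch assumes an absolute-difference rule; the package may define the comparison differently. sketch_equal_with_tolerance() is a hypothetical name.

```r
# Hypothetical helper: TRUE when two numbers differ by no more than `tolerance`.
# Assumption: the tolerance is applied to the absolute difference.
sketch_equal_with_tolerance <- function(x, y, tolerance = 1e-7) {
  abs(x - y) <= tolerance
}

close_enough <- sketch_equal_with_tolerance(1, 1 + 1e-9)     # difference below tolerance
too_far      <- sketch_equal_with_tolerance(1, 1.001)        # difference above tolerance
float_case   <- sketch_equal_with_tolerance(0.1 + 0.2, 0.3)  # floating-point rounding
```

The last call shows why a tolerance is useful: 0.1 + 0.2 == 0.3 is FALSE in R because of floating-point rounding, while the tolerance-based comparison treats the two values as equal.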

Extract Different Values - Internal

Description

Extract Different Values - Internal

Usage

extract_diff_int(myrror_object = NULL, tolerance = 0.0000001)

Arguments

myrror_object

myrror object

tolerance

numeric, default to 1e-7

Value

list with two elements:

diff_list
diff_table

Extract Different Rows

Description

Function to extract missing or new rows when comparing two data frames.

Usage

extract_diff_rows(
  dfx = NULL,
  dfy = NULL,
  myrror_object = NULL,
  by = NULL,
  by.x = NULL,
  by.y = NULL,
  output = c("simple", "full", "silent"),
  tolerance = 0.0000001,
  verbose = getOption("myrror.verbose"),
  interactive = getOption("myrror.interactive")
)

Arguments

dfx

a non-empty data.frame.

dfy

a non-empty data.frame.

myrror_object

myrror object from create_myrror_object

by

character, key to be used for dfx and dfy.

by.x

character, key to be used for dfx.

by.y

character, key to be used for dfy.

output

character: one of "full", "simple", "silent".

tolerance

numeric, defaults to 1e-7.

verbose

logical: If TRUE additional information will be displayed.

interactive

logical: If TRUE, the print S3 method for myrror objects displays output in chunks. If FALSE, everything is printed at once.

Value

Depending on output parameter:

"full": myrror object with extract_diff_rows slot containing a data.table of non-matching rows
"simple": data.table with columns: df (indicating 'dfx' or 'dfy'), keys, and all other columns. Contains rows that exist in only one dataset
"silent": invisibly returns myrror object (same as "full")

Returns NULL if no row differences are found and output = "simple".

Examples


# 1. Standard report, after running myrror() or compare_values():
myrror(survey_data, survey_data_2, by=c('country', 'year'))
extract_diff_rows()

# 2. Standard report, with new data:
extract_diff_rows(survey_data, survey_data_2, by=c('country', 'year'))


# 3. Toggle tolerance:
extract_diff_rows(survey_data, survey_data_2, by=c('country', 'year'),
                    tolerance = 1e-5)
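
The idea behind the "simple" output, rows whose keys exist in only one dataset together with a df column naming the source, can be sketched in base R. The sketch assumes both data frames share the same column names. sketch_diff_rows() is a hypothetical helper that returns a data.frame, not the package's data.table.

```r
# Hypothetical helper: rows whose key appears in only one of two data frames.
# Assumption: dfx and dfy have identical column names.
sketch_diff_rows <- function(dfx, dfy, by) {
  # Collapse the key columns into one string per row for matching.
  key_x <- do.call(paste, c(unname(as.list(dfx[by])), sep = "\r"))
  key_y <- do.call(paste, c(unname(as.list(dfy[by])), sep = "\r"))
  only_x <- dfx[!key_x %in% key_y, , drop = FALSE]
  only_y <- dfy[!key_y %in% key_x, , drop = FALSE]
  rbind(
    cbind(df = rep("dfx", nrow(only_x)), only_x),
    cbind(df = rep("dfy", nrow(only_y)), only_y)
  )
}

dfx <- data.frame(id = 1:4, value = c(10, 20, 30, 40))
dfy <- data.frame(id = 2:5, value = c(20, 30, 40, 50))
row_diff <- sketch_diff_rows(dfx, dfy, by = "id")
row_diff
```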

Extract Different Values in Table Format

Description

Function to extract rows with different values between two data frames, in table format.

Usage

extract_diff_table(
  dfx = NULL,
  dfy = NULL,
  myrror_object = NULL,
  by = NULL,
  by.x = NULL,
  by.y = NULL,
  output = c("simple", "full", "silent"),
  tolerance = 0.0000001,
  verbose = getOption("myrror.verbose"),
  interactive = getOption("myrror.interactive")
)

Arguments

dfx

a non-empty data.frame.

dfy

a non-empty data.frame.

myrror_object

myrror object from create_myrror_object

by

character, key to be used for dfx and dfy.

by.x

character, key to be used for dfx.

by.y

character, key to be used for dfy.

output

character: one of "full", "simple", "silent".

tolerance

numeric, defaults to 1e-7.

verbose

logical: If TRUE additional information will be displayed.

interactive

logical: If TRUE, the print S3 method for myrror objects displays output in chunks. If FALSE, everything is printed at once.

Value

Depending on output parameter:

"full": myrror object with extract_diff_values slot containing a list with diff_list and diff_table
"simple": data.table with all observations where at least one value differs. Contains columns: diff, variable, indexes, keys, and all compared variables with .x/.y suffixes
"silent": invisibly returns myrror object (same as "full")

Returns NULL if no differences are found and output = "simple".

Examples


# 1. Standard report, after running myrror() or compare_values():
myrror(survey_data, survey_data_2, by=c('country', 'year'))
extract_diff_table()

# 2. Standard report, with new data:
extract_diff_table(survey_data, survey_data_2, by=c('country', 'year'))

# 3. Toggle tolerance:
extract_diff_table(survey_data, survey_data_2, by=c('country', 'year'),
                    tolerance = 1e-5)

Extract Different Values

Description

Function to extract rows with different values between two data frames.

Usage

extract_diff_values(
  dfx = NULL,
  dfy = NULL,
  myrror_object = NULL,
  by = NULL,
  by.x = NULL,
  by.y = NULL,
  output = c("simple", "full", "silent"),
  tolerance = 0.0000001,
  verbose = getOption("myrror.verbose"),
  interactive = getOption("myrror.interactive")
)

Arguments

dfx

a non-empty data.frame.

dfy

a non-empty data.frame.

myrror_object

myrror object from create_myrror_object

by

character, key to be used for dfx and dfy.

by.x

character, key to be used for dfx.

by.y

character, key to be used for dfy.

output

character: one of "full", "simple", "silent".

tolerance

numeric, defaults to 1e-7.

verbose

logical: If TRUE additional information will be displayed.

interactive

logical: If TRUE, the print S3 method for myrror objects displays output in chunks. If FALSE, everything is printed at once.

Value

Depending on output parameter:

"full": myrror object with extract_diff_values slot containing a list with diff_list and diff_table
"simple": named list of data.tables, one per variable with differences. Each table contains columns: diff, indexes, keys, variable.x, variable.y
"silent": invisibly returns myrror object (same as "full")

Returns NULL if no differences are found and output = "simple".

Examples


# 1. Standard report, after running myrror() or compare_values():
myrror(survey_data, survey_data_2, by=c('country', 'year'))
extract_diff_values()

# 2. Standard report, with new data:
extract_diff_values(survey_data, survey_data_2, by=c('country', 'year'))

# 3. Toggle tolerance:
extract_diff_values(survey_data, survey_data_2, by=c('country', 'year'),
                    tolerance = 1e-5)

Get correct myrror object. Internal function.

Description

Checks all the arguments passed to the parent function. If a myrror_object is supplied, it is used. If not, the function checks whether both data frames are NULL; if they are, it looks for the last myrror object and raises an error if none is available. Finally, it checks that both datasets are available and, if so, creates a new myrror_object from them.

Usage

get_correct_myrror_object(
  myrror_object,
  dfx,
  dfy,
  by,
  by.x,
  by.y,
  verbose,
  interactive,
  ...
)

Arguments

dfx

a non-empty data.frame.

dfy

a non-empty data.frame.

by

character, key to be used for dfx and dfy.

by.x

character, key to be used for dfx.

by.y

character, key to be used for dfy.

verbose

logical: If TRUE additional information will be displayed.

interactive

logical: If TRUE, the print S3 method for myrror objects displays output in chunks. If FALSE, everything is printed at once.

...

other arguments passed to the parent function.

Value

myrror object
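
The resolution order described above can be written as a small decision function. This is only a sketch of the control flow: sketch_resolve_source() is hypothetical, the last_object argument stands in for whatever the package stores internally, and the function returns a label instead of an actual myrror object.

```r
# Hypothetical helper: report which source the resolution steps would pick.
sketch_resolve_source <- function(myrror_object = NULL, dfx = NULL, dfy = NULL,
                                  last_object = NULL) {
  # 1. An explicitly supplied object always wins.
  if (!is.null(myrror_object)) return("supplied object")
  # 2. With no data frames at all, fall back to the last stored object.
  if (is.null(dfx) && is.null(dfy)) {
    if (is.null(last_object)) stop("no myrror object available")
    return("last stored object")
  }
  # 3. Otherwise both data frames are needed to build a new object.
  if (is.null(dfx) || is.null(dfy)) stop("both dfx and dfy are required")
  "new object from dfx and dfy"
}

src_supplied <- sketch_resolve_source(myrror_object = list())
src_last     <- sketch_resolve_source(last_object = list())
src_new      <- sketch_resolve_source(dfx = iris, dfy = iris)
```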

Get the name of a data frame. Internal function.

Description

This function gets the name of a data frame. Internal function. If the data frame has a name attribute, it returns that. Otherwise, it returns the deparse of the original call.

Usage

get_df_name(df, original_call)

Arguments

df

data.frame

original_call

original call (df)

Value

character
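
The fallback from a name attribute to the deparsed call can be sketched as follows. The attribute name "df_name" is an assumption made for this illustration (the help page does not state which attribute is used), and sketch_df_name() and wrapper() are hypothetical.

```r
# Hypothetical helper: prefer a stored name, else deparse the original call.
sketch_df_name <- function(df, original_call) {
  nm <- attr(df, "df_name")  # assumed attribute name, for illustration only
  if (!is.null(nm)) return(nm)
  deparse(original_call)
}

# A caller captures the unevaluated argument with substitute().
wrapper <- function(df) sketch_df_name(df, substitute(df))

name_from_call <- wrapper(mtcars)

labelled <- mtcars
attr(labelled, "df_name") <- "cars_v2"
name_from_attr <- wrapper(labelled)
```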

Get keys or default. A simple function wrapper which returns 'rn' (row names) if the data.table has no keys.

Description

Get keys or default. A simple function wrapper which returns 'rn' (row names) if the data.table has no keys.

Usage

get_keys_or_default(keys, default = "rn")

Arguments

keys

character vector

default

character

Value

character
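
A wrapper of this kind can be sketched in a few lines. The sketch assumes that "no keys" means NULL or a zero-length vector. sketch_keys_or_default() is a hypothetical name.

```r
# Hypothetical helper: return the keys, or a default when there are none.
sketch_keys_or_default <- function(keys, default = "rn") {
  if (is.null(keys) || length(keys) == 0) default else keys
}

no_keys   <- sketch_keys_or_default(NULL)
with_keys <- sketch_keys_or_default(c("country", "year"))
```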

Iris Dataset Variation 1

Description

This dataset variation includes:

Additional rows by duplicating the first 5 rows.
A new column Petal.Area calculated from Petal.Length and Petal.Width.
Introduction of NA values in Sepal.Length.

Usage

iris_var1

Format

A data.frame with 155 rows and 6 variables:

Sepal.Length: Numeric, Sepal length in cm, with some NA values.
Sepal.Width: Numeric, Sepal width in cm.
Petal.Length: Numeric, Petal length in cm.
Petal.Width: Numeric, Petal width in cm.
Species: Factor with levels: "setosa","versicolor","virginica".
Petal.Area: Numeric, calculated as Petal.Length * Petal.Width.

Details

It tests the handling of extended datasets, new calculated fields, and missing values.

Source

Modified iris dataset.
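
A dataset with the same three kinds of changes can be built from iris in base R. This is an illustrative reconstruction, not the script used to create iris_var1; in particular, the positions of the NA values are chosen arbitrarily.

```r
# Illustrative reconstruction (assumption: not the original build script).
v1 <- rbind(iris, iris[1:5, ])                      # duplicate the first 5 rows
v1$Petal.Area <- v1$Petal.Length * v1$Petal.Width   # new calculated column
v1$Sepal.Length[c(3, 50)] <- NA                     # NA positions are arbitrary
dim(v1)
```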

Iris Dataset Variation 2

Description

This dataset variation includes:

NaN values in the Sepal.Width column.
Random adjustments to Sepal.Length to create a range of different values.
A shuffled order of rows to test comparison without reliance on row order.

Usage

iris_var2

Format

A data frame with 150 rows and 5 variables, row order shuffled:

Sepal.Length: Numeric, Sepal length in cm, modified by adding random values.
Sepal.Width: Numeric, Sepal width in cm, with some NaN values.
Petal.Length: Numeric, Petal length in cm.
Petal.Width: Numeric, Petal width in cm.
Species: Factor with levels: "setosa","versicolor","virginica".

Details

It is designed to test the handling of NaN values, comparison of numeric differences, and insensitivity to row order.

Source

Modified iris dataset.
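
A comparable variation can be built from iris in base R. This is an illustrative reconstruction, not the script used to create iris_var2; the seed, the NaN positions, and the size of the random adjustments are assumptions.

```r
# Illustrative reconstruction (assumption: not the original build script).
set.seed(1)
v2 <- iris
v2$Sepal.Width[c(5, 10)] <- NaN                                   # arbitrary positions
v2$Sepal.Length <- v2$Sepal.Length + runif(nrow(v2), -0.5, 0.5)   # random adjustments
v2 <- v2[sample(nrow(v2)), ]                                      # shuffle row order
```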

Iris Dataset Variation 3

Description

This dataset variation includes:

Column name changes (e.g., Sepal.Length to SL).
Conversion of numeric to character type for the Sepal.Length (now SL) column.
An NA value introduced into the SL column.

Usage

iris_var3

Format

A data.frame with 150 rows and 5 variables:

SL: Character, originally numeric, with one NA value.
SW: Numeric, Sepal width in cm.
PL: Numeric, Petal length in cm.
PW: Numeric, Petal width in cm.
Species: Factor with levels: "setosa","versicolor","virginica".

Details

This variation tests the package's ability to correctly identify and handle column renaming, type conversion, and missing values.

Source

Modified iris dataset.
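
The three changes can be reproduced on iris in base R. This is an illustrative reconstruction, not the script used to create iris_var3; the position of the NA value is chosen arbitrarily.

```r
# Illustrative reconstruction (assumption: not the original build script).
v3 <- iris
names(v3)[1:4] <- c("SL", "SW", "PL", "PW")  # rename the measurement columns
v3$SL <- as.character(v3$SL)                 # numeric to character
v3$SL[1] <- NA                               # NA position is arbitrary
```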

Iris Dataset Variation 4

Description

This dataset variation includes:

Uppercase transformation of Species factor levels.
Duplicated rows (first 10 rows repeated).
An altered scale for Petal.Width (values multiplied by 10).

Usage

iris_var4

Format

A data.frame with 160 rows and 5 variables:

Sepal.Length: Numeric, length in cm.
Sepal.Width: Numeric, width in cm.
Petal.Length: Numeric, length in cm.
Petal.Width: Numeric, width in cm, values scaled by a factor of 10.
Species: Factor with levels modified to uppercase: "SETOSA", "VERSICOLOR", "VIRGINICA".

Details

Designed to test handling of categorical variable level modifications, duplicate rows, and numeric scale adjustments.

Source

Modified iris dataset.
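
The same modifications can be applied to iris in base R. This is an illustrative reconstruction, not the script used to create iris_var4.

```r
# Illustrative reconstruction (assumption: not the original build script).
v4 <- rbind(iris, iris[1:10, ])                     # repeat the first 10 rows
levels(v4$Species) <- toupper(levels(v4$Species))   # uppercase factor levels
v4$Petal.Width <- v4$Petal.Width * 10               # rescale by a factor of 10
```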

Iris Dataset Variation 5

Description

This dataset variation includes:

Column with different type: Sepal.Length (character).
Column with different values: Sepal.Length (1 modified value).

Usage

iris_var5

Format

A data.frame with 160 rows and 5 variables:

Sepal.Length: Character, length in cm.
Sepal.Width: Numeric, width in cm.
Petal.Length: Numeric, length in cm.
Petal.Width: Numeric, Petal width in cm.
Species: Factor with levels: "setosa","versicolor","virginica".

Source

Modified iris dataset.

Iris Dataset Variation 6

Description

Iris Dataset Variation 6

Usage

iris_var6

Format

A data.frame with 146 rows and 5 variables:

Sepal.Length: Numeric, Sepal length in cm.
Sepal.Width: Numeric, Sepal width in cm.
Petal.Length: Numeric, Petal length in cm.
Petal.Width: Numeric, Petal width in cm.
Species: Factor with levels: "setosa","versicolor","virginica".

Source

Modified iris dataset.

Iris Dataset Variation 7

Description

Iris Dataset Variation 7

Usage

iris_var7

Format

A data.frame with 146 rows and 5 variables:

Sepal.Length: Numeric, Sepal length in cm.
Sepal.Width: Numeric, Sepal width in cm.
Petal.Length: Numeric, Petal length in cm.
Petal.Width: Numeric, Petal width in cm.
Species: Factor with levels: "setosa","versicolor","virginica".

Source

Modified iris dataset.

Menu wrapper. Internal function.

Description

This function is a wrapper around the base R menu function. It is used to provide a consistent interface for the menu function.

Usage

my_menu(...)

Arguments

...

Arguments passed to utils::menu(), including choices and title.

Value

Integer indicating the selected menu item.

Readline wrapper. Internal function.

Description

This function is a wrapper around the base R readline function. It is used to provide a consistent interface for the readline function.

Usage

my_readline(...)

Arguments

...

Arguments passed to base::readline(), particularly prompt.

Value

Character string containing the user's input.

Pairs columns and prepares them for comparison.

Description

Pairs columns and prepares them for comparison.

Usage

pair_columns(merged_data_report, suffix_x = ".x", suffix_y = ".y")

Arguments

merged_data_report

joined prepared_dfx and prepared_dfy.

suffix_x

suffix for dfx (default .x)

suffix_y

suffix for dfy (default .y)

Value

list of paired_columns
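
Pairing by suffix can be illustrated on a vector of column names. The sketch assumes that a pair exists only when the same base name appears with both suffixes. sketch_pair_columns() is a hypothetical helper that works on names alone, whereas the package function takes the merged data.

```r
# Hypothetical helper: find column names that exist with both suffixes.
sketch_pair_columns <- function(col_names, suffix_x = ".x", suffix_y = ".y") {
  x_cols <- col_names[endsWith(col_names, suffix_x)]
  # Strip the suffix to recover the base variable name.
  base   <- substr(x_cols, 1, nchar(x_cols) - nchar(suffix_x))
  paired <- base[paste0(base, suffix_y) %in% col_names]
  data.frame(
    col_x = paste0(paired, suffix_x),
    col_y = paste0(paired, suffix_y),
    stringsAsFactors = FALSE
  )
}

cols  <- c("id", "value.x", "value.y", "name.x", "extra.y")
pairs <- sketch_pair_columns(cols)
pairs
```

In this example only value has both a .x and a .y column, so name.x and extra.y remain unpaired.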

Prepares dataset for joyn::joyn(). Internal function.

Description

Prepares dataset for joyn::joyn(). Internal function.

Usage

prepare_df(
  df,
  by = NULL,
  factor_to_char = TRUE,
  interactive = getOption("myrror.interactive"),
  verbose = getOption("myrror.verbose")
)

Arguments

df

data.frame or data.table

by

character vector

factor_to_char

logical

interactive

logical

verbose

logical

Value

data.table

Print method for Myrror object.

Description

Print method for Myrror object.

Usage

## S3 method for class 'myrror'
print(x, ...)

Arguments

x

an object of class 'myrror'

...

additional arguments

Value

Invisibly returns the myrror object x. Called for side effects (printing comparison report to console).

Examples

# Create example datasets
dfx <- data.frame(id = 1:5,
                  name = c("A", "B", "C", "D", "E"),
                  value = c(10, 20, 30, 40, 50))

dfy <- data.frame(id = 1:6,
                  name = c("A", "B", "C", "D", "E", "F"),
                  value = c(10, 20, 35, 40, 50, 60))

# Create a myrror object
library(myrror)
m <- myrror(dfx, dfy, by.x = "id", by.y = "id")

# Print the myrror object (happens automatically)
m

# Create object with different print settings

# With interactive mode disabled
m2 <- myrror(dfx, dfy, by.x = "id", by.y = "id", interactive = FALSE)
print(m2)

Identify Suggested Keys or IDs for Data Frame

Description

This function attempts to find potential unique identifier columns or combinations for a given data frame. It first tries to identify single-column keys, then two-column key combinations that uniquely identify each row in the data frame.

Usage

suggested_ids(df)

Arguments

df

A data frame for which to identify potential unique identifiers

Value

A list containing up to two elements:

1

The first single-column key identified (if any)

2

The first two-column key combination identified (if any)

Returns NULL if no valid keys were found.
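
The search described above, single columns first and then pairs of columns, can be sketched in base R. sketch_suggested_ids() is a hypothetical helper. It assumes that a key is any column (or pair) without duplicated values and that the first match in column order is returned, which may differ from the package's selection rules.

```r
# Hypothetical helper: look for a one-column key, then a two-column key.
sketch_suggested_ids <- function(df) {
  out <- list()
  # Single-column keys: a column whose values never repeat.
  for (v in names(df)) {
    if (anyDuplicated(df[[v]]) == 0) {
      out[[length(out) + 1]] <- v
      break
    }
  }
  # Two-column keys: a pair of columns whose combinations never repeat.
  if (ncol(df) >= 2) {
    for (pair in utils::combn(names(df), 2, simplify = FALSE)) {
      if (anyDuplicated(df[, pair, drop = FALSE]) == 0) {
        out[[length(out) + 1]] <- pair
        break
      }
    }
  }
  if (length(out) == 0) NULL else out
}

df <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2010, 2011, 2010, 2011),
  value   = c(1, 1, 2, 2)
)
ids <- sketch_suggested_ids(df)
ids
```

Here no single column is unique, so the result contains only the pair country and year.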

Survey Data

Description

A country-year level dataset with 15 rows and 6 variables: 2 countries, 4 years, and 4 additional variables.

Usage

survey_data

Format

A data.table with 16 rows and 6 variables:

country: Factor with levels: "A", "B".
year: Numeric, with values: 2010, 2011, 2012, 2013.
variable1: Numeric.
variable2: Numeric.
variable3: Numeric.
variable4: Numeric.

Source

Simulated data.

Survey Data 1:m Variation 1

Description

Survey Data 1:m Variation 1

Usage

survey_data_1m

Format

A data.table with 36 rows and 6 variables:

country: Factor with levels: "A", "B".
year: Numeric, with values: 2010-2017.
variable1: Numeric.
variable2: Numeric.
variable3: Numeric.
variable4: Numeric.

Source

Variation of survey_data with non-unique ids and a 1:m relationship between ids and values.

Survey Data 1:m Variation 2

Description

Survey Data 1:m Variation 2

Usage

survey_data_1m_2

Format

A data.table with 36 rows and 6 variables:

country: Factor with levels: "A", "B".
year: Numeric, with values: 2010-2017.
variable1: Numeric.
variable2: Numeric.
variable3: Numeric.
variable4: Numeric.

Source

Variation of survey_data with non-unique ids and a 1:m relationship between ids and values.

Survey Data Variation 2

Description

Survey Data Variation 2

Usage

survey_data_2

Format

A data.table with 15 rows and 6 variables:

country: Factor with levels: "A", "B".
year: Numeric, with values: 2010, 2011, 2012, 2013.
variable1: Numeric.
variable2: Numeric. Modified variable values.
variable3: Numeric.
variable4: Numeric.

Source

Simulated data.

Survey Data Variation 2 with Cap Keys

Description

Survey Data Variation 2 with Cap Keys

Usage

survey_data_2_cap

Format

A data.table with 15 rows and 6 variables:

country: Factor with levels: "A", "B".
year: Numeric, with values: 2010, 2011, 2012, 2013.
variable1: Numeric.
variable2: Numeric. Modified variable values.
variable3: Numeric.
variable4: Numeric.

Source

Simulated data.

Survey Data Variation 3

Description

Survey Data Variation 3

Usage

survey_data_3

Format

A data.table with 15 rows and 6 variables:

country: Factor with levels: "A", "B".
year: Numeric, with values: 2010-2017.
variable1: Character. Modified variable class.
variable2: Numeric.
variable3: Numeric.
variable4: Numeric.

Source

Simulated data.

Survey Data Variation 4

Description

Survey Data Variation 4

Usage

survey_data_4

Format

A data.table with 12 rows (4 missing) and 6 variables:

country: Factor with levels: "A", "B".
year: Numeric, with values: 2010-2017.
variable1: Numeric.
variable2: Numeric.
variable3: Numeric.
variable4: Numeric.

Source

Simulated data.

Survey Data Variation 5

Description

Survey Data Variation 5

Usage

survey_data_5

Format

A data.table with 15 rows and 4 variables (2 missing):

country: Factor with levels: "A", "B".
year: Numeric, with values: 2010-2017.
variable1: Numeric.
variable2: Numeric.
variable3: Numeric.
variable4: Numeric.

Source

Simulated data.

Survey Data Variation 6

Description

Survey Data Variation 6

Usage

survey_data_6

Format

A data.table with 15+15 (duplicated) rows and 4 variables:

country: Factor with levels: "A", "B".
year: Numeric, with values: 2010-2017.
variable1: Numeric.
variable2: Numeric.
variable3: Numeric.
variable4: Numeric.

Source

Simulated data.

Survey Data Variation All

Description

Survey Data Variation All

Usage

survey_data_all

Format

A data.table with 12 rows and 6 variables:

country: Factor with levels: "A", "B".
year: Numeric, with values: 2010-2017.
variable1: Numeric.
variable2: Numeric.
variable3: Numeric.

Source

Simulated data.

Survey Data m:1

Description

Survey Data m:1

Usage

survey_data_m1

Format

A data.table with 15 rows and 6 variables:

country: Factor with levels: "A", "B".
year: Numeric, with values: 2010-2017.
variable1: Numeric.
variable2: Numeric.
variable3: Numeric.
variable4: Numeric.

Source

Variation of survey_data with non-unique ids and a m:1 relationship between ids and values.
