| Title: | Miscellaneous Functions for Data and Geospatial Work |
| Version: | 0.0.5 |
| Description: | Helpers for common data analysis tasks including missing-value summaries and filters, simple reporting and plotting utilities, 'Excel' import and export workflows, and reading geospatial formats (for example shapefiles in zip archives, file geodatabases, KMZ, and KML) via 'sf' and related packages. Also includes small project utilities such as creating directories, gitignore scaffolding, combined package loading, and optional 'lintr' setup. |
| Depends: | R (≥ 4.1.0) |
| Imports: | data.table, dplyr, fs, ggplot2, glue, here, magrittr, purrr, tidyr, tools, usethis, readr, mapview, sf, rlang, jsonlite, writexl, zip |
| Suggests: | clipr, conflicted, covr, janitor, lexicon, magick, naniar, readxl, rnaturalearth, skimr, stringr, testthat, textclean, tidyselect, lintr |
| License: | MIT + file LICENSE |
| URL: | https://github.com/kguidonimartins/misc |
| BugReports: | https://github.com/kguidonimartins/misc/issues |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.2 |
| NeedsCompilation: | no |
| Packaged: | 2026-03-31 19:47:34 UTC; karloguidoni |
| Author: | Karlo Guidoni Martins |
| Maintainer: | Karlo Guidoni Martins <kguidonimartins@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-04-04 09:10:07 UTC |
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
Arguments
lhs |
A value or the magrittr forward-pipe placeholder. |
rhs |
A function call using magrittr semantics. |
Value
The result of evaluating the right-hand side with the left-hand
side as first argument (see magrittr::%>%).
Add a gitignore file to the project root
Description
add_gitignore() fetches a .gitignore template using the API from
gitignore.io. In addition,
add_gitignore() appends a tags entry (for files created by
ctags) to the .gitignore file.
Usage
add_gitignore(type = "r")
Arguments
type |
a character vector with the language to be ignored |
Value
No return value, called for side effects (creates .gitignore, or
stops with an error if the file already exists).
Acknowledgment
add_gitignore() is inspired by
gitignore::gi_fetch_templates
and by some examples on the gitignore.io
wiki page.
Examples
if (interactive()) {
# Downloads from gitignore.io (requires network). Use combined `type` on
# first create, e.g. `add_gitignore(type = c("r", "python"))`.
add_gitignore()
}
Combine words using ptbr rules
Description
combine_words_ptbr() collapses words using Brazilian Portuguese (pt-BR) rules.
This function differs from knitr::combine_words(),
which uses the Oxford comma.
Usage
combine_words_ptbr(words, sep = NULL, last = NULL)
Arguments
words |
a character vector with words to combine |
sep |
a character with the separator of the words. Default is NULL and insert ", " |
last |
a character vector with the last separator of the words. Default is NULL and insert " e " |
Value
a character vector
Acknowledgment
combine_words_ptbr() uses transformers
available in the excellent {glue} package
Examples
misc::ipak("glue")
# using in an ordinary text
feira <- c("banana", "maça", "pepino", "ovos")
glue("Por favor, compre: {combine_words_ptbr(feira)}")
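The sep and last arguments override the default separators; a sketch (guarded, since it assumes the misc package is installed):

```r
# Custom separators: a semicolon-separated list, still joined by "e"
# before the last item (assumes `misc` is available).
if (requireNamespace("misc", quietly = TRUE)) {
  misc::combine_words_ptbr(
    c("banana", "uva", "ovos"),
    sep = "; ",
    last = " e "
  )
}
```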
Create directories
Description
The main purpose of create_dirs() is to create default directories used
in data science projects. create_dirs() can also create custom
directories.
Usage
create_dirs(dirs = NULL)
Arguments
dirs |
a character vector with the directory names. Default is NULL, which
creates the default directory structure described below |
Value
No return value, called for side effects (creates directories and
optional .gitkeep placeholder files).
Goal
There is a somewhat subjective discussion about the ideal directory structure
for data science projects in general (see
here,
here,
here, and
here). In my humble opinion, the
decision should be made by the user/analyst/scientist/team. Here, I
suggest a directory structure that has worked for me. In addition, the
directory structure created fits perfectly with functions present in this
package (for example save_plot and save_temp_data).
Below is the suggested directory structure:
.
+- R          # local functions
+- data
|  +- clean   # stores clean data
|  +- raw     # stores raw data (read-only)
|  +- temp    # stores temporary data
+- output
   +- figures # stores figures ready for publication/presentation
   +- results # stores text results and others
   +- supp    # stores supplementary material for publication/presentation
Acknowledgment
create_dirs() takes advantage of the functions available in the excellent
{fs} package.
Examples
if (interactive()) {
# create a single directory
create_dirs("myfolder")
# create the default directories
create_dirs()
# see the resulting tree
fs::dir_tree()
}
Remove duplicate rows based on specified grouping variables
Description
This function removes duplicate rows from a data frame while keeping the first occurrence of each unique combination of the specified grouping variables.
Usage
deduplicate_by(.data, ...)
Arguments
.data |
A data frame or tibble |
... |
One or more unquoted variable names to group by |
Value
A data frame with duplicate rows removed, keeping only the first occurrence for each unique combination of grouping variables
Examples
# Remove duplicates based on a single column
mtcars %>% deduplicate_by(carb)
# Remove duplicates based on multiple columns
mtcars %>% deduplicate_by(carb, mpg)
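Conceptually, this is the classic keep-first-per-key pattern; a base-R sketch of the same idea (my reading of the documented semantics, not the package's actual implementation):

```r
# Keep the first row for each distinct value of `carb`
first_per_key <- mtcars[!duplicated(mtcars$carb), ]

# One row remains per unique carb value
nrow(first_per_key) == length(unique(mtcars$carb))
```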
Describe data
Description
Describe data
Usage
describe_data(data)
Arguments
data |
a data frame |
Value
a skimr object
Examples
nice_data <- data.frame(c1 = c(1, NA), c2 = c(NA, NA))
nice_data %>%
describe_data()
Easily filter NA values from data frames
Description
filter_na() simply wraps {dplyr} functions in a more
convenient way.
Usage
filter_na(data, type = c("any", "all"))
Arguments
data |
a data frame or tibble |
type |
a character vector indicating which type of NA-filtering must be done. If type = "any",
rows containing at least one NA are removed; if type = "all", only rows in which
all values are NA are removed |
Value
a tibble object
Examples
nice_data <- data.frame(c1 = c(1, NA), c2 = c(NA, NA))
nice_data %>%
filter_na("all")
nice_data %>%
filter_na("any")
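In base R, the two modes correspond to row-wise NA counting; a sketch assuming type = "any" drops rows with at least one NA and type = "all" drops rows that are entirely NA:

```r
nice_data <- data.frame(c1 = c(1, NA), c2 = c(NA, NA))

# type = "any": keep only complete rows
nice_data[rowSums(is.na(nice_data)) == 0, , drop = FALSE]

# type = "all": drop rows in which every value is NA
nice_data[rowSums(is.na(nice_data)) < ncol(nice_data), , drop = FALSE]
```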
Load multiple CRAN and GitHub R packages
Description
Attaches packages that are already installed. Names that are not found on
the library search path are reported with suggested
install.packages() or remotes::install_github() calls to run yourself;
this function does not install packages (CRAN policy).
Usage
ipak(pkg_list, force_cran = FALSE, force_github = FALSE)
Arguments
pkg_list |
A character vector of package names. GitHub sources use
|
force_cran |
Logical. Ignored (retained for backwards compatibility; this function does not install or update packages). |
force_github |
Logical. Ignored (retained for backwards compatibility). |
Value
A data.frame with columns pkg_name (character), success
(logical: whether require() attached the package), and version
(character, NA when not loaded). Returned invisibly; summaries are
printed via print() on subsets when rows exist.
Acknowledgment
ipak() was first developed by
Steven Worthington and made
publicly available
here. This version
only loads packages and suggests install commands for missing ones.
Examples
pkg_list <- c("utils", "stats") # base packages — usually present
ipak(pkg_list)
Count NA frequency in data
Description
na_count() is a way to display the count and frequency of NA in data. It
can be slow over large datasets.
Usage
na_count(data, sort = TRUE)
Arguments
data |
a data frame |
sort |
If |
Value
a long-format tibble
Acknowledgment
I learned this way of exploring data through the excellent webinar taught by Emily Robinson.
Examples
na_data <- data.frame(c1 = c(1, NA), c2 = c(NA, NA))
na_data %>% na_count()
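The same counts can be reproduced in base R, which also illustrates what the long-format summary contains (an approximation, not the function's implementation):

```r
na_data <- data.frame(c1 = c(1, NA), c2 = c(NA, NA))

# count and frequency of NA values per column
n_na <- colSums(is.na(na_data))
na_summary <- data.frame(
  variable = names(n_na),
  n_na     = unname(n_na),
  freq     = unname(n_na) / nrow(na_data)
)
na_summary
```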
Visualize NA frequency in data
Description
na_viz() creates a ggplot showing the percentage of NA values in each column
Usage
na_viz(data)
Arguments
data |
a data frame |
Value
a ggplot object
Acknowledgment
na_viz() is another name for the excellent vis_miss() of
{naniar}
Examples
if (interactive()) {
na_data <- data.frame(c1 = c(1, NA), c2 = c(NA, NA))
na_data %>% na_viz()
}
Defines preferred package::functions
Description
For {tidyverse} users, the most common conflict is between dplyr::filter() and
stats::filter(); among {raster} users, the conflict involves
dplyr::select(). prefer() eliminates conflicts between namespaces by
preferring all the functions of the chosen package, rather than
resolving specific conflicts one by one. Because of that, and depending on the
number of functions exported by a package, prefer() can be slow.
Usage
prefer(pkg_name, quiet = TRUE)
Arguments
pkg_name |
a character vector with package names |
quiet |
logical indicating whether warnings should be suppressed. Default is TRUE |
Value
No return value, called for side effects (registers conflict
preferences via conflicted::conflict_prefer()).
Acknowledgment
prefer() is shamelessly derived from a piece of code in
README.md
of the {tidylog}
Examples
# prefer `{dplyr}` functions over `{stats}`
prefer("dplyr")
Create maps quickly
Description
quick_map() allows the creation of maps quickly using {ggplot2}. For this
reason, the resulting map is fully editable through {ggplot2} layers.
Usage
quick_map(region = NULL, type = NULL)
Arguments
region |
character string or vector containing country or continent names. Default is NULL and plots a world map |
type |
character string informing the map type. Can be "sf" or "ggplot" |
Value
a ggplot object
Acknowledgment
quick_map() depends heavily on the data available by
the {rnaturalearth}
package. In this sense, quick_map() uses a wide and dirty filtering of
this data to create the map.
Examples
if (interactive()) {
# plot a world map
quick_map()
# plot a new world map
quick_map(region = "Americas", type = "sf")
# using ggplot
quick_map(region = "Americas", type = "ggplot")
# edit using ggplot2 layers
quick_map() +
ggplot2::theme_void() +
ggplot2::geom_sf(fill = "white")
}
Read all excel sheets and save them to CSV files
Description
read_all_sheets_then_save_csv() just loops read_sheet_then_save_csv() over
the available excel sheets and saves them in data/temp/extracted_sheets
Usage
read_all_sheets_then_save_csv(path_to_xlsx, dir_to_save = NULL)
Arguments
path_to_xlsx |
a character vector with path to the excel file |
dir_to_save |
a character vector with the path to save the csv files. Default is NULL and save the csv files in the "data/temp/extracted_sheets" if it exists. |
Value
A list (one element per sheet), each the return value of
read_sheet_then_save_csv() for that sheet (invisibly NULL per call).
Examples
if (interactive()) {
# read all sheets and save them as csv files
misc::create_dirs("ma-box")
xlsx_file <- system.file("xlsx-examples", "mtcars_workbook_001.xlsx", package = "misc")
read_all_sheets_then_save_csv(
path_to_xlsx = xlsx_file,
dir_to_save = "ma-box"
)
}
Read all sheets from all excel files and save into CSV files
Description
Following the same principle as read_all_sheets_then_save_csv(),
read_all_xlsx_then_save_csv() just loops read_all_sheets_then_save_csv() over
all available xlsx files
Usage
read_all_xlsx_then_save_csv(path_to_xlsx)
Arguments
path_to_xlsx |
a character vector with the path to excel file |
Value
A list (one element per .xlsx file found under path_to_xlsx), each
the list returned by read_all_sheets_then_save_csv() for that workbook.
Examples
if (interactive()) {
# read all sheets from all xlsx files and save them as csv files
xlsx_dir <- system.file("xlsx-examples", package = "misc")
read_all_xlsx_then_save_csv(
path_to_xlsx = xlsx_dir
)
}
Read layers from a file geodatabase (.gdb)
Description
Read layers from a file geodatabase (.gdb)
Usage
read_gdb(path, layer = NULL, quiet = TRUE, ...)
Arguments
path |
Path to a .gdb file geodatabase (a directory) |
layer |
If NULL (default), all layers are read; otherwise a character vector of layer names to read |
quiet |
Passed to sf::read_sf() |
... |
Additional arguments passed to sf::read_sf() |
Value
A tibble with columns fpath (path or GDAL dsn used for the layer),
file_type (tools::file_ext()), layer_name, geometry_type, nrows_aka_features,
ncols_aka_fields, crs_name (from st_layers()$crs when available), and
data (list-column of sf::sf objects). Layers are not row-bound; differing CRS are preserved
per row.
Examples
gdb <- system.file("extdata", "misc_example.gdb", package = "misc")
if (nzchar(gdb) && dir.exists(gdb)) {
read_gdb(gdb)
read_gdb(gdb, layer = "OGRGeoJSON")
}
Read a geospatial file or dataset (auto-detect by extension)
Description
Chooses the reader from tools::file_ext(path) (case-insensitive):
- .zip: read_sf_zip()
- .kmz: read_kmz()
- .kml: internal KML reader (same tibble layout; fpath is the .kml file)
- .gdb: read_gdb()
- anything else GDAL/sf can open on path: one row per layer from
sf::st_layers() (e.g. .shp, .gpkg, .geojson)
Usage
read_geo(path, layer = NULL, quiet = TRUE, ...)
Arguments
path |
Path to a spatial file or a .gdb directory |
layer |
Passed to multi-layer GDAL readers. Ignored for .zip and .kmz inputs |
quiet |
Passed to sf::read_sf() |
... |
Additional arguments passed to the underlying reader |
Value
A tibble as described in read_gdb().
Examples
d <- system.file("extdata", package = "misc")
f <- function(...) file.path(d, ...)
if (file.exists(f("misc_example.zip"))) read_geo(f("misc_example.zip"))
if (file.exists(f("misc_example.kmz"))) read_geo(f("misc_example.kmz"))
if (file.exists(f("misc_example.kml"))) read_geo(f("misc_example.kml"))
if (file.exists(f("misc_example.gpkg"))) read_geo(f("misc_example.gpkg"))
if (file.exists(f("misc_example.geojson"))) read_geo(f("misc_example.geojson"))
if (file.exists(f("misc_example.shp"))) read_geo(f("misc_example.shp"))
if (dir.exists(f("misc_example.gdb"))) read_geo(f("misc_example.gdb"), layer = "OGRGeoJSON")
Read a KMZ file (KML in a ZIP)
Description
Extracts the archive to a temporary directory and reads each KML layer with
sf::read_sf() after sf::st_layers(). Multiple KML files or multiple
layers yield one row per layer; layer_name is simplified when there is only
one layer in one file.
Usage
read_kmz(path, quiet = TRUE, ...)
Arguments
path |
Path to a .kmz file |
quiet |
Passed to sf::read_sf() |
... |
Additional arguments passed to sf::read_sf() |
Value
A tibble with the same columns as read_gdb(). Here fpath is the
path to the original .kmz (not the temporary .kml), and file_type is
typically "kmz". Metadata columns still come from sf::st_layers() on the
extracted KML file used for reading.
Examples
kmz <- system.file("extdata", "misc_example.kmz", package = "misc")
if (nzchar(kmz) && file.exists(kmz)) read_kmz(kmz)
Read shapefile(s) inside a ZIP archive via GDAL /vsizip/
Description
Uses zip::zip_list() to find .shp members, then reads each with
sf::read_sf() on a /vsizip/... path. Multiple shapefiles become one row
each (list-column data), so differing CRS are not merged.
Usage
read_sf_zip(path, quiet = TRUE, ...)
Arguments
path |
Path to a .zip archive containing one or more shapefiles |
quiet |
Passed to sf::read_sf() |
... |
Additional arguments passed to sf::read_sf() |
Value
A tibble with fpath (the /vsizip/... dsn), file_type, metadata
from sf::st_layers(), and data (list-column of sf). See read_gdb().
Examples
z <- system.file("extdata", "misc_example.zip", package = "misc")
if (nzchar(z) && file.exists(z)) read_sf_zip(z)
Read an excel sheet and save it to a CSV file
Description
read_sheet_then_save_csv() is heavily inspired by readxl::read_excel()
(in fact, it inherits almost all of its arguments).
Usage
read_sheet_then_save_csv(
excel_sheet,
path_to_xlsx,
dir_to_save = NULL,
range = NULL,
col_types = NULL,
col_names = TRUE,
na = "",
trim_ws = TRUE,
skip = 0,
n_max = Inf,
guess_max = min(1000, n_max),
.name_repair = "unique"
)
Arguments
excel_sheet |
a character vector with the name of the excel sheet |
path_to_xlsx |
a character vector with the path of the excel file |
dir_to_save |
a character vector with the path to save the csv file. Default is NULL and save the csv in the "data/temp" if it exists. |
range |
A cell range to read from. Includes typical Excel ranges like "B3:D87". |
col_types |
Either NULL to guess all from the spreadsheet or a character vector containing one entry per column from these options: "skip", "guess", "logical", "numeric", "date", "text" or "list". If exactly one col_type is specified, it will be recycled. |
col_names |
TRUE to use the first row as column names |
na |
Character vector of strings to interpret as missing values. By default, treats blank cells as missing data. |
trim_ws |
Should leading and trailing whitespace be trimmed? |
skip |
Minimum number of rows to skip before reading anything, be it column names or data. |
n_max |
Maximum number of data rows to read. |
guess_max |
Maximum number of data rows to use for guessing column types. |
.name_repair |
Handling of column names |
Value
No return value, called for side effects (writes one CSV file for the requested sheet).
Acknowledgment
read_sheet_then_save_csv() is an adaptation of the awesome workflow described
in an article
from {readxl} package site.
Examples
if (interactive()) {
# read a sheet and save it as a csv file
misc::create_dirs("ma-box")
xlsx_file <-
system.file("xlsx-examples", "mtcars_workbook_001.xlsx", package = "misc")
read_sheet_then_save_csv(
excel_sheet = "mtcars_sheet_001",
path_to_xlsx = xlsx_file,
dir_to_save = "ma-box"
)
}
Remove columns based on NA values
Description
Remove columns based on NA values
Usage
remove_columns_based_on_NA(data, threshold = 0.5)
Arguments
data |
A data frame or tibble |
threshold |
The proportion of NA values allowed in a column (default: 0.5) |
Value
A data frame with columns removed if they have more than the specified threshold of NA values
Examples
# Create sample data frame with NA values
df <- data.frame(
a = c(1, 2, NA, 4, 5),
b = c(NA, NA, NA, 4, 5),
c = c(1, 2, 3, NA, 5)
)
# Remove columns with more than 50% NA values
remove_columns_based_on_NA(df)
# Use stricter threshold of 10% NA values
remove_columns_based_on_NA(df, threshold = 0.1)
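The rule reduces to a per-column NA proportion; a base-R sketch of the same behavior (columns whose NA share exceeds the threshold are dropped), not necessarily the package's implementation:

```r
df <- data.frame(
  a = c(1, 2, NA, 4, 5),
  b = c(NA, NA, NA, 4, 5),
  c = c(1, 2, 3, NA, 5)
)

# proportion of NA values per column, then keep those at or below 0.5
keep <- colMeans(is.na(df)) <= 0.5
df[, keep, drop = FALSE]  # column b (60% NA) is dropped
```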
Save a ggplot figure
Description
save_plot() wraps ggplot2::ggsave() and offers an option to remove white
space around figures (creates an additional file in output/figures/trim;
uses trim_fig)
Usage
save_plot(
object,
filename = NULL,
dir_to_save = NULL,
width = NA,
height = NA,
format = NULL,
units = NULL,
dpi = NULL,
overwrite = FALSE,
trim = FALSE
)
Arguments
object |
a ggplot object |
filename |
a character vector with the name of the file to save. Default is NULL and saves with the name of the object |
dir_to_save |
a character vector with the name of the directory to save |
width |
a numerical vector with the width of the figure |
height |
a numerical vector with the height of the figure |
format |
a character vector with the format of the figure. Can be "jpeg", "tiff", "png" (default), or "pdf" |
units |
a character vector with the units of the figure size. Can be "in", "cm" (default), or "mm" |
dpi |
a numerical vector with the resolution of the figure. Default is 300 |
overwrite |
logical; if TRUE, overwrite an existing file. Default is FALSE |
trim |
logical; if TRUE, also save a trimmed copy via trim_fig(). Default is FALSE |
Value
No return value, called for side effects (writes a graphics file
via ggplot2::ggsave(), and optionally calls trim_fig()).
Acknowledgment
save_plot() is derived from
write_plot(),
available in the excellent
start project template
Examples
if (interactive()) {
library(misc)
ipak(c("ggplot2", "dplyr"))
create_dirs()
p <- mtcars %>%
ggplot() +
aes(x = mpg, y = cyl) +
geom_point()
save_plot(p)
}
Save object as RDS file
Description
Save object as RDS file
Usage
save_temp_data(object, dir_to_save = NULL)
Arguments
object |
R object |
dir_to_save |
a character vector with the directory name. Default is NULL and save object in the "data/temp" if it exists. |
Value
No return value, called for side effects (writes an .rds file
via saveRDS()).
Examples
if (interactive()) {
# create and save a R object
awesome <- "not too much!"
misc::create_dirs("ma-box")
save_temp_data(object = awesome, dir_to_save = "ma-box")
# using default directories from `misc::create_dirs()`
create_dirs()
so_good <- "Yep!"
save_temp_data(object = so_good)
# reading many temp data
ext <- "\\.[rR][dD][sS]$"
# list files
files <- list.files(
path = "data/temp",
pattern = ext,
full.names = TRUE
)
# loop over files
for (i in files) {
# read temporary file
tmp <- readRDS(file = i)
# remove extension from filename
obj_name <- gsub(
pattern = ext,
replacement = "",
x = basename(i)
)
# assign name
assign(obj_name, tmp)
}
}
Alternative data.frame viewer using tad
Description
tad_view() is an alternative to the View() function when not using
RStudio. Please make sure you have
tad installed on your
system.
Usage
tad_view(data)
Arguments
data |
a data.frame/tibble data format. |
Value
None
Examples
if (interactive()) {
library(misc)
mtcars %>%
tad_view()
}
Remove white spaces around figures
Description
trim_fig() removes white space around a figure and saves the result into the
trim folder (the original figure is left untouched)
Usage
trim_fig(figure_path, overwrite = FALSE)
Arguments
figure_path |
a character vector with path of the figure |
overwrite |
logical; if TRUE, overwrite an existing trimmed figure. Default is FALSE |
Value
No return value, called for side effects (writes a trimmed image
file under a trim/ subdirectory via magick::image_write()).
Acknowledgment
trim_fig() wraps the excellent image_trim() of
{magick}
Examples
if (interactive()) {
library(misc)
ipak(c("ggplot2", "dplyr"))
create_dirs()
p <- mtcars %>%
ggplot() +
aes(x = mpg, y = cyl) +
geom_point()
save_plot(p)
trim_fig("output/figures/p.png")
}
View data frame in Excel or other spreadsheet viewer
Description
Opens a data frame in Microsoft Excel or another spreadsheet viewer. Also copies the data to the system clipboard.
Usage
view_excel(data, viewer = c("libreoffice", "gnumeric", "tad", "excel"))
Arguments
data |
A data frame to view |
viewer |
The spreadsheet viewer to use. One of "libreoffice" (default), "gnumeric", "tad", or "excel" |
Value
Returns nothing
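Examples
A usage sketch (interactive only; assumes the selected viewer is installed on your system):

```r
if (interactive()) {
  # open mtcars in LibreOffice (the default viewer)
  view_excel(mtcars)
  # or force a specific viewer
  view_excel(mtcars, viewer = "gnumeric")
}
```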
Alternative data.frame viewer
Description
view_in() is an alternative to View() function when not using
RStudio. To date, it works with gnumeric, libreoffice and tad.
Usage
view_in(data, viewer = c("libreoffice", "gnumeric", "tad"))
Arguments
data |
a data.frame/tibble data format. |
viewer |
a character string naming the app used to open the csv file. |
Value
None
Examples
if (interactive()) {
library(misc)
mtcars %>%
view_in()
}
View spatial data from file path with optional map preview
Description
Reads a spatial data file (.shp or .gpkg) and optionally displays it in an interactive map preview. The data is also opened in VisiData for tabular viewing.
Usage
view_mapview_from_path(path, preview = FALSE)
Arguments
path |
Path to the spatial data file (.shp or .gpkg) |
preview |
Logical. If TRUE, opens an interactive map preview in the browser. Default is FALSE. |
Details
The function performs the following steps:
1. Validates that the input file exists and has the correct extension (.shp or .gpkg)
2. Creates a temporary HTML file for the map preview in ~/.local/share/mapview/
3. Reads the spatial data using sf::read_sf()
4. If preview = TRUE, creates an interactive map using mapview and opens it in the browser
5. Opens the attribute data in VisiData
Value
Returns nothing, called for side effects
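Examples
A usage sketch (interactive only; the path below is hypothetical, substitute one of your own .shp or .gpkg files):

```r
if (interactive()) {
  # hypothetical file path; replace with a real .shp or .gpkg
  my_layer <- "data/raw/my_layer.gpkg"
  # tabular view only
  view_mapview_from_path(my_layer)
  # with an interactive map preview in the browser
  view_mapview_from_path(my_layer, preview = TRUE)
}
```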
View data in VisiData
Description
Opens data in VisiData through the Terminal application on macOS. If the input is an sf object, the geometry column will be dropped before viewing.
Usage
view_vd(data, type = "csv")
Arguments
data |
A data.frame, tibble, or sf object to view |
type |
Either "csv" or "json" format for writing the temporary file. Use "json" for preserving list-columns. |
Details
This function only works in interactive sessions on macOS. It creates a temporary file and opens it in VisiData through the Terminal application. The temporary filename includes a timestamp for identification.
Value
Returns the input data invisibly
Examples
if (interactive()) {
# View a data frame
mtcars %>% view_vd()
# View with list-columns preserved
nested_df <- data.frame(x = 1:2)
nested_df$y <- list(1:3, letters)
nested_df %>% view_vd(type = "json")
}
View data frame in VisiData (non-interactive version)
Description
Opens a data frame in VisiData terminal viewer, saving to a fixed location in Downloads. Similar to view_vd() but without interactive mode check.
Usage
view_vd_nonint(data, title = NULL)
Arguments
data |
A data frame or sf object to view |
title |
Optional title for the viewer window (default: "misc::view_vd") |
Value
Returns the input data frame unchanged
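Examples
A usage sketch (assumes VisiData and the macOS Terminal application are available):

```r
if (interactive()) {
  # open mtcars in VisiData with a custom viewer title
  view_vd_nonint(mtcars, title = "Car Data")
}
```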