geobounds: Accessing global administrative boundary data in R

Important

Attribution is required when using geoBoundaries.

Introduction

The geobounds package provides a straightforward interface for downloading and working with global administrative boundary data from the geoBoundaries project (Runfola et al. 2020).

The default gbOpen release is CC BY 4.0 compliant when attribution is provided and covers countries worldwide across multiple administrative levels. The package also supports gbHumanitarian and gbAuthoritative release types, which vary in source, validation process and licensing. With geobounds, you can fetch boundary geometries as sf objects, explore metadata, cache datasets locally and integrate the boundaries into your spatial workflows.

Understanding the data

The geoBoundaries database undergoes a rigorous quality assurance process, including manual review and hand-digitization of physical maps where necessary. Its primary goal is to provide the highest possible level of spatial accuracy for scientific and academic applications.

This precision comes at a cost: some files can be quite large and may take longer to download. For visualization or general mapping purposes, we recommend using the simplified datasets by setting simplified = TRUE.

library(geobounds)
library(ggplot2)
library(dplyr)

# Compare resolutions.
norway <- gb_get_adm0("NOR") |>
  mutate(res = "Full resolution")
print(object.size(norway), units = "Mb")
#> 26.5 Mb

norway_simp <- gb_get_adm0(country = "NOR", simplified = TRUE) |>
  mutate(res = "Simplified")
print(object.size(norway_simp), units = "Mb")
#> 1.5 Mb

norway_all <- bind_rows(norway, norway_simp)

# Plot with ggplot2.
ggplot(norway_all) +
  geom_sf(fill = "#BA0C2F", color = "#00205B") +
  facet_wrap(vars(res)) +
  theme_minimal() +
  labs(caption = "Source: www.geoboundaries.org")

Comparison between full vs. simplified map.

Individual country files

The geoBoundaries API provides individual country files, whose aim is to represent every nation “as they would represent themselves”, without special identification of disputed areas.

Download individual country files with gb_get() or the wrappers documented in ?gb_get_adm, such as gb_get_adm0(). Note that borders are not guaranteed to align perfectly: gaps may exist between countries, and disputed territories may not be represented consistently.

india_pak <- gb_get_adm0(c("India", "Pakistan"))

# Highlight the disputed Kashmir area.
ggplot(india_pak) +
  geom_sf(aes(fill = shapeName), alpha = 0.5) +
  scale_fill_manual(values = c("#FF671F", "#00401A")) +
  labs(
    fill = "Country",
    title = "Map of India and Pakistan",
    subtitle = "Note the overlap in the Kashmir region",
    caption = "Source: www.geoboundaries.org"
  )

Map showing overlap in the disputed Kashmir area.

Note that individual country files are governed by the license or licenses identified within the metadata for each respective boundary.

gb_get_metadata(c("India", "Pakistan"), adm_lvl = "ADM0") |>
  select(boundaryName, boundaryLicense, boundarySource)
#> # A tibble: 2 × 3
#>   boundaryName boundaryLicense                      boundarySource
#>   <chr>        <chr>                                <chr>         
#> 1 India        CC0 1.0 Universal (CC0 1.0) Public … geoBoundaries…
#> 2 Pakistan     Open Data Commons Open Database Lic… OpenStreetMap…
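
Because licenses differ by boundary, a figure caption can be assembled from these metadata fields. The sketch below is illustrative only: build_attribution() is a hypothetical helper, not part of geobounds, and the toy data frame uses placeholder values in place of a real gb_get_metadata() result.

```r
# Hypothetical helper (not part of geobounds): collapse the source and
# license fields into one caption line per boundary.
build_attribution <- function(meta) {
  paste0(
    meta$boundaryName, ": ", meta$boundarySource,
    " (", meta$boundaryLicense, ")",
    collapse = "\n"
  )
}

# Stand-in metadata with placeholder values, shaped like the tibble above.
toy_meta <- data.frame(
  boundaryName = c("Country A", "Country B"),
  boundaryLicense = c("License A", "License B"),
  boundarySource = c("Source A", "Source B")
)

cat(build_attribution(toy_meta), "\n")
```

With real data, the same helper could be applied to the tibble returned by gb_get_metadata() and the result passed to labs(caption = ...).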

Global composite files

Use gb_get_world() for data where disputed areas are explicitly handled by removing overlaps and filling gaps. This function downloads global composite files for administrative boundaries, also known as Comprehensive Global Administrative Zones (CGAZ). There are three important distinctions between CGAZ and individual country files:

  1. Extensive simplification is performed to ensure that file sizes are small enough to be used in most traditional desktop software.
  2. Disputed areas are removed and replaced with polygons following US Department of State definitions.
  3. Gaps between borders have been filled.

cgaz_india_pak <- gb_get_world(c("India", "Pakistan"))

ggplot(cgaz_india_pak) +
  geom_sf(aes(fill = shapeName), alpha = 0.5) +
  scale_fill_manual(values = c("#FF671F", "#00401A")) +
  labs(
    fill = "Country",
    title = "Map of India and Pakistan",
    subtitle = "CGAZ does not overlap",
    caption = "Source: www.geoboundaries.org"
  )

Map showing no overlap in Kashmir, provided by CGAZ.
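
Whether the polygons in a layer overlap can also be checked programmatically rather than by eye. The following sketch uses sf::st_overlaps() on toy squares that stand in for country polygons. Both square() and has_overlap() are hypothetical helpers, not geobounds functions, and results on real boundaries may depend on geometry validity and on the spherical geometry settings of sf.

```r
library(sf)

# Hypothetical helper: an axis-aligned square as an sfg polygon.
square <- function(xmin, ymin, size = 1) {
  st_polygon(list(rbind(
    c(xmin, ymin),
    c(xmin + size, ymin),
    c(xmin + size, ymin + size),
    c(xmin, ymin + size),
    c(xmin, ymin)
  )))
}

# Two squares sharing interior area, like overlapping territorial claims.
overlapping <- st_sf(
  shapeName = c("A", "B"),
  geometry = st_sfc(square(0, 0, size = 2), square(1, 1, size = 2))
)

# Two squares sharing only an edge, like a gap-free composite layer.
adjacent <- st_sf(
  shapeName = c("A", "B"),
  geometry = st_sfc(square(0, 0), square(1, 0))
)

# Hypothetical helper: TRUE if any pair of features overlaps.
has_overlap <- function(x) {
  any(st_overlaps(x, x, sparse = FALSE))
}

has_overlap(overlapping)
has_overlap(adjacent)
```

Under the same assumptions, has_overlap() could be applied to the india_pak and cgaz_india_pak objects created earlier to compare the two data sources.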

Caching and performance

The package includes a built-in cache: downloaded files are stored locally, so repeated requests for the same country and administrative level reuse the cached copy instead of downloading it again. For example:

# Show the current folder.
current <- gb_detect_cache_dir()
#> ℹ 'C:\Users\diego\AppData\Local\Temp\RtmpoLuvEM'

current
#> [1] "C:\\Users\\diego\\AppData\\Local\\Temp\\RtmpoLuvEM"

# Change to a new folder.
newdir <- file.path(tempdir(), "/geoboundvignette")
gb_set_cache_dir(newdir)
#> ✔ geobounds cache directory is 'C:\Users\diego\AppData\Local\Temp\RtmpoLuvEM//geoboundvignette'.
#> ℹ To install your `cache_dir` path for use in future sessions run this function with `install = TRUE`.

# Download the example data.
example <- gb_get_adm0("Vatican City", quiet = FALSE)
#> ℹ Downloading file from <https://github.com/wmgeolab/geoBoundaries/raw/9469f09/releaseData/gbOpen/VAT/ADM0/geoBoundaries-VAT-ADM0-all.zip>.
#> → Cache directory is 'C:\Users\diego\AppData\Local\Temp\RtmpoLuvEM//geoboundvignette/gbOpen'.

# Restore the cache directory.
gb_set_cache_dir(current)
#> ✔ geobounds cache directory is 'C:\Users\diego\AppData\Local\Temp\RtmpoLuvEM'.
#> ℹ To install your `cache_dir` path for use in future sessions run this function with `install = TRUE`.

current == gb_detect_cache_dir()
#> ℹ 'C:\Users\diego\AppData\Local\Temp\RtmpoLuvEM'
#> [1] TRUE

To clear the cache, use gb_clear_cache().

Set a specific cache directory for each function call with the cache_dir argument.
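
As a minimal sketch of the per-call option, the example below passes a project-specific folder through cache_dir. The folder name is an arbitrary choice for illustration, and the download is wrapped in try() only so that the snippet also runs without network access.

```r
# Arbitrary folder used only for this call; the session-wide cache
# directory is left untouched.
project_cache <- file.path(tempdir(), "geobounds-project-cache")
dir.create(project_cache, showWarnings = FALSE, recursive = TRUE)

if (requireNamespace("geobounds", quietly = TRUE)) {
  # Needs network access the first time the file is requested.
  vat <- try(
    geobounds::gb_get_adm0("Vatican City", cache_dir = project_cache),
    silent = TRUE
  )
}

# List whatever was stored in the folder (empty if nothing was downloaded).
list.files(project_cache, recursive = TRUE)
```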

Use in spatial analysis pipelines

Because the boundaries are returned as sf objects, you can combine them with other spatial data.

This example creates a choropleth map using metadata from individual country files and boundary data from CGAZ:

# Retrieve metadata.
latam_meta <- gb_get_metadata(adm_lvl = "ADM0") |>
  select(boundaryISO, boundaryName, Continent, worldBankIncomeGroup) |>
  filter(Continent == "Latin America and the Caribbean") |>
  glimpse()
#> Rows: 47
#> Columns: 4
#> $ boundaryISO          <chr> "ABW", "AIA", "ARG", "ATG", "BES", …
#> $ boundaryName         <chr> "Aruba", "Anguilla", "Argentina", "…
#> $ Continent            <chr> "Latin America and the Caribbean", …
#> $ worldBankIncomeGroup <chr> "High-income Countries", "No income…

# Order the income groups; values not listed in `levels` become NA.
latam_meta$income_factor <- factor(
  latam_meta$worldBankIncomeGroup,
  levels = c(
    "High-income Countries",
    "Upper-middle-income Countries",
    "Lower-middle-income Countries",
    "Low-income Countries"
  )
)

# Get the shapes from CGAZ.
latam_sf <- gb_get_world(adm_lvl = "ADM0") |>
  inner_join(latam_meta, by = c("shapeGroup" = "boundaryISO"))

ggplot(latam_sf) +
  geom_sf(aes(fill = income_factor)) +
  scale_fill_brewer(palette = "Greens", direction = -1) +
  guides(fill = guide_legend(position = "bottom", nrow = 2)) +
  coord_sf(
    crs = "+proj=laea +lon_0=-75 +lat_0=-15"
  ) +
  labs(
    title = "World Bank Income Group",
    subtitle = "Latin America and the Caribbean",
    fill = "",
    caption = "Source: www.geoboundaries.org"
  )

World Bank Income Group: Latin America and the Caribbean.
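
Attribute joins like the one above are only one option. As a minimal sketch of combining boundaries with other spatial data, the example below assigns points to polygons with sf::st_join(). The square polygon and the two points are made-up stand-ins; with real data, the polygon layer would be the result of a geobounds call.

```r
library(sf)

# Toy polygon standing in for a boundary layer (same shapeName column).
toy_boundary <- st_sf(
  shapeName = "Toyland",
  geometry = st_sfc(
    st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0)))),
    crs = 4326
  )
)

# Made-up survey sites: the first lies inside the polygon, the second outside.
sites <- st_as_sf(
  data.frame(site = c("s1", "s2"), lon = c(0.5, 2), lat = c(0.5, 2)),
  coords = c("lon", "lat"),
  crs = 4326
)

# Left join: each point gains the shapeName of the polygon that contains it,
# or NA when no polygon does.
sites_joined <- st_join(sites, toy_boundary, join = st_within)
sites_joined$shapeName
```

The points keep their own geometry, so the joined object can be summarised by shapeName or plotted on top of the boundary layer.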

Summary

The geobounds package makes it easy to fetch, manage and visualize administrative boundary data worldwide in a reproducible way. Whether you are mapping, doing spatial analysis, integrating survey data or modeling geospatial patterns, it gives you access to high-quality boundary data with minimal overhead.

References

Runfola, Daniel, Austin Anderson, Heather Baier, et al. 2020. “geoBoundaries: A Global Database of Political Administrative Boundaries.” PLOS ONE 15 (4): 1–9. https://doi.org/10.1371/journal.pone.0231866.