% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/DoU-classify-grid.R
\name{DoU_classify_grid}
\alias{DoU_classify_grid}
\title{Create the DEGURBA grid cell classification}
\usage{
DoU_classify_grid(
  data,
  level1 = TRUE,
  parameters = NULL,
  values = NULL,
  regions = FALSE,
  filename = NULL
)
}
\arguments{
\item{data}{path to the directory with the data, or named list with the data as returned by function \code{\link[=DoU_preprocess_grid]{DoU_preprocess_grid()}}}

\item{level1}{logical. Whether to classify the grid according to the first hierarchical level (\code{TRUE}) or the second hierarchical level (\code{FALSE}). For more details, see section "Classification rules" below.}

\item{parameters}{named list with the parameters to adapt the standard specifications in the Degree of Urbanisation classification. For more details, see section "Custom specifications" below.}

\item{values}{vector with the values assigned to the different classes in the resulting classification:
\itemize{
\item If \code{level1=TRUE}: the vector should contain the values for (1) urban centres, (2) urban clusters, (3) rural grid cells and (4) water cells.
\item If \code{level1=FALSE}: the vector should contain the values for (1) urban centres, (2) dense urban clusters, (3) semi-dense urban clusters, (4) suburban or peri-urban cells, (5) rural clusters, (6) low density rural cells, (7) very low density rural cells and (8) water cells.
}}

\item{regions}{logical. Whether to execute the classification separately in the pre-defined regions in order to reduce memory use. Note that a global classification still requires a large amount of memory. For more details, see section "Regions" below.}

\item{filename}{character. Output filename (with extension \code{.tif}). The grid classification together with a metadata file (in JSON format) will be saved if \code{filename} is not \code{NULL}.}
}
\value{
SpatRaster with the grid cell classification
}
\description{
The function reconstructs the grid cell classification of the Degree of Urbanisation. The arguments of the function make it possible to adapt the standard specifications of the Degree of Urbanisation in order to construct an alternative version (see section "Custom specifications" below).

For more information about the Degree of Urbanisation methodology, see the \href{https://ec.europa.eu/eurostat/statistics-explained/index.php?title=Applying_the_degree_of_urbanisation_manual}{methodological manual}, \href{https://ghsl.jrc.ec.europa.eu/documents/GHSL_Data_Package_2022.pdf}{GHSL Data Package 2022} and \href{https://ghsl.jrc.ec.europa.eu/documents/GHSL_Data_Package_2023.pdf}{GHSL Data Package 2023}.
}
\section{Classification rules}{


The Degree of Urbanisation consists of two hierarchical levels. In level 1, the cells of a 1 km² grid are classified into urban centres, urban clusters and rural cells (and water cells). In level 2, urban clusters are further divided into dense urban clusters, semi-dense urban clusters and suburban or peri-urban cells. Rural cells are further divided into rural clusters, low density rural cells and very low density rural cells.

The detailed classification rules are as follows:

\strong{LEVEL 1:}
\itemize{
\item \strong{Urban centres} are identified as clusters of contiguous grid cells (based on rook contiguity) with a minimum density of 1500 inhabitants per km² (or with a minimum built-up area; see section "Built-up area criterium" below), and a minimum total population of 50 000 inhabitants. Gaps smaller than 15 km² in the urban centres are filled and edges are smoothed by a 3x3-majority rule (see section "Edge smoothing" below).
\item \strong{Urban clusters} are identified as clusters of contiguous grid cells (based on queen contiguity) with a minimum density of 300 inhabitants per km², and a minimum total population of 5000 inhabitants.
\item \strong{Water cells} contain no built-up area, no population, and less than 50\% permanent land. All other cells not belonging to an urban centre or urban cluster are considered \strong{rural cells}.
}

\strong{LEVEL 2:}
\itemize{
\item \strong{Urban centres} are identified as clusters of contiguous grid cells (based on rook contiguity) with a minimum density of 1500 inhabitants per km² (or with a minimum built-up area; see section "Built-up area criterium" below), and a minimum total population of 50 000 inhabitants. Gaps smaller than 15 km² in the urban centres are filled and edges are smoothed by a 3x3-majority rule (see section "Edge smoothing" below).
\item \strong{Dense urban clusters} are identified as clusters of contiguous grid cells (based on rook contiguity) with a minimum density of 1500 inhabitants per km² (or with a minimum built-up area; see section "Built-up area criterium" below), and a minimum total population of 5000 inhabitants.
\item \strong{Semi-dense urban clusters} are identified as clusters of contiguous grid cells (based on rook contiguity) with a minimum density of 900 inhabitants per km² and a minimum total population of 2500 inhabitants that are not located within 2 km of urban centres and dense urban clusters. Clusters that are located within 2 km are classified as \strong{suburban or peri-urban cells}.
\item \strong{Rural clusters} are clusters of contiguous grid cells (based on queen contiguity) with a minimum density of 300 inhabitants per km², and a minimum total population of 500 inhabitants.
\item \strong{Low density rural cells} are remaining cells with a population density of at least 50 inhabitants per km².
\item \strong{Water cells} contain no built-up area, no population, and less than 50\% permanent land. All cells not belonging to another class are considered \strong{very low density rural cells}.
}

For more information about the Degree of Urbanisation methodology, see the \href{https://ec.europa.eu/eurostat/statistics-explained/index.php?title=Applying_the_degree_of_urbanisation_manual}{methodological manual}, \href{https://ghsl.jrc.ec.europa.eu/documents/GHSL_Data_Package_2022.pdf}{GHSL Data Package 2022} and \href{https://ghsl.jrc.ec.europa.eu/documents/GHSL_Data_Package_2023.pdf}{GHSL Data Package 2023}.
}

\section{Custom specifications}{


The function allows changing the standard specifications of the Degree of Urbanisation in order to construct an alternative version of the grid classification. Custom specifications can be passed as a named list through the argument \code{parameters}. The supported parameters with their default values are returned by the function \code{\link[=DoU_get_grid_parameters]{DoU_get_grid_parameters()}} and are as follows:

\strong{LEVEL 1}
\itemize{
\item \code{UC_density_threshold} numeric (default: \code{1500}).

Minimum population density per permanent land of a cell required to belong to an urban centre
\item \code{UC_size_threshold} numeric (default: \code{50000}).

Minimum total population size required for an urban centre
\item \code{UC_contiguity_rule} integer (default: \code{4}).

Which cells are considered adjacent in urban centres: \code{4} for rook's case (horizontal and vertical neighbours) or \code{8} for queen's case (horizontal, vertical and diagonal neighbours)
\item \code{UC_built_criterium} logical (default: \code{TRUE}).

Whether to use the additional built-up area criterium (see section "Built-up area criterium" below). If \code{TRUE}, not only cells that meet the population density requirement will be considered when delineating urban centres, but also cells with a built-up area per permanent land above the \code{UC_built_threshold}
\item \code{UC_built_threshold} numeric or character (default: \code{0.2}).

Additional built-up area threshold. Can be a value between \code{0} and \code{1}, representing the minimum built-up area per permanent land, or \code{"optimal"} (see section "Built-up area criterium" below). Ignored when \code{UC_built_criterium} is \code{FALSE}.
\item \code{built_optimal_data} character / list (default: \code{NULL}).

Path to the directory with the data, or named list with the data as returned by function \code{\link[=DoU_preprocess_grid]{DoU_preprocess_grid()}} used to determine the optimal built threshold (see section "Built-up area criterium" below). Ignored when \code{UC_built_criterium} is \code{FALSE} or when \code{UC_built_threshold} is not \code{"optimal"}.
\item \code{UC_smooth_pop} logical (default: \code{FALSE}).

Whether to smooth the population grid before delineating urban centres. If \code{TRUE}, the population grid will be smoothed with a moving average of window size \code{UC_smooth_pop_window}.
\item \code{UC_smooth_pop_window} integer (default: \code{5}).

Size of the moving window used to smooth the population grid before delineating urban centres. Ignored when \code{UC_smooth_pop} is \code{FALSE}.
\item \code{UC_gap_fill} logical (default: \code{TRUE}).

Whether to perform gap filling. If \code{TRUE}, gaps in urban centres smaller than \code{UC_max_gap} are filled.
\item \code{UC_max_gap} integer (default: \code{15}).

Gaps with an area smaller than this threshold in urban centres will be filled (unit is km²). Ignored when \code{UC_gap_fill} is \code{FALSE}.
\item \code{UC_smooth_edge} logical (default: \code{TRUE}).

Whether to perform edge smoothing. If \code{TRUE}, edges of urban centres are smoothed with the function \code{UC_smooth_edge_fun}.
\item \code{UC_smooth_edge_fun} character / function (default: \code{"majority_rule_R2023A"}).

Function used to smooth the edges of urban centres. Ignored when \code{UC_smooth_edge} is \code{FALSE}. Possible values are:
\itemize{
\item \code{"majority_rule_R2022A"} to use the edge smoothing algorithm in GHSL Data Package 2022 (see section "Edge smoothing" below)
\item \code{"majority_rule_R2023A"} to use the edge smoothing algorithm in GHSL Data Package 2023 (see section "Edge smoothing" below)
\item a custom function with a signature similar to \code{\link[=apply_majority_rule]{apply_majority_rule()}}.
}
\item \code{UCL_density_threshold} numeric (default: \code{300}).

Minimum population density per permanent land of a cell required to belong to an urban cluster
\item \code{UCL_size_threshold} numeric (default: \code{5000}).

Minimum total population size required for an urban cluster
\item \code{UCL_contiguity_rule} integer (default: \code{8}).

Which cells are considered adjacent in urban clusters: \code{4} for rook's case (horizontal and vertical neighbours) or \code{8} for queen's case (horizontal, vertical and diagonal neighbours)
\item \code{UCL_smooth_pop} logical (default: \code{FALSE}).

Whether to smooth the population grid before delineating urban clusters. If \code{TRUE}, the population grid will be smoothed with a moving average of window size \code{UCL_smooth_pop_window}.
\item \code{UCL_smooth_pop_window} integer (default: \code{5}).

Size of the moving window used to smooth the population grid before delineating urban clusters. Ignored when \code{UCL_smooth_pop} is \code{FALSE}.
\item \code{water_land_threshold} numeric (default: \code{0.5}).

Maximum proportion of permanent land allowed in a water cell
\item \code{water_pop_threshold} numeric (default: \code{0}).

Maximum population size allowed in a water cell
\item \code{water_built_threshold} numeric (default: \code{0}).

Maximum built-up area allowed in a water cell
}

\strong{LEVEL 2}
\itemize{
\item \code{UC_density_threshold} numeric (default: \code{1500}).

Minimum population density per permanent land of a cell required to belong to an urban centre
\item \code{UC_size_threshold} numeric (default: \code{50000}).

Minimum total population size required for an urban centre
\item \code{UC_contiguity_rule} integer (default: \code{4}).

Which cells are considered adjacent in urban centres: \code{4} for rook's case (horizontal and vertical neighbours) or \code{8} for queen's case (horizontal, vertical and diagonal neighbours)
\item \code{UC_built_criterium} logical (default: \code{TRUE}).

Whether to use the additional built-up area criterium (see section "Built-up area criterium" below). If \code{TRUE}, not only cells that meet the population density requirement will be considered when delineating urban centres, but also cells with a built-up area per permanent land above the \code{UC_built_threshold}
\item \code{UC_built_threshold} numeric or character (default: \code{0.2}).

Additional built-up area threshold. Can be a value between \code{0} and \code{1}, representing the minimum built-up area per permanent land, or \code{"optimal"} (see section "Built-up area criterium" below). Ignored when \code{UC_built_criterium} is \code{FALSE}.
\item \code{built_optimal_data} character / list (default: \code{NULL}).

Path to the directory with the data, or named list with the data as returned by function \code{\link[=DoU_preprocess_grid]{DoU_preprocess_grid()}} used to determine the optimal built threshold (see section "Built-up area criterium" below). Ignored when \code{UC_built_criterium} is \code{FALSE} or when \code{UC_built_threshold} is not \code{"optimal"}.
\item \code{UC_smooth_pop} logical (default: \code{FALSE}).

Whether to smooth the population grid before delineating urban centres. If \code{TRUE}, the population grid will be smoothed with a moving average of window size \code{UC_smooth_pop_window}.
\item \code{UC_smooth_pop_window} integer (default: \code{5}).

Size of the moving window used to smooth the population grid before delineating urban centres. Ignored when \code{UC_smooth_pop} is \code{FALSE}.
\item \code{UC_gap_fill} logical (default: \code{TRUE}).

Whether to perform gap filling. If \code{TRUE}, gaps in urban centres smaller than \code{UC_max_gap} are filled.
\item \code{UC_max_gap} integer (default: \code{15}).

Gaps with an area smaller than this threshold in urban centres will be filled (unit is km²). Ignored when \code{UC_gap_fill} is \code{FALSE}.
\item \code{UC_smooth_edge} logical (default: \code{TRUE}).

Whether to perform edge smoothing. If \code{TRUE}, edges of urban centres are smoothed with the function \code{UC_smooth_edge_fun}.
\item \code{UC_smooth_edge_fun} character / function (default: \code{"majority_rule_R2023A"}).

Function used to smooth the edges of urban centres. Ignored when \code{UC_smooth_edge} is \code{FALSE}. Possible values are:
\itemize{
\item \code{"majority_rule_R2022A"} to use the edge smoothing algorithm in GHSL Data Package 2022 (see section "Edge smoothing" below)
\item \code{"majority_rule_R2023A"} to use the edge smoothing algorithm in GHSL Data Package 2023 (see section "Edge smoothing" below)
\item a custom function with a signature similar to \code{\link[=apply_majority_rule]{apply_majority_rule()}}.
}
\item \code{DUC_density_threshold} numeric (default: \code{1500}).

Minimum population density required for a dense urban cluster
\item \code{DUC_size_threshold} numeric (default: \code{5000}).

Minimum total population size required for a dense urban cluster
\item \code{DUC_built_criterium} logical (default: \code{TRUE}).

Whether to use the additional built-up area criterium (see section "Built-up area criterium" below). If \code{TRUE}, not only cells that meet the population density requirement will be considered when delineating dense urban clusters, but also cells with a built-up area per permanent land above the \code{DUC_built_threshold}
\item \code{DUC_built_threshold} numeric or character (default: \code{0.2}).

Additional built-up area threshold. Can be a value between \code{0} and \code{1}, representing the minimum built-up area per permanent land, or \code{"optimal"} (see section "Built-up area criterium" below). Ignored when \code{DUC_built_criterium} is \code{FALSE}.
\item \code{DUC_contiguity_rule} integer (default: \code{4}).

Which cells are considered adjacent in dense urban clusters: \code{4} for rook's case (horizontal and vertical neighbours) or \code{8} for queen's case (horizontal, vertical and diagonal neighbours)
\item \code{SDUC_density_threshold} numeric (default: \code{900}).

Minimum population density per permanent land of a cell required to belong to a semi-dense urban cluster
\item \code{SDUC_size_threshold} numeric (default: \code{2500}).

Minimum total population size required for a semi-dense urban cluster
\item \code{SDUC_contiguity_rule} integer (default: \code{4}).

Which cells are considered adjacent in semi-dense urban clusters: \code{4} for rook's case (horizontal and vertical neighbours) or \code{8} for queen's case (horizontal, vertical and diagonal neighbours)
\item \code{SDUC_buffer_size} integer (default: \code{2}).

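Minimum distance to urban centres and dense urban clusters required for a semi-dense urban cluster (the default of \code{2} corresponds to the 2 km mentioned in section "Classification rules" above)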
\item \code{SUrb_density_threshold} numeric (default: \code{300}).

Minimum population density per permanent land of a cell required to belong to a suburban or peri-urban area
\item \code{SUrb_size_threshold} numeric (default: \code{5000}).

Minimum total population size required for a suburban or peri-urban area
\item \code{SUrb_contiguity_rule} integer (default: \code{8}).

Which cells are considered adjacent in suburban or peri-urban areas: \code{4} for rook's case (horizontal and vertical neighbours) or \code{8} for queen's case (horizontal, vertical and diagonal neighbours)
\item \code{RC_density_threshold} numeric (default: \code{300}).

Minimum population density per permanent land of a cell required to belong to a rural cluster
\item \code{RC_size_threshold} numeric (default: \code{500}).

Minimum total population size required for a rural cluster
\item \code{RC_contiguity_rule} integer (default: \code{8}).

Which cells are considered adjacent in rural clusters: \code{4} for rook's case (horizontal and vertical neighbours) or \code{8} for queen's case (horizontal, vertical and diagonal neighbours)
\item \code{LDR_density_threshold} numeric (default: \code{50}).

Minimum population density per permanent land of a low density rural grid cell
\item \code{water_land_threshold} numeric (default: \code{0.5}).

Maximum proportion of permanent land allowed in a water cell
\item \code{water_pop_threshold} numeric (default: \code{0}).

Maximum population size allowed in a water cell
\item \code{water_built_threshold} numeric (default: \code{0}).

Maximum built-up area allowed in a water cell
}
}

\section{Built-up area criterium}{


In GHSL Data Package 2022, the Degree of Urbanisation includes an optional built-up area criterium to account for the presence of office parks, shopping malls, factories and transport infrastructure. When the setting is enabled, urban centres (and dense urban clusters) are created using both cells with a population density of at least 1500 inhabitants per km² \emph{and} cells that have at least 50\% built-up area on permanent land. For more information, see \href{https://ghsl.jrc.ec.europa.eu/documents/GHSL_Data_Package_2022.pdf}{GHSL Data Package 2022, footnote 25}. The parameter settings \code{UC_built_criterium=TRUE} and \code{UC_built_threshold=0.5} (levels 1 and 2) and \code{DUC_built_criterium=TRUE} and \code{DUC_built_threshold=0.5} (level 2) reproduce this built-up area criterium in urban centres and dense urban clusters respectively.
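
The cell selection described above can be illustrated with a minimal base R sketch. It is not the package implementation: the vectors \code{pop}, \code{land} and \code{built} are made-up values for three hypothetical cells (population, and permanent land and built-up area in km²), and every cell is assumed to contain some permanent land.

\preformatted{
# hypothetical values for three grid cells (assumptions for illustration)
pop   <- c(2000, 800, 100)  # inhabitants
land  <- c(1, 1, 0.5)       # permanent land (km2), assumed to be non-zero
built <- c(0.3, 0.6, 0)     # built-up area (km2)

# population density and built-up share, both per permanent land
density     <- pop / land
built_share <- built / land

# a cell is a candidate if it meets the density threshold
# OR the built-up threshold (here the 2022 values: 1500 and 0.5)
candidate <- density >= 1500 | built_share >= 0.5
candidate
}

In this sketch the second cell only qualifies because of its built-up share, which is the situation the criterium is meant to capture.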

In GHSL Data Package 2023, the built-up area criterium is slightly adapted and renamed to the "Reduce Fragmentation Option". Instead of using a fixed threshold of built-up area per permanent land of 50\%, an "optimal" threshold is employed. The optimal threshold is dynamically identified as the global average built-up area proportion in clusters with a density of at least 1500 inhabitants per km² of permanent land and a minimum population of 5000 people. We determined empirically that this optimal threshold is 20\% for the data of 2020. For more information, see \href{https://ghsl.jrc.ec.europa.eu/documents/GHSL_Data_Package_2023.pdf}{GHSL Data Package 2023, footnote 30}. The "Reduce Fragmentation Option" can be reproduced with the parameter settings \code{UC_built_criterium=TRUE} and \code{UC_built_threshold="optimal"} (levels 1 and 2) and \code{DUC_built_criterium=TRUE} and \code{DUC_built_threshold="optimal"} (level 2). In addition, the parameter \code{built_optimal_data} must contain the path to the directory with the (global) data used to compute the optimal built-up area threshold.
}

\section{Edge smoothing}{


In GHSL Data Package 2022, edges of urban centres are smoothed by an iterative majority rule. The majority rule works as follows: if at least five of the eight cells surrounding a cell belong to a unique urban centre, then the cell is added to that urban centre. The process is repeated until no more cells are added. The parameter setting \code{UC_smooth_edge=TRUE} and \code{UC_smooth_edge_fun="majority_rule_R2022A"} reproduces this edge smoothing rule.
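
The 2022 rule above can be illustrated with a minimal base R sketch on a plain integer matrix. This is not the implementation behind \code{"majority_rule_R2022A"}: the function name \code{smooth_edges_sketch}, the matrix encoding (\code{0} for cells outside any urban centre, a positive id otherwise) and the choice to update all cells at once in each iteration are assumptions made for illustration only.

\preformatted{
# uc: integer matrix; 0 = not in an urban centre, k > 0 = id of urban centre k
smooth_edges_sketch <- function(uc) {
  repeat {
    updated <- uc
    for (i in seq_len(nrow(uc))) {
      for (j in seq_len(ncol(uc))) {
        # only cells outside urban centres can be added
        if (uc[i, j] != 0) next
        # 3x3 window around the cell, clipped at the matrix border
        rows <- max(i - 1, 1):min(i + 1, nrow(uc))
        cols <- max(j - 1, 1):min(j + 1, ncol(uc))
        neighbours <- uc[rows, cols]
        # the cell itself is 0, so only surrounding urban centre ids remain
        ids <- neighbours[neighbours != 0]
        counts <- table(ids)
        # add the cell if at least five neighbours share one urban centre
        if (length(counts) > 0 && max(counts) >= 5) {
          updated[i, j] <- as.integer(names(counts)[which.max(counts)])
        }
      }
    }
    # stop as soon as an iteration adds no cells
    if (identical(updated, uc)) break
    uc <- updated
  }
  uc
}

# a hole surrounded by eight cells of urban centre 1 is added to that centre
m <- matrix(1L, nrow = 3, ncol = 3)
m[2, 2] <- 0L
smooth_edges_sketch(m)
}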

In GHSL Data Package 2023, the majority rule is slightly adapted. A cell is added to an urban centre if the majority of the surrounding cells belong to a unique urban centre, with the majority only computed among populated or land cells (proportion of permanent land > 0.5). In addition, cells with permanent water are never added to urban centres. The process is repeated until no more cells are added. For more information, see \href{https://ghsl.jrc.ec.europa.eu/documents/GHSL_Data_Package_2023.pdf}{GHSL Data Package 2023, footnote 29}. The parameter setting \code{UC_smooth_edge=TRUE} and \code{UC_smooth_edge_fun="majority_rule_R2023A"} reproduces this edge smoothing rule.
}

\section{Regions}{

Because of the large amount of data at a global scale, the grid classification procedure is quite memory-consuming. To optimise the procedure, we divided the world into 9 pre-defined regions. These regions are the smallest groupings of GHSL tiles that ensure that no continuous land mass is split into two different regions (for more information, see the figure below and \code{\link{GHSL_tiles_per_region}}).

If \code{regions=TRUE}, a global grid classification is created by (1) executing the grid classification procedure separately in the 9 pre-defined regions, and (2) afterwards merging these classifications together. The argument \code{data} should contain the path to a directory with the data of all pre-defined regions (for example as created by \verb{download_GHSLdata(... extent="regions")}). Note that although the grid classification is optimised, it still takes approximately 145 minutes and requires 116 GB RAM to execute the grid classification with the standard parameters (performed on a Kubernetes server with 32 cores and 256 GB RAM). For a concrete example of how to construct the grid classification on a global scale, see \code{vignette("vig3-DoU-global-scale")}.

\figure{figure_GHSL_tiles_per_region.png}{GHSL tiles}
}

\examples{
# load the data
data_belgium <- DoU_load_grid_data_belgium()

# classify with standard parameters:
classification1 <- DoU_classify_grid(data = data_belgium)

\donttest{
# classify with custom parameters:
classification2 <- DoU_classify_grid(
  data = data_belgium,
  parameters = list(
    UC_density_threshold = 3000,
    UC_size_threshold = 75000,
    UC_gap_fill = FALSE,
    UC_smooth_edge = FALSE,
    UCL_contiguity_rule = 4
  )
)
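
# The two calls below are illustrative variations that only use arguments
# and parameters documented above; the object names are arbitrary.

# classify according to the second hierarchical level (standard parameters):
classification3 <- DoU_classify_grid(data = data_belgium, level1 = FALSE)

# classify with the fixed built-up threshold and the edge smoothing rule
# described for GHSL Data Package 2022:
classification4 <- DoU_classify_grid(
  data = data_belgium,
  parameters = list(
    UC_built_criterium = TRUE,
    UC_built_threshold = 0.5,
    UC_smooth_edge = TRUE,
    UC_smooth_edge_fun = "majority_rule_R2022A"
  )
)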
}

}
