% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/eucop_data_preparation.R
\name{eucop_data_preparation}
\alias{eucop_data_preparation}
\title{Import and preprocess mammal occurrence data}
\usage{
eucop_data_preparation(input.dir,species_name,variables="all",which.vars=NULL,
calibration=FALSE,add.modern.occs=FALSE,
combine.ages=NULL,remove.duplicates=TRUE, bk_points=NULL,output.dir)
}
\arguments{
\item{input.dir}{the file path wherein EutherianCop mammal occurrences and
paleoclimatic data are to be stored.}

\item{species_name}{character. The name(s) of the species to be processed by
\code{eucop_data_preparation}.}

\item{variables}{character. The name of the paleoclimatic simulations to be
used. The available options are \code{"climveg"}, \code{"bio"}, or
\code{"all"}.}

\item{which.vars}{character vector indicating the name of the variables to be
downloaded. The list of accepted names can be found
\href{https://www.nature.com/articles/s41597-024-04181-4/tables/1}{here}.}

\item{calibration}{logical. If \code{TRUE}, \code{eucop_data_preparation}
calibrates the conventional radiocarbon (14C) age estimates included in the
EutherianCop raw data file.}

\item{add.modern.occs}{logical. If \code{TRUE}, \code{eucop_data_preparation}
adds the modern records (if present) related to species in
\code{species_name}.}

\item{combine.ages}{one of \code{"mean"} or \code{"median"}. The method used
to aggregate multiple ages for each site, or for each layer within a site. If
\code{NULL} (the default), ages are not aggregated.}

\item{remove.duplicates}{logical. If \code{TRUE},
\code{eucop_data_preparation} removes duplicated records within each grid cell
for a given time bin.}

\item{bk_points}{a list of parameters for adding absence points, either
background or pseudoabsence points (following the procedure described in
Mondanaro et al., 2024). The list includes:
\itemize{
\item \code{buff}: the proportional distance used to set a buffer around the
minimum convex polygon that encompasses all occurrences of the target species.
\item \code{bk_strategy}: the strategy used to add the absence points. It can
be one of \code{"background"} or \code{"pseudoabsence"}.
\item \code{bk_n}: the number of absence points.
}
If provided as an empty \code{list()}, the function automatically sets
\code{buff = 0.1}, \code{bk_strategy = "background"}, and
\code{bk_n = 10000}.}

\item{output.dir}{the file path wherein \code{eucop_data_preparation} stores
the results.}
}
\value{
\code{eucop_data_preparation} does not store any results in the global
 environment. Instead, a set of GeoPackage files, one per selected species,
 is saved in the directory specified by \code{output.dir}. The file names
 depend on the combination of arguments chosen by the user: they include the
 suffix "cal" or "uncal", depending on whether calibration
 (\code{calibration}) is performed, and "combined" or "multi", depending on
 whether ages are aggregated (\code{combine.ages}). In any case, the output
 files include information about ages, a column called "OBS" holding species
 occurrence data in binary format, the spatial geometry, and all the
 information derived from the EutherianCop dataset.
}
\description{
The function automatically imports and preprocesses
 fossil mammal occurrences and paleoclimatic/vegetational data available in
 the EutherianCop dataset (Mondanaro et al., 2025). It also provides two
 approaches, both implemented within a user-defined study area, for sampling
 a specified number of pseudoabsences or, alternatively, defining
 background points. This flexibility enables users to assemble a list of
 \code{sf} objects that can be readily used to train ENFA, ENphylo, or any
 other SDM algorithm of their choice.
}
\details{
The \code{variables} argument allows the selection of climatic and
 environmental variables (\code{"climveg"}), bioclimatic variables
 (\code{"bio"}), or both sets of variables (\code{"all"}).

Through the \code{bk_strategy} element of \code{bk_points},
 \code{eucop_data_preparation} offers two different approaches to generate
 absence points. The definition of the study area is the same for both
 methods. Under \code{bk_strategy = "background"}, \code{bk_n}
 defines the maximum number of background points sampled from the study area
 within each time bin. Under \code{bk_strategy = "pseudoabsence"},
 \code{bk_n} represents the maximum number of pseudoabsence points
 across all time bins. This flexibility allows users to accommodate the
 different requirements for training traditional envelope models (e.g.
 ENFA, ENphylo) and common correlative or machine learning models (e.g.
 generalized linear models, MaxEnt, Random Forest).

Additionally, if \code{bk_points} is not \code{NULL}, the ages of
 presences and of pseudoabsence or background points are forced to a 1 kyr
 resolution, matching the temporal resolution of the
 paleoclimatic/vegetational or bioclimatic data.
}
\examples{
\dontrun{

newwd <- tempdir()
# newwd <- "YOUR_DIRECTORY"

eucop_data_preparation(input.dir = newwd, species_name = "Ursus ingressus",
                       variables = "bio", which.vars = "bio1",
                       calibration = FALSE, combine.ages = "mean",
                       bk_points = NULL, output.dir = newwd)
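
# A hedged sketch of the two absence-point strategies described in Details.
# The element names (buff, bk_strategy, bk_n) and the values buff = 0.1 and
# bk_n = 10000 come from the bk_points argument documentation; bk_n = 1000 is
# an arbitrary illustrative value, not a recommendation.

# Background points: up to bk_n points sampled within each time bin.
eucop_data_preparation(input.dir = newwd, species_name = "Ursus ingressus",
                       variables = "bio", which.vars = "bio1",
                       calibration = FALSE, combine.ages = "mean",
                       bk_points = list(buff = 0.1,
                                        bk_strategy = "background",
                                        bk_n = 10000),
                       output.dir = newwd)

# Alternatively, pseudoabsences: up to bk_n points across all time bins.
eucop_data_preparation(input.dir = newwd, species_name = "Ursus ingressus",
                       variables = "bio", which.vars = "bio1",
                       calibration = FALSE, combine.ages = "mean",
                       bk_points = list(buff = 0.1,
                                        bk_strategy = "pseudoabsence",
                                        bk_n = 1000),
                       output.dir = newwd)

# Reading the results back. This assumes that the sf package is installed and
# that the output files carry the usual ".gpkg" extension (an assumption; the
# exact file names depend on the arguments used).
gpkg_files <- list.files(newwd, pattern = "[.]gpkg$", full.names = TRUE)
occs <- lapply(gpkg_files, sf::st_read)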

}
}
\references{
Mondanaro, A., Di Febbraro, M., Castiglione, S., Belfiore, A. M.,
 Girardi, G., Melchionna, M., Serio, C., Esposito, A., & Raia, P. (2024).
 Modelling reveals the effect of climate and land use change on Madagascar’s
 chameleons fauna. \emph{Communications Biology}, 7: 889.
 doi:10.1038/s42003-024-06597-5.

Mondanaro, A., Girardi, G., Castiglione, S., Timmermann, A.,
 Zeller, E., Venugopal, T., Serio, C., Melchionna, M., Esposito, A., Di
 Febbraro, M., & Raia, P. (2025). EutherianCoP. An integrated biotic and
 climate database for conservation paleobiology based on eutherian mammals.
 \emph{Scientific Data}, 12: 6. doi:10.1038/s41597-024-04181-4.
}
\seealso{
\href{../doc/Preparing-Data.html}{\code{eucop_data_preparation} vignette}
}
\author{
Alessandro Mondanaro, Silvia Castiglione, Pasquale Raia
}
