% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/outlier_fn.R
\name{outliers.detect.mass}
\alias{outliers.detect.mass}
\title{Detect outliers for a multi-species set of geographical coordinates}
\usage{
outliers.detect.mass(
  test,
  train = NULL,
  path = NULL,
  strategy = "majority",
  hi_res = FALSE,
  crop = FALSE,
  threshold = 0.05
)
}
\arguments{
\item{test}{data.frame. With three columns containing species, latitude and longitude, describing
the recorded locations of one or more species, which may contain outliers.}

\item{train}{data.frame. With the same formatting as \code{test}, containing only known
locations where the target species occur. Used exclusively as training data for
method \code{"svm"}. For outlier detection to work, the training data supplied
must have valid environmental data. If you suspect this might not be the case,
run \code{\link[terra:extract]{terra::extract()}} on your
downloaded \link[gecko:gecko.worldclim.load]{WorldClim} data.}

\item{path}{character. Path to a folder where plots scrutinizing decision making
per species should be saved.}

\item{strategy}{character. Strategy used to combine the decisions of the
outlier detection methods. Either \code{"permissive"}, \code{"majority"}
or \code{"conservative"}. In \code{"permissive"}, only points marked as
potential outliers by all selected methods are rejected. In \code{"majority"},
the decision is made by popular vote; if the vote is tied, the point is
rejected by default. In \code{"conservative"},
all points marked as outliers by at least one method are rejected.}

\item{hi_res}{logical. Specifies whether 1 km resolution environmental data should be used.
If \code{FALSE}, 10 km resolution data is used instead.}

\item{crop}{logical. Indicates whether environmental data should be cropped to
an extent similar to that covered by \code{test} and \code{train}. Useful to avoid
long processing times at higher resolutions.}

\item{threshold}{numeric. Threshold for classifying
outliers in methods \code{"geo"} and \code{"env"}. E.g., under the default
of 0.05, points at an average distance greater than the 95\% quantile
of the average distances of all points are classified as outliers.}
}
\value{
list. The first element is a dataset containing all elements
of the original test set except those \code{rejected}. The second element
is a table summarizing how many data points belonged to species \code{not_in_common},
how many received no decision due to \code{insufficient_data},
and how many were \code{accepted} and \code{rejected}, with the latter broken down
by the combination of methods that served as the basis for rejection, e.g. \code{env;geo}.
}
\description{
This function runs the outlier detection methods described in 
\link[gecko:outliers.detect]{gecko::outliers.detect()} but for multi-species datasets,
automatically adjusting for the amount of data available and strategy chosen.
Species must have at least 3 data points to be processed. Additionally,
supplying a training dataset makes the function use method "svm", which
requires at least 5 training points. For now, species with
insufficient data are accepted by default, but future updates will allow users
to choose a "lack of data" strategy.
}
\details{
The environmental data used is WorldClim, which requires a long download; see
\code{\link[gecko:gecko.setDir]{gecko::gecko.setDir()}}.
This function is a version of \code{\link[gecko:outliers.detect]{gecko::outliers.detect()}}
tailored for ease of handling datasets with multiple species. For details on 
the methodology used to detect outliers please consult the documentation for that function.
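
As a rough illustration of how the three values of \code{strategy} differ, the
sketch below combines hypothetical per-method votes in base R. The object and
column names are invented for the example, and the code is not necessarily how
the function is implemented internally.

\preformatted{
# Hypothetical votes: one row per point, one column per method.
# TRUE means the method flagged the point as a potential outlier.
votes = data.frame(
  geo = c(TRUE, TRUE, TRUE, FALSE),
  env = c(TRUE, TRUE, FALSE, FALSE),
  svm = c(TRUE, FALSE, FALSE, FALSE)
)
flagged = rowSums(votes)   # number of methods flagging each point
n_methods = ncol(votes)

# "permissive": rejected only if every method flags the point
reject_permissive = flagged == n_methods
# "majority": rejected if at least half of the methods flag it (ties reject)
reject_majority = flagged >= n_methods / 2
# "conservative": rejected if any method flags it
reject_conservative = flagged >= 1
}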
}
\examples{
\dontrun{
old_occurrences = gecko.data("records")
colnames(old_occurrences) = c("species", "long", "lat")
new_occurrences = data.frame(
species = rep(c("Hogna maderiana", "Malthonica oceanica", "Agroeca inopina"), each = 50),
long = c(runif(50, -17.1, -16.09), runif(50, -8.8, -7), runif(50, -6, -2)),
lat = c(runif(50, 32.73, 32.76), runif(50, 39.5, 40), runif(50, 40, 42))
)
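# Optional pre-check (a sketch, assuming the "species" column name set above):
# species need at least 3 test points to be processed, and method "svm"
# needs at least 5 training points.
test_counts = table(new_occurrences$species)
train_counts = table(old_occurrences$species)
names(test_counts)[test_counts < 3]    # test species with insufficient data
names(train_counts)[train_counts < 5]  # training species too sparse for "svm"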
# 'path' must point to an existing folder; a temporary directory is used here.
outliers.detect.mass(new_occurrences, train = old_occurrences, path = tempdir())
}
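
# Illustrative sketch of the 'threshold' rule in base R, using hypothetical
# points and plain Euclidean distance on the coordinates. This is an
# assumption for illustration only, not the package internals.
set.seed(1)
pts = cbind(
  long = c(runif(20, -17, -16), 10),  # 20 clustered points plus one far away
  lat = c(runif(20, 32, 33), 50)
)
d = as.matrix(dist(pts))               # pairwise distances between points
avg_dist = rowSums(d) / (nrow(d) - 1)  # mean distance to the other points
threshold = 0.05
cutoff = quantile(avg_dist, probs = 1 - threshold)
which(avg_dist > cutoff)               # indices of points flagged as outliers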
}
