% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/dplyr-filter.R, R/galah_filter.R
\name{filter.data_request}
\alias{filter.data_request}
\alias{filter.metadata_request}
\alias{filter.files_request}
\alias{galah_filter}
\title{Keep rows that match a condition}
\usage{
\method{filter}{data_request}(.data, ...)

\method{filter}{metadata_request}(.data, ...)

\method{filter}{files_request}(.data, ...)

galah_filter(...)
}
\arguments{
\item{.data}{An object of class \code{data_request}, \code{metadata_request}
or \code{files_request}, created using \code{\link[=galah_call]{galah_call()}} or related functions.}

\item{...}{Expressions that return a logical value, and are defined in terms
of the variables in the selected atlas (and checked using \code{show_all(fields)}).
If multiple expressions are included, they are combined with the \code{&} operator.
Only rows for which all conditions evaluate to \code{TRUE} are kept.}
}
\value{
A tibble containing filter values.
}
\description{
The \code{filter()} function is used to subset data, retaining all rows that
satisfy your conditions. To be retained, the row must produce a value of
\code{TRUE} for all conditions. Unlike 'local' filters that act on a \code{tibble},
the galah implementations work by amending a query which is then enacted
by \code{collect()} or one of the \code{atlas_} family of functions (such as
\code{atlas_counts()} or \code{atlas_occurrences()}).
}
\details{
\emph{Syntax}

\code{filter()} uses non-standard evaluation
(NSE), and is designed to be as compatible as possible with
\code{dplyr::filter()} syntax. Permissible examples include:
\itemize{
\item \code{==} (e.g. \code{year == 2020}) but not \code{=} (for consistency with \code{dplyr})
\item \code{!=} (e.g. \code{year != 2020})
\item \code{>} or \code{>=} (e.g. \code{year >= 2020})
\item \code{<} or \code{<=} (e.g. \code{year <= 2020})
\item \code{OR} statements (e.g. \code{year == 2018 | year == 2020})
\item \code{AND} statements (e.g. \code{year >= 2000 & year <= 2020})
}

Some general tips:
\itemize{
\item Separating statements with a comma is equivalent to an \code{AND} statement;
ergo \code{filter(year >= 2010 & year < 2020)} is the same as
\verb{filter(year >= 2010, year < 2020)}.
\item All statements must include the field name; so
\code{filter(year == 2010 | year == 2021)} works, as does
\code{filter(year == c(2010, 2021))}, but \code{filter(year == 2010 | 2021)}
fails.
\item It is possible to use an object to specify required values, e.g.
\verb{year_value <- 2010; filter(year > year_value)}.
\item \code{solr} supports range queries on text as well as numbers; so
\code{filter(cl22 >= "Tasmania")} is valid.
\item It is possible to filter by 'assertions', which are statements about data
validity, e.g. \verb{filter(assertions != c("INVALID_SCIENTIFIC_NAME", "COORDINATE_INVALID"))}.
Valid assertions can be found using \code{show_all(assertions)}.
}
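
The tips above can be sketched as follows. This is a minimal illustration
rather than a definitive recipe; it assumes that the galah package is attached
and that the selected atlas is reachable and provides a \code{year} field:
\preformatted{library(galah)

# Comma-separated conditions are combined with AND
galah_call() |>
  filter(year >= 2010, year < 2020) |>
  count() |>
  collect()

# The same query written with an explicit `&`
galah_call() |>
  filter(year >= 2010 & year < 2020) |>
  count() |>
  collect()

# Values can be supplied from an object
year_value <- 2010
galah_call() |>
  filter(year > year_value) |>
  count() |>
  collect()}

The first two queries should return the same count, because both express the
same \code{AND} condition.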

\emph{Exceptions}

When querying occurrences, species, or their respective counts (i.e. all of
the above examples), field names are checked internally against
\code{show_all(fields)}. There are some cases where bespoke field names are
required, as follows.

When requesting a data download from a DOI, the field \code{doi} is valid, i.e.:
\preformatted{galah_call() |> 
  filter(doi == "a-long-doi-string") |> 
  collect()}

For taxonomic metadata, the \code{taxa} field is valid:
\preformatted{request_metadata() |> 
  filter(taxa == "Chordata") |> 
  unnest()}

For building taxonomic trees, the \code{rank} field is valid:
\preformatted{request_data() |>
  identify("Chordata") |>
  filter(rank == "class") |>
  atlas_taxonomy()}

Media queries are more involved, and break two rules: they accept the \code{media}
field, and they accept a tibble on the right-hand side of the equation. For example,
users wishing to break down media queries into their respective API calls
should begin with an occurrence query:

\preformatted{occurrences <- galah_call() |> 
   identify("Litoria peronii") |> 
   select(group = c("basic", "media")) |> 
   collect()}

They can then use the \code{media} field to request media metadata:
\preformatted{media_metadata <- request_metadata() |>
  filter(media == occurrences) |>
  collect()}

And finally, the metadata tibble can be used to request files:
\preformatted{request_files() |>
  filter(media == media_metadata) |>
  collect()}
}
\examples{
\dontrun{
# basic example
galah_call() |>
  filter(year >= 2019,
         basisOfRecord == "HumanObservation") |>
  count() |>
  collect()

# Field names can be parsed from objects using `{{}}` syntax, e.g. 
field <- "year"
value <- "2025"
galah_call() |> 
  filter({{field}} == value) |>
  count() |>
  collect()
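
# Assertions can be used as filters too. This is a sketch only: it assumes
# that both assertion names below are listed by `show_all(assertions)`
# for the selected atlas.
galah_call() |>
  filter(assertions != c("INVALID_SCIENTIFIC_NAME", "COORDINATE_INVALID")) |>
  count() |>
  collect()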
}
}
\seealso{
\code{\link[=select.data_request]{select()}},
\code{\link[=group_by.data_request]{group_by()}} and \code{\link[=geolocate]{geolocate()}} for
other ways to amend the information returned by \code{\link[=atlas_]{atlas_()}} functions. Use
\code{search_all(fields)} to find fields that you can filter by, and
\code{\link[=show_values]{show_values()}} to find what values of those filters are available.
}
