% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/taxa_translate.R
\name{taxa_translate}
\alias{taxa_translate}
\title{Taxa Translate}
\usage{
taxa_translate(
  df_user = NULL,
  df_official = NULL,
  df_official_metadata = NULL,
  taxaid_user = "TAXAID",
  taxaid_official_match = NULL,
  taxaid_official_project = NULL,
  taxaid_drop = NULL,
  col_drop = NULL,
  sum_n_taxa_boo = FALSE,
  sum_n_taxa_col = NULL,
  sum_n_taxa_group_by = NULL,
  trim_ws = FALSE,
  match_caps = FALSE
)
}
\arguments{
\item{df_user}{User taxa data (data frame or tibble).}

\item{df_official}{Official project taxa data (master taxa list).}

\item{df_official_metadata}{Metadata for official project taxa data.
Default is NULL.}

\item{taxaid_user}{Taxonomic identifier in user data.  Default is "TAXAID".}

\item{taxaid_official_match}{Taxonomic identifier in official data used to
match with user data.  This is not the project taxonomic identifier.}

\item{taxaid_official_project}{Taxonomic identifier in official data that is
specific to a project, e.g., after operational taxonomic units (OTU) are
applied.}

\item{taxaid_drop}{Official taxonomic identifier that signals a record
should be dropped; e.g., DNI (Do Not Include) or -999.  Default = NULL}

\item{col_drop}{Columns to remove in output.  Default = NULL}

\item{sum_n_taxa_boo}{Boolean value for whether the results should be
summarized.  Default = FALSE.  DEPRECATED, values will be ignored.}

\item{sum_n_taxa_col}{Column name for number of individuals for user data
when summarizing.  This column will be summed.
Default = NULL (suggestion = N_TAXA).
DEPRECATED, values will be ignored.}

\item{sum_n_taxa_group_by}{Column names for user data to use for grouping the
data when summarizing the user data.  Suggestions are SAMPID and TAXA_ID.
Default = NULL.
DEPRECATED, values will be ignored.}

\item{trim_ws}{Boolean value for taxaid to have leading and trailing white
space removed.  Non-breaking spaces (e.g., from ITIS) are also handled
(including inside text).  Default = FALSE}

\item{match_caps}{Boolean value to match user and official TaxaIDs after
converting to ALL CAPS.  Default = FALSE}
}
\value{
A list with four elements.  The first (merge) is the user data frame
with additional columns from the official data appended to it.  Names from
the user data that overlap with the official data have the suffix '_User'.
The second element (nonmatch) of the list is a vector of the non-matching
taxa from the user data.  The third element (metadata) includes the
metadata for the official data (if provided).  The fourth element (unique) is
a data frame of the unique taxa names old and new.
}
\description{
Convert user taxa names to those in an official project-based
name list.
}
\details{
Merges user file with official file.  The official file has
phylogeny, autecology, and other project specific fields.

The function takes existing data frames (or tibbles) as inputs.

For any fields that appear in both the user file and the official file, the
'official' version is retained.

The 'col_drop' parameter can be used to remove unwanted columns; e.g.,
the other taxa id fields in the 'official' data file.

By default, taxa are not collapsed to the official taxaid.  That is, if
multiple taxa in a sample have the same name the rows will not be combined.
Collapsing was previously enabled by setting the parameter `sum_n_taxa_boo`
to TRUE and providing `sum_n_taxa_col` and `sum_n_taxa_group_by`.
This feature was DEPRECATED in v1.0.2.9040 (2024-06-12).  The parameters
remain, but their values are ignored; the feature could be reinstated in a
future version.

This function differs slightly from `qc_taxa`: `taxa_translate` has no
options for using one field over another and is more generic.

The parameter `taxaid_drop` is used to drop records that matched to a new
name that should not be included in the results.  Examples include "999" or
"DNI" (Do Not Include).  Default is NULL so no action is taken.  "NA"s are
always removed.

Optional parameter `trim_ws` is used to invoke the function `trimws` to
remove from the taxa matching field any leading and trailing white space.
Default is FALSE (no action).  All horizontal and vertical white space
characters are removed.  See ?trimws for additional information.
Additionally, non-breaking spaces (nbsp) inside the text string will be
replaced with a normal space.  This cuts down on the number of permutations
that need to be added to the translation table.

Optional parameter `match_caps` is used to convert user and official taxaid
values to ALL CAPS before matching.  Non-ASCII characters would cause the
conversion to fail, so a message is output to the console for any taxaid
values that contain them.  If `match_caps` is set to TRUE and non-ASCII
characters are present, the matching is done without converting to upper
case.

The taxa list and metadata file names will be added to the results as two
new columns.

Another output is the unique taxa with old and new names.
}
\examples{
# Example 1, PacNW
## Input Parameters
df_user <- BioMonTools::data_benthos_PacNW
fn_official <- file.path(system.file("extdata", package = "BioMonTools"),
                         "taxa_official",
                         "ORWA_TAXATRANSLATOR_20221219b.csv")
df_official <- read.csv(fn_official)
fn_official_metadata <- file.path(system.file("extdata",
                                              package = "BioMonTools"),
                                  "taxa_official",
                                  "ORWA_ATTRIBUTES_METADATA_20221117.csv")
df_official_metadata <- read.csv(fn_official_metadata)
taxaid_user <- "TaxaID"
taxaid_official_match <- "Taxon_orig"
taxaid_official_project <- "OTU_MTTI"
taxaid_drop <- "DNI"
col_drop <- c("Taxon_v2", "OTU_BCG_MariNW") # non desired ID cols in Official
# sum_n_taxa_* are DEPRECATED, values will be ignored (see Details)
sum_n_taxa_boo <- TRUE
sum_n_taxa_col <- "N_TAXA"
sum_n_taxa_group_by <- c("INDEX_NAME", "INDEX_CLASS", "SampleID", "TaxaID")
## Run Function

taxatrans <- taxa_translate(df_user,
                            df_official,
                            df_official_metadata,
                            taxaid_user,
                            taxaid_official_match,
                            taxaid_official_project,
                            taxaid_drop,
                            col_drop,
                            sum_n_taxa_boo,
                            sum_n_taxa_col,
                            sum_n_taxa_group_by)
## View Results
taxatrans$nonmatch
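
## Optional: collapse duplicate taxa after translation
# The sum_n_taxa_* parameters are deprecated and ignored (see Details), so
# any collapsing would need to be done by the user afterwards.
# A minimal base R sketch on a toy data frame.  The column names and values
# (SampleID, TaxaID, N_TAXA) are assumptions for illustration only and are
# not the actual output of taxa_translate.
df_toy <- data.frame(SampleID = c("S1", "S1", "S1", "S2"),
                     TaxaID = c("Agapetus", "Agapetus", "Baetis", "Baetis"),
                     N_TAXA = c(3, 4, 10, 2))
# sum the counts within each sample and taxon
df_toy_sum <- aggregate(N_TAXA ~ SampleID + TaxaID, data = df_toy, FUN = sum)
df_toy_sum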


#~~~~~
# Example 2, Multiple Stages
# Create data
TAXAID <- c(rep("Agapetus", 3), rep("Zavrelimyia", 2))

N_TAXA <- c(rep(33, 3), rep(50, 2))
STAGE <- c("A", "L", "P", "X", "")
df_user <- data.frame(TAXAID, N_TAXA, STAGE)
df_user[, "INDEX_NAME"]  <- "BCG_MariNW_Bugs500ct"
df_user[, "INDEX_CLASS"] <- "HiGrad-HiElev"
df_user[, "SAMPLEID"]    <- "Test2023"
df_user[, "STATIONID"]   <- "Test"
df_user[, "DATE"]        <- "2023-01-16"
## Input Parameters
fn_official <- file.path(system.file("extdata", package = "BioMonTools"),
                         "taxa_official",
                         "ORWA_TAXATRANSLATOR_20221219b.csv")
df_official <- read.csv(fn_official)
fn_official_metadata <- file.path(system.file("extdata",
                                              package = "BioMonTools"),
                                  "taxa_official",
                                  "ORWA_ATTRIBUTES_20221212.csv")
df_official_metadata <- read.csv(fn_official_metadata)
taxaid_user <- "TAXAID"
taxaid_official_match <- "Taxon_orig"
taxaid_official_project <- "OTU_BCG_MariNW"
taxaid_drop <- NULL
col_drop <- c("Taxon_v2", "OTU_MTTI") # non desired ID cols in Official
# sum_n_taxa_* are DEPRECATED, values will be ignored (see Details)
sum_n_taxa_boo <- TRUE
sum_n_taxa_col <- "N_TAXA"
sum_n_taxa_group_by <- c("INDEX_NAME", "INDEX_CLASS", "SAMPLEID", "TAXAID")
## Run Function
taxatrans <- taxa_translate(df_user,
                            df_official,
                            df_official_metadata,
                            taxaid_user,
                            taxaid_official_match,
                            taxaid_official_project,
                            taxaid_drop,
                            col_drop,
                            sum_n_taxa_boo,
                            sum_n_taxa_col,
                            sum_n_taxa_group_by)
## View Results (before and after)
df_user
taxatrans$merge
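

#~~~~~
# Example 3, Sketch of the clean up behind `trim_ws` and `match_caps`
# Base R only.  This approximates the steps described in Details and is not
# the package's internal code.  The sample strings are made up for
# illustration.
nbsp <- intToUtf8(160) # non-breaking space character
taxa_raw <- c("  Agapetus ", paste0("Zavrelimyia", nbsp, "sp."), "agapetus")
# replace non-breaking spaces with normal spaces, then trim both ends
taxa_trim <- trimws(gsub(nbsp, " ", taxa_raw, fixed = TRUE))
# flag any values that still contain non-ASCII characters
boo_nonascii <- vapply(taxa_trim,
                       function(x) any(utf8ToInt(x) > 127),
                       logical(1),
                       USE.NAMES = FALSE)
# convert to ALL CAPS only when every value is ASCII
if (any(boo_nonascii)) {
  taxa_match <- taxa_trim
} else {
  taxa_match <- toupper(taxa_trim)
}
taxa_match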
}
