% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/bias.R
\name{bias}
\alias{bias}
\title{Bias / Average Residuals}
\usage{
bias(
  X,
  resid,
  w = NULL,
  x_name = "x",
  breaks = "Sturges",
  right = TRUE,
  discrete_m = 13L,
  outlier_iqr = 2,
  seed = NULL,
  ...
)
}
\arguments{
\item{X}{A vector, matrix, or data.frame with features.}

\item{resid}{A numeric vector of residuals, i.e., y - pred.}

\item{w}{An optional numeric vector of weights. Having observations with
non-positive weight is equivalent to excluding them.}

\item{x_name}{Name of the variable if \code{X} is a vector. The default is "x".}

\item{breaks}{An integer, vector, or "Sturges" (the default) used to determine
bin breaks of continuous features. Values outside the total bin range are placed
in the outermost bins. To allow varying values of \code{breaks} across features,
\code{breaks} can be a list with one element per feature in \code{X}, or a \emph{named} list with breaks
for certain variables.}

\item{right}{Should bins be right-closed? The default is \code{TRUE}.
Vectorized over the features in \code{X}. Only relevant for continuous features.}

\item{discrete_m}{Numeric features with up to this number of unique values are not
binned but treated as discrete. The default is 13. Vectorized over the features in \code{X}.}

\item{outlier_iqr}{If \code{breaks} is an integer or "Sturges", the breaks of a continuous
feature are calculated without taking into account feature values outside
quartiles +- \code{outlier_iqr} * IQR (the quartiles are calculated from a sample of
<= 9997 values). To let the breaks cover the full data range, set \code{outlier_iqr} to
0 or \code{Inf}. Vectorized over the features in \code{X}.}

\item{seed}{Optional integer random seed used for calculating breaks:
The bin range is determined without values outside quartiles +- \code{outlier_iqr} * IQR
(by default 2 IQR), using a random sample of <= 9997 observations to calculate the quartiles.}

\item{...}{Currently unused.}
}
\value{
A list (of class "EffectData") with a data.frame per feature having columns:
\itemize{
\item \code{bin_mid}: Bin midpoints. In the plots, the bars are centered around these.
\item \code{bin_width}: Absolute width of the bin. In the plots, these equal the bar widths.
\item \code{bin_mean}: For continuous features, the (possibly weighted) average feature
value within the bin. For discrete features, equivalent to \code{bin_mid}.
\item \code{N}: The number of observations within the bin.
\item \code{weight}: The weight sum within the bin. When \code{w = NULL}, equivalent to \code{N}.
\item Different statistics, depending on the function call.
}

Use single bracket subsetting to select part of the output. Note that each
data.frame carries an attribute "discrete" indicating whether the
feature is discrete or continuous. This attribute might be lost when you manually
modify the data.frames.
}
\description{
Calculates average residuals (= bias) over the values of one or more
features specified by \code{X}.
}
\details{
The function is a convenience wrapper around \code{\link[=feature_effects]{feature_effects()}}.
}
\examples{
fit <- lm(Sepal.Length ~ ., data = iris)
M <- bias(iris[2:5], resid = fit$residuals, breaks = 5)
M |> update(sort_by = "resid_mean") |> plot(share_y = "all")
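
# The calls below are a hedged sketch that uses only arguments documented above.
# The object names `w`, `M_w`, and `M_vec` are illustrative, not part of the API.

# Case weights: observations with non-positive weight are effectively excluded.
w <- rep(c(1, 2), length.out = nrow(iris))
M_w <- bias(iris[2:5], resid = fit$residuals, w = w, breaks = 5)

# A single vector as X: x_name supplies the feature name.
M_vec <- bias(iris$Sepal.Width, resid = fit$residuals, x_name = "Sepal.Width")

# Single bracket subsetting selects part of the output and keeps its list form.
M[1:2] |> plot()

# Rough base-R illustration of the outlier rule described for outlier_iqr
# (default 2). This is an assumption for illustration only, not the package's
# internal code, and it ignores the subsampling step.
x <- c(iris$Sepal.Width, 40)  # append one artificial outlier
q <- stats::quantile(x, c(0.25, 0.75), names = FALSE)
keep <- x >= q[1] - 2 * diff(q) & x <= q[2] + 2 * diff(q)
range(x[keep])  # range that would drive the breaks; the value 40 is left out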
}
\seealso{
\code{\link[=feature_effects]{feature_effects()}}
}
