% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/mean_dscore.R
\name{mean_dscore}
\alias{mean_dscore}
\title{Calculate mean divergence scores across event reports}
\usage{
mean_dscore(data, group_var, variables, normalize = FALSE, plot = FALSE)
}
\arguments{
\item{data}{A data frame containing event report level data.}

\item{group_var}{A character string naming the column that uniquely identifies events (e.g., \code{"event_id"}).}

\item{variables}{A character vector of column names to compute divergence scores for.}

\item{normalize}{Logical; if \code{TRUE}, scores are normalized by the total number of unique values observed for each
variable across the dataset. Defaults to \code{FALSE}.}

\item{plot}{Logical; if \code{TRUE}, a ggplot object visualizing the scores is returned instead of a tibble.
Defaults to \code{FALSE}.}
}
\value{
Either a tibble or a ggplot object, depending on the value of \code{plot}.
If \code{plot = FALSE}, returns a tibble with two columns:
\describe{
  \item{variable}{The name of each variable.}
  \item{dscore}{The mean divergence score, or the normalized score if \code{normalize = TRUE}.}
}
If \code{plot = TRUE}, returns a lollipop-style plot showing divergence scores by variable.
}
\description{
This function calculates the mean divergence score for one or more variables grouped by an event identifier.
The divergence score captures how often values for a given variable differ across event reports describing the same event.
}
\details{
For each variable and event, the function counts the unique values reported, subtracts one, and averages the results
across all events. A score of zero means that all reports of every event agree on that variable; larger scores indicate
more inconsistency across sources. Optionally, the scores can be normalized by the total number of unique values observed
for each variable across the dataset. The result is a long-format tibble showing which variables are most sensitive to
aggregation. Set \code{plot = TRUE} to return a plot of the scores instead.
}
\examples{
df <- data.frame(
  event_id = c(1, 1, 2, 2, 3),
  country = c("US", "US", "UK", "UK", "CA"),
  actor1 = c("Actor A", "Actor B", "Actor B", "Actor C", "Actor D"),
  deaths_best = c(10, 20, 5, 15, 10)
)
mean_dscore(df, "event_id", c("country", "actor1", "deaths_best"), normalize = TRUE, plot = TRUE)
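
# Hand check of the unnormalized score for one variable: an illustrative
# base R sketch of the calculation described in Details, not the package
# source. The toy_* names are hypothetical and exist only for this check.
toy_event <- c(1, 1, 2, 2, 3)
toy_actor <- c("Actor A", "Actor B", "Actor B", "Actor C", "Actor D")
# Distinct values per event, minus one: events 1 and 2 each disagree once,
# event 3 has a single report and so contributes zero.
toy_extra <- tapply(toy_actor, toy_event, function(x) length(unique(x)) - 1)
# Averaging across the three events gives (1 + 1 + 0) / 3.
mean(toy_extra)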
}
