The R package betareg provides:
Beta regression for modeling beta-distributed dependent variables on the open unit interval (0, 1), e.g., rates and proportions, see Cribari-Neto and Zeileis (2010, doi:10.18637/jss.v034.i02).
Extended-support beta regression models for variables on the closed unit interval [0, 1] with boundary observations at 0 and/or 1, see Kosmidis and Zeileis (2024, doi:10.48550/arXiv.2409.07233).
Alternative specifications of the classical beta regression model: Bias-corrected and bias-reduced estimation, finite mixture models, and recursive partitioning for (0, 1) beta regression, see Grün, Kosmidis, and Zeileis (2012, doi:10.18637/jss.v048.i11).
The stable version of betareg is available on CRAN:
install.packages("betareg")
The latest development version can be installed from R-universe:
install.packages("betareg", repos = "https://zeileis.R-universe.dev")
The package is available under the General Public License version 3 or version 2.
A nice first illustration of beta regression is the analysis of reading accuracy scores of primary school children from Smithson & Verkuilen (2006). Package and data can be loaded via:
library("betareg")
data("ReadingSkills", package = "betareg")
The reading accuracy was scaled to be within (0, 1). Its mean is explained by the verbal iq score with separate lines for the two dyslexia groups (control vs. dyslexic). The precision parameter is explained by main effects of the two explanatory variables. More details are provided in ?ReadingSkills.
br <- betareg(accuracy ~ dyslexia * iq | dyslexia + iq, data = ReadingSkills)
summary(br)
##
## Call:
## betareg(formula = accuracy ~ dyslexia * iq | dyslexia + iq, data = ReadingSkills)
##
## Quantile residuals:
## Min 1Q Median 3Q Max
## -2.3625 -0.5872 0.3026 0.9425 1.5874
##
## Coefficients (mean model with logit link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.1232 0.1428 7.864 3.73e-15 ***
## dyslexia -0.7416 0.1428 -5.195 2.04e-07 ***
## iq 0.4864 0.1331 3.653 0.000259 ***
## dyslexia:iq -0.5813 0.1327 -4.381 1.18e-05 ***
##
## Phi coefficients (precision model with log link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.3044 0.2227 14.835 < 2e-16 ***
## dyslexia 1.7466 0.2623 6.658 2.77e-11 ***
## iq 1.2291 0.2672 4.600 4.23e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Type of estimator: ML (maximum likelihood)
## Log-likelihood: 65.9 on 7 Df
## Pseudo R-squared: 0.5756
## Number of iterations: 25 (BFGS) + 1 (Fisher scoring)
The regression summary shows that accuracy increases with iq for the control group but not for the dyslexic group (where it even slightly decreases). This can be brought out more clearly graphically. The plot also highlights that the model employs a logit link so that the fitted curves always remain within (0, 1).
pal <- palette.colors()[c(4, 8)]
pch <- c(19, 17)
plot(accuracy ~ iq, data = ReadingSkills, col = pal[dyslexia], pch = pch[dyslexia])
iq <- -30:30/10
lines(iq, predict(br, newdata = data.frame(dyslexia = "no", iq = iq)), col = pal[1], lwd = 2)
lines(iq, predict(br, newdata = data.frame(dyslexia = "yes", iq = iq)), col = pal[2], lwd = 2)
legend("topleft", c("Control", "Dyslexic"), pch = pch, col = pal, bty = "n")
To go beyond this basic analysis, the following extensions can be considered.
To analyze the original accuracy scores in [0, 1] (without scaling the perfect scores of 1 to 0.99), use the variable accuracy1 (instead of accuracy) in the code above. The betareg() model then automatically estimates an additional exceedence parameter that accounts for the boundary probability of a perfect score. For this data set, most coefficients shrink a bit, rendering some of them only weakly significant, but the qualitative interpretation remains similar.
betareg(accuracy1 ~ dyslexia * iq | dyslexia + iq, data = ReadingSkills)
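For instance, a minimal sketch (the object name xbr is made up here) that stores the extended-support fit and inspects the extra parameter:

xbr <- betareg(accuracy1 ~ dyslexia * iq | dyslexia + iq, data = ReadingSkills)
summary(xbr)  # reports the additional exceedence parameter along with the mean and precision coefficients
coef(xbr)     # compare with coef(br) from the classical fit above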
See Kosmidis and Zeileis (2024) and the documentation of betareg() for more details.
Bias-reduced estimation (instead of the default maximum likelihood estimation) can be used by adding the argument type = "BR" in betareg(). This slightly shrinks all coefficient estimates but, on this data, leads to qualitatively identical results.
betareg(accuracy ~ dyslexia * iq | dyslexia + iq, data = ReadingSkills, type = "BR")
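A minimal sketch for comparing the two sets of estimates, assuming the ML fit br from above is still available (the object name br_br is made up here):

br_br <- betareg(accuracy ~ dyslexia * iq | dyslexia + iq, data = ReadingSkills, type = "BR")
cbind(ML = coef(br), BR = coef(br_br))  # side-by-side comparison of the coefficient estimates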
See Grün et al. (2012) and the documentation of betareg() for more details.
To find subgroups in a beta regression by recursively splitting subsamples, beta regression trees can be used. Here, this strategy can be employed to uncover the different iq effects across the dyslexia groups rather than fixing the variables' interaction in advance.
betatree(accuracy ~ iq | iq, ~ dyslexia + ..., data = ReadingSkills, minsize = 10)
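As a minimal sketch, using only dyslexia as a potential partitioning variable (the further partitioning variables indicated by the ... above are omitted, and the object name rs_tree is made up here):

library("partykit")  # recursive partitioning infrastructure employed by betatree()
rs_tree <- betatree(accuracy ~ iq | iq, ~ dyslexia, data = ReadingSkills, minsize = 10)
coef(rs_tree)  # separate beta regression coefficients in each terminal node
plot(rs_tree)  # tree display with the fitted regressions in the leaves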
See Grün et al. (2012) and the documentation of betatree() for more details.
To find clusters in a beta regression, finite mixtures of beta regressions can be used. Here, this technique can be employed to find the different iq effects in the data without even requiring the dyslexia information.
betamix(accuracy ~ iq, data = ReadingSkills, k = 3, ...)
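As a minimal sketch (the object name mix and the seed are made up here; results depend on the random starting values of the underlying EM algorithm, and the clusters() extractor from flexmix is assumed to apply to the fitted mixture):

library("flexmix")  # mixture infrastructure employed by betamix()
set.seed(1)         # made-up seed so that the random starting values are reproducible
mix <- betamix(accuracy ~ iq, data = ReadingSkills, k = 3)
mix  # print the fitted mixture model
table(clusters(mix), ReadingSkills$dyslexia)  # compare inferred clusters with the dyslexia groups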
See Grün et al. (2012) and the documentation of betamix() for more details.