EnTraineR

An intelligent teaching assistant based on LLMs to help interpret statistical model outputs in R.
EnTraineR builds audience-aware prompts (beginner, applied, advanced) that never invent numbers: it passes the verbatim R output and instructs the LLM how to explain it.

Works out of the box to produce high-quality prompts.
Optionally, you can connect your own LLM backend (via functions you build on top of trainer_core_generate_or_return()).

Installation

From GitHub:

# install.packages("remotes")
remotes::install_github("Sebastien-Le/EnTraineR")

Optional but recommended packages for the examples:
- FactoMineR, SensoMineR (model objects used in the examples)
- stringr (to squish multi-line intros)
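To install the optional packages in one go (a convenience snippet; CRAN versions assumed):

install.packages(c("FactoMineR", "SensoMineR", "stringr"))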

What it does

Included datasets

The package ships three small teaching datasets: deforestation, ham, and poussin.

These datasets are the intellectual property of L’Institut Agro Rennes Angers and are used for the “Statistical Approach” course module.

library(EnTraineR)
data(deforestation); str(deforestation)
data(ham); summary(ham)
data(poussin); with(poussin, table(Temperature, Gender))

Quick start

1) ANOVA (AovSum)

# install.packages("SensoMineR")
library(SensoMineR)
data(chocolates)

# Build AovSum (example similar to chocolates::Granular ~ Product*Panelist)
res <- AovSum(Granular ~ Product*Panelist, data = sensochoc)

intro <- "Six chocolates have been evaluated by a sensory panel, 
  according to a sensory attribute: granular.
  The panel has been trained according to this attribute
  and panellists should be reproducible when rating this attribute."
intro <- gsub("\n", " ", intro)
intro <- stringr::str_squish(intro)

p <- trainer_AovSum(
  aovsum_obj   = res,
  audience     = "applied",
  t_test       = c("Product", "Panelist"),  # filter T-test section
  introduction = intro
)

cat(p)   # a ready-to-use prompt for an LLM or for teaching

2) Linear model (FactoMineR::LinearModel)

# install.packages("FactoMineR"); install.packages("stringr")
library(FactoMineR)

intro_ham <- "Can we predict ham overall liking from its sensory profile?"
intro_ham <- stringr::str_squish(gsub("\n", " ", intro_ham))

fit <- LinearModel(`Overall liking` ~ ., data = ham, selection = "bic")

pr <- trainer_LinearModel(
  lm_obj       = fit,
  introduction = intro_ham,
  audience     = "advanced"
)

cat(pr)

Another linear model with interaction and a categorical factor:

fit2 <- LinearModel(Temp_water ~ Temp_air * Deforestation,
                    data = deforestation, selection = "none")

pr2 <- trainer_LinearModel(
  lm_obj       = fit2,
  introduction = "Effect of deforestation on the air-water temperature link.",
  audience     = "beginner"
)

cat(pr2)

3) Classical tests

t-test:

tt <- t.test(rnorm(20, 0.1), mu = 0)
cat(trainer_t_test(tt, audience = "beginner"))

Variance F-test:

vt <- var.test(rnorm(25, sd = 1.0), rnorm(30, sd = 1.3))
cat(trainer_var_test(vt, audience = "applied"))

Proportion test:

pt <- prop.test(x = c(42, 35), n = c(100, 90))
cat(trainer_prop_test(pt, audience = "advanced", summary_only = TRUE))

Correlation test:

set.seed(1)
x <- rnorm(30); y <- 0.5 * x + rnorm(30, sd = 0.8)
ct <- cor.test(x, y, method = "pearson")
cat(trainer_cor_test(ct, audience = "applied"))

Chi-squared test:

m <- matrix(c(10, 20, 30, 40), nrow = 2)
cx <- chisq.test(m, correct = TRUE)
cat(trainer_chisq_test(cx, audience = "beginner"))

Using Gemini from R (optional)

gemini_generate() lets you send a prompt to Google Gemini and get the response back as text.

# 1) Set your API key once per session (or in .Renviron)
Sys.setenv(GEMINI_API_KEY = "your_key_here")

# 2) Send a prompt
txt <- gemini_generate(
  prompt      = "Say hello in one short sentence.",
  model       = "gemini-2.5-flash",   # accepts "gemini-2.5-flash" or "models/gemini-2.5-flash"
  temperature = 0.2,
  user_agent  = "EnTraineR/0.9.0 (https://github.com/Sebastien-Le/EnTraineR)"
)
cat(txt)
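The prompt builders and gemini_generate() compose naturally. A minimal sketch, reusing the prompt p built in the AovSum example above:

# 3) Send a trainer-built prompt and print the LLM's explanation
answer <- gemini_generate(
  prompt = p,                   # prompt returned by trainer_AovSum()
  model  = "gemini-2.5-flash"
)
cat(answer)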

Audience profiles (summary)

Three audience levels are supported: beginner, applied, and advanced. Whatever the level, all prompts emphasize the same rule: do not invent numbers; use only what appears in the printed output.
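One way to compare the profiles is to render the same result for two audiences. A quick sketch reusing trainer_t_test():

tt <- t.test(rnorm(20, 0.1), mu = 0)
cat(trainer_t_test(tt, audience = "beginner"))   # beginner-level prompt
cat(trainer_t_test(tt, audience = "advanced"))   # advanced-level prompt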

Reproducibility and LLMs

By default, trainers return a prompt string (i.e., generate = FALSE).
If you have a generator backend, pass generate = TRUE together with an llm_model name, and implement your own trainer_core_generate_or_return() to call your LLM API.
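A minimal sketch of both modes, assuming your backend is wired through trainer_core_generate_or_return() (the exact behaviour of the generated call depends on your implementation):

# Default: return the prompt string only (generate = FALSE)
tt <- t.test(rnorm(20, 0.1), mu = 0)
p  <- trainer_t_test(tt, audience = "applied")
cat(p)

# With a backend: forward the prompt to your LLM and return its answer
txt <- trainer_t_test(tt, audience = "applied",
                      generate = TRUE, llm_model = "gemini-2.5-flash")
cat(txt)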

Contributing

Issues and pull requests are welcome. Please:
- Keep code ASCII and roxygen2-ready.
- Add tests and examples where relevant.
- Follow the audience style guidelines.

License and citation

See the DESCRIPTION file for license terms.
If EnTraineR helps your teaching or analyses, starring the repo is appreciated.

Acknowledgments

Thanks to the R community and the authors of FactoMineR and SensoMineR for inspiring teaching tools and example datasets used in demonstrations.