collinear 3.0.2

The preference_order dataframe returned by collinear() now shows the correct metric name instead of “custom” when a known f function is used.

The formula environment generated by model_formula() is now a minimal two-binding environment (poly, s) with baseenv() as parent, rather than the full caller frame. This prevents large objects (e.g. the df dataframe and intermediate variables) from being serialized when a fitted model is saved with saveRDS().

Improved example of the function collinear().

collinear 3.0.1

Bug Fixes

preference_order(): Namespace-qualified function names (e.g., collinear::f_count_gam) are now correctly recognised as package functions instead of being labelled metric = "custom". Fixed by stripping the namespace prefix before name lookup in R/preference_order.R.
preference_order(): NA values in the response variable (including those converted from Inf/-Inf/NaN) are now handled correctly. Fixed by wrapping the (y, x) data frame construction in stats::na.omit() in R/preference_order.R.
model_formula(): No longer crashes on sf spatial data frames with "default method not implemented for type 'list'". Fixed by calling drop_geometry_column() early in R/model_formula.R before variable-type detection.
score_auc(): No longer crashes when o or p contain NA. Fixed by removing incomplete cases before computing ones/zeros in R/score_auc.R.

Data Changes

Example datasets (vi, vi_smol, vi_predictors, vi_predictors_numeric, vi_predictors_categorical, vi_responses) have been moved to the package spatialData. All function examples and README code now load data via data(..., package = "spatialData").

Other Changes

The package now prints its version on attach.

collinear 3.0.0

Breaking Changes

API Changes

Argument response renamed to responses: Now accepts multiple response variables. Functions affected: collinear(), preference_order(), and related validation functions.
Argument encoding_method defaults to NULL in collinear(): Target encoding is now opt-in rather than automatic. Previously defaulted to "mean".
Default values changed for max_cor and max_vif: Both now default to NULL, triggering adaptive threshold computation based on the correlation structure of the data.
Output structure changed for collinear(): Now returns a list of class collinear_output containing sub-lists of class collinear_selection, each with response, df, preference_order, selection, and formulas slots. Previously returned a character vector or named list of character vectors.

Renamed Functions

Old Name (v2.0)	New Name (v3.0)
`identify_predictors()`	Split into `identify_valid_variables()`, `identify_numeric_variables()`, `identify_categorical_variables()`, `identify_logical_variables()`
`identify_predictors_categorical()`	`identify_categorical_variables()`
`identify_predictors_numeric()`	`identify_numeric_variables()`
`identify_predictors_zero_variance()`	`identify_zero_variance_variables()`
`identify_predictors_type()`	Removed (merged into `identify_valid_variables()`)

Renamed `f_` Functions for Preference Order

Old Name (v2.0)	New Name (v3.0)
`f_r2_glm_gaussian()`	`f_numeric_glm()`
`f_r2_gam_gaussian()`	`f_numeric_gam()`
`f_r2_rf()`	`f_numeric_rf()`
`f_r2_glm_poisson()`	`f_count_glm()`
`f_r2_gam_poisson()`	`f_count_gam()`
`f_auc_glm_binomial()`	`f_binomial_glm()`
`f_auc_gam_binomial()`	`f_binomial_gam()`
`f_auc_rf_binomial()`	`f_binomial_rf()`
`f_v_rf()`	`f_categorical_rf()`
—	`f_count_rf()` (new)

Major New Features

Adaptive Multicollinearity Thresholds

When both max_cor = NULL and max_vif = NULL, the function now automatically determines optimal filtering thresholds using:

The 75th percentile of pairwise correlations as input
A sigmoid transformation that smoothly transitions between. conservative (VIF ≈ 2.5) and permissive (VIF ≈ 7.5) thresholds.
A GAM model (gam_cor_to_vif) mapping correlation thresholds to equivalent VIF values.

This data-driven approach adapts to each dataset’s correlation structure, preventing over-filtering while maintaining statistically meaningful bounds.

Tidymodels Integration

New step_collinear(): Recipe step for multicollinearity filtering in tidymodels workflows.
Implements proper prep() and bake() methods following recipes architecture.

Cross-Validation Support in Preference Order

New arguments cv_training_fraction and cv_iterations in preference_order() and passed through collinear().
Enables robust predictor ranking through repeated train/test splits.

Rich Output Structure

collinear() now returns comprehensive results including:

Filtered dataframe with response and selected predictors.
Preference order dataframe with rankings.
Ready-to-use model formulas (linear, smooth/GAM, classification).

S3 methods print() and summary() for collinear_output and collinear_selection classes provide clean output formatting.

Correlation Matrix Improvements

cor_matrix() now returns signed correlations, preserving the positive semi-definite property required for VIF calculations.
Absolute values applied only when comparing against max_cor thresholds.
Fixes numerical instability that could produce negative VIF scores.

New Functions

Multicollinearity Assessment

collinear_stats(): Compute summary statistics for both correlation and VIF.
cor_stats(): Summary statistics for pairwise correlations.
vif_stats(): Summary statistics for variance inflation factors.

Preference Order

f_count_rf(): Score integer count predictors with random forest.

S3 Methods

print.collinear_output()
print.collinear_selection()
summary.collinear_output()
summary.collinear_selection()

New Datasets and Models

Name	Description
`experiment_adaptive_thresholds`	Validation experiment results (10,000 iterations)
`experiment_cor_vs_vif`	Correlation vs VIF equivalence experiment results
`gam_cor_to_vif`	Fitted GAM for mapping `max_cor` to `max_vif`
`prediction_cor_to_vif`	Look-up table for threshold equivalence
`toy`	Simple dataset illustrating multicollinearity concepts
`vi_smol`	Smaller version of `vi` dataset (610 rows) for faster examples
`vi_responses`	Character vector of response variable names

Improvements

VIF Computation

Ridge regularization fallback for near-singular matrices.
Improved tolerance calculation for solve() to prevent false singularity detection.
VIF values exceeding 1M are now capped to Inf.

Validation

New validate_arg_*() functions provide consistent argument checking across the package.
Hierarchical function name tracking for clearer error messages.

Documentation

Comprehensive roxygen documentation with working examples.
@family tags for better cross-referencing.
@inheritSection for consistent documentation of shared concepts.

Bug Fixes

Fixed correlation matrix handling that destroyed positive semi-definite property when applying abs() before VIF computation.
Fixed edge cases in VIF computation for ill-conditioned matrices.
Proper handling of single-predictor cases across all functions.

Deprecated

Value "auto" in preference_order argument (ignored with message)

collinear 2.0.0

Main Improvements

Expanded Functionality: Functions collinear() and preference_order() support both categorical and numeric responses and predictors, and can handle several responses at once.
Robust Selection Algorithms: Enhanced selection in vif_select() and cor_select().
Enhanced Functionality to Rank Predictors: New functions to compute association between response and predictors covering most use-cases, and automated function selection depending on data features.
Simplified Target Encoding: Streamlined and parallelized for better efficiency, and new default is "loo" (leave-one-out).
Parallelization and Progress Bars: Utilizes future and progressr for enhanced performance and user experience.

collinear 1.1.1

Initial CRAN release
Basic multicollinearity filtering with collinear(), cor_select(), and vif_select()
Target encoding methods: mean, rank, leave-one-out
Preference order functionality
Support for mixed numeric and categorical predictors

collinear 3.0.2

collinear 3.0.1

Bug Fixes

Data Changes

Other Changes

collinear 3.0.0

Breaking Changes

API Changes

Renamed Functions

Renamed f_ Functions for Preference Order

Major New Features

Adaptive Multicollinearity Thresholds

Tidymodels Integration

Cross-Validation Support in Preference Order

Rich Output Structure

Correlation Matrix Improvements

New Functions

Multicollinearity Assessment

Preference Order

S3 Methods

New Datasets and Models

Improvements

VIF Computation

Validation

Documentation

Bug Fixes

Deprecated

collinear 2.0.0

Main Improvements

collinear 1.1.1

Renamed `f_` Functions for Preference Order