*Stephen Wade*

🚧 Under construction 🚧

`literanger` is an adaptation of the `ranger` R package for training and
predicting from random forest models within multiple imputation
algorithms. `ranger` is a fast implementation of random forests
(Breiman, 2001) or recursive partitioning, particularly suited for high
dimensional data (Wright and Ziegler, 2017). `literanger` enables random
forests to be embedded in the fully conditional specification framework
for multiple imputation known as ‘Multiple Imputation via Chained
Equations’ (Van Buuren, 2007).

Implementations of multiple imputation with random forests include:

- `mice`, which uses random forests to predict in a similar fashion to
  Doove et al. (2014), i.e. for each observation, a draw is taken from
  the sample of all values that belong to the terminal node of a randomly
  drawn tree;
- `miceRanger` and `missRanger`, which use predictive mean matching.

This package enables a minor variation on `mice`’s use of random
forests: the prediction can be drawn from the in-bag samples in the
terminal node of each *missing* data point. Thus, the computational
effort during prediction scales with the number of missing values,
rather than with the product of the size of the whole dataset and the
number of trees (as in `mice`).
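To make the scaling argument concrete, here is a toy sketch in base R. It is not the `literanger` or `mice` implementation: the "trees" are hypothetical one-split stumps on a single predictor, and the helpers `node_of`, `draw_mice_style` and `draw_inbag` are invented for illustration. Following the `mice` description above, it assumes one tree is drawn at random for each missing point. The `mice`-style draw passes every observed row down every tree before choosing donors, whereas the in-bag draw records each tree's in-bag rows per terminal node at training time, so prediction only needs the terminal node of each missing point.

```r
# Toy sketch only: hypothetical one-split "trees", not literanger's internals.
set.seed(42)
n_obs  <- 50
x_obs  <- runif(n_obs)                          # single predictor
y_obs  <- 2 * x_obs + rnorm(n_obs, sd = 0.1)    # observed response values
n_tree <- 10

# Terminal node (1 = left, 2 = right) of each point in a one-split stump
node_of <- function(tree, x) ifelse(x < tree$split_at, 1L, 2L)

# "Train": bootstrap the rows, split at the midpoint of the in-bag range,
# and record the in-bag row indices belonging to each terminal node
forest <- lapply(seq_len(n_tree), function(k) {
  inbag <- sample(n_obs, n_obs, replace = TRUE)
  tree  <- list(split_at = mean(range(x_obs[inbag])))
  tree$members <- split(inbag, node_of(tree, x_obs[inbag]))
  tree
})

# mice-style draw: pass every observed row down every tree (n_obs * n_tree
# lookups), then draw a donor from the node of a randomly chosen tree
draw_mice_style <- function(forest, x_obs, y_obs, x_missing) {
  obs_nodes <- lapply(forest, function(tree) node_of(tree, x_obs))
  vapply(x_missing, function(xi) {
    k    <- sample(length(forest), 1)
    pool <- which(obs_nodes[[k]] == node_of(forest[[k]], xi))
    y_obs[pool[sample(length(pool), 1)]]
  }, numeric(1))
}

# In-bag draw: only the missing points are passed down (one tree each),
# and the donor comes from the in-bag rows stored for that terminal node
draw_inbag <- function(forest, y_obs, x_missing) {
  vapply(x_missing, function(xi) {
    tree <- forest[[sample(length(forest), 1)]]
    pool <- tree$members[[as.character(node_of(tree, xi))]]
    y_obs[pool[sample(length(pool), 1)]]
  }, numeric(1))
}

x_missing <- c(0.1, 0.5, 0.9)   # predictor values of rows with missing y
imp_mice  <- draw_mice_style(forest, x_obs, y_obs, x_missing)
imp_inbag <- draw_inbag(forest, y_obs, x_missing)
```

With three missing points, the in-bag draw performs three node lookups, while the `mice`-style draw performs at least `n_obs * n_tree` (here 500) lookups for the observed rows before drawing anything.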

A more general advantage of this package is that the trained forest
object can be recycled, and that the (training) data are kept separate
from the forest; see `ranger` issue #304.

A multiple imputation algorithm using this package, called `mimputest`,
is under development.

```
library(literanger)
# split iris into training (2/3) and test (1/3) sets
train_idx <- sample(nrow(iris), 2/3 * nrow(iris))
iris_train <- iris[ train_idx, ]
iris_test  <- iris[-train_idx, ]
# train a classification forest for Species
rf_iris <- train(data=iris_train, response_name="Species")
# bagged (majority vote) prediction and in-bag draw for the test set
pred_iris_bagged <- predict(rf_iris, newdata=iris_test,
                            prediction_type="bagged")
pred_iris_inbag  <- predict(rf_iris, newdata=iris_test,
                            prediction_type="inbag")
# compare bagged vs actual test values
table(iris_test$Species, pred_iris_bagged$values)
# compare bagged prediction vs in-bag draw
table(pred_iris_bagged$values, pred_iris_inbag$values)
```

Installation is easy using `devtools`:

```
library(devtools)
install_github('stephematician/literanger')
```

The `cpp11` package is also required and is available on CRAN:

`install.packages('cpp11')`

To do (not exhaustive):

- ~~prediction type: terminal nodes for every tree (e.g. for mice algorithm);~~
- ~~finish documentation, e.g. this README;~~
- prepare CRAN submission;
- implement variable importance measures;
- probability and survival forests.

Breiman, L., 2001. Random forests. *Machine Learning*, 45,
pp. 5-32. doi:10.1023/A:1010933404324.

Doove, L.L., Van Buuren, S. and Dusseldorp, E., 2014. Recursive
partitioning for missing data imputation in the presence of interaction
effects. *Computational Statistics & Data Analysis*, 72,
pp. 92-104. doi:10.1016/j.csda.2013.10.025.

Van Buuren, S., 2007. Multiple imputation of discrete and continuous
data by fully conditional specification. *Statistical Methods in
Medical Research*, 16(3), pp. 219-242. doi:10.1177/0962280206074463.

Wright, M.N. and Ziegler, A., 2017. ranger: A fast implementation of
random forests for high dimensional data in C++ and R. *Journal of
Statistical Software*, 77(1), pp. 1-17. doi:10.18637/jss.v077.i01.