Splitting the dataset

Choonghyun Ryu

2026-01-11

Preface

To develop a classification model, the original data must be divided into a train set and a test set. The overall workflow is as follows:

- Cleanse the dataset
- Split the data into a train set and a test set
  - Split the data.frame or tbl_df into a train set and a test set
  - Compare the datasets
    - Comparison of categorical variables
    - Comparison of numeric variables
    - Diagnosis of the train set and test set
  - Extract the train/test dataset
    - Extract the train set or test set
    - Extract the data to fit the model
- Modeling, evaluation, and prediction
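
To make the split and compare steps concrete, here is a minimal sketch in base R. It is not alookr's implementation; the data frame `dat`, its columns (`default`, `balance`, `student`), the 70/30 ratio, and the seed are all hypothetical values chosen for illustration. It draws a stratified sample within each target class and then compares the class proportions and a numeric variable across the two sets.

```r
# Hypothetical example data: a binary target with an imbalanced class ratio.
set.seed(6534)
n <- 1000
dat <- data.frame(
  default = factor(sample(c("No", "Yes"), n, replace = TRUE, prob = c(0.9, 0.1))),
  balance = rnorm(n, mean = 800, sd = 300),
  student = factor(sample(c("No", "Yes"), n, replace = TRUE))
)

# Stratified split (assumed ratio 0.7): sample row indices within each
# target class so both sets keep roughly the same class proportions.
ratio <- 0.7
idx_by_class <- split(seq_len(nrow(dat)), dat$default)
train_idx <- unlist(lapply(idx_by_class, function(idx) {
  idx[sample.int(length(idx), size = floor(ratio * length(idx)))]
}), use.names = FALSE)

# Extract the two sets; every row lands in exactly one of them.
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Compare a categorical variable (the target) across the sets.
train_prop <- prop.table(table(train$default))
test_prop  <- prop.table(table(test$default))
print(rbind(train = train_prop, test = test_prop))

# Compare a numeric variable across the sets.
print(rbind(train = summary(train$balance), test = summary(test$balance)))
```

If the class proportions or numeric summaries differ noticeably between the two sets, a different seed or ratio may be worth trying before fitting a model.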

The alookr package makes these steps fast and easy.

How to split the data

For details on how to split the data into a train set and a test set, refer to the following page:

Splitting the dataset