Fig. 1 Overview of the proposed scheme
Fig. 2 View of the Webpage
- Driving Behavior Dataset
- Dataset Paper: I. Cojocaru and P. Popescu (2022). Building a Driving Behaviour Dataset. Proceedings of RoCHI 2022.
- I used the Normal and Aggressive classes of this dataset, so the experiment in this repository is a binary classification task.
- Below is the distribution of the train and test datasets.
Train Dataset | Test Dataset
---|---
![]() | ![]()

Table 1. Distribution of the train and test datasets
- The original dataset contains 6 variables: acceleration along the X, Y, and Z axes in meters per second squared (m/s²), and rotation along the X, Y, and Z axes in degrees per second (°/s).
- Beyond the existing variables, I added features that can be derived from them; a computation sketch follows the formulas below.
$\text{AccMagnitude} = \sqrt{\text{AccX}^2 + \text{AccY}^2 + \text{AccZ}^2}$ - The overall magnitude of 3-axis acceleration
$\text{RotMagnitude} = \sqrt{\text{RotX}^2 + \text{RotY}^2 + \text{RotZ}^2}$ - The overall magnitude of 3-axis rotational velocity
$\text{JerkX} = \frac{d(\text{AccX})}{dt}$, $\text{JerkY} = \frac{d(\text{AccY})}{dt}$, $\text{JerkZ} = \frac{d(\text{AccZ})}{dt}$
$\text{JerkMagnitude} = \sqrt{\text{JerkX}^2 + \text{JerkY}^2 + \text{JerkZ}^2}$ - The rate of change of acceleration over time
- Sudden changes in acceleration can indicate aggressive driving.
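As a minimal sketch of how these features could be computed with pandas, assuming the columns are named `AccX` through `RotZ` and a constant sampling interval (so jerk is approximated by a first difference):

```python
import numpy as np
import pandas as pd

def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Append magnitude and jerk features to a DataFrame that holds
    AccX/AccY/AccZ and RotX/RotY/RotZ columns (assumed names)."""
    out = df.copy()

    # Overall magnitude of the 3-axis acceleration and rotation signals
    out["AccMagnitude"] = np.sqrt(out["AccX"]**2 + out["AccY"]**2 + out["AccZ"]**2)
    out["RotMagnitude"] = np.sqrt(out["RotX"]**2 + out["RotY"]**2 + out["RotZ"]**2)

    # Jerk: first difference of acceleration between consecutive samples.
    # dt is assumed to be 1 sample here; divide by the real sampling
    # interval if timestamps are available.
    for axis in ("X", "Y", "Z"):
        out[f"Jerk{axis}"] = out[f"Acc{axis}"].diff().fillna(0.0)
    out["JerkMagnitude"] = np.sqrt(out["JerkX"]**2 + out["JerkY"]**2 + out["JerkZ"]**2)
    return out
```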
- Optuna is used to optimize the hyperparameters of each predictive model; a minimal sketch is shown below.
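The sketch uses a Random Forest purely as an illustration; the actual search spaces and models tuned in this repository are not shown, and `X_train`/`y_train` are assumed to hold the features and labels:

```python
import optuna
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def objective(trial: optuna.Trial) -> float:
    # Hypothetical search space; adjust per model
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 1000),
        "max_depth": trial.suggest_int("max_depth", 3, 20),
        "min_samples_leaf": trial.suggest_int("min_samples_leaf", 1, 10),
    }
    model = RandomForestClassifier(**params, random_state=42)
    # Maximize the mean cross-validated F1 score
    return cross_val_score(model, X_train, y_train, cv=5, scoring="f1").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)
print(study.best_params)
```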
Model | Precision (w/o FE) | Recall (w/o FE) | F1 Score (w/o FE) | Accuracy (w/o FE) | Precision (w/ FE) | Recall (w/ FE) | F1 Score (w/ FE) | Accuracy (w/ FE)
---|---|---|---|---|---|---|---|---
CNN-LSTM | 0.7536 | 0.6767 | 0.6652 | 0.7000 | 0.7333 | 0.7091 | 0.7093 | 0.7222
ConvLSTM | 0.7091 | 0.7003 | 0.6875 | 0.6889 | 0.7111 | 0.7128 | 0.7105 | 0.7111
Transformer | 0.7039 | 0.7046 | 0.7000 | 0.7000 | 0.7407 | 0.7330 | 0.7214 | 0.7222

Table 2. Comparison of the performance of forecasting models for window-of-instances classification (w/o FE = original data, without feature engineering; w/ FE = our scheme, with feature engineering)
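For the window-of-instances setting above, the sequence models consume short windows of consecutive readings. Below is a minimal sketch of one way to build them; the window size, stride, and majority-vote labeling are assumptions, not the repository's actual values:

```python
import numpy as np

def make_windows(features: np.ndarray, labels: np.ndarray,
                 window: int = 30, stride: int = 1):
    """Slice a (time, n_features) array into overlapping windows
    suitable as CNN-LSTM / ConvLSTM / Transformer inputs."""
    X, y = [], []
    for start in range(0, len(features) - window + 1, stride):
        X.append(features[start:start + window])
        # Label each window by the majority class it contains
        y.append(np.bincount(labels[start:start + window]).argmax())
    return np.stack(X), np.array(y)
```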
Model | Precision (w/o FE) | Recall (w/o FE) | F1 Score (w/o FE) | Accuracy (w/o FE) | Precision (w/ FE) | Recall (w/ FE) | F1 Score (w/ FE) | Accuracy (w/ FE)
---|---|---|---|---|---|---|---|---
Logistic Regression | 0.5602 | 0.5572 | 0.5557 | 0.5691 | 0.5930 | 0.5901 | 0.5900 | 0.5994
MLP Classifier | 0.5915 | 0.5878 | 0.5874 | 0.5983 | 0.5986 | 0.5989 | 0.5987 | 0.6022
K-Neighbors Classifier | 0.5549 | 0.5546 | 0.5547 | 0.5602 | 0.5700 | 0.5680 | 0.5677 | 0.5773
SGD Classifier | 0.5503 | 0.5483 | 0.5472 | 0.5591 | 0.5926 | 0.5813 | 0.5761 | 0.5989
Random Forest | 0.5520 | 0.5523 | 0.5520 | 0.5552 | 0.5671 | 0.5675 | 0.5671 | 0.5702
Decision Tree | 0.5376 | 0.5380 | 0.5361 | 0.5370 | 0.5398 | 0.5400 | 0.5396 | 0.5425
Gaussian NB | 0.5893 | 0.5767 | 0.5699 | 0.5956 | 0.5949 | 0.5845 | 0.5882 | 0.6011
AdaBoost | 0.5848 | 0.5798 | 0.5784 | 0.5923 | 0.5885 | 0.5869 | 0.5871 | 0.5945
Gradient Boosting | 0.5838 | 0.5799 | 0.5790 | 0.5912 | 0.5861 | 0.5856 | 0.5858 | 0.5912
XGBoost | 0.5421 | 0.5423 | 0.5420 | 0.5453 | 0.5787 | 0.5794 | 0.5785 | 0.5807
CatBoost | 0.5836 | 0.5808 | 0.5805 | 0.5906 | 0.5946 | 0.5941 | 0.5942 | 0.5994
LightGBM | 0.5622 | 0.5621 | 0.5621 | 0.5669 | 0.5989 | 0.5991 | 0.5990 | 0.6028

Table 3. Comparison of the performance of forecasting models for single-instance classification (w/o FE = original data, without feature engineering; w/ FE = our scheme, with feature engineering)
Fig. 3 The change in driving behavior over time (first 50 instances)
Fig. 4 Distribution of prediction values
- We could also consider other techniques to improve the performance of the predictive models, such as data augmentation (e.g., CTGAN and TVAE) or a different loss function (e.g., focal loss and class-balanced loss; a sketch of focal loss follows below). However, since the focus here is on MLOps, these techniques were not applied; methods for the data imbalance problem will be addressed in future work.
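For reference only (it was not used in this repository), here is a minimal PyTorch sketch of binary focal loss with the commonly used defaults α = 0.25 and γ = 2 from Lin et al. (2017):

```python
import torch
import torch.nn.functional as F

def binary_focal_loss(logits: torch.Tensor, targets: torch.Tensor,
                      alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """Focal loss for binary classification: down-weights easy examples
    so training focuses on hard (often minority-class) samples.
    `targets` is a float tensor of 0/1 labels, same shape as `logits`."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)  # model's probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1.0 - p_t) ** gamma * bce).mean()
```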
- Coursera - Structuring Machine Learning Projects (February 02, 2022)
- Coursera - Introduction to Machine Learning in Production (September 21, 2021)
- [1] I. Cojocaru and P. Popescu (2022). Building a Driving Behaviour Dataset. Proceedings of RoCHI 2022.
- [2] R. Shwartz-Ziv and A. Armon (2022). Tabular data: Deep learning is not all you need. Information Fusion, vol. 81, pp. 84-90.
- [3] L. Grinsztajn et al. (2022). Why do tree-based models still outperform deep learning on tabular data?. 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks.
- [4] D. Kreuzberger et al. (2023). Machine Learning Operations (MLOps): Overview, Definition, and Architecture. IEEE Access, vol. 11, pp. 31866-31879.