The datasets contained many missing feature values. Different types of wrong predictions carry different costs. Predicting that a pneumatic system is working when it is actually broken (a “False-Negative”) can lead to a breakdown on the road. Predicting that a pneumatic system is broken when it is not (a “False-Positive”) causes costs for an unnecessary check-up.
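The asymmetric weighting of errors can be made concrete with a cost function that charges each False-Negative and each False-Positive a different amount. The following minimal Python sketch assumes hypothetical unit costs (`COST_FP = 10`, `COST_FN = 500`); these numbers and the function names `confusion_counts` and `total_cost` are illustrative and not taken from the text. Labels are assumed to be 1 for a broken and 0 for a working pneumatic system.

```python
# Hypothetical unit costs (assumptions, not from the text): a missed failure
# is assumed to be far more expensive than an unnecessary check-up.
COST_FP = 10    # assumed cost of an unnecessary check-up (False-Positive)
COST_FN = 500   # assumed cost of a breakdown on the road (False-Negative)


def confusion_counts(y_true, y_pred):
    """Count False-Positives and False-Negatives (1 = broken, 0 = working)."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return fp, fn


def total_cost(y_true, y_pred, cost_fp=COST_FP, cost_fn=COST_FN):
    """Weight each error type by its assumed cost and sum the result."""
    fp, fn = confusion_counts(y_true, y_pred)
    return cost_fp * fp + cost_fn * fn


if __name__ == "__main__":
    y_true = [1, 0, 0, 1, 0, 1]
    cautious = [1, 1, 0, 1, 1, 1]  # two false alarms, no missed failures
    lenient = [0, 0, 0, 1, 0, 0]   # no false alarms, two missed failures
    print(total_cost(y_true, cautious))  # 20
    print(total_cost(y_true, lenient))   # 1000
```

With these assumed costs, two missed failures cost fifty times as much as two false alarms, which is why a model should be judged by its total misclassification cost rather than by accuracy alone.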
Since a breakdown is far more expensive than an unnecessary check-up, the algorithm had to be trained to prioritize avoiding False-Negatives. The labels were highly unbalanced: 1,000 positive (broken pneumatic system) versus 59,000 negative (working pneumatic system) examples. To achieve a high True-Positive rate (recall) and thus reduce the number of False-Negatives, we had to balance the training data and adapt the algorithm. Because the feature values spanned a wide range and included very large numbers, the features were normalized. Furthermore, bootstrapping and class weighting were applied.
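The three preprocessing steps can be sketched in plain Python. The text does not name the normalization method or describe exactly how bootstrapping and class weighting were done, so this sketch makes assumptions: min-max scaling to [0, 1], bootstrapping interpreted as drawing minority-class rows with replacement until both classes are the same size, and inverse-frequency (“balanced”) class weights. Missing values are assumed to have been imputed beforehand, and all function names are hypothetical.

```python
import random
from collections import Counter


def min_max_normalize(rows):
    """Rescale each feature column to [0, 1]; constant columns become 0.0.

    Assumes missing values were already imputed (min/max fail on None).
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def bootstrap_oversample(rows, labels, minority=1, seed=0):
    """Append minority-class rows drawn with replacement until both classes match.

    Assumes binary labels; this is one possible reading of "bootstrapping".
    """
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    shortfall = (len(labels) - len(minority_idx)) - len(minority_idx)
    if not minority_idx or shortfall <= 0:
        return list(rows), list(labels)
    picks = [rng.choice(minority_idx) for _ in range(shortfall)]
    return (list(rows) + [rows[i] for i in picks],
            list(labels) + [labels[i] for i in picks])


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * samples_in_class)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


if __name__ == "__main__":
    weights = balanced_class_weights([1] * 1000 + [0] * 59000)
    print(weights[1], round(weights[0], 3))  # 30.0 0.508
```

For the 1,000 : 59,000 split, the balanced weights are 30 for the positive class and about 0.51 for the negative class, so each broken-system example counts 59 times as much as a working one during training. Resampling should only be applied to the training split, so that duplicated rows do not leak into the evaluation data.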
The models considered were Logistic Regression, a Random Forest Classifier, a Support Vector Classifier, Quadratic Discriminant Analysis and Neural Networks. Quadratic Discriminant Analysis performed best at reducing the number of missed detections (False-Negatives).
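A from-scratch sketch of Quadratic Discriminant Analysis illustrates how it can be steered towards fewer missed detections: each class gets its own mean vector and covariance matrix, and the class prior enters the decision directly. The sketch uses the standard discriminant -1/2 log|Σ_k| - 1/2 (x - μ_k)^T Σ_k^(-1) (x - μ_k) + log π_k, evaluated through a Cholesky factorization. The `priors` and `ridge` parameters and all names are assumptions for illustration; the text does not state which implementation or settings were used.

```python
import math


def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows, mu, ridge):
    """Sample covariance (divisor n - 1) plus a small ridge on the diagonal."""
    n, d = len(rows), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        diff = [a - m for a, m in zip(r, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j] / (n - 1)
    for i in range(d):
        cov[i][i] += ridge
    return cov


def cholesky(a):
    """Lower-triangular L with L @ L.T == a (a must be symmetric positive definite)."""
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


class QDA:
    def __init__(self, priors=None, ridge=1e-6):
        self.priors = priors  # hypothetical {class: prior}; None = class frequencies
        self.ridge = ridge    # hypothetical diagonal regularization

    def fit(self, rows, labels):
        self.params = {}
        n = len(labels)
        for cls in sorted(set(labels)):
            members = [r for r, y in zip(rows, labels) if y == cls]
            mu = mean_vector(members)
            L = cholesky(covariance(members, mu, self.ridge))
            prior = self.priors[cls] if self.priors else len(members) / n
            self.params[cls] = (mu, L, prior)
        return self

    def _score(self, x, mu, L, prior):
        # -1/2 log|S| - 1/2 (x - mu)^T S^-1 (x - mu) + log prior, with S = L L^T
        diff = [a - m for a, m in zip(x, mu)]
        z = []
        for i in range(len(diff)):  # forward substitution solves L z = diff
            z.append((diff[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
        log_det = 2.0 * sum(math.log(L[i][i]) for i in range(len(L)))
        return -0.5 * log_det - 0.5 * sum(v * v for v in z) + math.log(prior)

    def predict(self, rows):
        # Pick the class with the highest discriminant score for each row
        return [max(self.params, key=lambda c: self._score(x, *self.params[c]))
                for x in rows]


if __name__ == "__main__":
    X = [[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5],
         [10, 10], [11, 10], [10, 11], [11, 11], [10.5, 10.5]]
    y = [0] * 5 + [1] * 5
    print(QDA().fit(X, y).predict([[0.2, 0.3], [10.4, 10.6]]))  # [0, 1]
```

Raising the prior of the positive class moves the decision boundary towards the working systems, trading more False-Positives for fewer False-Negatives. For real use, scikit-learn's `QuadraticDiscriminantAnalysis` exposes similar options (`priors`, `reg_param`).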