The pneumatic system is a sensitive and central component of every truck. It generates the pressurized air used for functions such as braking and gear changes, so a defect can have dramatic and costly consequences. Preventing such defects in turn makes maintenance of the pneumatic system expensive.
Numerous sensors record the current condition of many components in the trucks. The aim of the project was to use this data in a machine learning model to prevent breakdowns caused by the pneumatic system. The model was to predict whether the pneumatic system will fail, so that the appropriate maintenance measures can be initiated in time. A successful implementation can yield substantial cost savings.
The training data set contained sensor data from 60,000 trucks, 1,000 of which were labelled as having a defect of the pneumatic system. The test data set contained sensor data from 16,000 trucks, for which a defect of the pneumatic system had to be predicted. For each truck, 171 sensor features were recorded.
Challenges & Solutions
The data sets contained many missing feature values. In addition, the two types of wrong prediction carry different weights. Predicting that a pneumatic system is intact when it is actually broken (a "false negative") can lead to a breakdown on the road. Predicting that a pneumatic system is broken when it is not (a "false positive") incurs costs for an unnecessary check-up. Because a breakdown is far more expensive than an unnecessary check-up, the algorithm had to prioritize avoiding false negatives during training.
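The asymmetry between the two error types can be made concrete with a weighted misclassification cost. The following is a minimal sketch, not the project's actual scoring code: the two unit costs and the function names are hypothetical, chosen only to illustrate that a single missed defect can outweigh many false alarms.

```python
# Hypothetical unit costs. The text only states that a breakdown is far
# more expensive than an unnecessary check-up; the numbers are assumptions.
COST_FALSE_POSITIVE = 10    # assumed cost of one unnecessary check-up
COST_FALSE_NEGATIVE = 500   # assumed cost of one missed defect


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, where 1 means defective."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def total_cost(y_true, y_pred,
               cost_fp=COST_FALSE_POSITIVE, cost_fn=COST_FALSE_NEGATIVE):
    """Weighted misclassification cost; correct predictions cost nothing."""
    _, fp, fn, _ = confusion_counts(y_true, y_pred)
    return cost_fp * fp + cost_fn * fn


y_true = [1, 1, 0, 0, 0, 0]
cautious = [1, 1, 1, 1, 0, 0]   # two false alarms, no missed defect
lenient = [1, 0, 0, 0, 0, 0]    # no false alarm, one missed defect

print(total_cost(y_true, cautious))  # 20
print(total_cost(y_true, lenient))   # 500
```

Under these assumed costs, the cautious predictor is much cheaper even though it raises more alarms, which is the behaviour the training had to favour.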
The positive (broken pneumatic system) and negative (working pneumatic system) labels were heavily imbalanced (1,000 : 59,000). To detect as many of the true positives as possible and to reduce the false negatives, we needed to balance the training data and adapt the algorithm. Because the feature values had a wide range and included very large numbers, they were normalized. Furthermore, bootstrapping and class weightings were applied.
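The three preprocessing ideas named above (normalization, bootstrapping and class weighting) can be sketched in plain Python. The text does not say which normalization method or weighting scheme was used, so min-max scaling and inverse-frequency weights are assumptions here, and all function names are illustrative. The sketch also assumes that both classes are present in the data.

```python
import random


def min_max_normalize(rows):
    """Scale every feature column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def bootstrap_balance(rows, labels, seed=0):
    """Resample the minority class with replacement until both classes match."""
    rng = random.Random(seed)
    positives = [r for r, y in zip(rows, labels) if y == 1]
    negatives = [r for r, y in zip(rows, labels) if y == 0]
    if len(positives) < len(negatives):
        minority, majority, minority_label = positives, negatives, 1
    else:
        minority, majority, minority_label = negatives, positives, 0
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced_rows = majority + minority + extra
    balanced_labels = ([1 - minority_label] * len(majority)
                       + [minority_label] * (len(minority) + len(extra)))
    return balanced_rows, balanced_labels


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = {c: labels.count(c) for c in set(labels)}
    return {c: len(labels) / (len(counts) * n) for c, n in counts.items()}


rows = [[0.0, 1000.0], [5.0, 3000.0], [10.0, 2000.0], [2.0, 5000.0]]
labels = [0, 0, 0, 1]

scaled = min_max_normalize(rows)
balanced_rows, balanced_labels = bootstrap_balance(scaled, labels)
weights = balanced_class_weights([1] * 1000 + [0] * 59000)

print(scaled[1])                                          # [0.5, 0.5]
print(balanced_labels.count(0), balanced_labels.count(1))  # 3 3
print(weights[1])                                          # 30.0
```

With the 1,000 : 59,000 split from the text, inverse-frequency weighting gives a defective truck a weight of 30, compared with roughly 0.51 for a working one. In practice, resampling and weighting are alternative ways to counter the same imbalance, and the scaling parameters would be fitted on the training data only.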
The models considered were Logistic Regression, Random Forest Classifier, Support Vector Classifier, Quadratic Discriminant Analysis and Neural Networks. Quadratic Discriminant Analysis performed best at reducing the number of missed detections.
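A comparison of this kind could look roughly like the following sketch, which assumes scikit-learn and uses synthetic data in place of the truck data. The hyperparameters, the equal priors for Quadratic Discriminant Analysis and the choice of "fewest missed detections" as the selection criterion are assumptions, not the project's actual setup. The neural network is left out to keep the sketch short.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the sensor data: roughly 5% positive labels.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           n_redundant=0, weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(class_weight="balanced",
                                              max_iter=1000),
    "random_forest": RandomForestClassifier(class_weight="balanced",
                                            random_state=0),
    "support_vector": SVC(class_weight="balanced"),
    # QDA has no class_weight option; equal priors play a similar role.
    "qda": QuadraticDiscriminantAnalysis(priors=[0.5, 0.5], reg_param=0.1),
}

missed = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    tn, fp, fn, tp = confusion_matrix(
        y_test, model.predict(X_test), labels=[0, 1]).ravel()
    missed[name] = int(fn)
    print(f"{name}: missed={fn}, false_alarms={fp}")

# Select the candidate with the fewest missed detections (false negatives).
best = min(missed, key=missed.get)
print("fewest missed detections:", best)
```

Which model wins on the synthetic data says nothing about the truck data. The sketch only shows how missed detections can serve as the ranking criterion.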
The trained model predicted the failures with good precision (92.3%) and produced an acceptable share of false alarms (3.8%). Since false alarms are comparatively harmless, the results are highly acceptable and the model is ready to use. It is not intended to replace the mechanics and their technical knowledge, but to assist them and to uncover cases that would probably have caused breakdowns if left undiscovered.