LSTM-BASED-AUTO-BI-LSTM FOR REMAINING USEFUL LIFE (RUL) PREDICTION: THE FIRST ROUND OF TEST RESULTS

Abstract

The Remaining Useful Life (RUL) is one of the most critical indicators for detecting a component's failure before it occurs. It can be predicted from historical data or direct data extraction by adopting model-based, data-driven, or hybrid methodologies. Data-driven methods have mainly used Machine Learning (ML) approaches, although several studies still point out challenges in this respect. For instance, traditional ML methods cannot extract features directly from time series and, in some cases, depend on prior knowledge of the system. In this context, this work proposes a Deep Learning (DL)-based approach called LSTM-based-AUTO-Bi-LSTM. It ensembles a Long Short-Term Memory (LSTM)-based autoencoder, which performs feature engineering automatically rather than manually, with a Bidirectional Long Short-Term Memory (Bi-LSTM) network that predicts the RUL. We have tested the model on the Turbofan Engine Degradation Simulation Dataset (FD001), an open dataset generated with the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) and made available by the Prognostics Center of Excellence (PCoE) of the National Aeronautics and Space Administration (NASA). The objective is to release the first round of analytical results and statistical visualisations of the model application, which will guide future improvements.

1. INTRODUCTION

Cyber-Physical Systems (CPS), the Internet of Things (IoT), the Internet of Services (IoS), and Data Analytics have built Industry 4.0, which has improved manufacturing efficiency and helped industries face economic, social, and environmental challenges (Ruiz-Sarmiento et al., 2020). Condition-Based Maintenance (CBM) performs maintenance routines on machines and components based on their needs, and Prognostics and Health Management (PHM) monitors the evolution of component wear using indicators. PHM is a proactive way of implementing CBM by predicting the Remaining Useful Life (RUL), one of the most critical indicators for detecting a component's failure before it occurs (Wang et al., 2021; Huang et al., 2019; Wu et al., 2017; Kan et al., 2015). RUL can be predicted from historical data or direct data extraction by adopting model-based, data-driven, or hybrid methodologies. Model-based methods are challenging, expensive, and time-consuming to develop for complex equipment because they require prior system knowledge. Data-driven methods have mainly used Machine Learning (ML) approaches. They are less complex and expensive, more widely applicable, and provide a suitable trade-off between complexity, cost, precision, and applicability (Cheng et al., 2021; Mrugalska, 2019; Li et al., 2019; Yang et al., 2016), although they require large amounts of historical data for development (Liewald et al., 2022).
Meanwhile, despite the increased use of ML to predict RUL, several studies still point out challenges in this respect (Huang et al., 2019). For example, the accuracy of most ML methods in predicting RUL depends largely on the quality of feature extraction, and their performance degrades for very complex systems with multiple components, multiple states, and a considerable number of parameters (Zhao et al., 2021; Chen et al., 2019).
Moreover, the literature has also reported that most of these models do not consider operating conditions, even though machines operate in different states, even on the same shop floor. This significantly impacts the degradation behaviour and the raw sensor signals, which may be non-stationary, nonlinear, and mixed with much noise (Liu et al., 2020a). Finally, traditional ML methods cannot extract features directly from time series; they depend on complex intermediate transformations and, in some cases, on prior knowledge of the system (Cabrera et al., 2020).
To overcome several of these challenges and improve the accuracy of RUL prediction, there has been a prominent use of Deep Learning (DL) methods, especially Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) networks, besides other variations (Zhu et al., 2019; Li et al., 2020; Liu et al., 2020b). They have emerged and achieved outstanding results in different areas due to their strong capacity to map the relationship between degradation paths and measured data. These methods can also learn feature representations automatically, so features need not be designed manually, which eliminates the need for prior knowledge of the system (Zhu et al., 2019). In addition, DL methods have a high capacity to deal with large amounts of complex data (Kong et al., 2019). Nonetheless, the literature reports some drawbacks, such as the data deficit issue, especially considering the varying operating conditions and degradation modes of components in practical industrial applications (Liu et al., 2020a). In this context, Ferreira & Gonçalves (2022), among other results, have mapped 14 challenges in using ML methods for RUL prediction and pointed out some approaches used in the literature to overcome these challenges.
From this collection of approaches, it was possible to propose an architecture called LSTM-based-AUTO-Bi-LSTM, which ensembles an Autoencoder (an Unsupervised/Reconstructive Learning Technique) with the DL method Bidirectional Long Short-Term Memory (Bi-LSTM). The autoencoder aims to perform feature engineering automatically (instead of manually). The Bi-LSTM aims to predict the RUL based on the outputs of the autoencoder. This type of ensembling is, at the least, rarely applied in the RUL prediction process. To test our model, we have explored the turbofan engine problem through the dataset gathered from PCoE/NASA. Therefore, this work aims to release the first round of analytical results and statistical visualisations of the model application.
The remainder of this work is organised as follows. Section 2 describes the problem and the dataset used, and Section 3 introduces the LSTM-based-AUTO-Bi-LSTM architecture. Section 4 describes the experimental context, and Section 5 presents the results and compares them with the literature. Finally, Section 6 concludes this work by giving some directions for future work.
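As a rough illustration of the two-stage idea described in this introduction (an LSTM-based autoencoder that learns features, followed by a Bi-LSTM that predicts the RUL from the encoder outputs), a minimal sketch is given below. It is not the implementation evaluated in this work: the choice of PyTorch, the class names, the layer sizes, and the window shape (8 windows, 30 cycles, 14 channels) are all assumptions made only for illustration.

```python
import torch
from torch import nn


class LSTMAutoencoder(nn.Module):
    """Hypothetical LSTM autoencoder: encodes a sensor window, then reconstructs it."""

    def __init__(self, n_features: int, latent_dim: int):
        super().__init__()
        self.encoder = nn.LSTM(n_features, latent_dim, batch_first=True)
        self.decoder = nn.LSTM(latent_dim, latent_dim, batch_first=True)
        self.output = nn.Linear(latent_dim, n_features)

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        z, _ = self.encoder(x)          # (batch, time, latent_dim)
        return z

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, _ = self.decoder(self.encode(x))
        return self.output(h)           # reconstruction, (batch, time, n_features)


class BiLSTMRegressor(nn.Module):
    """Hypothetical Bi-LSTM head: maps encoded windows to one RUL estimate each."""

    def __init__(self, latent_dim: int, hidden_dim: int):
        super().__init__()
        self.bilstm = nn.LSTM(latent_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden_dim, 1)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        _, (h_n, _) = self.bilstm(z)    # h_n: (2, batch, hidden_dim)
        # Concatenate the final forward and backward hidden states.
        h = torch.cat([h_n[0], h_n[1]], dim=-1)
        return self.head(h).squeeze(-1)  # (batch,)


# Illustrative shapes only (assumed, not taken from the experiments).
x = torch.randn(8, 30, 14)
autoencoder = LSTMAutoencoder(n_features=14, latent_dim=6)
regressor = BiLSTMRegressor(latent_dim=6, hidden_dim=16)

reconstruction = autoencoder(x)
reconstruction_loss = nn.functional.mse_loss(reconstruction, x)  # unsupervised stage
rul_estimate = regressor(autoencoder.encode(x).detach())         # supervised stage
```

In this sketch the autoencoder would be trained first on the reconstruction loss alone, and the regressor would then be trained on the detached encoder outputs against RUL labels. Whether the two stages are trained separately or jointly is a design choice that the sketch does not settle.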

2. THE PROBLEM AND DATASET

2.1. THE PROBLEM

PHM has been an essential topic in industry for predicting the state of assets to avoid downtime and failures (NASA, 2022). In the aircraft industry, maintenance is critical to ensure operational safety (Zheng et al., 2018), besides increasing economic efficiency (Deng et al., 2019). According to the International Air Transport Association (IATA), maintenance costs of the major aviation companies reached $15.57 billion between 2012 and 2016, which represented a growth of 3% (Kraus & Feuerriegel, 2019). Turbofan engines, specifically, are responsible for about 30% of the failures in an aircraft, and in major accidents these systems have been the root cause in 40% of the cases. In addition, propulsion device maintenance accounts for about 40% of total aircraft maintenance costs (Tang et al., 2021). The main components of a turbofan engine include the fan, low-pressure compressor (LPC), high-pressure compressor (HPC), combustor, high-pressure turbine (HPT), low-pressure turbine (LPT), and nozzle.

2.2. THE DATASET

The dataset was gathered from the Prognostics Center of Excellence (PCoE) of the National Aeronautics and Space Administration (NASA). The information provided in this subsection was retrieved from that source (NASA, 2022) and from Saxena et al. (2008). Engine degradation simulation was carried out using the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS). Four different datasets (FD001, FD002, FD003, and FD004) were simulated under various operational conditions. They comprised a range of values for three operating conditions (Altitude, from 0 to 42K ft.; Mach Number, from 0 to 0.84; and Throttle Resolver Angle (TRA), from 20 to 100) and fault modes: High-Pressure Compressor Degradation and/or Fan

