Data quality issues, and how to solve them

Poor data quality often prevents reliability engineers from developing optimal maintenance strategies, undertaking root cause analysis (RCA) efficiently and extracting value from their data. As a result, improvement projects consume large amounts of time and budget: relevant data must be located (typically from multiple sources), its format must be identified (which may vary over time and between locations), it must be validated, and finally it must be prepared for the extract-transform-load (ETL) process.

In the majority of cases, ETL is manual, small in scale and prone to errors. Each step costs time and money. The process also scales poorly, which makes it unsustainable. This article highlights some of the data quality issues businesses face, and the steps that may be taken to solve them.

1. Incomplete data

Notifications are raised by frontline workers under time and safety constraints, so the information can be rushed, incomplete or lacking context (see bottom of image). Guidance on what makes a good notification may not exist or, at the opposite extreme, notifications may be over-templated. In the latter case, where too much is expected of the workers, the template is typically ignored, or the worker spends time entering the least useful pieces of information.

However, context is vital. Information about the fault or symptoms which triggered the work order can provide engineers with direction, months or years later, when they are undertaking an RCA.

When frontline staff are shown how machine learning software uses this data, they come to understand the long-term value of recording notifications and work orders well, which quickly leads to improvements in data quality.
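
One way to make the idea of a "good notification" concrete is a small completeness check run at the point of entry. The sketch below is illustrative only: the field names (`equipment_id`, `symptom`, `description`) and the minimum word count are assumptions, not fields from any particular maintenance system, and a real check would be tuned to the site's own notification template.

```python
# Hypothetical field names; a real notification export will differ.
REQUIRED_FIELDS = ("equipment_id", "symptom", "description")
MIN_DESCRIPTION_WORDS = 5  # assumed threshold, to be tuned per site


def completeness_issues(notification: dict) -> list:
    """Return a list of human-readable gaps found in one notification."""
    issues = []
    for field_name in REQUIRED_FIELDS:
        value = notification.get(field_name)
        if value is None or not str(value).strip():
            issues.append(f"missing {field_name}")
    description = str(notification.get("description") or "")
    # A present but very short description rarely carries enough context.
    if description.strip() and len(description.split()) < MIN_DESCRIPTION_WORDS:
        issues.append("description too short to give context")
    return issues


example = {"equipment_id": "P-101", "symptom": "", "description": "pump noisy"}
print(completeness_issues(example))
```

Keeping the required list short is deliberate: as noted above, an over-templated form tends to be ignored, so a check like this is only useful if it asks for the few fields an engineer will actually need during an RCA.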

2. Emerging work resulting in incorrect data

During maintenance operations, it is typical for additional work to arise. For example, a technician may notice an unrelated part that requires maintenance (immediately or soon) and decide to undertake that work straight away. This behaviour is opportunistic and usually the correct thing to do operationally: some extra work now heads off problems that were not planned for. It is important to ensure that this work is captured in the data. If it is not, that work can be “lost” or extremely difficult to find during the RCA process.
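
A minimal sketch of what "captured in the data" could look like: emergent work is recorded as its own item, tagged with the asset it was actually performed on and linked to the work order during which it happened. The class and function names here (`WorkOrder`, `EmergentWork`, `record_emergent_work`, `work_history`) are hypothetical and chosen for illustration; they do not describe any specific product or system.

```python
from dataclasses import dataclass, field


@dataclass
class EmergentWork:
    equipment_id: str   # asset the extra work was actually done on
    description: str


@dataclass
class WorkOrder:
    order_id: str
    equipment_id: str   # asset the planned work targets
    planned_scope: str
    emergent_work: list = field(default_factory=list)


def record_emergent_work(order, equipment_id, description):
    """Attach unplanned work to the order during which it was carried out."""
    item = EmergentWork(equipment_id, description)
    order.emergent_work.append(item)
    return item


def work_history(orders, equipment_id):
    """All work on one asset, planned or emergent, as (order_id, kind, text)."""
    history = []
    for order in orders:
        if order.equipment_id == equipment_id:
            history.append((order.order_id, "planned", order.planned_scope))
        for item in order.emergent_work:
            if item.equipment_id == equipment_id:
                history.append((order.order_id, "emergent", item.description))
    return history


order = WorkOrder("WO-1", "P-101", "Replace mechanical seal")
record_emergent_work(order, "V-204", "Replaced leaking gasket on adjacent valve")
print(work_history([order], "V-204"))
```

Because the emergent item carries its own equipment identifier, a later search for the valve's history finds the gasket replacement even though the work order was raised against the pump.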

3. Insufficient data integrity control

One of the first departments to be notified of a work order is usually the supply function, who are tasked with sourcing parts, and ensuring they arrive in time for the maintenance. Data quality and its impact on reliability and RCA are not their key responsibilities. It may be years before an engineer is tasked with analysing the data, and by that point any ability to understand gaps in data is lost, greatly diminished, or will come at significant cost.

Solid data quality standards and processes can be established at the frontline, as and when data is recorded. IronMan® supports this by giving reliability engineers and technicians an up-to-date view of their data quality. Within a short period, both behaviour around data quality and ownership of it can be improved.
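
To illustrate what an up-to-date view of data quality might measure, the sketch below computes the share of complete records per month. It is a generic example and not a description of how IronMan® works; the field names, the `raised_on` key and the ISO date format ("YYYY-MM-DD") are all assumptions.

```python
from collections import defaultdict


def completeness_by_month(records, required=("equipment_id", "symptom")):
    """Fraction of records per month in which every required field is filled."""
    totals = defaultdict(int)
    complete = defaultdict(int)
    for record in records:
        month = record["raised_on"][:7]  # assumes ISO dates, e.g. "2024-01-15"
        totals[month] += 1
        if all(str(record.get(name) or "").strip() for name in required):
            complete[month] += 1
    return {month: complete[month] / totals[month] for month in sorted(totals)}


records = [
    {"raised_on": "2024-01-10", "equipment_id": "P-101", "symptom": "leak"},
    {"raised_on": "2024-01-22", "equipment_id": "P-101", "symptom": ""},
    {"raised_on": "2024-02-03", "equipment_id": "V-204", "symptom": "stuck"},
]
print(completeness_by_month(records))
```

Tracking a figure like this over time gives frontline teams prompt feedback on the records they raise, rather than leaving gaps to be discovered years later.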

Contact OXMT.
