Defect Elimination

September 1, 2021

Defect elimination is an approach to improving reliability by eliminating common causes of failure. In this pair of blogs we will cover two methods that can be used to reach this goal.

In an earlier blog we used the diagram below to discuss the main influences on failure. It provides important context for defect elimination.

Influences on failure

There is emerging empirical evidence that, in some organisations, up to 60% of the root causes of failure stem from the operating context, whether through:

· Maintenance induced failure

· Operations induced failure (operating beyond the equipment limits)

· Problems with interfaces to machinery (for example, low-quality fuel or noisy electrical supplies)

The vital point to grasp is that failure causes rooted in mal-operation, maintenance inputs or outputs, or environmental issues are within the purview of the operating company. The operating company is therefore best placed to identify and eliminate them.

Causes due to design and manufacture usually account for a relatively small share, as manufacturers would lose business if they did not focus on producing reliable equipment. Failures due to supply problems can be addressed through warranty claims. If the operating environment is harsh and highly variable, it can cause higher rates of failure, so this influence may be more significant. This doesn’t necessarily mean the solution is complex. For example, in a mining operation, road surfaces can be actively maintained to reduce the shaking and shock loads on mobile assets, improving their reliability.

There are two ways to approach defect elimination: a top-down, general approach focused on quality, and a bottom-up, data-centric approach based on identifying critical machinery and quantifying its reliability. Both methods then apply root-cause analysis (RCA).

This blog will cover the first, quality-oriented approach.


The quality approach is a good place to start if the organisation does not have a mature data set, analysis processes or reliability systems. All we need is the expert knowledge of our experienced front-line maintainers. With this method we recognise poor-quality maintenance and operations as the largest cause of premature failure. We could aim to adopt total productive maintenance (TPM) or total quality maintenance (TQM), where we delegate responsibility for continuous improvement to the front-line workforce (both maintainers and operators) and empower them to carry it out. Under this model, they identify which problems are causing failures and decide how to solve them. Management and leadership support their teams with guidance and work to remove barriers to improvement.

If TPM or TQM is not adopted, it should still be possible to work with the front-line maintainers to determine which maintenance processes could be improved, and then to implement improvement projects. The role of managers or maintenance leaders should be to analyse the maintenance system and determine which working practices can be improved.

Setting metrics determines which data should be collected to show progress. There may be a delay of some months before machinery reliability improves, because there is a time lag between introducing improved maintenance or operating procedures and seeing better reliability. This time lag must be recognised, and improvements should be regarded as an investment for the future.
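As one illustration of such a metric, mean time between failures (MTBF) can be tracked per quarter so that the delayed improvement becomes visible over time. The sketch below is hypothetical: the function names `quarter` and `mtbf_by_period`, the input format (a list of failure dates for one asset or fleet) and the assumption that every quarter has the same operating hours are illustrative choices, not part of the original method.

```python
from collections import Counter
from datetime import date


def quarter(d: date) -> str:
    """Label a date with its calendar quarter, e.g. '2021-Q3'."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def mtbf_by_period(failure_dates, operating_hours_per_quarter):
    """Return {quarter: MTBF in hours} for quarters with at least one failure.

    Hypothetical simplification: every quarter is assumed to have the same
    operating hours, so MTBF = operating hours / failures in that quarter.
    Quarters with no recorded failures are omitted and should be reported
    separately.
    """
    counts = Counter(quarter(d) for d in failure_dates)
    return {q: operating_hours_per_quarter / n for q, n in sorted(counts.items())}
```

A rising MTBF over several quarters, rather than an immediate jump, is the pattern the time lag described above would lead us to expect.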

Other initiatives, such as adopting the 5S regime to improve workplace cleanliness and tidiness, may be started at the same time. Some workers may be trained in 5S or other simple problem-solving and improvement processes so that they can champion the change in working practices. These changes are driven by asking questions such as: are tools kept on shadow boards, so that it is easy to check not only that they are available but also that none has been inadvertently left inside machinery?


When analysing the maintenance system, questions to ask yourself about the ‘vulnerabilities’ where maintenance-induced failure could originate include:

1. What are your critical machines, which machines have the greatest impact on loss of production or product quality, and which take the most time and effort to recover?

2. How much maintenance is invasive? Which maintenance tasks require machinery to be dismantled, and which containment boundaries (gas and fluid) need to be breached? Is maintenance hygiene, such as keeping dirt or foreign objects out of gas or fluid systems, an area that could be improved?

3. In electronic systems, how often are PCBs unnecessarily removed and replaced? Is anti-static hygiene consistently applied? Are PCBs swapped to find faults? Willy-nilly board changing is bad practice, as it may propagate faults to healthy PCBs.

4. Which machines are critical in terms of alignment? Decoupling and recoupling motors and driven equipment is a critical task, as even a slightly misaligned train of equipment can severely reduce its life.

5. Are there issues with bedding machinery down and securing it properly to bedplates that could expose it to cyclic stresses? Could vibration surveys be carried out as part of recommissioning to capture and fix any issues?

6. How exposed are machines to environmental stressors? Could simple modifications reduce the severity of these environmental factors?

7. Are there maintenance jobs that are ‘hard to maintain’ because of limited space or other constraints? Does poor accessibility increase the risk of lapses in quality?

8. Does replacing a part risk damaging mating surfaces, seals, alignment or securing, potentially lowering quality?

9. Are the right tools being used? Poor practices, such as using mole-grips instead of wrenches, should be discouraged. Are tools that require calibration within their calibration dates? Are simple procedures, such as the correct sequence for torquing up a set of nuts or bolts, regularly practised?

10. Are maintenance processes captured, used and adhered to? Are these tasks captured in task lists that are issued with work-orders? Are there organisational cultural issues that work against standardised processes?

11. What is the state of training and experience of the workforce? Would they benefit from further training and qualifications?

12. Do you have to strip down other machinery (which may be invasive) to get adequate access to the component you need to work on or change? (In the Navy we called this “work in wake”.)

This list is by no means exhaustive, but if managers draw up such lists, they can work with their front line to determine which issues should be improved first. Setting priorities to eliminate the most common quality-related causes of failure is a great start on the road to defect elimination.
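Once quality-related causes are being recorded, one simple way to set those priorities is a Pareto ranking: count how often each cause appears and focus first on the few that account for most failures. The sketch below is a minimal illustration; the function name `pareto_priorities`, the free-text cause labels and the default 80% cut-off are assumptions for the example, not values taken from the original text.

```python
from collections import Counter


def pareto_priorities(causes, cutoff=0.8):
    """Rank recorded failure causes by frequency (hypothetical helper).

    Returns the smallest list of (cause, count) pairs, most frequent first,
    whose cumulative share of all recorded failures reaches `cutoff`.
    Causes with equal counts keep the order in which they were first seen.
    """
    counts = Counter(causes)
    total = sum(counts.values())
    selected = []
    cumulative = 0
    for cause, n in counts.most_common():
        selected.append((cause, n))
        cumulative += n
        if cumulative / total >= cutoff:
            break
    return selected
```

For example, if misalignment appeared in five of ten recorded failures and contamination in three, those two causes together reach the 80% cut-off and would be tackled first.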

Changes made under the quality approach should also consider how the right data can be collected and what simple analysis could be used to derive metrics that measure the improvement. Improving reliability is always a primary function of the maintenance team and will always align with the needs of an asset-rich business.


Have any of you employed a quality improvement approach or defect elimination? What questions did you ask yourselves that are not on the list above? How did you decide what to improve? We would love to hear about your experiences.
