Maintenance can be a complex domain. There are several categories of maintenance tasks with varying descriptions, including: preventative, corrective, predictive, proactive, scheduled, unscheduled, condition based, and reliability centred maintenance.
In this blog we will put forward a taxonomy of maintenance tasks that not only describes each category, but also explains when and why each one is applicable.
‘BS EN 13306:2017’ is a European standard that defines maintenance terminology. It provides taxonomies of both maintenance types and activities; however, it is not definitive, as it does not fully describe defect elimination or failure finding tasks, nor does it recognise that conducting no-scheduled maintenance (NSM) is a deliberate choice.
The taxonomy presented here is justified because, for each scheduled element, it indicates how the periodicity of the task is derived and which failure patterns the task is applicable to. This breakdown allows the effectiveness and efficiency of the tasks to be measured.
The following diagram shows the taxonomy used, which will be further explained below…
The scheduling of preventative maintenance implies repeating the task at a periodic interval. This could be:
We will now look at each type of preventative maintenance…
Scheduled restoration is a periodically scheduled maintenance intervention that restores the condition of a component to a nominal value. The condition should deteriorate slowly over operating time, and the periodicity is set so that the task falls due before machinery functions are seriously degraded.
An example of a scheduled restoration task is cleaning away the carbon dust produced by the carbon brushes on the commutators or slip rings of rotating electrical machines. The dust accumulates and gradually reduces the insulation resistance of the electrical windings; cleaning it away restores the insulation resistance to its nominal value.
Scheduled replacement is a periodically scheduled maintenance intervention that relies on the condition deteriorating with time or age, where the probability of functional failure increases substantially after a period of useful life. The component cannot be restored on the maintenance line and is swapped out for a new or off-line refurbished component. The old component is replaced regardless of whether it is still functional, which implies that its condition is not measurable while it is in service.
Scheduled restoration and replacement are only applicable if the component’s failure pattern is wear-out. There needs to be an initial period of useful life before the component finally wears out and the probability of failure increases rapidly. The periodicity of the task is set to the end of useful life. Many maintenance regimes mistakenly apply scheduled replacement tasks where the component’s failure pattern is either random or premature. This is wasteful and increases costs for no benefit.
An example of scheduled replacement is a filter that cannot be restored on-line and is changed out for a new one after a period of operation.
Condition based maintenance does not prevent failure; instead it detects the onset of failure or tracks gradual deterioration of condition so we can take early recovery action. It enables us to avoid the worst consequences of full failure.
Predictive maintenance is a type of condition based maintenance where we rely on fixed sensors and have access to the sensor time series data. Because the data monitoring and analysis are continuous, there is no scheduling. The data acquisition and processing are highly automated, so beyond the initial investment in the software the ongoing costs are low. The data is analysed in order to:
An example is a bearing that suffers a shock overload event, initiating a failure that can then be detected using fixed vibration and temperature sensors.
Examples include structural cracking due to cyclic fatigue, which may only be practically measurable once cracks grow to the surface of a structure, where they can be conveniently detected. Corrosion in accessible places, by contrast, may be observed and measured at any time during the service life of the asset.
Take the bearing failure as an example. The initial vibration based symptom used to diagnose the failure may be followed, after some time, by an increase in bearing temperature, because the bearing generates more heat as it deteriorates. The time gap between different symptoms presenting should be predictable, and it yields prognostic information.
With scheduled condition based maintenance the periodicity of the task is not related to useful life; instead, it is a function of the P-F interval. This is the time from initial diagnosis (designating a Potential failure, the P point) to the time of Functional failure, the F point.
The periodicity for non-safety implicated failure modes is half the P-F interval, whilst for safety implicated failure modes it should be one third of the P-F interval. When this is applied effectively, there should be sufficient time (remaining useful life) after the failure is diagnosed to plan and predispose resources for recovery. The time for taking the machine out of service may also be optimised, before final failure, to minimise operational disruption. The recovery task effectively becomes a deferred corrective maintenance task.
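The rule of thumb above is simple enough to sketch in a few lines of Python. This is only an illustration: the function name `inspection_interval` and its parameters are our own invention, and the P-F interval is assumed to be already known, in whatever time unit you schedule in.

```python
def inspection_interval(pf_interval, safety_implicated=False):
    """Return the periodicity of a scheduled condition based task.

    pf_interval: time from Potential failure (P) to Functional failure (F).
    safety_implicated: True if the failure mode has safety consequences.
    The result is in the same time unit as pf_interval.
    """
    if pf_interval <= 0:
        raise ValueError("The P-F interval must be positive")
    # Half the P-F interval normally, one third where safety is implicated.
    divisor = 3 if safety_implicated else 2
    return pf_interval / divisor


# Hypothetical failure mode with a 12 week P-F interval.
print(inspection_interval(12))                          # 6.0
print(inspection_interval(12, safety_implicated=True))  # 4.0
```

Checking at half the P-F interval means a potential failure is found with at least half of the P-F interval left for planning recovery; the one third rule leaves at least two thirds.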
Condition based maintenance is applicable to all patterns of failure and is the only practical choice for the random failure pattern. For the premature failure pattern, the primary maintenance response should be to conduct Root Cause Analysis, but on-condition maintenance may also be done as a palliative action. This attempts to avoid the consequences of final failure until the causes of premature failure have been eliminated.
*NDE is Non-destructive Examination, NDT is Non-destructive testing
Failure finding tasks are especially important for protection, alarm or interlock systems, which can fail without being noticed by the operating and maintenance crew during their normal duties. If the equipment suffers a failure while the protection device that normally guards against that failure has itself failed, the consequences of the two failures are likely to be grave.
For example, a home fire or smoke detection system fails, and a subsequent fire causes casualties through smoke inhalation because no alarm sounded.
The periodicity of failure finding tasks may be determined by calculating the joint probability that both failures occur in a set time period. The periodicity is then chosen so that this probability falls to an acceptable level (usually set by an organisation’s safety policy). Failure finding tasks are usually functional checks, introduced periodically to reduce the risks of undetected failures.
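To make the joint probability idea concrete, here is a minimal Python sketch. It is one possible model, not a definitive method, and it rests on assumptions of our own: both the protective device and the protected function fail at constant, independent rates; a check takes negligible time and leaves the device as good as new; and all function and parameter names are hypothetical.

```python
import math


def mean_hidden_unavailability(check_interval, mtbf_protective):
    """Average fraction of time the protective device sits in a hidden
    failed state between checks (constant failure rate assumed)."""
    if check_interval <= 0:
        return 0.0
    x = check_interval / mtbf_protective
    # Equivalent to 1 - (1 - exp(-x)) / x, written with expm1 for accuracy.
    return 1.0 + math.expm1(-x) / x


def multiple_failure_probability(check_interval, mtbf_protective,
                                 mtbf_protected, period):
    """Probability of at least one multiple failure within `period`:
    the protected function fails while the protective device is failed."""
    expected_events = (period / mtbf_protected) * mean_hidden_unavailability(
        check_interval, mtbf_protective)
    return -math.expm1(-expected_events)  # 1 - exp(-expected_events)


def longest_acceptable_interval(target_probability, mtbf_protective,
                                mtbf_protected, period, max_interval):
    """Longest check interval (up to max_interval) that keeps the multiple
    failure probability at or below target_probability, found by bisection."""
    def risk(interval):
        return multiple_failure_probability(
            interval, mtbf_protective, mtbf_protected, period)

    if risk(max_interval) <= target_probability:
        return max_interval
    low, high = 0.0, max_interval
    for _ in range(100):
        mid = (low + high) / 2
        if risk(mid) <= target_probability:
            low = mid
        else:
            high = mid
    return low


# Hypothetical figures, all in years: protective device MTBF of 10,
# protected function MTBF of 50, and a tolerated probability of
# 1 in 10,000 of a multiple failure in any one year.
interval = longest_acceptable_interval(1e-4, 10, 50, period=1, max_interval=10)
print(f"Check at least every {interval * 365:.0f} days")
```

The risk rises as the check interval lengthens, which is why a bisection search is enough to find the longest interval that still meets the target.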
For the example of the fire alarm there will usually be a test button, or the function can be tested with a match. Manufacturers will recommend the periodicity of the checks.
During product design a great deal of effort is spent reducing intrinsic safety risks to an acceptable minimum. Safety and protection devices are usually designed to ‘fail safe’, or to fail in an obvious way that is easily noticed. Sometimes a fail-safe design is not possible, and redundancy is introduced instead: several separate protection systems, which do not share common failure modes, work in a voting system. This might mean two out of three protection devices have to fail or detect a dangerous measurement to trip the equipment.
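The two out of three voting arrangement can be sketched as follows. This is an illustrative Python example with hypothetical function names. It assumes each channel either demands a trip (because it has failed safe or has detected a dangerous measurement) or does not, and that the channels fail independently, in line with the requirement for no common failure modes.

```python
def two_out_of_three_trip(votes):
    """Return True if the equipment should trip.

    votes: three booleans, one per protection channel. True means the
    channel demands a trip, either because it detected a dangerous
    measurement or because it failed in its fail-safe state.
    """
    if len(votes) != 3:
        raise ValueError("Exactly three channel votes are required")
    return sum(bool(v) for v in votes) >= 2


def probability_voting_cannot_trip(p_channel_dead):
    """Probability that the voting system cannot trip on demand, given the
    probability that any one channel is in a hidden failed state (unable
    to vote). At least two dead channels defeat the vote."""
    p = p_channel_dead
    return 3 * p**2 * (1 - p) + p**3


# A single spurious vote does not trip the equipment; two votes do.
print(two_out_of_three_trip([True, False, False]))  # False
print(two_out_of_three_trip([True, True, False]))   # True
```

The second function shows why hidden failures in the individual channels still matter: the vote is only as dependable as the failure finding checks that keep each channel healthy.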
In the Maintenance gurus’ experience, many organisations do not recognise how vital failure finding tasks are for safety. Many regimes carry out physical safety checks under a different management system from the CMMS and, as a result, these checks are not recognised as maintenance.
In a previous blog we used an example of a pilot walking around their aircraft checking for any possible failures. Although they are not usually recognised as such, these are classic failure finding maintenance tasks.
How many maintenance regimes clearly designate which of their tasks are failure finding, so that nobody mistakenly lengthens the periodicity of the checks because they rarely find problems, inadvertently introducing a safety risk?
These checks are usually fairly frequent; many are done on a daily or per-shift basis. The tasks include checking and replenishing consumables, lubrication and cleaning. Cleaning is very important, as the risks of contamination or fire increase in machinery spaces if it is neglected.
Zonal checks usually involve experienced operators or maintainers walking around sections of their machinery, using their five human senses and experience to look for anything out of the ordinary. This is normal working practice for maintainers. Experienced maintainers develop a tacit skill to notice anomalies which can then be acted on. The proportion of incipient failure found by these practices is surprisingly high. However, these tasks are not captured in the CMMS and are unlikely to be accounted for financially. This can result in an unexpected decrease in reliability if drastic manpower reduction is implemented.
Including no-scheduled maintenance in the taxonomy may seem strange; however, it is important to record that components have been analysed, that the risks of failure are low, and that no practical or cost-effective maintenance task exists. This is an active decision and is justified by the analysis work. If the risks and consequences of failure are unacceptable and maintenance isn’t practical, modifications might be required to reduce the risks to acceptable levels.
Deferred corrective maintenance applies where a failure has occurred but the consequences are small enough for operations to continue, possibly at reduced performance, until recovery can be planned and resources predisposed.
Examples include the failure of redundant equipment, or a failure diagnosed through condition based maintenance where there is enough remaining useful life to reach a planned outage.
Usually, equipment has a routine of being periodically released for planned maintenance, including short, frequent, planned outages or shutdowns. Many deferred corrective maintenance tasks can be completed during these periods.
Immediate corrective maintenance applies where recovery from a failure is urgent because the consequences of the failure cause operational disruption. For mobile equipment, the operating context may dictate that recovery is urgent and must be worked on continuously until completed.
A ship in a storm with loss of propulsion would be an example.
Components that require immediate recovery may be better candidates for predictive maintenance, or for deferred corrective maintenance if the remaining useful life (RUL) is sufficient.
Modifications can include design changes to components that avoid or protect against failures, increasing intrinsic reliability. Modifications can also include changes to maintenance or operations. Perhaps the cadence of frequent planned outages is not optimal for reliability, or the duty cycle is putting too much stress on the equipment. The operating environment may also cause problems and may require modifications to enclose equipment and protect it from environmental stressors.
An example of maintenance that modifies the operating environment is suppressing dust and maintaining smoother road surfaces at an open cast mine site, which reduces vibration for mobile equipment.
Defect elimination is sometimes called proactive maintenance. It is a continuous improvement effort that looks for common preventable causes of failure. These causes can be eliminated by modification or by improving quality. If the causes are problems with design or manufacturing, then suppliers may be contacted to determine whether warranty claims are appropriate, or whether modifications from the manufacturer are possible.
Defect elimination could be run in two basic ways:
1. Root Cause Analysis on all components with a premature failure pattern, or with a random failure pattern where the engineering staff believe the component should have a wear-out pattern.
2. A quality management project approach looking at how operations, maintenance or logistics handling can be improved. This could be part of adopting a ‘Total Quality Maintenance’ approach to maintenance management.
This blog has attempted to clarify maintenance terminology by delving into a task classification taxonomy and explaining when each task type is applicable.
We would love to hear how you classify your maintenance tasks in a CMMS. How do you know if the task types are applicable to the failure behaviour of the components? How do you set appropriate periodicities, using a principled method, for each type of task?
How can we know which failure pattern our component failure modes follow? The answer lies in one of the most powerful techniques we can deploy in reliability engineering: Weibull analysis.
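As a small taster, the Python sketch below estimates the Weibull shape parameter (usually written β) from a handful of failure ages and maps it onto the three failure patterns used in this blog. It is a rough illustration under assumptions of our own: the data is complete (every component ran to failure, with no suspensions), the fit uses median rank regression with Bernard's approximation, and the function names and the tolerance band around β = 1 are hypothetical choices rather than fixed rules.

```python
import math


def fit_weibull(failure_ages):
    """Estimate Weibull shape (beta) and scale (eta) by median rank
    regression on complete failure data."""
    ages = sorted(failure_ages)
    n = len(ages)
    if n < 2 or ages[0] <= 0 or ages[0] == ages[-1]:
        raise ValueError("Need at least two distinct, positive failure ages")
    # Bernard's approximation of the median rank for the i-th failure.
    ranks = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]
    xs = [math.log(t) for t in ages]
    ys = [math.log(-math.log(1.0 - f)) for f in ranks]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                          # slope of the Weibull plot
    eta = math.exp(mean_x - mean_y / beta)    # from the fitted intercept
    return beta, eta


def failure_pattern(beta, tolerance=0.2):
    """Map the shape parameter to a failure pattern. The tolerance band
    around beta = 1 is an assumed value for illustration only."""
    if beta < 1.0 - tolerance:
        return "premature"
    if beta > 1.0 + tolerance:
        return "wear-out"
    return "random"


# Hypothetical failure ages in operating hours.
beta, eta = fit_weibull([410, 560, 650, 720, 790, 860, 940, 1050])
print(f"beta = {beta:.2f}, eta = {eta:.0f} h, pattern: {failure_pattern(beta)}")
```

With real fleet data, most components are still running when the analysis is done, so a practical analysis has to handle suspended (censored) items, which this sketch does not.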