Introduction
Failure Mode and Effects Analysis (FMEA) is the heart of the RCM process. It's where you systematically document how equipment can fail and what happens when it does. Well-executed FMEA drives effective maintenance strategy; poor FMEA leads to missed failures and wasted resources.
This guide walks you through creating effective FMEA documentation step by step. We'll use a practical example throughout, a motor-driven centrifugal pump, to illustrate each stage.
What is FMEA?
FMEA is a structured approach to identifying:
- •How an item can fail (failure modes)
- •What causes each failure (mechanisms)
- •What happens when it fails (effects)
- •How severe and likely each failure is
RCM FMEA vs Design FMEA
There are different types of FMEA. Design FMEA (DFMEA) is used during product development to identify potential design weaknesses. Process FMEA (PFMEA) examines manufacturing processes.
RCM FMEA (sometimes called operational FMEA) focuses on:
- •Equipment already in service
- •The specific operating context
- •Maintenance task selection as the output
Before You Start: Preparation
Good preparation makes analysis sessions far more productive.
1. Define the System Boundaries
Be clear about what's included and excluded. For our pump example:
Included:
- •Pump casing and internals (impeller, wear rings, shaft, seals)
- •Drive motor
- •Coupling
- •Baseplate and foundation bolts
- •Local instrumentation (pressure gauge, flow indicator)
- •Suction strainer
Excluded:
- •Upstream and downstream piping (separate system)
- •Electrical supply (covered under electrical distribution)
- •Control system (covered under DCS/PLC analysis)
- •Lubrication supply system (separate system)
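The boundary list above can also be kept as data, so later steps can check whether a component belongs in this analysis. This is a minimal sketch: the names `PUMP_BOUNDARY` and `in_scope` are hypothetical, and the component names are taken from the pump example.

```python
# Hypothetical representation of the pump system boundary from the example.
PUMP_BOUNDARY = {
    "included": {
        "pump casing and internals",
        "drive motor",
        "coupling",
        "baseplate and foundation bolts",
        "local instrumentation",
        "suction strainer",
    },
    # Excluded items map to the analysis that covers them instead.
    "excluded": {
        "upstream and downstream piping": "separate system",
        "electrical supply": "electrical distribution",
        "control system": "DCS/PLC analysis",
        "lubrication supply system": "separate system",
    },
}


def in_scope(component: str, boundary: dict = PUMP_BOUNDARY) -> bool:
    """Return True if the component is inside the analysis boundary.

    Raises KeyError for components that were never classified, so gaps in
    the boundary definition surface early instead of being silently skipped.
    """
    name = component.strip().lower()
    if name in boundary["included"]:
        return True
    if name in boundary["excluded"]:
        return False
    raise KeyError(f"'{component}' is not classified in the system boundary")
```

Raising an error for unclassified components is a design choice for this sketch: it forces the team to decide explicitly where each item belongs.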
2. Gather Reference Information
Collect before the session:
- •P&IDs and equipment drawings
- •Equipment datasheets and specifications
- •Manufacturer maintenance manuals
- •Operating procedures
- •Maintenance history and failure records
- •Any previous analysis documentation
3. Assemble the Team
The ideal FMEA team includes:
- •Facilitator: Guides the process, documents outputs
- •Operator: Knows how equipment behaves in service
- •Maintainer: Knows how it fails and what's practical
- •Engineer: Understands design intent and failure physics
4. Schedule Adequate Time
For a system like our pump, plan for 3-4 hours. Complex systems may need multiple sessions. Avoid rushing—incomplete analysis is worse than no analysis.
Step 1: Define Functions
Start by listing everything the equipment must do. Functions answer: "What do users expect this equipment to accomplish?"
Primary Functions
These are the main reasons the equipment exists.
Example: Centrifugal Pump Primary Function
| Function No | Function Description |
|---|---|
| 1 | Transfer cooling water from reservoir to heat exchangers at minimum 200 L/min at 5 bar discharge pressure |
Notice that this function statement specifies:
- •What substance (cooling water)
- •From where to where (reservoir to heat exchangers)
- •How much (200 L/min minimum)
- •At what conditions (5 bar)
Secondary Functions
These are additional expectations beyond the primary purpose.
Example: Pump Secondary Functions
| Function No | Function Description |
|---|---|
| 2 | Contain the cooling water (external leaks not to exceed 10 mL/hour) |
| 3 | Allow flow to be isolated when required (suction and discharge isolation valves) |
| 4 | Indicate discharge pressure locally (gauge reading within ±5% of actual) |
| 5 | Indicate running status to control room (run signal within 2 seconds of start) |
| 6 | Operate without excessive vibration (< 4.5 mm/s RMS at bearing housings) |
| 7 | Operate within acceptable temperature limits (bearing temp < 80°C) |
| 8 | Start reliably when called upon |
Tips for Functions
- •Use the format "To [verb] [noun] [performance standard]"
- •Quantify wherever possible
- •Consider safety, environmental, control, and efficiency aspects
- •Ask operators: "What would make you say this isn't working properly?"
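One way to keep function statements in the "To [verb] [noun] [performance standard]" format is to store the three parts separately and assemble the text on demand. The sketch below is illustrative only: the class and field names are assumptions, and the "contains a digit" test is a deliberately crude stand-in for a real review of whether a standard is quantified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionStatement:
    """One row of the function list (field names are illustrative)."""
    number: int
    verb: str
    noun: str
    performance_standard: str  # empty string means "not yet quantified"

    def text(self) -> str:
        # Follows the "To [verb] [noun] [performance standard]" format.
        parts = ["To", self.verb, self.noun, self.performance_standard]
        return " ".join(p for p in parts if p)

    def is_quantified(self) -> bool:
        # Crude check: a quantified standard contains at least one digit.
        return any(ch.isdigit() for ch in self.performance_standard)


primary = FunctionStatement(
    number=1,
    verb="transfer",
    noun="cooling water from reservoir to heat exchangers",
    performance_standard="at minimum 200 L/min at 5 bar discharge pressure",
)
start = FunctionStatement(8, "start", "reliably when called upon", "")
```

Here `start.is_quantified()` is false, which flags Function 8 from the table as a candidate for a measurable standard.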
Step 2: Identify Functional Failures
For each function, determine how it can fail. Functional failures are states where the function is no longer fulfilled to the required standard.
Example — Functional Failures
| Function | Functional Failure Code | Functional Failure Description |
|---|---|---|
| 1 (Transfer water at 200 L/min at 5 bar) | 1A | Unable to transfer any water |
| 1 (Transfer water at 200 L/min at 5 bar) | 1B | Unable to transfer water at 200 L/min (reduced flow) |
| 1 (Transfer water at 200 L/min at 5 bar) | 1C | Unable to maintain 5 bar discharge pressure |
| 2 (Contain water, <10 mL/hour leak) | 2A | External leak exceeding 10 mL/hour |
| 5 (Indicate running status) | 5A | Indicates running when not running |
| 5 (Indicate running status) | 5B | Indicates not running when actually running |
| 8 (Start reliably) | 8A | Fails to start when commanded |
When listing functional failures, consider:
- •Complete loss vs partial loss
- •Fails high vs fails low (for instrumentation)
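The coding scheme in the table (function number plus a letter: 1A, 1B, 1C) is easy to generate consistently. A minimal sketch follows; the helper name `code_functional_failures` is hypothetical.

```python
from string import ascii_uppercase


def code_functional_failures(function_no: int, descriptions: list[str]) -> dict[str, str]:
    """Assign codes such as 1A, 1B, 1C to a function's functional failures."""
    if len(descriptions) > len(ascii_uppercase):
        raise ValueError("more than 26 functional failures for one function")
    return {
        f"{function_no}{letter}": description
        for letter, description in zip(ascii_uppercase, descriptions)
    }


# Functional failures of Function 1, copied from the table above.
pump_transfer_failures = code_functional_failures(
    1,
    [
        "Unable to transfer any water",
        "Unable to transfer water at 200 L/min (reduced flow)",
        "Unable to maintain 5 bar discharge pressure",
    ],
)
```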
Step 3: Identify Failure Modes
For each functional failure, list the specific events that could cause it. These are failure modes—the "what" that goes wrong.
Example — Failure Modes for "Unable to transfer any water"
| Functional Failure | FM | Failure Mode |
|---|---|---|
| 1A (No water transfer) | 1 | Impeller completely worn/eroded |
| 1A | 2 | Impeller loose on shaft/detached |
| 1A | 3 | Shaft sheared |
| 1A | 4 | Motor winding failure (burn out) |
| 1A | 5 | Motor bearing seized |
| 1A | 6 | Coupling failure (sheared/disconnected) |
| 1A | 7 | Suction strainer completely blocked |
| 1A | 8 | Pump casing cracked/split |
| 1A | 9 | Loss of prime (air-bound) |
Finding Failure Modes
Good sources include:
- •Team experience: "What failures have we seen?"
- •Maintenance history: Work order records, failure reports
- •Generic failure mode libraries: Standard lists for equipment types
- •Manufacturer documentation: Known failure modes
- •Industry publications: Papers, case studies, standards
Level of Detail
The right level is where:
- •Different failure modes need different maintenance responses
- •You can describe what happens when it occurs
- •You can assess likelihood and severity
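Failure modes can be recorded against their functional failure so that each one has a stable reference for later steps. The sketch below assumes a combined code of the form "1A.5" (functional failure code, dot, FM number). The numbering follows the FM column of the Step 3 table, and the class and helper names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    functional_failure: str  # e.g. "1A"
    number: int              # FM number within that functional failure
    description: str

    @property
    def code(self) -> str:
        # Combined reference such as "1A.5" (format assumed for illustration).
        return f"{self.functional_failure}.{self.number}"


# Failure modes for functional failure 1A, in the order of the table above.
descriptions_1a = [
    "Impeller completely worn/eroded",
    "Impeller loose on shaft/detached",
    "Shaft sheared",
    "Motor winding failure (burn out)",
    "Motor bearing seized",
    "Coupling failure (sheared/disconnected)",
    "Suction strainer completely blocked",
    "Pump casing cracked/split",
    "Loss of prime (air-bound)",
]
failure_modes_1a = [
    FailureMode("1A", i, text) for i, text in enumerate(descriptions_1a, start=1)
]


def find(modes: list[FailureMode], keyword: str) -> list[FailureMode]:
    """Return failure modes whose description mentions the keyword."""
    return [m for m in modes if keyword.lower() in m.description.lower()]
```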
Step 4: Describe Failure Effects
For each failure mode, describe what happens when it occurs. This is crucial for consequence assessment and maintenance task justification.
What to Include
- 1.Evidence of failure: How would someone know it happened?
- 2.Immediate effects: What happens to the process/production?
- 3.Safety/environmental impact: Any hazards created?
- 4.Secondary damage: Does it damage other equipment?
- 5.Corrective action required: What's needed to restore function?
- 6.Downtime/duration: How long to repair?
Example — Failure Effect
Failure Mode: Motor bearing seized
Failure Effect: "Motor makes increasing noise (high-pitched grinding) for approximately 24-48 hours before seizing. When bearing seizes, motor trips on overload. Control room receives motor trip alarm. Standby pump auto-starts. No safety or environmental impact. If detected early (on noise), bearing can be replaced in situ: 4 hours, two fitters, £50 parts. If motor runs while seized, motor rewind required: remove motor, send to shop, 2 weeks, £1,200. No secondary equipment damage."
Tips for Failure Effects
- •Write in complete sentences
- •Be specific about times, people, costs
- •Distinguish between early detection and run-to-failure scenarios
- •Include evidence that would be apparent to operators/maintainers
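The six elements listed under "What to Include" can serve as a completeness checklist before a failure effect is accepted into the worksheet. This is a sketch under assumed names: the `FailureEffect` class and its fields are not part of any standard template, and the example text is condensed from the motor bearing example.

```python
from dataclasses import dataclass, fields


@dataclass
class FailureEffect:
    evidence: str = ""
    immediate_effects: str = ""
    safety_environmental: str = ""
    secondary_damage: str = ""
    corrective_action: str = ""
    downtime: str = ""

    def missing(self) -> list[str]:
        """Names of elements that are still blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def narrative(self) -> str:
        """Join the completed elements into one paragraph for the worksheet."""
        values = (getattr(self, f.name).strip() for f in fields(self))
        return " ".join(v for v in values if v)


bearing_seized = FailureEffect(
    evidence="Motor makes increasing noise (high-pitched grinding) for approximately 24-48 hours before seizing.",
    immediate_effects="Motor trips on overload; control room receives motor trip alarm; standby pump auto-starts.",
    safety_environmental="No safety or environmental impact.",
    secondary_damage="No secondary equipment damage.",
    corrective_action="Replace bearing in situ if detected early; otherwise motor rewind required.",
)
```

In this state `bearing_seized.missing()` reports `["downtime"]`, prompting the team to add the repair durations before moving on.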
Step 5: Assess Consequences
For each failure mode, categorise the consequences. This directly influences maintenance task selection.
Consequence Categories (SAE JA1011)
| Category | Definition | Implications |
|---|---|---|
| Hidden | Failure not evident under normal circumstances | Proactive task if one is effective; otherwise a failure-finding task is required |
| Safety/Environmental | Could cause injury or environmental breach | Proactive task must reduce risk to acceptable level; redesign if not possible |
| Operational | Affects output, quality, or customer service | Proactive task if cost-effective |
| Non-operational | Only involves repair cost | Run-to-failure often acceptable |
Assessment Questions
Work through these in order:
- 1.Is the failure evident to operators under normal conditions?
- 2.Does the failure cause or contribute to a safety or environmental hazard?
- 3.Does the failure affect operations (output, quality, service)?
Example — Consequence Assessment
| Failure Mode | Evident? | Safety/Env? | Operational? | Consequence |
|---|---|---|---|---|
| Motor bearing seized | Yes (noise, alarm) | No | Minor (standby available) | Non-operational |
| Mechanical seal failure | Yes (visible leak) | No | Minor (continues operating) | Non-operational |
| Relief valve fails to open | No (only evident on demand) | Yes (overpressure possible) | — | Hidden + Safety |
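The three assessment questions can be expressed as a small function that returns the category labels used in the example table. This is a sketch only. It assumes that a "Minor" operational effect, as in the motor bearing row, is answered as `False`, and the function name is hypothetical.

```python
def classify_consequence(
    evident: bool, safety_or_environmental: bool, operational: bool
) -> str:
    """Apply the three assessment questions in order."""
    if not evident:
        # Hidden failures keep the "Hidden" label; a safety/environmental
        # hazard is noted alongside it, as in the relief valve row.
        return "Hidden + Safety" if safety_or_environmental else "Hidden"
    if safety_or_environmental:
        return "Safety/Environmental"
    if operational:
        return "Operational"
    return "Non-operational"


# Rows from the example table ("Minor" operational effect treated as False).
motor_bearing = classify_consequence(evident=True, safety_or_environmental=False, operational=False)
relief_valve = classify_consequence(evident=False, safety_or_environmental=True, operational=False)
```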
Step 6: Select Maintenance Tasks
Based on failure effects and consequences, select appropriate maintenance tasks.
Task Selection Logic
For each failure mode, ask (in order):
For hidden failures:
- 1.Is there an on-condition task that will detect potential failure?
- 2.Is there a scheduled restoration or discard task?
- 3.Is there a failure-finding task?
- 4.Redesign may be necessary
For evident failures:
- 1.Is there an on-condition task that's worth doing?
- 2.Is there a scheduled restoration or discard task that's worth doing?
- 3.For safety/environmental: redesign may be necessary
- 4.For operational/non-operational: run-to-failure may be acceptable
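The two question sequences above can be sketched as a single decision function. The function name, parameter names, and returned strings are assumptions for illustration. The boolean inputs stand for the team's answers to each question, which the code cannot determine on its own.

```python
def select_task(
    consequence: str,
    on_condition_ok: bool = False,
    restoration_or_discard_ok: bool = False,
    failure_finding_ok: bool = False,
) -> str:
    """Walk the task selection questions in order and return the outcome."""
    hidden = consequence.startswith("Hidden")
    # Questions 1 and 2 are the same for hidden and evident failures.
    if on_condition_ok:
        return "On-condition task"
    if restoration_or_discard_ok:
        return "Scheduled restoration or discard"
    if hidden:
        # Question 3 for hidden failures, then the redesign fallback.
        if failure_finding_ok:
            return "Failure-finding task"
        return "Redesign may be necessary"
    if consequence == "Safety/Environmental":
        return "Redesign may be necessary"
    # Operational and non-operational consequences.
    return "Run-to-failure may be acceptable"
```

For example, `select_task("Non-operational", on_condition_ok=True)` returns `"On-condition task"`, matching the reasoning used for the motor bearing.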
Task Types Summary
| Task Type | What It Does | When to Use |
|---|---|---|
| On-condition | Detects potential failure before functional failure | When P-F interval exists and task is practical |
| Scheduled restoration | Restores original capability | When wear-out age is identifiable and most survive to it |
| Scheduled discard | Replaces item | When wear-out age is identifiable and failure unacceptable |
| Failure-finding | Checks if item has failed | For hidden failures only |
| Combination | Multiple tasks together | When single task doesn't address adequately |
| Run-to-failure | No scheduled maintenance | When consequences are acceptable |
Example — Task Selection
Failure Mode: Motor bearing wear leading to seizure
Task selection reasoning:
- •Is there a potential failure condition? Yes: vibration increase, noise, temperature rise
- •Is there a P-F interval? Yes: typically weeks to months
- •Is monitoring practical? Yes: monthly vibration readings take 5 minutes
- •Is it worth doing? Yes: early detection prevents expensive motor rewind
Selected task: Monthly vibration monitoring (portable). Set alert at 4.5 mm/s, action at 7 mm/s. Repair time: 4 hours vs 2 weeks if undetected.
Step 7: Document and Review
The FMEA Worksheet
Standard RCM FMEA uses two worksheets:
- •Information Worksheet: Captures functions, functional failures, failure modes, and failure effects
- •Decision Worksheet: Records consequence assessment, task selection logic, and recommended tasks
Example FMEA Information Worksheet Extract
| Function | Functional Failure | Failure Mode | Failure Effect |
|---|---|---|---|
| 1. Transfer cooling water at 200 L/min at 5 bar | 1A. Unable to transfer any water | 1A.5 Motor bearing seized | Motor makes grinding noise 24-48 hrs before seizing. Motor trips on overload. Standby pump starts. 4 hr repair if detected early (£50), 2 weeks + £1,200 if motor damaged. |
| 1. Transfer cooling water at 200 L/min at 5 bar | 1A. Unable to transfer any water | 1A.6 Coupling failure | Sudden loss of flow. Motor continues running (no load). Control room sees flow alarm and motor amps drop. 2 hr repair, £150 coupling. |
| 2. Contain water (<10 mL/hr leak) | 2A. External leak >10 mL/hr | 2A.1 Mechanical seal worn | Progressive increase in seal leak over weeks. Visible dripping. Pump continues operating. Seal replacement requires pump isolation, 3 hrs, £280 seal kit. |
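Keeping worksheet rows as structured records makes them easier to move into other tools. The sketch below writes rows to CSV using Python's standard `csv` module. The column names mirror the table above, the effect text is abbreviated, and whether a particular CMMS accepts this layout is an assumption to verify against that system's import format.

```python
import csv
import io

FIELDS = ["Function", "Functional Failure", "Failure Mode", "Failure Effect"]

# Two rows abbreviated from the worksheet extract above.
rows = [
    {
        "Function": "1. Transfer cooling water at 200 L/min at 5 bar",
        "Functional Failure": "1A. Unable to transfer any water",
        "Failure Mode": "Motor bearing seized",
        "Failure Effect": "Motor trips on overload. Standby pump starts.",
    },
    {
        "Function": "1. Transfer cooling water at 200 L/min at 5 bar",
        "Functional Failure": "1A. Unable to transfer any water",
        "Failure Mode": "Coupling failure",
        "Failure Effect": "Sudden loss of flow. 2 hr repair, £150 coupling.",
    },
]


def worksheet_to_csv(records: list[dict]) -> str:
    """Serialise Information Worksheet rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # cells containing commas are quoted automatically
    return buffer.getvalue()
```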
Review and Approval
Before implementing, FMEA should be reviewed by:
- •Someone not on the analysis team (fresh eyes)
- •Operations management (will they accept the tasks?)
- •Maintenance management (can they resource the tasks?)
- •Engineering authority (technically sound?)
Common Pitfalls to Avoid
1. Incomplete Functions
Missing secondary functions means missing failure modes. Always consider containment, safety, control, indication, and efficiency.
2. Copying Other Analyses
Generic FMEAs for "pumps" ignore your operating context. Adapt, don't adopt.
3. Risk Ranking as the Output
Some FMEA approaches focus on calculating Risk Priority Numbers (RPN). RCM FMEA focuses on task selection. High-risk items need effective tasks; the ranking itself isn't the point.
4. Stopping at FMEA
FMEA without implementation is just documentation. Every failure mode should result in either a task or a conscious decision to run to failure.
5. One-Time Exercise
Equipment, processes, and knowledge evolve. Review FMEAs after failures, modifications, or at regular intervals.
Key Takeaways
- •Prepare thoroughly: Gather documentation, assemble the right team, define boundaries
- •Start with functions: Clear, quantified functions drive good analysis
- •Be systematic: Work through functional failures, then failure modes for each
- •Write useful failure effects: Include evidence, impact, and repair requirements
- •Assess consequences properly: Hidden, safety, operational, non-operational
- •Select appropriate tasks: Based on P-F intervals, wear-out ages, and consequence severity
- •Document everything: Future reviewers (and future you) will thank you
- •Implement what you analyse: FMEA isn't complete until tasks are in your CMMS
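The last takeaway can be backed by a simple completeness check: every failure mode should carry either a task or an explicit run-to-failure decision. A minimal sketch with hypothetical names follows. The run-to-failure entry for the coupling is invented purely to illustrate the check and is not a recommendation from this guide.

```python
from typing import Optional

RUN_TO_FAILURE = "run-to-failure"


def undecided(decisions: dict[str, Optional[str]]) -> list[str]:
    """Failure modes with neither a task nor an explicit run-to-failure decision."""
    return sorted(
        mode for mode, decision in decisions.items() if not (decision or "").strip()
    )


decisions = {
    "Motor bearing seized": "Monthly vibration monitoring (portable)",
    "Coupling failure": RUN_TO_FAILURE,  # hypothetical decision, for illustration
    "Mechanical seal worn": None,        # not yet decided
}
```

Calling `undecided(decisions)` lists the failure modes that still need a decision before the analysis can be treated as complete.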
Tools and Templates
A well-designed template guides analysis and ensures consistency. Key features to look for:
- •Clear prompts for each field
- •Space for adequate detail in failure effects
- •Decision logic diagram integration
- •Easy transfer to CMMS
A complete template set typically includes:
- •Information Worksheet template
- •Decision Worksheet template
- •RCM Decision Diagram
- •Completed example analysis
- •Quick-start guide
FMEA is a skill that improves with practice. Your first analysis will be slower and rougher than your tenth. The key is to start, learn, and continuously improve. Every completed analysis makes your maintenance programme stronger and builds your team's capability.
Good luck, and thorough analysis!
Reliability HQ
Sharing practical reliability engineering knowledge to help maintenance professionals implement RCM effectively. Based on SAE JA1011 standards and real-world experience.