Probability (P)
It is necessary to look at the cause of a failure mode and the likelihood of its occurrence. This can be done by analysis, by calculations / FEM, or by looking at similar items or processes and the failure modes that have been documented for them in the past. A failure cause is regarded as a design weakness. All potential causes of a failure mode should be identified and documented in technical terms. Examples of causes are: human errors in handling, manufacturing-induced faults, fatigue, creep, abrasive wear, erroneous algorithms, excessive voltage, or improper operating conditions or use (depending on the ground rules used). Each failure mode is given a probability ranking.
| Rating | Meaning |
|---|---|
| A | Extremely Unlikely (virtually impossible, or no known occurrences on similar products or processes with many running hours) |
| B | Remote (relatively few failures) |
| C | Occasional (occasional failures) |
| D | Reasonably Possible (repeated failures) |
| E | Frequent (failure is almost inevitable) |
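As a minimal sketch (Python, with hypothetical class and field names), the ranking can be recorded per documented cause and then rolled up to the failure mode. The roll-up rule used here, ranking a mode by its most likely cause, is a conservative assumption and not part of the method described above.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Probability(IntEnum):
    """Probability ranking from the table; a higher value means more likely."""
    A = 1  # extremely unlikely
    B = 2  # remote
    C = 3  # occasional
    D = 4  # reasonably possible
    E = 5  # frequent


@dataclass
class Cause:
    description: str          # technical description of the design weakness
    probability: Probability  # estimated from analysis, FEM or field history


@dataclass
class FailureMode:
    name: str
    causes: list[Cause] = field(default_factory=list)

    def probability(self) -> Probability:
        # Assumption: rank the mode by its most likely documented cause.
        if not self.causes:
            raise ValueError(f"no causes documented for {self.name!r}")
        return max(cause.probability for cause in self.causes)


# Hypothetical example: two documented causes for one failure mode.
seal_leak = FailureMode("hydraulic seal leak", [
    Cause("creep of the seal material", Probability.B),
    Cause("seal damaged during assembly (handling error)", Probability.C),
])
print(seal_leak.probability().name)  # C
```

Keeping every cause as its own record, rather than only the final letter, preserves the documented design weaknesses for later mitigation work.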
Severity (S)
Determine the severity of the worst-case adverse end effect (state). It is convenient to describe these effects in terms of what the user might see or experience as functional failures. Examples of end effects are: full loss of function x, degraded performance, functions operating in reversed mode, functioning too late, erratic functioning, etc. Each end effect is given a Severity number (S) from, say, I (no effect) to VI (catastrophic), based on cost and/or loss of life or quality of life. Together with probability and detectability, these numbers prioritize the failure modes. A typical classification is given below; other classifications are possible. See also hazard analysis.
| Rating | Meaning |
|---|---|
| I | No relevant effect on reliability or safety |
| II | Very minor, no damage, no injuries, only results in a maintenance action (only noticed by discriminating customers) |
| III | Minor, low damage, light injuries (affects very little of the system, noticed by average customer) |
| IV | Moderate, moderate damage, injuries possible (most customers are annoyed, mostly financial damage) |
| V | Critical (causes a loss of primary function; loss of all safety margins, one failure away from a catastrophe, severe damage, severe injuries, at most one possible death) |
| VI | Catastrophic (product becomes inoperative; the failure may result in complete unsafe operation and possible multiple deaths) |
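The worst-case rule can be sketched as follows. The `Severity` enum, the function name and the example end effects are hypothetical and only mirror the table above; the point is that a failure mode takes the class of its most severe end effect.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Severity classes I-VI from the table; a higher value means worse."""
    I = 1    # no relevant effect
    II = 2   # very minor
    III = 3  # minor
    IV = 4   # moderate
    V = 5    # critical
    VI = 6   # catastrophic


def worst_case_severity(end_effects: dict[str, Severity]) -> tuple[str, Severity]:
    """Return the end effect with the highest severity and its class."""
    if not end_effects:
        raise ValueError("at least one end effect is required")
    effect = max(end_effects, key=end_effects.__getitem__)
    return effect, end_effects[effect]


# Hypothetical end effects of a single failure mode.
effects = {
    "degraded performance": Severity.III,
    "erratic functioning": Severity.IV,
    "full loss of function x": Severity.V,
}
effect, severity = worst_case_severity(effects)
print(effect, severity.name)  # full loss of function x V
```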
Detection (D)
This describes the means or method by which a failure is detected and isolated by the operator and/or maintainer, and the time this may take. It is important for maintainability control (availability of the system) and especially important for multiple-failure scenarios. These may involve dormant failure modes (e.g. no direct system effect, because a redundant system or item automatically takes over, or because the failure is only a problem during specific mission or system states) or latent failures (e.g. deterioration mechanisms, such as a growing crack in metal that has not yet reached critical length). It should be made clear whether the failure mode or cause can be discovered by an operator during normal system operation, or only by the maintenance crew through some diagnostic action or an automatic built-in system test. A dormancy and/or latency period may be entered.
| Rating | Meaning |
|---|---|
| 1 | Certain - fault will be caught on test |
| 2 | Almost certain |
| 3 | High |
| 4 | Moderate |
| 5 | Low |
| 6 | Fault is undetected by Operators or Maintainers |
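One possible way to order worksheet rows for review is sketched below. The sort key (worst severity first, then most likely, then hardest to detect) is an assumption chosen for illustration, and the row fields and example failure modes are hypothetical. Rows rated 6 are also listed separately, since undetected failures are the ones that need a dormancy period and second-failure analysis.

```python
from dataclasses import dataclass

PROBABILITIES = "ABCDE"  # A = extremely unlikely ... E = frequent


@dataclass(frozen=True)
class WorksheetRow:
    failure_mode: str
    probability: str  # "A".."E"
    severity: int     # 1..6, standing in for classes I..VI
    detection: int    # 1 = certain to be caught on test .. 6 = undetected

    def __post_init__(self) -> None:
        if len(self.probability) != 1 or self.probability not in PROBABILITIES:
            raise ValueError(f"bad probability rating {self.probability!r}")
        if not 1 <= self.severity <= 6 or not 1 <= self.detection <= 6:
            raise ValueError("severity and detection must be in 1..6")


def review_order(rows: list[WorksheetRow]) -> list[WorksheetRow]:
    # Assumed ordering: worst severity first, then most likely,
    # then hardest to detect (higher detection number).
    return sorted(
        rows,
        key=lambda r: (-r.severity, -PROBABILITIES.index(r.probability), -r.detection),
    )


rows = [
    WorksheetRow("valve sticks open", "B", 5, 2),
    WorksheetRow("sensor drift", "C", 3, 6),
    WorksheetRow("connector corrosion", "C", 5, 4),
]
print([r.failure_mode for r in review_order(rows)])
# ['connector corrosion', 'valve sticks open', 'sensor drift']

# Undetected failure modes need a dormancy period and second-failure analysis.
print([r.failure_mode for r in rows if r.detection == 6])  # ['sensor drift']
```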
Dormancy or Latency Period
The average time that a failure mode may be undetected may be entered if known. For example:
- Seconds, auto detected by maintenance computer
- 8 hours, detected by turn-around inspection
- 2 months, detected by scheduled maintenance block X
- 2 years, detected by overhaul task x
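To see why the dormancy period matters, consider a hidden failure with a constant failure rate λ that is only found by a perfect inspection every τ hours. The average probability that the item is already failed at a random moment (its mean unavailability) is 1 - (1 - exp(-λτ))/(λτ), which is approximately λτ/2 when λτ is small. The sketch below assumes a hypothetical λ = 1e-5 per hour and converts the periods listed above using 730 h per month and 8760 h per year; these conversion constants and the function name are assumptions.

```python
import math

HOURS_PER_MONTH = 730.0   # assumption: average month, rounded
HOURS_PER_YEAR = 8760.0   # assumption: 365-day year


def mean_unavailability(failure_rate: float, interval_h: float) -> float:
    """Average probability that a hidden failure is present, for a constant
    failure rate (per hour) and a perfect inspection every interval_h hours."""
    x = failure_rate * interval_h
    if x == 0.0:
        return 0.0
    # 1 - (1 - exp(-x)) / x, written with expm1 for accuracy when x is small
    return 1.0 + math.expm1(-x) / x


LAMBDA = 1e-5  # hypothetical failure rate of the hidden failure, per hour
periods = {
    "turn-around inspection (8 h)": 8.0,
    "scheduled maintenance (2 months)": 2 * HOURS_PER_MONTH,
    "overhaul (2 years)": 2 * HOURS_PER_YEAR,
}
for name, tau in periods.items():
    exact = mean_unavailability(LAMBDA, tau)
    print(f"{name}: {exact:.2e} (approx. lambda*tau/2 = {LAMBDA * tau / 2:.2e})")
```

The longer the item can stay failed without being noticed, the more likely it is to be unavailable exactly when a redundant function needs it.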
Indication
If the undetected failure allows the system to remain in a safe / working state, a second failure situation should be explored to determine whether or not an indication will be evident to all operators and what corrective action they may or should take.
Indications to the operator should be described as follows:
- Normal. An indication that is evident to an operator when the system or equipment is operating normally.
- Abnormal. An indication that is evident to an operator when the system has malfunctioned or failed.
- Incorrect. An erroneous indication to an operator due to the malfunction or failure of an indicator (i.e., instruments, sensing devices, visual or audible warning devices, etc.).
After these three basic steps, the risk level can be assigned.
Risk level (P*S) and (D)
Risk is the combination of end-effect probability and severity, where probability and severity include the effect of non-detectability (dormancy time). Non-detectability may raise the end-effect probability of failure or the worst-case severity. The exact calculation is not always easy, for example when multiple scenarios (with multiple events) are possible and detectability or dormancy plays a crucial role (as for redundant systems). In such cases Fault Tree Analysis and/or Event Tree Analysis may be needed to determine exact probability and risk levels.
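For simple redundant arrangements, a fault tree can be sketched with AND and OR gates over independent basic events. The example below is hypothetical: a function is lost if the primary channel fails while the backup is already failed without being detected, or if a shared supply fails. Independence of the basic events is an assumption; common-cause failures would need separate treatment.

```python
from math import prod


def and_gate(*probabilities: float) -> float:
    """Output event occurs only if all independent input events occur."""
    return prod(probabilities)


def or_gate(*probabilities: float) -> float:
    """Output event occurs if at least one independent input event occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical basic-event probabilities for one mission.
p_primary_fails = 1e-3     # active channel fails during the mission
p_backup_dormant = 7.3e-3  # backup already failed and undetected (dormancy)
p_shared_supply = 1e-6     # single-point failure of a shared supply

p_loss_of_function = or_gate(
    and_gate(p_primary_fails, p_backup_dormant),
    p_shared_supply,
)
print(f"{p_loss_of_function:.3e}")  # 8.300e-06
```

Shortening the backup's dormancy lowers `p_backup_dormant` and therefore the end-effect probability, which is exactly how detectability feeds into the risk level.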
Preliminary risk levels can be selected using a risk matrix such as the one shown below, based on MIL-STD-882.[24] The higher the risk level, the more justification and mitigation are needed to provide evidence and to lower the risk to an acceptable level. High risks should be reported to higher-level management, who are responsible for the final decision.
| Probability / Severity --> | I | II | III | IV | V | VI |
|---|---|---|---|---|---|---|
| A | Low | Low | Low | Low | Moderate | High |
| B | Low | Low | Low | Moderate | High | Unacceptable |
| C | Low | Low | Moderate | Moderate | High | Unacceptable |
| D | Low | Moderate | Moderate | High | Unacceptable | Unacceptable |
| E | Moderate | Moderate | High | Unacceptable | Unacceptable | Unacceptable |
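The matrix can be encoded as a simple lookup table. The sketch below copies the cells of the table above; the function names are hypothetical, and treating both High and Unacceptable as levels to escalate to higher-level management is an assumption about what "high risk" means.

```python
SEVERITY_CLASSES = ("I", "II", "III", "IV", "V", "VI")

# Rows: probability A..E; columns: severity I..VI (copied from the table).
RISK_MATRIX = {
    "A": ("Low", "Low", "Low", "Low", "Moderate", "High"),
    "B": ("Low", "Low", "Low", "Moderate", "High", "Unacceptable"),
    "C": ("Low", "Low", "Moderate", "Moderate", "High", "Unacceptable"),
    "D": ("Low", "Moderate", "Moderate", "High", "Unacceptable", "Unacceptable"),
    "E": ("Moderate", "Moderate", "High", "Unacceptable", "Unacceptable", "Unacceptable"),
}


def risk_level(probability: str, severity: str) -> str:
    """Look up the preliminary risk level; raises KeyError/ValueError on bad input."""
    return RISK_MATRIX[probability][SEVERITY_CLASSES.index(severity)]


def needs_escalation(probability: str, severity: str) -> bool:
    # Assumption: both "High" and "Unacceptable" go to higher-level management.
    return risk_level(probability, severity) in {"High", "Unacceptable"}


print(risk_level("C", "IV"))        # Moderate
print(needs_escalation("A", "VI"))  # True
```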
The FMEA should be updated whenever:
- A new cycle begins (new product/process)
- Changes are made to the operating conditions
- A change is made in the design
- New regulations are instituted
- Customer feedback indicates a problem
Uses
- Development of system requirements that minimize the likelihood of failures.
- Development of designs and test systems to ensure that the failures have been eliminated or the risk is reduced to acceptable level.
- Development and evaluation of diagnostic systems
- To help with design choices (trade-off analysis).
Advantages
- Improve the quality, reliability and safety of a product/process
- Improve company image and competitiveness
- Increase user satisfaction
- Reduce system development time and cost
- Collect information to reduce future failures, capture engineering knowledge
- Reduce the potential for warranty concerns
- Early identification and elimination of potential failure modes
- Emphasize problem prevention
- Minimize late changes and associated cost
- Catalyst for teamwork and idea exchange between functions
- Reduce the possibility of the same kind of failure in the future
- Reduce impact on company profit margin
- Improve production yield
- Maximize profit
References
- Rausand, Marvin; Høyland, Arnljot (2004). System Reliability Theory: Models, Statistical Methods, and Applications. Wiley Series in Probability and Statistics (2nd ed.). Wiley. p. 88.
- Project Reliability Group (July 1990). Koch, John E., ed. Jet Propulsion Laboratory Reliability Analysis Handbook (pdf). Pasadena, California: Jet Propulsion Laboratory. JPL-D-5703. Retrieved 2013-08-25.
- Goddard Space Flight Center (GSFC) (1996-08-10). Performing a Failure Mode and Effects Analysis (pdf). Goddard Space Flight Center. 431-REF-000370. Retrieved 2013-08-25.
- Langford, J. W. (1995). Logistics: Principles and Applications. McGraw Hill. p. 488.
- United States Department of Defense (9 November 1949). MIL-P-1629 - Procedures for performing a failure mode effect and critical analysis. Department of Defense (US). MIL-P-1629.
- United States Department of Defense (24 November 1980). MIL-STD-1629A - Procedures for performing a failure mode effect and criticality analysis. Department of Defense (USA). MIL-STD-1629A.
- Neal, R.A. (1962). Modes of Failure Analysis Summary for the Nerva B-2 Reactor (PDF). Westinghouse Electric Corporation Astronuclear Laboratory. WANL–TNR–042. Retrieved 2010-03-13.
- Dill, Robert; et al. (1963). State of the Art Reliability Estimate of Saturn V Propulsion Systems (PDF). General Electric Company. RM 63TMP–22. Retrieved 2010-03-13.
- Procedure for Failure Mode, Effects and Criticality Analysis (FMECA) (PDF). National Aeronautics and Space Administration. 1966. RA–006–013–1A. Retrieved 2010-03-13.
- Failure Modes, Effects, and Criticality Analysis (FMECA) (PDF). National Aeronautics and Space Administration JPL. PD–AD–1307. Retrieved 2010-03-13.
- Experimenters' Reference Based Upon Skylab Experiment Management (PDF). National Aeronautics and Space Administration George C. Marshall Space Flight Center. 1974. M–GA–75–1. Retrieved 2011-08-16.
- Design Analysis Procedure For Failure Modes, Effects and Criticality Analysis (FMECA). Society of Automotive Engineers. 1967. ARP926.
- Dyer, Morris K.; Dewey G. Little; Earl G. Hoard; Alfred C. Taylor; Rayford Campbell (1972). Applicability of NASA Contract Quality Management and Failure Mode Effect Analysis Procedures to the USFS Outer Continental Shelf Oil and Gas Lease Management Program (PDF). National Aeronautics and Space Administration George C. Marshall Space Flight Center. TM X–2567. Retrieved 2011-08-16.