Sunday, January 10, 2016

Probability (P)

It is necessary to look at the cause of a failure mode and the likelihood of its occurrence. This can be done by analysis, calculation / FEM, or by looking at similar items or processes and the failure modes that have been documented for them in the past. A failure cause is regarded as a design weakness. All potential causes for a failure mode should be identified and documented in technical terms. Examples of causes are: human errors in handling, manufacturing-induced faults, fatigue, creep, abrasive wear, erroneous algorithms, excessive voltage, or improper operating conditions or use (depending on the ground rules used). Each failure mode is given a Probability ranking.
Rating   Meaning
A        Extremely Unlikely (virtually impossible, or no known occurrences on similar products or processes with many running hours)
B        Remote (relatively few failures)
C        Occasional (occasional failures)
D        Reasonably Possible (repeated failures)
E        Frequent (failure is almost inevitable)
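
As an illustration of how such a ranking might be assigned in practice, the sketch below maps an estimated occurrence rate to a rating letter. The numeric thresholds are hypothetical; an actual programme would define its own cut-offs in the FMEA ground rules.

    def probability_rating(failures_per_million_hours: float) -> str:
        """Map an estimated occurrence rate to a Probability rating letter.

        The cut-off values are illustrative only; real programmes define
        their own thresholds in the FMEA ground rules.
        """
        if failures_per_million_hours < 0.01:
            return "A"  # Extremely Unlikely
        if failures_per_million_hours < 1:
            return "B"  # Remote
        if failures_per_million_hours < 10:
            return "C"  # Occasional
        if failures_per_million_hours < 100:
            return "D"  # Reasonably Possible
        return "E"      # Frequent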

Severity (S)

Determine the Severity of the worst-case adverse end effect (state). It is convenient to write these effects down in terms of what the user might see or experience as functional failures. Examples of such end effects are: full loss of function x, degraded performance, functioning in reverse, delayed functioning, erratic functioning, etc. Each end effect is given a Severity number (S) from, say, I (no effect) to VI (catastrophic), based on cost and/or loss of life or quality of life. These numbers prioritize the failure modes (together with probability and detectability). A typical classification is given below; other classifications are possible. See also hazard analysis.
Rating   Meaning
I        No relevant effect on reliability or safety
II       Very minor; no damage, no injuries; only results in a maintenance action (noticed only by discriminating customers)
III      Minor; low damage, light injuries (affects very little of the system; noticed by the average customer)
IV       Moderate; moderate damage, injuries possible (most customers are annoyed; mostly financial damage)
V        Critical (causes a loss of primary function; loss of all safety margins; one failure away from a catastrophe; severe damage, severe injuries, at most one possible death)
VI       Catastrophic (product becomes inoperative; the failure may result in completely unsafe operation and possibly multiple deaths)

Detection (D)

Detection covers the means or method by which a failure is detected and isolated by the operator and/or maintainer, and the time this may take. This is important for maintainability control (availability of the system) and is especially important for multiple-failure scenarios. These may involve dormant failure modes (e.g. no direct system effect while a redundant system or item automatically takes over, or a failure that is only problematic during specific mission or system states) or latent failures (e.g. deterioration failure mechanisms, such as a crack growing in metal but not yet at critical length). It should be made clear how the failure mode or cause can be discovered by an operator under normal system operation, or whether it can be discovered by the maintenance crew through some diagnostic action or an automatic built-in system test. A dormancy and/or latency period may be entered.
Rating   Meaning
1        Certain - fault will be caught on test
2        Almost certain
3        High
4        Moderate
5        Low
6        Fault is undetected by operators or maintainers
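
To show how the Probability, Severity and Detection rankings might sit together on a single worksheet line, here is a minimal sketch of a failure-mode record in Python. The field names and the example entry are invented for illustration and do not come from any real analysis.

    from dataclasses import dataclass

    @dataclass
    class FailureModeEntry:
        """One row of a simplified, hypothetical FMEA worksheet."""
        item: str
        failure_mode: str
        cause: str
        end_effect: str
        probability: str    # A..E, see the Probability table above
        severity: str       # I..VI, see the Severity table above
        detection: int      # 1..6, see the Detection table above
        dormancy: str = ""  # optional dormancy / latency period, if known

    # Illustrative entry only.
    entry = FailureModeEntry(
        item="Hydraulic pump",
        failure_mode="No output pressure",
        cause="Fatigue failure of the drive shaft",
        end_effect="Degraded braking performance",
        probability="B",
        severity="V",
        detection=3,
        dormancy="8 hours, detected by turn-around inspection",
    )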

Dormancy or Latency Period

The average time that a failure mode may remain undetected may be entered, if known. For example:
  • Seconds, auto detected by maintenance computer
  • 8 hours, detected by turn-around inspection
  • 2 months, detected by scheduled maintenance block X
  • 2 years, detected by overhaul task x

Indication

If the undetected failure allows the system to remain in a safe / working state, a second failure situation should be explored to determine whether or not an indication will be evident to all operators and what corrective action they may or should take.
Indications to the operator should be described as follows:
  • Normal. An indication that is evident to an operator when the system or equipment is operating normally.
  • Abnormal. An indication that is evident to an operator when the system has malfunctioned or failed.
  • Incorrect. An erroneous indication to an operator due to the malfunction or failure of an indicator (e.g., instruments, sensing devices, visual or audible warning devices).

This type of analysis is useful for determining how effective various test processes are at detecting latent and dormant faults. The method involves examining the applicable failure modes to determine whether or not their effects are detected, and determining the percentage of the failure rate attributable to the failure modes that are detected. The possibility that the detection means may itself fail latently should be accounted for in the coverage analysis as a limiting factor (i.e., coverage cannot be more reliable than the availability of the detection means). Including detection coverage in the FMEA can mean that each individual failure that would have fallen into one effect category is now split into separate effect categories, depending on the detection coverage possibilities. Another way to include detection coverage is for the FTA to conservatively assume that no holes in coverage, due to latent failure of the detection method, affect detection of all failures assigned to the failure effect category of concern. The FMEA can be revised if necessary for those cases where this conservative assumption does not allow the top-event probability requirements to be met.
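
As a small numerical sketch of the coverage idea described above (all failure rates and the detection-means availability are made-up values): the fraction of the total failure rate belonging to detected failure modes is computed first, and the result is then limited by the availability of the detection means, since coverage cannot be more reliable than the detection means itself.

    # Hypothetical failure modes: (failure rate per hour, detected by built-in test?)
    failure_modes = [
        (2.0e-6, True),
        (5.0e-7, True),
        (1.0e-6, False),  # dormant: not detected in normal operation
    ]

    total_rate = sum(rate for rate, _ in failure_modes)
    detected_rate = sum(rate for rate, detected in failure_modes if detected)
    raw_coverage = detected_rate / total_rate  # fraction of failure rate detected

    # The detection means may itself fail latent; its availability limits
    # how much credit can be taken for coverage.
    detection_availability = 0.98  # assumed value
    effective_coverage = raw_coverage * detection_availability

    print(f"raw coverage: {raw_coverage:.1%}, effective coverage: {effective_coverage:.1%}")
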
After these three basic steps, the Risk level may be determined.

Risk level (P*S) and (D)

Risk is the combination of end-effect probability and severity, where probability and severity include the effect of non-detectability (dormancy time). Non-detectability may influence the end-effect probability of failure or the worst-case effect severity. The exact calculation may not be easy in all cases, such as those where multiple scenarios (with multiple events) are possible and detectability / dormancy plays a crucial role (as for redundant systems). In such cases, Fault Tree Analysis and/or Event Trees may be needed to determine exact probability and risk levels.
Preliminary Risk levels can be selected from a Risk Matrix like the one shown below, based on MIL-STD-882.[24] The higher the Risk level, the more justification and mitigation are needed to provide evidence and lower the risk to an acceptable level. High risk should be indicated to higher-level management, who are responsible for final decision-making.
Probability \ Severity   I             II            III           IV            V             VI
A                        Low           Low           Low           Low           Moderate      High
B                        Low           Low           Low           Moderate      High          Unacceptable
C                        Low           Low           Moderate      Moderate      High          Unacceptable
D                        Low           Moderate      Moderate      High          Unacceptable  Unacceptable
E                        Moderate      Moderate      High          Unacceptable  Unacceptable  Unacceptable
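
A minimal sketch of how the matrix above could be encoded for lookup; the dictionary simply transcribes the table, and the function name is invented for illustration.

    RISK_MATRIX = {
        "A": ["Low", "Low", "Low", "Low", "Moderate", "High"],
        "B": ["Low", "Low", "Low", "Moderate", "High", "Unacceptable"],
        "C": ["Low", "Low", "Moderate", "Moderate", "High", "Unacceptable"],
        "D": ["Low", "Moderate", "Moderate", "High", "Unacceptable", "Unacceptable"],
        "E": ["Moderate", "Moderate", "High", "Unacceptable", "Unacceptable", "Unacceptable"],
    }
    SEVERITIES = ["I", "II", "III", "IV", "V", "VI"]

    def risk_level(probability: str, severity: str) -> str:
        """Look up the preliminary risk level for a probability/severity pair."""
        return RISK_MATRIX[probability][SEVERITIES.index(severity)]

    print(risk_level("B", "V"))  # a Remote (B), Critical (V) failure -> "High"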

Timing

The FMEA should be updated whenever:
  • A new cycle begins (new product/process)
  • Changes are made to the operating conditions
  • A change is made in the design
  • New regulations are instituted
  • Customer feedback indicates a problem

Uses
  • Development of system requirements that minimize the likelihood of failures
  • Development of designs and test systems to ensure that failures have been eliminated or the risk has been reduced to an acceptable level
  • Development and evaluation of diagnostic systems
  • Support for design choices (trade-off analysis)

Advantages
  • Improve the quality, reliability and safety of a product/process
  • Improve company image and competitiveness
  • Increase user satisfaction
  • Reduce system development time and cost
  • Collect information to reduce future failures and capture engineering knowledge
  • Reduce the potential for warranty concerns
  • Identify and eliminate potential failure modes early
  • Emphasize problem prevention
  • Minimize late changes and their associated cost
  • Act as a catalyst for teamwork and idea exchange between functions
  • Reduce the possibility of the same kind of failure recurring in the future
  • Reduce the impact on company profit margin
  • Improve production yield
  • Maximize profit

