Failure Investigations

1Q. What does a failure investigation consist of?

1A. There are many types, but for machinery or pressure vessels a failure investigation usually begins with a Statement of the Problem (what happened), followed by Collection of Data: conducting interviews, reviewing logs and computer records, and performing analytical and metallurgical analysis. It is a good idea to determine what changed before the failure occurred. With this data we can Form a Hypothesis or, put another way, Make a Best Guess as to what happened. We then Compare the Best Guess Against the Data to see if it makes sense. If it does, we have found a Most Probable Cause; if it doesn't, we gather additional data until it does. Finally, we implement a solution so the failure doesn't happen again. In the world of science this is the Scientific Method. As you can see, it is all quite orderly and not unlike the procedure a good detective might follow.
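The loop described above (guess, compare against the data, gather more data if needed) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Hypothesis` class, the `investigate` function, and the observation names and values are hypothetical, and a real investigation weighs evidence with engineering judgment rather than exact matching.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """A best guess plus the observations it predicts (hypothetical structure)."""
    name: str
    predicts: dict  # observation name -> value expected if the guess is right


def consistent(hypothesis, data):
    """True if no collected observation contradicts the hypothesis."""
    return all(data[key] == expected
               for key, expected in hypothesis.predicts.items()
               if key in data)


def investigate(hypotheses, data, collect, max_rounds=5):
    """Compare guesses against the data, collecting more until one (or none) survives."""
    data = dict(data)
    for _ in range(max_rounds):
        surviving = [h for h in hypotheses if consistent(h, data)]
        if len(surviving) <= 1:
            break
        # Observations the surviving guesses depend on that we do not yet have.
        needed = sorted({k for h in surviving for k in h.predicts if k not in data})
        if not needed:
            break
        data[needed[0]] = collect(needed[0])
    return [h for h in hypotheses if consistent(h, data)], data


# Illustrative values only: two competing guesses and one lab finding.
hypotheses = [
    Hypothesis("fatigue", {"fracture_surface": "beach marks"}),
    Hypothesis("ductile overload", {"fracture_surface": "dimples"}),
]
lab_results = {"fracture_surface": "beach marks"}
surviving, evidence = investigate(hypotheses, {}, lab_results.get)
print([h.name for h in surviving])
```

If more than one guess survives and no further data can be collected, the function returns all of them, which mirrors the situation where a single most probable cause cannot be identified.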

2Q. We send the parts out to a metallurgical laboratory for evaluation. What else do we need to do?

2A. It depends on the failure, but when safety or production losses are involved, a failure investigation needs to go quite a bit further. A metallurgical analysis can tell you what failure mode was involved: was it fatigue, corrosion, brittle fracture, ductile failure, sudden failure, wrong material, etc.? What still needs to be pieced together is what could have caused it. Unless you know what caused it, you can't fix it. Suppose a failure in a piping connection is analyzed as a high cycle fatigue failure emanating from a weld defect. What do we do? What caused the vibration? What do we change? Interviews with operations personnel, vibration analysis modeling of the piping system, and stress analysis of the piping joint may also be necessary to solve the problem and to see whether the same failure is possible at other points in the system. A metallurgical evaluation provides only a piece of the puzzle, just as a forensic laboratory provides only partial information in solving a crime.

3Q. We have gone through a failure analysis on a machine, but it is a random-type failure and the investigation hasn't determined the cause. What should we do next?

3A. When everything in 2A has been done and a most probable cause has not been identified, the cause cannot be eliminated and the failure could occur again. Not a comforting thought, but it does happen. In such cases you can either address all the possible causes or make sure the effects of another failure are minimal. A risk analysis of some type might be in order here. Adding instrumentation for continuous monitoring of vibration, strain, pressure spikes, etc. may also help identify the cause, provided it captures the event. Unfortunately, the failure usually occurs again several months later, when the instrumentation is no longer functioning. Random failures are tough.
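One way to improve the odds that monitoring captures a rare event is to keep a short rolling history of readings and save it whenever a reading crosses a trigger level. The sketch below assumes a single sensor and a simple absolute threshold. The `EventRecorder` class, the threshold, and the sample values are hypothetical and not tied to any particular monitoring system.

```python
from collections import deque


class EventRecorder:
    """Keep the last few readings and snapshot them when a spike occurs."""

    def __init__(self, threshold, pre_samples=5):
        self.threshold = threshold                # trigger level (assumed units)
        self.history = deque(maxlen=pre_samples)  # rolling pre-event buffer
        self.events = []                          # captured spikes with context

    def add(self, timestamp, value):
        """Record one reading; capture the lead-up if it exceeds the threshold."""
        if abs(value) >= self.threshold:
            self.events.append({
                "time": timestamp,
                "value": value,
                "before": list(self.history),     # readings just before the spike
            })
        self.history.append((timestamp, value))


# Illustrative readings: steady values with one pressure spike at t = 4.
recorder = EventRecorder(threshold=10.0, pre_samples=3)
for t, reading in enumerate([1.0, 2.0, 3.0, 4.0, 12.0, 2.0]):
    recorder.add(t, reading)

for event in recorder.events:
    print(event["time"], event["value"], event["before"])
```

Because the buffer is small and only the events are stored, a recorder of this kind can be left running for long periods, which matters when the next failure may be months away.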

4Q. What are a couple of good questions to ask to help in troubleshooting?

4A. Has it been a problem since installation, or did the problem start after a downtime, repair, or redesign? The answer to each question helps eliminate possible causes. For example, if vibration of a turbine has always been a problem, then it might be a rotor dynamics problem. If it started after several years, it might be fouling, a missing blade, or an operational change.
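The turbine example above amounts to a small lookup from "when did it start?" to a short list of suspects. A minimal sketch follows. The function name and onset labels are hypothetical, the first two entries come from the example above, and the third entry is an assumption about where to look rather than a list of causes.

```python
def candidate_causes(onset):
    """Map the onset of turbine vibration to first suspects (illustrative only)."""
    suspects = {
        "since_installation": ["rotor dynamics"],
        "after_years_of_service": ["fouling", "missing blade", "operational change"],
        # Assumption: review whatever was changed during the work.
        "after_downtime_repair_or_redesign": ["changes made during that work"],
    }
    if onset not in suspects:
        raise ValueError(f"unknown onset: {onset!r}")
    return suspects[onset]


print(candidate_causes("since_installation"))
print(candidate_causes("after_years_of_service"))
```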