The following is from Nuclear Safety magazine, Vol. 32, #2 (April-June 1991), p. 285, summarizing a Nuclear Regulatory Commission report about the Calvert Cliffs Nuclear Power Plant in Maryland:
An NRC report, issued in May, 1989, found that operators at the plant did not respond to the interest in safety that swept the industry after the 1979 Three Mile Island accident. The report stated that workers were encouraged to cut corners to keep reactors running; this led to a series of problems that ended in the death of an employee who drowned in a water storage tank after disregarding safety requirements. The report concluded that workers had seriously deviated from one of the industry's highest priorities: following procedures.
To say that "workers were encouraged to cut corners" is probably an overstatement if interpreted literally. A main goal of reactor management is to keep the reactors running, since a shutdown costs millions of dollars. That's why managers are paid well. But managers who want to keep their jobs would never just give an order to "cut corners." They don't have to. Just push hard for performance, and the workers will figure out how to cut corners by themselves. And the series of problems won't "end" with the death of the employee, no matter how much the NRC might wish that to be the case. Running a plant of any kind involves a balance between efficiency and safety: the highest efficiency occurs when the plant is just barely in control. The only difference with a nuclear power plant is that the consequences of finding the wrong balance can be much more serious than an individual drowning.
I don't doubt that the employee drowned because he was not following good safety practices. But safety involves a lot more than just following procedures. Writing a procedure for normal operation is straightforward. The problem comes in when you try to describe what to do about contingencies. A fault tree diagram (see simple example at right), which traces out the consequences of an abnormal condition, helps here, but there are thousands of critical components in a nuclear power plant, many of which affect each other, so a fault tree quickly becomes extremely complicated. It can be pared down by considering only credible combinations of faults, but the line between a credible incident and one that's too unlikely to consider is not clear. After all, the operators at Three Mile Island were following procedures. It's just that they ran into a combination of circumstances that no procedure writer had anticipated. Here's a brief summary of what went wrong.
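To put a rough number on how fast the combinations grow before turning to that summary, the sketch below simply counts how many distinct sets of simultaneous failures exist among a given number of components. The figure of 1,000 components is an assumption chosen for illustration (the text says only "thousands"), and the function name is hypothetical. A real fault tree analysis weighs probabilities and dependencies rather than just counting.

```python
from math import comb


def fault_combinations(n_components: int, max_faults: int) -> dict[int, int]:
    """Count the distinct sets of exactly k simultaneous faults, for k = 1..max_faults."""
    return {k: comb(n_components, k) for k in range(1, max_faults + 1)}


# Assumed for illustration: 1,000 critical components, up to 3 faults at once.
counts = fault_combinations(1000, 3)
for k, n in counts.items():
    print(f"{k} simultaneous fault(s): {n:,} combinations")
```

Even when limited to three simultaneous faults, the count passes 166 million, which is why the tree has to be pared down to "credible" combinations, and why the paring is a judgment call.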
Three Mile Island (TMI) Accident, 1979:
A pressurized water reactor (PWR), the type used at TMI, is a three-loop system. The reactor cooling water (Primary Coolant Loop, shown in red) continuously recirculates. When it leaves the reactor core, it's at a high temperature: about 600 degrees F. It doesn't boil, though, because it's also under high pressure (2250 psi). The hot, high-pressure water circulates through a large number of small tubes in the steam generator, a heat exchanger that boils a second loop of water (blue) into steam (light blue), which drives the turbine that generates electricity. After going through the turbine, this steam is condensed back to water by a third loop of plain cooling water (from the Susquehanna River, in the case of TMI).
In the 1979 TMI accident, a pressure relief valve opened in response to an abnormal plant condition and then stuck open, but the indication in the control room was that it had closed. It later turned out that the indicator showed only that the valve had been told to close, not that it had physically done so. This was a flaw in the design of the indicator system, probably caused by some design engineers cutting corners. They, in turn, were cutting corners under pressure to produce a valve system that would meet specifications and provide a profit in competition with other manufacturers. Evidently no one at the manufacturer, the utility, or the NRC asked the obvious question: what use is the indicator if the valve doesn't close when commanded?
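The difference between indicating the command and indicating the valve's position can be shown with a toy model. This is a minimal sketch, not a description of the actual TMI hardware: the class, its fields, and the two indicator functions are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReliefValve:
    """Toy valve that can stick in whatever position it is currently in."""
    commanded_open: bool = False
    actually_open: bool = False
    stuck: bool = False

    def command(self, open_: bool) -> None:
        self.commanded_open = open_
        if not self.stuck:  # a stuck valve ignores the command
            self.actually_open = open_


def light_from_command(valve: ReliefValve) -> str:
    """Indicator wired to the control signal (the flawed design described above)."""
    return "OPEN" if valve.commanded_open else "CLOSED"


def light_from_position(valve: ReliefValve) -> str:
    """Indicator wired to a position sensor on the valve itself."""
    return "OPEN" if valve.actually_open else "CLOSED"


valve = ReliefValve()
valve.command(True)    # valve opens to relieve pressure
valve.stuck = True     # then it sticks
valve.command(False)   # control system tells it to close

print(light_from_command(valve))   # CLOSED
print(light_from_position(valve))  # OPEN
```

The two indicators agree as long as the valve works, and disagree only in the failure case, which is exactly when the operators need the reading.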
Operators had been instructed never to overfill the pressurizer, because they would not be able to control pressure further if it were completely full. What neither the designers nor the NRC had considered was that, if the pressure decreases because a valve is stuck open while the temperature increases, the water in the level sensor's standpipe boils away. This happened during the accident: although the pressurizer was empty, so was the standpipe, so the level sensor gave the same indication as if the standpipe were full, as it normally is, and the pressurizer were completely full. Therefore the operators did not add water, and the reactor core, no longer immersed in water, overheated and began to melt.
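The ambiguity in the level reading can be reproduced with a very simplified model. It assumes the sensor measures only the difference between the water column in the standpipe and the one in the pressurizer, and is calibrated on the assumption that the standpipe is always full. Levels are expressed as fractions of full height, and the function name is hypothetical.

```python
def indicated_level(tank_level: float, standpipe_level: float) -> float:
    """Level fraction a differential gauge would report.

    The gauge sees only the difference between the two water columns and
    is calibrated as if the standpipe were always full (1.0).
    """
    differential = standpipe_level - tank_level  # the only thing the sensor measures
    return 1.0 - differential                    # calibration assumes standpipe == 1.0


normal = indicated_level(tank_level=0.5, standpipe_level=1.0)    # half full, reads 0.5
full = indicated_level(tank_level=1.0, standpipe_level=1.0)      # truly full, reads 1.0
accident = indicated_level(tank_level=0.0, standpipe_level=0.0)  # both empty, also reads 1.0

print(normal, full, accident)
```

In this model a completely full pressurizer and a completely empty one with a boiled-dry standpipe produce the same reading, so the gauge alone cannot tell the two conditions apart.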
The absolute pressure sensors in the loop and a temperature sensor inside the pressurizer ("TE" on the diagram) showed, correctly, that the pressure was low and the temperature was rising to dangerous levels. Faced with contradictory information, the operators had to choose which instruments to believe. They had no procedures to help them decide, since this had not been considered a credible occurrence when the procedures were written.
Operators in the control room are alerted to instrument alarm conditions by annunciators, which use alarm sounds and flashing lights (the rows of square red lights at the top of the control panel shown to the right are annunciators). Over 100 of these annunciators went off at about the same time, with no information as to which were critical alarms, or which alarm conditions were part of the basic problem and which were secondary effects.
With too much conflicting information, and a large number of flashing lights, the operators made the best judgements they could. Then, fortunately, a new shift of operators came on duty. A massive disaster was prevented only because one of the fresh operators recognized what was really happening, and took the right steps to correct the condition: adding coolant water. It was too late to save the reactor, but in time to save the community (per the accident report, the reactor was within about a half-hour of an out-of-control complete meltdown). The new operator had no procedures to guide him. He figured out what to do because he brought a fresh perspective, and perhaps his understanding about how the reactor was supposed to work was above his pay grade.
The problem with TMI wasn't really a lack of procedures. You could say it was a failure of imagination on the part of the designers and regulators, who failed to consider all possible abnormalities, but the combinations are almost infinite, and all real-life accidents are the result of unforeseen combinations of events. The real root cause of the emergency, which was not mentioned in the accident report, was management's urge to achieve economies of scale. A 250 megawatt reactor, when shut down, will cool off by itself without water circulation, so it's intrinsically safe. A reactor that generates almost 1000 megawatts, like the one at TMI, costs much less on a per-watt basis, but requires that the reactor be supplied with lots of cooling water after shutdown to keep itself from melting. That's an intrinsically unsafe condition, and the surprising thing, in retrospect, is that only two of these monsters have consumed themselves (the reactor at Chernobyl was also rated at about 1000 megawatts). There's basically no difference between efficiency and greed, which is the basis for some economists' conclusion that "greed is good."
It's easy to see in retrospect how this accident could have been prevented, but the errors occurred in the design phase, not in either the writing or following of procedures. If an engineer can't write procedures to cover all emergencies, it's also hard to imagine how one could write procedures that prevent workers from hurting themselves under ordinary, boring, non-emergency circumstances, such as the drowning described in the first paragraph. Some of the items might be:
1. Don't put sharp objects in your eyes or nose.