I was recently reading The Grid by Gretchen Bakke where a discussion of massive power plant brownouts led to discussing two approaches to industrial accidents:
One might be given to think that this blackout might have been prevented if somebody had just noticed as things slowly went awry – if in 2002 all of FirstEnergy’s “known common problems” had been dealt with rather than merely 17 percent of them, if the trees had been clipped, if a bright young eye had seen the static in the screen. But what most students of industrial accidents recognize is that perfect knowledge of complex systems is not actually the best way to make these systems safe and reliable. In part because perfect real-time knowledge is extremely difficult to come by, not only for the grid but for other dangerous yet necessary elements of modern life – like airplanes and nuclear power plants. One can just never be sure that every single bit of necessary information is being accurately tracked (and God knows what havoc those missing bits are wreaking while the presumed-to-be-known bits chug along their orderly way). Even if we could eliminate all the “unknown unknowns” (to borrow a phrase from Donald Rumsfeld) from systems engineering – and we can’t – there would still be a serious problem to contend with, and that is how even closely monitored elements interact with each other in real time. And of course humans, who are always also component parts of these systems, rarely function as predictably as even the shoddiest of mechanical elements.
Rather than attempting the impossible feat of perfect control grounded in perfect information, complex industrial undertakings have for decades been veering toward another model for avoiding serious disaster. This would also seem to be the right approach for the grid, as its premise is that imperfect knowledge should not impede safe, steady functioning. The so-called Swiss Cheese Model of Industrial Accidents assumes glitches all over the place, tiny little failures or unpredicted oddities as a normal side effect of complexity. Rather than trying to “know and control,” systems designers attempt to build, manage, and regulate complexity in such a way that small things are significantly impeded on their path to becoming catastrophically massive things. Three trees and a bug shouldn’t black out half the country. (p.135-136)
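The intuition behind layered defenses can be put in back-of-the-envelope terms (my illustration, not Bakke's): if each imperfect safeguard independently misses a given fault with some probability, the chance of a fault slipping through every layer shrinks multiplicatively. The independence assumption and the sample numbers below are hypothetical, chosen only to show the arithmetic.

```python
# Rough sketch of the Swiss Cheese Model's arithmetic.
# Assumption (mine, not from the book): each defensive layer
# independently fails to catch a given fault with some probability.

def p_catastrophe(layer_miss_probs):
    """Probability that a fault slips through every layer,
    assuming the layers fail independently of one another."""
    p = 1.0
    for miss in layer_miss_probs:
        p *= miss
    return p

# Four imperfect layers, each missing 10% of faults:
print(p_catastrophe([0.1, 0.1, 0.1, 0.1]))  # roughly 1 in 10,000
```

The caveat is the independence assumption: a common cause (an untrimmed tree line, a shared software bug) can line up the holes in several layers at once, which is exactly the failure mode the 2003 blackout exemplifies.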
Social systems today are increasingly complex – see a recent post about the increasing complexity of cities – and we have more and more data about both the components of these systems and the systems as wholes. However, as this example illustrates, humans don’t always know what to do with all this data or see the necessary patterns.
The Swiss Cheese Model seems to privilege redundancy and resiliency over stopping all problems. At the same time, I assume there are limits to how many holes in the cheese are allowed, particularly when millions of residents might be affected. Who sets that limit and how is that decision made? We’ll accept a certain number of electrical failures each year but no more?