Simple-Complex Systems
In modern life, we're used to little things failing: a computer program crashes, a phone battery stops holding its charge, a wire frays, power goes out in a storm. People who have studied or lived through disasters have learned that these little failures can have massive unforeseen consequences. The TV show The Expanse has a wonderfully clear explanation of these "simple-complex systems".
The Expanse season 2, episode 10 "The Cascade"
I've mentioned The Expanse before. The context for this explanation is Prax (botanist) and Amos (starship mechanic) searching through Ganymede Station (Ganymede is a moon of Jupiter; the station is dug into the moon). Prax notices that some of the plants used to scrub the air and create oxygen look sick, and decides to investigate.
(Thanks to the Expanse Fandom community for this transcript; emphasis added by me)
Prax: They're using distilled water in the hydroponic supply instead of the proper mineral solutions needed for long-term stability.
Amos: That sounds bad.
Prax: They'll only be able to get away with it for another week, maybe two. After that, the air, the scrubbing plants, what's left of them, will die off. When that happens, they won't be able to stop the cascade.
Amos: What's the cascade?
Prax: In real nature, there's enough diversity to cushion an ecosystem when something catastrophic happens. Nothing that we build, our ships, our stations, has that depth. Now in an artificial ecosystem, when one thing goes wrong, there's only a certain amount of pathways that can compensate for it. Eventually those pathways get overstressed, and then they fail. Which leaves fewer pathways, and then they'll get overstressed and then they fail.
Amos: So it's not the thing that breaks you that you need to watch out for.
Prax: Exactly. And Ganymede is a simple complex system. Because it's simple, it's prone to cascades, and because it's complex, you can't predict what's going to break down next or how.
Amos: Yeah, but Ganymede is the most important food station out here. They're not going to let it just collapse.
Prax: This station is dead already. They just don't know it yet.
Real life
Many of the systems in our lives are simple-complex: they have less redundancy than nature, but enough complexity that we can't catch every potential failure path. At the innocuous end, a computer might hang and crash the operating system because of some unexpected interaction between a user and a program. At the more severe end is the 1975 Browns Ferry nuclear incident: workers conducting a safety inspection were checking for air leaks with a small candle...which started a fire that nearly caused the meltdown of one reactor. The fire broke out in a critical location where many control cables converged (because people would prefer fewer holes in a reactor containment vessel), which meant the system's redundancies and backups shared a single point of failure.
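To make the cascade mechanism concrete, here's a toy sketch of that pathway-overload loop. It's my own illustration, not a model of any real station: a fixed demand is split evenly across a few redundant pathways, and whenever a pathway fails, its share lands on the survivors.

```python
# Toy cascade: a fixed demand is shared evenly across redundant pathways.
# When a pathway is pushed past its capacity it fails, and its share is
# redistributed to the survivors -- which may push them over, too.
# All numbers below are invented purely for illustration.

def simulate_cascade(capacities, total_demand):
    """Return (failures in order, surviving pathways) for an evenly split load."""
    alive = dict(enumerate(capacities))
    failures = []
    while alive:
        share = total_demand / len(alive)           # demand split evenly
        overloaded = [i for i, cap in alive.items() if share > cap]
        if not overloaded:
            break                                   # survivors can carry the load
        for i in overloaded:
            failures.append((i, alive.pop(i)))      # record and remove failed paths
    return failures, alive

if __name__ == "__main__":
    # A demand of 105 split five ways (21 each) just exceeds the weakest capacity (20),
    # so one pathway fails -- and each failure raises the load on everything left.
    failures, survivors = simulate_cascade([30, 25, 22, 21, 20], total_demand=105)
    for i, cap in failures:
        print(f"pathway {i} (capacity {cap}) overloaded and failed")
    print("survivors:", survivors if survivors else "none -- total collapse")
```

The numbers don't matter; the shape does. Each failure raises the stress on everything that's left, which is why the second and third failures are so hard to predict from the first.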
American Airlines flight 96 in 1972 and Turkish Airlines flight 981 in 1974 were accidents in which a bent cargo door handle and its linkages let the door appear latched when it wasn't; when the door failed in flight, the resulting decompression collapsed the cabin floor onto the cables running beneath it, cutting off control of one of the three engines and several aerodynamic control surfaces. Self-driving vehicles get in accidents that humans would avoid. Seemingly recoverable failures or errors can compound in unexpected ways.
Modern systems design is about making large and/or critical systems as redundant (rather than "simple") as possible, as the complexity is often irreducible. As system designers or users, we seek multiple ways to perform a task or ensure something is done; ideally these options should include some widely-separated choices. Navigating on a road trip doesn't just mean downloading two different map apps to your phone; it can mean packing along a second device or even a paper atlas. In a power outage, communications with the outside world might be a cell phone (if local towers still work), a landline (if the local phone exchange still works), driving / biking / walking to somewhere with power, or meeting at a community location.
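In code, that instinct looks less like two copies of the same mechanism and more like a chain of widely-separated fallbacks. Here's a minimal sketch; every function and pathway name below is a hypothetical stand-in for whatever your real options are.

```python
# Minimal fallback chain: try widely-separated options in order rather than
# relying on two near-identical copies of the same mechanism.
# Every name below is a hypothetical placeholder.

def first_that_works(options):
    """Try each (name, attempt) pair in order; return the first success."""
    errors = []
    for name, attempt in options:
        try:
            return f"{name}: {attempt()}"
        except Exception as exc:                 # a real system would be pickier about what it catches
            errors.append(f"{name} failed ({exc})")
    raise RuntimeError("all pathways failed: " + "; ".join(errors))

# Stand-ins for the power-outage example above.
def cell_phone():    raise ConnectionError("local towers down")
def landline():      raise ConnectionError("exchange has no power")
def drive_to_town(): return "message delivered in person"

if __name__ == "__main__":
    print(first_that_works([
        ("cell phone", cell_phone),
        ("landline", landline),
        ("drive to town", drive_to_town),
    ]))
```

The useful part is that the options fail for different reasons: a second map app on the same dead phone isn't really a second pathway.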
I've done some thinking on stressed systems before, like suddenly ramping up production in wartime or upgrading super-optimized aircraft. As much as possible, I stress preserving optionality and building for iteration so that systems can evolve as their environments and requirements change. And remember that no one does it alone: everyone needs a team, and that's as true for spotting the various failure paths (to avoid them) as for stopping cascades early (before they become unstoppable).