Sep 11, 2012

Infrastructure Resilience

Infrastructure is an increasingly popular term these days, whether we are talking about infrastructure that is failing because of the current economic crisis or worrying about targeted infrastructure when discussing cyber-war and terror attacks.  All advanced societies depend on infrastructure, and the more advanced the society, the higher that dependency and the greater the consequences of failure.

Recently, I have been reading a fascinating book, Resilience, by Andrew Zolli.  He writes about resilience across many levels, but also takes an in-depth dive into the resilience of infrastructure and how it affects our lives in many ways.  In reading the book, I was reminded of some of the lessons I learned after revealing terrorist networks and analyzing infrastructure as networks.

Below is an infrastructure network in a prominent country.  This network is critical to this economy's international trade, helping create a trade surplus for this country.

The layout of this network is typical for many man-made networks: they are built for efficiency, not resilience.  Efficiency affects the network in two ways: when it is functioning as intended, and when it is failing.  The nodes above are connected with an almost minimum number of links so as to avoid redundancy, since engineers often want to keep materials and choices to a minimum in complex systems.  If the network is hit by a random failure, the result is often not catastrophic.  There are many places where the network can fail with just local effects.
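As a rough illustration of "local effects," here is a minimal Python sketch that fails each node of a small network in turn and measures how much of the rest still hangs together.  The toy network (a six-node ring with two dead-end spurs), the node names, and the helper functions are all assumptions made up for this example; it is not the network pictured above.

```python
from collections import deque

# Hypothetical toy network (NOT the one pictured): a six-node ring
# with two dead-end spurs, G hanging off A and H hanging off D.
EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"),
         ("E", "F"), ("F", "A"), ("A", "G"), ("D", "H")]


def build_adjacency(edges):
    """Turn a list of undirected links into a node -> neighbors map."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def largest_component(adj, removed):
    """Size of the largest connected group once the `removed` nodes fail."""
    seen = set(removed)  # failed nodes are never visited
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first walk of one connected piece
            node = queue.popleft()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        best = max(best, size)
    return best


adj = build_adjacency(EDGES)
# Fail each node in turn and record how much of the network survives intact.
impact = {node: largest_component(adj, {node}) for node in sorted(adj)}
for node, size in impact.items():
    print(f"lose {node}: largest connected piece has {size} of {len(adj) - 1} nodes")
```

In this toy case no single failure leaves fewer than 6 of the 7 surviving nodes connected; the worst that happens is that one spur gets cut off.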

Unfortunately, our strong focus on efficiency can be used against us in terror attacks and cyber-war.  Although efficient networks can handle random failures, they are extremely brittle when faced with intelligent, targeted attacks on the nodes and links that maintain the network's connectivity.  The network above is both highly efficient and highly brittle.  You need to disable only two nodes for the network to break apart into unconnected components, stopping the flow through the system.  Local geographic attacks can be successful with an attack on a single node or link.  If you were planning to disable the above network, which two nodes would you choose? (Put your answers in the Comments.)
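One way to answer that kind of question by computer is brute force: try every pair of nodes, remove it, and see whether the rest of the network still holds together.  The sketch below does this in plain Python on a made-up six-node ring; the ring, the node names, and the helper functions are assumptions for illustration only, not the network pictured above.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy network (NOT the one pictured): a plain six-node ring.
EDGES = [("A", "B"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("F", "A")]


def build_adjacency(edges):
    """Turn a list of undirected links into a node -> neighbors map."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def component_sizes(adj, removed):
    """Sizes of the connected pieces left after removal, largest first."""
    seen = set(removed)  # removed nodes are never visited
    sizes = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first walk of one connected piece
            node = queue.popleft()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        sizes.append(size)
    return sorted(sizes, reverse=True)


def cut_sets(adj, k):
    """Every k-node set whose removal splits the rest into two or more pieces."""
    return [combo for combo in combinations(sorted(adj), k)
            if len(component_sizes(adj, combo)) > 1]


adj = build_adjacency(EDGES)
single_cuts = cut_sets(adj, 1)  # single points of failure
pair_cuts = cut_sets(adj, 2)    # pairs that break the network apart
# Among the breaking pairs, pick the one that leaves the smallest "largest piece".
most_damaging = min(pair_cuts, key=lambda pair: component_sizes(adj, pair)[0])

print("single-node cuts:", single_cuts)
print("number of breaking pairs:", len(pair_cuts))
print("most damaging pair:", most_damaging)
```

On this toy ring no single node is a breaking point, nine pairs are, and removing the opposite pair A and D leaves nothing larger than a two-node fragment.  Brute force is fine at this size; a larger network would call for a proper minimum-cut algorithm.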

Our systems must be designed with the often conflicting goals of efficiency and resilience.  Resilience often requires redundancy, a common enemy of efficiency.  Redundancy provides alternate pathways in the network: if one path is blocked or disabled, others are available to continue the flow.  In the diagram above, in most places, if your path is blocked you have no alternatives, so you sit and wait until the network is repaired.

Resiliency requires Redundancy

Network thinkers know that resiliency requires redundancy: we need alternative choices when we encounter failures.  Some redundancy is a good thing, just not too much!  Nature uses redundancy in living systems to help them adapt to change.  Most of the client networks we examine that are effective at getting things done have a reasonable amount of redundancy in the paths available throughout the structure.

The secret to resiliency is alternative paths in the network.  But how do we know where to put the alternative paths?  We can use network analysis to determine our easy points of failure.  Other factors, such as geography, can help determine the most easily attacked nodes and links.  Although we cannot plan for all possible attacks or natural breakdowns, we can build some alternatives into our systems to make them more robust.  We want systems that degrade gradually after an attack, not brittle systems that fall apart after a few intelligently focused hits.
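To make the "where to put the alternative paths" question concrete, here is a minimal Python sketch that tries every possible new link on a small network and counts how many two-node attacks would still break it afterward.  Everything in it is an assumption chosen for illustration: the six-node ring, the node names, the helper functions, and the scoring rule itself (fewest remaining breaking pairs), which is only one of many ways to measure resilience.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy network (NOT the one pictured): a plain six-node ring.
EDGES = [("A", "B"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("F", "A")]


def build_adjacency(edges):
    """Turn a list of undirected links into a node -> neighbors map."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def is_connected(adj, removed):
    """True if the nodes that survive `removed` can all still reach each other."""
    alive = [n for n in adj if n not in removed]
    if not alive:
        return True
    seen = {alive[0]}
    queue = deque([alive[0]])
    while queue:  # breadth-first walk from one surviving node
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in removed and nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return len(seen) == len(alive)


def count_breaking_pairs(adj):
    """Number of node pairs whose removal splits the network."""
    return sum(1 for pair in combinations(sorted(adj), 2)
               if not is_connected(adj, set(pair)))


base = build_adjacency(EDGES)
baseline = count_breaking_pairs(base)

# Candidate new links: every pair of nodes not already directly connected.
existing = {frozenset(edge) for edge in EDGES}
candidates = [pair for pair in combinations(sorted(base), 2)
              if frozenset(pair) not in existing]

# Score each candidate by how many breaking pairs remain once it is added.
scores = {link: count_breaking_pairs(build_adjacency(EDGES + [link]))
          for link in candidates}
best = min(scores, key=scores.get)

print("breaking pairs before:", baseline)
print("best new link:", best, "-> breaking pairs after:", scores[best])
```

On this toy ring the baseline of nine breaking pairs drops to five when A and D are linked directly.  The count does not reach zero with one extra link, which fits the point that we can build in some alternatives but cannot plan for every attack.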

In a world of increasingly interconnected and interdependent systems and networks, we must learn to build these structures in new ways that focus not only on efficiency but also on robustness, recovery, and the "ability to bounce back," as Andrew Zolli says.

Where would you add new links to make the network above more resilient?  Remember, the nodes are not people, but transfer points for whatever is flowing through this infrastructure (gas, oil, electricity, water, etc). (Put your answers in the Comments)


Anonymous said...

Another great article Valdis. To put my 2 cents on your questions in the article, and from a quick visual inspection of the graph, taking out nodes 044 and 047 would do the most damage to the graph.

If I were to add another pathway to make the whole system much more resilient, I would connect nodes 033, 040, 069 and 061 to make another loop for the entire system (and also short-circuit quite a number of routes from the bottom left of the graph to the bottom right, possibly making the whole system more efficient at the same time). This would also give quite a few more routes through the system on the north-south axis, so to speak.
Maybe in a future article you could give some statistics on how this new route changes the dynamics of the whole graph connectivity wise and resiliency wise.

Jack Vinson said...

Hey Valdis! There are at least two pairs of breaking points: 044 + 047 and 004 + 032. We don't know enough about the people behind the nodes to say which would be "better" breaking points. Even breaking one node could create minor chaos by making the paths much longer.

Where to add connections? Look for places to shorten the paths - and to the point of the post, make some redundant paths. Far left to far right. That glob of people on the bottom right could be connected into the rest of the network better. ...

Anonymous said...

37 and 4?

Vera Monteiro said...

Hi Valdis,
and how does density relate to resilience in a network infrastructure?

Vera Monteiro said...

Hi Valdis,
Thanks for sharing this great post. My question is: how does resilience relate to density in a network infrastructure?
Is more density always synonymous with more resilience?

Valdis Krebs said...


Good question! More density does NOT mean more resilience, after a certain point. And the density cannot be random.

The key is lowering the network horizon and increasing the number of alternate paths... to a point. So initially more density is usually better, but after a point (no magical number here!) more density does not add value and starts to take away value and increase burden.


kevindoylejones said...

It's really wave management; it's a fluid and dynamic space, and needs to be managed topographically as what it is.