Jul 27, 2013

Dancing the Bunny Hop with the NSA


According to an NSA executive, 2-3 hops/steps is the social distance that the NSA uses to look outward, into social space, from a known terrorist suspect.  When they find a suspect's phone number or email, they investigate the network neighborhood around it.  Why?  They are trying to determine if a suspect is part of a group -- does s/he have co-conspirators?


A simple 3-hop (or 3 step) chain is shown below in Figure 1 -- each green link is a hop or a step and shows contact between the two persons, or nodes, in the network.


Figure 1

Analyzing a specific person's immediate network is also known as contact chaining -- we are connected to many chains via family, friends, colleagues and contacts we communicate with.  Many of these chains intersect and overlap, creating a network with multiple paths to most nodes in the network.

Before we do contact-chaining, or any other network analysis, we must first determine: what is a "contact"? Many studies of on-line behavior set the bar too low for what a "contact" is.  Sites like Facebook and LinkedIn often contain way too many spurious ties -- people you have "approved" a link for, but you really do not know.  Facebook is famous for people having hundreds, if not thousands, of "friends."  Facebook's own study of their user behavior shows that the average active friend circle (people you actually interact with, and maintain a relationship with) is between 40 and 60 people -- a far cry from hundreds or thousands.

Consider a terror plot, as a project.  People have to communicate and work together to accomplish their project goals.  They need to organize the process, share information, meet deadlines, and adapt to changes and setbacks.  This requires regular communication and coordination. If this project activity is performed at a distance, it is trackable and mappable by electronic surveillance.  In the email and phone meta-data, the NSA is looking for a project team connected to a known suspect.

After the NSA executive let it slip that they were interested in all 2-3 step contacts around a suspect, many folks tried to estimate how many people would be affected by the multitudes of 3 step chains we are all a part of.  One estimate was that 2.5 million people would be affected by each suspect's 3 step network chains. While the estimator picked a good starting value for a typical American -- each person has about 40 unique and active friends -- he multiplied once too often, ending up in the millions instead of the tens of thousands.

Soon, an even larger estimate appeared in the blogosphere -- 27 million people would be caught in the dragnet around each suspected terrorist!  This estimate started with a 300 person social circle around each individual -- which is twice the Dunbar number of 150!  This estimate also did not take into account the overlapping friendship networks we all have (many of our friends are also friends with each other).
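The arithmetic behind both estimates is easy to check.  The sketch below assumes the simplest possible model: every person brings the same number of brand-new contacts, with no overlap.  The function name `naive_reach` is mine, and reading "multiplied once too often" as 40 multiplied four times instead of three is my assumption about how the 2.5 million figure came about.

```python
def naive_reach(contacts: int, hops: int) -> int:
    """People at exactly `hops` steps if every person adds `contacts`
    brand-new contacts and nobody's circles overlap (an unrealistic model)."""
    return contacts ** hops


# 40 active friends, three hops: tens of thousands
print(naive_reach(40, 3))    # 64000

# 40 multiplied once more (assumed source of the ~2.5 million figure)
print(naive_reach(40, 4))    # 2560000

# 300-person social circle, three hops: the 27 million estimate
print(naive_reach(300, 3))   # 27000000
```

Both figures depend entirely on the starting circle size and on ignoring overlap between circles.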

Both of these estimates erred about who was in the center of the network.  The average American may have around 40 on-line contacts, and the social media guru may have 300, but the domestic or international terrorist is trying to hide, not be discovered, on the Net.  Terrorists, and others behaving covertly, tend to have very small networks of people they trust -- no casual acquaintances to balloon network size!  From my experience of analyzing and mapping human networks for over 20 years, and from mapping the 9-11 hijackers and other covert and criminal networks, these two estimates seemed alarmingly high.

So, how many people could be "persons of interest" in a terrorist search?

Based on my post 9-11 experience, I looked for some social network data in my archives that might better illustrate what the 3 hop network neighborhood around a suspect might really look like.  I found data from a group that mixed both task and trust ties -- similar to what we may find in a covert network -- a limited trust radius (trust only a few), yet with many tasks to accomplish.  The members of this network were not surveyed and asked to list who they viewed as colleagues and friends -- all data was gathered from their on-line activity -- it did not matter who they knew, it mattered who they actually contacted on-line.

Figure 2 shows the immediate network of a typical member of the group. The links show actual contact between two people/nodes.  The suspect, highlighted in the middle of the graph, has 13 observed contacts, many of whom also contact each other.  Terrorists, criminals, and others involved in covert activities keep their network small, for fear of discovery -- they keep only ties they can deeply trust.  Of course, living in the world, they have incidental ties with local shopkeepers, neighbors, and delivery people.  But these incidental ties are usually face-to-face and do not show up via electronic surveillance.  A terrorist, whether domestic or international, will not readily share his/her phone number, email or other id with merchants and other locals.

Figure 2
All nodes are linked to the Suspect, who is in the center and highlighted in pink. Contacts, who also had observable interactions with each other, are also connected with a grey link.  This "ego network" -- showing 1 hop/step from the suspect -- is typical of many we see, with a clustering coefficient from 0.4 to 0.6 (your friends/colleagues are often friends/colleagues with each other).  
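
For readers who want to see what the clustering coefficient measures, here is a minimal sketch.  It computes the share of pairs of the suspect's contacts who are also in contact with each other.  The toy network and the name `local_clustering` are illustrative assumptions, not the data behind Figure 2.

```python
from itertools import combinations


def local_clustering(adj: dict, ego: str) -> float:
    """Fraction of pairs of ego's contacts that are themselves in contact."""
    neighbors = sorted(adj[ego])
    k = len(neighbors)
    if k < 2:
        return 0.0  # fewer than two contacts: no pairs to check
    linked = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    possible = k * (k - 1) / 2
    return linked / possible


# Hypothetical ego network: suspect "S" with four contacts.
# A, B and C all contact each other; D only contacts S.
adj = {
    "S": {"A", "B", "C", "D"},
    "A": {"S", "B", "C"},
    "B": {"S", "A", "C"},
    "C": {"S", "A", "B"},
    "D": {"S"},
}

print(local_clustering(adj, "S"))  # 0.5 (3 linked pairs out of 6 possible)
```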

Group structures are hard to spot in 1 step networks, which is why we go out 2 and 3 steps in order to find any emergent groups amongst this collection of nodes.  At 2 hops/steps from the Suspect, we start to see some clustering of nodes.  Below are the interactions at 1 and 2 steps from the Suspect.
Figure 3

The magenta colored nodes are the same step 1 nodes seen in Figure 2.  The green nodes are two steps from the suspect.  We notice that some neighbors have more connections than others.  Again, the links only show the observed/recorded interactions.  The network begins to show some clustering.  What is interesting about the network at this point is where it starts to fold back into itself -- which 2 step contacts interact with each other and with various 1 step contacts?  The green nodes with more than one connection, especially to various magenta-colored nodes are probably more important than those who are just single spokes around a magenta hub.
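
The hop labelling and the "more than one connection" test can be sketched with a breadth-first search.  This is a minimal illustration on a made-up edge list; the helper names (`build_adj`, `hop_distances`, `multi_tied_step2`) and the toy data are assumptions of mine, not the tool or data used for the figures.

```python
from collections import deque


def build_adj(edges):
    """Undirected adjacency sets from a list of observed contacts."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def hop_distances(adj, source, max_hops=3):
    """Breadth-first search: hops from source for every node within max_hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def multi_tied_step2(adj, dist):
    """Step-2 nodes that are tied to more than one step-1 node."""
    result = {}
    for node, d in dist.items():
        if d == 2:
            step1 = {n for n in adj[node] if dist.get(n) == 1}
            if len(step1) > 1:
                result[node] = step1
    return result


# Hypothetical contacts around suspect "S"
edges = [("S", "A"), ("S", "B"), ("A", "X"), ("B", "X"),
         ("B", "Y"), ("X", "Z"), ("Z", "W")]
adj = build_adj(edges)
dist = hop_distances(adj, "S", max_hops=3)

print(sorted(dist.items()))
print(multi_tied_step2(adj, dist))  # X is tied to both A and B; Y is a single spoke
```

In this toy network W sits four hops out, so it never enters the analysis.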

Next we bring in the third hop, shown in Figure 4 by the blue nodes.
Figure 4

The network has grown much larger than the 1 hop network in Figure 2.  We have gone from 14 nodes to 185 nodes in three steps.  The suspect had 13 observed contacts in Figure 2.  Many would naively estimate the suspect's 3 step network to be 13 x 13 x 13 = 2,197.  But many friends/colleagues are also friends/colleagues with each other -- we have overlapping networks with those we are connected to.  Each node in the above networks represents one unique person.
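
To see how strongly overlap shrinks the count, consider the extreme case, sketched below: a suspect whose 13 contacts all contact each other and nobody else.  This is a deliberately artificial example (the function name `reach_within` and the clique are my assumptions).  The observed network, with 184 unique contacts, sits between this floor and the naive 2,197.

```python
from collections import deque


def reach_within(adj, source, max_hops):
    """Number of unique people within max_hops of source (source excluded)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue
        for nbr in adj[node]:
            if nbr not in dist:  # each person is counted only once
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return len(dist) - 1


# Hypothetical closed group: suspect "S" plus 13 contacts, all in contact
people = ["S"] + [f"c{i}" for i in range(1, 14)]
clique = {p: {q for q in people if q != p} for p in people}

print(13 ** 3)                        # 2197, the naive estimate
print(reach_within(clique, "S", 3))   # 13, when the circles overlap completely
```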

Now that we have expanded the network out 3 steps, what do we do?  We shrink the network!  The NSA wants to find groups that the suspect may belong to, and find other key nodes in his/her network -- that is why they gather the 2-3 hop contact data.  Rather than investigate all 184 contacts of the suspect, we want to now reduce the network to its core, around the suspect.  The core network, of 47 nodes, is shown in Figure 5 below.  
Figure 5

We see that most of the 1 step nodes remain, with a good portion of the two-step nodes (green), but a very small percentage of the three-step nodes (blue).  The key nodes to focus on have been highlighted in yellow -- they are important to the structure and the flows in the network.  
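
One common way to shrink a network to a core is k-core pruning: repeatedly drop nodes that have fewer than k remaining ties.  I am offering it only as an illustrative stand-in, since the reduction behind Figure 5 is not spelled out here, and the toy network and the choice of k = 2 are assumptions.

```python
def k_core(adj, k):
    """Repeatedly remove nodes with fewer than k remaining ties."""
    core = {node: set(nbrs) for node, nbrs in adj.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        weak = [node for node, nbrs in core.items() if len(nbrs) < k]
        for node in weak:
            for nbr in core[node]:
                core[nbr].discard(node)  # cut the tie on the other side too
            del core[node]
            changed = True
    return core


# Hypothetical network: a tight group S, A, B, C with a dangling chain A-P-Q
adj = {
    "S": {"A", "B", "C"},
    "A": {"S", "B", "C", "P"},
    "B": {"S", "A", "C"},
    "C": {"S", "A", "B"},
    "P": {"A", "Q"},
    "Q": {"P"},
}

print(sorted(k_core(adj, 2)))  # ['A', 'B', 'C', 'S']
```

The single spokes and dangling chains fall away first, which matches the pattern in Figure 5 where few of the three-step nodes survive.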

Next, let's extract the clusters, and their overlaps, in the core.  It appears that there are 4 clusters with several of them overlapping via 4 nodes.  We erase the links and draw a Venn diagram in Figure 6 showing the four clusters and four nodes which act as linchpins holding the various clusters together.  
Figure 6

The four connecting nodes (linchpins) are probably the ones that will be investigated first, followed by the other nodes that were highlighted in yellow (see Figure 5).
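
Once clusters have been identified by whatever method, picking out the linchpins is a simple membership check: a linchpin is any node that belongs to more than one cluster.  The cluster names and members below are hypothetical, chosen only to mirror the shape of Figure 6 (four clusters, four connecting nodes).

```python
def linchpins(clusters):
    """Map each node that sits in more than one cluster to those clusters."""
    membership = {}
    for name, members in clusters.items():
        for node in members:
            membership.setdefault(node, set()).add(name)
    return {node: names for node, names in membership.items() if len(names) > 1}


# Hypothetical clusters from a core network
clusters = {
    "c1": {"n1", "n2", "n3", "L1"},
    "c2": {"n4", "n5", "L1", "L2", "L3"},
    "c3": {"n6", "n7", "L2", "L4"},
    "c4": {"n8", "n9", "L3", "L4"},
}

for node, names in sorted(linchpins(clusters).items()):
    print(node, sorted(names))
```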

So, we have gone from millions of nodes, to thousands of nodes, and now to dozens of nodes affected by each terrorist suspect tracked.  According to an NSA slide released by Edward Snowden, the NSA currently has over 117,000 suspects.  With the previous 3 hop estimates, 117,000 suspects would pull most of the world population into the NSA dragnet (accounting for overlaps).  With dozens of "persons of interest" we end up with about 1.5 million people within the sphere of analysis -- still, a lot of "false positives" (false alarms) to sort through.  1.5 million is a lot less than the 27 million estimate, which was based on false assumptions about both 1) covert social circles, and 2) how human networks overlap.
Update July 2014: based on this Washington Post report on actual NSA data provided by Edward Snowden, even my low estimate of 1.5 million was a little high -- based on that slice of NSA data, the Washington Post estimates around 1 million people in total are caught up in the network analysis of 90,000 suspects/targets.  I had used an earlier estimate of over 117,000 suspects.
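
As a rough sanity check on these totals, the sketch below simply divides people by suspects for each set of figures quoted above.  The function name is mine, and the division ignores any overlap between different suspects' networks, so treat the results as ballpark averages only.

```python
def people_per_suspect(total_people: int, suspects: int) -> float:
    """Average number of people swept in per suspect (overlaps ignored)."""
    return total_people / suspects


# My estimate: about 1.5 million people around 117,000 suspects
print(round(people_per_suspect(1_500_000, 117_000), 1))   # 12.8

# Washington Post figures: about 1 million people around 90,000 targets
print(round(people_per_suspect(1_000_000, 90_000), 1))    # 11.1

# The 27 million estimate applied to 117,000 suspects, before any overlap
print(27_000_000 * 117_000)                               # 3159000000000
```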

The method I described is a logical approach for an experienced social network analyst.  It is probably not the method(s) used by the NSA.  Their methods may be similar, because they are looking for groups/clusters and trying to identify which nodes need closer scrutiny.

I applied a similar approach to mapping two initial suspects in the 9-11 attacks -- after the event.

2 comments:

  1. I'd never thought of this before, but it's intuitively obvious. If you're an underground bad guy you don't want to be exposed.

    The more "friends" you have the greater your risk of exposure/discovery.

    Duh!

    Great point.
  2. Yes! Social media mavens want attention, they connect widely. Covert operatives want to limit the paths in their network -- just enough to get things done.