Jun 7, 2013

Connecting the Dots


It appears that the US Government, via its National Security Agency (NSA), has collected a lot of data on who calls/emails whom, both nationally and internationally (metadata is data about the communication: source, destination, time, place, duration).  The NSA's Prism program is truly big data.  But is it enough data?  Is it the right data?  Is it the data the USA needs to stop both domestic and international terror attacks?

Assume they had the Prism data before September 11, 2001... would the NSA be able to map out the Al Qaeda (AQ) network below?  The two red nodes, shown below, are AQ operatives who were known to be living in Los Angeles in 1999; the blue nodes came to the USA sometime in 2000 or 2001; the green nodes are foreign operatives supporting the 9-11 attacks.  A link shows who interacts with whom, via regular contact.  Notice how many short communication paths flow through these two initial suspects.  Here is a more detailed analysis of this terror network.


This data was collected after the 9-11 attacks, and is a reasonable depiction of the AQ network in the USA before September 11, 2001.  Would this network map have stopped the 9-11 attacks?  

Good question... for unknown, unpredictable events how do you know you have the right data or enough data or too much data?  Maybe your data is good for the last attack, but what about the next one?  Will it be the same, or different?  The map does not reveal the mission/timetable (if any) of the clustered nodes.

And then there is the problem of false positives.  We often hear the phrase: "I have done nothing wrong, I have nothing to worry about!" Yet... What if one of the red nodes was a co-worker of yours?  What if one of the red nodes participated in pick-up basketball games with you and your brother?  What if your kids play with the kids of a suspected drug runner?  What if your sister dates the brother of a suspected domestic terrorist?  What if your phone/email records link up with those of suspected or known bad guys?  What if you are linked with the wrong person at the wrong time?  Life can get very miserable as a person of interest.

We are trying to solve a societal problem by throwing technology at the problem.  We seem to do that with many problems these days.  Yet, technology can help us make sense of complex dynamics... if mixed with the social sciences.  The map above would have been really useful during the early months of 2001.  The network layout algorithms in the software allow us to see emergent structures, including key nodes and clusters, in this human network.  Once we have the map, we can measure the network to find which nodes/links keep the network together.
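To make "measuring the network" a bit more concrete, here is a minimal sketch in Python of one simple measure: remove each node in turn and count how many disconnected pieces are left.  The node names and links are a hypothetical toy network, not the AQ data, and the function names are my own; real network analysis software uses richer measures than this.

```python
from collections import deque

def build_graph(edges):
    """Build an undirected adjacency map from (a, b) contact pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def component_sizes(graph, removed=None):
    """Return the sizes of the connected components, ignoring `removed`."""
    seen = {removed} if removed is not None else set()
    sizes = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        # Breadth-first search over everything reachable from `start`.
        while queue:
            node = queue.popleft()
            size += 1
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        sizes.append(size)
    return sizes

def fragmentation_by_node(graph):
    """For each node, count the components left after removing that node."""
    return {node: len(component_sizes(graph, removed=node)) for node in graph}

# Hypothetical toy network (assumption for illustration only).
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E"), ("D", "F")]
GRAPH = build_graph(EDGES)
FRAGMENTS = fragmentation_by_node(GRAPH)

for node, pieces in sorted(FRAGMENTS.items(), key=lambda item: -item[1]):
    print(node, pieces)
```

In this toy network, removing D leaves three separate pieces and removing C leaves two, while removing any other node leaves the network in one piece.  Nodes like C and D are the ones holding the network together.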

Would the intelligence community have taken this map seriously in 2001?  Would other agencies have ignored the map because "it was not invented here"?  Good data and good analysis are often not used correctly when they move from one organization/context to another.  Would the ABC agency know what to do with the analysis from the XYZ agency?

Big data implies that with more data, you get certainty.  Instead of certainty you might get the opposite, because "the more" might actually include noise, or dirty data.  Noisy/dirty data with the appearance of certainty/accuracy is the worst-case scenario -- garbage in, garbage out.  Good analysis requires good data, but big data alone is not sufficient for complex analysis of events that have not yet happened.  We need to be careful what and who get caught in our nets of surveillance.

In June 2012 I was invited to do a TEDx talk in Riga, Latvia.  I chose to talk about the "tracks" we leave behind on the internet and via our computers and cell phones.  I talked about how this data could be mined and analyzed, and not always by those who had our best interests in mind.  Enjoy the video of the TEDx talk.

I don't believe mining the whole internet is the answer for our safety and security.  Making the haystack larger does not help in finding the needles, which we must first identify!  Finding the network neighborhood around identified suspects is a proven method that should be continuously improved!
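As a rough illustration of what "finding the network neighborhood" can mean, here is a minimal Python sketch that starts from a few identified suspects and collects only the people within a small number of contact steps.  The contact list, the names, and the two-step limit are all assumptions made up for this example, not details of any real investigation or tool.

```python
from collections import deque

def neighborhood(contacts, seeds, max_hops):
    """Return {person: hops} for everyone within max_hops of any seed."""
    graph = {}
    for a, b in contacts:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    # Start the search from every seed that actually appears in the data.
    distance = {s: 0 for s in seeds if s in graph}
    queue = deque(distance)
    while queue:
        person = queue.popleft()
        if distance[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for nbr in graph[person]:
            if nbr not in distance:
                distance[nbr] = distance[person] + 1
                queue.append(nbr)
    return distance

# Hypothetical contact records (assumption for illustration only).
CONTACTS = [
    ("S1", "P1"), ("S1", "P2"), ("S2", "P2"),
    ("P2", "P3"), ("P3", "P4"), ("X1", "X2"),
]
NEARBY = neighborhood(CONTACTS, seeds=["S1", "S2"], max_hops=2)
print(NEARBY)
```

Starting from S1 and S2, the search stops at P3 and never touches P4, X1 or X2.  The haystack stays small because it is built outward from known needles, rather than collected whole and searched afterwards.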
