A A A Organization

A few weeks ago I was talking to a potential client, a CEO, about his concerns that his business was not keeping up with changes in his market-space.  I told him about our research into adaptive/agile/resilient organizations and how it could help his organization.  He said that sounded great and he understood that, but wondered how he could sell those concepts to both his Board and his leadership team.  He said, “After all, they do not have an MBA, like me, and do not read Harvard Business Review.  Your words are just consultant-speak to them.  You would lose them after the second sentence.”  Not the first time I had heard that. Been there, done that.

My peer group uses and understands terms and concepts such as complex adaptive systems, emergence, self-organizing, resilience, adaptation, crowdsourcing, enterprise social networks, and acronyms such as ESN, SNA, ONA, CRM, and SMO.  Yet, these are not for client conversations.  Potential clients want to hear consultants talk in language they understand and use every day. Now, instead of talking about Organization Adaptability Quotients, I talk about the Triple A (AAA) Organization.  Everyone understands that AAA signifies the highest possible financial rating an investment can receive.  This financial metric has morphed into other business spheres and is commonly understood as an adjective signifying the best of something.

A A A is more than a rating; it also describes the components of a successful organization.

  • Awareness
  • Alternatives
  • Action

For an organization to be agile and adaptive, the people in it need to be aware of what is happening around them, have alternative pathways to gather information and knowledge, and must be allowed to act to meet/solve both local and global goals/problems.  They need to both work in their hierarchy and in a self-organizing network simultaneously!

A wide, radial band of awareness by each employee allows them to adapt to what others are doing.  This awareness is both within the company, and also extends outside to customers, suppliers, and the organization’s marketplace/ecosystem.  Employees know what others are capable of, who the experts are, and what goals and pressures others have.  The more people a person has within his/her sphere of awareness (a.k.a. network horizon) the better s/he will function — as will those connected to that person.
Employees need to simultaneously work in the hierarchy and in self-organizing networks!
Business process improvement taught us to get rid of all redundancies in the workplace.  Yet, collaboration, innovation and change happen best when there are some alternative/redundant pathways available to get things done and make sense of what is happening. Paths in the organization consist of the prescribed network (the hierarchy) and of the emergent networks: self-organized connections formed by employees amongst themselves to gather information, knowledge, expertise and advice to accomplish their goals. The emergent networks in an organization provide alternate, and often more direct, paths from the need to the source.  It is not enough that alternatives are available; people need to know where they are, and what they provide.  Not only is it important to be well located in the flow of things, but it is important to know the flow around you.

Awareness and alternatives are useless without the ability to take action on them.  Does the leadership of the organization allow and trust employees to self-organize around tasks and goals?  Can you seek advice from someone outside of your department or project team?  Can you connect two people that should know each other because they are working on similar goals and have complementary skills/knowledge?  In other words, does management keep a tight, rigid hierarchy or allow for looser adaptive structures that change with needs?
It is important to be well located in the flow of information, and to know the flow around you.

How well do the three As — Awareness, Alternatives, and Action — function in your organization?  Are they tuned for maximum harmony?  Do your hierarchy and networks work together in a Wirearchy?  Have you measured the wiring in your organization?  Do you know how your organization compares to others?  Do your employees know what to do and how to do it?  Do you know which tune-up(s) can be performed to improve your organization’s performance in the changing market-space?



Many Merry Connections!

Happy Holidays from Orgnet, LLC!

This year, again, give the Gift of Connection... 
Introduce two people you know, that would benefit from knowing one another!

This year's holiday network is based on the ancient Latvian Puzurs
usually made from straw, for the winter holidays.

Network Puzurs artwork/design: Copyright © 2013 Silvija Krebs


Mapping Contagions with Social Network Analysis

A contagion passed by human contact, such as SARS or TB, spreads through 
human networks based on how infectious and susceptible each party is. 
Multiple contacts with infectious people play a role in the probability of infection. 
Public health officials perform contact tracing to map the spread of the infection 
and manage its diffusion. 

The network shown here was created at the epidemiology unit of 
The Centers for Disease Control [CDC] in the United States.  The network map 
shows the spread of an airborne infectious disease -- Tuberculosis. The map 
was created using actual contact tracing data from the community in which the 
outbreak was occurring.  
Black nodes are persons with clinical disease (and are potentially infectious), 
pink nodes represent exposed persons with incubating (or dormant) infection 
and are not infectious, green represent exposed persons with no infection and 
are not infectious. The grey nodes have been found to be members of the 
human network but have not yet been evaluated by medical personnel.

Unfortunately the 'social butterfly' in this community, the black node in the 
center of the network, is also the most infectious -- a super spreader.  
Current procedures focus on inoculating the vulnerable -- often the very young 
and the very old.  Network analysis reveals that it may be smarter, and more 
efficient, to focus on the spreaders -- those with many contacts to many 
diverse groups.

For more information on how social network analysis [SNA] assists health care 
professionals to manage and discover contagious disease outbreaks, see these
two papers co-authored with the CDC...
  1. "Transmission Network Analysis to Complement Routine Tuberculosis Contact Investigations"
    by McKenzie Andre, Kashef Ijaz, Jon D. Tillinghast, Valdis E. Krebs, Lois A. Diem,
    Beverly Metchock, Theresa Crisp, Peter D. McElroy [PDF]
  2. "Embracing collaboration: A novel strategy for reducing bloodstream infections
    in outpatient hemodialysis centers"
    by Curt Lindberg, Gemma Downham, Prucia Buscell, Erin Jones, Pamela Peterson,
    Valdis Krebs [PDF]



Making Sense of Emergent Patterns in Networks

One of the most used functions of social network analysis software is to discover and display clusters and communities in networks -- the dense sub-networks, where there are more links internally, than externally.

It is easy for the common person to spot dense clusters of connection in a network visualization.  Yet, this is a difficult problem for algorithms.  Early cluster discovery and community detection algorithms took the easy way -- they forced every node into one, and only one cluster, because the math was easier.  It was like the college physics course I took where all of the problems we did were in a vacuum and there was no friction to be accounted for.  This taught the basic principles, but did not carry over into real life.

Sociologists were not happy with the early community detection algorithms because they did not reflect how humans naturally cluster, connect and group themselves.  We are members of many clusters, and through that multiple/partial membership in many groups, we cause those clusters to overlap -- groups are not distinct with just unique members in each.  Group boundaries are porous and fuzzy.

Today there are dozens of community detection algorithms, and many allow for overlap and multiple cluster membership.  Community detection is still a hard problem.  Smart network scientists don't always agree on what is in each community, as we shall see.

Figure 1 is a diagram of a simple network of 16 people modeled in InFlow software.  Symmetric (non-directional) connections are shown by the green links in the diagram.  This first network layout is just a circle with nodes in numerical order clockwise.

Figure 1

We first apply the simple community algorithm which puts everyone in a single group only -- often resulting in some funny groupings.  The algorithm finds us 4 unique groups/clusters, and at first glance produces a nice picture.

Figure 2

Upon further inspection we see nodes 1 and 4 have more connections outside of their assigned cluster than inside -- why were they forced into that group?  They look like good candidates for membership in multiple groups.
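
The check described above, counting each node's links inside and outside its assigned cluster, can be sketched in a few lines of Python. This is only an illustration: the edge list and cluster labels are hypothetical toy data (not the 16-person network in the figures), and misfit_nodes is a made-up helper name, not an InFlow function.

```python
from collections import defaultdict

def misfit_nodes(edges, cluster_of):
    """Return nodes with more links outside their assigned cluster than inside it."""
    inside = defaultdict(int)
    outside = defaultdict(int)
    for a, b in edges:
        same_cluster = cluster_of[a] == cluster_of[b]
        for node in (a, b):
            if same_cluster:
                inside[node] += 1
            else:
                outside[node] += 1
    return sorted(n for n in cluster_of if outside[n] > inside[n])

# Hypothetical toy network: two triangles plus a pair, with "g" bridging them.
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),   # cluster X
    ("d", "e"), ("e", "f"), ("d", "f"),   # cluster Y
    ("g", "h"),                           # cluster Z
    ("g", "a"), ("g", "d"),               # g reaches into X and Y
]
cluster_of = {"a": "X", "b": "X", "c": "X",
              "d": "Y", "e": "Y", "f": "Y",
              "g": "Z", "h": "Z"}

misfits = misfit_nodes(edges, cluster_of)
print(misfits)  # nodes that look like candidates for multiple membership
```

In this toy data only "g" is flagged: it has one link inside its cluster and two outside.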

Next we allow for cluster/group overlap -- multiple memberships -- and we are surprised.  There is more than one answer!  In complex systems, such as human groups, communities and organizations, there is usually no one right answer, or one best way of doing things -- there are often several good answers. It might be impossible to choose the best answer ahead of time! The next set of diagrams (Figures 3-5) all show reasonable clusterings found in the data above. They all show 4 clusters, each cluster enclosed in a gold frame.  If a node shows up inside more than one frame, it is a member of more than one cluster.

Figure 3

Figure 4

Figure 5

Next we run another algorithm and find not 4 clusters, but 3 emergent communities. 

Figure 6

Yet another algorithm gives us just two overlapping emergent groups.

Figure 7

Which of those above do you like?  Which do you think best represents the natural groupings in this toy network?

My favorite patterns are next.  There is no rule that says all nodes have to be assigned to at least one cluster!  Some play the role of connector, sitting between many nodes on the shortest paths that connect them -- they have high betweenness -- maybe they are liaisons between many groups without belonging to any one of them?  One of the settings on the cluster analysis algorithm in InFlow assigned all nodes to a group except for node 4 -- s/he is a connector of groups, but not a member of groups.
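
For readers who want to see what "high betweenness" means in practice, here is a minimal sketch of betweenness centrality for a small undirected network, using Brandes' well-known algorithm. It is not the InFlow implementation; the toy graph (two triangles joined by a connector called "x") and all names are assumptions made for illustration.

```python
from collections import deque

def betweenness(adj):
    """Raw (unnormalised) betweenness for an unweighted, undirected graph.

    adj maps each node to the set of its neighbours.
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma)
        # and remembering each node's predecessors on those paths.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every undirected pair was counted once from each end.
    return {v: b / 2 for v, b in score.items()}

# Hypothetical toy graph: triangle a-b-c, triangle d-e-f, connector x.
adj = {
    "a": {"b", "c", "x"}, "b": {"a", "c"}, "c": {"a", "b"},
    "d": {"e", "f", "x"}, "e": {"d", "f"}, "f": {"d", "e"},
    "x": {"a", "d"},
}
scores = betweenness(adj)
top = max(scores, key=scores.get)
print(top, scores[top])
```

Here "x" belongs to neither triangle, yet every shortest path between the two triangles runs through it, so it ends up with the highest score. That is the connector role described above.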

Figure 8

Adjusting the cluster algorithm a little more, we now get two nodes -- 4 and 5 -- that are not members of any cluster.  They are the connectors in this emergent network.  My favorite rendering of this emergent network is in Figure 9 below. Once I decided that not all nodes had to be members of clusters, Figure 9 was arrived at in two quick iterations.

Figure 9

One of the properties of human relationships is that they are messy, inexact, and complex.  We should not expect to find one perfect way to group or cluster a network of human relationships.  If we do find such a perfect solution, maybe we have over-simplified the problem, like in Figure 1?

One thing we see in the various clusters above is that nodes 1, 4, and 5 are often the linch-pins that hold two or more clusters together (the clusters overlap around these nodes).  If we run various network centrality metrics on this network, we consistently find nodes 1, 4, and 5 at the top of the list, no matter which metric we choose -- 4 being at the very top, most of the time.

Finding logical and plausible clusters in complex systems is not a simple task -- there is no one simple answer.  This is not like accounting, where everything should add up correctly every time, and you do get one right answer. Finding clusters in networks is often about sense-making, what are the logical patterns we see and what might they tell us?  In our human relationships, we always want "neat and clean", but we always get "messy and fuzzy."  The right software will help you through the messy, and help you make sense of it -- it will not provide simple answers.

What patterns do you see?


Tracking Two Known Terrorists... Rather Than Everybody

Social Network Analysis [SNA] is a mathematical method for mapping and measuring human networks.  SNA helps us 'connect the dots' of complex human behavior.

Early in 2000, the CIA was informed of two terrorist suspects linked to al-Qaeda. Nawaf Alhazmi and Khalid Almihdhar were photographed attending a meeting of known terrorists in Malaysia. After the meeting they returned to Los Angeles, where they had already set up residence in late 1999.

What do you do with these suspects? Arrest or deport them immediately? No, we need to use them to discover more of the al-Qaeda network. Once suspects have been discovered, we can use their daily activities to uncloak their network. Just like they used our technology against us, we can use their planning process against them. Watch them, and listen to their conversations to see...
  1. who they call / email (i.e., meta-data)
  2. who visits with them locally and in other cities
  3. where their money comes from
The structure of their extended network begins to emerge as data is discovered via surveillance. A suspect being monitored may have many contacts -- both accidental and intentional. We must always be wary of 'guilt by association'. Accidental contacts, like the mail delivery person, the grocery store clerk, and neighbor may not be viewed with investigative interest. Intentional contacts are like the late afternoon visitor, whose car license plate is traced back to a rental company at the airport, where we discover he arrived from Toronto (got to notify the Canadians) and his name matches a cell phone number (with a Buffalo, NY area code) that our suspect calls regularly. This intentional contact is added to our map and we start tracking his interactions -- where do they lead? As data comes in, a picture of the terrorist organization slowly comes into focus.

How do investigators know whether they are on to something big? Often they don't. Yet in this case there was another strong clue that Alhazmi and Almihdhar were up to no good -- the attack on the USS Cole in October of 2000. One of the chief suspects in the Cole bombing [Khallad] was also present [along with Alhazmi and Almihdhar] at the terrorist meeting in Malaysia in January 2000.

Figure 2 shows the two suspects and their immediate ties. All direct ties of these two hijackers are colored green, and link thickness indicates the strength of connection.

Once we have their direct links, the next step is to find their indirect ties -- the 'connections of their connections'. Discovering the nodes and links within two steps of the suspects usually starts to reveal much about their network. Key individuals in the local network begin to stand out. In viewing the network map in Figure 2, most of us will focus on Mohammed Atta because we now know his history. The investigator uncloaking this network would not be aware of Atta's eventual importance. At this point he is just another node to be investigated.
Figure 3 shows the direct connections of the original suspects as green links, and their indirect connections as grey links. We now have enough data for two key conclusions:

  1. All 19 hijackers were within 2 steps of the two original suspects uncovered in 2000!
  2. Social network metrics reveal Mohammed Atta emerging as the local leader
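
The "within 2 steps" idea above amounts to a breadth-first search outward from the known suspects. Below is a minimal sketch under stated assumptions: the node names are invented placeholders (not the actual hijacker data), and within_steps is an illustrative helper name.

```python
from collections import deque

def within_steps(adj, seeds, max_steps):
    """Map every node reachable within max_steps of any seed to its step count."""
    steps = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if steps[node] == max_steps:
            continue  # do not expand beyond the step limit
        for neighbour in adj.get(node, ()):
            if neighbour not in steps:
                steps[neighbour] = steps[node] + 1
                queue.append(neighbour)
    return steps

# Hypothetical contact data: s1 and s2 are the two known suspects.
adj = {
    "s1": {"s2", "p"}, "s2": {"s1", "q"},
    "p": {"s1", "r"}, "q": {"s2", "r"},
    "r": {"p", "q", "t"}, "t": {"r", "u"}, "u": {"t"},
}
reach = within_steps(adj, ["s1", "s2"], max_steps=2)
print(reach)
```

In this toy data "p" and "q" are direct ties, "r" is an indirect tie at two steps, and "t" and "u" fall outside the two-step horizon.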

With hindsight, we have now mapped enough of the 9-11 conspiracy to stop it. Again, the investigators are never sure they have uncovered enough information while they are in the process of uncloaking the covert organization! They also have to contend with superfluous data. This data was gathered after the event, so the investigators knew exactly what to look for. Before an event, it is not so easy.

As the network structure emerges, a key dynamic that needs to be closely monitored is the activity within the network. Network activity spikes when a planned event approaches. Is there an increase of flow across known links? Are new links rapidly emerging between known nodes? Are money flows suddenly going in the opposite direction? When activity reaches a certain pattern and threshold, it is time to stop monitoring the network, and time to start removing nodes.

IMHO this bottom-up approach of uncloaking a network around known suspects is more effective than a top down search for terrorist needles in the public haystack -- and it is less invasive of the general population, resulting in far fewer "false positives".

In early 2002 I wrote an academic article describing how I mapped the network of the 19 hijackers using  public (open source) data.  Original post from Orgnet.com.


Vacuuming the Internet

As part of the NSA surveillance revelations, there have been accusations that many popular consumer internet companies such as Google, Apple and Facebook have allowed the NSA to "directly attach to their servers" and vacuum up all of the data going in and out of these servers.  The management of these companies has vehemently denied giving the NSA unfettered access to their customers' data. This CNET article has a good summary of what has happened so far on this particular aspect of the NSA surveillance.

Network thinkers know that to effectively monitor a network, you don't seek out the edge nodes, you find the central hubs and monitor them -- through them you will have access to most of what is flowing through the net. In a hub-and-spoke system the spokes are all dependent on their local hub to route information/data/bits -- in and out.  In complex networks like the Internet, hubs are connected to other hubs (but not all).  The pattern of connections amongst the hubs determines which hubs are more central to the overall flow of things throughout the network.

Security expert Bruce Schneier writes...
"The primary way the NSA eavesdrops on internet communications is in the network. That's where their capabilities best scale. They have invested in enormous programs to automatically collect and analyze network traffic. Anything that requires them to attack individual endpoint computers is significantly more costly and risky for them, and they will do those things carefully and sparingly."

Below is a network map of the Autonomous Systems [AS] that form the backbone of the internet.  It is easy to find the central hubs in this network.  Load the 20,000+ nodes [each AS is represented by a node] and 48,000+ links [a data flow between two ASes is represented by a link] into a social network analysis software program and have it run the Betweenness or Connector metric.  These two network metrics reveal how central any node is in keeping everything interconnected.  The hubs will be revealed by the network metrics.  In the diagram below the hubs are sized by their Connector score -- the higher the score, the larger the node, and the more network paths flow through this node.  The colors are randomly assigned and have no meaning.


The largest hubs [AS] are mostly telecomm companies, internet infrastructure providers, and organizations of the US government.  Most of the large Internet hubs are located in North America.  You can get a pretty good picture of what is flowing through the whole internet by monitoring just a dozen or two of the largest hubs.  An example of how these main hubs can be tapped, and utilized, is told in the story of Room 641a of SBC Communications in San Francisco.

Whether the NSA has a direct tap into your favorite social network, or search engine, we may never know.  Maybe they don't need the direct connect to capture all of the information flowing on the Net?  

How will the rest of the world view their dependence on the internet, with the U.S.A owning and monitoring the key hubs (key intersections of information flow) in the Net?



Dancing the Bunny Hop with the NSA

One hop, two hops, three hops... forward!

According to an NSA executive, 2-3 hops/steps is the social distance that the NSA uses to look outward, into social space, from a known terrorist suspect.  When they find a suspect's phone number or email, they investigate the network neighborhood around it.  Why?  They are trying to determine if a suspect is part of a group -- does s/he have co-conspirators?

A simple 3-hop (or 3 step) chain is shown below in Figure 1 -- each green link is a hop or a step and shows contact between the two persons, or nodes, in the network.

Figure 1

Analyzing a specific person's immediate network is also known as contact chaining -- we are connected to many chains via family, friends, colleagues and contacts we communicate with.  Many of these chains intersect and overlap creating a network with multiple paths to most nodes in the network.

Before we do contact-chaining, or any other network analysis, we must first determine: what is a "contact"? Many studies of on-line behavior set the bar too low for what a "contact" is.  Sites like Facebook and LinkedIn often contain way too many spurious ties -- people you have "approved" a link for, but you really do not know.  Facebook is famous for people having hundreds, if not thousands of "friends."  Facebook's own study of their user behavior shows that the average active friend circle (people you actually interact with, and maintain a relationship with) is between 40 and 60 people -- a far cry from hundreds or thousands.

Consider a terror plot, as a project.  People have to communicate and work together to accomplish their project goals.  They need to organize the process, share information, meet deadlines, and adapt to changes and setbacks.  This requires regular communication and coordination. If this project activity is performed at a distance, it is trackable and mappable by electronic surveillance.  In the email and phone meta-data, the NSA is looking for a project team connected to a known suspect.


After the NSA executive let it slip that they were interested in all 2-3 step contacts around a suspect, many folks tried to estimate how many people would be affected by the multitudes of 3 step chains we are all a part of.  One estimate was that 2.5 million people would be affected by each suspect's 3 step network chains. While the estimator picked a good starting value for a typical American -- each person has about 40 unique and active friends -- he multiplied once too often, ending up in the millions instead of the tens of thousands.

Soon, an even larger estimate appeared in the blogosphere -- 27 million people would be caught in the dragnet around each suspected terrorist!  This estimate started with a 300 person social circle around each individual -- which is twice the Dunbar number of 150!  This estimate also did not take into account the overlapping friendship networks we all have (many of our friends are also friends with each other).
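
The arithmetic behind these estimates is easy to check. The sketch below assumes the simplest possible model, in which every person has the same number of contacts and no two contact lists overlap, so it gives an upper bound on the people at the last hop only. The function name naive_reach is made up for illustration, and reading the 2.5 million figure as a fourth multiplication is an inference from the numbers, not something the original estimator stated.

```python
def naive_reach(contacts_per_person, hops):
    """People at the last hop, assuming no contact lists ever overlap."""
    return contacts_per_person ** hops

three_hops_40 = naive_reach(40, 3)     # 40 x 40 x 40
four_hops_40 = naive_reach(40, 4)      # one multiplication too many
three_hops_300 = naive_reach(300, 3)   # 300 x 300 x 300

print(three_hops_40)    # 64000
print(four_hops_40)     # 2560000
print(three_hops_300)   # 27000000
```

With 40 contacts, three hops gives tens of thousands (64,000), while one extra multiplication lands near the 2.5 million figure. Starting from 300 contacts gives the 27 million figure. Real networks overlap, so the true counts are lower still.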

Both of these estimates erred at who was in the center of the network.  The average American may have around 40 on-line contacts, and the social media guru may have 300,  but the domestic or international terrorist is trying to hide, not be discovered, on the Net.  Terrorists, and others behaving covertly, tend to have very small networks of people they trust -- no casual acquaintances to balloon network size!  From my experience of analyzing and mapping human networks for over 20 years, and from mapping the 9-11 hijackers and other covert and criminal networks, these two estimates seemed alarmingly high.

So, how many people could be "persons of interest" in a terrorist search?

Based on my post 9-11 experience, I looked for some social network data in my archives that might better illustrate what the 3 hop network neighborhood around a suspect might really look like.  I found data from a group that mixed both task and trust ties -- similar to what we may find in a covert network -- a limited trust radius (trust only a few), yet with many tasks to accomplish.  The members of this network were not surveyed and asked to list who they viewed as colleagues and friends -- all data was gathered from their on-line activity -- it did not matter who they knew, it mattered who they actually contacted on-line.

Figure 2 shows the immediate network of a typical member of the group. The links show actual contact between two people/nodes.  The suspect, highlighted in the middle of the graph, has 13 observed contacts, many of whom also contact each other.  Terrorists, criminals, and others involved in covert activities keep their network small, for fear of discovery -- they keep only ties they can deeply trust.  Of course, living in the world, they have incidental ties, with local shopkeepers, neighbors, and delivery people.  But these incidental ties are usually face-to-face and do not show up via electronic surveillance.  A terrorist, whether domestic or international, will not readily share his/her phone number, email or other id with merchants and other locals.

Figure 2

All nodes are linked to the Suspect, who is in the center and highlighted in pink. Contacts, who also had observable interactions with each other, are also connected with a grey link.  This "ego network" -- showing 1 hop/step from the suspect -- is typical of many we see, with a clustering coefficient from 0.4 to 0.6 (your friends/colleagues are often friends/colleagues with each other).  
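
The clustering coefficient mentioned above is the fraction of pairs of a person's contacts who are also in contact with each other. A minimal sketch, using a hypothetical ego with four contacts (the data and the helper name local_clustering are assumptions for illustration):

```python
from itertools import combinations

def local_clustering(adj, ego):
    """Fraction of pairs of the ego's contacts that are themselves linked."""
    contacts = adj[ego]
    k = len(contacts)
    if k < 2:
        return 0.0  # no pairs of contacts to compare
    linked_pairs = sum(1 for a, b in combinations(contacts, 2) if b in adj[a])
    possible_pairs = k * (k - 1) / 2
    return linked_pairs / possible_pairs

# Hypothetical ego network: 4 contacts, 3 of the 6 possible contact pairs linked.
adj = {
    "ego": {"a", "b", "c", "d"},
    "a": {"ego", "b"},
    "b": {"ego", "a", "c"},
    "c": {"ego", "b", "d"},
    "d": {"ego", "c"},
}
coefficient = local_clustering(adj, "ego")
print(coefficient)  # 0.5
```

A value of 0.5 sits in the middle of the 0.4 to 0.6 range quoted above: half of the ego's contacts' possible ties to each other actually exist.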

Group structures are hard to spot in 1 step networks, that is why we go out 2 and 3 steps in order to find any emergent groups amongst this collection of nodes.  At 2 hops/steps from the Suspect, we start to see some clustering of nodes.  Below are the interactions at 1 and 2 steps from the Suspect.
Figure 3

The magenta colored nodes are the same step 1 nodes seen in Figure 2.  The green nodes are two steps from the suspect.  We notice that some neighbors have more connections than others.  Again, the links only show the observed/recorded interactions.  The network begins to show some clustering.  What is interesting about the network at this point is where it starts to fold back into itself -- which 2 step contacts interact with each other and with various 1 step contacts?  The green nodes with more than one connection, especially to various magenta-colored nodes are probably more important than those who are just single spokes around a magenta hub.

Next we bring in the third hop, shown in Figure 4 by the blue nodes.
Figure 4

The network has grown much larger than the 1 hop network in Figure 2.  We have gone from 14 nodes to 185 nodes in three steps.  The suspect had 13 observed contacts in Figure 2.  Many would naively estimate the suspect's 3 step network to be 13 x 13 x 13 = 2,197.  But many friends/colleagues are also friends/colleagues with each other -- we have overlapping networks with those we are connected to.  Each node in the above networks represents one unique person.

Now that we have expanded the network out 3 steps, what do we do?  We shrink the network!  The NSA wants to find groups that the suspect may belong to, and find other key nodes in his/her network -- that is why they gather the 2-3 hop contact data.  Rather than investigate all 184 contacts of the suspect, we want to now reduce the network to its core, around the suspect.  The core network, of 47 nodes, is shown in Figure 5 below.  
Figure 5

We see that most of the 1 step nodes remain, with a good portion of the two-step nodes (green), but a very small percentage of the three-step nodes (blue).  The key nodes to focus on have been highlighted in yellow -- they are important to the structure and the flows in the network.  

Next, let's extract the clusters, and their overlaps, in the core.  It appears that there are 4 clusters with several of them overlapping via 4 nodes.  We erase the links and draw a Venn diagram in Figure 6 showing the four clusters and four nodes which act as linchpins holding the various clusters together.  

Figure 6

The four connecting nodes (linchpins) are probably the ones that will be investigated first, followed by the other nodes that were highlighted in yellow (see Figure 5).

So, we have gone from millions of nodes, to thousands of nodes, now to dozens of nodes affected by each terrorist suspect tracked.  According to an NSA slide released by Edward Snowden, the NSA currently has over 117,000 suspects.  With the previous 3 hop estimates, 117,000 suspects would pull most of the world population into the NSA dragnet (accounting for overlaps).  With dozens of "persons of interest" we end up with about 1.5 million people within the sphere of analysis -- still, a lot of "false positives" (false alarms) to sort through.

The method I described is a logical approach for an experienced social network analyst.  It is probably not the method(s) used by the NSA.  Their methods may be similar, because they are looking for groups/clusters and trying to identify which nodes need closer scrutiny.

I applied a similar approach to mapping two initial suspects in the 9-11 attacks -- after the event.



Contact Chaining

The latest "term of interest" in the NSA surveillance discussions is "contact chaining". It is not a new process developed by the NSA.  It is an adapted process, long used by management consultants and social network analysts.

 A quick definition of contact chaining: a graph of the (human) network neighborhood around any specific individual -- it shows everyone who is one and two steps/hops away from the individual of interest. A contact chaining map also shows how everyone in the network neighborhood is connected to each other. Contact chaining can easily be performed using the call and email meta-data the NSA is collecting.
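
To make the definition concrete, here is a minimal sketch of contact chaining from raw meta-data. It assumes the records are simple (source, destination) pairs and treats any contact as a symmetric tie. The names and records are invented, and contact_chain is a hypothetical helper, not a description of any agency's actual tooling.

```python
from collections import defaultdict

def contact_chain(records, person, hops=2):
    """Return (nodes, links) for the neighbourhood within `hops` steps of person.

    records is an iterable of (source, destination) pairs taken from
    call or email meta-data.
    """
    adj = defaultdict(set)
    for src, dst in records:
        if src != dst:
            adj[src].add(dst)
            adj[dst].add(src)  # assumption: direction of contact is ignored
    nodes = {person}
    frontier = {person}
    for _ in range(hops):
        # Step outward once, keeping only people not already in the map.
        frontier = {nb for n in frontier for nb in adj[n]} - nodes
        nodes |= frontier
    # Keep every tie between two people in the neighbourhood, not just
    # the ties that touch the person of interest.
    links = {tuple(sorted((a, b))) for a in nodes for b in adj[a] if b in nodes}
    return nodes, links

# Hypothetical meta-data records.
records = [("ann", "bob"), ("bob", "cy"), ("cy", "dee"),
           ("ann", "cy"), ("dee", "eve"), ("bob", "ann")]
nodes, links = contact_chain(records, "ann", hops=2)
print(sorted(nodes))
print(sorted(links))
```

In this toy data "eve" is three steps from "ann" and is left out, while the tie between "bob" and "cy" is kept because both are inside the neighborhood.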

Below is a 2-step contact chaining network of the 9/11 hijackers; it shows how quickly a network expands from just one or two suspects. The network was created using a contact chaining procedure from two initial Al Qaeda suspects known to be living in the USA and spotted attending Al Qaeda meetings abroad.  For more information, see this detailed analysis of how social network analysis can be used to track known suspects.

Business consultants also use "contact chaining"— we just call it something else: network neighborhood.  We use it in a similar way — to see who is near and interconnected around a specific individual.

Below is the organizational network of one of our business clients.  Two nodes (people) are connected by a grey line if they have a strong work link between them (they exchange key information, documents and data on a frequent basis).  The nodes are colored by the type of work they do.
As with many organizations, they have many employees (baby-boomers: born 1946-1960) who are retiring, or about to.  While performing an organizational network analysis for this client, we also investigated what effect the upcoming retirements would have on their organization.  An employee (#128) who is about to retire is highlighted and shown in the above map with the large black arrow.  He appears to be well connected -- many strong work ties throughout the organization.

Below is the same organizational map, but this time after contact-chaining, showing how many other employees will be affected by this employee's retirement.  The affected employees are all highlighted in yellow.
We see that the effect of this retirement will reach into many parts of the company.  To see exactly who will be influenced by this retirement, we hide the non-affected nodes, to reveal the two-step network neighborhood around the retiring employee.  Since this is actual data from a real company, we have hidden the employees' names and used non-associated numbers.

Who is about to retire from, or leave, your organization?  What ripple effect will they have when they are gone?  What (work) chains will they break? Whose job will be affected by the vacancy?  Do you have an effective replacement?  These are all key questions that Management and Human Resources must have answers to, and a plan for, as many people prepare to leave their current employers.



Connecting the Dots

It appears that the US Government, via their National Security Agency (NSA), has collected a lot of data on who calls/emails whom both nationally and internationally (meta data is data about the communication: source, destination, time, place, duration).  The NSA's Prism program is truly big data.  But is it enough data?  Is it the right data?  Is it the data the USA needs to stop both domestic and international terror attacks?

Assume they had the Prism data before September 11, 2001... would the NSA be able to map out the Al Qaeda (AQ) network below?  The two red nodes are AQ operatives who were known to be living in Los Angeles in 1999, the blue nodes came to the USA sometime in 2000 or 2001, and the green nodes are foreign operatives supporting the 9-11 attacks.  A link shows who interacts with whom, via regular phone or email contact.  Here is a more detailed analysis of this terror network.

This data was collected after the 9-11 attacks, and is a reasonable depiction of the AQ network in the USA before September 11, 2001.  Would this network map have stopped the 9-11 attacks?  

Good question... for unknown, unpredictable events how do you know you have the right data or enough data or too much data?  Maybe your data is good for the last attack, but what about the next one?  Will it be the same, or different?  The map does not reveal the mission/timetable (if any) of the clustered nodes.

And then there is the problem of false positives.  We often hear the phrase: "I have done nothing wrong, I have nothing to worry about!" Yet... What if one of the red nodes was a co-worker of yours?  What if one of the red nodes participated in pick-up basketball games with you and your brother?  What if your kids play with kids of a suspected drug runner?  What if your sister dates the brother of a suspected domestic terrorist?  What if your phone/email records link up with those of suspected or known bad guys?  What if you are linked with the wrong person at the wrong time?  Life can get very miserable as a person of interest.

We are trying to solve a societal problem by throwing technology at the problem.  We seem to do that with many problems these days.  Yet, technology can help us make sense of complex dynamics... if mixed with the social sciences.  The map above would have been really useful during the early months of 2001.  The network layout algorithms in the software allow us to see emergent structures, including key nodes and clusters, in this human network.  Once we have the map, we can measure the network, to find which nodes/links keep the network together.
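One simple way to ask "which nodes keep the network together" is to remove each node in turn and check whether the rest of the network falls into more pieces. The sketch below does exactly that on a small made-up network; it is a brute-force illustration under that assumption, not the measures built into the mapping software, and the helper names `components` and `cut_nodes` are hypothetical.

```python
def components(nodes, edges):
    """Group nodes into connected components (links treated as two-way)."""
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        if a in neighbors and b in neighbors:
            neighbors[a].add(b)
            neighbors[b].add(a)
    seen, groups = set(), []
    for start in nodes:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            n = stack.pop()
            if n not in group:
                group.add(n)
                stack.extend(neighbors[n] - group)
        seen |= group
        groups.append(group)
    return groups

def cut_nodes(nodes, edges):
    """Nodes whose removal splits the network into more pieces."""
    base = len(components(nodes, edges))
    result = []
    for n in nodes:
        rest = [m for m in nodes if m != n]
        if len(components(rest, edges)) > base:
            result.append(n)
    return result

# Hypothetical network: a triangle A-B-C with a tail C-D-E.
people = ["A", "B", "C", "D", "E"]
ties = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "E")]
keepers = cut_nodes(people, ties)
print(keepers)  # ['C', 'D']
```

Removing one node at a time is slow on large networks, but on human-sized clusters of dozens or hundreds of nodes it is fast enough and easy to reason about.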

Would the intelligence community have taken this map seriously in 2001?  Would other agencies have ignored the map because "it was not invented here"?  Good data and good analysis are often not utilized correctly when they move from one organization/context to another.  Would the ABC agency know what to do with the analysis from the XYZ agency?

Big data implies that with more, you get certainty.  Instead of certainty you might get the opposite because "the more" might actually include noise or dirt.  Noisy/dirty data, with the appearance of certainty/accuracy, is the worst case scenario.  Good analysis requires good data, but big data alone is not sufficient for complex analysis of events that have not yet happened.  We need to be careful what and who gets caught in our nets of surveillance.

In June 2012 I was invited to do a TEDx talk in Riga, Latvia.  I chose to talk about the "tracks" we leave behind on the internet and via our computers and cell phones.  I talked about how this data could be mined and analyzed, and not always by those who had our best interests in mind.  Enjoy the video of the talk -- looks like I was 1 year early!

I don't believe mining the whole internet is the answer for our safety and security.  Making the haystack larger does not help in finding the needles, which we must first identify!  Finding the network neighborhood around identified suspects is a proven method that should be continuously improved!



Big (Network) Data

The excitement about "Big Data" is usually around the access to lots of data -- thousands, or millions, of records.  Below is a hairball diagram of lots of social data (nodes and links revealing a network of connections) from the WWW. Social data is relational/interdependent, not discrete/independent like most statistical data about individual people/objects.

This picture is not that interesting!  What we want is interesting and useful, not BIG.
When investigating social/relational data, it is usually not the forest that is useful, but the clusters of various trees, and their relationships, inside the ecosystem. We not only want to "see the forest for the trees", but also see the patterns/clusters of trees in the forest!

Big data often contains small clusters -- especially with social data.  Human networks usually contain dozens or hundreds of nodes -- we usually do not have time/energy for thousands or millions of friends/colleagues.  The goal is to find the significant clusters amongst all of the data.  When looking at big social data it is important to set the bar correctly for what is a link of significance/importance. The first step in mining big social data is to eliminate the noise -- find the natural human groups in your sea of data.
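Setting the bar for a significant link and then looking for natural groups can be sketched as a two-step filter: drop links below a chosen strength, then collect the connected groups that remain. The code below is a rough illustration under assumptions of my own choosing: link strength is a simple interaction count, the threshold of 3 is arbitrary, and the function name `significant_groups` and the sample data are hypothetical.

```python
def significant_groups(weighted_links, min_weight, min_size=2):
    """Drop weak links, then return the connected groups that remain."""
    neighbors = {}
    for a, b, weight in weighted_links:
        if weight >= min_weight:  # the "bar" for a significant link
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)
    groups, seen = [], set()
    for start in neighbors:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            n = stack.pop()
            if n not in group:
                group.add(n)
                stack.extend(neighbors[n] - group)
        seen |= group
        if len(group) >= min_size:
            groups.append(group)
    # Largest groups first.
    return sorted(groups, key=len, reverse=True)

# Hypothetical links: (person, person, number of interactions).
raw_links = [("a", "b", 5), ("b", "c", 4), ("c", "d", 1),
             ("d", "e", 6), ("e", "f", 1), ("x", "y", 1)]
groups = significant_groups(raw_links, min_weight=3)
print([sorted(g) for g in groups])  # [['a', 'b', 'c'], ['d', 'e']]
```

With the bar at 3, the single weak tie between c and d no longer glues two groups into one, and the one-off contacts (f, x, y) drop out as noise.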

Network components reveal much within interlinked data. As we zoom in, we can begin to answer some useful questions...
  • Who is here?  
  • How are they clustered? 
  • How are they connected? 
  • Who are the key connectors?  
  • Who is in the thick of things?
We put an MRI to our big data above.  We see various subsets of the above ecosystem.  These network maps show various slices/parts of the whole, and how they are connected. Notice the network components displayed below are all in the size range of dozens or hundreds of nodes. We now see patterns worth investigating. The networks below are all sub-sets of the hairball above.
and many more slices...

At 1000 meters, big data is not that interesting.  At 100 meters, we start to see interesting patterns/components. At 10 meters we can play with the patterns and really start to learn what is happening inside the social ecosystem.  In Big Data, the important numbers are not the millions, but the groups of dozens, and hundreds, that reveal meaning and give us insight.

What is happening in the "social forest" inside your ecosystem?



Arrows on Twitter

Twitter is a social network that network scientists refer to as an asymmetric network -- the links are directional, they are drawn with arrows.  Links between people on Twitter show direction of intent. The arrows are drawn from source to target.

Looking at a social graph from Twitter we can tell a lot by following the arrows...
  • who is aware of whom/what?
  • whom/what is getting attention?
  • who is involved in conversations on specific topics?
  • who is central, and who is peripheral to the discussions?
This past week I was invited to a Twitter Chat (a.k.a. Tweetchat) on the topic of Serendipity.  Two separate chat groups (#innochat and #ideachat) came together on a topic of overlapping interest.  Twitter chats last for 1 hour and use a pre-determined Twitter hash-tag to track all of the tweets in the on-line conversation. 

When we draw a network map (a.k.a. social graph) we see people as nodes and their connections/conversations as links.  In this case, the links have arrows showing who is referring to whom.

Let's first look at outgoing links -- who is linking out, to whom.  Links on Twitter can be of a broadcast nature (X announces something to all of her followers).  Links can also be directed at a specific target -- Y aims a message at a specific person, or Z re-tweets (RT) something X has posted.  Although an RT is a broadcast, it is also a message back to the originator of the tweet -- I am aware of what you tweeted, and I choose to pass it on to others (not necessarily an endorsement).  

The network map below shows participants from the Serendipity Tweetchat.  Two nodes are linked if the source node, RT'ed, MT'ed or @-messaged the target node more than once during the chat session.  The node colors show: blue - general chat participant, purple - chat facilitators, green - invited guest.  The Twitter ID of each participant is shown beneath their node.  The node size in this first map is determined by a network metric called Awareness -- it looks at all local, outgoing, direct and indirect, links surrounding a node.  The higher the awareness metric, the larger the node.  Larger nodes should be more aware of what is happening in the surrounding network, than the smaller nodes.
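The linking rule described above (a link exists if the source RT'ed, MT'ed or @-messaged the target more than once) can be sketched as a simple count over interaction records. The record format, the handles, and the function name `chat_links` below are hypothetical; the real chat data was converted from a transcript, and its format is not shown here.

```python
from collections import Counter

def chat_links(interactions, min_count=2):
    """Return {(source, target): count} for pairs seen at least min_count times."""
    counts = Counter(
        (source, target)
        for source, target, kind in interactions
        if source != target  # ignore self-references
    )
    return {pair: n for pair, n in counts.items() if n >= min_count}

# Hypothetical records: (source, target, kind of interaction).
records = [
    ("@a", "@b", "RT"), ("@a", "@b", "@"),
    ("@b", "@a", "MT"),
    ("@c", "@a", "RT"), ("@c", "@a", "RT"), ("@c", "@a", "@"),
]
arrows = chat_links(records)
print(sorted(arrows.items()))  # [(('@a', '@b'), 2), (('@c', '@a'), 3)]
```

Direction matters here: @b referred to @a only once, so there is an arrow from @a to @b but none coming back.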

Next we look at the same map -- same nodes, same links -- but different node sizes.  This time the nodes are sized by incoming links.  The network metric used here is called Attention -- it looks at all local, incoming, direct and indirect, links surrounding a node. It is good to have many incoming links, but it is even better to have incoming links from others who have many incoming links!  Those with a nice pattern of incoming links are what Malcolm Gladwell referred to as mavens in The Tipping Point.

Notice that some of the node sizes have changed drastically -- some with low Awareness, have high Attention and vice versa.  The metrics help reveal the roles people play on Twitter -- some engage many others, while some prefer/wait to be engaged (targeted).  The node of the lead chat organizer, @blogbrevity, is large for both network metrics -- a proper pattern for an effective facilitator!

Our third network graph shows node size based on both incoming and outgoing links.  The network metric, Integration, shows how "in the thick of things" a node is.  A Twitter node with a high Integration score is probably posting interesting tweets, noticing other people's tweets, getting retweeted, and participating in many conversations.

Notice how both facilitators (purple nodes) have the largest nodes -- they were very active in moving this very rapid chat conversation forward.  They were interacting with the invited guest, with newcomers and regulars, all the while asking questions, and RT'ing key tweets of the ongoing conversation.  Those with a high integration metric can play the role of connectors, as described in Gladwell's Tipping Point.
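The exact formulas behind Awareness, Attention and Integration are not spelled out in this post, so the sketch below uses stand-ins of my own: awareness is approximated as the number of people a node can reach within two steps along outgoing arrows, attention is the same count along incoming arrows, and integration is simply their sum. Treat these as rough proxies for the idea of "local, direct and indirect" links, not as the metrics used for the maps; the names `reach` and `role_scores` are hypothetical.

```python
def reach(links, node, max_steps=2):
    """Count distinct other nodes reachable from node within max_steps."""
    out = {}
    for source, target in links:
        out.setdefault(source, set()).add(target)
    frontier, seen = {node}, {node}
    for _ in range(max_steps):
        frontier = {t for s in frontier for t in out.get(s, ())} - seen
        seen |= frontier
    return len(seen) - 1  # do not count the node itself

def role_scores(links):
    """Proxy scores per node; all three definitions are assumptions."""
    nodes = {n for link in links for n in link}
    reversed_links = [(target, source) for source, target in links]
    scores = {}
    for n in nodes:
        awareness = reach(links, n)           # follows outgoing arrows
        attention = reach(reversed_links, n)  # follows incoming arrows
        scores[n] = {"awareness": awareness,
                     "attention": attention,
                     "integration": awareness + attention}
    return scores

# Hypothetical arrows: (source, target).
arrows = [("a", "b"), ("b", "c"), ("c", "d"), ("e", "c")]
scores = role_scores(arrows)
print(scores["a"])  # {'awareness': 2, 'attention': 0, 'integration': 2}
print(scores["c"])  # {'awareness': 1, 'attention': 3, 'integration': 4}
```

Even in this toy network the two roles separate: "a" reaches out but nobody points back, while "c" draws arrows from several directions.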

Twitter is not just about person-to-person interactions, it is used to broadcast messages to large groups -- either followers or those tracking a hash-tag.  Many of the tweets in the chat were aimed at no one in particular; they were broadcast to the whole group. This network map is different from the others, because it shows only broadcast messages to the whole group -- it does not show interactions between the participants.  The whole group of chat participants is represented by the large red node in the center -- it is the hub.  The spokes around the hub represent various participants that shared more than one tweet with the whole gathering.  The thickness of each link indicates how many tweets that person/spoke sent to the group/hub.

This chat network formed, emerged, and disbanded all in the space of several hours.  Yet, it revealed the pattern of many long-term networks -- a core-periphery structure, mavens, connectors, and leaders.  Many of the participants in this chat already knew each other on Twitter, especially through previous Twitter chat events.  This was really an old network reconvening -- with a few new members joining in.

The core members of a group are easy to spot, they are the ones with many arrows, all pointing to each other -- a sub-network where everyone seems to know, and interact with, everyone else.  The network map below shows the core of the network -- all nodes have at least 4 connections to everyone else and they all have incoming and outgoing arrows.  The core was so tightly packed that we removed the arrow heads so that the graph was easier to read.
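One common way to pull out such a core is to prune repeatedly: drop anyone with too few connections, recount among those who remain, and stop when nobody else drops out. The sketch below applies that idea with the two conditions mentioned above (at least 4 connections inside the core, plus both incoming and outgoing arrows). It is an approximation I chose for illustration, not necessarily the procedure used for the map, and `core_members` and the sample arrows are hypothetical.

```python
def core_members(links, k=4):
    """Prune until every member has >= k neighbors and both arrow directions."""
    members = {n for link in links for n in link}
    while True:
        incoming = {n: set() for n in members}
        outgoing = {n: set() for n in members}
        for source, target in links:
            if source in members and target in members and source != target:
                outgoing[source].add(target)
                incoming[target].add(source)
        weak = {
            n for n in members
            if len(incoming[n] | outgoing[n]) < k  # too few connections
            or not incoming[n]                     # nobody points at them
            or not outgoing[n]                     # they point at nobody
        }
        if not weak:
            return members
        members -= weak  # recount among the survivors

# Hypothetical chat: five people who all refer to each other,
# plus two peripheral participants (f and g).
tight = [(x, y) for x in "abcde" for y in "abcde" if x != y]
arrows = tight + [("f", "a"), ("a", "g"), ("g", "f")]
core = core_members(arrows)
print(sorted(core))  # ['a', 'b', 'c', 'd', 'e']
```

The pruning has to repeat because removing one weakly connected person can leave a neighbor below the threshold as well.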

Finally, we look at all of the data from this chat -- aggregate all of the arrows, and combine all of the maps above -- to find out which participants were most involved in this Twitter chat.  The list, sorted high to low, shows the fifteen (15) most connected people over the hour long chat on Serendipity.  


Next time you look at a map of a human network, look for the arrows.  Who are they going to?  Where are they coming from?  Where is a cluster of arrows, all pointing to each other?  Ask the analyst what the links mean.  What do the node colors/sizes mean?  Soon you will be able to make sense of the map and zero in on key clusters of activity, along with key connectors in getting things done.

To connect the dots, follow the arrows!

Acknowledgements: One of the chat facilitators -- Andrew Marshall (@DrewCM) provided us the history of the chat from Tweetchat.com.  My friend, and colleague, Zee Spenser converted the PDF to CSV network mapping data.  Zee shares the data and his code on github.  
