Mar 15, 2013

Big (Network) Data

The excitement about "Big Data" is usually around the access to lots of data -- thousands, or millions, of records.  Below is a hairball diagram of lots of social data (nodes and links revealing a network of connections) from the WWW.  Social data is not like most big data.   It is relational/interdependent, not discrete/independent like most big statistical data about individual people and objects.
This picture below is not that useful!  What we want is interesting and useful, not BIG.
When investigating social/relational data, it is usually not the forest that is useful, but the clusters of various trees, and their relationships, inside the ecosystem. We not only want to "see the forest for the trees", but also see the patterns/clusters of trees in the forest!

Big data often contains small clusters -- especially with social data.  Human networks usually contain dozens or hundreds of nodes -- we usually do not have time/energy for thousands or millions of friends/colleagues!  The goal is to find the significant clusters within all of the data.  When looking at big social data it is important to set the bar correctly for what is a link of significance/importance. The first step in mining big social data is to eliminate the noise -- find the natural human groups in your sea of data.
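To make "setting the bar" concrete, here is a minimal sketch in plain Python. It assumes each link carries a numeric strength (for example, a count of interactions), which is an assumption for illustration and not something the data above is known to include. The function name `significant_clusters`, the parameters `min_weight` and `min_size`, and the toy names are all hypothetical. Weak links are dropped first, then whatever groups remain connected are collected.

```python
from collections import defaultdict

def significant_clusters(weighted_edges, min_weight, min_size=2):
    """Drop links weaker than min_weight, then return the connected
    groups (as sets of nodes) that have at least min_size members."""
    adjacency = defaultdict(set)
    for a, b, weight in weighted_edges:
        if weight >= min_weight:          # the "bar" for a significant link
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, clusters = set(), []
    for start in list(adjacency):
        if start in seen:
            continue
        # Iterative depth-first search collects one connected group.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        clusters.append(component)

    # Largest groups first; groups below min_size are treated as noise.
    return sorted((c for c in clusters if len(c) >= min_size),
                  key=len, reverse=True)

# Hypothetical toy data: (person, person, number of interactions).
edges = [
    ("ann", "bob", 9), ("bob", "cat", 7), ("ann", "cat", 8),
    ("dan", "eve", 6),
    ("cat", "dan", 1),   # a single weak link joins the two groups
    ("eve", "fay", 1),
]

clusters = significant_clusters(edges, min_weight=5)
everything = significant_clusters(edges, min_weight=1)
print(clusters)
print(everything)
```

With the bar at 5 the toy data separates into two small groups; with the bar at 1 everything collapses back into one blob, a miniature hairball.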

Network components reveal much within interlinked data. As we zoom in, we can begin to answer some useful questions...
  • Who is here? 
  • What are they connected about? 
  • Which clusters formed from the Who and What? 
  • Who is in the thick of things?
  • Who connects the clusters?
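The last two questions can be roughed out in a few lines of plain Python. This is a sketch under stated assumptions, not the method used to produce the maps in this post: "in the thick of things" is approximated by the number of direct links a node has, and "connects the clusters" by nodes whose removal would split the network into more pieces. The function names and toy names are hypothetical, and real network analysis usually relies on richer measures than these two.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Turn a list of undirected (a, b) links into a neighbor lookup."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)

def count_components(adjacency, skip=None):
    """Count connected groups, optionally pretending node `skip` is absent."""
    seen, count = set(), 0
    for start in adjacency:
        if start == skip or start in seen:
            continue
        count += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(n for n in adjacency[node]
                         if n != skip and n not in seen)
    return count

def thick_of_things(adjacency, top=3):
    """Nodes with the most direct links (ties broken alphabetically)."""
    return sorted(adjacency, key=lambda n: (-len(adjacency[n]), n))[:top]

def connectors(adjacency):
    """Nodes whose removal would break the network into more pieces."""
    baseline = count_components(adjacency)
    return sorted(n for n in adjacency
                  if count_components(adjacency, skip=n) > baseline)

# Hypothetical toy network: two tight groups joined through "gus".
edges = [
    ("ann", "bob"), ("bob", "cat"), ("ann", "cat"),
    ("dan", "eve"), ("eve", "fay"), ("dan", "fay"),
    ("cat", "gus"), ("gus", "dan"),
]
adjacency = build_adjacency(edges)
print(thick_of_things(adjacency, top=2))
print(connectors(adjacency))
```

Note that the removal test is brute force (one full pass per node), which is fine for clusters of dozens or hundreds of nodes but not for the whole hairball.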
We put an MRI to our big data hairball above.  We see various subsets of the above ecosystem.  These network maps show various slices/parts of the whole, and how they are connected. Notice the network components displayed below are all in the size range of dozens or hundreds of nodes. We now see patterns worth investigating. The networks below are all sub-sets of the hairball above.
...and many more sub-groups (i.e. MRI slices) from the big data hairball above.

At 10,000 meters, big data is not that interesting.  At 1,000 meters, we start to see patterns/clumps. At 10 meters we can play with emergent clusters that have real meaning and we start to learn what is happening inside our social ecosystem.  In Big Data, the important numbers are not the millions, but the many sub-groups of dozens, and hundreds, that reveal meaning, and give us insight.

What is happening in the "social forest" inside your ecosystem?


Charles Cameron (hipbone) said...

That's a terrific post, Valdis, and in its own way both hilarious and saddening.

My own approach has been to focus on the "seven plus or minus two" factor (aka "Miller's Law") which means I want to keep my nodes within easy scanning distance of seven -- say, twelve nodes max, allowing the eye to slide from one end of a map to the other -- while making each one of them as rich as possible in qualitative data.

In my games, I do this by suggesting the use of quotes and anecdotes (along with the occasional statistic) as nodes, and enriching them further with mini essays on each topic, and stated explanations of the analogies and disjunctions between them, as represented by the graph’s edges.

I’m always fascinated by your work, Valdis – graph-based thinking seems to me to be the natural correlative of a networked world – and particularly appreciate this post for its focus on what we can grasp and work with, rather than what some vast machine can accomplish that may offer us little by way of insight in return.

Valdis Krebs said...


Yes, I like Miller's Law (7+/- 2) also! I use it to manage the complexity of network visualizations.

I find if we have 3 node shapes x 5 node colors x 2 node sizes x 3 link colors x 3 link thicknesses x 2 link directions on 1 network map ... we have plenty of variety to confuse people! We have to remember that we are trying to simplify with visualization (and make sense too!) and not trying to show everything at once. Intelligently filtered maps...


Blair Cook said...

"We not only want to "see the forest for the trees", but also see the patterns/clusters of trees in the forest! "

Excellent point! How do these clusters form? What brought them together? How do they come apart? You could lose your mind following all the patterns, so don't go so deep you can't get out again!

Valdis Krebs said...

Good questions, Blair!

Before you can answer them you need to be able to see the clusters... extract them from the hairball of data.

What are they composed of, where is the cutoff (link strength) where they grow wildly, and where do they fragment, and then disappear? This all gives insight into what, how and why they are there.
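One way to look for that cutoff is to sweep the link-strength threshold and watch how the groups change. The sketch below is plain Python and rests on assumptions: links carry a numeric strength, and the names `sweep`, `component_sizes`, and the toy data are hypothetical. For each cutoff it reports how many groups of two or more exist, the size of the largest group, and how many nodes are left on their own.

```python
def component_sizes(nodes, edges):
    """Union-find over undirected edges; returns group sizes, largest first."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]   # path halving keeps lookups short
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)

    sizes = {}
    for n in nodes:
        root = find(n)
        sizes[root] = sizes.get(root, 0) + 1
    return sorted(sizes.values(), reverse=True)

def sweep(weighted_edges, cutoffs):
    """For each cutoff: (cutoff, groups of 2+, largest group, lone nodes)."""
    nodes = sorted({n for a, b, _ in weighted_edges for n in (a, b)})
    rows = []
    for cutoff in cutoffs:
        kept = [(a, b) for a, b, w in weighted_edges if w >= cutoff]
        sizes = component_sizes(nodes, kept)
        groups = [s for s in sizes if s > 1]
        rows.append((cutoff, len(groups),
                     groups[0] if groups else 0, sizes.count(1)))
    return rows

# Hypothetical toy data: (person, person, link strength).
edges = [
    ("ann", "bob", 9), ("bob", "cat", 7), ("ann", "cat", 8),
    ("dan", "eve", 6), ("cat", "dan", 1), ("eve", "fay", 1),
]

rows = sweep(edges, cutoffs=[1, 5, 8, 10])
for cutoff, groups, largest, loners in rows:
    print(f"cutoff {cutoff:>2}: {groups} group(s), largest {largest}, {loners} alone")
```

In this toy data the single blob at cutoff 1 splits into two groups at cutoff 5, loses one of them by cutoff 8, and is gone entirely at cutoff 10. The interesting cutoffs are the ones where those numbers jump.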