Mar 15, 2013

Big (Social) Data


The excitement about "Big Data" is usually about access to lots of data -- thousands, or millions, of records.  Below is a hairball diagram of lots of social data (nodes and links revealing a network of connections) from the WWW.  Social data is not like most big data.  It is relational and interdependent, not discrete and independent like most big statistical data about individual people or objects.
The picture below, of a hairball network, is not that useful.  Seeing everything shows us nothing!  What we want is interesting and useful, not BIG.

Big Data Hairball

We apply a software x-ray to our big data hairball above.  We can now see various subsets of the data mass -- shown below.  Notice the network components displayed below are all in the size range of dozens or hundreds of nodes. We now see networks & communities worth investigating.  We can now answer useful questions.


  • Who is here? 
  • What are their connections? 
  • Where are the emergent communities?
  • Who is in the thick of things?
  • Who connects the communities?
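A minimal sketch of the "x-ray" idea, assuming the hairball is available as a plain list of (person, person) links. This is not the software used to produce the maps in this post; the function names, the size limits, and the toy data are all made up for illustration. It splits the network into connected components, keeps only the components in a chosen size range, and ranks the members of each by number of connections, which is one simple way to approach "Who is here?" and "Who is in the thick of things?".

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Turn an undirected edge list into a dict of node -> set of neighbors."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


def connected_components(adj):
    """Return a list of node sets, one per connected component (breadth-first search)."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        members = set()
        while queue:
            node = queue.popleft()
            members.add(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(members)
    return components


def clusters_in_range(adj, min_size, max_size):
    """Keep only components whose size falls in [min_size, max_size], largest first."""
    kept = [c for c in connected_components(adj) if min_size <= len(c) <= max_size]
    return sorted(kept, key=len, reverse=True)


def most_connected(component, adj, top=3):
    """Rank members of one component by number of links (ties broken by name)."""
    return sorted(component, key=lambda n: (-len(adj[n]), n))[:top]


# Toy data (hypothetical): three separate groups of people.
edges = [
    ("ann", "bob"), ("bob", "cat"), ("cat", "ann"), ("cat", "dan"),
    ("eve", "fay"),
    ("gus", "hal"), ("hal", "ida"), ("ida", "gus"), ("ida", "jon"), ("jon", "kim"),
]

adj = build_adjacency(edges)
clusters = clusters_in_range(adj, min_size=3, max_size=10)
for cluster in clusters:
    print(sorted(cluster), "-> most connected:", most_connected(cluster, adj, top=1))
```

On real data the size limits would be set to dozens or hundreds rather than 3 to 10. Connected components are only the crudest cut: a dense hairball may be one giant component, in which case a community detection method would be needed to find the clusters inside it, and answering "Who connects the communities?" needs a bridging measure rather than a simple count of links.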

The networks below are all sub-nets of the hairball above. They are communities that emerged around common attributes, affinities, and associations.  These clusters have something in common.

Sub-network #1

Sub-network #2

Sub-network #3

Sub-network #4


When investigating social / relational data, it is usually not the forest that is useful, but the clusters of various trees, and their relationships, within the forest. We not only want to "see the forest for the trees", but also see the patterns/clusters/relationships of trees in the forest!

Big data often contains small clusters -- especially with social data.  Human networks usually contain dozens or hundreds of nodes -- we usually do not have time/energy for thousands or millions of friends or colleagues!  Facebook's own research showed that people who claim hundreds or thousands of friends regularly interact with only 4-5 dozen of them.  The goal is not to analyze the universe of data, but to find the significant clusters within all of your data.

At 10,000 meters, big data is not that interesting.  At 1,000 meters, we start to see patterns/clumps.  At 10 meters, we can play with emergent clusters that have real meaning, and we start to learn what is happening inside our social ecosystem.  In Big Data, the important numbers are not the millions, but the many sub-groups of dozens and hundreds that reveal meaning and give us insight.

What is happening in the emergent communities inside your hairball networks?


4 comments:

  1. That's a terrific post, Valdis, and in its own way both hilarious and saddening.

    My own approach has been to focus on the "seven plus or minus two" factor (aka "Miller's Law") which means I want to keep my nodes within easy scanning distance of seven -- say, twelve nodes max, allowing the eye to slide from one end of a map to the other -- while making each one of them as rich as possible in qualitative data.

    In my games, I do this by suggesting the use of quotes and anecdotes (along with the occasional statistic) as nodes, and enriching them further with mini essays on each topic, and stated explanations of the analogies and disjunctions between them, as represented by the graph’s edges.

    I’m always fascinated by your work, Valdis – graph-based thinking seems to me to be the natural correlative of a networked world – and particularly appreciate this post for its focus on what we can grasp and work with, rather than what some vast machine can accomplish that may offer us little by way of insight in return.

  2. Charles,

    Yes, I like Miller's Law (7+/- 2) also! I use it to manage the complexity of network visualizations.

    I find if we have 3 node shapes x 5 node colors x 2 node sizes x 3 link colors x 3 link thicknesses x 2 link directions on 1 network map ... we have plenty of variety to confuse people! We have to remember that we are trying to simplify with visualization (and make sense too!) and not trying to show everything at once. Intelligently filtered maps...

    Valdis

  3. "We not only want to "see the forest for the trees", but also see the patterns/clusters of trees in the forest! "

    Excellent point! How do these clusters form? What brought them together? How do they come apart? You could lose your mind following all the patterns, so don't go so deep you can't get out again!

  4. Good questions, Blair!

    Before you can answer them you need to be able to see the clusters... extract them from the hairball of data.

    What are they composed of, where is the cutoff (link strength) where they grow wildly, and where do they fragment, and then disappear? This all gives insight into what, how and why they are there.
