This picture is not that interesting! What we want is interesting and useful, not BIG.
When investigating social/relational data, it is usually not the forest that is useful, but the clusters of various trees, and their relationships, inside the ecosystem. We not only want to "see the forest for the trees", but also see the patterns/clusters of trees in the forest!
Big data often contains small clusters -- especially with social data. Human networks usually contain dozens or hundreds of nodes -- we usually do not have time/energy for thousands or millions of friends/colleagues. The goal is to find the significant clusters amongst all of the data. When looking at big social data it is important to set the bar correctly for what is a link of significance/importance. The first step in mining big social data is to eliminate the noise -- find the natural human groups in your sea of data. Where are the islands of interesting patterns?
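One way to make "setting the bar" concrete is to drop every link weaker than a chosen strength and then read off the groups that remain. The sketch below is a minimal illustration in plain Python, not a description of any particular tool. The edge list, the `min_strength` cutoff, and the function name `clusters_above` are all made up for the example.

```python
from collections import defaultdict

def clusters_above(edges, min_strength):
    """Return connected groups using only links with weight >= min_strength."""
    neighbors = defaultdict(set)
    for a, b, weight in edges:
        if weight >= min_strength:
            neighbors[a].add(b)
            neighbors[b].add(a)

    seen = set()
    clusters = []
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Walk outward from `start` to collect everyone reachable from it.
        group = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(neighbors[node] - seen)
        clusters.append(sorted(group))
    return clusters

# Hypothetical data: (person, person, number of interactions).
edges = [
    ("ann", "bob", 9), ("bob", "cat", 7), ("ann", "cat", 8),
    ("cat", "dan", 1),
    ("dan", "eve", 6), ("eve", "fay", 5),
    ("fay", "gus", 1),
]

print(clusters_above(edges, 1))  # everything hangs together in one blob
print(clusters_above(edges, 5))  # two small groups emerge
```

With the bar at 1 every person lands in a single group. Raising it to 5 removes the two weak links and leaves two groups of three. People with no surviving links (here, "gus") drop out of the result entirely, which is the noise-removal effect described above.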
Network components reveal much within interlinked data. As we zoom in, we can begin to answer some useful questions...
- Who is here?
- How are they clustered?
- How are they connected?
- Who are the key connectors?
- Who is in the thick of things?
and many more slices...
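As a rough sketch, each of the questions above can be turned into a small computation on a list of links. The example below uses plain Python and invented names (`build_graph`, `components`, `key_connectors`, `most_connected`) on made-up data. It reads "key connector" as a person whose removal splits a group apart, and "in the thick of things" as having the most direct links. These are only one simple reading of each question; other measures exist.

```python
from collections import defaultdict

def build_graph(edges):
    """Who is here, and how are they connected? Map each person to their links."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)

def components(graph, skip=None):
    """How are they clustered? Optionally pretend node `skip` is gone."""
    seen = {skip} if skip is not None else set()
    groups = []
    for start in sorted(graph):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(graph[node] - seen)
        groups.append(sorted(group))
    return groups

def key_connectors(graph):
    """People whose removal breaks a group into pieces (brute-force check)."""
    baseline = len(components(graph))
    return [n for n in sorted(graph)
            if len(components(graph, skip=n)) > baseline]

def most_connected(graph, top=3):
    """People ranked by number of direct links, ties broken by name."""
    return sorted(graph, key=lambda n: (-len(graph[n]), n))[:top]

# Hypothetical links between people.
edges = [
    ("ann", "bob"), ("ann", "cat"), ("bob", "cat"),
    ("cat", "dan"),
    ("dan", "eve"), ("dan", "fay"), ("eve", "fay"),
    ("gus", "hal"),
]
graph = build_graph(edges)

print(sorted(graph))          # who is here
print(components(graph))      # how they are clustered
print(key_connectors(graph))  # who holds groups together
print(most_connected(graph))  # who is in the thick of things
```

In this toy data "cat" and "dan" are the only bridge between two triangles of friends, so they are the ones flagged as key connectors. The brute-force check is fine for groups of dozens or hundreds, which is the scale the post argues for.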
At 1000 meters, big data is not that interesting. At 100 meters, we start to see interesting patterns/components. At 10 meters we can play with the patterns and really start to learn what is happening inside the social ecosystem. In Big Data, the important numbers are not the millions, but the groups of dozens, and hundreds, that reveal meaning and give us insight.
What is happening in the "social forest" inside your ecosystem?

4 comments:
That's a terrific post, Valdis, and in its own way both hilarious and saddening.
My own approach has been to focus on the "seven plus or minus two" factor (aka "Miller's Law") which means I want to keep my nodes within easy scanning distance of seven -- say, twelve nodes max, allowing the eye to slide from one end of a map to the other -- while making each one of them as rich as possible in qualitative data.
In my games, I do this by suggesting the use of quotes and anecdotes (along with the occasional statistic) as nodes, and enriching them further with mini essays on each topic, and stated explanations of the analogies and disjunctions between them, as represented by the graph’s edges.
I’m always fascinated by your work, Valdis – graph-based thinking seems to me to be the natural correlative of a networked world – and particularly appreciate this post for its focus on what we can grasp and work with, rather than what some vast machine can accomplish that may offer us little by way of insight in return.
Charles,
Yes, I like Miller's Law (7+/- 2) also! I use it to manage the complexity of network visualizations.
I find if we have 3 node shapes x 5 node colors x 2 node sizes x 3 link colors x 3 link thicknesses x 2 link directions on 1 network map ... we have plenty of variety to confuse people! We have to remember that we are trying to simplify with visualization (and make sense too!) and not trying to show everything at once. Intelligently filtered maps...
Valdis
"We not only want to "see the forest for the trees", but also see the patterns/clusters of trees in the forest! "
Excellent point! How do these clusters form? What brought them together? How do they come apart? You could lose your mind following all the patterns, so don't go so deep you can't get out again!
Good questions, Blair!
Before you can answer them you need to be able to see the clusters... extract them from the hairball of data.
What are they composed of, where is the cutoff (link strength) where they grow wildly, and where do they fragment, and then disappear? This all gives insight into what, how and why they are there.
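A minimal sketch of that cutoff question, assuming links carry a numeric strength: sweep the cutoff upward and record, at each step, how many groups survive and how big the largest one is. The function name `cluster_profile`, the data, and the cutoff values are invented for illustration.

```python
def cluster_profile(edges, cutoffs):
    """For each cutoff, count the groups left and the size of the largest one."""
    profile = []
    for cutoff in cutoffs:
        # Union-find over the links that survive this cutoff.
        parent = {}

        def find(node):
            parent.setdefault(node, node)
            while parent[node] != node:
                parent[node] = parent[parent[node]]  # path halving
                node = parent[node]
            return node

        for a, b, strength in edges:
            if strength >= cutoff:
                parent[find(a)] = find(b)

        # Tally how many people end up under each group root.
        sizes = {}
        for node in parent:
            root = find(node)
            sizes[root] = sizes.get(root, 0) + 1
        profile.append((cutoff, len(sizes), max(sizes.values(), default=0)))
    return profile

# Hypothetical data: (person, person, link strength).
edges = [
    ("ann", "bob", 9), ("bob", "cat", 7), ("ann", "cat", 8),
    ("cat", "dan", 1),
    ("dan", "eve", 6), ("eve", "fay", 5),
    ("fay", "gus", 1),
]

for cutoff, groups, largest in cluster_profile(edges, [1, 2, 6, 8, 10]):
    print(f"cutoff {cutoff}: {groups} group(s), largest has {largest}")
```

In this toy data the lowest cutoff merges everyone into one group of seven, a slightly higher cutoff splits that into two groups, and the highest cutoff leaves nothing at all. The cutoff values where those counts jump are the ones worth a closer look.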