It is easy for the average person to spot dense clusters of connection in a network visualization. Yet this is a difficult problem for algorithms. Early cluster discovery and community detection algorithms took the easy way -- they forced every node into one, and only one, cluster because the math was easier. It was like the college physics course I took, where all of the problems were set in a vacuum and there was no friction to account for. This taught the basic principles, but did not carry over into real life.
Sociologists were not happy with the early community detection algorithms because they did not reflect how humans naturally cluster, connect and group themselves. We are members of many clusters, and through that multiple/partial membership in many groups, we cause those clusters to overlap -- groups are not distinct, with only unique members in each. Group boundaries are porous and fuzzy.
Today there are dozens of community detection algorithms, and many allow for overlap and multiple cluster membership. Community detection is still a hard problem. Smart network scientists don't always agree on what is in each community, as we shall see.
Figure 1 is a diagram of a simple network of 16 people modeled in InFlow software. Symmetric (non-directional) connections are shown by the green links in the diagram. This first network layout is just a circle with nodes in numerical order clockwise.
Next we allow for cluster/group overlap -- multiple memberships -- and we are surprised. There is more than one answer! In complex systems, such as human groups, communities and organizations, there is usually no one right answer, or one best way of doing things -- there are often several good answers. It might be impossible to choose the best answer ahead of time! The next set of diagrams (Figures 3-5) all show reasonable clusterings found in the data above. They all show 4 clusters, each cluster enclosed in a gold frame. If a node shows up inside more than one frame, it is a member of more than one cluster.
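To make overlapping membership concrete, here is a minimal sketch of one well-known overlapping method, clique percolation with k = 3: triangles that share an edge are merged into one community, so a node that sits in triangles from two different groups ends up in both. This is not the algorithm InFlow uses, and the 7-node edge list below is a hypothetical toy graph invented for illustration, not the 16-person network in the figures. The names `EDGES` and `k3_clique_communities` are likewise made up for this example.

```python
from itertools import combinations

# Hypothetical toy graph (NOT the 16-person network from the figures).
EDGES = [
    (1, 2), (1, 3), (2, 3), (2, 4), (3, 4),  # a tight group around 1-2-3-4
    (4, 5), (4, 6), (5, 6),                  # a second group around 4-5-6
    (6, 7),                                  # node 7 hangs off node 6
]


def k3_clique_communities(edges):
    """Overlapping communities via clique percolation with k = 3."""
    # Build an undirected adjacency map.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Find every triangle (3-clique) by brute force; fine for toy graphs.
    triangles = [
        frozenset(t)
        for t in combinations(sorted(adj), 3)
        if t[1] in adj[t[0]] and t[2] in adj[t[0]] and t[2] in adj[t[1]]
    ]

    # Union-find over triangles: two triangles join when they share an edge,
    # that is, when they have exactly two nodes in common.
    parent = list(range(len(triangles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(triangles)), 2):
        if len(triangles[i] & triangles[j]) == 2:
            parent[find(i)] = find(j)

    # A community is the union of the nodes of all triangles in one component.
    groups = {}
    for i, tri in enumerate(triangles):
        groups.setdefault(find(i), set()).update(tri)
    return sorted(sorted(g) for g in groups.values())


communities = k3_clique_communities(EDGES)
nodes = sorted({n for edge in EDGES for n in edge})
membership = {n: [c for c in communities if n in c] for n in nodes}
overlapping = [n for n in nodes if len(membership[n]) > 1]

print("communities:", communities)
print("members of more than one community:", overlapping)
```

On this toy graph the sketch finds two communities that overlap at a single node. Other overlapping methods, or a different k, could reasonably draw the frames differently, which is the point of the figures above.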
Next we run another algorithm and find not 4 clusters, but 3 emergent communities.
Yet another algorithm gives us just two overlapping emergent groups.
Which of those above do you like? Which do you think best represents the natural groupings in this toy network?
My favorite patterns are next. There is no rule that says every node has to be assigned to at least one cluster! Some nodes play the role of connector: they lie on many of the shortest paths that connect other nodes -- they have high betweenness. Maybe they are liaisons between many groups without belonging to any one of them? One of the settings on the cluster analysis algorithm in InFlow assigned all nodes to a group except for node 4 -- s/he is a connector of groups, but not a member of any group.
Adjusting the cluster algorithm a little more, we now get two nodes -- 4 and 5 -- that are not members of any cluster. They are the connectors in this emergent network. My favorite rendering of this emergent network is in Figure 9 below.
One of the properties of human relationships is that they are messy, inexact, and complex. We should not expect to find one perfect way to group or cluster a network of human relationships. If we do find such a perfect solution, maybe we have over-simplified the problem, like in Figure 1?
One thing we see in the various clusters above is that nodes 1, 4, and 5 are often the linchpins that hold two or more clusters together (the clusters overlap around these nodes). If we run various network centrality metrics on this network, we consistently find nodes 1, 4, and 5 at the top of the list, no matter which metric we choose -- 4 being at the very top, most of the time.
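For readers who want to try this kind of comparison themselves, here is a minimal sketch that computes three common centrality metrics (degree, closeness, and betweenness) in plain Python and reports the top-ranked node for each. It runs on a hypothetical 7-node toy graph invented for illustration, not on the 16-person network above, whose edge list is not reproduced here. The article does not say which metrics were run in InFlow, so the choice of these three is an assumption. Betweenness uses Brandes' algorithm for unweighted graphs, and closeness assumes the graph is connected.

```python
from collections import deque

# Hypothetical toy graph (NOT the 16-person network from the figures).
EDGES = [
    (1, 2), (1, 3), (2, 3), (2, 4), (3, 4),
    (4, 5), (4, 6), (5, 6), (6, 7),
]


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj):
    """Number of direct connections per node."""
    return {n: len(nbrs) for n, nbrs in adj.items()}


def closeness(adj):
    """(n - 1) / sum of shortest-path distances; assumes a connected graph."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        result[source] = (len(adj) - 1) / sum(dist.values())
    return result


def betweenness(adj):
    """Brandes' algorithm, unweighted and undirected, not normalized."""
    score = {n: 0.0 for n in adj}
    for s in adj:
        order = []                        # nodes in the order BFS visits them
        preds = {n: [] for n in adj}      # predecessors on shortest paths
        sigma = {n: 0 for n in adj}       # number of shortest paths from s
        dist = {n: -1 for n in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = {n: 0.0 for n in adj}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {n: b / 2 for n, b in score.items()}


adj = build_adjacency(EDGES)
metrics = {
    "degree": degree(adj),
    "closeness": closeness(adj),
    "betweenness": betweenness(adj),
}
top = {name: max(scores, key=scores.get) for name, scores in metrics.items()}

for name, node in top.items():
    print(f"{name}: top node is {node}")
```

In this toy graph the same node comes out on top for all three metrics, because it is the only bridge between the two dense regions. On real data the metrics can disagree, so it is worth looking at the full rankings rather than only the top node.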
Finding logical and plausible clusters in complex systems is not a simple task -- there is no one simple answer. This is not like accounting, where everything should add up correctly every time, and you do get one right answer. Finding clusters in networks is often about sense-making: what are the logical patterns we see, and what might they tell us? In our human relationships, we always want "neat and clean", but we always get "messy and fuzzy." The right software will help you through the mess and help you make sense of it -- it will not provide simple answers.
What patterns do you see?