Oct 22, 2014

Data Visualization: Seeing Hidden Relationships in Tables


Corporations and governments have lots of data in spreadsheets and databases.  That is obvious.  What is less obvious is that those data sets often hold hidden patterns and relationships!

Figure 1 shows a table of data from a recent Pew Research study on Trusted US News Sources.  The data is displayed in the format people are used to seeing for corporate and government data: rows and columns.  Each row is a media source, and the columns show whether liberal, mixed, or conservative respondents trust that source.



Figure 1 - Data Table


Figure 1 is a great view of the data if you want to examine the details for each media source.  Figure 2 below is a network view of the same data.  We extracted the data from the table in Figure 1, used Prolog to find some interesting hidden patterns in it, and then visualized those patterns with our network analysis software.


Figure 2 - Network Patterns Hidden in Tabular Data

Instead of the colors used in the Pew research table in Figure 1, we used the common US political party colors of red (conservative/Republican) and blue (liberal/Democratic).  We adjusted the shading of each node based on its conservative and liberal trust ratings in Figure 1, so purple, a mix of red and blue, marks a source trusted by both sides.  Node size reflects the total trust each media source received across the political spectrum.  Two media nodes are connected if they were ranked similarly by the Pew survey respondents.
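The post does not say exactly how the colors, sizes, or links were computed, so the following is only one plausible sketch of that encoding, not the Prolog analysis described above. It assumes each source has three hypothetical trust shares (liberal, mixed, conservative) between 0 and 1, blends red and blue by the conservative and liberal shares, sizes each node by the sum of all three shares, and links two sources when the Euclidean distance between their trust profiles is at most a chosen `threshold`. The source names, numbers, and threshold are made up for illustration and are not Pew figures.

```python
import math

# Hypothetical trust shares (liberal, mixed, conservative) per source.
# These numbers are invented for illustration; they are NOT the Pew figures.
TRUST = {
    "Source A": (0.60, 0.40, 0.10),
    "Source B": (0.55, 0.45, 0.15),
    "Source C": (0.10, 0.30, 0.70),
    "Source D": (0.15, 0.25, 0.65),
    "Source E": (0.05, 0.05, 0.02),
}


def node_color(lib, cons):
    """Blend blue (liberal) and red (conservative) by relative trust.

    Equal liberal and conservative trust gives purple (#800080).
    """
    total = lib + cons
    if total == 0:
        return "#808080"  # no partisan signal at all: grey
    red = round(255 * cons / total)
    blue = round(255 * lib / total)
    return f"#{red:02x}00{blue:02x}"


def node_size(shares):
    """Node size grows with total trust across the whole spectrum."""
    return sum(shares)


def build_edges(trust, threshold):
    """Link two sources whose trust profiles are within `threshold` (Euclidean distance).

    The distance measure and threshold are assumptions standing in for
    "similarly ranked"; other similarity measures would work too.
    """
    names = sorted(trust)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(trust[a], trust[b]) <= threshold:
                edges.append((a, b))
    return edges


if __name__ == "__main__":
    for name, (lib, mixed, cons) in sorted(TRUST.items()):
        print(name, node_color(lib, cons), round(node_size((lib, mixed, cons)), 2))
    print(build_edges(TRUST, threshold=0.15))
```

With these made-up values, Source A links to Source B and Source C links to Source D, while Source E, which has very low trust everywhere, gets no links at all.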

Whereas Figure 1 is good for examining details, Figure 2 immediately shows the larger, hidden patterns in the data.  We see at once that the conservative media sources are far fewer than the ones tinged in various shades of blue and purple.  We also see that the red cluster is isolated from the rest of the media sources.  More isolated still is BuzzFeed (the tiny dot), which was ranked the lowest in trust and was not ranked similarly enough to any other media source to be connected to it.  Communities of very similarly ranked news sources are visually clustered in Figure 2, and the sources within an obvious cluster are basically substitutable for each other.  Finally, we notice that the largest nodes (those that received the highest trust rankings) are all purple: The Economist, Wall Street Journal, BBC, and Google News.

The news sources trusted by the middle and liberal portions of the US political spectrum join together to form the large network component, a continuum running from more conservative on the left to extremely liberal on the right.
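Separating a large connected component from small clusters and isolated nodes is a standard graph operation. Here is a minimal sketch in Python, assuming the network is given as a list of node names plus a list of undirected edges; the nodes and edges in the example are hypothetical, and this is not necessarily how our network analysis software does it. It finds the connected components with breadth-first search and lists them largest first.

```python
from collections import deque


def connected_components(nodes, edges):
    """Return the connected components of an undirected graph, largest first."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in sorted(nodes):
        if start in seen:
            continue
        # Breadth-first search from an unvisited node collects one component.
        component = []
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in sorted(adjacency[node]):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))

    # Largest component first; ties broken by first member name for a stable order.
    components.sort(key=lambda c: (-len(c), c[0]))
    return components


if __name__ == "__main__":
    # Hypothetical network: a chain A-B-C-D, a separate pair E-F, and a lone node G.
    nodes = ["A", "B", "C", "D", "E", "F", "G"]
    edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]
    components = connected_components(nodes, edges)
    print("largest component:", components[0])
    print("isolated nodes:", [c[0] for c in components if len(c) == 1])
```

In this toy example the largest component is A through D, the pair E-F plays the role of a separate cluster, and G, with no sufficiently similar neighbor, ends up alone in a component of size one.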

Together, the two diagrams provide a view of both the forest and the trees -- much better together than apart!

Which spreadsheets and databases in your organization might hold some hidden insights?

