T N T : The Network Thinkers: Back of the Envelope Twitter Metrics

The Pareto principle — 80% of the effects/outputs come from 20% of the causes/inputs — works in a variety of situations. The 80:20 rule is useful because it allows us to make decisions based on rough data and a simple calculation. If we could get 80% of the right answers from just 20% of the inputs we would have a great Back-of-the-Envelope [BoE] metric!

Twitter has been swamped with many new metrics in the last year, most of them trying to figure out who the influential tweeters are. I wonder…

Are these scores significantly better than simple BoE metrics?

How accurate are these scores in predicting actual behavior?

How much sociological rigor do they contain? The math is easy, the sociology is hard.

Let's examine a progression of BoE Twitter metrics.

The first quick influence metric was "number of followers." The thinking went: the more followers you have, the more influential you are. On the surface this appears too simplistic, and it is easily gamed and distorted. For a while many early adopters on Twitter were in a mad rush to get more followers -- quantity over quality. Some vendors still make money by selling "followers" to fools. But most people on Twitter understand that having random people who do not know you follow you does neither party any good. Followers are a much better estimate of popularity than of interpersonal influence -- Hollywood stars have many followers.

The next BoE Twitter influence measure that gathered some interest was the ratio of Followers to Following. This measure had some background in the various prestige measures in social network analysis. A person in a network is prestigious if they are sought out by many others, while at the same time they do not seek out many others themselves. We divide the number of Followers by the number of Following. Let's call this the FFR ratio. The higher the FFR number, the higher the prestige. A ratio [FFR number] greater than 5 is starting to show some real prestige.
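As a rough illustration, the FFR can be sketched in a few lines of Python. The function names and the handling of accounts that follow nobody are assumptions made for this sketch; only the division and the rule-of-thumb threshold of 5 come from the text above.

```python
def ffr(followers: int, following: int) -> float:
    """Followers-to-Following ratio: Followers divided by Following."""
    if following <= 0:
        # Assumption: the ratio is treated as undefined for an account
        # that follows nobody.
        raise ValueError("FFR is undefined when Following is zero")
    return followers / following


def shows_prestige(followers: int, following: int, threshold: float = 5.0) -> bool:
    """True when the FFR is greater than the rule-of-thumb threshold of 5."""
    return ffr(followers, following) > threshold


# Made-up counts, for illustration only.
print(ffr(3000, 500))             # 6.0
print(shows_prestige(3000, 500))  # True
print(shows_prestige(100, 500))   # False: follows 5 people for every follower
```

The threshold is kept as a parameter because 5 is only a rule of thumb, not a hard cutoff.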

This ratio is a step forward, but it can also be deceiving. Often when people first join Twitter they start by following many people -- those they recognize and those who are recommended by friends. Newbies on Twitter often have the FFR reversed: they follow 5 people for every one that follows them, until people get used to having the newbie online and start following them back. A reversed ratio can indicate not just genuine new users of Twitter but also spambots. Spambots follow many -- often at random -- but are not followed back by any except those who automatically follow everyone who follows them -- a bad idea.

The FFR can be gamed by keeping the Following count artificially low, though there are often good reasons for being selective in who you follow and keeping that group from getting too large. FFR also gets in the way of people who are very social on Twitter. Some people follow many, but not all, of their followers so that they can carry on direct conversations [DM] with them -- only people who follow each other [have a symmetric tie] can DM each other. The Following number for this strategy can get high quickly, resulting in their FFR rarely exceeding 5, even though these very social folks may be influential.

Once Twitter Lists started, many believed that List membership would be what number of Followers never was -- a better indicator of influence, of those we listen to. Lists are much harder to create, and people that make them usually put some thought into who belongs in what category. Lists do make the transition from popularity to influence, but they do not eliminate the problem of pure popularity. We know popular tweeters still end up on more Lists than their less well known brethren. How do we eliminate/reduce this overbearing popularity factor?

We cannot eliminate the popularity factor, but we can minimize it by using it as the divisor in a ratio: the greater the popularity, the less effect it has on your Listed score. Dividing the number of Lists one is on by the number of Followers gives us a pretty decent BoE that anyone can calculate -- all of the numbers are on each person's profile. Let's call this the Lists to Followers Ratio [LFR].

LFR = Lists / Followers

LFR gives us a number less than one, so I use the first four digits to the right of the decimal point as the LFR score. Using my numbers from the graphic above, we calculate an LFR of 1056.
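The scoring rule can be sketched in Python. This is a minimal illustration rather than any official tooling: the function name `lfr_score` is hypothetical, the counts are made up, and it assumes the four digits are truncated rather than rounded.

```python
def lfr_score(lists: int, followers: int) -> str:
    """First four digits after the decimal point of Lists / Followers."""
    if followers <= 0:
        # Assumption: the ratio is treated as undefined for an account
        # with no followers.
        raise ValueError("LFR is undefined when Followers is zero")
    # Integer arithmetic truncates (does not round) and avoids float error.
    scaled = lists * 10_000 // followers
    # Pad with leading zeroes so that 0.0075 is reported as "0075".
    return f"{scaled:04d}"


# Made-up counts, for illustration only.
print(lfr_score(250, 2000))   # 1250  (250 / 2000 = 0.125)
print(lfr_score(90, 12000))   # 0075  (90 / 12000 = 0.0075)
```

An account that appears on more Lists than it has Followers has a ratio of one or more, so this sketch would return five or more digits for it.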

Let's take a look at several people I know on Twitter. We will look at Clay Shirky [@cshirky], Venessa Miemis [@VenessaMiemis] and myself [@valdiskrebs]. Looking at Followers only, Clay wins easily, followed by Venessa, a distant second, and then me. Next we look at the FFR. Here again the number of Followers dominates the calculation, and again Clay gets the highest score by a great margin. Clay is a popular author and blogger and is also very popular on Twitter. Many newcomers to Twitter probably follow Clay automatically. Several Twitter apps for newbies recommend following the popular people on Twitter such as @scobleizer, @timoreilly, @cshirky, @techcrunch and others. This just increases the popularity of already popular people. Most newbies may not know why they are following these people, or even who these people are. They just follow them.

Next we will look at List membership by itself. Here again popularity skews our number. Same results as before -- Clay is first with Venessa and Valdis a distant second and third.

I have talked about hidden and local influencers before. They do not have the reach of an Oprah in recommending books, but they have a smaller, focused audience that seeks their advice and opinions on one or more topics. How do we find them? I think the LFR BoE metric gives a decent indicator of who is one of these hidden/focused influencers. Running the LFR metric [which looks at Lists and discounts for Popularity] we get a totally different result! With popularity diminished, we get a different view into who is looked to for advice, ideas, opinions and expertise. Now Venessa is number one with an LFR [as of August 21, 2010] of 1679 [first four digits after the decimal]. Valdis is second with an LFR of 1056 and Clay is last, for the first time, with an LFR of 0614.

Clay is popular, but Venessa and Valdis are listed as topic experts/advisors/influencers beyond their generic popularity.

Both Venessa and Valdis are more focused on topics they address and they attract a more focused, yet much smaller, audience. People put them on many Lists according to their speciality. People get to see Venessa's and Valdis' opinions and ideas — their following grows more slowly. It is an active choice by their followers — they know why they chose to follow/list either of these two. The decision to follow an already popular person is often a passive choice... "everyone else is doing it, they must know something I don't" goes the thinking. An active decision to listen/follow someone is probably a much stronger basis for the transfer of ideas and influence.

The LFR is also a good metric for uncovering spammers or tweeters pushing an agenda, whether it is ads or only their own content. The large gap between their Followers and Lists counts results in a very low LFR, with two or more leading zeroes.

A very quick way to gauge your LFR is to drop the last digit from your followers [e.g. 1234 followers becomes 123] and compare that Follower proxy to your Listed count [e.g. 321]. Is the new Follower proxy LESS THAN your Listed count? If so, you are probably more influential than popular. Anyone can do this calculation while staring at a person's Twitter Home page and deciding whether to follow, or not.
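The same mental shortcut, written out as a small Python sketch. The function name is hypothetical; the 1234 and 321 figures are the ones used in the paragraph above, and the second pair is made up.

```python
def more_listed_than_popular(followers: int, listed: int) -> bool:
    """Quick check: drop the last digit of Followers, then compare to Listed."""
    follower_proxy = followers // 10  # 1234 becomes 123
    return follower_proxy < listed


print(more_listed_than_popular(1234, 321))  # True: 123 is less than 321
print(more_listed_than_popular(5000, 150))  # False: 500 is not less than 150
```

Dropping the last digit divides Followers by ten, so the check amounts to asking whether Lists / Followers is above 0.1.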

Another quick way is to calculate a "batting average," as in American baseball. Take two people, Rita and Ralph, who each appear on 200 lists. Rita has 1000 followers while Ralph has 4000. Rita's batting average is 200/1000 or .200, while Ralph's is 200/4000 or .050. Rita gets a "hit" more often than Ralph.

There is a new Twitter metric that looks promising — PeerIndex. I know the person behind it, Azeem Azhar. He has been thinking about social networks since the mid 1990s. He and I had a wonderful discussion on this issue of Release 1.0 [large PDF] that I wrote about social networks in, and between, organizations in 1996. I like the way that PeerIndex divides one's influence by topic. Look at the "key authorities" in social media on PeerIndex — they are not the usual suspects that other Twitter metrics seem to echo. Azeem understands that in addition to the technology, he has to get the sociology right!

BTW, almost all of those listed as highly influential by PeerIndex for the topic "social media" [as of 8/21/2010] also score high in the BoE metric LFR -- 4/5 or 80%, just like the Pareto Principle predicted!

Update: Here is another good use of lists! Create a Twitter List of those who have listed you and check that regularly for serendipitous opportunities! Hat tip to social media guru @arturs in Latvia.

Update #2: Further thinking on LFR and a simplification of the scale -- Who gets Attention on Twitter? Here is a Twitter List of people with high LFR scores.

6 comments:

Ned Kumar, August 23, 2010 at 3:22 AM
Hi Valdis,
Great post and enjoyed the read. I totally agree with the biases, issues, and gaming of #followers and FFR as influence metrics.

I like your LFR metric and it is definitely the best Twitter metric I have seen. And you are absolutely right that it is a great way to separate out popularity factor.

However, not sure of it being an influence metric. Here are a couple of my main reasons:

* The scalability of the list number is of a different magnitude than the scalability of the followers. This can bias the read. As an example, @jowyang is a very well known, respected and influential person in the social domain with #followers=67,810 and #list=5,837. His LFR of 861 is lower than some of the lesser known names I know.

* Lists do not capture the loyalty effect and so folks can appear 'less influential' based on the LFR. One of the best examples here is @umairh - almost everyone will agree that he is an influential blogger. His #followers=63360 and #list=1618 with a very low LFR of 255. One of the reasons I think the LFR is low is that many of his followers are devoted readers of his blog (even before the lists were started) and did not specifically list him.

* I think the domain plays a role. For example, my hunch is that academics in general will get a higher LFR than non-academics. Haven't tested out this theory but just randomly checked a couple of folks. For example, @timkastelle has a whopping LFR of 1731. Now I love Tim's work and know he is influential but again there are some other "influentials" who are way below this mark.

* Which brings me to my last point for now - my thought is that LFR comparison should be limited to a certain domain. The LFR profile of folks in analytics is different from those in academics which is again different from those in the creative fields etc. This in itself is not a negative - as influence, authority, power are all restricted by boundaries.

Thanks again for a great read. Enjoyed.

Regards,
Ned

Valdis Krebs, August 23, 2010 at 2:25 PM
Great network thinking Ned!

Remember LFR is just a quick metric which you can easily estimate in your head. LFR is a concept that has been around for a while.

Yes, "influence" is probably going too far in describing LFR. Maybe "active attention" is a better indicator of what LFR may be... as opposed to the "passive attention" given to popularity.

Not surprised about Tim K. -- he is well regarded and paid-attention-to within the topic clusters of innovation and networks. Both Umair H and Jeremiah O have been slotted into the rock star/popular category on Twitter. Their followers have far exceeded their listers. How many people are passively following them because they feel they must?
Ned Kumar, August 23, 2010 at 4:23 PM
Valdis,
Thanks for the response - agree about the rock star status for some of these folks.

I have been playing around a bit and realized there have to be clear boundaries set on the profile. As an example, well known publications like @HarvardBiz have a lot of followers and a lot of lists. And then there are folks like @Dalailama and @JosephCarrabis who have 0 followers.

Anyway, there are multiple ways to skin the cat but I agree with your approach of using lists. Here is one score I came up with (can be divided by 10000 or something to make the numbers smaller):
(#List*#Tweets*FFR)

The logic here is that each factor (followers, following, tweets, lists) means something (popularity, power, activity, trust), so a combination should yield a better result for most.

Anyway, this was more for fun. Enjoyed the read and conversation.

Regards,
Ned

Valdis Krebs, August 23, 2010 at 4:33 PM
Maybe a social science graduate student could write a paper on the correlation between various Twitter metrics and the BoE estimates???

daja, November 7, 2010 at 4:34 PM
Perhaps we should consider Twitter in Gladwellian terms:

Those who follow many and are followed by a few: these are the mavens, who collect knowledge

Those who follow a few but are followed by many: these are the salesmen, the persuaders

Those who follow many and are followed by many: these are the connectors

All three types are important in the spread of information through the network (what Gladwell would call the social epidemic)
Dani, May 26, 2011 at 4:00 PM
Nice post!

Certainly, some of the services computing "influence" scores seem to be converging, probably because of using similar heuristics.

You have also pointed out an important point: that influence is too broad a concept to be measured with a single score.

For instance, influence could be seen as:

1. Get other people to accept your ideas and spread them (e.g. getting retweets in Twitter or Likes in Facebook);

2. Get people to consume your contents or the contents you promote (e.g. getting people to click on the URLs you publish);

3. Or get people to behave in a certain way in the real world (e.g. buying a product, attending a concert or voting for a given candidate).

In the sense of attention gathering I think that a recent work by Romero et al. is very, very interesting: http://arxiv.org/abs/1008.1253

Of course, I missed a mention of TunkRank: http://tunkrank.com and http://thenoisychannel.com/2009/01/13/a-twitter-analog-to-pagerank/

Finally, two slide decks providing some extra comparison of Twitter rank scores (http://thenoisychannel.com/2009/01/13/a-twitter-analog-to-pagerank/) and a new --and better-- influence score (http://www.slideshare.net/daniel.gayo/retibus-socialibus-et-legibus-momenti-on-social-networks-and-the-laws-of-influence-v2)

Cheers, Dani

Aug 21, 2010
