Elsevier

Social Networks

Volume 27, Issue 1, January 2005, Pages 39-54
Social Networks

A measure of betweenness centrality based on random walks

https://doi.org/10.1016/j.socnet.2004.11.009Get rights and content

Abstract

Betweenness is a measure of the centrality of a node in a network, and is normally calculated as the fraction of shortest paths between node pairs that pass through the node of interest. Betweenness is, in some sense, a measure of the influence a node has over the spread of information through the network. By counting only shortest paths, however, the conventional definition implicitly assumes that information spreads only along those shortest paths. Here, we propose a betweenness measure that relaxes this assumption, including contributions from essentially all paths between nodes, not just the shortest, although it still gives more weight to short paths. The measure is based on random walks, counting how often a node is traversed by a random walk between two other nodes. We show how our measure can be calculated using matrix methods, and give some examples of its application to particular networks.

Introduction

Over the years, network researchers have introduced a large number of centrality indices, measures of the varying importance of the vertices in a network according to one criterion or another Wasserman and Faust, 1994, Scott, 2000. These indices have proved of great value in the analysis and understanding of the roles played by actors in social networks, as well as by vertices in networks of other types, including citation networks, computer networks, and biological networks. Perhaps the simplest centrality measure is degree, which is the number of edges incident on a vertex in a network—the number of ties an actor has in social network parlance. Degree is a measure in some sense of the popularity of an actor. A more sophisticated centrality measure is closeness, which is the mean geodesic (i.e., shortest-path) distance between a vertex and all other vertices reachable from it.1 Closeness can be regarded as a measure of how long it will take information to spread from a given vertex to others in the network.

Another important class of centrality measures is the class of betweenness measures. Betweenness, as one might guess, is a measure of the extent to which a vertex lies on the paths between others. The simplest and most widely used betweenness measure is that of Freeman, 1977, Freeman, 1979, usually called simply betweenness. (Where necessary, to distinguish this measure from other betweenness measures considered in this paper, we will refer to it as shortest-path betweenness.) The betweenness of a vertex i is defined to be the fraction of shortest paths between pairs of vertices in a network that pass through i. If, as is frequently the case, there is more than one shortest path between a given pair of vertices, then each such path is given equal weight such that the weights sum to unity. To be precise, suppose that gi(st) is the number of geodesic paths from vertex s to vertex t that pass through i, and suppose that nst is the total number of geodesic paths from s to t. Then, the betweenness of vertex i isbi=s<tgi(st)/nst(1/2)n(n1),where n is the total number of vertices in the network.2 We may, or may not, according to taste, consider the end-points of a path to fall on that path; the choice makes only the difference of an additive constant in the values for bi. In this paper, we will generally include the end-points.

Betweenness centrality can be regarded as a measure of the extent to which an actor has control over information flowing between others. In a network in which flow is entirely or at least mostly along geodesic paths, the betweenness of a vertex measures how much flow will pass through that particular vertex. Betweenness can be calculated for all vertices in time O(mn) for a network with m edges and n vertices Newman, 2001, Brandes, 2001.

In most networks, however, information (or anything else) does not flow only along geodesic paths Stephenson and Zelen, 1989, Freeman et al., 1991. News or a rumor or a message or a fad does not know the ideal route to take to get from one place to another; more likely it wanders around more randomly, encountering who it will. And even in a case such as the famous small-world experiment of Milgram (1967) and Travers and Milgram (1969), or its modern-day equivalent (Dodds et al., 2003), in which participants are explicitly instructed to get a message to a target by the most direct route possible, there is no evidence that people are especially successful in this task. Thus, we would imagine that in most cases a realistic betweenness measure should include non-geodesic paths in addition to geodesic ones.

Furthermore, by giving all the weight to the geodesic paths, and none to any other paths, no matter how closely competitive they are, the shortest-path betweenness measure can produce some odd effects. Consider the network sketched in Fig. 1a, for instance, in which two large groups are bridged by connections among just a few of their members. Vertices A and B will certainly get high betweenness scores in this case, since all shortest paths between the two communities must pass through them. Vertex C, on the other hand, will get a low score, since none of those shortest paths pass through it, taking instead the direct route from A to B. It is plausible, however, that in many real-world situations C would have quite a significant role to play in information flows. Certainly, it is possible for information to flow between two individuals via a third mutual acquaintance, even when the two individuals in question are themselves well acquainted.

To address these problems, Freeman et al. (1991) suggested a more sophisticated betweenness measure, usually known as flow betweenness, that includes contributions from some non-geodesic paths. Flow betweenness is based on the idea of maximum flow. Imagine each edge in a network as a pipe that can carry a unit flow of some fluid. We can ask what the maximum possible flow then is between a given source vertex s and target vertex t through these pipes. In general, the answer is that more than a single unit of flow can be carried between source and target by making simultaneous use of several different paths through the network. The flow betweenness of a vertex i is defined as the amount of flow through vertex i when the maximum flow is transmitted from s to t, averaged over all s and t.3 Maximum flow from a given s to all reachable targets t can be calculated in worst-case time O(m2) using, for instance, the augmenting path algorithm (Ahuja et al., 1993), and hence the flow betweenness for all vertices can be calculated in time O(m2n).4

In practical terms, one can think of flow betweenness as measuring the betweenness of vertices in a network in which a maximal amount of information is continuously pumped between all sources and targets. Necessarily, that information still needs to “know” the ideal route (or one of the ideal routes) from each source to each target, in order to realize the maximum flow. Although the flow betweenness does take account of paths other than the shortest path (and indeed need not take account of the shortest path at all), this still seems an unrealistic definition for many practical situations: flow betweenness suffers from some of the same drawbacks as shortest-path betweenness, in that it is often the case that flow does not take any sort of ideal path from source to target, be it the shortest path, the maximum flow path, or another kind of ideal path.

Moreover, like the shortest-path measure, flow betweenness can give counterintuitive results in some cases. Consider, for example, the network sketched in Fig. 1b, which again has two large groups joined by a few contacts. In this case, the maximum flow from one group to the other is clearly limited to two units, one unit flowing through each of vertices A and B. Vertex C will, in this case, get a low betweenness score, even though the path through C may be as short or shorter than that through A or B. Once again, it is plausible that in practical situations C would actually play quite a significant role.

In this paper, therefore, we propose a new betweenness measure, which might be called random-walk betweenness. Roughly speaking, the random-walk betweenness of a vertex i is equal to the number of times that a random walk starting at s and ending at t passes through i along the way, averaged over all s and t. This measure is appropriate to a network in which information wanders about essentially at random until it finds its target, and it includes contributions from many paths that are not optimal in any sense, although shorter paths still tend to count for more than longer ones since it is unlikely that a random walk becomes very long without finding the target. In some sense, our random-walk betweenness and the shortest-path betweenness of Freeman (1977) are at opposite ends of a spectrum of possibilities, one end representing information that has no idea of where it is going and the other information that knows precisely where it is going. Some real-world situations may mimic these extremes while others, such as perhaps the small-world experiment, fall somewhere in between. In the latter case, it may be of use to compare the predictions of the two measures to see how and by how much they differ: if they differ little, then either is a reasonable metric by which to characterize the system; if they differ by a lot, then we may need to know more about the particular mode of information propagation in the network to make meaningful judgments about betweenness of vertices.

Our random-walk betweenness can, as we will show, be calculated for all vertices in a network in worst-case time O((m+n)n2) using matrix methods, making it comparable in its computational demands with flow betweenness.

Some other centrality measures based on random walks merit a mention in this context, although none of them are betweenness measures. Bonacich’s power centrality (Bonacich, 1987) can be derived in a number of ways, but one way of looking at it is in terms of random walks that have a fixed probability β of dying per step. The power centrality of vertex i is the expected number of times such a walk passes through i, averaged over all possible starting points for the walk. The random-walk centrality introduced recently by Noh and Rieger (2002) is a measure of the speed with which randomly walking messages reach a vertex from elsewhere in the network—a sort of random-walk version of closeness centrality. The information centrality of Stephenson and Zelen (1989) is another closeness measure, which bears some similarity to that of Noh and Rieger. In essence it measures the harmonic mean length of paths ending at a vertex i, which is smaller if i has many short paths connecting it to other vertices. Guimerà et al. (2002) consider search processes on networks and define generalized betweenness indices that measure the number of times a search passes through a vertex, although they do not explicitly consider the case of a random walk. Finally, in recent unpublished work, Borgatti has considered a spectrum of possible betweenness measures, including measures based on random walks, although he does not derive expressions for their evaluation (S. Borgatti, personal communication).

The outline of this paper is as follows. In Section 2, we define in detail our random-walk betweenness and show how it is calculated. We introduce the measure first using an analogy to the flow of electrical current in a circuit, and then show that this is equivalent also to the flow of a random walk. In Section 3, we give a number of examples of applications of our measure, first to networks artificially designed to pose a challenge for the calculation of betweenness, and then to various real-world social networks, including a collaboration network of scientists, a network of sexual contacts, and Pagdett’s network of intermarriages between prominent families in 15th century Florence. In Section 4, we give our conclusions.

Section snippets

Random-walk betweenness

In this section, we give the definition of our random-walk betweenness measure and derive matrix expressions that allow it to be calculated rapidly using a computer. For pedagogical purposes, we will take a slightly circuitous route in developing our ideas. We start by introducing a definition of our betweenness measure that does not use random walks but instead is based on current flow in electrical circuits. This definition is simple and intuitive and makes for easy calculations. Later, we

Examples and applications

In this section, we give a number of examples to illustrate the properties and use of our random-walk betweenness measure in the analysis of network data.

Conclusions

Betweenness is a measure of network centrality that counts the paths between vertex pairs on a network that pass through a given vertex. Vertices with high betweenness lie on paths between many others and may thus have some influence over the spread of information across the network. One can define a variety of different betweenness measures, depending on which paths one counts and how they are weighted. The most widely used measure, first proposed by Freeman (1977), counts only shortest paths,

Acknowledgments

The author thanks Steven Borgatti and Michelle Girvan for useful comments and conversations, and Richard Rothenberg and Stephen Muth for providing the data for the example of Fig. 5. This work was funded in part by the James S. McDonnell Foundation and by the National Science Foundation under grant number DMS–0234188.

Following the submission of this paper, the author learned of a recent conference presentation by Borgatti in which a random-walk-based betweenness measure similar in spirit,

References (22)

  • L.C. Freeman

    Centrality in social networks: conceptual clarification

    Social Networks

    (1979)
  • L.C. Freeman et al.

    Centrality in valued graphs: a measure of betweenness based on network flow

    Social Networks

    (1991)
  • K.A. Stephenson et al.

    Rethinking centrality: methods and examples

    Social Networks

    (1989)
  • R.K. Ahuja et al.

    Network Flows: Theory, Algorithms and Applications

    (1993)
  • P.F. Bonacich

    Power and centrality: a family of measures

    American Journal of Sociology

    (1987)
  • U. Brandes

    A faster algorithm for betweenness centrality

    Journal of Mathematical Sociology

    (2001)
  • P.S. Dodds et al.

    An experimental study of search in global social networks

    Science

    (2003)
  • L.C. Freeman

    A set of measures of centrality based upon betweenness

    Sociometry

    (1977)
  • M. Girvan et al.

    Community structure in social and biological networks

    Proceedings of the National Academy of Sciences of the United States of America

    (2002)
  • K.-I. Goh et al.

    Betweenness centrality correlation in social networks

    Physical Review E

    (2003)
  • R. Guimerà et al.

    Optimal network topologies for local search with congestion

    Physical Review Letters

    (2002)
  • Cited by (1849)

    View all citing articles on Scopus
    View full text