Breadth First Search
Starting from an initial node, a breadth-first search [BFS] calculates the distances (number of hops) to every other reachable node in the network. The result is assigned to all the other nodes as a new property.
In the following network we plot the initial node (red) and color the other nodes according to its BFS value. We can observe that, for our choice of initial node, most of the network is two or three hops away (orange and blue nodes) and the further node is 5 hops away (purple nodes).
The clustering coefficient is a measurement of the interconnection level of the neighbors of a node. It is a kind of all my friends know each other coefficient. A high clustering coefficient indicates that the node is part of a well-interconnected group (cluster), whereas if it is close to zero means that its neighbors do not connect to each other (star structure). Nodes with high clustering coefficient are, in some sense, expendable because they do not really affect the connectivity of their surroundings.
The following figure shows the clustering coefficient for a small network. Red nodes have high clustering coefficient and form a well-connected region of the network. On the other hand, blue nodes have zero clustering coefficient, so their neighbors do not know each other. Nodes with small clustering coefficient and high degree are key elements because they act as hubs on a network.
Divides the network into connected components, i.e. topologically connected subnetworks. In directed networks, we distinguish between weak and strong connectivity. Weak connectivity reduces the directed network to undirected and strong connectivity strictly follows the link direction.
The following social network (N = 30,000 L=120,000) is colored according to its components, so we can identify independent parts of the network. In this case, there are two main components (red and blue), and half a dozen smaller parts.
The degree of a node is its number of neighbors. It is one of the simplest metrics in a network, yet it always provides useful information. Hubs in a network are nodes with a high degree. We distinguish between InDegree and OutDegree for directed networks, and link weights can be considered in the computation. In the latter case, the degree is not necessarily an integer number.
The following picture shows the degree of the nodes of the well known social network Zachary’s Karate Club. Node size and color warmth represent the degree of each node. We can clearly identify the two most popular nodes in this network.
Node betweenness is a number in the [0, 1] interval equal to the fraction of all the shortest paths in the network that pass through that node. A value close to 1 indicates that the node has a key role in the transmission of information across the network. Nodes with high betweenness, if removed, reduce noticeably the rate of spreading phenomena. A link property considered as distance and parallel edges can be taken into account in the shortest paths calculation.
Similarly, we define the link betweenness as the number of shortest paths that include that link, ranking the links according to their ability to transmit information. Since their computation is very similar, node and link betweenness are calculated together.
The following plot shows the node betweenness for a social network. It identifies (surrounded by a black dashed line) ego nodes that are in the center of a community, and bridge nodes (surrounded by a pink line) that act as bridges between different groups of people. Those are the key nodes in terms of gossip spreading and ability to introduce unknown people.
The next picture shows the same network with the links colored according to their betweenness values. Blueindicates low values (the majority of them), while red highlights the most important links according to betweenness. We may interpret them as the most important channels for information transmission.
Measures the mean distance from a node to all remaining nodes. It is a non-normalized quantity. A high value of closeness centrality indicates that the node is a central node of the network, i.e. it can be reached easily from most of the other nodes. Parallel edges are allowed and may be used in the geodesic distance calculation. A link property, considered as distance can be taken into account in the calculation of the distances.
The next plot shows closeness centrality for a little network, where small values are colored in blue and large values in yellow/red. We can easily identify the most central node, which would be the best-connected person in the network. For example, marketing strategies that start on him will spread very fast compared with other starting nodes.
Finds a network partition that maximizes the modularity using an agglomerative procedure. A community is a set of nodes with more edges inside than connections with other communities. Since the problem of modularity maximization is NP-complete, the Louvain method is not guaranteed to obtain the absolute maximum, but usually gives a good approximation to the optimal value. this method is one of the most advanced algorithms available on community detection, both for its performance and the quality of its results.
The following three pictures represent three different partitions in communities of a social network. They correspond to three resolution parameters: 0.2, 1.0 and 3.0 from left to right, reducing the total number of communities.
Similarly, with the standard resolution parameter we obtain 2 communities in the Zachary’s Karate Club network, but if we reduce it to t=0.5 we obtain three different clusters.
The Eigenvector Centrality is a centrality measure that takes into account the importance (both in weighted or unweighted networks) of the selected node and also the importance of his neighbors in the network. Well connected nodes that also connect to well-connected nodes have higher Eigenvector Centrality. It is defined as the components of the eigenvector of the adjacency matrix with the largest eigenvalue. We normalize the Eigenvector components to a maximum value of 1. This centrality measure tends to highlight a few important nodes, assigning a value close to zero to most of the other nodes.
BeGraph uses the well-known power method to calculate the Eigenvector. Indeed, it may present convergence problems if the network is directed because the adjacency matrix is no longer symmetric and lacks useful algebraic properties.
We illustrate this centrality in the following plots. In the left network, the Eigenvector Centrality points to the red node as the most central node in the network. Orange and green indicate medium values of this centrality, and blue/purple mean close to zero. In the right-hand side plot, we choose the color scheme to highlight a region of the network as the most important according to the Eigenvector Centrality.
It is a centrality measure similar to Eigenvector Centrality but solves the zero problem of that measure by assigning a small amount of centrality to each node. This amount is controlled by an external parameter called alpha. If it is very small, all nodes have the same centrality. As we increase alpha, nodes acquire centrality in a similar way as in the eigenvector centrality. Mathematically, the amount of centrality given to each node is with the largest eigenvalue of the adjacency matrix. Link weights may be considered. All Katz centrality values are normalized to a maximum value of 1.
In the case of a directed network, the Katz Centrality calculation may present numerical instabilities and give an error, in a similar way as the Eigenvector Centrality.
The next picture shows the Katz Centrality for a social network. We indicate the most important nodes according to the Katz Centrality.
Minimum Spanning Tree
The Minimum Spanning Tree (MST) of a network is the minimum (according to the sum of some user-defined link weights) tree that contains all the nodes in the network. This tree is not necessarily unique, i.e. a network can have several minima spanning trees. It can be used to eliminate redundant links and to perform network optimization. The network is considered undirected.
A new link property is created, with a value equal to 0 if the link does not belong to the MST. Each link in a MST in each connected component is identified by an integer number >0. The integer value also identifies the connected component the link belongs to, i.e., the MST in a component has the same value.
The next network presents two connected components. A MST in the smaller component is identified with value equal to 1, whereas in the big component the link property is equal to 2. Links not belonging to any MST are colored in blue.
It is a measure based on the likelihood that a random walker following links arrives to each node. It also contains the possibility of random access to a node via the damping factor in order to have an ergodic random walk. Page Rank is designed for directed networks, although it would work for undirected. It is used by Google in its search engine and it is named after Larry Page (Google CEO and co-founder). BeGraph normalizes the Page Rank to a maximum value of 1.
Here we color the nodes of a social network according to their Page Rank (red means high value).
Leading Eigenvector Communities
Detects communities using a divisive process based on the leading eigenvector of the modularity matrix. A community is a set of nodes with more edges inside than with other communities. We use the power method to calculate that eigenvector for each network division, maximizing the modularity of the network partition. As in many community detection algorithms, this method does not guarantee the optimal solution of the problem, only an approximate solution. Moreover, it has some clear limitations in particular networks.
The following two plots show the partition in communities of the Zachary’s Karate Club network using resolution parameters 1.0 and 0.5, giving two different partitions.
The eccentricity of a node is the greatest geodesic distance (shortest path) between the node and all the remaining nodes in the network. In BeGraph we calculate it as the greatest geodesic distance between the node and all its reachable nodes in the network. Geodesic distances may be calculated with a link weight that may represent a real distance or cost.
In the following figures we show two networks. On the left-hand side we plot a book characters network we color the nodes according to their eccentricity. The minimum value (3) corresponds to the green node, meaning that any other node is at most three hops away from it. Nodes colored in blue, pink, and red have a higher eccentricity (4, 5, and 6 respectively) and are not so well connected with the whole network.
The network on the right also shows this quantity on the color nodes, red means high eccentricity (equal to 10), and purple corresponds to the lowest (6).
This centrality measure combines pure topological features of the network (node betweenness) with a user-defined node property called Percolation State. The Percolation State defines a source of any kind, and the Percolation Centrality ranks all the nodes according to their ability to transfer or transport this source through the network. Obviously, the Percolation Centrality depends strongly on the chosen Percolation State of the nodes, which must be provided by the user.
An example of usage would be the study of fire spreading in a forest. Imagine that trees are nodes, and two trees are linked when the fire can jump from one to another. Given that some trees are initially burning (their percolation state would be 1.0, non-burning-trees would have 0.0), the percolation centrality of any tree would be its ability to spread the fire through the forest. So, if you are a fireman, you should focus on trees with high Percolation Centrality. This centrality can also be used in the field of rumor or information spreading in social networks.
The following example shows a social network with the initial Percolation State (left) and the Percolation Centrality (right). As usual, warm colors indicate large values. We observe that the Percolation State is focused on one cluster and that the Percolation Centrality highlights nodes that are bridges between groups (high betweenness) and also emphasizes nodes that are close to the source. Some small nodes that are marked with a red line are nodes with medium-high centrality that are not easy to find by visual inspection.
Minimum Cut Set
The Minimum Cut Set algorithm finds a partition of the network into two disconnected parts such as the sum of the weights of the cut is minimum. If no weight is provided, a default value of 1 is assumed and the algorithm searches the minimum number of links whose removal disconnects the graph. The network must be undirected and connected with no parallel edges. The Minimum Cut Set of a network may not be unique. Since the algorithm starts from a single node, varying this initial node can lead to different cut sets.
This metric creates a new node property to identify the two parts of the partition. It also creates a link property that identifies the links that belong to the Minimum Cut Set (the links that would have to be removed).
The next plot shows the minimum cut set (red nodes) and the links that should be removed to disconnect the network (red links). The network has around 1,500 nodes and 2,100 links.
The Hyperlink-Induced Topic Search (HITS) is a link centrality algorithm designed to rate Web pages, but it is applicable to any directed network. Ranks the nodes according to their authority and hub roles in the network.
- Hubs: nodes with large in-degree that collect information from authorities (compile information).
- Authorities: nodes with large out-degree, which produce and send information to the hubs (create information).
The network must be directed. Parallel edges are, in principle, allowed and added up. Self-links are meaningless and should not be present.
The following picture shows the hub and authority values for a susbset of the internet network, where nodes are webpages and a directed link indicates a hyperlink between two webpages.