Modern data analytics is changing business strategy and the way decisions are made in large companies. With the rapid growth of information technologies, connectivity channels and data gathering, Big Data has become a trending topic, and every company wants to get the most out of it. Wherever a connectivity pattern or relation exists in the data, network science offers methodologies and tools to understand it better.
Networks are a type of data structure (called a graph in mathematics) composed of two distinct elements: a set of entities called nodes and a collection of binary relations between nodes called links. This general structure can have multiple realizations depending on the nature of the collected data. The next table shows a few examples of networks:
Additionally, networks appear in diverse fields of interest such as biotechnology, national security, computer science, semantics, ecology, industry, epidemiology and telecommunications. In short, networks are ubiquitous in the 21st century.
The term network is very broad. Depending on the nature of their nodes and links, networks can be classified into the following categories:
- Undirected versus directed networks: whether the direction of a link carries meaning or not. If the link means friendship, the direction is irrelevant. On the other hand, the direction is necessary if the link represents a money flow, a like in a social network or a hyperlink between two web pages.
- Multipartite: when the nodes fall into several distinct groups and no link joins two nodes of the same group. A network of buyers and products where a link means that buyer A has bought product B falls into this category (with exactly two groups, it is called bipartite) because there are no links between two buyers or between two products.
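- Multigraph: when the network contains more than one category of links with different meanings, so that the same pair of nodes can be joined by several links (networks with typed links are also known as multiplex networks). For example, a social network can have three types of links that represent family, friendship or professional relationships.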
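The three categories above can be illustrated with a minimal Python sketch. It is not tied to any particular library; the names `build_adjacency`, `is_bipartite_split` and `links_by_type`, as well as the toy data, are hypothetical and chosen only for illustration.

```python
def build_adjacency(links, directed=False):
    """Map each node to the set of nodes it links to."""
    adjacency = {}
    for source, target in links:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set())  # make sure the target is a known node
        if not directed:
            # In an undirected network the link is stored in both directions.
            adjacency[target].add(source)
    return adjacency


def is_bipartite_split(links, group_a, group_b):
    """True if every link joins a node of group_a with a node of group_b."""
    return all(
        (u in group_a and v in group_b) or (u in group_b and v in group_a)
        for u, v in links
    )


def links_by_type(typed_links):
    """Group (source, target, kind) links by their kind."""
    grouped = {}
    for source, target, kind in typed_links:
        grouped.setdefault(kind, []).append((source, target))
    return grouped


# Toy data (invented for this example).
friendships = [("Ana", "Ben"), ("Ben", "Cleo")]       # undirected
payments = [("Ana", "Ben")]                           # directed: Ana pays Ben
purchases = [("Ana", "book"), ("Ben", "book"), ("Ben", "lamp")]
relationships = [
    ("Ana", "Ben", "family"),
    ("Ana", "Ben", "professional"),
    ("Ben", "Cleo", "friendship"),
]

friend_graph = build_adjacency(friendships)
payment_graph = build_adjacency(payments, directed=True)
buyers_products_ok = is_bipartite_split(purchases, {"Ana", "Ben"}, {"book", "lamp"})
relationship_layers = links_by_type(relationships)
```

In the friendship network Ben is linked to both Ana and Cleo, while in the payment network the link is stored only from Ana to Ben. The same pair (Ana, Ben) appears under two different link types, which is what the third category describes.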
Real-life networks may have remarkable mathematical properties and usually exhibit a very complex and irregular structure, as we can see in the picture on the right. Features like power-law distributions in their attributes, sparsity and small-world phenomena are common. By sparsity we mean that the actual number of links in the network is much smaller than the maximum theoretical number, which is quadratic in the number of nodes. The small-world phenomenon reflects the fact that the distance between any pair of nodes in a real network (measured in number of hops) is much smaller than one would naively imagine.
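Sparsity and hop distance are easy to compute on a small example. The sketch below assumes an undirected network stored as a dictionary of neighbour sets; the function names `density` and `hop_distances` and the four-node toy network are hypothetical. Density is the number of links divided by the maximum possible number, n(n-1)/2 for n nodes, and hop distances are found with a breadth-first search.

```python
from collections import deque


def density(adjacency):
    """Fraction of possible undirected links that are actually present."""
    n = len(adjacency)
    if n < 2:
        return 0.0
    # Each undirected link appears in two neighbour sets, hence the division by 2.
    link_count = sum(len(neighbours) for neighbours in adjacency.values()) / 2
    max_links = n * (n - 1) / 2
    return link_count / max_links


def hop_distances(adjacency, start):
    """Breadth-first search: number of hops from start to every reachable node."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


# Toy undirected network: a chain a - b - c - d (invented for this example).
chain = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c"},
}

chain_density = density(chain)          # 3 links out of 6 possible
from_a = hop_distances(chain, "a")      # hops from "a" to every other node
```

A chain is the opposite of a small world: distances grow linearly with the number of nodes. In real networks a few shortcut links are typically enough to keep the hop counts small even when the density is very low.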
Surprisingly, the major challenge in the early stages of real-life network analysis is creating the network. In some cases this step is straightforward, such as with a social network where nodes are people and links denote friendship. But in many cases this relationship is not clear at all. A dataset may not have an explicit network structure, and one must decide which elements are the nodes and how they link to each other. Here the imagination and mathematical background of the data scientist are decisive and are what actually make the difference.
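As one possible illustration of this modelling step, the sketch below turns a plain table of purchases into a network of products. The modelling choice is an assumption made for the example, not the only option: products are the nodes, and two products are linked when at least one buyer bought both, with the number of shared buyers as the link weight. The function name `co_purchase_links` and the data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_purchase_links(purchases):
    """Build weighted product-product links from (buyer, product) rows."""
    # Step 1: collect the set of products bought by each buyer.
    baskets = {}
    for buyer, product in purchases:
        baskets.setdefault(buyer, set()).add(product)

    # Step 2: every pair of products in the same basket shares that buyer.
    weights = Counter()
    for products in baskets.values():
        for pair in combinations(sorted(products), 2):
            weights[pair] += 1
    return dict(weights)


# Toy purchase table (invented for this example): one row per purchase.
purchase_rows = [
    ("Ana", "book"),
    ("Ana", "lamp"),
    ("Ben", "book"),
    ("Ben", "lamp"),
    ("Ben", "pen"),
    ("Cleo", "pen"),
]

product_links = co_purchase_links(purchase_rows)
```

The same table could equally be modelled as a buyer-to-buyer network (two buyers linked when they bought the same product) or kept as a bipartite buyer-product network. Which representation is useful depends on the question being asked.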
As a final remark, complex networks contain non-trivial information that structured tables lack. Both mathematical algorithms and visualization techniques play an important role in network analysis, because a quick look at the spatial layout of a network reveals clusters of interconnected nodes, bottlenecks, distances and hubs. That latent information is very hard to discover with tools other than those of modern network science!