Student Book Notes: Network Science by Albert-L. Barabasi - A powerful new field

Attending the 2015 summer conferences organised by @cigtr I came accross a book authored by Albert-László Barabási titled "Network Science". Actually it was Mathematics Professor Regino Criado who hinted me the name of Barabasi.

The book opens minds and new knowledge fields using Mathematics. It is worth reading and studying! Actually all chapters and other resources can be found in the book's site. Thanks to the author for making it freely available under the Creative Commons licence.

I read all ten chapters. I highlighted some sentences from each of the chapters. I enumerate some of those highlighted points as if this post were a brief collection of notes on the book, hoping that more than one of my blog's readerr will decide to embark on reading the book after going through this introductory post. Network Science students could use this post as a quick (and very initial) cheat sheet.

Happy networking! 

Chapter X - Preface

Understanding networks today is required to understand today's world. This section describes how the text book is used in a Network Science class.

Chapter 0 - Personal Introduction

This chapter describes how the author got trapped by the beauty and the importance of networks. He already mentions contributions such as Bella Bollobas' on random graphs and the work of Erdos and Renyi. It talks also about the difference between social scientists and graph theorists.

Key introductory statements:

- "A simple model, realying on growth and preferential attachment could explain the power laws spotted on some networks".

- "Scale-free networks proved to be surprisingly resistant to failures but shockingly sensitive to attacks".


Chapter 1 - Intro

- "The interplay between network structure and dynamics affects the robustness of the system".
- In a complex system it is difficult to derive the collective behaviour from the knowledge of the system's components.
- "Most networks are organised by the same principles".
- "The most succesful companies of the 21st Century base their technology and business model on networks".
- Epidemic transmission is one real example of the applicability of this new maths-based science.

Chapter 2 - Graph Theory

- "Graph theory is the mathematical scaffold behind network science".
- A path goes through all nodes only once. "A path cannot exist on a graph that has more than two nodes with an odd number of links". 
- Network parameters: Number of nodes, number of links, directness or undirectness of links.
- The choice of nodes and links is important when describing a network.
- Node degree is the number of links to other nodes.
- Average degree in a network: An important variable to play with.
- In directed networks, we talk about incoming degree and outgoing degree.
- Total number of links is denoted by L.
- Average degree k= 2L/N being N total number of nodes.
- "Degree distribution provides the probability that a randomly selected node in the network has degree k".
- "The number of degree-k nodes can de obtained from the degree distribution as N(k)=Np(k)".
- "The adjancency matrix of an undirected network is symmetric".
- "For weighted networks the elements of the adjancency matrix carry the weight of the link".
- Metcalfe's law states that the value of a network is proportional to the square of the number of its nodes.
- Bipartite networks can be divided into two disjoints sets of nodes such that each link connect a node from a set to a node from the other set.
- A path length consists of the number of links it contains.
- In networks physical distance is replaced by path length.

Note: I will not use "" signs in this post anymore. All points are extracted from the mentioned book. Please consider the existence of "" signs i.e. literal or almost literal words coming from the reviewed book in all points. I also informed Albert-László Barabási about the publicacion of this post.

- Distance between nodes changes if the network is directed i.e. d(A,B) maybe is not equal to d(B,A).
- Connected and disconnected networks (disconnected if there is at least a pair of nodes with infinite distance).
- A bridge is any link that, if cut, disconnects the network.
- The clustering coefficient measures the network's local link density.
- The maximal distance in a network is the diameter. The breadth-first-search algorithm helps finding it.

Chapter 3 - Random networks

- A random network is a collection of N nodes where each node pair is connected with probability p.
- A cocktail party chitchat scenario is an example of a random network.
- The degree distribution of a random network has the form of a Poisson distribution.
- The random network model does not capture the degree distribution of real networks. Nodes in random networks have comparable degrees, forbidding hubs (highly connected nodes).
- We have a giant component if and only if each node has on average more than one link.
- Evolution of a random network in function of the average degree k: Subcritical, critical, supercritical and connected.
- The small world phenomenon implies that the distance between two randomly chosen nodes in a network is short.
- Most real networks are in the supercritical regime.
- Real networks have a much higher clustering coefficient than expected for a random network of similar N and L.
- Real networks are not random.
- The random network model is important in network science. Features of real networks not present in random networks may represent some signature of order.

Chapter 4 - The scale-free property

- Let's remember that in a random network there are no highly connected nodes (hubs).
- The existence of hubs (e.g. in the WWW) is a signature of a network organising principle called the scale-free property.
- The degree distribution of a scale-free network follows a power law, and not a Poisson distribution like in random networks
- A scale-free network has a large number of small degree nodes, larger than in a random network.
- In a Poisson distribution (random network), most of the nodes have the same amount of links (the size of the largest node grows logarithmically or slower with N, the number of nodes).
- In a power-law distribution (scale-free network) many nodes have only a few links and there are a few hubs with a large number of links (widely different degrees, spanning several orders of magnitude).
- The larger the network, the larger the degree of its biggest hub (it grows polynomially with the network size).
- Random networks have a scale: Nodes have comparable degrees and the average degree serves as the scale of a random network.
- The scale-free property is missing in those networks that limit the number of links that a node can have.
- Ultra-small world property: Distances in a scale-free network are smaller that in a equivalent random network.
- The bigger the hubs, the more effectively they shrink distances between nodes.
- Scale-free networks are ultra-small when the value of the degree exponent is between 2 and 3.
- The configuration model, the degree-preserving randomization and the hidden parameter model can generate networks with a pre-defined degree distribution.
 - Erdos-Renyi and Watts-Strogatz described exponentially bounded networks. They lack outliers. Most nodes have comparable degrees (e.g. the power grid and highway networks). In these networks, a random network model is a starting point.
- In fat-tailed distributions, a scale-free network offers a better approximation.

Chapter 5 - The Barabasi-Albert model

- In scale-free networks, nodes prefer to link with the most connected nodes (preferential attachment).
- Growth and preferential attachment are responsible, and simultaneoulsy needed, for the emergence of scale-free networks.
- Older nodes have an advantage to become hubs over the time.
- The Barabasi-Albert model generates a scale-free network with degree exponent =3.
- To date all known models and real systems that are scale-free have preferential attachment.

Chapter 6 - Evolving networks

- The Bianconi-Barabasi model can account for the fact that nodes with different internal characteristics acquire links at different rates.
- The growth rate of a node is determined by its fitness. This model allows us to calculate the dependence of the degree distribution on the fitness distribution.
- Fitness distribution is typically exponentially bounded. That means that fitness differences between different nodes are small. With time these differences are magnified resulting in a power law degree distribution.
- Bose-Einstein condensation: That means that the fittest node grabs a finite fraction of the links, turning into a super hub creating a hub and spoke topology (the rich-gets-richer process or winner takes all phenomenon) and losing the network its scale-free nature.
- In most networks, nodes can disappear.
- As long as it continues to grow, its scale-free nature can persist.

Chapter 7 - Degree correlation

- A way to go deeper into understanding network structures based on maths.
- In some networks, hubs tend to have ties to other hubs. That is an assortative network. In disassortative networks, hubs avoid each other.
- A network displays degree correlations if the number of links between the high and low-degree nodes is systematically different from what is expected by chance.
- There is a conflict between degree correlation and the scale-free property. Hubs should be linked among each other with more that one link. 
- Assortative mating reflects the tendency of individuals to date or marry individuals that are similar to them.

Chapter 8 - Network robustness

Once the fraction of removed nodes reaches a critical threshold in a random network, the network abruptly breaks into disconnected components. Percolation theory can be used to describe the transition in random or Erdos-Renyi networks i.e. networks with equal or comparable number of nodes.

Real networks show robustness against random failures. Scale-free networks show a greater degree of robustness against random failures. However, an attack that targets a hub can easily destroy a scale-free network. Depending on the network (the WWW, or a disease propagation), this can be bad or good news.

The failure propagation model and the branching model (plus the overload model and the sandpile model in the critical regime) captures the behaviour of cascading failures. All these models predict the existence of a critical state in which the avalanche sizes follow a power law.

A network that is robust to both random failures and attacks has a hub and many nodes with the same degree i.e a hub-and-spoke topology.

Chapter 9 - Communities

A community is a locally dense connected subgraph in a network. There are weak and stron communities depending on the internal and external number of links of the nodes.

The number of potential partitions in a network grow faster than exponentially with the network size.

The higher a node's degree, the smallest is its clustering coefficient.

Randomly wired networks lack an inherent community structure.

Modularity measures the quality of each partition. Modularity optimization offers a novel approach to community detection.

For a given network the partition with maximum modularity corresponds to the optimal community structure.

A node is rarely confined to a single community. However links tend to be community specific.

The development of the fastest and the most accurate community detection tool remains an active arms race.

The community size distribution is typically fat-tailed, indicating the existence of many small communities with a few large ones.

Community finding algorithms run behind many social networks to help discover potential friends, posts of interests and target advertising.

Chapter 10 - Spreading phenomena

 A super-spreader is an individual responsible for a disproportionate number of infections during an epidemic.

Network epidemics offer a model to explain the spread of infectious diseases.

The homogenous mixing hypothesis (also named fully mixed or mass action approximation) assumes that each individual has the same chance of coming into contact with an infected individual.

Different models capture the dynamics of an epidemic outbreak (Suceptible-Infected, Susceptible-Infected-Susceptible and the Susceptible-Infected-Recovered).

In a large scale-free network a virus can reach instantaneously most nodes and even viruses with small spreading rate can persist in the population.

The human sexual encounter network is a scale-free network.

Bursty interactions are observed in a number of contact processes of relevance for epidemic phenomena.

In assortative netoworks, high degree nodes tend to link with high degree nodes.
Equally, strong ties tend to be within communities while weak ties are between them.

Several network characteristics can affect the spread of a pathogen in a network (e.g. degree correlations, link weights or a bursty contact pattern).

In a scale-free network, random immunization does not erradicate a desease. Selective immunization targeting hubs help eradicate the disease.

The friendship paradox: On average the neighbours of a node have a higher degree than the node itself. So, let's immunize neighbours of randomly selected nodes.

Travel restrictions do not decrease the number of infected individuals. They only delay the outbreak, giving maybe time to expand local vaccinations.

We can use the effective distance (different from the physical distance) to determine the speed of a pathogen.

All in all, a recommendable reference for those willing to get introduced into the Network Science field.

I will be happy to extend this post with comments coming from readers of the "Network Science" book.


Let's network!