Network science

Graphs \(G = G(A, X)\) contain an adjacency definition and a feature matrix.

Components of complex systems

Complex systems are ones which

  1. have many interactions

  2. There exists no central point

Interactions

Interacting occur between nodes and with their environment

Emergence

Sometimes we see something happen that we dont expect but its the result of the interaction of all of us.

In simple systems, the properties of the whole can be understood or predicted from the addition or aggregation of its components. If you have knowledge of the microscopic properties of your system, you can predict macroscopic properties of the system.

In complex systems, knowing the knowledge of the components will not be enough to know what happens at the end. Here, the whole is more than the sum of its parts. These systems change their states dynamically and its likely hard to expect what will happen.

Self-organization

Complex systems may self-organize to produce non-trivial patterns spontaneously without a blueprint. For example,

  • local tree growth with random forest fires caused by lightning yield complex spatio-temporal patterns

Adaption

Complex systems may adapt and evolve. Complex systems are often active and responding to the environment. For example,

  • immune system continuously learning about pathogens

  • evolution theory

Interdisciplinary

Complex systems appear in all scientific and professional domains.

Universality means that many systems in different domains display phenomena with common underlying features that can be described using the same scientific models. This is a common idea in transfer learning

Methods

Complex science involves many variables and configurations, needs advanced mathematical and computational modeling, analysis and simulations, and needs to use computers to check ifa set of hypothetical rules could lead to a behavior observed in nature. For example,

  • Agent-based modeling for the flocking of birds

  • mathematical and computer models of the brain

  • climate forecasting computer models

  • computer models of pedestrian dynamics

Two main methods:

  • advanced mathemaatical models

  • computer-based mathematical techniques

Complexity Theory

  • System Theory: Mother of complexity theory, a framework which can be investigated and/or develops any group of objects that work together to produce some result

  • Chaos Theory: Concerns deterministic systems whose behavior can, in principle, be predicted. They are highly sensitive to initial conditions (butterfly effect). Chaotic systems are predictable for a while and then ‘appear’ to become random.

  • Adaptive System Theory: Organization of things without any central control. This deals with action and behavior of others and operate on simple rules. Examples: evolutionary adaption in biology, robotics

  • Network Theory: study of graphs as a representation of relations between discrete objects. Studies real-world data. Examples: social networks, gene interactions, pandemics, transporation networks, citation networks

Anatomy

Node degree: the number of links connected to the node

Directed networks define an in-degree (\(k_i^{in}\)) and out-degree (\(k_i^{out}\)) of node \(i\). The total degree is the sum of in- and out-degree. The source is a node with \(k^{in} = 0\). A sink is a node with \(k^{out} = 0\)

Average degree

Undirected:

  • we define the total number of links as \(L = \frac{1}{2} \sum_{i=1}^N k_i\). We divide by two to not double count.

  • the average link is \(<k> = \frac{1}{N} \sum_{i=1}^N k_i = \frac{2L}{N}\)

where \(N\) is the number of nodes in the graph.

Directed:

  • A node’s total degree is \(k_i = k_i^{in} + k_i^{out}\).

  • The total number of links is \(L = \sum_{i=1}^N k_i^{in} = \sum_{i=1}^{N} k_i^{out}\)

  • The average degree is \(<k^{in}> = \frac{1}{N} \sum_{i=1}^N k_i^{in} = <k^{out}> = \frac{1}{N} \sum_{i=1}^N k_i^{out} = \frac{L}{N}\)

Degree Distribution

A probability \(p_k\) that a randomly chosen node has degree \(k\).

\[p_k = \frac{N_k}{N}\]

Where \(N_k\) is the number of nodes with degree \(k\).

\[\sum_{k=1}^{\infty} p_k = 1\]

The average degree is simply the weighted average (expected value of a discrete variable).

\[<k> = \sum_{k=0}^{\infty} k p_k\]

log-log plots are helpful when defining degree distribution with wide range of probabilities

“hubs” are the highly-connected nodes in a network

Bipartite networks

Nodes can be divided into two disjoint sets \(U\) and \(V\) such that each link connects a U-node to a V-node. We can generate two projections for each bipartite network. The first projection connects two \(U\)-nodes if they are linked to the same \(V\)-node. The second projection connects the \(V\)-nodes if they connect to the same \(U\)-node.

This is used in (say) the hollywood actor network, where a movie is connected to an actor if the actor plays in that movie. We can decompose a \(U\) (movie network) and \(V\) (actor network).

The Human disease network connects diseases to the genes whose mutations are known to cause or effect the corresponding disease. We can decompose a \(U\) (diseases network) and \(V\) (genes network).

Shortest path

Never loops, never overlaps.

Shortest path between two nodes can be calculated by A^2. But this is not computationally efficient.

Instead, we can use Breadth-first-search.

Step 1: Start at node i that we label with “0”.

Step 2: Find the nodes directly linked to i, and label them distance “1” and put them in a queue.

Step 3: Take the first node, labeled n, out of the queue (n=1 in the first step), find the unlabeled nodes adjacent ot it in the graph, and label them with \(n+1\) and put them in the queue.

Step 4: Repeat step 2 until you find the target \(j\) or there are no more nodes in the queue.

Step 5: The distance between i and j is the label of j. if j does not have a label, then \(d_{ij} = \infty\).

Network Diameter

Network diameter, \(d_max\), is the maximum distance between any pair of nodes in the graph.

The average path length is the average number of steps along the shortest paths for all possible pairs of networks nodes.

\[<d> = \frac{1}{2 L_{max}} \sum_{i, j \neq i} d_{ij}\]

here \(d_{ij}\) is the distance from node \(i\) to node \(j\).

These are the measure of efficiency of information or mass transport on a network.

Connectedness

Subgraphs in an adjacency matrix are defined by \(k \times k\) matrices with at least one connection (nonzero)

Clustering coefficient

A clustering coefficient ( in [0,1]) tells you what fraction of your neighbors are connected.

\[C_i = \frac{2 e_i}{k_i (k_i - 1)}\]

from Watts & Strogatz, Nature 1998.

For random networks, the diameter of \(G(n,m)\) would be the diameter of a graph G averaged over the ensemble

\[<l> = \sum_{G} p(G) l(G) = \frac{1}{\Omega} \sum_G l(G)\]

For instance for the \(G(n,m)\) model, the average number of edges is \(m\) and the mean node degree is \(<k>=2m/n\). However, other properties are hard to analytically obtain.

If, instead, the number nodes and the probability of making connections is fixed, notated by \(G(n,p)\), can do other things.

Question:

\(p(L)\): What is the probability that a random network has exactly L links?

The number of nodes \(N\) has the total number of paris of nodes \(n(n-1)/2\). The probability that \(L\) of the attempts to connect the \(N(N-1)/2\) pairs of nodes is \(p^L\). The probability that the remaining \(N(N-1)/2 - L\) attempts have not resulted in a link: \((1-p) []\) <- CHECK ON THIS

The probability to have exactly \(L\) links in a network of \(N\) nodes and probability \(p\) is

Small World

Distance is not related to N. It is related to \(\ln N\). As a note, real networks are \(\ln \ln N\).

Expected number of nodes up to distance \(d\) from our starting node is

\[N(d) \approx 1 + <k> + <k>^2 + ... + <k>^d = \frac{<k>^{d+1} - 1}{<k> - 1}\]

Obviously, the \(N(d)\) must not exceed \(N\). So, \(N(d_{max}) \approx N\). Assuming that \(<k>\) is much greater than \(1\), \(<k>^{d_{max}} \approx N\). Therefore, \(d_{max} \approx \frac{\ln N}{\ln <k>}\). This changes slowly because of the natural logs.

Clustering coefficient

Degree of a node contains no information about the relationship between a node’s neighbors.

We estimate the expected number of links \(L_i\) between the node’s \(k_i\) nieghbors

\[<L_i> = p \frac{k_i (k_i - 1)}{2}\]

where \(p\) is the probability that two of \(i\)’s neighbors link to each other.

The local clustering coefficient of a random network is

\[C_i = \frac{2 <L_i>}{k_i (k_i - 1)} = p = \frac{<k>}{N}\]

Two predictions (regarding real networks) can be formed:

  1. For fixed , the larger the network, the smaller is a node’s clustering coefficient. Note that also follows the same formula for \(C_i\). THIS PREDICTION IS NOT CORRECT. IT SHOULD GO UP.

  2. The local clusteirng coefficient of a node is independent of the node’s degree.

Summary: Random network model does not apture the clustering of real networks.

Real networks have a much higher clustering coefficient than expected

Average clustering coefficient of real networks is much higher

In real networks, the number of nodes at distance \(d > <d>\) drops rapidly

Universality

Problem with small world - distance between arbitray nodes is very small - but, the clustering coefficient cannot be explained in small world model. So what kind of model can explain a real network?

Scale free property - we can see hub.

Fit the power law, we can figure out if it is scale free.

Algorithm: Step 1: Plot the degree distribution

Step 2: Measure the degree exponent

Step 3: Exponent

As the value of the degree exponent plays an important role in predicting various network properties, we need tools to fit the \(p_k\) distribution and to estimate \(\gamma\)

A power grid is not scale free because it is a grid - there arent these huge hubs that exist.

Barabasi-Albert (BA) Model