Overview
Revisitting the sparse matrix technology.
Reverse Cuthill-McKee Algorithm (George, Liu, etc. 1981)
Terminology
Let A be an $n\times n$ symmetric positive matrixm with entries $a_{ij}$
- $f_i(A)$ denotes the column substrcipt of the first non-zero component in row $i$ of A: $$ f_i(A) = min{j|a_{ij}\neq 0} $$
- $\beta_i(A)$ denotes the i-th bandwidth of A: $$ \beta_i(A) = i -f_i(A) $$
- $\beta(A)$ denotes the bandwith of A: $$ \beta(A) = \max {\beta_i(A)|1\le i \le n} $$
$Env(A)$ denotes the envelop of A: $$ Env(A) = {{i,j}|0<i-j\le\beta_i(A} $$
Envelop size = $|Env(A)|$
$w_i(A)$ denotes the i-th frontwidth of A, the number of “active” rows at i-th step: $$ w_i(A) = |{j>i|{i,j}\in Env(A)}| $$
Frontwidth : $w(A) = \max{w_i(A)|1\le i\le n }$
Define Eccentricity $l(x)$ as the maximum distance that root $x$ could reach within the graph: $$ l(x) = max{d(x,y)|y\in X} $$
A node $x\in X$ is said to be a peripheral nodes if its eccentricity is equal to the diameter of the graph.
Theorem
The Adjacent set $Adj({x_1,\cdots,x_i})$ shall be referred to as the i-th front of the labelled graph, and its size the i-th frontwidth $$ For\ i<j,{i,j}\in Env(A)\ if\ and\ only\ if\ x_j\in Adj({x_1,\cdots,x_i}) $$
Algorithm
Discussion
How the bandwidth and envelop of the matrix affects the fill-in ?
When a system of linear equations has a band matrix of coefficients and the system is solved by Guassian elimination, with pivots taken from the diagonal, all arithmetic is confined to band and no new zero elements are generated outside of the band.
To minimize the bandwidth of the row associated with z, node z should be ordered as soon as possible after y.
Greedy policy: The Cuthill-McKee scheme can be regarded as a method that reduces the bandwidth of a matrix via a local minimization of the $\beta_i$’s. This suggests that the scheme can be used as a method to reduce the profile/envelope
Why reverse ?
- Reversing the Cuthill-McKee ordering ofren turns out to be much superior to the original ordering in terms of profile reduction, although tha bandwidth remains unchanged.
Heuristic searching for peripheral node
- Peripheral node is the ideal starting point in RCM and many other algorithms, while it is expensive to find with the time complexity bound of $O(|X|^2)$. Instead we search the pseudo-peripheral node by iterating in the level strcture of the graph.
Implementation
RCM algorithm
Code
def RCM(G):
"""
RCM algorithm
:param G: Graph
:return: a list of reordered index
"""
def single_seach(G):
# Breadth first search
root = diameter_pair_algo(G)
que = deque()
que.append(root)
visited = [root]
while que:
for _ in range(len(que)):
root = que.popleft()
adjacencies = G.adj[root].keys()
subsidiaries = [node for node in adjacencies if node not in visited]
for sub in subsidiaries:
que.append(sub)
visited.append(sub)
return visited[::-1]
reordered = []
sub_graphes = subGraphs(G)
for sub_graph in sub_graphes:
if len(list(sub_graph.nodes)) <= 1:
reordered += list(sub_graph.nodes)
else:
reordered += single_seach(sub_graph)
return reordered
Finding pseudo-peripheral nodes
Code
def diameter_pair_algo(G) -> set:
"""
pseudo-peripheral algorithsm (George, Liu, 1994)
find one pseudo-diameter pair
:param G: Networkx undirected graph
:return: a pseudo-peripheral vertex
"""
s = np.random.choice(G.nodes)
while True:
level_set_s = level_set_algo(G,s)
# choose a node in the last level set of minimum degree
nodes = [(i,G.degree[i]) for i in level_set_s[-1]]
t = min(nodes, key = lambda x: x[1])[0]
level_set_t = level_set_algo(G,t)
if len(level_set_t) > len(level_set_s):
s = t
return t