Find the number of connected components in a graph

Graph with some nodes connected

Suppose that we have a graph of unknown structure and we want to find the number of connected components in it. On the right you can see how an example graph could look like.

We can easily see that there are five distinct components (1,5,7,9), (4, 6), (3, 8), (10, 11, 12) and (2), but this required visualization and manual estimation, which are not feasible with large graphs.

One way to do this would be to compute the Laplacian matrix of the graph as a difference between its degree matrix and adjacency matrices, then normalize the Laplacian, find its eigenvalues and eigenvectors and check the multiplicity of the zero-valued eigenvalue.

The degree matrix tells how many nodes are connected to node i and the number is available at position (i, i) (only the diagonal contains non-zero values). The adjacency matrix tells us which nodes are connected, having elements at position (i, j) equal to 0 if there is no connection between node i and node j and 1 otherwise. To normalize the Laplacian, we multiply it with the degree matrix, raised to exponent (-1/2) on both sides, keeping values that were previously zero.

import numpy as np degree_matrix = np.array([ [1,0,0,0,0,0,0,0,0,0,0,0], [0,0,0,0,0,0,0,0,0,0,0,0], [0,0,1,0,0,0,0,0,0,0,0,0], [0,0,0,1,0,0,0,0,0,0,0,0], [0,0,0,0,2,0,0,0,0,0,0,0], [0,0,0,0,0,1,0,0,0,0,0,0], [0,0,0,0,0,0,2,0,0,0,0,0], [0,0,0,0,0,0,0,1,0,0,0,0], [0,0,0,0,0,0,0,0,1,0,0,0], [0,0,0,0,0,0,0,0,0,1,0,0], [0,0,0,0,0,0,0,0,0,0,2,0], [0,0,0,0,0,0,0,0,0,0,0,1], ]) adjacency_matrix = np.array([ [0,0,0,0,1,0,0,0,0,0,0,0], [0,0,0,0,0,0,0,0,0,0,0,0], [0,0,0,0,0,0,0,1,0,0,0,0], [0,0,0,0,0,1,0,0,0,0,0,0], [1,0,0,0,0,0,1,0,0,0,0,0], [0,0,0,1,0,0,0,0,0,0,0,0], [0,0,0,0,1,0,0,0,1,0,0,0], [0,0,1,0,0,0,0,0,0,0,0,0], [0,0,0,0,0,0,1,0,0,0,0,0], [0,0,0,0,0,0,0,0,0,0,1,0], [0,0,0,0,0,0,0,0,0,1,0,1], [0,0,0,0,0,0,0,0,0,0,1,0], ]) laplacian_matrix = degree_matrix - adjacency_matrix n = len(laplacian_matrix) # Avoid the problem that raising 0 to -1/2 gives infinity degree_matrix_neg_sqrt = np.array([el**(-1/2) if el !=0 else 0 for el in degree_matrix.ravel()]).reshape(n, n) normalized_laplacian_matrix = degree_matrix_neg_sqrt @ laplacian_matrix @ degree_matrix_neg_sqrt eigvals = np.linalg.eigvals(normalized_laplacian_matrix) eps = 10**-12 """ The multiplicity of the eigenvalue 0 is equal to the number of connected components in the graph. Treat extremely small, positive eigevalues as if they were zeros. """ zero_eigenvalue_multiplicity = sum(1 if eigv < eps else 0 for eigv in eigvals) # Validate our result with a graph library import networkx as nx nodes = range(1,13) edges = [(1,5), (5,7), (7,9), (4,6), (8, 3), (10, 11), (11, 12)] G = nx.Graph() G.add_nodes_from(nodes) G.add_edges_from(edges) # Compare normalized Laplacian matrices accounting for floating point error L = nx.normalized_laplacian_matrix(G).todense() print(np.all(np.abs(L - normalized_laplacian_matrix) < eps)) # True print(nx.number_connected_components(G)) # 5 print(zero_eigenvalue_multiplicity == nx.number_connected_components(G)) # True for c in nx.connected_components(G): print(c) """ {1, 5, 9, 7} {2} {8, 3} {4, 6} {10, 11, 12} """

Finding the number of connected components seems to be similar to finding the number of clusters in a dataset of many observations. Interestingly, they work equally well. OpenCV also makes use of connected components, allowing us to find find regions in images. And scikit-learn allows us to take an image and convert it to a graph, having a Scipy COO sparse matrix representation.