Clustering coefficient



Duncan J. Watts and Steven Strogatz (1998) introduced the clustering coefficient graph measure to determine whether or not a graph is a small-world network.

First, let us define a graph in terms of a set of $$n$$ vertices $$V={v_1,v_2,...v_n}$$ and a set of edges $$E$$, where $$e_{ij}$$ denotes an edge between vertices $$v_i$$ and $$v_j$$. Below we assume $$v_i$$, $$v_j$$ and $$v_k$$ are members of V.

We define the neighbourhood N for a vertex $$v_i$$ as its immediately connected neighbours as follows:

$$N_i = \{v_j\} : e_{ij} \in E.$$

The degree $$k_i$$ of a vertex is the number of vertices, $$|N_i|$$, in its neighbourhood $$N_i$$.

The clustering coefficient $$C_i$$ for a vertex $$v_i$$ is the proportion of links between the vertices within its neighbourhood divided by the number of links that could possibly exist between them. For a directed graph, $$e_{ij}$$ is distinct from $$e_{ji}$$, and therefore for each neighbourhood $$N_i$$ there are $$k_i(k_i-1)$$ links that could exist among the vertices within the neighbourhood. Thus, the clustering coefficient is given as:

$$C_i = \frac{|\{e_{jk}\}|}{k_i(k_i-1)} : v_j,v_k \in N_i, e_{jk} \in E.$$

An undirected graph has the property that $$e_{ij}$$ and $$e_{ji}$$ are considered identical. Therefore, if a vertex $$v_i$$ has $$k_i$$ neighbours, $$\frac{k_i(k_i-1)}{2}$$ edges could exist among the vertices within the neighbourhood. Thus, the clustering coefficient for undirected graphs can be defined as:

$$C_i = \frac{2|\{e_{jk}\}|}{k_i(k_i-1)} : v_j,v_k \in N_i, e_{ij} \in E.$$

Let $$\lambda_G(v)$$ be the number of triangles on $$v \in V(G)$$ for undirected graph $$G$$. That is, $$\lambda_G(v)$$ is the number of subgraphs of $$G$$ with 3 edges and 3 vertices, one of which is $$v$$. Let $$\tau_G(v)$$ be the number of triples on $$v \in G$$. That is, $$\tau_G(v)$$ is the number of subgraphs (not necessarily induced) with 2 edges and 3 vertices, one of which is $$v$$ and such that $$v$$ is incident to both edges. Then we can also define the clustering coefficient as:

$$C_i = \frac{\lambda_G(v)}{\tau_G(v)}$$

It is simple to show that the two preceding definitions are the same, since $$\tau_G(v) = C({k_i},2) = \frac{1}{2}k_i(k_i-1)$$.

These measures are 1 if every neighbour connected to $$v_i$$ is also connected to every other vertex within the neighbourhood, and 0 if no vertex that is connected to $$v_i$$ connects to any other vertex that is connected to $$v_i$$.

The clustering coefficient for the whole system is given by Watts and Strogatz as the average of the clustering coefficient for each vertex:

$$\overline{C} = \frac{1}{n}\sum_{i=1}^{n} C_i.$$