
Kosaraju's Strongly Connected Components

Background

This algorithm, Kosaraju’s Strongly Connected Components, is the point in the algorithms course where I started to lose a lot of steam the first time around. I somehow got through Karger’s Minimum Cut with the right answer, but I don’t trust the code I wrote that first time. Kosaraju’s algorithm really stopped me. I never understood why until an acquaintance told me very recently:

“Graph APIs suck in all programming languages.”

Going through the algorithms course a second time with that understanding gave me the right perspective: write my own graph API. Karger’s minimum cut was much better the second time around. Kosaraju’s algorithm still took work, but I made it through and I will share my thoughts and tests.

Why??

Strongly connected components are defined as: a subset of a directed graph in which every vertex is reachable from every other vertex in the subset.

directed graph

A strongly connected component in this graph is the set of vertices A, B, C. From any vertex in that set, any other vertex in the set can be reached by following the edges. Vertices D and E are connected to A, B, C, but are not strongly connected with them, since no vertex is accessible from vertex E.

Where is this useful? Some ideas to get started:

  • networking: understanding a network topology and where servers are connected to each other.
  • programming: recently I saw this concept applied to finding logic loops and resolving them (i.e. A depends on B which depends on C)

How??

Well, to find a strongly connected component, I can do it easily by eye on the earlier graph.

In the top group, I see a cycle through which any vertex can reach the others.

When the graph is larger… I have a harder time.

large directed graph

That’s where having an algorithm is useful!

Kosaraju’s Algorithm

I found using Kosaraju’s algorithm to find strongly connected components to be very elegant. So elegant, it’s almost unbelievable in the general description: run depth first search twice.

  • Run depth first search on the graph normally.
  • Record the finishing times of the vertices during the search.
  • Run depth first search on the transpose of the graph, visiting vertices in decreasing order of finishing time.
  • The vertices visited by each pass of that second depth first search form one strongly connected component.

This is the main driver of Kosaraju’s algorithm. Implementing depth first search properly will pay off later because it will be used again on the transpose of the graph.

Remember, depth first search visits a graph by continuously following unexplored edges from the current vertex, going as deep as possible before backtracking. If you ever see one of those maze solving screen savers, that’s how depth first search operates.

maze solving

source: https://rosettacode.org/wiki/Maze_solving

The output of depth first search is the order in which vertices are visited. So, on a simple cyclic graph:

simple cyclic graph

graph = { a => [b],
          b => [c],
          c => [a] }

The depth first search result depends on the starting vertex. For the above cyclic graph, there are three possible outputs:

puts dfs(graph, a) == [a,b,c]
puts dfs(graph, b) == [b,c,a]
puts dfs(graph, c) == [c,a,b]

Aside: this probably threw me off of graphs the most: what is the output of a graph search???
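
Here is one way a minimal dfs could look. This is only a sketch: I’m assuming vertices are written as Ruby symbols (:a instead of a) so the snippet actually runs, and the recursive dfs helper below is my own illustration rather than the course’s implementation.

# A minimal recursive depth first search, returning the visit order.
# Graphs are hashes from a vertex (a symbol) to its list of neighbors.
def dfs(graph, start, visited = [])
  return visited if visited.include?(start)
  visited << start                          # record the search order
  (graph[start] || []).each do |neighbor|   # follow every outgoing edge
    dfs(graph, neighbor, visited)           # keep digging deeper
  end
  visited
end

graph = { a: [:b], b: [:c], c: [:a] }
puts dfs(graph, :a) == [:a, :b, :c]   # true
puts dfs(graph, :b) == [:b, :c, :a]   # true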

On a little more complicated graph:

more complicated cyclic graph

complicated_graph = { a => [b],
                      b => [c,d],
                      c => [a],
                      d => [e],
e => [b] }

the tests would be:

puts dfs(complicated_graph, a) == [a,b,c,d,e]
puts dfs(complicated_graph, b) == [b,c,a,d,e]
puts dfs(complicated_graph, c) == [c,a,b,d,e]
puts dfs(complicated_graph, d) == [d,e,b,c,a]
puts dfs(complicated_graph, e) == [e,b,c,a,d]

Finishing time for a graph

The finishing time of a vertex is basically the order in which DFS finishes with it, once there is nothing left to explore from that vertex. (Finishing order might be a better name for it.)

Let’s have the same simple graph, but this time I will highlight the finishing order.

simple cyclic graph

graph = { a => [b],
          b => [c],
          c => [a] }

Now, when DFS is performed starting at vertex a, the search order will still be:

puts dfs(graph, a) == [a,b,c]

The finishing time for this DFS:

finishing_time = { a => 3, b => 2, c => 1 }

The finishing time for each vertex is computed as follows:

  • DFS starts from a, then goes to b, then goes to c, because DFS just keeps digging further and further.
  • At c, there are no more new vertices to search, so the search is finished there.
  • The search goes back to b, and there are also no new vertices to explore, so it’s finished with b.
  • Finally, the search returns to a and there’s nothing left, so it’s finished with a.

In this case, the finishing order happens to be the reverse of the search order.
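
To make the bookkeeping concrete, here is a sketch of a DFS variant that records finishing times. The helper name dfs_with_finishing_times and the symbol-keyed graph are my own assumptions for illustration, not the post’s actual code.

# DFS that records the finishing time of each vertex: a vertex gets its
# time once there is nothing left to explore from it.
def dfs_with_finishing_times(graph, start, visited = [], finished = {})
  return finished if visited.include?(start)
  visited << start
  (graph[start] || []).each do |neighbor|
    dfs_with_finishing_times(graph, neighbor, visited, finished)
  end
  finished[start] = finished.size + 1   # finished exploring this vertex
  finished
end

graph = { a: [:b], b: [:c], c: [:a] }
puts dfs_with_finishing_times(graph, :a) == { c: 1, b: 2, a: 3 }   # true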

Finishing time for a more complex cyclic graph

more complicated cyclic graph

complicated_graph = { a => [b],
                      b => [c,d],
                      c => [a],
                      d => [e],
e => [b] }

Let’s start DFS from vertex a, and the search order will be:

puts dfs(complicated_graph, a) == [a,b,c,d,e]

The finishing time for the complicated graph will be…

finishing_time == {a => 5, b => 4, c => 1, d => 3, e => 2 }

This time, the finishing times are computed as follows:

  • At a, there’s one edge to b, goto b.
  • At b, there are two edges, take the first edge to c, goto c.
  • At c, there’s an edge to a, but a is a vertex being explored. There are no more edges to explore, so c is done, record the finishing time. Go back to b.
  • At b, are there any edges to explore? yes, an edge to d. goto d.
  • At d, there is one edge to e, goto e.
  • At e, there is one edge to b, but b is a vertex being explored. There are no more edges at e to explore, so e is done and record the finishing time. Go back to d.
  • At d, there are no more edges to explore. d is done and record the finishing time. Go back to b.
  • At b, there are no more edges to explore, b is done and record the finishing time. Go back to a.
  • At a, there are no more edges to explore, a is done and record the finishing time. This is the start of the search, so there is nowhere left to go back to.
  • The search is done.
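
Using the hypothetical dfs_with_finishing_times sketch from earlier (again, only an illustration), that trace corresponds to:

complicated_graph = { a: [:b], b: [:c, :d], c: [:a], d: [:e], e: [:b] }

puts dfs_with_finishing_times(complicated_graph, :a) ==
     { c: 1, e: 2, d: 3, b: 4, a: 5 }   # true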

Disconnected vertices

Having the finishing time of every vertex is crucial to finding strongly connected components. Even if a vertex is disconnected, a search must be performed to obtain its finishing time.

disconnected graph

disconnected_graph = { a => [b], b => [c], c => [a], d => [e], e => [d] }

Here, vertices d and e are not reachable from a, b, or c, so the graph’s finishing times when searching from vertex a are:

puts dfs(disconnected_graph, a) == [a,b,c]
finishing_time == { a => 3, b => 2, c => 1 }

For Kosaraju’s algorithm to work, the finishing times for all the vertices are needed. DFS must also be performed from d to get finishing times for d and e:

puts dfs(disconnected_graph, d) == [d,e]
finishing_time == { a => 3, b => 2, c => 1, d => 5, e => 4 }
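
A sketch of how the whole-graph finishing times could be computed: loop over every vertex and share the visited/finished state, so disconnected pieces like d and e still get covered. The finishing_times_for_all name reuses my earlier hypothetical dfs_with_finishing_times helper and is not from the original post.

# Compute finishing times for every vertex, even in disconnected graphs,
# by restarting DFS from each vertex that has not been visited yet.
def finishing_times_for_all(graph)
  visited = []
  finished = {}
  graph.keys.each do |vertex|
    dfs_with_finishing_times(graph, vertex, visited, finished)
  end
  finished
end

disconnected_graph = { a: [:b], b: [:c], c: [:a], d: [:e], e: [:d] }
puts finishing_times_for_all(disconnected_graph) ==
     { c: 1, b: 2, a: 3, e: 4, d: 5 }   # true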

I spent a lot of time describing the finishing time for a graph (much more than DFS, which is the main driver of this algorithm). I got really caught up implementing Kosaraju’s algorithm because I didn’t have a good grasp of the finishing time and found very few resources on it; the ones I did find helped me understand the concept better.

Transpose graph

The transpose of a directed graph is just reversing all the edges of the graph.

Going back to our simple graph:

simple cyclic graph

simple_graph = { a => [b], b => [c], c => [a] }

I work out the transpose of the simple graph by hand and get:

transpose of simple graph

simple_graph_transposed = { a => [c], b => [a], c => [b] }

So, the test would look like:

puts transpose_graph(simple_graph) == simple_graph_transposed
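
A sketch of what transpose_graph could look like under the same assumptions (symbol vertices, hash-of-arrays graph): every edge u -> v becomes v -> u.

# Reverse every edge of the graph.
def transpose_graph(graph)
  transposed = Hash.new { |hash, key| hash[key] = [] }
  graph.each do |vertex, neighbors|
    transposed[vertex]                                   # keep isolated vertices
    neighbors.each { |neighbor| transposed[neighbor] << vertex }
  end
  transposed
end

simple_graph = { a: [:b], b: [:c], c: [:a] }
puts transpose_graph(simple_graph) == { a: [:c], b: [:a], c: [:b] }   # true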

Transposing a graph is only used once in Kosaraju’s algorithm, but it is an essential element of it. It is also one of the more straightforward parts of the algorithm.

DFS on transposed graph

The last part of Kosaraju’s algorithm combines all the elements used so far:

  • depth first search
  • finishing time
  • transpose graph

and apply these elements in reverse order:

  • transpose the graph
  • order the vertices from highest to lowest finishing time
  • perform depth first search on the transposed graph in that order

What will happen is magical: depth first search is performed again, but each pass starts inside a strongly connected component it cannot escape from. When a pass runs out of vertices, one strongly connected component has been found, and the unvisited vertex with the next highest finishing time starts the next component. The groups of vertices visited by each pass of DFS on the transposed graph are the strongly connected components of the graph.

more complicated cyclic graph

complicated_graph = { a => [b],
                      b => [c,d],
                      c => [a],
                      d => [e],
e => [b] }

puts kosaraju(complicated_graph) == [5]
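
Putting it all together, here is how a kosaraju driver could look, reusing the hypothetical sketches from earlier (dfs, finishing_times_for_all, transpose_graph). It returns the size of each strongly connected component, which is one way to read the [5] expectation above; the post’s actual implementation may differ.

# Kosaraju's algorithm, sketched from the earlier pieces:
# 1. record finishing times with DFS on the original graph,
# 2. transpose the graph,
# 3. DFS on the transpose, taking vertices from highest to lowest
#    finishing time; each pass discovers one strongly connected component.
def kosaraju(graph)
  finishing_times = finishing_times_for_all(graph)
  transposed = transpose_graph(graph)

  order = finishing_times.keys.sort_by { |vertex| -finishing_times[vertex] }
  visited = []
  component_sizes = []
  order.each do |vertex|
    next if visited.include?(vertex)
    before = visited.size
    dfs(transposed, vertex, visited)            # one pass of the second DFS
    component_sizes << visited.size - before    # vertices found in this pass
  end
  component_sizes
end

complicated_graph = { a: [:b], b: [:c, :d], c: [:a], d: [:e], e: [:b] }
puts kosaraju(complicated_graph) == [5]   # true: one component with all 5 vertices

disconnected_graph = { a: [:b], b: [:c], c: [:a], d: [:e], e: [:d] }
puts kosaraju(disconnected_graph).sort == [2, 3]   # true: two components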

Conclusion

Whew. This was one of the most challenging algorithms for me. Challenging because it’s so elegant and I underestimated its complexity.

This algorithm uses depth first search heavily; basically, it’s DFS with accounting.

For me, the trip-up was understanding finishing times, which are a crucial part of this algorithm.