{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "(Graph_algorithms)=\n",
    "# Graph Algorithms\n",
    "``` {index} Graph Algorithms\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Graph\n",
    "``` {index} Graph\n",
    "```\n",
    "In broad terms. A graph $(V,E)$ is an abstract structure consisting of nodes (*vertices*) and edges connecting some of them. Graphs can be directed or undirected which means that their edges can be traversed in one or both directions. Graphs can be unweighted or not, their edges can have assigned weights. There are several ways of representing graphs in computers, two of which are:\n",
    "\n",
    "* Adjecency matrix $A$: Which are matrices consisting of $1$ and $0$. A $1$ at a coordinates $(i,j)$ means that there is an edge from node $i$ to node $j$. For undirected graphs $A[i,j]$ = $A[j,i]$. Adjacency matricex are a compact way of representing *dense* graphs where the number of edges is close to $|V|^2$. \n",
    "\n",
    "* Adjacency list $L$: which is a 2D array where an entry $L[i]$ is a list with the ids of nodes to which $i$ is connected to. It is a space-efficient representation for *sparse* graphs where not many nodes are connected.\n",
    "\n",
    "```{figure} algo_images/Graphs.png\n",
    ":width: 60%\n",
    "```\n",
    "\n",
    "Depending on the algorithm we will use one of those representations."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Graph traversal\n",
    "``` {index} Graph traversal\n",
    "```\n",
    "*Traversing* a graph means visiting all if its nodes in some order. There are two basic algorithms to which achieve that goal.\n",
    "\n",
    "\n",
    "### Breadth-first search (BFS)\n",
    "\n",
    "The name of this graph traversal stems from the way it visits the nodes. It starts at some arbitrarily chosen node and then visits all the nodes that are distance 1 away from it, then distance 2 etc. BFS will not visit a node that is distance $d+1$ until it visits all nodes at a distance $d$. When two nodes are equal distance from the first node, the choice is also arbitrary (we will go for the numerical order).\n",
    "\n",
    "<img src=\"https://upload.wikimedia.org/wikipedia/commons/4/46/Animated_BFS.gif\" width=\"40%\">\n",
    "\n",
    "Source: [Wikimedia Commons](https://upload.wikimedia.org/wikipedia/commons/4/46/Animated_BFS.gif)\n",
    "\n",
    "We will divide the nodes of the graph into three kinds: white, grey and black. White nodes are unvisited, grey are being considered and black are already visited. Grey nodes are a queue, first in last out.\n",
    "In words, the algorithm will work as follows:\n",
    "\n",
    "1) Add all nodes to the white list.\n",
    "\n",
    "2) Choose an arbitrary white node, put it in the grey queue.\n",
    "\n",
    "3) Consider all nodes connected to the node, that are white, add them to the grey queue one by one.\n",
    "\n",
    "4) Move the node considered to the black list.\n",
    "\n",
    "5) If the grey queue is not empty, take another grey node and come back to the 3rd step.\n",
    "\n",
    "6) If the white list is not empty, come back to the 2nd step, otherwise terminate.\n",
    "\n",
    "The following implementations print the node id as they are added to the black list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0\n",
      "2\n",
      "3\n",
      "1\n",
      "\n",
      "0\n",
      "2\n",
      "3\n",
      "1\n",
      "\n",
      "0\n",
      "2\n",
      "3\n",
      "1\n",
      "\n",
      "0\n",
      "2\n",
      "3\n",
      "1\n"
     ]
    }
   ],
   "source": [
    "from queue import Queue\n",
    "# adjacency matrix implementation\n",
    "# indexing from 0, different from the diagram\n",
    "def matrixBFS(graph):\n",
    "    assert len(graph) and len(graph) == len(graph[0])\n",
    "    white = list(range(len(graph)))\n",
    "    grey = Queue()\n",
    "    black = []\n",
    "    \n",
    "    # initalise the queue\n",
    "    grey.put(0)\n",
    "    white.remove(0)\n",
    "    while True:\n",
    "        # consider a node\n",
    "        curr = grey.get()\n",
    "        black.append(curr)\n",
    "        # consider the nodes connected to the current node\n",
    "        for i in range(len(graph[curr])):\n",
    "            # add to grey if not previously seen\n",
    "            if graph[curr][i] and i in white:\n",
    "                grey.put(i)\n",
    "                white.remove(i)\n",
    "        print(curr)\n",
    "        # check if all components if the graph is no connected\n",
    "        # if all nodes checked terminate\n",
    "        if grey.empty():\n",
    "            for n in list(range(len(graph))):\n",
    "                if n not in black:\n",
    "                    grey.put(n)\n",
    "                    white.remove(n)\n",
    "                    continue\n",
    "            if grey.empty():\n",
    "                break\n",
    "\n",
    "def listBFS(graph):\n",
    "    assert len(graph)\n",
    "    white = list(range(len(graph)))\n",
    "    grey = Queue()\n",
    "    black = []\n",
    "    \n",
    "    grey.put(0)\n",
    "    white.remove(0)\n",
    "    while True:\n",
    "        curr = grey.get()\n",
    "        black.append(curr)\n",
    "        for i in graph[curr]:\n",
    "            if i in white:\n",
    "                grey.put(i)\n",
    "                white.remove(i)\n",
    "        print(curr)\n",
    "        if grey.empty():\n",
    "            for n in list(range(len(graph))):\n",
    "                if n not in black:\n",
    "                    grey.put(n)\n",
    "                    white.remove(n)\n",
    "                    continue\n",
    "            if grey.empty():\n",
    "                break\n",
    "# Undirected            \n",
    "# 0   1\n",
    "# | \\\n",
    "# 2 - 3\n",
    "graph = [[0,0,1,1],[0,0,0,0],[1,0,0,1],[1,0,1,0]] \n",
    "matrixBFS(graph)\n",
    "print(\"\")\n",
    "graph = [[2,3],[0,3],[0,2],[]]\n",
    "listBFS(graph)\n",
    "print(\"\")\n",
    "# Directed\n",
    "# 0    1\n",
    "# /\\ \\  /\\\n",
    "# ||  \\ |\n",
    "# \\/  \\/| \n",
    "# 2     3\n",
    "graph = [[0,0,1,1],[0,0,0,0],[1,0,0,0],[0,1,0,0]] \n",
    "matrixBFS(graph)\n",
    "print(\"\")\n",
    "graph = [[2,3],[],[0],[1]]\n",
    "listBFS(graph)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are many uses of the BFS algorithm, some of which are:\n",
    "\n",
    "1) Finding a MST (see below) in an unweighted graph.\n",
    "\n",
    "2) Fining a minimum path in an unweighted graph (initial node has to be at the end of the searched path).\n",
    "\n",
    "3) Detecting cycles in graphs (by encountering a visited node).\n",
    "\n",
    "4) Finding all nodes within a connected component."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Depth-first search (DFS)\n",
    "```{index} Depth-first search (DFS)\n",
    "```\n",
    "In DFS we will always go *deeper* into the graph whenever possible. We start at an arbitrary node and then go to the first possible via an edge. This is repeated until we reach a node which has only edges to already visited nodes (or no edges). We then trace back to the previous node in this procedure. We continue until all nodes are visited.\n",
    "\n",
    "<img src=\"https://upload.wikimedia.org/wikipedia/commons/7/7f/Depth-First-Search.gif\" width=\"40%\">\n",
    "\n",
    "Source: [Wikimedia Commons](https://upload.wikimedia.org/wikipedia/commons/7/7f/Depth-First-Search.gif)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "3\n",
      "2\n",
      "0\n",
      "1\n",
      "\n",
      "2\n",
      "1\n",
      "3\n",
      "0\n"
     ]
    }
   ],
   "source": [
    "# DFS for matrix representation\n",
    "def matrixDFS(graph):\n",
    "    assert len(graph) and len(graph) == len(graph[0])\n",
    "    # initalise\n",
    "    white = list(range(len(graph)))\n",
    "    black = []\n",
    "    while white:\n",
    "        root = white[0]\n",
    "        white.remove(root)\n",
    "        black.append(root)\n",
    "        # recursive helper function\n",
    "        DFShelper(graph,root,black,white)\n",
    "\n",
    "def DFShelper(graph, root, black, white):\n",
    "    for i in range(len(graph[root])):\n",
    "        if graph[root][i] and i in white:\n",
    "            white.remove(i)\n",
    "            black.append(i)\n",
    "            DFShelper(graph, i, black, white)\n",
    "    print(root)\n",
    "            \n",
    "# Undirected            \n",
    "# 0   1\n",
    "# | \\\n",
    "# 2 - 3\n",
    "graph = [[0,0,1,1],[0,0,0,0],[1,0,0,1],[1,0,1,0]] \n",
    "matrixDFS(graph)\n",
    "print(\"\")\n",
    "\n",
    "# Directed\n",
    "# 0    1\n",
    "# /\\ \\  /\\\n",
    "# ||  \\ |\n",
    "# \\/  \\/| \n",
    "# 2     3\n",
    "graph = [[0,0,1,1],[0,0,0,0],[1,0,0,0],[0,1,0,0]] \n",
    "matrixDFS(graph)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Time complexity of DFS is $O(|V|+|E|)$. It can be used in the following problems:\n",
    "\n",
    "1) Detecting cycles\n",
    "\n",
    "2) Augmented DFS can find the MST of a graph\n",
    "\n",
    "3) [Topological sorting](https://en.wikipedia.org/wiki/Topological_sorting)\n",
    "\n",
    "4) [Finding strongly connected components of a graph](https://en.wikipedia.org/wiki/Strongly_connected_component)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Minimum spanning tree (MST)\n",
    "```{index} Minimum spanning tree (MST)\n",
    "```\n",
    "A *spanning tree* of a graph $G$ is a subgraph of $G$ that includes all its nodes. The concept is meaningful for connected graphs only (if the graph is not connected we are considering its *spanning forest*). The problem of MST is considered for weighted graphs $(V,E$ with $w$ - a function that returns the weight of an edge in $E$. A *minimum spanning tree* is one in which the sum of weights of the edges is minimal. Surprisingly, the problem can be solved by using a *greedy* approach. Here are two possible approaches:\n",
    "\n",
    "### Kruskal's algorithm\n",
    "\n",
    "Kruskal's approach is based on the following procedure:\n",
    "\n",
    "1) Create a single element set for every node.\n",
    "\n",
    "2) Sort the edges in $E$ by weight in non-decreasing order, put them in a priority queue.\n",
    "\n",
    "3) Take the edge of least weight and if it connects elements from different sets, use it. Make the connected sets one (by setting the same group leader). \n",
    "\n",
    "`find(u)` method will find the leader of a group. `u,v` are in the same group if `find(u) == find(v)`.\n",
    "`union(u,v)` will join two sets. The leader will be the leader of the set of more elements, otherwise arbitrary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[0, 3, 1], [1, 5, 3], [0, 2, 4], [0, 1, 6], [0, 4, 7]]\n"
     ]
    }
   ],
   "source": [
    "# we will use a matrix representation as a list of edges [source, dest, weight]\n",
    "# we are also given the number of nodes\n",
    "# assuming undirected graph\n",
    "def mst_kruskal(edges,n):\n",
    "    sets = {}\n",
    "    mst =[]\n",
    "    for i in range(n):\n",
    "        # [leader, number of elems]\n",
    "        sets[i] = [i,1]\n",
    "    edges.sort(key = lambda x : x[2])\n",
    "    \n",
    "    for edge in edges:\n",
    "        if find(edge[0],sets) != find(edge[1],sets):\n",
    "            mst.append(edge)\n",
    "            if union(edge[0],edge[1],sets) == n:\n",
    "                return mst\n",
    "    \n",
    "    \n",
    "def find(u,sets):\n",
    "    # go up in parents\n",
    "    while u != sets[u][0]:\n",
    "        u = sets[u][0]\n",
    "    return u\n",
    "        \n",
    "def union(u,v,sets):\n",
    "    leader_u = sets[find(u,sets)]\n",
    "    leader_v = sets[find(v,sets)]\n",
    "    #compare which set is bigger\n",
    "    if leader_u[1] >= leader_v[1]:\n",
    "        leader_v[0] = leader_u[0]\n",
    "        leader_u[1] += leader_v[1]\n",
    "        return leader_u[1]\n",
    "    else:\n",
    "        leader_u[0] = leader_v[0]\n",
    "        leader_v[1] += leader_u[1]\n",
    "        return leader_v[1]\n",
    "    \n",
    "edges = [[0,1,6],[0,2,4],[1,5,3],[1,3,6],[0,4,7],[3,4,10],[0,3,1]]\n",
    "print(mst_kruskal(edges,6))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Kruskal's algorithm runs in $O(|E|\\log|E|)$ time, which is also $O(|E|\\log|V|)$ as $|E| \\leq |V|^2$. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Prim's algorithm\n",
    "\n",
    "Prim's algorithm is similar to Kruskal's but at each step, there is only one tree that grows. The idea is as follows:\n",
    "\n",
    "1) Start at an arbitrary node in $G$\n",
    "\n",
    "2) Take the minimum edge incident on one of the already considered nodes (that connects it to the node not previously seen). This way the tree expands.\n",
    "\n",
    "3) When all nodes are in the tree, terminate.\n",
    "\n",
    "Of course, the 2nd step may be a bit tricky to implement efficiently. We will use a *priority queue* from the `heapq` interface to efficiently choose the next edge to add. We also need to upgrade our adjacency list representation of the graph. Now it is a dictionary which maps to a dictionary with incident nodes as keys and their corresponding edge weights as values. So `graph[from][to]` gives the desired edge if it exists. It assumes that there is at most one edge between any two nodes (undirected graph), if this is violated, we would always choose the edge of minimum weight."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "defaultdict(<class 'set'>, {'A': {'B'}, 'B': {'D', 'C'}, 'D': {'E'}, 'E': {'F'}, 'F': {'G'}})\n"
     ]
    }
   ],
   "source": [
    "from collections import defaultdict\n",
    "import heapq\n",
    "\n",
    "# we will work on an adjacency list representation\n",
    "def mst_prim(graph, starting_vertex):\n",
    "    # defaultdict returns set() in case the key is not in the dict\n",
    "    # this is an adjacency list representation\n",
    "    mst = defaultdict(set)\n",
    "    # at the beginning only the starting_node is visited\n",
    "    visited = set([starting_vertex])\n",
    "    # get the edges incident on the starting_vertex\n",
    "    edges = [\n",
    "        (cost, starting_vertex, to)\n",
    "        for to, cost in graph[starting_vertex].items()\n",
    "    ]\n",
    "    # create a min-heap based on the cost of the esge\n",
    "    heapq.heapify(edges)\n",
    "    \n",
    "    while edges:\n",
    "        # take the minimum cost edge\n",
    "        cost, frm, to = heapq.heappop(edges)\n",
    "        if to not in visited:\n",
    "            # add the new node to the visited\n",
    "            visited.add(to)\n",
    "            # expand the tree\n",
    "            mst[frm].add(to)\n",
    "            # add the edges incident on the new node\n",
    "            # which lead to the not visited nodes\n",
    "            for to_next, cost in graph[to].items():\n",
    "                # add to the heap\n",
    "                if to_next not in visited:\n",
    "                    heapq.heappush(edges, (cost, to, to_next))\n",
    "\n",
    "    return mst\n",
    "\n",
    "example_graph = {\n",
    "    'A': {'B': 2, 'C': 3},\n",
    "    'B': {'A': 2, 'C': 1, 'D': 1, 'E': 4},\n",
    "    'C': {'A': 3, 'B': 1, 'F': 5},\n",
    "    'D': {'B': 1, 'E': 1},\n",
    "    'E': {'B': 4, 'D': 1, 'F': 1},\n",
    "    'F': {'C': 5, 'E': 1, 'G': 1},\n",
    "    'G': {'F': 1},\n",
    "}\n",
    "\n",
    "print(mst_prim(example_graph,'A'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The algorithm runs in $O(|E|\\log|V|)$ time. Why?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": [
     "remove-cell"
    ]
   },
   "source": [
    "## Shortest path\n",
    "\n",
    "### Dijkstra's algorithm\n",
    "\n",
    "### Bellman-Ford algorithm"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": [
     "remove_cell"
    ]
   },
   "source": [
    "## Floyd cycle detection algorithm"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": [
     "remove_cell"
    ]
   },
   "source": [
    "## Graph colouring"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": [
     "remove_cell"
    ]
   },
   "source": [
    "## Maximum flow "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## References\n",
    "* Cormen, Leiserson, Rivest, Stein, \"Introduction to Algorithms\", Third Edition, MIT Press, 2009\n",
    "* GeeksforGeeks, [Applications of Breadth First Traversal](https://www.geeksforgeeks.org/applications-of-breadth-first-traversal/?ref=rp)\n",
    "* Vijini Mallawaarachchi, Medium, Towards Data Science, [10 Graph Algorithms Visually Explained](https://towardsdatascience.com/10-graph-algorithms-visually-explained-e57faa1336f3), 2020\n",
    "* Brad Filed, Practical Algorithms and Data Structures, [Prim's Spanning Tree Algorithm](https://bradfieldcs.com/algos/graphs/prims-spanning-tree-algorithm/), 2020"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Tags",
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}