Greedy Algorithms#
Greedy Algorithms are one of the paradigms of algorithmic problem-solving. All greedy algorithms share the assumption that locally optimal steps lead to the globally optimal solution. Quite surprisingly, this is often a valid assertion.
If you want to have a good grasp of the topic, please have a look at the Dynamic Programming section beforehand.
Basic Examples#
Greedy Scheduling: We are given a list timeslots with each element being a tuple \((s,f)\), where \(s\) is the starting time of the event and \(f\) is its finish time. We need to choose a subset of non-overlapping timeslots that has the most elements. Return the number of elements in that subset.
First, let us assume that timeslots is sorted by \(f\) (non-decreasing).
Let us show that the problem has an optimal substructure. Let \(S_{ij}\) denote the set of events which start after event \(a_i\) finishes and finish before event \(a_j\) starts, and let \(A_{ij}\) be an optimal subset of \(S_{ij}\). For any \(a_k \in A_{ij}\) (so \(i < k < j\)) we can see that \(A_{ij} = A_{ik} + a_k + A_{kj}\). Now \(A_{ik}\) is a solution for \(S_{ik}\) and \(A_{kj}\) solves \(S_{kj}\). These are two independent subproblems, so the problem has an optimal substructure.
You might now think about how, based on this information, one could implement a Dynamic Programming solution (a sketch is given after the greedy code below). However, let us jump straight into the greedy approach:
Our hypothesis is to always choose the task that ends the earliest. We will show that (some optimal) \(A_{ij}\) always contains the earliest ending task \(a_m\) of \(S_{ij}\). Then, inductively, we know that this leads to a globally optimal arrangement.
Proof: Let \(a_n\) be the earliest ending task in \(A_{ij}\) (it does not have to be the same as \(a_m\)). If \(a_n = a_m\), we are done. Otherwise we can swap \(a_n\) for \(a_m\): since \(a_m\) finishes no later than \(a_n\), it does not overlap any other task in \(A_{ij}\), so the set stays valid. As \(A_{ij}\) was optimal and the swap does not change its size, the new set is still optimal.
Have a look at the following code:
from random import randint

def greedyScheduling(timeslots):
    timeslots.sort(key=lambda x: x[1])
    # edge case
    if not len(timeslots):
        return 0
    # setting up the initial vars
    curr = timeslots[0]
    res = 1
    for i in range(1, len(timeslots)):
        if curr[1] <= timeslots[i][0]:
            curr = timeslots[i]
            res += 1
    return res
# brute force for testing
def bruteForce(timeslots):
    res = 0
    # all combinations of the elements
    for i in range(2**len(timeslots)):
        temp = 0
        already_chosen = []
        for j in range(len(timeslots)):
            if (i >> j) & 1:
                s, e = timeslots[j]
                temp += 1
                already_chosen.append(timeslots[j])
                # checking if a valid combination of timeslots
                for k in range(len(already_chosen)-1):
                    if not (s >= already_chosen[k][1] or e <= already_chosen[k][0]):
                        temp = 0
                        break
        res = max(temp, res)
    return res
# testing
for _ in range(20):
    timeslots = []
    for _ in range(12):
        s = randint(0, 99)
        e = randint(s+1, 100)
        timeslots.append((s, e))
    br = bruteForce(timeslots)
    gr = greedyScheduling(timeslots)
    assert gr == br
The time complexity is \(O(n\log(n))\), but this is only due to sorting the events; if the events were already sorted, it would be \(O(n)\).
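For comparison, here is a minimal Dynamic Programming sketch based on the optimal substructure described above. The function name dpScheduling and the use of bisect are our own illustrative choices, not part of the greedy solution.
from bisect import bisect_right

# dp[i] = the best count achievable using only the first i events (sorted by finish time)
def dpScheduling(timeslots):
    timeslots.sort(key=lambda x: x[1])
    n = len(timeslots)
    finishes = [f for (_, f) in timeslots]
    dp = [0] * (n + 1)
    for i in range(1, n + 1):
        s = timeslots[i - 1][0]
        # number of earlier events finishing no later than s,
        # i.e. the events compatible with event i
        j = bisect_right(finishes, s, 0, i - 1)
        # either skip event i or take it on top of the best compatible prefix
        dp[i] = max(dp[i - 1], dp[j] + 1)
    return dp[n]

# it agrees with the greedy answer, e.g.
# assert dpScheduling(list(timeslots)) == greedyScheduling(list(timeslots))
This also runs in \(O(n\log(n))\) because of the sorting and the binary searches, so for this problem the greedy version is simply the leaner choice.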
Cover a set of points with given-length intervals: You are given a list points of \(x\)-coordinates of points and l, the length of an interval. How many intervals do we need to cover all the points?
Let \(A_{ij}\) denote an optimal set of intervals that covers the points. Take an element \((s_k, e_k)\) from this set; then \(A_{ij} = A_{ik} + a_k + A_{kj}\). A similar analysis to the previous example leads to the conclusion that the problem has an optimal substructure.
Now let us show that greedy choices lead to the optimal solution:
Let us sort the points in non-decreasing order. We place an interval starting at the first uncovered point (it can then cover other points as well) and keep inserting intervals this way until every point is covered.
Now, consider an optimal solution and its first (leftmost) interval, which must cover the first point. If it does not start at the first point, we can shift it right so that it does: every point it covered is still covered, so the solution remains valid and optimal. Done!
Here is the code implementation:
def intervals(points, l):
    # sort the points
    points.sort()
    if not len(points):
        return 0
    # set the initial vars
    res = 1
    border = points[0] + l
    # go through the list
    for i in range(1, len(points)):
        if points[i] > border:
            res += 1
            border = points[i] + l
    return res
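As a quick sanity check, here is a small usage example (the input values are our own illustration):
# three intervals of length 2 suffice, e.g. [1,3], [10,12] and [25,27]
print(intervals([1, 2, 3, 10, 11, 25], 2))
3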
Similarly, depending on whether points is already sorted, the complexity is \(O(n\log(n))\) or \(O(n)\).
Conditions needed for Greedy Solution#
There are two conditions which need to be met for the greedy solution to work:
Optimal substructure: the problem should be expressed in terms of independent subproblems, just like in Dynamic Programming!
Greedy choices lead to optimal solution: this is usually shown by a copy-and-paste (exchange) argument, which proves that there exists an optimal solution containing the proposed greedy choice. The argument has an inductive flavour: the greedy choice leaves us with a smaller instance of the same problem. It often helps to assume that the input is sorted in some way!
Advanced Examples#
Here are several examples which might be a little more involved:
Fractional Knapsack: You are given two lists vals and weights as well as an integer W, the total weight capacity of a backpack. You are allowed to take any fraction of any item in the list, so that the sum of the taken weights is no bigger than W. All numbers in the input are positive. Return the maximum possible value.
Optimal substructure: Let \(val(i, w_j, w)\) denote the maximum value we can get by taking weight \(w_j \leq w_i\) of item \(i\) (where \(w_i\) is the weight of the item), considering only the items up to the \(i\)th, with \(w\) being the capacity left in the knapsack. The solution uses the following recursion:
\(val(i, w_j, w) = \frac{v_i}{w_i} w_j + \max_{0 \leq w_k \leq \min(w_{i-1},\, w - w_j)} val(i-1, w_k, w - w_j)\)
Greedy choices lead to optimal solution: We will be taking the available items of the greatest density \(\frac{v_i}{w_i}\) first.
Imagine the sequence of weights \(w_j\) taken in some optimal solution. Take an item \(k\) such that \(\frac{v_k}{w_k}\) is no smaller than any such ratio in the sequence. If this item is already in the sequence, check whether the weight taken from it is the maximum possible; if not, we could take more of it in place of lower-density items, which can only improve the result or keep it equal, and since the solution is optimal it must stay equal. If the item is not in the sequence, we can use it to replace (part of) other items, and again the result can only improve or stay the same, so by optimality it stays the same. Therefore such a replacement results in an optimal solution:
# vals and weights have positive values only
# W is positive as well
def fractionalKnapsack(vals, weights, W):
    assert len(vals) and len(weights) and W and len(vals) == len(weights)
    wDensity = list(zip(weights, vals))
    # sort by density, greatest first
    wDensity.sort(key=lambda x: x[1] / x[0], reverse=True)
    # set up the initial values
    i = 0
    res = 0
    while W > 0 and i < len(wDensity):
        if W >= wDensity[i][0]:
            # take the whole item
            res += wDensity[i][1]
            W -= wDensity[i][0]
        else:
            # take the fraction that still fits
            res += W * wDensity[i][1] / wDensity[i][0]
            W = 0
        i += 1
    return res

print(fractionalKnapsack([1,2,3], [4,5,6], 12))
5.25
Coin Changing: We are given an integer list m of coin denominations, with each entry in cents/pence. For the British system it would be [1,2,5,10,20,50,100,200]. We are also given W, the sum we would like to pay out, and an infinite supply of every coin. Return the minimum number of coins needed to make up this amount.
This problem can be solved using Dynamic Programming, but for many cases (called canonical coin systems) the greedy approach works as well. Many modern coin systems, such as EUR, GBP or USD, are canonical.
Optimal substructure: Try to imagine the following: each solution to the problem for a given amount of money \(W\) can be represented as a sorted (non-increasing) sequence of coin denominations which sums up to \(W\). Now let \(coins(n, w)\) denote the minimum number of coins needed to return \(w\) using the first \(n\) coin denominations. Optimal substructure is exhibited in the following equation:
\(coins(n, w) = \min\big(coins(n-1, w),\ coins(n, w - c_n) + 1\big)\)
where \(c_n\) is the value of the \(n\)th coin.
Greedy approach: We keep supplying the coin of the greatest denomination for as long as possible (i.e. while the amount left remains non-negative) and then continue with smaller denominations until completion. The proof that this is optimal is beyond the scope of this course. However, you might think of the simplified case where the coin denominations are consecutive powers of a positive integer \(c\) greater than \(1\), i.e. \(c^0, c^1, c^2, c^3, \dots\). The proof is a nice challenge in this case. Feel free to give it a go!
# we assume that 1 is in m, so the algorithm can always terminate
# W should be positive
def coinChangingGreedy(m, W):
    assert len(m) and 1 in m and W > 0
    # sort non-increasing
    m.sort()
    m = m[::-1]
    # set up the vars
    res = 0
    i = 0
    while W > 0:
        if W >= m[i]:
            W -= m[i]
            res += 1
        else:
            i += 1
    return res
print(coinChangingGreedy([1,2,5,10,20],65))
4
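For comparison, here is a minimal bottom-up Dynamic Programming sketch in the spirit of the recursion above; the function name coinChangingDP and the non-canonical example system [1,3,4] are our own illustration.
# DP works for non-canonical systems as well, where the greedy choice can fail
def coinChangingDP(m, W):
    assert len(m) and 1 in m and W >= 0
    # dp[w] = minimum number of coins needed to pay out w
    dp = [0] + [float('inf')] * W
    for w in range(1, W + 1):
        for c in m:
            if c <= w:
                dp[w] = min(dp[w], dp[w - c] + 1)
    return dp[W]

# [1,3,4] is not canonical: greedy pays 4+1+1 (3 coins), DP finds 3+3 (2 coins)
print(coinChangingGreedy([1, 3, 4], 6), coinChangingDP([1, 3, 4], 6))
3 2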
Exercises#
Mice to Holes: There are N mice and N holes on a straight line. You are given a list mice with the mice's coordinates and a list holes with the holes' coordinates; each hole can hold exactly one mouse. A mouse can move 1 unit to the left or right per time step. Calculate the minimum number of time steps needed for all mice to reach a hole.
HINT: You want to sort both lists.
Answer
The greedy approach pairs the holes with the mice by their order in the sorted arrays. Let \(m_1 < m_2\) and \(h_1 < h_2\). We will look at the longest mouse-hole distance, as this will be our answer. Clearly:
\(\max(|m_1 - h_1|, |m_2 - h_2|) \leq \max(|m_1 - h_2|, |m_2 - h_1|)\)
as the larger of \(|m_1 - h_2|\) and \(|m_2 - h_1|\) is the longest of all four distances. We can therefore make any mouse-hole assignment no worse by a series of swaps until it is sorted. This leads to an optimal solution.
def miceToHoles(mice, holes):
    assert len(mice) and len(mice) == len(holes)
    mice.sort()
    holes.sort()
    # you can read about the python zip function
    # very useful functional utility
    return max([abs(m - h) for (m, h) in zip(mice, holes)])
# test on some cases!
print(miceToHoles([-1,7,8],[5,6,9]))
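In the same spirit as the brute-force check used for greedy scheduling, one could compare the sorted pairing against every possible assignment; the helper miceToHolesBrute below is our own illustrative test.
from itertools import permutations
from random import randint

# brute force over all hole assignments; only feasible for small N
def miceToHolesBrute(mice, holes):
    return min(max(abs(m - h) for (m, h) in zip(mice, perm))
               for perm in permutations(holes))

for _ in range(20):
    mice = [randint(-50, 50) for _ in range(6)]
    holes = [randint(-50, 50) for _ in range(6)]
    assert miceToHoles(list(mice), list(holes)) == miceToHolesBrute(mice, holes)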
Hotel booking: You are given a list arrivals and a list departures with the arrival and departure times of guests in a hotel. The hotel has K rooms. Decide if the hotel can accommodate all bookings.
Answer
We mark each arrival and departure, merge the two lists, and sort the result. We then sweep through the merged list keeping a counter of how many rooms are occupied: the counter increases on an arrival and decreases on a departure. If the counter ever exceeds K, we return False.
def hotelBooking(arrivals, departures, K):
    # marking the lists
    arrivals = [(x, 1) for x in arrivals]
    departures = [(x, -1) for x in departures]
    moves = arrivals + departures
    moves.sort()
    counter = 0
    for i in moves:
        counter += i[1]
        if counter > K:
            return False
    return True
print(hotelBooking([1,3,5],[2,6,8],1))
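As an extra illustrative check of our own, with one more room the same bookings fit:
print(hotelBooking([1,3,5], [2,6,8], 2))  # prints True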
References#
Cormen, Leiserson, Rivest, Stein, “Introduction to Algorithms”, Third Edition, MIT Press, 2009
GeeksforGeeks, Assign Mice to Holes
Sebastian Raschka, GitHub, Introduction to Greedy Algorithms