Big O notation
Introduction and notation
In the notebook on Taylor series we arrived at the infinite Taylor series expansion about (or around) the point \(x_0\):

\[ f(x) = f(x_0) + (x - x_0)f'(x_0) + \frac{(x - x_0)^2}{2!}f''(x_0) + \frac{(x - x_0)^3}{3!}f'''(x_0) + \cdots \]

and we pointed out that an equivalent way of writing this expansion is (with \(h = x - x_0\))

\[ f(x_0 + h) = f(x_0) + hf'(x_0) + \frac{h^2}{2!}f''(x_0) + \frac{h^3}{3!}f'''(x_0) + \cdots, \]

where we could equally replace \(h\) by the notation \(\Delta x\) or \(\delta x\).
When talking about the terms of infinite series (terms we will often be forced to truncate, i.e. throw away), or about errors, convergence, complexity, run times etc., so-called Big-O notation is very useful.
What this means is that mathematically we have some notation that allows us to write these infinite expansions in forms such as

\[ f(x_0 + h) = f(x_0) + hf'(x_0) + \frac{h^2}{2!}f''(x_0) + \frac{h^3}{3!}f'''(x_0) + \mathcal{O}(h^4) \]

or

\[ f(x_0 + h) = f(x_0) + hf'(x_0) + \mathcal{O}(h^2). \]
But what does this mean?
Mathematical explanation
Well, simply, \(\mathcal{O}\) here means “order”.
More formally it is used to signify (place bounds on) the limiting behaviour of the magnitude of a mathematical term (or an algorithm’s runtime).
So based on the example above we have simply replaced the terms

\[ \frac{h^4}{4!}f^{(4)}(x_0) + \frac{h^5}{5!}f^{(5)}(x_0) + \cdots \]

with

\[ \mathcal{O}(h^4) \]

“in the limit” (and implicitly for this application we are interested in the limit as \(h\rightarrow 0\)),

i.e. we have written that

\[ \frac{h^4}{4!}f^{(4)}(x_0) + \frac{h^5}{5!}f^{(5)}(x_0) + \cdots = \mathcal{O}(h^4) \quad \text{as} \quad h \rightarrow 0. \]

This states that as \(h\) tends to zero the LHS can be bounded in magnitude by a term of the form

\[ Ch^4, \]

where \(C\) is a constant, i.e. there exists a constant \(C\) such that for all sufficiently small \(h\)

\[ \left| \frac{h^4}{4!}f^{(4)}(x_0) + \frac{h^5}{5!}f^{(5)}(x_0) + \cdots \right| \le Ch^4. \]
In our case the point is that, assuming \(f\) is a well-behaved function, i.e. its derivatives are bounded, there will be a small enough \(h\) such that the terms dependent on the 5th and higher powers of \(h\) are very small relative to the first term

\[ \frac{h^4}{4!}f^{(4)}(x_0). \]

They can’t be ignored entirely, but we can select a \(C\) which is a bit larger than the original factor \(f^{(4)}(x_0)/4!\) so that the above bound holds.

The point here isn’t to actually find \(C\); this is just notation to help convey a point.
It’s important when analysing errors in algorithms.
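To make this concrete, here is a minimal numerical check, assuming the illustrative choices \(f(x) = \sin(x)\) and \(x_0 = 1\) (these are not from the discussion above, just convenient): if we truncate the expansion after the \(h^3\) term, then \(|\text{remainder}|/h^4\) should settle towards a constant (a valid choice of \(C\)) as \(h\) shrinks.

```python
import numpy as np

# illustrative (hypothetical) choices: f = sin and x0 = 1
x0 = 1.0

def taylor3(h):
    """Taylor expansion of sin about x0, truncated after the h**3 term."""
    return (np.sin(x0) + h * np.cos(x0)
            - h**2 / 2.0 * np.sin(x0)
            - h**3 / 6.0 * np.cos(x0))

for h in [1e-1, 1e-2, 1e-3]:
    remainder = np.sin(x0 + h) - taylor3(h)
    # |remainder| / h^4 should approach |sin(x0)| / 4! ~= 0.0351, a valid C
    print(f"h = {h:.0e}: |remainder|/h^4 = {abs(remainder) / h**4:.5f}")
```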
Note that for this case we were considering the limit of small \(h\), and so the lowest power eventually becomes dominant; the opposite situation would occur if we were considering the limit of large \(h\).
With errors we are generally thinking about the former case; with run times (where \(h\) might be replaced with some measure of the size of the problem) we are in the latter case.
Example
Consider

\[ f(y) = y^2 + y^3 \]

in the limit as \(y\rightarrow 0\).

We can show that

\[ |y^2 + y^3| \le 2|y|^2 \quad \text{for all sufficiently small } y, \]

since \(|y^3| \le |y^2|\) as soon as \(|y| \le 1\), and we are considering the case where \(y\) gets small. Hence we can write

\[ f(y) = \mathcal{O}(y^2) \quad \text{as} \quad y \rightarrow 0. \]

If we were considering the case of \(y\rightarrow \infty\), then we would write instead

\[ f(y) = \mathcal{O}(y^3) \quad \text{as} \quad y \rightarrow \infty. \]
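A quick numerical sanity check of this example (a minimal sketch using the \(f(y) = y^2 + y^3\) above): the ratio \(f(y)/y^2\) tends to 1 as \(y\) shrinks, while \(f(y)/y^3\) tends to 1 as \(y\) grows.

```python
# f(y) = y**2 + y**3: y**2 dominates as y -> 0, y**3 dominates as y -> infinity
f = lambda y: y**2 + y**3

for y in [1e-1, 1e-3, 1e-5]:
    print(f"y = {y:.0e}: f(y)/y^2 = {f(y) / y**2:.6f}")

for y in [1e1, 1e3, 1e5]:
    print(f"y = {y:.0e}: f(y)/y^3 = {f(y) / y**3:.6f}")
```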
When dealing with numerical errors you may see things like

\[ \text{error} = C_1\,\Delta t + C_2\,\Delta t^2, \quad \text{where} \quad C_2 \gg C_1. \]
For larger values of \(\Delta t\) the second term will clearly dominate due to the relative size of the constant in front of it compared to the first term.
But when performing a convergence analysis, i.e. investigating how the error drops as \(\Delta t\) is reduced, there will come a point for small enough \(\Delta t\) where the first term starts to dominate, and from that point onwards the error will decay linearly rather than quadratically, i.e. halving \(\Delta t\) leads to a reduction in the error by a factor of 2 rather than 4.
So we would say that

\[ \text{error} = \mathcal{O}(\Delta t) \quad \text{as} \quad \Delta t \rightarrow 0, \]
even though at larger \(\Delta t\) values the error would be observed to decay quadratically as we reduce \(\Delta t\).
We use the expression “in the asymptotic limit” to refer to the region of parameter space where the leading-order behaviour dominates. For the above example, if we observe something that looks like second-order behaviour (when we expect first) we would explain this away by saying that we are not yet in the asymptotic limit. If we see something very close to first-order behaviour then we would say we are in the asymptotic limit.
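As an illustration, take the error model above with made-up constants \(C_1 = 1\) and \(C_2 = 100\) (chosen only for demonstration). Computing the observed order of convergence, \(\log_2\big(\text{error}(\Delta t)/\text{error}(\Delta t/2)\big)\), as \(\Delta t\) is repeatedly halved shows it starting near 2 and approaching 1 once we enter the asymptotic limit.

```python
import numpy as np

# hypothetical error model matching the discussion: error = C1*dt + C2*dt**2
C1, C2 = 1.0, 100.0                   # made-up constants, with C2 >> C1
error = lambda dt: C1 * dt + C2 * dt**2

dts = 0.1 / 2.0**np.arange(10)        # repeatedly halve dt, starting at 0.1
errs = error(dts)

# observed order of convergence under halving: log2(error(dt) / error(dt/2))
orders = np.log2(errs[:-1] / errs[1:])
for dt, p in zip(dts[:-1], orders):
    print(f"dt = {dt:.2e}: observed order = {p:.3f}")
```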
Use in Computer/Computational Science
For the cost of algorithms we use the notation in a similar manner.
An algorithm is said to have (time, algorithmic, or computational) complexity of \(\mathcal{O}(n^2)\), for example, if for large enough \(n\), where \(n\) is a measure of the problem size, the computational cost grows quadratically: for every doubling of the problem size the cost grows by a factor of 4.
When it comes to complexity it’s quite common to see things like \(\mathcal{O}(n \log n)\), i.e. the algorithm scales worse than linearly, but not as badly as quadratically.
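As a rough illustration (a minimal sketch; absolute timings will vary by machine and small-\(n\) timings are noisy), we can time an \(\mathcal{O}(n^2)\) double loop against Python’s built-in \(\mathcal{O}(n \log n)\) sort while doubling \(n\): the loop’s run time roughly quadruples per doubling, while the sort’s little more than doubles.

```python
import random
import time

def pairwise_sum(a):
    """O(n^2) work: loop over all pairs of elements."""
    total = 0.0
    for x in a:
        for y in a:
            total += x + y
    return total

for n in [1000, 2000, 4000]:
    a = [random.random() for _ in range(n)]

    t0 = time.perf_counter()
    pairwise_sum(a)
    t_quad = time.perf_counter() - t0

    t0 = time.perf_counter()
    sorted(a)                      # Timsort: O(n log n)
    t_sort = time.perf_counter() - t0

    print(f"n = {n}: O(n^2) loop {t_quad:.3f} s, sort {t_sort * 1e3:.3f} ms")
```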
For some examples see https://en.wikipedia.org/wiki/Computational_complexity_of_mathematical_operations