Definition
The Big-O notation is at its heart a mathematical notation, used to compare the rate of convergence of functions. Let n -> f(n) and n -> g(n) be functions defined over the natural numbers. Then we say that f = O(g) if and only if f(n)/g(n) is bounded as n approaches infinity. In other words, f = O(g) if and only if there exists a constant A such that, for all n, f(n)/g(n) <= A.
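Written out symbolically, this simplified definition reads:

```latex
f = O(g) \iff \exists A \in \mathbb{R},\ \forall n \in \mathbb{N},\ \frac{f(n)}{g(n)} \le A
```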
Actually the scope of the Big-O notation is a bit wider in mathematics, but for simplicity I have narrowed it to what is used in algorithm complexity analysis: functions defined on the naturals that take non-zero values, in the case of n growing to infinity.
What does it mean?
Let's take the case of f(n) = 100n^2 + 10n + 1 and g(n) = n^2. It is quite clear that both of these functions tend to infinity as n tends to infinity. But sometimes knowing the limit is not enough, and we also want to know the speed at which the functions approach it. Notions like Big-O help compare and classify functions by their speed of convergence.

Let's find out whether f = O(g) by applying the definition. We have f(n)/g(n) = 100 + 10/n + 1/n^2. Since 10/n is 10 when n is 1 and decreases from there, and since 1/n^2 is 1 when n is 1 and also decreases, we have f(n)/g(n) <= 100 + 10 + 1 = 111. The definition is satisfied because we have found a bound on f(n)/g(n) (namely 111), and so f = O(g) (we say that f is a Big-O of n^2).
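If you want to see the bound numerically, here is a small Python sketch (the function names f and g simply mirror the ones above); it shows the ratio starting at 111 for n = 1 and settling towards 100:

```python
# Numerical check of the bound f(n)/g(n) <= 111 from the argument above.
def f(n):
    return 100 * n**2 + 10 * n + 1

def g(n):
    return n**2

for n in [1, 2, 10, 100, 10**6]:
    print(f"n = {n:>8}: f(n)/g(n) = {f(n) / g(n):.6f}")
# The ratio is 111 at n = 1 and decreases towards 100: bounded, so f = O(g).
```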
This means that f tends to infinity at approximately the same speed as g. Now this may seem like a strange thing to say, because what we have found is that f is at most 111 times bigger than g, or in other words when g grows by 1, f grows by at most 111. Growing 111 times faster hardly sounds like "approximately the same speed", and indeed the Big-O notation is not a very precise way to classify convergence speed, which is why in mathematics we use asymptotic equivalence when we want a precise estimation of speed. But for the purpose of sorting algorithms into broad speed classes, Big-O is enough. We don't need to separate functions that grow a fixed number of times faster than each other, only functions that grow infinitely faster than each other. For instance, if we take h(n) = n^2 * log(n), we see that h(n)/g(n) = log(n), which tends to infinity with n, so h is not O(n^2): h grows infinitely faster than n^2.
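The same numerical check makes the contrast visible (again just an illustrative sketch; h(n)/g(n) simplifies to log(n)):

```python
import math

# h(n)/g(n) = (n^2 * log(n)) / n^2 = log(n): no constant A can bound it.
for n in [10, 10**3, 10**6, 10**12]:
    print(f"n = {n:>14}: h(n)/g(n) = {math.log(n):.3f}")
```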
Now I need to make a side note: you might have noticed that if f = O(g) and g = O(h), then f = O(h). For instance, in our case we also have f = O(n^3), f = O(n^4), and so on. In algorithm complexity analysis, we frequently say f = O(g) to mean that f = O(g) and g = O(f), which can be understood as "g is the smallest Big-O for f". In mathematics we say that such functions are Big-Thetas of each other.
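In symbols, this is the standard definition of Big-Theta, applied to our running example:

```latex
f = \Theta(g) \iff f = O(g) \ \text{and} \ g = O(f)
% e.g. 100n^2 + 10n + 1 = \Theta(n^2)
```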
How is it used?
When comparing algorithm performance, we are interested in the number of operations that an algorithm performs. This is called time complexity. In this model, we consider that each basic operation (addition, multiplication, comparison, assignment, etc.) takes a fixed amount of time, and we count the number of such operations. We can usually express this number as a function of the size of the input, which we call n. And sadly, this number usually grows to infinity with n (if it doesn't, we say that the algorithm is O(1)). We separate our algorithms into big speed classes defined by Big-O: when we speak about an "O(n^2) algorithm", we mean that the number of operations it performs, expressed as a function of n, is O(n^2). This says that our algorithm is approximately as fast as an algorithm doing a number of operations equal to the square of the size of its input, or faster. The "or faster" part is there because I used Big-O instead of Big-Theta, but usually people say Big-O when they mean Big-Theta.
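To make this concrete, here is a toy example of mine (the duplicate-checking task and the function name are just an illustration): it performs one comparison for every pair of elements, so the operation count grows like n^2:

```python
def has_duplicate(items):
    """Compare every pair of elements: a classic O(n^2) algorithm."""
    comparisons = 0
    n = len(items)
    found = False
    for i in range(n):
        for j in range(i + 1, n):
            comparisons += 1  # one basic operation per pair of elements
            if items[i] == items[j]:
                found = True
    print(f"n = {n:>4}: {comparisons} comparisons")  # about n^2 / 2, i.e. O(n^2)
    return found

for size in (10, 100, 1000):
    has_duplicate(list(range(size)))
```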
When counting operations, we usually consider the worst case: for instance, if we have a loop that can run at most n times and contains 5 operations, we count 5n operations. It is also possible to consider average-case complexity.
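Linear search illustrates this convention (another illustrative sketch): the loop may exit early, but for the worst case we assume it runs all n times:

```python
def linear_search(items, target):
    # Worst case (target absent or in last position): the loop body runs n times.
    for index, value in enumerate(items):
        if value == target:
            return index  # best case: found on the first iteration
    return -1  # worst case reached: n iterations, so the search is O(n)
```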
Quick note: a fast algorithm is one that performs few operations, so if the number of operations grows to infinity faster, the algorithm is slower. O(n) is better than O(n^2).
We are also sometimes interested in the space complexity of our algorithm. For this we consider the number of bytes of memory the algorithm uses as a function of the size of the input, and use Big-O in the same way.
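For instance (a toy sketch, with function names of my own choosing): summing a list needs only O(1) extra memory, while building a reversed copy needs O(n):

```python
def total(items):
    # O(1) space: a single accumulator, no matter how long the input is.
    acc = 0
    for x in items:
        acc += x
    return acc

def reversed_copy(items):
    # O(n) space: the result list grows linearly with the input.
    return items[::-1]
```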