Algorithmic efficiency enables better software. The efficiency of the algorithms used in an application can impact its overall performance; hence the importance of measuring the performance and complexity of algorithms in a way that is accessible not only to mathematicians but to any software engineer who aims to excel at algorithm design.
Algorithm complexity measurement should be part of the design process. To measure how complex an algorithm is, and to spot the parts of its composition that cause problems in design and analysis, the software engineer needs an accessible way to understand the algorithm's complexity in simple terms and to come up with viable, efficient ways to simplify it where possible.
In this series on algorithm design, we will explore Big O notation as a comprehensive way to understand and avoid algorithm design errors that can impact the efficiency of our systems. We will cover the fundamentals of complexity analysis for constant, linear, and quadratic running times, and we will demonstrate, through worked examples, how to spot unnecessary complexity so that we can prevent it from appearing in our code.
Follow along to explore the world of Big O notation and start implementing the knowledge in real-world scenarios.
Big O Notation
Big O notation (with a capital letter O, not a zero), also called Landau's symbol, is a symbolism used in complexity theory, computer science, and mathematics to describe the asymptotic behavior of functions. Basically, it tells you how fast a function grows or declines. [1]
Big O notation simplifies the complexity measurement process by approximation. One of the main reasons it has become widespread as a valid and reliable way to measure complexity is the simplicity of the notation itself.
In Big O notation, constants are ignored. Constant factors are very often determined by uncontrollable elements such as network, hardware, memory, and storage. It therefore makes sense to "trim" them from the analysis, since the algorithm will run in environments we cannot control; in other words, algorithms are measured with the worst-case scenario in mind.
Algorithm Complexity and Performance
Algorithm complexity can be understood as the amount of computational work an algorithm performs in a certain amount of time, and it is tightly related to central processing unit (CPU) usage. A distinction between complexity and performance is worth clarifying here: the former is the amount of resources needed for a given program to execute a set of tasks, while the latter refers to how well the program executes those tasks.
The relationship between the two is one-way: complexity affects performance. Performance is thus a function of complexity, so it is critical to understand an algorithm's complexity, i.e. the relationship between the number of operations and the problem size, before reasoning about its performance.
A constant-time algorithm is less complex than a linear-time algorithm, which, in turn, is less complex than a quadratic-time algorithm: O(1) < O(N) < O(N²). A few rules govern how Big O expressions are simplified:
1. Sequential steps are added. If your algorithm performs one step over an input of size a followed by another over an input of size b, the total complexity is O(a+b).
2. Constants must be dropped. Constant factors are not taken into consideration in Big O, since they do not change how the algorithm's running time grows as the problem size increases.
3. Different inputs require different variables to represent them. If we needed, for instance, to evaluate the intersection of two arrays, each array should be represented by its own variable (say, a and b).
4. Non-dominant terms should be discarded. Only the dominant term determines the order of complexity; O(N² + N), for example, simplifies to O(N²).
The Pigeon and the Internet Bandwidth
There is a story about an experiment (I'm doubtful it really happened) whose main goal was to demonstrate that a pigeon could move data faster than the Internet available in South Africa. To demonstrate this, the company (Unlimited IT) attached a 4GB USB storage device to the bird and compared the time it took the pigeon to reach its destination with the time needed to transfer the same data over the ADSL connection provided at the time by the company Telkom.
While this experiment may sound a little surreal, the example clearly illustrates the difference between constant-time and linear-time behavior. It did not matter how much data the USB drive had on it; it would always take the bird (approximately) the same amount of time to fly between the same two points, while the slow ADSL connection would take longer as the problem size grew (i.e. the amount of data relative to the bandwidth).
To better understand the notation itself and the relationship between performance and complexity, the examples below illustrate each case according to its complexity class.
Constant-time – O(1)
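A minimal Python sketch (standing in for the original pseudocode; the function name is illustrative) of a constant-time operation. Accessing an element by index takes the same time no matter how large the list is:

```python
def first_element(items):
    # A single indexed access: one operation regardless of input size -> O(1).
    return items[0]

print(first_element([5, 2, 9]))  # prints 5
```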
Linear-time – O(N)
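A linear-time sketch (again, the name is illustrative): one pass over the input, so the number of operations grows in direct proportion to N:

```python
def sum_items(items):
    total = 0
    for x in items:  # one pass over N elements -> O(N)
        total += x
    return total

print(sum_items([1, 2, 3, 4]))  # prints 10
```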
Quadratic-time – O(N^2)
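A quadratic-time sketch: a naive duplicate check that compares every element with every other, performing roughly N × N comparisons:

```python
def has_duplicate(items):
    # Nested loops over the same input: N * N comparisons -> O(N²).
    for i in range(len(items)):
        for j in range(len(items)):
            if i != j and items[i] == items[j]:
                return True
    return False

print(has_duplicate([1, 2, 3, 2]))  # prints True
```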
Sum of the problem sizes – O(a+b)
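A sketch of rule 1 (sequential steps are added): two loops over two different inputs run one after the other, so the total work is O(a) + O(b) = O(a+b):

```python
def combined_length(a_items, b_items):
    count = 0
    for _ in a_items:  # first step: O(a)
        count += 1
    for _ in b_items:  # second step: O(b)
        count += 1
    return count  # total: O(a + b)

print(combined_length([1, 2], [3, 4, 5]))  # prints 5
```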
Drop the constants v1 – O(N)
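A sketch of dropping constants, version 1: two full passes over the same input is O(2N), but the constant 2 is dropped, leaving O(N):

```python
def min_and_max(items):
    smallest = min(items)  # first pass over the list: O(N)
    largest = max(items)   # second pass over the list: O(N)
    return smallest, largest  # O(2N) -> drop the constant -> O(N)

print(min_and_max([3, 1, 4]))  # prints (1, 4)
```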
Drop the constants v2 – O(N)
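A sketch of dropping constants, version 2: a single pass that does several constant-time operations per element. The per-element work is a constant factor, so the complexity is still O(N):

```python
def evens_and_odds(items):
    # One pass; each iteration does a few constant-time operations -> O(N).
    evens, odds = [], []
    for x in items:
        if x % 2 == 0:
            evens.append(x)
        else:
            odds.append(x)
    return evens, odds

print(evens_and_odds([1, 2, 3, 4]))  # prints ([2, 4], [1, 3])
```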
Different inputs use different variables – O(a+b)
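A sketch of rule 3, using the array-intersection example mentioned earlier. Each input gets its own variable: building a set from the first array is O(a), scanning the second is O(b), so the total is O(a+b) (assuming average-case constant-time set membership):

```python
def intersection(a_items, b_items):
    seen = set(a_items)  # O(a) to build the set
    # O(b) to scan the second input; membership checks are ~O(1) on average.
    return [x for x in b_items if x in seen]

print(intersection([1, 2, 3], [2, 3, 4]))  # prints [2, 3]
```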
Discard non-dominant terms – O(N²)
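A sketch of rule 4: a single loop followed by a nested loop gives O(N + N²), and the non-dominant O(N) term is discarded, leaving O(N²):

```python
def sum_and_pairs(items):
    total = 0
    for x in items:  # O(N)
        total += x
    pairs = []
    for x in items:  # O(N²): nested loops over the same input
        for y in items:
            pairs.append((x, y))
    # O(N + N²) simplifies to O(N²): the dominant term wins.
    return total, len(pairs)

print(sum_and_pairs([1, 2, 3]))  # prints (6, 9)
```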
Big O notation is certainly a very good first step when it comes to producing high-quality software. Performance and availability can be greatly improved by simply understanding the complexity of the algorithms we create; this can also be translated into business results with systems that are fast and responsive, positively impacting the end-user experience and producing cost-efficient outcomes.
[1] MIT – Big O notation – http://web.mit.edu/16.070/www/lecture/big_o.pdf
Martin, R. C., Coplien, J. O., Wampler, K., Grenning, J. W., Schuchert, B. L., Langr, J., . . . Feathers, M. C. (2016). Clean code: A handbook of agile software craftsmanship. Upper Saddle River, NJ: Prentice Hall.