How Choosing the Best-Fit C# Collections Can Impact Performance
If you want to create and manage groups of related objects when building an application, you can either use arrays or collections. Unlike arrays, which are most useful for working with a fixed number of strongly typed objects, collections are useful for working with groups of objects that grow and shrink dynamically. C# comes with many built-in collections, so how do you know which one to use? In this article, we’ll explore C# collections to learn how to make the best use of them.
What is a C# collection, and which one should you use?
A C# collection is a way to group different objects so that you can interact with them more easily. For instance, if you’re building an e-commerce app for a supermarket, you might want to group different categories of product inventory to make it easy to perform operations, such as adding or removing those products from stock.
Microsoft .NET provides built-in C# collections that have been optimized for certain operations. There are three classes of collections:
- System.Collections classes
- System.Collections.Generic classes
- System.Collections.Concurrent classes
The C# collection you choose can impact your application’s performance; it can even impact code readability and maintainability. You might be tempted, like many developers, to choose a generic collection, such as List<T>, because you’re already familiar with it or because it’s versatile enough to fit most use cases.
A better approach is to match a collection to your use case. Writing in Code Project, Arthur Minduca advises, “When you use a more dedicated collection, you explicitly specify how it should be used. A mere declaration naturally reduces the set of usage possibilities and keeps its affordance clear.”
How collections impact your applications
Consider this example. You’re building an application to sell tickets to events. To ensure a first-in, first-processed approach that treats customers fairly, the app will need to process orders from a very large number of customers while maintaining the order in which each order was registered. To handle this use case, you could choose a Queue<T> or a List<T> collection.
Below, you can see an overly simplified example of the code for each implementation:
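The following is a minimal sketch of the two approaches, not a definitive implementation. The `Order` class and the `TicketProcessing` helper are hypothetical names introduced only for illustration; each method drains the pending orders in the order they were registered and returns the processed order IDs.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical order type; the name and fields are illustrative only.
public sealed class Order
{
    public Order(int id, string customer) { Id = id; Customer = customer; }
    public int Id { get; }
    public string Customer { get; }
}

public static class TicketProcessing
{
    // Queue<T>: Dequeue() removes the oldest order in constant time.
    public static List<int> ProcessWithQueue(IEnumerable<Order> incoming)
    {
        var pending = new Queue<Order>(incoming);
        var processed = new List<int>();
        while (pending.Count > 0)
        {
            Order next = pending.Dequeue();
            processed.Add(next.Id);
        }
        return processed;
    }

    // List<T>: RemoveAt(0) shifts every remaining element down by one,
    // so each removal is linear in the number of pending orders.
    public static List<int> ProcessWithList(IEnumerable<Order> incoming)
    {
        var pending = new List<Order>(incoming);
        var processed = new List<int>();
        while (pending.Count > 0)
        {
            Order next = pending[0];
            pending.RemoveAt(0);
            processed.Add(next.Id);
        }
        return processed;
    }
}
```

Both methods return the same IDs in the same order; they differ only in how much work each removal costs.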
At first glance, each example appears valid, and neither has much complexity. Look closer and you can see that each collection affects performance and affordance differently.
Understanding how collections can impact performance
In terms of performance, Microsoft’s documentation specifies that the Queue<T>.Dequeue() method offers better performance because it’s an O(1), or constant, operation. The alternative List<T>.RemoveAt(Int32) method is an O(n), or linear, operation.
In this example, the Queue<T> method will take the same amount of time to process each order, regardless of the size of the collection. This would be an ideal scenario for the ticket app use case.
On the other hand, the time the List<T> method takes to process each order grows in direct proportion to the number of pending orders. In general, it’s not recommended to use linear operations inside loops, as in the previous List<T> example, because draining the whole list that way becomes a quadratic, O(n²), operation.
Understanding how affordance impacts code readability and maintainability
An affordance is what a user can do with an object based on the user’s capabilities; it is the action possible between a user and an object. For example, a door affords opening, but only if you can reach the handle or have a key to unlock it. Affordance is an important consideration in choosing a collection.
In the ticket app example, purchases must be processed in the order in which they were registered. Conveniently, there is one collection created for just that purpose: Queue<T>. The affordance of Queue<T> makes it clear to other developers, without their ever reading the project’s original specifications, that the app must process purchases in the order in which they are registered. It also makes clear that the app isn’t allowed to insert elements at the front or in the middle of the collection, because Queue<T> only adds elements at the end. This affordance makes it easy for other developers to quickly understand how an application must function.
Evaluating how a collection fits your use case
As you evaluate which collection to use in your application, ask these questions:
- Does the order in which I’m going to insert and retrieve the elements matter?
- How many elements am I expecting in my collection (hundreds, thousands, millions)?
- Do I need to perform search/lookup operations?
- Do I need to iterate over all the elements in that collection?
- Is sorting the elements crucial to my application?
- Is my collection going to be consumed by multiple clients at the same time (concurrency)?
For example, if the order in which you insert or retrieve the elements matters, you might choose a collection that is optimized for exactly that access pattern, such as Stack<T> (last in, first out) or Queue<T> (first in, first out).
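As a small sketch of how the two collections differ in retrieval order (the `OrderingDemo` class and its method names are made up for this illustration), the same items can be drained from each:

```csharp
using System;
using System.Collections.Generic;

public static class OrderingDemo
{
    // Queue<T> returns items in the order they were added (FIFO).
    public static string DrainQueue(IEnumerable<string> items)
    {
        var queue = new Queue<string>();
        foreach (string item in items) queue.Enqueue(item);

        var output = new List<string>();
        while (queue.Count > 0) output.Add(queue.Dequeue());
        return string.Join(",", output);
    }

    // Stack<T> returns the most recently added item first (LIFO).
    public static string DrainStack(IEnumerable<string> items)
    {
        var stack = new Stack<string>();
        foreach (string item in items) stack.Push(item);

        var output = new List<string>();
        while (stack.Count > 0) output.Add(stack.Pop());
        return string.Join(",", output);
    }
}
```

Given the items "a", "b", "c", `DrainQueue` yields them in insertion order, while `DrainStack` yields them reversed.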
If you need to iterate through all the elements in a collection, a List<T> could be a good fit.
If you require fast lookup operations by key, Dictionary<TKey, TValue> is a better option than List<T>.
You can also use a method like Enumerable.ToDictionary() to get the best of both operations, as shown in the example below:
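A minimal sketch of that approach, assuming a hypothetical `Product` type keyed by a `Sku` string (both names are invented for this example): the list stays available for iteration, while the dictionary built from it serves key lookups.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical product type; names are illustrative only.
public sealed class Product
{
    public Product(string sku, string name) { Sku = sku; Name = name; }
    public string Sku { get; }
    public string Name { get; }
}

public static class ProductLookup
{
    // Builds a dictionary keyed by SKU in a single pass over the list.
    // Enumerable.ToDictionary throws ArgumentException on duplicate keys.
    public static Dictionary<string, Product> IndexBySku(List<Product> products)
    {
        return products.ToDictionary(p => p.Sku);
    }

    // Looks a product up by key instead of scanning the list.
    public static string FindName(Dictionary<string, Product> index, string sku)
    {
        return index.TryGetValue(sku, out var found) ? found.Name : "(unknown)";
    }
}
```

Building the index costs one pass over the list, so it pays off when the same data is looked up many times.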
.NET also provides specialized collections for expensive operations like sorting:
- SortedDictionary<TKey, TValue>
- SortedList<TKey, TValue>
Both collections keep their items sorted by key as items are added, and both enumerate them in key order. SortedList<TKey, TValue> uses less memory than SortedDictionary<TKey, TValue>, but it performs less well when inserting new elements. According to Microsoft documentation, “Insertion and removal are generally O(n) [for SortedLists]; however, insertion is O(log n) for data that are already in sort order, so that each element is added to the end of the list.”
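A short sketch of that shared behavior follows; the `SortedCollectionsDemo` class and its method names are illustrative assumptions. Because both types implement IDictionary<TKey, TValue>, the same helper can fill either one with out-of-order keys and read them back sorted.

```csharp
using System;
using System.Collections.Generic;

public static class SortedCollectionsDemo
{
    // Adds keys out of order, then lists them; both SortedDictionary
    // and SortedList enumerate their keys in ascending key order.
    public static string FillAndListKeys(IDictionary<int, string> sorted)
    {
        sorted.Add(30, "thirty");
        sorted.Add(10, "ten");
        sorted.Add(20, "twenty");
        return string.Join(",", sorted.Keys);
    }

    // SortedList additionally exposes its keys by position.
    public static int SmallestKey(SortedList<int, string> list)
    {
        return list.Keys[0];
    }
}
```

Passing either `new SortedDictionary<int, string>()` or `new SortedList<int, string>()` to `FillAndListKeys` returns the keys in ascending order; positional access through `Keys[0]` is available on SortedList only.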
Understanding collection efficiencies with Big O notations
To understand the relative efficiency of a C# collection, you can use Big Order, or Big O, notation. These are the most common orders, listed here in order of preference:
- O(1) or constant: an operation takes the same amount of time to be performed, regardless of collection size. This is the ideal situation.
- O(log n) or logarithmic: the number of operations grows slowly as the input size grows.
- O(n) or linear: an operation will take more time to process the more data you have, with the amount of processing time increasing in a linear fashion. This means that, in the worst case, each item in the collection has to be processed one at a time.
- O(n²) or quadratic: the time an operation takes is proportional to the square of the size of the input data set.
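To make the difference between linear and logarithmic growth concrete, here is a small sketch that counts comparisons instead of measuring time. The `GrowthDemo` class is a hypothetical helper written for this illustration, and both methods assume the input array is sorted in ascending order.

```csharp
using System;

public static class GrowthDemo
{
    // Linear scan: in the worst case every element is compared once, O(n).
    public static int LinearSearchSteps(int[] sorted, int target)
    {
        int steps = 0;
        foreach (int value in sorted)
        {
            steps++;
            if (value == target) break;
        }
        return steps;
    }

    // Binary search: each comparison halves the remaining range, O(log n).
    public static int BinarySearchSteps(int[] sorted, int target)
    {
        int steps = 0;
        int low = 0;
        int high = sorted.Length - 1;
        while (low <= high)
        {
            steps++;
            int mid = low + (high - low) / 2;
            if (sorted[mid] == target) break;
            if (sorted[mid] < target) low = mid + 1;
            else high = mid - 1;
        }
        return steps;
    }
}
```

For a sorted array of 1,024 integers and a target that is not present, the linear scan makes 1,024 comparisons, while the binary search needs only about ten.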
In Big O notation, summarized in the table below, n is the size of the input and f(n) is the running time of the algorithm relative to input size. In general, aim to avoid linear and quadratic orders for large collections because the amount of time required grows according to the number of elements.
| Lookup | O(n) | O(n) | O(1) | O(log n) | O(n) | O(n) | O(n) | O(log n) |
| Insert / Add | – | O(n) | O(1) | O(n) | O(1) | O(1) | O(1) | O(log n) |
Aiming for the best-fit C# collection to improve your application performance
A general understanding of C# collections equips you to ask the right questions and make informed decisions about which collection best suits your needs. Choosing the best collection can significantly improve a project’s performance, especially when you are working with large sets of data.
Always check Microsoft’s documentation for a method’s order type, particularly if you’re going to call that method inside of a loop. Keep in mind that software development is collaborative, and other developers can benefit from your ability to choose the right collection, making it easier for them to understand your code’s intent, as well as helping to keep the code base more readable and maintainable.