C# Collections that Every Developer Must Know ~ Dotnet Guru ~

C# Collections that Every C# Developer Must Know

List
Dictionary
HashSet
Stack
Queue

List<T>

Represents a list of objects that can be accessed by an index. <T> here means this is a generic list.

Unlike arrays that are fixed in size, lists can grow in size dynamically. That’s why they’re also called dynamic arrays or vectors. Internally, a list uses an array for storage. If it becomes full, it’ll create a new larger array, and will copy items from the existing array into the new one.

These days, it’s common to use lists instead of arrays, even if you’re working with a fixed set of items.

To create a list:

var list = new List<int>();

If you plan to store large number of objects in a list, you can reduce the cost of reallocations of the internal array by setting an initial size:

// Creating a list with an initial size
var list = new List<int>(10000);

Here are some useful operations with lists:

// Add an item at the end of the list
list.Add(4);
 
// Add an item at index 0
list.Insert(4, 0);
 
// Remove an item from list
list.Remove(1);
 
// Remove the item at index 0
list.RemoveAt(0);
 
// Return the item at index 0
var first = list[0];
 
// Return the index of an item
var index = list.IndexOf(4);
 
// Check to see if the list contains an item
var contains = list.Contains(4);
 
// Return the number of items in the list 
var count = list.Count;
 
// Iterate over all objects in a list
foreach (var item in list)
    Console.WriteLine(item);

Now, let’s see where a list performs well and where it doesn’t.

Adding/Removing Items at the Beginning or Middle

If you add/remove an item at the beginning or middle of a list, it needs to shift one or more items in its internal array. In the worst case scenario, if you add/remove an item at the very beginning of a list, it needs to shift all existing items. The larger the list, the more costly this operation is going to be. We specify the cost of this operation using Big O notation: O(n), which simply means the cost increases linearly in direct proportion to the size of the input. So, as n grows, the execution time of the algorithm increases in direct proportion to n.

Adding/Removing Items at the End

Adding/removing an item at the end of a list is a relatively fast operation and does not depend on the size of the list. The existing items do not have to be shifted. This is why the cost of this operation is relatively constant and is not dependent on the number of items in the list. We represent the execution cost of this operation with Big O notation: O(1). So, 1 here means constant.

Searching for an Item

When using methods that involve searching for an item(e.g. IndexOf, Contains and Find), List performs a linear search. This means, it iterates over all items in its internal array and if it finds a match, it returns it. In the worst case scenario, if this item is at the end of the list, all items in the list need to be scanned before finding the match. Again, this is another example of O(n), where the cost of finding a match is linear and in direct proportion with the number of elements in the list.

Accessing an Item by an Index

This is what lists are good at. You can use an index to get an item in a list and no matter how big the list is, the cost of accessing an item by index remains relatively constant, hence O(1).

Dictionary<TKey, TValue>

Dictionary is a collection type that is useful when you need fast lookups by keys. For example, imagine you have a list of customers and as part of a task, you need to quickly look up a customer by their ID (or some other unique identifier, which we call key).

When storing or retrieving an object in a dictionary, you need to supply a key. The key is a value that uniquely identifies an object and cannot be null. For example, to store a Customer in a Dictionary, you can use CustomerID as the key.

To create a dictionary, first you need to specify the type of keys and values:

var dictionary = new Dictionary<int, Customer>();

Here, our dictionary uses int keys and Customer values. So, you can store a Customer object in this dictionary as follows:

dictionary.Add(customer.Id, customer);

Later, you can look up customers by their IDs very quickly:

// Return the customer with ID 1234 
var customer = dictionary[1234];

So, why are dictionary look ups so fast? A dictionary internally stores objects in an array, but unlike a list, where objects are added at the end of the array (or at the provided index), the index is calculated using a hash function. So, when we store an object in a dictionary, it’ll call the GetHashCode method on the key of the object to calculate the hash. The hash is then adjusted to the size of the array to calculate the index into the array to store the object. Later, when we lookup an object by its key, GetHashCode method is used again to calculate the hash and the index. As you learned earlier, looking up an object by index in an array is a fast operation with O(1). So, unlike lists, looking up an object in a dictionary does not require scanning every object and no matter how large the dictionary is, it’ll remain extremely fast.

So, in the following figure, when we store this object in a dictionary, the GetHashCode method on the key is called. Let’s assume it returns 1234. This hash value is then adjusted based on the size of the internal array. In this figure, length of the internal array is 6. So, the remainder of the division of 1234 by 6 is used to calculate the index (in this case 4). Later, when we need to look up this object, its key used again to calculate the index.

HashSet<T>

A HashSet represents a set of unique items, just like a mathematical set (e.g. { 1, 2, 3 }). A set cannot contain duplicates and the order of items is not relevant. So, both { 1, 2, 3 } and { 3, 2, 1 } are equal.

Use a HashSet when you need super fast lookups against a unique list of items.

A HashSet, similar to a Dictionary, is a hash-based collection, so look ups are very fast with O(1). But unlike a dictionary, it doesn’t store key/value pairs; it only stores values.

To create a HashSet:

var hashSet = new HashSet<int>();

Stack<T>

Stack is a collection type with Last-In-First-Out (LIFO) behaviour. We often use stacks in scenarios where we need to provide the user with a way to go back.

Think of your browser. As you navigate to different web sites, these addresses that you visit are pushed on a stack. Then, when you click the back button, the item on the stack (which represents the current address in the browser) is popped and now we can get the last address you visited from the item on the stack. The undo feature in applications is implemented using a stack as well.

Here is how you can use a Stack in C#:

var stack = new Stack<string>();
             
// Push items in a stack
stack.Push("http://www.google.com");
 
// Check to see if the stack contains a given item 
var contains = stack.Contains("http://www.google.com");
 
// Remove and return the item on the top of the stack
var top = stack.Pop();
 
// Return the item on the top of the stack without removing it 
var top = stack.Peek();
 
// Get the number of items in stack 
var count = stack.Count;
 
// Remove all items from stack 
stack.Clear();

Queue<T>

Queue represents a collection with First-In-First-Out (FIFO) behaviour. We use queues in situations where we need to process items as they arrive.

Three main operations on queue include:

Enqueue: adding an element to the end of a queue
Dequeue: removing the element at the front of the queue
Peek: inspecting the element at the front without removing it.

Here is how you can use a queue:

var queue = new Queue<string>();
 
// Add an item to the queue
queue.Enqueue("transaction1");
 
// Check to see if the queue contains a given item 
var contains = queue.Contains("transaction1");
 
// Remove and return the item on the front of the queue
var front = queue.Dequeue();
 
// Return the item on the front without removing it 
var top = queue.Peek();
             
// Remove all items from queue 
queue.Clear();
 
// Get the number of items in the queue
var count = queue.Count;

Difference Between Hash Table and Dictionary

Hash table and Dictionary are collection of data structures to hold data as key-value pairs. The Hashtable is a weakly typed data structure, so you can add keys and values of any Object Type to the Hashtable. The Dictionary class is a strongly types <T Key, T Value > and you must specify the data types for both the key and value.

Coming to difference between HashTable & Dictionary, Dictionary is generic whereas Hastable is not Generic. We can add any type of object to HashTable, but while reteriving we need to Cast it to the required Type. So it is not type safe. But to dictionary, while declaring itself we can specify the type of Key & Value ,so no need to cast while retrieving.

Let me explain it with an Example.

HashTable Program:

public class HashTableProgram
{
    static void Main(string[] args)
    {
        Hashtable ht = new Hashtable();
        ht.Add(1, "One");
        ht.Add(2, "Two");
        ht.Add(3, "Three");
        foreach (DictionaryEntry de in ht)
        {
            int Key = (int)de.Key; //Casting
            string value = de.Value.ToString(); //Casting
            Console.WriteLine(Key + " " + value);
        }
    }
}

Dictionary Example :

class DictionaryProgram
    {
        static void Main(string[] args)
        {
            Dictionary<int, string> dt = new Dictionary<int, string>();
            dt.Add(1, "One");
            dt.Add(2, "Two");
            dt.Add(3, "Three");
            foreach (KeyValuePair<int, String> kv in dt)
            {
                Console.WriteLine(kv.Key + " " + kv.Value);
            }
        }
    }

Dictionary
Trying to access an in existent key gives runtime error in Dictionary.
Dictionary is a generic type
Generic collections are a lot faster as there's no boxing/unboxing
Dictionary public static members are thread safe, but any instance members are not guaranteed to be thread safe.
Dictionary is preferred than Hash table

Hash table
Trying to access an in existent key gives null instead of error.
Hash table is a non-generic type
Hash table also have to box/unbox, which may have memory consumption as well as performance penalties.
Hash table is thread safe for use by multiple reader threads and a single writing thread
This is an older collection that is obsoleted by the Dictionary collection. Knowing how to use it is critical when maintaining older programs.

Summary

Lists are fast when you need to access an element by index, but searching for an item in a list is slow since it requires a linear search.
Dictionaries provide fast lookups by key. Keys should be unique and cannot be null.
HashSets are useful when you need fast lookups to see if an element exists in a set or not.
Stacks provide LIFO (Last-In-First-Out) behaviour and are useful when you need to provide the user with a way to go back.
Queues provide FIFO (First-In-First-Out) behaviour and are useful to process items in the order arrived.

Dotnet Guru ~

Monday, 20 April 2015

C# Collections that Every Developer Must Know

0 comments:

Post a Comment

Topics

Pages

Dotnet Guru Archives