Video: Dictionaries, Hash Tables and Sets (Bulgarian)

August 01, 2020
Nikolay Kostov (Nikolay.IT)

Topics covered:

Dictionaries
Hash Tables

Hashing
Collisions

Dictionary<TKey, TValue> Class
Sets: HashSet<T> and SortedSet<T>

Video (in Bulgarian)

Presentation Content

The Dictionary (Map) ADT

The abstract data type (ADT) "dictionary" maps key to values
- Also known as "map" or "associative array"
- Contains a set of (key, value) pairs
Dictionary ADT operations:
- Add(key, value)
- FindByKey(key) → value
- Delete(key)
Can be implemented in several ways
- List, array, hash table, balanced tree, …

Hash Table

A hash table is an array that holds a set of(key, value) pairs
The process of mapping a key to a positionin a table is called hashing
T
h(k)
Hash table of size m
Hash function h: k → 0 … m-1

Hash Functions and Hashing

A hash table has m slots, indexed from 0 to m-1
A hash function h(k) maps keys to positions:
- h: k → 0 … m-1
For any value k in the key range and some hash function h we have h(k) = p and 0 ≤ p < m
T
h(k)

Hashing Functions

Perfect hashing function (PHF)
- h(k) : one-to-one mapping of each key k to an integer in the range [0, m-1]
- The PHF maps each key to a distinct integer within some manageable range
Finding a perfect hashing function is in most cases impossible
More realistically
- Hash function h(k) that maps most of the keys onto unique integers, but not all

Collisions in a Hash Table

A collision is the situation when different keys have the same hash value
- h(k1) = h(k2) for k1 ≠ k2
When the number of collisions is sufficiently small, the hash tables work quite well (fast)
Several collisions resolutionstrategies exist
- Chaining in a list
- Using the neighboring slots (linear probing)
- Re-hashing (second hash function)
- …

Hash Tables and Efficiency

Hash tables are the most efficient implementation of ADT “dictionary”
Add / Find / Delete take just few primitive operations
- Speed does not depend on the size of the hash-table (constant time)
  - Amortized complexity O(1)
- Example: finding an element in a hash-table with 1 000 000 elements, takes just few steps
  - Finding an element in array of 1 000 000 elements takes average 500 000 steps

Dictionary<TKey,TValue>

Implements the ADT dictionary as hash table
- The size is dynamically increased as needed
- Contains a collection of key-value pairs
- Collisions are resolved by chaining
- Elements have almost random order
  - Ordered by the hash code of the key
Dictionary<TKey,TValue> relies on
- Object.Equals() – for comparing the keys
- Object.GetHashCode() – for calculating the hash codes of the keys

Dictionary<TKey,TValue> (2)

Major operations:
- Add(TKey,TValue) – adds an element with the specified key and value
- Remove(TKey) – removes the element by key
- this[] – get / add / replace of element by key
- Clear() – removes all elements
- Count – returns the number of elements
- Keys – returns a collection of the keys
- Values – returns a collection of the values

Dictionary<TKey,TValue> (3)

Major operations:
- ContainsKey(TKey) – checks whether the dictionary contains given key
- ContainsValue(TValue) – checks whether the dictionary contains given value
  - Warning: slow operation – O(n)
- TryGetValue(TKey, out TValue)
  - If the key is found, returns it in the TValue
  - Otherwise returns false

Dictionary<TKey,TValue> – Example

Dictionary<string, int> studentsMarks =
  new Dictionary<string, int>();
studentsMarks.Add("Ivan", 4);
studentsMarks.Add("Peter", 6);
studentsMarks.Add("Maria", 6);
studentsMarks.Add("George", 5);
int peterMark = studentsMarks["Peter"];
Console.WriteLine("Peter's mark: {0}", peterMark);
Console.WriteLine("Is Peter in the hash table: {0}",
  studentsMarks.ContainsKey("Peter"));
Console.WriteLine("Students and grades:");
foreach (var pair in studentsMarks)
{
  Console.WriteLine("{0} --> {1}", pair.Key, pair.Value);
}

Counting the Words in a Text

string text = "a text, some text, just some text";
IDictionary<string, int> wordsCount = 
  new Dictionary<string, int>(); 
string[] words = text.Split(' ', ',', '.');
foreach (string word in words)
{
  int count = 1;
  if (wordsCount.ContainsKey(word))
    count = wordsCount[word] + 1;
  wordsCount[word] = count;
}
foreach(var pair in wordsCount)
{
  Console.WriteLine("{0} -> {1}", pair.Key, pair.Value);
}

Data structures can be nested, e.g. dictionary of lists: Dictionary<string, List<int>>

Nested Data Structures

static Dictionary<string, List<int>> studentGrades = 
    new Dictionary<string, List<int>>();
private static void AddGrade(string name, int grade)
{
    if (! studentGrades.ContainsKey(name))
    {
        studentGrades[name] = new List<int>();
    }
    studentGrades[name].Add(grade);
}

SortedDictionary<TKey,TValue>

SortedDictionary<TKey,TValue> implements the ADT “dictionary” as self-balancing search tree
- Elements are arranged in the tree ordered by key
- Traversing the tree returns the elements in increasing order
- Add / Find / Delete perform log2(n) operations
Use SortedDictionary<TKey,TValue> when you need the elements sorted by key
- Otherwise use Dictionary<TKey,TValue> – it has better performance

Counting Words (Again)

string text = "a text, some text, just some text";
IDictionary<string, int> wordsCount = 
  new SortedDictionary<string, int>(); 
string[] words = text.Split(' ', ',', '.');
foreach (string word in words)
{
  int count = 1;
  if (wordsCount.ContainsKey(word))
    count = wordsCount[word] + 1;
  wordsCount[word] = count;
}
foreach(var pair in wordsCount)
{
  Console.WriteLine("{0} -> {1}", pair.Key, pair.Value);
}

IComparable<T>

Dictionary<TKey,TValue> relies on
- Object.Equals() – for comparing the keys
- Object.GetHashCode() – for calculating the hash codes of the keys
SortedDictionary<TKey,TValue> relies on IComparable<T> for ordering the keys
Built-in types like int, long, float, string and DateTime already implement Equals(), GetHashCode() and IComparable<T>
- Other types used when used as dictionary keys should provide custom implementations

Implementing Equals() and GetHashCode()

public struct Point
{
  public int X { get; set; }
  public int Y { get; set; }
  public override bool Equals(Object obj)
  {
    if (!(obj is Point) || (obj == null)) return false;
    Point p = (Point)obj;
    return (X == p.X) && (Y == p.Y);
  }
  public override int GetHashCode()
  {
      return (X << 16 | X >> 16) ^ Y;
  }
}

Implementing IComparable<T>

public struct Point : IComparable<Point>
{
  public int X { get; set; }
  public int Y { get; set; }
  public int CompareTo(Point otherPoint)
  {
    if (X != otherPoint.X)
    {
      return this.X.CompareTo(otherPoint.X);
    }
    else
    {
      return this.Y.CompareTo(otherPoint.Y);
    }
  }
}

Set and Bag ADTs

The abstract data type (ADT) "set" keeps a set of elements with no duplicates
Sets with duplicates are also known as ADT "bag"
Set operations:
- Add(element)
- Contains(element) → true / false
- Delete(element)
- Union(set) / Intersect(set)
Sets can be implemented in several ways
- List, array, hash table, balanced tree, …

HashSet<T>

HashSet<T> implements ADT set by hash table
- Elements are in no particular order
All major operations are fast:
- Add(element) – appends an element to the set
  - Does nothing if the element already exists
- Remove(element) – removes given element
- Count – returns the number of elements
- UnionWith(set) / IntersectWith(set) – performs union / intersection with another set

HashSet<T> – Example

ISet<string> firstSet = new HashSet<string>(
   new string[] { "SQL", "Java", "C#", "PHP"  });
ISet<string> secondSet = new HashSet<string>(
   new string[] { "Oracle", "SQL", "MySQL"  });
ISet<string> union = new HashSet<string>(firstSet);
union.UnionWith(secondSet);
PrintSet(union); // SQL Java C# PHP Oracle MySQL
private static void PrintSet<T>(ISet<T> set)
{
   foreach (var element in set)
   {
      Console.Write("{0} ", element);
   }
   Console.WriteLine();
}

SortedSet<T>

SortedSet<T> implements ADT set by balanced search tree (red-black tree)
- Elements are sorted in increasing order
Example:

ISet<string> firstSet = new SortedSet<string>(
   new string[] { "SQL", "Java", "C#", "PHP" });
ISet<string> secondSet = new SortedSet<string>(
   new string[] { "Oracle", "SQL", "MySQL" });
ISet<string> union = new HashSet<string>(firstSet);
union.UnionWith(secondSet);
PrintSet(union); // C# Java PHP SQL MySQL Oracle

Summary

Dictionaries map key to value
- Can be implemented as hash table or balanced search tree
Hash-tables map keys to values
- Rely on hash-functions to distribute the keys in the table
- Collisions needs resolution algorithm (e.g. chaining)
- Very fast add / find / delete – O(1)
Sets hold a group of elements
- Hash-table or balanced tree implementations

Exercises

Write a program that counts in a given array of double values the number of occurrences of each value. Use Dictionary<TKey,TValue>.
- Example: array = {3, 4, 4, -2.5, 3, 3, 4, 3, -2.5}
- -2.5 → 2 times
- 3 → 4 times
- 4 → 3 times
Write a program that extracts from a given sequence of strings all elements that present in it odd number of times. Example:
- {C#, SQL, PHP, PHP, SQL, SQL } → {C#, SQL}

Exercises (2)

Implement the data structure "hash table" in a class HashTable<K,T>. Keep the data in array of lists of key-value pairs (LinkedList<KeyValuePair<K,T>>[]) with initial capacity of 16. When the hash table load runs over 75%, perform resizing to 2 times larger capacity. Implement the following methods and properties: Add(key, value), Find(key)→value, Remove( key), Count, Clear(), this[], Keys. Try to make the hash table to support iterating over its elements with foreach.
Implement the data structure "set" in a class HashedSet<T> using your class HashTable<K,T> to hold the elements. Implement all standard set operations like Add(T), Find(T), Remove(T), Count, Clear(), union and intersect.