Video: Dictionaries, Hash Tables and Sets (Bulgarian)
August 01, 2020
Nikolay Kostov (Nikolay.IT)
Topics covered:
Dictionaries
Hash Tables
Hashing
Collisions
Dictionary<TKey, TValue> Class
Sets: HashSet<T> and SortedSet<T>
Video (in Bulgarian)
Presentation Content
The Dictionary (Map) ADT
The abstract data type (ADT) "dictionary" maps key to values
Also known as "map" or "associative array"
Contains a set of (key, value) pairs
Dictionary ADT operations:
Add(key, value)
FindByKey(key) → value
Delete(key)
Can be implemented in several ways
List, array, hash table, balanced tree, …
Hash Table
A hash table is an array that holds a set of(key, value) pairs
The process of mapping a key to a positionin a table is called hashing
T
h(k) Hash table of size m Hash function h: k → 0 … m-1
Hash Functions and Hashing
A hash table has m slots, indexed from 0 to m-1
A hash function h(k) maps keys to positions:
h: k → 0 … m-1
For any value k in the key range and some hash function h we have h(k)=p and 0≤p<m
T
h(k)
Hashing Functions
Perfect hashing function (PHF)
h(k) : one-to-one mapping of each key k to an integer in the range [0, m-1]
The PHF maps each key to a distinct integer within some manageable range
Finding a perfect hashing function is in most cases impossible
More realistically
Hash function h(k) that maps most of the keys onto unique integers, but not all
Collisions in a Hash Table
A collision is the situation when different keys have the same hash value
h(k1)=h(k2) for k1≠k2
When the number of collisions is sufficiently small, the hash tables work quite well (fast)
Several collisions resolutionstrategies exist
Chaining in a list
Using the neighboring slots (linear probing)
Re-hashing (second hash function)
…
Hash Tables and Efficiency
Hash tables are the most efficient implementation of ADT “dictionary”
Add / Find / Delete take just few primitive operations
Speed does not depend on the size of the hash-table (constant time)
Amortized complexity O(1)
Example: finding an element in a hash-table with 1 000 000 elements, takes just few steps
Finding an element in array of 1 000 000 elements takes average 500 000 steps
Dictionary<TKey,TValue>
Implements the ADT dictionary as hash table
The size is dynamically increased as needed
Contains a collection of key-value pairs
Collisions are resolved by chaining
Elements have almost random order
Ordered by the hash code of the key
Dictionary<TKey,TValue> relies on
Object.Equals() – for comparing the keys
Object.GetHashCode() – for calculating the hash codes of the keys
Dictionary<TKey,TValue> (2)
Major operations:
Add(TKey,TValue) – adds an element with the specified key and value
Remove(TKey) – removes the element by key
this[] – get / add / replace of element by key
Clear() – removes all elements
Count – returns the number of elements
Keys – returns a collection of the keys
Values – returns a collection of the values
Dictionary<TKey,TValue> (3)
Major operations:
ContainsKey(TKey) – checks whether the dictionary contains given key
ContainsValue(TValue) – checks whether the dictionary contains given value
Warning: slow operation – O(n)
TryGetValue(TKey,outTValue)
If the key is found, returns it in the TValue
Otherwise returns false
Dictionary<TKey,TValue> – Example
Dictionary<string, int> studentsMarks =
new Dictionary<string, int>();
studentsMarks.Add("Ivan", 4);
studentsMarks.Add("Peter", 6);
studentsMarks.Add("Maria", 6);
studentsMarks.Add("George", 5);
int peterMark = studentsMarks["Peter"];
Console.WriteLine("Peter's mark: {0}", peterMark);
Console.WriteLine("Is Peter in the hash table: {0}",
studentsMarks.ContainsKey("Peter"));
Console.WriteLine("Students and grades:");
foreach (var pair in studentsMarks)
{
Console.WriteLine("{0} --> {1}", pair.Key, pair.Value);
}
Counting the Words in a Text
string text = "a text, some text, just some text";
IDictionary<string, int> wordsCount =
new Dictionary<string, int>();
string[] words = text.Split(' ', ',', '.');
foreach (string word in words)
{
int count = 1;
if (wordsCount.ContainsKey(word))
count = wordsCount[word] + 1;
wordsCount[word] = count;
}
foreach(var pair in wordsCount)
{
Console.WriteLine("{0} -> {1}", pair.Key, pair.Value);
}
Data structures can be nested, e.g. dictionary of lists: Dictionary<string, List<int>>
Nested Data Structures
static Dictionary<string, List<int>> studentGrades =
new Dictionary<string, List<int>>();
privatestaticvoidAddGrade(string name, int grade)
{
if (! studentGrades.ContainsKey(name))
{
studentGrades[name] = new List<int>();
}
studentGrades[name].Add(grade);
}
SortedDictionary<TKey,TValue>
SortedDictionary<TKey,TValue> implements the ADT “dictionary” as self-balancing search tree
Elements are arranged in the tree ordered by key
Traversing the tree returns the elements in increasing order
Add / Find / Delete perform log2(n) operations
Use SortedDictionary<TKey,TValue> when you need the elements sorted by key
Otherwise use Dictionary<TKey,TValue> – it has better performance
Counting Words (Again)
string text = "a text, some text, just some text";
IDictionary<string, int> wordsCount =
new SortedDictionary<string, int>();
string[] words = text.Split(' ', ',', '.');
foreach (string word in words)
{
int count = 1;
if (wordsCount.ContainsKey(word))
count = wordsCount[word] + 1;
wordsCount[word] = count;
}
foreach(var pair in wordsCount)
{
Console.WriteLine("{0} -> {1}", pair.Key, pair.Value);
}
IComparable<T>
Dictionary<TKey,TValue> relies on
Object.Equals() – for comparing the keys
Object.GetHashCode() – for calculating the hash codes of the keys
SortedDictionary<TKey,TValue> relies on IComparable<T> for ordering the keys
Built-in types like int, long, float, string and DateTime already implement Equals(), GetHashCode() and IComparable<T>
Other types used when used as dictionary keys should provide custom implementations
Implementing Equals() and GetHashCode()
publicstruct Point
{
publicint X { get; set; }
publicint Y { get; set; }
publicoverrideboolEquals(Object obj)
{
if (!(obj is Point) || (obj == null)) returnfalse;
Point p = (Point)obj;
return (X == p.X) && (Y == p.Y);
}
publicoverrideintGetHashCode()
{
return (X << 16 | X >> 16) ^ Y;
}
}
Implementing IComparable<T>
publicstruct Point : IComparable<Point>
{
publicint X { get; set; }
publicint Y { get; set; }
publicintCompareTo(Point otherPoint)
{
if (X != otherPoint.X)
{
returnthis.X.CompareTo(otherPoint.X);
}
else
{
returnthis.Y.CompareTo(otherPoint.Y);
}
}
}
Set and Bag ADTs
The abstract data type (ADT) "set" keeps a set of elements with no duplicates
Sets with duplicates are also known as ADT "bag"
Set operations:
Add(element)
Contains(element) → true / false
Delete(element)
Union(set) / Intersect(set)
Sets can be implemented in several ways
List, array, hash table, balanced tree, …
HashSet<T>
HashSet<T> implements ADT set by hash table
Elements are in no particular order
All major operations are fast:
Add(element) – appends an element to the set
Does nothing if the element already exists
Remove(element) – removes given element
Count – returns the number of elements
UnionWith(set) / IntersectWith(set) – performs union / intersection with another set
Write a program that extracts from a given sequence of strings all elements that present in it odd number of times. Example:
{C#, SQL, PHP, PHP, SQL, SQL } → {C#, SQL}
Exercises (2)
Implement the data structure "hash table" in a class HashTable<K,T>. Keep the data in array of lists of key-value pairs (LinkedList<KeyValuePair<K,T>>[]) with initial capacity of 16. When the hash table load runs over 75%, perform resizing to 2 times larger capacity. Implement the following methods and properties: Add(key,value), Find(key)→value, Remove( key), Count, Clear(), this[], Keys. Try to make the hash table to support iterating over its elements with foreach.
Implement the data structure "set" in a class HashedSet<T> using your class HashTable<K,T> to hold the elements. Implement all standard set operations like Add(T), Find(T), Remove(T), Count, Clear(), union and intersect.