Video: Introduction to Data Structures, Algorithms and Complexity (Bulgarian)
June 24, 2020
Nikolay Kostov (Nikolay.IT)
Topics covered:
Data Structures Overview
Linear Structures, Trees, Hash Tables, Others
Algorithms Overview
Algorithms Complexity
Time and Memory Complexity
Best, Average and Worst Case
Asymptotic Notation O(g)
Video (in Bulgarian)
Presentation Content
What is a Data Structure?
“In computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.”
-- Wikipedia
Examples of data structures:
Person structure (first name + last name + age)
Array of integers – int[]
List of strings – List<string>
Queue of people – Queue<Person>
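As a quick C# sketch of these examples (the Person class and the sample values below are hypothetical, used only for illustration):

using System.Collections.Generic;

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int Age { get; set; }
}

public class Examples
{
    public static void Main()
    {
        int[] numbers = { 1, 2, 3 };                      // array of integers – int[]
        var names = new List<string> { "Ana", "Boris" };  // list of strings – List<string>
        var people = new Queue<Person>();                 // queue of people – Queue<Person>
        people.Enqueue(new Person { FirstName = "Ivan", LastName = "Ivanov", Age = 25 });
    }
}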
Why are Data Structures so Important?
Data structures and algorithms are the foundation of computer programming
Algorithmic thinking, problem solving and data structures are vital for software engineers
All .NET developers should know when to use T[], LinkedList<T>, List<T>, Stack<T>, Queue<T>, Dictionary<K,T>, HashSet<T>, SortedDictionary<K,T> and SortedSet<T>
Computational complexity is important for algorithm design and efficient programming
Primitive Types and Collections
Primitive data types
Numbers: int, float, decimal, …
Text data: char, string, …
Simple structures
A group of fields stored together
E.g. DateTime, Point, Rectangle, …
Collections
A set of elements (of the same type)
E.g. array, list, stack, tree, hash-table, …
Abstract Data Types (ADT)
An Abstract Data Type (ADT) is
A data type together with the operations whose properties are specified independently of any particular implementation
An ADT is a set of definitions of operations
Like the interfaces in C#
ADT can have multiple different implementations
Different implementations can have different efficiency, inner logic and resource needs
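As a rough illustration of the last two points, a stack ADT could be expressed as a C# interface with more than one implementation (the names IStack<T> and ArrayStack<T> are made up for this sketch):

using System.Collections.Generic;

// The ADT: operations only, no implementation details.
public interface IStack<T>
{
    void Push(T item);
    T Pop();
    int Count { get; }
}

// One possible implementation, backed by a resizable array.
public class ArrayStack<T> : IStack<T>
{
    private readonly List<T> items = new List<T>();

    public int Count => items.Count;

    public void Push(T item) => items.Add(item);

    public T Pop()
    {
        T top = items[items.Count - 1];
        items.RemoveAt(items.Count - 1);
        return top;
    }
}
// A linked-list based implementation would offer the same operations
// with different efficiency and memory behavior.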
Basic Data Structures
Linear structures
Lists: fixed size and variable size
Stacks: LIFO (Last In First Out) structure
Queues: FIFO (First In First Out) structure
Trees and tree-like structures
Binary, ordered search trees, balanced, B-trees, etc.
Dictionaries (maps)
Contain pairs (key, value)
Hash tables: use hash functions to search/insert
Sets and bags
Set – collection of unique elements
Bag – collection of non-unique elements
Ordered sets, bags and dictionaries
Priority queues / heaps
Special tree structures
Suffix tree, interval tree, index tree, trie
Graphs
Directed / undirected
Weighted / un-weighted
Connected / non-connected, …
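A minimal sketch of how some of these structures are used from .NET (the sample values are arbitrary):

using System;
using System.Collections.Generic;

public class BasicStructuresDemo
{
    public static void Main()
    {
        var stack = new Stack<int>();       // LIFO
        stack.Push(1);
        stack.Push(2);
        Console.WriteLine(stack.Pop());     // 2 – the last element in is the first out

        var queue = new Queue<int>();       // FIFO
        queue.Enqueue(1);
        queue.Enqueue(2);
        Console.WriteLine(queue.Dequeue()); // 1 – the first element in is the first out

        var ages = new Dictionary<string, int>(); // (key, value) pairs in a hash table
        ages["Ivan"] = 25;

        var unique = new HashSet<int> { 1, 2, 2 }; // set – keeps only unique elements
        Console.WriteLine(unique.Count);           // 2
    }
}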
Algorithms
Introduction
What is an Algorithm?
"In mathematics and computer science, an algorithm is a step-by-step procedure for calculations. An algorithm is an effective method expressed as a finite list of well-defined instructions for calculating a function.”
-- Wikipedia
The term “algorithm” is derived from the name of Muḥammad Al-Khwārizmī, a Persian mathematician and astronomer
He described an algorithm for solving quadratic equations
Algorithms in Computer Science
Algorithms are fundamental in programming
Imperative (traditional) programming means to describe in formal steps how to do something
Algorithm == sequence of operations (steps)
Can include branches (conditional blocks) and repeated logic (loops)
Running time is measured as the total number of primitive operations executed (machine-independent steps)
Also known as algorithm complexity
Algorithmic Complexity
What to measure?
CPU Time
Memory
Number of steps
Number of particular operations
Number of disk operations
Number of network packets
Asymptotic complexity
Time Complexity
Worst-case
An upper bound on the running time for any input of given size
Average-case
Assume all inputs of a given size are equally likely
Best-case
The lower bound on the running time (the optimal case)
Time Complexity – Example
Sequential search in a list of size n
Worst-case:
n comparisons
Best-case:
1 comparison
Average-case:
n/2 comparisons
The algorithm runs in linear time
Linear number of operations
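A minimal C# sketch of the sequential search above (the method name is made up for this example):

// Returns the index of value in array, or -1 if it is not present.
int SequentialSearch(int[] array, int value)
{
    for (int i = 0; i < array.Length; i++)
    {
        if (array[i] == value)
        {
            return i;  // best case: 1 comparison (value is the first element)
        }
    }
    return -1;         // worst case: n comparisons (value is last or missing)
}
// Averaged over all positions, about n/2 comparisons are needed.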
Algorithms Complexity
Algorithm complexity is a rough estimation of the number of steps performed by a given computation, depending on the size of the input data
Measured through asymptotic notation
O(g) where g is a function of the input data size
Examples:
Linear complexity O(n) – all elements are processed once (or a constant number of times)
Quadratic complexity O(n²) – each of the elements is processed n times
Asymptotic Notation: Definition
Asymptotic upper bound
O-notation (Big O notation)
For a given function g(n), we denote by O(g(n)) the set of functions that grow no faster than g(n), up to a constant factor
O(g(n)) = { f(n) : there exist positive constants c and n₀ such that f(n) ≤ c·g(n) for all n ≥ n₀ }
Examples:
3n² + n/2 + 12 ∈ O(n²)
4n·log₂(3n + 1) + 2n - 1 ∈ O(n·log n)
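For example, 3n² + n/2 + 12 ∈ O(n²) can be checked directly against the definition: taking c = 4 and n₀ = 4, for every n ≥ 4 we have n/2 + 12 ≤ n², hence 3n² + n/2 + 12 ≤ 4n² = c·n².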
Typical Complexities
| Complexity  | Notation            | Description |
| ----------- | ------------------- | ----------- |
| constant    | O(1)                | Constant number of operations, not depending on the input data size, e.g. n = 1 000 000 → 1-2 operations |
| logarithmic | O(log n)            | Number of operations proportional to log₂(n) where n is the size of the input data, e.g. n = 1 000 000 000 → 30 operations |
| linear      | O(n)                | Number of operations proportional to the input data size, e.g. n = 10 000 → 5 000 operations |
| quadratic   | O(n²)               | Number of operations proportional to the square of the size of the input data, e.g. n = 500 → 250 000 operations |
| cubic       | O(n³)               | Number of operations proportional to the cube of the size of the input data, e.g. n = 200 → 8 000 000 operations |
| exponential | O(2ⁿ), O(kⁿ), O(n!) | Exponential number of operations, fast growing, e.g. n = 20 → 1 048 576 operations |
Time Complexity and Speed
| Complexity  | n = 10  | n = 20 | n = 50   | n = 100 | n = 1 000 | n = 10 000 | n = 100 000 |
| ----------- | ------- | ------ | -------- | ------- | --------- | ---------- | ----------- |
| O(1)        | < 1 s   | < 1 s  | < 1 s    | < 1 s   | < 1 s     | < 1 s      | < 1 s       |
| O(log(n))   | < 1 s   | < 1 s  | < 1 s    | < 1 s   | < 1 s     | < 1 s      | < 1 s       |
| O(n)        | < 1 s   | < 1 s  | < 1 s    | < 1 s   | < 1 s     | < 1 s      | < 1 s       |
| O(n*log(n)) | < 1 s   | < 1 s  | < 1 s    | < 1 s   | < 1 s     | < 1 s      | < 1 s       |
| O(n²)       | < 1 s   | < 1 s  | < 1 s    | < 1 s   | < 1 s     | 2 s        | 3-4 min     |
| O(n³)       | < 1 s   | < 1 s  | < 1 s    | < 1 s   | 20 s      | 5 hours    | 231 days    |
| O(2ⁿ)       | < 1 s   | < 1 s  | 260 days | hangs   | hangs     | hangs      | hangs       |
| O(n!)       | < 1 s   | hangs  | hangs    | hangs   | hangs     | hangs      | hangs       |
| O(nⁿ)       | 3-4 min | hangs  | hangs    | hangs   | hangs     | hangs      | hangs       |
Time and Memory Complexity
Complexity can be expressed as a formula of multiple variables, e.g.
An algorithm filling a matrix of size [n x m] with the natural numbers 1, 2, … will run in O(n*m)
A traversal of a graph with n vertices and m edges will run in O(n + m)
Memory consumption should also be considered, for example:
Running time O(n) and memory requirement O(n²)
n = 50 000 → OutOfMemoryException
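A hypothetical sketch of the situation above, assuming the O(n²) memory comes from an n x n matrix of 4-byte integers:

// For n = 50 000 the matrix needs about 4 * n * n bytes ≈ 10 GB,
// so the allocation fails with OutOfMemoryException even though
// an O(n) running time for this n would be perfectly acceptable.
int n = 50000;
int[,] matrix = new int[n, n];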
The Hidden Constant
Sometimes a linear algorithm could be slower than a quadratic algorithm
The hidden constant could be significant
Example:
Algorithm A makes: 100*n steps → O(n)
Algorithm B makes: n*n/2 steps → O(n²)
For n < 200, algorithm B is faster
Real-world example:
Insertion sort is faster than quicksort for n <= 16
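A small sketch that makes the crossover in the example above visible (using the same hypothetical step counts):

// Hypothetical step counts: A = 100*n (linear), B = n*n/2 (quadratic).
for (int n = 100; n <= 300; n += 100)
{
    long stepsA = 100L * n;
    long stepsB = (long)n * n / 2;
    Console.WriteLine($"n = {n}: A = {stepsA}, B = {stepsB}");
}
// Output: n = 100 -> A = 10000, B = 5000  (B is faster)
//         n = 200 -> A = 20000, B = 20000 (break-even)
//         n = 300 -> A = 30000, B = 45000 (A is faster)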
Polynomial Algorithms
A polynomial-time algorithm is one whose worst-case time complexity is bounded above by a polynomial function of its input size
W(n) ∈ O(p(n)), where p(n) is a polynomial of the input size n
Examples:
Polynomial-time: log(n), n², 3n³ + 4n, 2·n·log(n)
Non-polynomial-time: 2ⁿ, 3ⁿ, kⁿ, n!
Non-polynomial algorithms hang for large input data sets
Computational Classes
Computational complexity theory divides the computational problems into several classes, such as P, NP and NP-complete
Analyzing Complexity of Algorithms
Examples
Complexity Examples
int FindMaxElement(int[] array)
{
    int max = array[0];
    for (int i = 0; i < array.Length; i++)
    {
        if (array[i] > max)
        {
            max = array[i];
        }
    }
    return max;
}
Runs in O(n) where n is the size of the array
The number of elementary steps is ~n
long FindInversions(int[] array)
{
    long inversions = 0;
    for (int i = 0; i < array.Length; i++)
        for (int j = i + 1; j < array.Length; j++)
            if (array[i] > array[j])
                inversions++;
    return inversions;
}
Runs in O(n²) where n is the size of the array
The number of elementary steps is ~n*(n-1)/2
decimal Sum3(int n)
{
    decimal sum = 0;
    for (int a = 0; a < n; a++)
        for (int b = 0; b < n; b++)
            for (int c = 0; c < n; c++)
                sum += a * b * c;
    return sum;
}
Runs in cubic time O(n³)
The number of elementary steps is ~n³
long SumMN(int n, int m)
{
    long sum = 0;
    for (int x = 0; x < n; x++)
        for (int y = 0; y < m; y++)
            sum += x * y;
    return sum;
}
Runs in quadratic time O(n*m)
The number of elementary steps is ~n*m
long SumMN(int n, int m)
{
    long sum = 0;
    for (int x = 0; x < n; x++)
        for (int y = 0; y < m; y++)
            if (x == y)
                for (int i = 0; i < n; i++)
                    sum += i * x * y;
    return sum;
}
Runs in quadratic time O(n*m)
The number of elementary steps is ~n*m + min(m,n)*n
decimal Calculation(int n)
{
    decimal result = 0;
    for (int i = 0; i < (1 << n); i++)
        result += i;
    return result;
}
Runs in exponential time O(2ⁿ)
The number of elementary steps is ~2ⁿ
decimal Factorial(int n)
{
    if (n == 0)
        return 1;
    else
        return n * Factorial(n - 1);
}
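Runs in linear time O(n)
The number of elementary steps (recursive calls) is ~n
Such estimates can also be checked empirically, for example with System.Diagnostics.Stopwatch (a rough sketch; it assumes the FindMaxElement and FindInversions methods from the examples above are available as static methods reachable from this code):

using System;
using System.Diagnostics;

// Assumes FindMaxElement and FindInversions from the examples above are in scope.
int[] data = new int[20000];
var random = new Random();
for (int i = 0; i < data.Length; i++)
{
    data[i] = random.Next();
}

var stopwatch = Stopwatch.StartNew();
FindMaxElement(data);                       // O(n): ~20 000 steps, well under a millisecond
Console.WriteLine($"O(n):  {stopwatch.Elapsed}");

stopwatch.Restart();
FindInversions(data);                       // O(n²): ~200 000 000 steps, noticeably slower
Console.WriteLine($"O(n²): {stopwatch.Elapsed}");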