- Data Structures Overview
- Linear Structures, Trees, Hash Tables, Others

- Algorithms Overview
- Algorithm Complexity
- Time and Memory Complexity
- Best, Average and Worst Case
- Asymptotic Notation `O(g)`

“In computer science, a *data structure* is a particular way of storing and organizing data in a computer so that it can be used efficiently.”

-- Wikipedia

- Examples of data structures:
- `Person` structure (first name + last name + age)
- Array of integers – `int[]`
- List of strings – `List<string>`
- Queue of people – `Queue<Person>`

- `Data structures` and `algorithms` are the foundation of computer programming
- Algorithmic thinking, problem solving and data structures are vital for software engineers
- All .NET developers should know when to use `T[]`, `LinkedList<T>`, `List<T>`, `Stack<T>`, `Queue<T>`, `Dictionary<K,T>`, `HashSet<T>`, `SortedDictionary<K,T>` and `SortedSet<T>`
- `Computational complexity` is important for algorithm design and efficient programming

- Primitive data types
- Numbers: `int`, `float`, `decimal`, …
- Text data: `char`, `string`, …

- Simple structures
- A group of fields stored together
- E.g. `DateTime`, `Point`, `Rectangle`, …

- Collections
- A set of elements (of the same type)
- E.g. array, list, stack, tree, hash-table, …

- An `Abstract Data Type (ADT)` is a data type together with its operations, whose properties are specified independently of any particular implementation
- An ADT is a set of definitions of operations
- Like the interfaces in C#

- An ADT can have multiple different `implementations`
- Different implementations can have different efficiency, inner logic and resource needs
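The idea of one ADT with several implementations can be sketched as follows (shown in Java; the `StackAdt` interface and both class names are illustrative, not part of any standard library):

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// The ADT: only the operations are specified, not how they work
interface StackAdt<T> {
    void push(T item);
    T pop();
    boolean isEmpty();
}

// Implementation 1: backed by a standard deque
class DequeStack<T> implements StackAdt<T> {
    private final Deque<T> items = new ArrayDeque<>();
    public void push(T item) { items.addFirst(item); }
    public T pop() { return items.removeFirst(); }
    public boolean isEmpty() { return items.isEmpty(); }
}

// Implementation 2: backed by a plain growable array; different inner
// logic and resource profile, but exactly the same ADT contract
class ArrayStack<T> implements StackAdt<T> {
    private Object[] items = new Object[4];
    private int size = 0;
    public void push(T item) {
        if (size == items.length) {
            items = Arrays.copyOf(items, size * 2); // grow when full
        }
        items[size++] = item;
    }
    @SuppressWarnings("unchecked")
    public T pop() { return (T) items[--size]; }
    public boolean isEmpty() { return size == 0; }
}
```

Code written against `StackAdt` works with either implementation unchanged, which is the point of separating the ADT from its implementations.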

- Linear structures
- Lists: fixed size and variable size
- Stacks: LIFO (Last In First Out) structure
- Queues: FIFO (First In First Out) structure
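The LIFO and FIFO behaviour can be observed directly with standard collections (a small sketch in Java's `ArrayDeque`; .NET's `Stack<T>` and `Queue<T>` behave the same way):

```java
import java.util.ArrayDeque;

public class LifoFifoDemo {
    // Stack: push "a", "b", "c" -> the LAST pushed element comes out first
    static String firstOutOfStack() {
        ArrayDeque<String> stack = new ArrayDeque<>();
        stack.push("a");
        stack.push("b");
        stack.push("c");
        return stack.pop(); // "c"
    }

    // Queue: add "a", "b", "c" -> the FIRST added element comes out first
    static String firstOutOfQueue() {
        ArrayDeque<String> queue = new ArrayDeque<>();
        queue.add("a");
        queue.add("b");
        queue.add("c");
        return queue.remove(); // "a"
    }
}
```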

- Trees and tree-like structures
- Binary, ordered search trees, balanced, B-trees, etc.

- Dictionaries (maps)
- Contain pairs (key, value)
- Hash tables: use hash functions to search/insert

- Sets and bags
- Set – collection of unique elements
- Bag – collection of non-unique elements

- Ordered sets, bags and dictionaries
- Priority queues / heaps
- Special tree structures
- Suffix tree, interval tree, index tree, trie

- Graphs
- Directed / undirected
- Weighted / un-weighted
- Connected / non-connected, …

“In mathematics and computer science, an *algorithm* is a step-by-step procedure for calculations. An algorithm is an effective method expressed as a finite list of well-defined instructions for calculating a function.”

-- Wikipedia

- The term “algorithm” is derived from the name of `Muḥammad Al-Khwārizmī`, a Persian mathematician and astronomer
- He described an algorithm for solving quadratic equations

- Algorithms are fundamental in programming
- Imperative (traditional) programming means to `describe in formal steps` how to do something
- Algorithm == sequence of operations (steps)
- Can include branches (conditional blocks) and repeated logic (loops)

- Algorithmic thinking (mathematical thinking, logical thinking, engineering thinking)
- Ability to decompose problems into formal sequences of steps (algorithms)

- Algorithms can be expressed as pseudocode, through flowcharts, or in program code

```
BFS(node)
{
    queue <- node
    while queue not empty
        v <- queue
        print v
        for each child c of v
            queue <- c
}
```
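The pseudocode can be turned into runnable code. A minimal sketch in Java, assuming the tree is given as a list of child lists per node (the representation is an assumption for the example):

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

public class BfsDemo {
    // Visits nodes level by level, mirroring the pseudocode line by line
    static void bfs(int root, List<List<Integer>> children, List<Integer> visited) {
        Queue<Integer> queue = new ArrayDeque<>();
        queue.add(root);                    // queue <- node
        while (!queue.isEmpty()) {          // while queue not empty
            int v = queue.remove();         // v <- queue
            visited.add(v);                 // print v (collected here instead)
            for (int c : children.get(v)) { // for each child c of v
                queue.add(c);               // queue <- c
            }
        }
    }
}
```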

- Sorting and searching
- Dynamic programming
- Graph algorithms
- DFS and BFS traversals

- Combinatorial algorithms
- Recursive algorithms

- Other algorithms
- Greedy algorithms, computational geometry, randomized algorithms, genetic algorithms

- Why should we analyze algorithms?
- Predict the resources the algorithm requires
- Computational time (CPU consumption)
- Memory space (RAM consumption)
- Communication bandwidth consumption

- The `running time` of an algorithm is the total number of primitive operations executed (machine-independent steps)
- Also known as `algorithm complexity`

- What to measure?
- CPU Time
- Memory
- Number of steps
- Number of particular operations
- Number of disk operations
- Number of network packets

- Asymptotic complexity

- `Worst-case` – an upper bound on the running time for any input of given size
- `Average-case` – assume all inputs of a given size are equally likely
- `Best-case` – a lower bound on the running time (the optimal case)

- Example: sequential search in a list of size `n`
- Worst-case: `n` comparisons
- Best-case: `1` comparison
- Average-case: `n/2` comparisons
- The algorithm runs in linear time – a linear number of operations
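The three cases can be made concrete with a sequential search (a sketch in Java; the comparison counter is added only to make the cost visible):

```java
public class SequentialSearch {
    static int comparisons; // number of comparisons made by the last search

    // Returns the index of key in array, or -1 if not found
    static int find(int[] array, int key) {
        comparisons = 0;
        for (int i = 0; i < array.length; i++) {
            comparisons++;
            if (array[i] == key) {
                return i;
            }
        }
        return -1;
    }
}
```

Searching for the first element costs 1 comparison (best case); searching for the last element, or a missing one, costs `n` comparisons (worst case).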

- `Algorithm complexity` is a rough estimation of the number of steps performed by a given computation, depending on the size of the input data
- Measured through `asymptotic notation` `O(g)`, where `g` is a function of the input data size
- Examples:
- Linear complexity `O(n)` – all elements are processed once (or a constant number of times)
- Quadratic complexity `O(n^{2})` – each of the elements is processed `n` times

- Asymptotic upper bound
- O-notation (Big O notation)

- For a given function `g(n)`, we denote by `O(g(n))` the set of functions that differ from `g(n)` by at most a constant factor

- `O(g(n))` = { `f(n)` : there exist positive constants `c` and `n₀` such that `f(n) <= c*g(n)` for all `n >= n₀` }

- Examples:
- 3 * n^{2} + n/2 + 12 ∈ O(n^{2})
- 4 * n * log_{2}(3*n+1) + 2*n - 1 ∈ O(n * log n)
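The first example can be checked numerically against the definition: with `c = 4` and `n₀ = 4`, `3n² + n/2 + 12 <= 4n²` holds for all `n >= 4` (a quick Java check; the constants are one valid choice among many):

```java
public class BigOCheck {
    // f(n) = 3*n^2 + n/2 + 12; claim: f(n) <= c*g(n) with g(n) = n^2, c = 4, n0 = 4
    static boolean boundHolds(int n) {
        double f = 3.0 * n * n + n / 2.0 + 12;
        double bound = 4.0 * n * n;
        return f <= bound;
    }

    // Verify the bound for every n from n0 up to nMax
    static boolean holdsUpTo(int nMax) {
        for (int n = 4; n <= nMax; n++) {
            if (!boundHolds(n)) {
                return false;
            }
        }
        return true;
    }
}
```

Below `n₀` the bound may fail (it does at `n = 1`), which is exactly why the definition only requires it for all `n >= n₀`.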

Complexity | Notation | Description |
---|---|---|
constant | O(1) | Constant number of operations, not depending on the input data size, e.g. n = 1 000 000 → 1-2 operations |
logarithmic | O(log n) | Number of operations proportional to log2(n), where n is the size of the input data, e.g. n = 1 000 000 000 → 30 operations |
linear | O(n) | Number of operations proportional to the input data size, e.g. n = 10 000 → 5 000 operations |
quadratic | O(n^{2}) | Number of operations proportional to the square of the size of the input data, e.g. n = 500 → 250 000 operations |
cubic | O(n^{3}) | Number of operations proportional to the cube of the size of the input data, e.g. n = 200 → 8 000 000 operations |
exponential | O(2^{n}), O(k^{n}), O(n!) | Exponential number of operations, fast-growing, e.g. n = 20 → 1 048 576 operations |

Complexity | 10 | 20 | 50 | 100 | 1000 | 10000 | 100000 |
---|---|---|---|---|---|---|---|
O(1) | < 1 s | < 1 s | < 1 s | < 1 s | < 1 s | < 1 s | < 1 s |
O(log(n)) | < 1 s | < 1 s | < 1 s | < 1 s | < 1 s | < 1 s | < 1 s |
O(n) | < 1 s | < 1 s | < 1 s | < 1 s | < 1 s | < 1 s | < 1 s |
O(n*log(n)) | < 1 s | < 1 s | < 1 s | < 1 s | < 1 s | < 1 s | < 1 s |
O(n^{2}) | < 1 s | < 1 s | < 1 s | < 1 s | < 1 s | 2 s | 3-4 min |
O(n^{3}) | < 1 s | < 1 s | < 1 s | < 1 s | 20 s | 5 hours | 231 days |
O(2^{n}) | < 1 s | < 1 s | 260 days | hangs | hangs | hangs | hangs |
O(n!) | < 1 s | hangs | hangs | hangs | hangs | hangs | hangs |
O(n^{n}) | 3-4 min | hangs | hangs | hangs | hangs | hangs | hangs |

- Complexity can be expressed as a formula of multiple variables, e.g.:
- An algorithm filling a matrix of size `n` x `m` with the natural numbers 1, 2, … will run in `O(n*m)`
- A traversal of a graph with `n` vertices and `m` edges will run in `O(n + m)`

- Memory consumption should also be considered, for example:
- Running time `O(n)` with memory requirement `O(n^{2})`
- n = 50 000 → `OutOfMemoryException` (an n x n matrix of `int` values needs ~10 GB)

- Sometimes a linear algorithm can be slower than a quadratic algorithm
- The hidden constant can be significant

- Example:
- Algorithm A makes `100*n` steps → `O(n)`
- Algorithm B makes `n*n/2` steps → `O(n^{2})`
- For `n < 200` algorithm B is faster

- Real-world example:
- Insertion sort is faster than quicksort for n <= 16
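The crossover point between algorithms A and B can be computed directly from the step counts in the example (a small Java sketch):

```java
public class HiddenConstant {
    static long stepsA(long n) { return 100 * n; }   // O(n), but a large constant
    static long stepsB(long n) { return n * n / 2; } // O(n^2), small constant

    // True when the "slower" quadratic algorithm B actually makes fewer steps
    static boolean bFaster(long n) {
        return stepsB(n) < stepsA(n);
    }
}
```

Solving `n*n/2 < 100*n` gives `n < 200`: B wins for every smaller input, and A takes over from `n = 200` onward.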

- A `polynomial-time` algorithm is one whose worst-case time complexity is bounded above by a polynomial function of its input size: `W(n) ∈ O(p(n))`
- Examples:
- Polynomial-time: log(n), n^{2}, 3n^{3} + 4n, 2 * n * log(n)
- Non-polynomial-time: 2^{n}, 3^{n}, n!, n^{n}
- Non-polynomial algorithms hang for large input data sets

- Computational complexity theory divides computational problems into several classes, such as P, NP and NP-complete

```
int FindMaxElement(int[] array)
{
    int max = array[0];
    for (int i = 1; i < array.Length; i++)
    {
        if (array[i] > max)
        {
            max = array[i];
        }
    }
    return max;
}
```

- Runs in `O(n)` where `n` is the size of the array
- The number of elementary steps is `~n`

```
long FindInversions(int[] array)
{
    long inversions = 0;
    for (int i = 0; i < array.Length; i++)
        for (int j = i + 1; j < array.Length; j++)
            if (array[i] > array[j])
                inversions++;
    return inversions;
}
```

- Runs in `O(n^{2})` where `n` is the size of the array
- The number of elementary steps is `~n*(n-1)/2`

```
decimal Sum3(int n)
{
    decimal sum = 0;
    for (int a = 0; a < n; a++)
        for (int b = 0; b < n; b++)
            for (int c = 0; c < n; c++)
                sum += a * b * c;
    return sum;
}
```

- Runs in cubic time `O(n^{3})`
- The number of elementary steps is `~n^{3}`

```
long SumMN(int n, int m)
{
    long sum = 0;
    for (int x = 0; x < n; x++)
        for (int y = 0; y < m; y++)
            sum += x * y;
    return sum;
}
```

- Runs in quadratic time `O(n*m)`
- The number of elementary steps is `~n*m`

```
long SumMN(int n, int m)
{
    long sum = 0;
    for (int x = 0; x < n; x++)
        for (int y = 0; y < m; y++)
            if (x == y)
                for (int i = 0; i < n; i++)
                    sum += i * x * y;
    return sum;
}
```

- Runs in quadratic time `O(n*m)`
- The number of elementary steps is `~n*m + min(m,n)*n`

```
decimal Calculation(int n)
{
    decimal result = 0;
    for (int i = 0; i < (1 << n); i++)
        result += i;
    return result;
}
```

- Runs in exponential time `O(2^{n})`
- The number of elementary steps is `~2^{n}`

```
decimal Factorial(int n)
{
    if (n == 0)
        return 1;
    else
        return n * Factorial(n - 1);
}
```

- Runs in linear time `O(n)`
- The number of elementary steps is `~n`

```
decimal Fibonacci(int n)
{
    if (n == 0)
        return 1;
    else if (n == 1)
        return 1;
    else
        return Fibonacci(n - 1) + Fibonacci(n - 2);
}
```

- Runs in exponential time `O(2^{n})`
- The number of elementary steps is `~Fib(n+1)`, where `Fib(k)` is the `k`-th Fibonacci number
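The exponential step count can be observed by counting the recursive calls (a Java port of the same function with a counter added; with this code's indexing the exact count works out to `2*Fib(n) - 1` calls, which grows at the same exponential rate):

```java
public class FibCalls {
    static long calls; // total number of invocations in the last run

    // Same naive recursion as above, instrumented with a call counter
    static long fib(int n) {
        calls++;
        if (n <= 1) {
            return 1;
        }
        return fib(n - 1) + fib(n - 2);
    }

    static long callsFor(int n) {
        calls = 0;
        fib(n);
        return calls;
    }
}
```

Already at `n = 10` the function is invoked 177 times to produce one number, and each further increment of `n` multiplies the work by roughly the golden ratio.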

- Data structures organize data for efficient use
- ADTs describe a set of operations
- Collections hold a group of elements

- Algorithms are sequences of steps for performing or calculating something
- Algorithm complexity is a rough estimation of the number of steps performed by a given computation
- Complexity can be logarithmic, linear, n log n, quadratic, cubic, exponential, etc.
- It allows estimating the speed of given code before its execution

© 2011-2020 **Nikolay Kostov's Blog**