- Data Structures Overview
- Linear Structures, Trees, Hash Tables, Others

- Algorithms Overview
- Algorithm Complexity
- Time and Memory Complexity
- Best, Average and Worst Case
- Asymptotic Notation `O(g)`

“In computer science, a *data structure* is a particular way of storing and organizing data in a computer so that it can be used efficiently.”

-- Wikipedia

- Examples of data structures:
- `Person` structure (first name + last name + age)
- Array of integers – `int[]`
- List of strings – `List<string>`
- Queue of people – `Queue<Person>`

- `Data structures` and `algorithms` are the foundation of computer programming
- Algorithmic thinking, problem solving and data structures are vital for software engineers
- All .NET developers should know when to use `T[]`, `LinkedList<T>`, `List<T>`, `Stack<T>`, `Queue<T>`, `Dictionary<K,T>`, `HashSet<T>`, `SortedDictionary<K,T>` and `SortedSet<T>`
- `Computational complexity` is important for algorithm design and efficient programming

- Primitive data types
- Numbers: `int`, `float`, `decimal`, …
- Text data: `char`, `string`, …

- Simple structures
- A group of fields stored together
- E.g. `DateTime`, `Point`, `Rectangle`, …

- Collections
- A set of elements (of the same type)
- E.g. array, list, stack, tree, hash-table, …

- An `Abstract Data Type (ADT)` is a data type together with its operations, whose properties are specified independently of any particular implementation
- An ADT is a set of definitions of operations
- Like the interfaces in C#

- An ADT can have multiple different `implementations`
- Different implementations can have different efficiency, inner logic and resource needs
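The idea of one ADT with several implementations can be sketched as follows (shown in Java; the `StackAdt` interface and both class names are illustrative, not part of any standard library):

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// The ADT: only the operations are specified, not how they work
interface StackAdt<T> {
    void push(T item);
    T pop();
    boolean isEmpty();
}

// Implementation 1: backed by a standard deque
class DequeStack<T> implements StackAdt<T> {
    private final Deque<T> items = new ArrayDeque<>();
    public void push(T item) { items.addFirst(item); }
    public T pop() { return items.removeFirst(); }
    public boolean isEmpty() { return items.isEmpty(); }
}

// Implementation 2: backed by a plain growable array; different inner
// logic and resource profile, but exactly the same ADT contract
class ArrayStack<T> implements StackAdt<T> {
    private Object[] items = new Object[4];
    private int size = 0;
    public void push(T item) {
        if (size == items.length) {
            items = Arrays.copyOf(items, size * 2); // grow when full
        }
        items[size++] = item;
    }
    @SuppressWarnings("unchecked")
    public T pop() { return (T) items[--size]; }
    public boolean isEmpty() { return size == 0; }
}
```

Code written against `StackAdt` works with either implementation unchanged, which is the point of separating the ADT from its implementations.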

- Linear structures
- Lists: fixed size and variable size
- Stacks: LIFO (Last In First Out) structure
- Queues: FIFO (First In First Out) structure
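The LIFO and FIFO behaviour can be observed directly with standard collections (a small sketch in Java's `ArrayDeque`; .NET's `Stack<T>` and `Queue<T>` behave the same way):

```java
import java.util.ArrayDeque;

public class LifoFifoDemo {
    // Stack: push "a", "b", "c" -> the LAST pushed element comes out first
    static String firstOutOfStack() {
        ArrayDeque<String> stack = new ArrayDeque<>();
        stack.push("a");
        stack.push("b");
        stack.push("c");
        return stack.pop(); // "c"
    }

    // Queue: add "a", "b", "c" -> the FIRST added element comes out first
    static String firstOutOfQueue() {
        ArrayDeque<String> queue = new ArrayDeque<>();
        queue.add("a");
        queue.add("b");
        queue.add("c");
        return queue.remove(); // "a"
    }
}
```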

- Trees and tree-like structures
- Binary, ordered search trees, balanced, B-trees, etc.

- Dictionaries (maps)
- Contain pairs (key, value)
- Hash tables: use hash functions to search/insert

- Sets and bags
- Set – collection of unique elements
- Bag – collection of non-unique elements

- Ordered sets, bags and dictionaries
- Priority queues / heaps
- Special tree structures
- Suffix tree, interval tree, index tree, trie

- Graphs
- Directed / undirected
- Weighted / un-weighted
- Connected / non-connected, …

“In mathematics and computer science, an *algorithm* is a step-by-step procedure for calculations. An algorithm is an effective method expressed as a finite list of well-defined instructions for calculating a function.”

-- Wikipedia

- The term “algorithm” is derived from the name of `Muḥammad Al-Khwārizmī`, a Persian mathematician and astronomer
- He described an algorithm for solving quadratic equations

- Algorithms are fundamental in programming
- Imperative (traditional) programming means to `describe in formal steps` how to do something
- Algorithm == sequence of operations (steps)
- Can include branches (conditional blocks) and repeated logic (loops)

- Algorithmic thinking (mathematical thinking, logical thinking, engineering thinking)
- Ability to decompose problems into formal sequences of steps (algorithms)

- Algorithms can be expressed as pseudocode, through flowcharts, or in program code

```
BFS(node)
{
    queue <- node
    while queue not empty
        v <- queue
        print v
        for each child c of v
            queue <- c
}
```
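The pseudocode can be turned into runnable code. A minimal sketch in Java, assuming the tree is given as a list of child lists per node (the representation is an assumption for the example):

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

public class BfsDemo {
    // Visits nodes level by level, mirroring the pseudocode line by line
    static void bfs(int root, List<List<Integer>> children, List<Integer> visited) {
        Queue<Integer> queue = new ArrayDeque<>();
        queue.add(root);                    // queue <- node
        while (!queue.isEmpty()) {          // while queue not empty
            int v = queue.remove();         // v <- queue
            visited.add(v);                 // print v (collected here instead)
            for (int c : children.get(v)) { // for each child c of v
                queue.add(c);               // queue <- c
            }
        }
    }
}
```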

- Sorting and searching
- Dynamic programming
- Graph algorithms
- DFS and BFS traversals

- Combinatorial algorithms
- Recursive algorithms

- Other algorithms
- Greedy algorithms, computational geometry, randomized algorithms, genetic algorithms

- Why should we analyze algorithms?
- Predict the resources the algorithm requires
- Computational time (CPU consumption)
- Memory space (RAM consumption)
- Communication bandwidth consumption

- The `running time` of an algorithm is the total number of primitive operations executed (machine-independent steps)
- Also known as `algorithm complexity`

- What to measure?
- CPU Time
- Memory
- Number of steps
- Number of particular operations
- Number of disk operations
- Number of network packets

- Asymptotic complexity

- `Worst-case` – an upper bound on the running time for any input of given size
- `Average-case` – assume all inputs of a given size are equally likely
- `Best-case` – a lower bound on the running time (the optimal case)

- Example: sequential search in a list of size `n`
- Worst-case: `n` comparisons
- Best-case: `1` comparison
- Average-case: `n/2` comparisons
- The algorithm runs in linear time – a linear number of operations
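The three cases can be made concrete with a sequential search (a sketch in Java; the comparison counter is added only to make the cost visible):

```java
public class SequentialSearch {
    static int comparisons; // number of comparisons made by the last search

    // Returns the index of key in array, or -1 if not found
    static int find(int[] array, int key) {
        comparisons = 0;
        for (int i = 0; i < array.length; i++) {
            comparisons++;
            if (array[i] == key) {
                return i;
            }
        }
        return -1;
    }
}
```

Searching for the first element costs 1 comparison (best case); searching for the last element, or a missing one, costs `n` comparisons (worst case).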

- `Algorithm complexity` is a rough estimation of the number of steps performed by a given computation, depending on the size of the input data
- Measured through `asymptotic notation` `O(g)`, where `g` is a function of the input data size
- Examples:
- Linear complexity `O(n)` – all elements are processed once (or a constant number of times)
- Quadratic complexity `O(n^{2})` – each of the elements is processed `n` times

- Asymptotic upper bound
- O-notation (Big O notation)

- For a given function `g(n)`, we denote by `O(g(n))` the set of functions that differ from `g(n)` by at most a constant factor

- `O(g(n))` = { `f(n)` : there exist positive constants `c` and `n₀` such that `f(n) <= c*g(n)` for all `n >= n₀` }

- Examples:
- 3 * n^{2} + n/2 + 12 ∈ O(n^{2})
- 4 * n * log_{2}(3*n+1) + 2*n - 1 ∈ O(n * log n)
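The first example can be checked numerically against the definition: with `c = 4` and `n₀ = 4`, `3n² + n/2 + 12 <= 4n²` holds for all `n >= 4` (a quick Java check; the constants are one valid choice among many):

```java
public class BigOCheck {
    // f(n) = 3*n^2 + n/2 + 12; claim: f(n) <= c*g(n) with g(n) = n^2, c = 4, n0 = 4
    static boolean boundHolds(int n) {
        double f = 3.0 * n * n + n / 2.0 + 12;
        double bound = 4.0 * n * n;
        return f <= bound;
    }

    // Verify the bound for every n from n0 up to nMax
    static boolean holdsUpTo(int nMax) {
        for (int n = 4; n <= nMax; n++) {
            if (!boundHolds(n)) {
                return false;
            }
        }
        return true;
    }
}
```

Below `n₀` the bound may fail (it does at `n = 1`), which is exactly why the definition only requires it for all `n >= n₀`.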

Complexity | Notation | Description |
---|---|---|
constant | O(1) | Constant number of operations, not depending on the input data size, e.g. n = 1 000 000 → 1-2 operations |
logarithmic | O(log n) | Number of operations proportional to log2(n), where n is the size of the input data, e.g. n = 1 000 000 000 → 30 operations |
linear | O(n) | Number of operations proportional to the input data size, e.g. n = 10 000 → 5 000 operations |
quadratic | O(n^{2}) | Number of operations proportional to the square of the size of the input data, e.g. n = 500 → 250 000 operations |
cubic | O(n^{3}) | Number of operations proportional to the cube of the size of the input data, e.g. n = 200 → 8 000 000 operations |
exponential | O(2^{n}), O(k^{n}), O(n!) | Exponential number of operations, fast-growing, e.g. n = 20 → 1 048 576 operations |

Complexity | 10 | 20 | 50 | 100 | 1000 | 10000 | 100000 |
---|---|---|---|---|---|---|---|
O(1) | < 1 s | < 1 s | < 1 s | < 1 s | < 1 s | < 1 s | < 1 s |
O(log(n)) | < 1 s | < 1 s | < 1 s | < 1 s | < 1 s | < 1 s | < 1 s |
O(n) | < 1 s | < 1 s | < 1 s | < 1 s | < 1 s | < 1 s | < 1 s |
O(n*log(n)) | < 1 s | < 1 s | < 1 s | < 1 s | < 1 s | < 1 s | < 1 s |
O(n^{2}) | < 1 s | < 1 s | < 1 s | < 1 s | < 1 s | 2 s | 3-4 min |
O(n^{3}) | < 1 s | < 1 s | < 1 s | < 1 s | 20 s | 5 hours | 231 days |
O(2^{n}) | < 1 s | < 1 s | 260 days | hangs | hangs | hangs | hangs |
O(n!) | < 1 s | hangs | hangs | hangs | hangs | hangs | hangs |
O(n^{n}) | 3-4 min | hangs | hangs | hangs | hangs | hangs | hangs |

- Complexity can be expressed as a formula of multiple variables, e.g.:
- An algorithm filling a matrix of size `n` x `m` with the natural numbers 1, 2, … will run in `O(n*m)`
- A traversal of a graph with `n` vertices and `m` edges will run in `O(n + m)`

- Memory consumption should also be considered, for example:
- Running time `O(n)` with memory requirement `O(n^{2})`
- n = 50 000 → `OutOfMemoryException` (an n x n matrix of `int` values needs ~10 GB)

- Sometimes a linear algorithm can be slower than a quadratic algorithm
- The hidden constant can be significant

- Example:
- Algorithm A makes `100*n` steps → `O(n)`
- Algorithm B makes `n*n/2` steps → `O(n^{2})`
- For `n < 200` algorithm B is faster

- Real-world example:
- Insertion sort is faster than quicksort for n <= 16
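The crossover point between algorithms A and B can be computed directly from the step counts in the example (a small Java sketch):

```java
public class HiddenConstant {
    static long stepsA(long n) { return 100 * n; }   // O(n), but a large constant
    static long stepsB(long n) { return n * n / 2; } // O(n^2), small constant

    // True when the "slower" quadratic algorithm B actually makes fewer steps
    static boolean bFaster(long n) {
        return stepsB(n) < stepsA(n);
    }
}
```

Solving `n*n/2 < 100*n` gives `n < 200`: B wins for every smaller input, and A takes over from `n = 200` onward.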

- A `polynomial-time` algorithm is one whose worst-case time complexity is bounded above by a polynomial function of its input size: `W(n) ∈ O(p(n))`
- Examples:
- Polynomial-time: log(n), n^{2}, 3n^{3} + 4n, 2 * n * log(n)
- Non-polynomial-time: 2^{n}, 3^{n}, n!, n^{n}
- Non-polynomial algorithms hang for large input data sets

- Computational complexity theory divides computational problems into several classes, such as P, NP and NP-complete

```
int FindMaxElement(int[] array)
{
    int max = array[0];
    for (int i = 1; i < array.Length; i++)
    {
        if (array[i] > max)
        {
            max = array[i];
        }
    }
    return max;
}
```

- Runs in `O(n)` where `n` is the size of the array
- The number of elementary steps is `~n`

```
long FindInversions(int[] array)
{
    long inversions = 0;
    for (int i = 0; i < array.Length; i++)
        for (int j = i + 1; j < array.Length; j++)
            if (array[i] > array[j])
                inversions++;
    return inversions;
}
```

- Runs in `O(n^{2})` where `n` is the size of the array
- The number of elementary steps is `~n*(n-1)/2`

```
decimal Sum3(int n)
{
    decimal sum = 0;
    for (int a = 0; a < n; a++)
        for (int b = 0; b < n; b++)
            for (int c = 0; c < n; c++)
                sum += a * b * c;
    return sum;
}
```

- Runs in cubic time `O(n^{3})`
- The number of elementary steps is `~n^{3}`

```
long SumMN(int n, int m)
{
    long sum = 0;
    for (int x = 0; x < n; x++)
        for (int y = 0; y < m; y++)
            sum += x * y;
    return sum;
}
```

- Runs in quadratic time `O(n*m)`
- The number of elementary steps is `~n*m`

```
long SumMN(int n, int m)
{
    long sum = 0;
    for (int x = 0; x < n; x++)
        for (int y = 0; y < m; y++)
            if (x == y)
                for (int i = 0; i < n; i++)
                    sum += i * x * y;
    return sum;
}
```

- Runs in quadratic time `O(n*m)`
- The number of elementary steps is `~n*m + min(m,n)*n`

```
decimal Calculation(int n)
{
    decimal result = 0;
    for (int i = 0; i < (1 << n); i++)
        result += i;
    return result;
}
```

- Runs in exponential time `O(2^{n})`
- The number of elementary steps is `~2^{n}`

```
decimal Factorial(int n)
{
    if (n == 0)
        return 1;
    else
        return n * Factorial(n - 1);
}
```

- Runs in linear time `O(n)`
- The number of elementary steps is `~n`

```
decimal Fibonacci(int n)
{
    if (n == 0)
        return 1;
    else if (n == 1)
        return 1;
    else
        return Fibonacci(n - 1) + Fibonacci(n - 2);
}
```

- Runs in exponential time `O(2^{n})`
- The number of elementary steps is `~Fib(n+1)`, where `Fib(k)` is the `k`-th Fibonacci number
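The exponential step count can be observed by counting the recursive calls (a Java port of the same function with a counter added; with this code's indexing the exact count works out to `2*Fib(n) - 1` calls, which grows at the same exponential rate):

```java
public class FibCalls {
    static long calls; // total number of invocations in the last run

    // Same naive recursion as above, instrumented with a call counter
    static long fib(int n) {
        calls++;
        if (n <= 1) {
            return 1;
        }
        return fib(n - 1) + fib(n - 2);
    }

    static long callsFor(int n) {
        calls = 0;
        fib(n);
        return calls;
    }
}
```

Already at `n = 10` the function is invoked 177 times to produce one number, and each further increment of `n` multiplies the work by roughly the golden ratio.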

- Data structures organize data for efficient use
- ADTs describe a set of operations
- Collections hold a group of elements

- Algorithms are sequences of steps for performing or calculating something
- Algorithm complexity is a rough estimation of the number of steps performed by a given computation
- Complexity can be logarithmic, linear, n log n, quadratic, cubic, exponential, etc.
- It allows estimating the speed of given code before its execution

© 2011-2020 **Nikolay Kostov's Blog**