using namespace Rcpp;
// [[Rcpp::export]]
NumericVector insertion_sort_rcpp(NumericVector x)
{
    int n = x.size();
    NumericVector A = clone(x);  // sort a copy, leaving the input unchanged
    for (int j = 1; j < n; j++) {
        double key = A[j];
        int i = j - 1;
        while (i > -1 && A[i] > key) {
            A[i + 1] = A[i];
            i = i - 1;
        }
        A[i + 1] = key;
    }
    return A;
}
```
---
# Insertion sort in C++
```r
(A <- round(runif(10), 2))
```
```
[1] 0.90 0.54 0.36 0.94 0.13 0.54 0.82 0.64 0.43 0.28
```
```r
insertion_sort_rcpp(A)
```
```
[1] 0.13 0.28 0.36 0.43 0.54 0.54 0.64 0.82 0.90 0.94
```
```r
A # unchanged
```
```
[1] 0.90 0.54 0.36 0.94 0.13 0.54 0.82 0.64 0.43 0.28
```
---
# Run time comparison
```r
trcpp <- sapply(n, timeSort, nrep = 5, sort.fun = insertion_sort_rcpp)
xyplot(tinsertion + tpython + trcpp ~ n, grid = TRUE, outer = TRUE, ylab = "time (seconds)")
```

---
# Run time comparison
```r
xyplot(tinsertion + tpython + trcpp ~ n, grid = TRUE, outer = TRUE,
scales = list(y = "free"), ylab = "time (seconds)")
```

---
# Run time comparison (for larger inputs)
```r
trcpp10 <- sapply(10 * n, timeSort, nrep = 5, sort.fun = insertion_sort_rcpp)
tsort <- sapply(10 * n, timeSort, nrep = 5, sort.fun = sort)
xyplot(trcpp10 + tsort ~ (10 * n), grid = TRUE, outer = TRUE, ylab = "time (seconds)", aspect = 1)
```

---
# Run time comparison (for larger inputs)
```r
tsort <- sapply(100 * n, timeSort, nrep = 5, sort.fun = sort)
xyplot(tsort ~ (100 * n), grid = TRUE, outer = TRUE, ylab = "time (seconds)", aspect = 1)
```

---
# Run time comparison: summary
* Run time may vary substantially depending on implementation
* Even a C++ implementation of insertion sort is much slower than the built-in `sort()` in R
* As a crude approximation, run time of insertion sort seems to be roughly quadratic in input size
* Can we validate this observation theoretically?
---
# Theoretical analysis of algorithms
* Analysis of an algorithm means predicting the resources required by it, e.g.,
* amount of memory
* amount of input-output
* (most commonly) amount of computational time
* This helps identify efficient algorithms when multiple candidates are available
* Such analysis may indicate multiple viable candidates, but helps to discard inferior ones
---
# Theoretical model
* Analysis of an algorithm requires a _model_ of the implementation technology
* Specifically, we need model for the resources and their associated costs
--
* We will assume a _single-processor random access machine (RAM)_ model
* This has a precise technical meaning, but for our purposes, it means that
* Instructions are executed one after another, with no concurrent
operations
* Accessing any location in memory has the same cost, regardless
of the location
---
# Theoretical model
* In particular, accessing variable values (memory look-up) requires constant time
* Arrays are assumed to occupy contiguous locations in memory
* In other words, location of $A[i]$ = location of $A[1]$ + constant * $(i-1)$
* So accessing any $A[i]$ has same cost
* Drawback: arrays cannot be resized without incurring significant cost (by copying)
---
# Theoretical model
* We can be more precise, by
* listing the set of basic instructions the machine can perform
* E.g., add, multiply, data copy, move, branching, etc.
* Model the cost of each such operation
* We will not try to be that precise
* With reasonable assumptions, we will still be able to do meaningful analysis
---
# Runtime analysis of insertion sort
* Intuitively clear that time taken by insertion sort depends on several factors:
- Size of the input (longer arrays will need more time)
- Whether the array is already (almost) sorted
- In that case the position of the key is found quickly in every step
* We need to formalize both these dependencies
--
* Notion of _input size_ depends on the context
* For sorting problem, length of the input array is the natural notion
* For multiplying two numbers, a reasonable notion may be their magnitudes
--
* To take the nature of input into account, we usually consider
* worst case
* best case
* average case
---
# How should we define "running time"?
* Ideally, sum of the times taken (or _cost_) for each basic
instruction in the machine.
* We take a slightly different approach
* Instead of assigning a cost to each basic instruction, we assign a
cost to each _step_ in our algorithm
* Then, count the number of times each step is executed
---
# Runtime analysis of insertion sort
* Try this for insertion sort
* Assume a cost for each line of the algorithm
.algorithm[
.name[`insertion-sort(A)`] cost
for (j = 2 to A.length) { $c_1$
key = A[j] $c_2$
i = j - 1 $c_3$
while (i > 0 and A[i] > key) { $c_4$
A[i+1] = A[i] $c_5$
i = i - 1 $c_6$
}
A[i+1] = key $c_7$
}
]
* We need to count the number of times each step is executed
* This depends on the number of times the while loop runs, which depends on the input
---
# Runtime analysis of insertion sort
* Let $t_j$ denote the number of times the while condition is tested for index $j$
* The last of these tests is false, so the loop body runs only $t_j - 1$ times
.algorithm[
.name[`insertion-sort(A)`] cost * times
for (j = 2 to A.length) { $c\_1 \times n$
key = A[j] $c\_2 \times (n-1)$
i = j - 1 $c\_3 \times (n-1)$
while (i > 0 and A[i] > key) { $c\_4 \times \sum\_{j=2}^n t\_j$
A[i+1] = A[i] $c\_5 \times \sum\_{j=2}^n (t\_j-1)$
i = i - 1 $c\_6 \times \sum\_{j=2}^n (t\_j-1)$
}
A[i+1] = key $c\_7 \times (n-1)$
}
]
* The total running time (cost) is
$$
T(n) = c_1 n + (c_2+c_3+c_7) (n-1) + c_4 \sum_{j=2}^n t_j +
(c_5+c_6) \sum_{j=2}^n (t_j - 1)
$$
---
# Runtime analysis of insertion sort
* Runtime of insertion sort
$$
T(n) = c_1 n + (c_2+c_3+c_7) (n-1) + c_4 \sum_{j=2}^n t_j +
(c_5+c_6) \sum_{j=2}^n (t_j - 1)
$$
* Depends on the values of $t_j$
* If input is already sorted, then $t_j=1$ for all $j$, and hence
$$
T(n) = c_1 n + (c_2+c_3+c_7+c_4) (n-1) = an + b,
$$
* In other words, $T(n)$ is linear in $n$, with coefficients $a$ and $b$ that depend on the costs $c_i$
* This is the _best case_ scenario
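* A rough empirical check of this (a sketch; it assumes the `insertion_sort_rcpp()` function compiled earlier is available):
```r
## Compare already-sorted input (best case) with random input of the same size
n <- 50000
x_random <- runif(n)
x_sorted <- sort(x_random)
system.time(insertion_sort_rcpp(x_sorted))  # best case: time roughly linear in n
system.time(insertion_sort_rcpp(x_random))  # random input: much slower
```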
---
# Runtime analysis of insertion sort
* Runtime of insertion sort
$$
T(n) = c_1 n + (c_2+c_3+c_7) (n-1) + c_4 \sum_{j=2}^n t_j +
(c_5+c_6) \sum_{j=2}^n (t_j - 1)
$$
* The _worst case_ scenario is when the array is reverse sorted
* In that case, $t_j = j$ for all $j$
* Noting that $\sum\limits_2^n j = \frac{n(n+1)}{2} - 1$ and $\sum\limits_2^n (j-1) = \frac{n(n-1)}{2}$, we have
$$
T(n) = an^2 + bn + c
$$
* In other words, $T(n)$ is quadratic, with coefficients $a, b, c$ that depend on the costs $c_i$
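* Written out (a routine calculation, substituting these sums into the expression for $T(n)$), the worst-case cost is
$$
T(n) = \frac{c_4+c_5+c_6}{2}\, n^2
+ \left(c_1 + c_2 + c_3 + \frac{c_4}{2} - \frac{c_5+c_6}{2} + c_7\right) n
- (c_2 + c_3 + c_4 + c_7)
$$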
---
# Runtime analysis of insertion sort
* The best case scenario is usually not of interest
* An algorithm is typically evaluated based on its worst case running time
* Another reasonable definition is the _average case_ running time
* For the sorting problem, this is defined as the
* Expected running time if the input is randomly ordered
* More precisely, "randomly ordered" means all permutations are equally likely
---
# Assignment
* Derive the average case running time of insertion sort
* Modify the insertion sort algorithm to return a _permutation_ that will sort the input
* Specifically, `p <- insertion_order(A)` should give an index vector `p` such that `A[p]` is sorted
* Implement this modified algorithm using both R and Rcpp
* To use Rcpp, you must first install a compiler and other tools (on Windows, available [from here](https://cran.rstudio.com/bin/windows/Rtools/))
* See also the [RStudio page for Rcpp](https://support.rstudio.com/hc/en-us/articles/200486088-Using-Rcpp-with-RStudio) for other resources
---
class: center middle
# Questions?
---
layout: true
# Order of growth
---
* Note that we have ignored the exact costs $c_i$ for each step
* Instead, we express the worst-case running time as $T(n) = an^2 + bn + c$
* As $n$ grows larger, this is dominated by the $n^2$ term
* Lower order terms (linear and constant) are asymptotically insignificant compared to $n^2$
* For this reason, we usually simplify further and say that the _order of growth_ of $T(n)$ is like $n^2$
* This is indicated using the notation
$$
T(n) = \Theta(n^2)
$$
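* A quick numerical illustration (the coefficients below are arbitrary, made up for illustration): the ratio $T(n)/n^2$ settles down to the leading coefficient as $n$ grows
```r
## Hypothetical coefficients of T(n) = a n^2 + b n + c
a <- 2; b <- 100; c <- 1000
n <- 10^(1:6)
(a * n^2 + b * n + c) / n^2  # approaches a = 2 as n increases
```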
---
* One algorithm is considered better than another if it has lower order of growth
* This is true even if the second one is faster for small input (as it will be slower for large enough input)
* If two algorithms have same order of growth, the coefficients may be important in practice
* However, theoretical analysis will usually consider them to be equivalent
---
layout: false
# Divide and Conquer
* Insertion sort is an _incremental algorithm_: modifies the input one step at a time
* Another common approach is known as "divide-and-conquer"
* Depends on a technique called _recursion_ (an algorithm calling itself)
* The basic idea is:
- **Divide** the problem into a number of subproblems that are smaller instances of the same problem
- **Conquer** the subproblems by solving them recursively
- **Combine** the solutions to the subproblems into the solution for the original problem
---
# Merge sort
* The first example of this we study is called merge sort
* Loosely, it operates as follows
- Divide: Divide the $n$-element sequence to be sorted into two
subsequences of $n/2$ elements each
- Conquer: Sort the two subsequences
* If a subsequence has length 1, it is already sorted, and there is nothing more to do
* Otherwise, sort it recursively using merge sort
- Combine: Merge the two sorted subsequences to produce the sorted answer
* The first two steps are essentially trivial
* Key operation: merge two sorted sequences in the "combine" step
---
# The merge step
* Done using an auxiliary procedure $\text{MERGE}(A, p, q, r)$, where
* $A$ is an array
* $p$, $q$, and $r$ are indices into the array such that $p \leq q < r$
* Assumes that subarrays $A[p,...,q]$ and $A[q+1,...,r]$ are in sorted order
* Goal is to merge them into a single sorted subarray that replaces the current subarray $A[p,...,r]$
---
# The merge step
* The essential idea of $\text{MERGE}$ is the following:
* Suppose we have two sorted piles on the table, with the smallest cards on top
* Start with a new empty pile
* Look at the top two cards, pick the smaller one, and add to new pile
* Repeat (if one pile becomes empty, always choose from the other)
---
# The merge step
.algorithm[
.name[`merge(A, p, q, r)`]
n$\sub{1}$ = q - p + 1
n$\sub{2}$ = r - q
Create new arrays L[1, ... , n$\sub{1}$+1] and R[1, ..., n$\sub{2}$+1]
__for__ (i = 1, ..., n$\sub{1}$) { L[i] = A[ p+i-1] }
__for__ (j = 1, ..., n$\sub{2}$) { R[j] = A[ q+j] }
L[ n$\sub{1}$+1 ] = $\infty$ // sentinel values
R[ n$\sub{2}$+1 ] = $\infty$ // ensures that L and R never become empty
i = 1
j = 1
__for__ (k = p, ..., r) {
__if__ (L[i] $\leq$ R[j]) {
A[k] = L[i]
i = i + 1
}
__else__ {
A[k] = R[j]
j = j + 1
}
}
]
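* The same procedure written in R, as a sketch that follows the pseudocode above (the name `merge_sorted` is ours, chosen to avoid clashing with `base::merge`):
```r
merge_sorted <- function(A, p, q, r) {
    L <- c(A[p:q], Inf)          # left sorted half plus sentinel
    R <- c(A[(q + 1):r], Inf)    # right sorted half plus sentinel
    i <- 1; j <- 1
    for (k in p:r) {
        if (L[i] <= R[j]) { A[k] <- L[i]; i <- i + 1 }
        else              { A[k] <- R[j]; j <- j + 1 }
    }
    A  # arguments are copied in R, so return the merged array
}
```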
---
# The merge step
* It is easy to see that the runtime of $\text{merge}$ is linear in $n=r-p+1$
* One comparison needed to fill every position
--
* To prove correctness, consider the loop invariant
> At the start of each iteration of the main for loop, the subarray
> $A[p,...,k-1]$ contains the $k-p$ smallest elements of
> $L[1,...,n_1+1]$ and $R[1,...,n_2+1]$ in sorted order. Also, of the
> remaining elements, $L[i]$ and $R[j]$ are the smallest elements in
> their respective arrays.
---
# Correctness of `merge`
### Initialization
* Prior to the first iteration, we have $k = p$, so that the subarray
$A[p,...,k-1]$ is empty
* This empty subarray contains the $k - p = 0$ smallest elements of
$L$ and $R$
* As $i = j = 1$, $L[i]$ and $R[j]$ are the respective smallest
elements not copied back into $A$
---
# Correctness of `merge`
### Maintenance
* Suppose that $L[i] \leq R[j]$
* Then $L[i]$ is the smallest element not yet copied back into $A$
* $A[p,...,k-1]$ already contains the $k-p$ smallest elements of $L$ and $R$
* So, after $L[i]$ is copied into $A[k]$, $A[p,...,k]$ will contain the $k - p + 1$ smallest elements
* Incrementing $k$ (in for loop) and $i$ reestablishes the loop invariant for the next iteration
* If instead $L[i] > R[j]$, then the other branch maintains the loop invariant
---
# Correctness of `merge`
### Termination
* At termination, $k = r + 1$
* By loop invariant,
> the subarray $A[p,...,k-1] \equiv A[p,...,r]$ contains the $k - p =
> r - p + 1$ smallest elements of $L[1,...,n_1+1]$ and $R[1,...,n_2+1]$,
> in sorted order
* The arrays $L$ and $R$ together contain $n_1 + n_2 + 2 = r - p + 3$ elements
* All but the two largest have been copied back into A, and these two largest elements are the sentinels
---
# Merge sort
* Using $\text{merge}$, the merge sort algorithm is now implemented as
.algorithm[
.name[`merge-sort(A, p, r)`]
if (p < r) {
q = floor( (p+r)/2 )
merge-sort(A, p, q)
merge-sort(A, q+1, r)
merge(A, p, q, r)
}
]
* In general, this sorts the subarray $A[p,...,r]$
* It is initially called as $\text{merge-sort}(A, 1, n)$ for an $n$-element input array
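* In R, again as a sketch (it uses the `merge_sorted()` helper sketched earlier; since R passes arguments by value, each recursive call returns its sorted copy):
```r
merge_sort <- function(A, p = 1, r = length(A)) {
    if (p < r) {
        q <- floor((p + r) / 2)
        A <- merge_sort(A, p, q)       # conquer: sort left half
        A <- merge_sort(A, q + 1, r)   # conquer: sort right half
        A <- merge_sorted(A, p, q, r)  # combine: merge the two halves
    }
    A
}
merge_sort(round(runif(10), 2))
```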
---
# Analysis of divide and conquer algorithms
* The runtime of merge sort can be expressed as a recurrence
$$
T(n) = \begin{cases}
\Theta(1) & n \leq 1 \cr
2 T(\lceil n/2 \rceil) + \Theta(n) & \text{otherwise}
\end{cases}
$$
* $\Theta(1)$ represents a constant cost of sorting a 0 or 1-element array
* The $\Theta(n)$ term is the cost of merging, including the
(constant) cost of computing the split
--
* We will later see a general result that helps to solve recurrences of this form
* For now, we will derive the solution for merge sort based on heuristic arguments
---
# Analysis of merge sort
* We do this by constructing a so-called _recursion tree_
* For convenience, we assume that the input size $n$ is an exact power of $2$
* This means that each split is of exactly half the size
* This lets us rewrite the recurrence in a simpler form:
$$T(n) =
\begin{cases}
c & n = 1 \cr
2 T(n/2) + cn & n > 1
\end{cases}$$
---
# Recursion tree for merge sort
.right-half[
]
.left-half[
* Main observations:
* Each level of the tree requires $cn$ time
* There are $1 + \log_2 n$ levels in total
* This gives a total runtime of
$$
T(n) = cn (1 + \log_2 n) = \Theta( n \log n )
$$
]
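* The closed form is easy to verify numerically (a sketch, taking $c = 1$ and $n$ a power of $2$):
```r
## Evaluate the simplified recurrence directly, with c = 1
Tn <- function(n) if (n <= 1) 1 else 2 * Tn(n / 2) + n
n <- 2^(1:15)
sapply(n, Tn) / (n * (1 + log2(n)))  # constant (equal to 1), as claimed
```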
---
# Growth of functions
* Before moving on, we will briefly discuss asymptotic growth notation
* Formally, we are interested in the behaviour of a function $f(n)$ as $n \to \infty$
* All functions we consider are from $\mathbb{N} \to \mathbb{R}$
* Sometimes we may abuse notation and consider functions with domain $\mathbb{R}$
---
# $\Theta$-notation
* Given a function $g: \mathbb{N} \to \mathbb{R}$, we define the _set_
\begin{eqnarray*}
\Theta(g(n)) = \lbrace f(n)~ &\mid & ~\exists~ c\sub{1}, c\sub{2} > 0 \text{ and } N \in \mathbb{N}
\text{ such that } \\
&& n \geq N \implies 0
\leq c\sub{1} g(n) \leq f(n) \leq c\sub{2} g(n)
\rbrace
\end{eqnarray*}
--
* That is, $f(n) \in \Theta(g(n))$ if $f(n)$ can be asymptotically bounded
on both sides by multiples of $g(n)$
--
* We will usually write $f(n) = \Theta(g(n))$ to mean the same thing.
--
* Note that this definition implicitly requires $f(n)$ to be asymptotically non-negative
* We will assume this here as well as for other asymptotic notations used in
this course.
* The $\Theta$ notation is used to indicate *exact* order of growth
* The next two notations indicate upper and lower bounds
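* For example, $3n^2 + 10n = \Theta(n^2)$: one convenient choice of constants is $c\sub{1} = 3$, $c\sub{2} = 4$, $N = 10$, since for all $n \geq 10$,
$$
0 \leq 3 n^2 \leq 3 n^2 + 10 n \leq 4 n^2
$$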
---
# $O$-notation
* The $O$-notation (usually pronounced "big-oh") indicates an asymptotic upper bound
$$
O(g(n)) = \left\lbrace f(n) ~\mid ~\exists~ c > 0 \text{ and }
N \in \mathbb{N} \text{ such that } n \geq N \implies
0 \leq f(n) \leq c g(n) \right\rbrace
$$
--
* As before, we usually write $f(n) = O(g(n))$ to mean $f(n) \in O(g(n))$
* Note that $f(n) = \Theta(g(n)) \implies f(n) = O(g(n))$, that is,
$\Theta(g(n)) \subseteq O(g(n))$
--
* The $O$-notation is important because upper bounds are often easier
to prove (than lower bounds)
* That is often a sufficiently useful characterization of an algorithm
---
# $\Omega$-notation
* The $\Omega$-notation (pronounced "big-omega") similarly indicates
an asymptotic lower bound
$$
\Omega(g(n)) = \left\lbrace f(n)~ \mid ~\exists~ c > 0 \text{ and }
N \in \mathbb{N} \text{ such that } n \geq N \implies
0 \leq c g(n) \leq f(n) \right\rbrace
$$
--
* The proof of the following theorem is an exercise:
$$f(n) = \Theta(g(n)) \iff f(n) = \Omega(g(n)) \text{ and } f(n) = O(g(n))$$
* So, for example, if $T(n)$ is the running time of insertion sort, then we can say that
$$T(n) = \Omega(n) \text{ and } T(n) = O(n^2)$$
* But *not* that
$$T(n) = \Theta(n) \text{ or } T(n) = \Theta(n^2)$$
* However, if $T(n)$ denotes the worst-case running time of insertion sort, then
$$
T(n) = \Theta(n^2)
$$
---
# Arithmetic with asymptotic notation
* We will often do casual arithmetic with asymptotic notation
* Most of the time this is OK, but we should be careful about potential ambiguity
* Example: Consider the statement
$$a n^2 + bn + c = an^2 + \Theta(n)$$
* Here we use $\Theta(n)$ to actually mean a *function* $f(n) \in
\Theta(n)$ (in this case, $f(n) = bn + c$)
* Similarly, we could write
$$2n^2 + \Theta(n) = \Theta(n^2)$$
* This means that whatever the choice of $f(n) \in \Theta(n)$ in the LHS, $2n^2 + f(n) = \Theta(n^2)$
---
# Arithmetic with asymptotic notation
* This kind of abuse of notation can sometimes lead to ambiguity
* For example, if $f(n) = \Theta(n)$, then
$$
\sum_{i=1}^n f(i) = \Theta(n(n+1)/2) = \Theta(n^2)
$$
--
* We may write the following to mean the same thing:
$$\sum_{i=1}^n \Theta(i)$$
--
* But this is not the same as $\Theta(1) + \Theta(2) + \cdots + \Theta(n)$
* This may not even make sense (what is $\Theta(2)$ ?)
* Each $\Theta(i)$ may represent a different function
---
# $o$- and $\omega$-notation
* The $O$- and $\Omega$-notations indicate bounds that may or may not
be asymptotically "tight"
* The "little-oh" and "little-omega" notations indicate strictly *non-tight* bounds
$$
o(g(n)) = \left\lbrace f(n) \colon ~\text{for all}~ c > 0, ~\exists~ N \in \mathbb{N}
\text{ such that } n \geq N \implies
0 \leq f(n) \leq c g(n) \right\rbrace
$$
* and
$$
\omega(g(n)) = \left\lbrace f(n) \colon ~\text{for all}~ c > 0, ~\exists~ N \in \mathbb{N}
\text{ such that } n \geq N \implies
0 \leq c g(n) \leq f(n) \right\rbrace
$$
---
# $o$- and $\omega$-notation
* Essentially, as $f(n)$ and $g(n)$ are asymptotically non-negative,
$$f(n) = o(g(n)) \implies \limsup \frac{f(n)}{g(n)} = 0 \implies \lim \frac{f(n)}{g(n)} = 0$$
* Similarly, $f(n) = \omega(g(n)) \implies \lim \frac{f(n)}{g(n)} = \infty$
* Refer to _Introduction to Algorithms_ (Cormen et al) for further properties of asymptotic notation
* We will use these properties as and when necessary
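* For example, $n = o(n^2)$ and $n^2 = \omega(n \log n)$, but $2n^2 \neq o(n^2)$, since $2n^2 / n^2 \to 2 \neq 0$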
---
# Analyzing Divide and Conquer algorithms
* Runtime analysis of divide-and-conquer algorithms usually involves solving a recurrence
* We have already seen the example of merge sort
--
* Let $T(n)$ be the running time on a problem of size $n$
--
* We can write
$$T(n) =
\begin{cases}
\Theta(1) & \text{if}~ n \leq c \cr
a T(n/b) + D(n) + C(n) & \text{otherwise}
\end{cases}$$
* where $T(n)$ is constant if the problem is small enough (say $n \leq c$ for some constant $c$), and otherwise
- the division step produces $a$ subproblems, each of size $n/b$
- $D(n)$ is the time taken to divide the problem into subproblems,
- $C(n)$ is the time taken to combine the sub-solutions.
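* For merge sort, for example, $a = b = 2$, the divide step costs $D(n) = \Theta(1)$, and the combine (merge) step costs $C(n) = \Theta(n)$, giving the recurrence we saw earlier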
---
# Analyzing Divide and Conquer algorithms
* There are three common methods to solve recurrences.
- The _substitution method_: guess a bound and then use mathematical
induction to prove it correct
- The _recursion-tree method_: convert the recurrence into a tree,
and use techniques for bounding summations to solve the
recurrence
- The _master method_ provides bounds for recurrences of the form
$T(n) = aT(n/b) + f(n)$ for certain functions $f(n)$ that cover
most common cases
---
# The substitution method
* The substitution method is basically to
1. Guess the form of the solution, and
2. Use mathematical induction to verify it
* Example (similar to merge sort): Find an upper bound for the recurrence
$$T(n) = 2 T(n/2) + n$$
* Suppose we guess that the solution is $T(n) = O(n \log\sub{2} n)$
* We need to prove that $T(n) \leq c n \log\sub{2} n$ for some constant $c > 0$
* Assume this holds for all positive $m < n$, in particular,
$$
T( n/2 ) \leq \frac{cn}{2} \log\sub{2} \frac{n}{2}
$$
---
# The substitution method
* Substituting, we have (provided $c \geq 1$)
$$
\begin{aligned}
T(n) & = 2 T( n/2 ) + n \cr
& \leq 2 \cdot \frac{cn}{2} \log\sub{2} \frac{n}{2} + n \cr
& = c n \log\sub{2} n - c n \log\sub{2} 2 + n \cr
& = c n \log\sub{2} n - c n + n \cr
& \leq c n \log\sub{2} n \cr
\end{aligned}
$$
---
# The substitution method
* Technically, we still need to prove the guess for a boundary condition.
* Let's try for $n=1$:
- Require $T(1) \leq c~1 \log\sub{2} 1 = 0$
- Not possible for any realistic value of $T(1)$
- So the claimed bound cannot hold for $n=1$
--
* However, for $n=2$:
- Require $T(2) \leq c~2 \log\sub{2} 2 = 2c$
- Can be made to hold for some choice of $c>1$, whatever the value of $T(2) = 2 T(1) + 2$
* Similarly for $T(3)$
* Note that for $n > 3$, the induction step never makes use of $T(1)$ directly
---
# The substitution method
* Remark: be careful not to use asymptotic notation in the induction step
* Consider this flawed "proof" to show $T(n) = O(n)$ for $T(n) = 2T(n/2) + n$, assuming $T(m) \leq cm$ for $m < n$:
$$
T(n) = 2 T(n/2) + n \leq 2 \cdot c \frac{n}{2} + n = cn + n = O(n) \qquad \text{(wrong!)}
$$
* The error: the induction step must establish the exact form of the guess, $T(n) \leq cn$, and $cn + n \not\leq cn$
---
# The master method
* Let $a \geq 1$ and $b > 1$ be constants, let
$f(n)$ be a function, and
$$T(n) = a T(n/b) + f(n)$$
* Here $n/b$ could also mean the floor or the ceiling of $n/b$
--
* Then $T(n)$ has the following asymptotic bounds:
1. If $f(n) = O(n^{\log_b a - \varepsilon})$ for some constant
$\varepsilon > 0$, then $T(n) = \Theta(n^{\log_b a})$
2. If $f(n) = \Theta(n^{\log_b a})$, then $T(n) = \Theta(n^{\log_b
a} \log\sub{2} n) = \Theta(f(n) \log\sub{2} n)$
3. If $f(n) = \Omega(n^{\log_b a + \varepsilon})$ for some
constant $\varepsilon > 0$, and if $a f(n/b) \leq c f(n)$ for
some constant $c < 1$ and all sufficiently large $n$, then
$T(n) = \Theta(f(n))$
---
# The master method
* We will not prove the master theorem
* Note that we are essentially comparing $f(n)$ with $n^{\log_b a}$
* whichever is bigger (by a polynomial factor) determines the solution
* If they are the same size, we get an additional $\log n$ factor
* Additionally, the third case needs a regularity condition on $f(n)$
* Exercise: Use the master theorem to obtain the asymptotic order for
$$
T(n) = T(n/2) + cn
$$
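* As a check, the master theorem recovers the merge sort result: for $T(n) = 2T(n/2) + \Theta(n)$ we have $a = b = 2$, so $n^{\log_b a} = n$, case 2 applies, and $T(n) = \Theta(n \log\sub{2} n)$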