using namespace Rcpp;
// [[Rcpp::export]]
NumericVector insertion_sort_rcpp(NumericVector x)
{
    int n = x.size();
    NumericVector A = clone(x);  // sort a copy, leaving the input unchanged
    for (int j = 1; j < n; j++) {
        double key = A[j];
        int i = j - 1;
        while (i > -1 && A[i] > key) {
            A[i + 1] = A[i];
            i = i - 1;
        }
        A[i + 1] = key;
    }
    return A;
}
```
---
# Insertion sort in C++
```r
(A <- round(runif(10), 2))
```
```
[1] 0.90 0.54 0.36 0.94 0.13 0.54 0.82 0.64 0.43 0.28
```
```r
insertion_sort_rcpp(A)
```
```
[1] 0.13 0.28 0.36 0.43 0.54 0.54 0.64 0.82 0.90 0.94
```
```r
A # unchanged
```
```
[1] 0.90 0.54 0.36 0.94 0.13 0.54 0.82 0.64 0.43 0.28
```
---
# Run time comparison
```r
trcpp <- sapply(n, timeSort, nrep = 5, sort.fun = insertion_sort_rcpp)
xyplot(tinsertion + tpython + trcpp ~ n, grid = TRUE, outer = TRUE, ylab = "time (seconds)")
```

---
# Run time comparison
```r
xyplot(tinsertion + tpython + trcpp ~ n, grid = TRUE, outer = TRUE,
scales = list(y = "free"), ylab = "time (seconds)")
```

---
# Run time comparison (for larger inputs)
```r
trcpp10 <- sapply(10 * n, timeSort, nrep = 5, sort.fun = insertion_sort_rcpp)
tsort <- sapply(10 * n, timeSort, nrep = 5, sort.fun = sort)
xyplot(trcpp10 + tsort ~ (10 * n), grid = TRUE, outer = TRUE, ylab = "time (seconds)", aspect = 1)
```

---
# Run time comparison (for larger inputs)
```r
tsort <- sapply(100 * n, timeSort, nrep = 5, sort.fun = sort)
xyplot(tsort ~ (100 * n), grid = TRUE, outer = TRUE, ylab = "time (seconds)", aspect = 1)
```

---
# Run time comparison: summary
* Run time may vary substantially depending on implementation
* Even a C++ implementation of insertion sort is much slower than the built-in `sort()` in R
* As a crude approximation, run time of insertion sort seems to be roughly quadratic in input size
* Can we validate this observation theoretically?
---
# Theoretical analysis of algorithms
* Analysis of an algorithm means predicting the resources required by it, e.g.,
* amount of memory
* amount of input-output
* (most commonly) amount of computational time
* This helps identify efficient algorithms when multiple candidates are available
* Such analysis may indicate multiple viable candidates, but helps to discard inferior ones
---
# Theoretical model
* Analysis of an algorithm requires a _model_ of the implementation technology
* Specifically, we need model for the resources and their associated costs
--
* We will assume a _single-processor random access machine (RAM)_ model
* This has a precise technical meaning, but for our purposes, it means that
* Instructions are executed one after another, with no concurrent
operations
* Accessing any location in memory has the same cost, regardless
of the location
---
# Theoretical model
* In particular, accessing variable values (memory look-up) requires constant time
* Arrays are assumed to occupy contiguous locations in memory
* In other words, location of $A[i]$ = location of $A[1]$ + constant * $(i-1)$
* So accessing any $A[i]$ has same cost
* Drawback: arrays cannot be resized without incurring significant cost (by copying)
---
# Theoretical model
* We can be more precise, by
* listing the set of basic instructions the machine can perform
* E.g., add, multiply, data copy, move, branching, etc.
* Model the cost of each such operation
* We will not try to be that precise
* With reasonable assumptions, we will still be able to do meaningful analysis
---
# Runtime analysis of insertion sort
* Intuitively clear that time taken by insertion sort depends on several factors:
- Size of the input (longer arrays will need more time)
- Whether the array is already (almost) sorted
- In that case the position of the key is found quickly in every step
* We need to formalize both these dependencies
--
* Notion of _input size_ depends on the context
* For sorting problem, length of the input array is the natural notion
* For multiplying two numbers, a reasonable notion may be their magnitudes
--
* To take the nature of input into account, we usually consider
* worst case
* best case
* average case
---
# How should we define "running time"?
* Ideally, sum of the times taken (or _cost_) for each basic
instruction in the machine.
* We take a slightly different approach
* Instead of assigning a cost to each basic instruction, we assign a
cost to each _step_ in our algorithm
* Then, count the number of times each step is executed
---
# Runtime analysis of insertion sort
* Try this for insertion sort
* Assume a cost for each line of the algorithm
.algorithm[
.name[`insertion-sort(A)`] cost
for (j = 2 to A.length) { $c_1$
key = A[j] $c_2$
i = j - 1 $c_3$
while (i > 0 and A[i] > key) { $c_4$
A[i+1] = A[i] $c_5$
i = i - 1 $c_6$
}
A[i+1] = key $c_7$
}
]
* We need to count the number of times each step is executed
* This depends on the number of times the while loop runs, which depends on the input
---
# Runtime analysis of insertion sort
* Let $t_j$ denote the number of times the while condition is tested for index $j$
* The last of these tests is false, so the loop body runs only $t_j - 1$ times
.algorithm[
.name[`insertion-sort(A)`] cost * times
for (j = 2 to A.length) { $c\_1 \times n$
key = A[j] $c\_2 \times (n-1)$
i = j - 1 $c\_3 \times (n-1)$
while (i > 0 and A[i] > key) { $c\_4 \times \sum\_{j=2}^n t\_j$
A[i+1] = A[i] $c\_5 \times \sum\_{j=2}^n (t\_j-1)$
i = i - 1 $c\_6 \times \sum\_{j=2}^n (t\_j-1)$
}
A[i+1] = key $c\_7 \times (n-1)$
}
]
* The total running time (cost) is
$$
T(n) = c_1 n + (c_2+c_3+c_7) (n-1) + c_4 \sum_{j=2}^n t_j +
(c_5+c_6) \sum_{j=2}^n (t_j - 1)
$$
---
# Runtime analysis of insertion sort
* Runtime of insertion sort
$$
T(n) = c_1 n + (c_2+c_3+c_7) (n-1) + c_4 \sum_{j=2}^n t_j +
(c_5+c_6) \sum_{j=2}^n (t_j - 1)
$$
* Depends on the values of $t_j$
* If input is already sorted, then $t_j=1$ for all $j$, and hence
$$
T(n) = c_1 n + (c_2+c_3+c_7+c_4) (n-1) = an + b,
$$
* In other words, $T(n)$ is linear in $n$, with coefficients $a$ and $b$ that depend on the costs $c_i$
* This is the _best case_ scenario
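* A rough empirical check of this (a sketch; it assumes the `insertion_sort_rcpp()` function compiled earlier is available):
```r
## Compare already-sorted input (best case) with random input of the same size
n <- 50000
x_random <- runif(n)
x_sorted <- sort(x_random)
system.time(insertion_sort_rcpp(x_sorted))  # best case: time roughly linear in n
system.time(insertion_sort_rcpp(x_random))  # random input: much slower
```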
---
# Runtime analysis of insertion sort
* Runtime of insertion sort
$$
T(n) = c_1 n + (c_2+c_3+c_7) (n-1) + c_4 \sum_{j=2}^n t_j +
(c_5+c_6) \sum_{j=2}^n (t_j - 1)
$$
* The _worst case_ scenario is when the array is reverse sorted
* In that case, $t_j = j$ for all $j$
* Noting that $\sum\limits_2^n j = \frac{n(n+1)}{2} - 1$ and $\sum\limits_2^n (j-1) = \frac{n(n-1)}{2}$, we have
$$
T(n) = an^2 + bn + c
$$
* In other words, $T(n)$ is quadratic, with coefficients $a, b, c$ that depend on the costs $c_i$
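* Written out (a routine calculation, substituting these sums into the expression for $T(n)$), the worst-case cost is
$$
T(n) = \frac{c_4+c_5+c_6}{2}\, n^2
+ \left(c_1 + c_2 + c_3 + \frac{c_4}{2} - \frac{c_5+c_6}{2} + c_7\right) n
- (c_2 + c_3 + c_4 + c_7)
$$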
---
# Runtime analysis of insertion sort
* The best case scenario is usually not of interest
* An algorithm is typically evaluated based on its worst case running time
* Another reasonable definition is the _average case_ running time
* For the sorting problem, this is defined as the
* Expected running time if the input is randomly ordered
* More precisely, "randomly ordered" means all permutations are equally likely
---
# Assignment
* Derive the average case running time of insertion sort
* Modify the insertion sort algorithm to return a _permutation_ that will sort the input
* Specifically, `p <- insertion_order(A)` should give an index vector `p` such that `A[p]` is sorted
* Implement this modified algorithm using both R and Rcpp
* To use Rcpp, you must first install a compiler and other tools (on Windows, available [from here](https://cran.rstudio.com/bin/windows/Rtools/))
* See also the [RStudio page for Rcpp](https://support.rstudio.com/hc/en-us/articles/200486088-Using-Rcpp-with-RStudio) for other resources
---
class: center middle
# Questions?
---
layout: true
# Order of growth
---
* Note that we have ignored the exact costs $c_i$ for each step
* Instead, we express the worst-case running time as $T(n) = an^2 + bn + c$
* As $n$ grows larger, this is dominated by the $n^2$ term
* Lower order terms (linear and constant) are asymptotically insignificant compared to $n^2$
* For this reason, we usually simplify further and say that the _order of growth_ of $T(n)$ is like $n^2$
* This is indicated using the notation
$$
T(n) = \Theta(n^2)
$$
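* A quick numerical illustration (the coefficients below are arbitrary, made up for illustration): the ratio $T(n)/n^2$ settles down to the leading coefficient as $n$ grows
```r
## Hypothetical coefficients of T(n) = a n^2 + b n + c
a <- 2; b <- 100; c <- 1000
n <- 10^(1:6)
(a * n^2 + b * n + c) / n^2  # approaches a = 2 as n increases
```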
---
* One algorithm is considered better than another if it has lower order of growth
* This is true even if the second one is faster for small input (as it will be slower for large enough input)
* If two algorithms have same order of growth, the coefficients may be important in practice
* However, theoretical analysis will usually consider them to be equivalent
---
layout: false
# Divide and Conquer
* Insertion sort is an _incremental algorithm_: modifies the input one step at a time
* Another common approach is known as "divide-and-conquer"
* Depends on a technique called _recursion_ (an algorithm calling itself)
* The basic idea is:
- **Divide** the problem into a number of subproblems that are smaller instances of the same problem
- **Conquer** the subproblems by solving them recursively
- **Combine** the solutions to the subproblems into the solution for the original problem
---
# Merge sort
* The first example of this we study is called merge sort
* Loosely, it operates as follows
- Divide: Divide the $n$-element sequence to be sorted into two
subsequences of $n/2$ elements each
- Conquer: Sort the two subsequences
* If a subsequence has length 1, it is already sorted, and there is nothing more to do
* Otherwise, sort it recursively using merge sort
- Combine: Merge the two sorted subsequences to produce the sorted answer
* The first two steps are essentially trivial
* Key operation: merge two sorted sequences in the "combine" step
---
# The merge step
* Done using an auxiliary procedure $\text{MERGE}(A, p, q, r)$, where
* $A$ is an array
* $p$, $q$, and $r$ are indices into the array such that $p \leq q < r$
* Assumes that subarrays $A[p,...,q]$ and $A[q+1,...,r]$ are in sorted order
* Goal is to merge them into a single sorted subarray that replaces the current subarray $A[p,...,r]$
---
# The merge step
* The essential idea of $\text{MERGE}$ is the following:
* Suppose we have two sorted piles on the table, with the smallest cards on top
* Start with a new empty pile
* Look at the top two cards, pick the smaller one, and add to new pile
* Repeat (if one pile becomes empty, always choose from the other)
---
# The merge step
.algorithm[
.name[`merge(A, p, q, r)`]
n$\sub{1}$ = q - p + 1
n$\sub{2}$ = r - q
Create new arrays L[1, ... , n$\sub{1}$+1] and R[1, ..., n$\sub{2}$+1]
__for__ (i = 1, ..., n$\sub{1}$) { L[i] = A[ p+i-1] }
__for__ (j = 1, ..., n$\sub{2}$) { R[j] = A[ q+j] }
L[ n$\sub{1}$+1 ] = $\infty$ // sentinel values
R[ n$\sub{2}$+1 ] = $\infty$ // ensures that L and R never become empty
i = 1
j = 1
__for__ (k = p, ..., r) {
__if__ (L[i] $\leq$ R[j]) {
A[k] = L[i]
i = i + 1
}
__else__ {
A[k] = R[j]
j = j + 1
}
}
]
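* The same procedure written in R, as a sketch that follows the pseudocode above (the name `merge_sorted` is ours, chosen to avoid clashing with `base::merge`):
```r
merge_sorted <- function(A, p, q, r) {
    L <- c(A[p:q], Inf)          # left sorted half plus sentinel
    R <- c(A[(q + 1):r], Inf)    # right sorted half plus sentinel
    i <- 1; j <- 1
    for (k in p:r) {
        if (L[i] <= R[j]) { A[k] <- L[i]; i <- i + 1 }
        else              { A[k] <- R[j]; j <- j + 1 }
    }
    A  # arguments are copied in R, so return the merged array
}
```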
---
# The merge step
* It is easy to see that the runtime of $\text{merge}$ is linear in $n=r-p+1$
* One comparison needed to fill every position
--
* To prove correctness, consider the loop invariant
> At the start of each iteration of the main for loop, the subarray
> $A[p,...,k-1]$ contains the $k-p$ smallest elements of
> $L[1,...,n_1+1]$ and $R[1,...,n_2+1]$ in sorted order. Also, of the
> remaining elements, $L[i]$ and $R[j]$ are the smallest elements in
> their respective arrays.
---
# Correctness of `merge`
### Initialization
* Prior to the first iteration, we have $k = p$, so that the subarray
$A[p,...,k-1]$ is empty
* This empty subarray contains the $k - p = 0$ smallest elements of
$L$ and $R$
* As $i = j = 1$, $L[i]$ and $R[j]$ are the respective smallest
elements not copied back into $A$
---
# Correctness of `merge`
### Maintenance
* Suppose that $L[i] \leq R[j]$
* Then $L[i]$ is the smallest element not yet copied back into $A$
* $A[p,...,k-1]$ already contains the $k-p$ smallest elements of $L$ and $R$
* So, after $L[i]$ is copied into $A[k]$, $A[p,...,k]$ will contain the $k - p + 1$ smallest elements
* Incrementing $k$ (in for loop) and $i$ reestablishes the loop invariant for the next iteration
* If instead $L[i] > R[j]$, then the other branch maintains the loop invariant
---
# Correctness of `merge`
### Termination
* At termination, $k = r + 1$
* By loop invariant,
> the subarray $A[p,...,k-1] \equiv A[p,...,r]$ contains the $k - p =
> r - p + 1$ smallest elements of $L[1,...,n_1+1]$ and $R[1,...,n_2+1]$,
> in sorted order
* The arrays $L$ and $R$ together contain $n_1 + n_2 + 2 = r - p + 3$ elements
* All but the two largest have been copied back into A, and these two largest elements are the sentinels
---
# Merge sort
* Using $\text{merge}$, the merge sort algorithm is now implemented as
.algorithm[
.name[`merge-sort(A, p, r)`]
if (p < r) {
q = floor( (p+r)/2 )
merge-sort(A, p, q)
merge-sort(A, q+1, r)
merge(A, p, q, r)
}
]
* In general, this sorts the subarray $A[p,...,r]$
* It is initially called as $\text{merge-sort}(A, 1, n)$ for an $n$-element input array
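* In R, again as a sketch (it uses the `merge_sorted()` helper sketched earlier; since R passes arguments by value, each recursive call returns its sorted copy):
```r
merge_sort <- function(A, p = 1, r = length(A)) {
    if (p < r) {
        q <- floor((p + r) / 2)
        A <- merge_sort(A, p, q)       # conquer: sort left half
        A <- merge_sort(A, q + 1, r)   # conquer: sort right half
        A <- merge_sorted(A, p, q, r)  # combine: merge the two halves
    }
    A
}
merge_sort(round(runif(10), 2))
```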
---
# Analysis of divide and conquer algorithms
* The runtime of merge sort can be expressed as a recurrence
$$
T(n) = \begin{cases}
\Theta(1) & n \leq 1 \cr
2 T(\lceil n/2 \rceil) + \Theta(n) & \text{otherwise}
\end{cases}
$$
* $\Theta(1)$ represents a constant cost of sorting a 0 or 1-element array
* The $\Theta(n)$ term is the cost of merging, including the
(constant) cost of computing the split
--
* We will later see a general result that helps to solve recurrences of this form
* For now, we will derive the solution for merge sort based on heuristic arguments
---
# Analysis of merge sort
* We do this by constructing a so-called _recursion tree_
* For convenience, we assume that the input size $n$ is an exact power of $2$
* This means that each split is of exactly half the size
* This lets us rewrite the recurrence in a simpler form:
$$T(n) =
\begin{cases}
c & n = 1 \cr
2 T(n/2) + cn & n > 1
\end{cases}$$
---
# Recursion tree for merge sort
.right-half[
]
.left-half[
* Main observations:
* Each level of the tree requires $cn$ time
* There are $1 + \log_2 n$ levels in total
* This gives a total runtime of
$$
T(n) = cn (1 + \log_2 n) = \Theta( n \log n )
$$
]
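* The closed form is easy to verify numerically (a sketch, taking $c = 1$ and $n$ a power of $2$):
```r
## Evaluate the simplified recurrence directly, with c = 1
Tn <- function(n) if (n <= 1) 1 else 2 * Tn(n / 2) + n
n <- 2^(1:15)
sapply(n, Tn) / (n * (1 + log2(n)))  # constant (equal to 1), as claimed
```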
---
# Growth of functions
* Before moving on, we will briefly discuss asymptotic growth notation
* Formally, we are interested in the behaviour of a function $f(n)$ as $n \to \infty$
* All functions we consider are from $\mathbb{N} \to \mathbb{R}$
* Sometimes we may abuse notation and consider functions with domain $\mathbb{R}$
---
# $\Theta$-notation
* Given a function $g: \mathbb{N} \to \mathbb{R}$, we define the _set_
\begin{eqnarray*}
\Theta(g(n)) = \lbrace f(n)~ &\mid & ~\exists~ c\sub{1}, c\sub{2} > 0 \text{ and } N \in \mathbb{N}
\text{ such that } \\
&& n \geq N \implies 0
\leq c\sub{1} g(n) \leq f(n) \leq c\sub{2} g(n)
\rbrace
\end{eqnarray*}
--
* That is, $f(n) \in \Theta(g(n))$ if $f(n)$ can be asymptotically bounded
on both sides by multiples of $g(n)$
--
* We will usually write $f(n) = \Theta(g(n))$ to mean the same thing.
--
* Note that this definition implicitly requires $f(n)$ to be asymptotically non-negative
* We will assume this here as well as for other asymptotic notations used in
this course.
* The $\Theta$ notation is used to indicate *exact* order of growth
* The next two notations indicate upper and lower bounds
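* For example, $3n^2 + 10n = \Theta(n^2)$: one convenient choice of constants is $c\sub{1} = 3$, $c\sub{2} = 4$, $N = 10$, since for all $n \geq 10$,
$$
0 \leq 3 n^2 \leq 3 n^2 + 10 n \leq 4 n^2
$$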
---
# $O$-notation
* The $O$-notation (usually pronounced "big-oh") indicates an asymptotic upper bound
$$
O(g(n)) = \left\lbrace f(n) ~\mid ~\exists~ c > 0 \text{ and }
N \in \mathbb{N} \text{ such that } n \geq N \implies
0 \leq f(n) \leq c g(n) \right\rbrace
$$
--
* As before, we usually write $f(n) = O(g(n))$ to mean $f(n) \in O(g(n))$
* Note that $f(n) = \Theta(g(n)) \implies f(n) = O(g(n))$, that is,
$\Theta(g(n)) \subseteq O(g(n))$
--
* The $O$-notation is important because upper bounds are often easier
to prove (than lower bounds)
* That is often a sufficiently useful characterization of an algorithm
---
# $\Omega$-notation
* The $\Omega$-notation (pronounced "big-omega") similarly indicates
an asymptotic lower bound
$$
\Omega(g(n)) = \left\lbrace f(n)~ \mid ~\exists~ c > 0 \text{ and }
N \in \mathbb{N} \text{ such that } n \geq N \implies
0 \leq c g(n) \leq f(n) \right\rbrace
$$
--
* The proof of the following theorem is an exercise:
$$f(n) = \Theta(g(n)) \iff f(n) = \Omega(g(n)) \text{ and } f(n) = O(g(n))$$
* So, for example, if $T(n)$ is the running time of insertion sort, then we can say that
$$T(n) = \Omega(n) \text{ and } T(n) = O(n^2)$$
* But *not* that
$$T(n) = \Theta(n) \text{ or } T(n) = \Theta(n^2)$$
* However, if $T(n)$ denotes the worst-case running time of insertion sort, then
$$
T(n) = \Theta(n^2)
$$
---
# Arithmetic with asymptotic notation
* We will often do casual arithmetic with asymptotic notation
* Most of the time this is OK, but we should be careful about potential ambiguity
* Example: Consider the statement
$$a n^2 + bn + c = an^2 + \Theta(n)$$
* Here we use $\Theta(n)$ to actually mean a *function* $f(n) \in
\Theta(n)$ (in this case, $f(n) = bn + c$)
* Similarly, we could write
$$2n^2 + \Theta(n) = \Theta(n^2)$$
* This means that whatever the choice of $f(n) \in \Theta(n)$ in the LHS, $2n^2 + f(n) = \Theta(n^2)$
---
# Arithmetic with asymptotic notation
* This kind of abuse of notation can sometimes lead to ambiguity
* For example, if $f(n) = \Theta(n)$, then
$$
\sum_{i=1}^n f(i) = \Theta(n(n+1)/2) = \Theta(n^2)
$$
--
* We may write the following to mean the same thing:
$$\sum_{i=1}^n \Theta(i)$$
--
* But this is not the same as $\Theta(1) + \Theta(2) + \cdots + \Theta(n)$
* This may not even make sense (what is $\Theta(2)$ ?)
* Each $\Theta(i)$ may represent a different function
---
# $o$- and $\omega$-notation
* The $O$- and $\Omega$-notations indicate bounds that may or may not
be asymptotically "tight"
* The "little-oh" and "little-omega" notations indicate strictly *non-tight* bounds
$$
o(g(n)) = \left\lbrace f(n) \colon ~\text{for all}~ c > 0, ~\exists~ N \in \mathbb{N}
\text{ such that } n \geq N \implies
0 \leq f(n) \leq c g(n) \right\rbrace
$$
* and
$$
\omega(g(n)) = \left\lbrace f(n) \colon ~\text{for all}~ c > 0, ~\exists~ N \in \mathbb{N}
\text{ such that } n \geq N \implies
0 \leq c g(n) \leq f(n) \right\rbrace
$$
---
# $o$- and $\omega$-notation
* Essentially, as $f(n)$ and $g(n)$ are asymptotically non-negative,
$$f(n) = o(g(n)) \implies \limsup \frac{f(n)}{g(n)} = 0 \implies \lim \frac{f(n)}{g(n)} = 0$$
* Similarly, $f(n) = \omega(g(n)) \implies \lim \frac{f(n)}{g(n)} = \infty$
* Refer to _Introduction to Algorithms_ (Cormen et al) for further properties of asymptotic notation
* We will use these properties as and when necessary
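* For example, $n = o(n^2)$ and $n^2 = \omega(n \log n)$, but $2n^2 \neq o(n^2)$, since $2n^2 / n^2 \to 2 \neq 0$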
---
# Analyzing Divide and Conquer algorithms
* Runtime analysis of divide-and-conquer algorithms usually involves solving a recurrence
* We have already seen the example of merge sort
--
* Let $T(n)$ be the running time on a problem of size $n$
--
* We can write
$$T(n) =
\begin{cases}
\Theta(1) & \text{if}~ n \leq c \cr
a T(n/b) + D(n) + C(n) & \text{otherwise}
\end{cases}$$
* where $T(n)$ is constant if the problem is small enough (say $n \leq c$ for some constant $c$), and otherwise
- the division step produces $a$ subproblems, each of size $n/b$
- $D(n)$ is the time taken to divide the problem into subproblems,
- $C(n)$ is the time taken to combine the sub-solutions.
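* For merge sort, for example, $a = b = 2$, the divide step costs $D(n) = \Theta(1)$, and the combine (merge) step costs $C(n) = \Theta(n)$, giving the recurrence we saw earlier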
---
# Analyzing Divide and Conquer algorithms
* There are three common methods to solve recurrences.
- The _substitution method_: guess a bound and then use mathematical
induction to prove it correct
- The _recursion-tree method_: convert the recurrence into a tree,
and use techniques for bounding summations to solve the
recurrence
- The _master method_ provides bounds for recurrences of the form
$T(n) = aT(n/b) + f(n)$ for certain functions $f(n)$ that cover
most common cases
---
# The substitution method
* The substitution method is basically to
1. Guess the form of the solution, and
2. Use mathematical induction to verify it
* Example (similar to merge sort): Find an upper bound for the recurrence
$$T(n) = 2 T(n/2) + n$$
* Suppose we guess that the solution is $T(n) = O(n \log\sub{2} n)$
* We need to prove that $T(n) \leq c n \log\sub{2} n$ for some constant $c > 0$
* Assume this holds for all positive $m < n$, in particular,
$$
T( n/2 ) \leq \frac{cn}{2} \log\sub{2} \frac{n}{2}
$$
---
# The substitution method
* Substituting, we have (provided $c \geq 1$)
$$
\begin{aligned}
T(n) & = 2 T( n/2 ) + n \cr
& \leq 2 \cdot \frac{cn}{2} \log\sub{2} \frac{n}{2} + n \cr
& = c n \log\sub{2} n - c n \log\sub{2} 2 + n \cr
& = c n \log\sub{2} n - c n + n \cr
& \leq c n \log\sub{2} n \cr
\end{aligned}
$$
---
# The substitution method
* Technically, we still need to prove the guess for a boundary condition.
* Let's try for $n=1$:
- Require $T(1) \leq c~1 \log\sub{2} 1 = 0$
- Not possible for any realistic value of $T(1)$
- So the claimed bound cannot hold for $n=1$
--
* However, for $n=2$:
- Require $T(2) \leq c~2 \log\sub{2} 2 = 2c$
- Can be made to hold for some choice of $c>1$, whatever the value of $T(2) = 2 T(1) + 2$
* Similarly for $T(3)$
* Note that for $n > 3$, the induction step never makes use of $T(1)$ directly
---
# The substitution method
* Remark: be careful not to use asymptotic notation in the induction step
* Consider this flawed "proof" to show $T(n) = O(n)$ for $T(n) = 2T(n/2) + n$, assuming $T(m) \leq cm$ for $m < n$:
$$
T(n) = 2 T(n/2) + n \leq 2 \cdot c \frac{n}{2} + n = cn + n = O(n) \qquad \text{(wrong!)}
$$
* The error: the induction step must establish the exact form of the guess, $T(n) \leq cn$, and $cn + n \not\leq cn$
---
# The master method
* Let $a \geq 1$ and $b > 1$ be constants, let
$f(n)$ be a function, and
$$T(n) = a T(n/b) + f(n)$$
* Here $n/b$ could also mean the floor or the ceiling of $n/b$
--
* Then $T(n)$ has the following asymptotic bounds:
1. If $f(n) = O(n^{\log_b a - \varepsilon})$ for some constant
$\varepsilon > 0$, then $T(n) = \Theta(n^{\log_b a})$
2. If $f(n) = \Theta(n^{\log_b a})$, then $T(n) = \Theta(n^{\log_b
a} \log\sub{2} n) = \Theta(f(n) \log\sub{2} n)$
3. If $f(n) = \Omega(n^{\log_b a + \varepsilon})$ for some
constant $\varepsilon > 0$, and if $a f(n/b) \leq c f(n)$ for
some constant $c < 1$ and all sufficiently large $n$, then
$T(n) = \Theta(f(n))$
---
# The master method
* We will not prove the master theorem
* Note that we are essentially comparing $f(n)$ with $n^{\log_b a}$
* whichever is bigger (by a polynomial factor) determines the solution
* If they are the same size, we get an additional $\log n$ factor
* Additionally, the third case needs a regularity condition on $f(n)$
* Exercise: Use the master theorem to obtain the asymptotic order for
$$
T(n) = T(n/2) + cn
$$
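* As a check, the master theorem recovers the merge sort result: for $T(n) = 2T(n/2) + \Theta(n)$ we have $a = b = 2$, so $n^{\log_b a} = n$, case 2 applies, and $T(n) = \Theta(n \log\sub{2} n)$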