# Introduction

## Introductory Computer Programming

### Deepayan Sarkar

---

# About this course

- This is a __compulsory non-credit course__

- Does not count towards composite score
	
	- But you still need to pass (the pass mark is 35%)

- Goal: Develop programming skills and learn related theory

- These will be useful in several other courses

---

# About this course

- Will run over two semesters

- First semester:

	- 9:30-11:00am Tuesdays and Thursdays

	- 3:30-4:30pm Friday (lab session)

- Only first month!

---

# Official syllabus

- Basics in Programming: flow-charts, logic in programming

- Common syntax

- Handling input/output files

- Sorting

- Iterative algorithms

- Simulations from statistical distributions

- Programming for statistical data analyses: regression, estimation,
  parametric tests

---

# Tentative plan

- Introduction to basic ideas

- Practice [Semester I]

	- A high-level overview of R

	- Basic usage of R

- Theory [Semester II]

	- Algorithms: correctness and runtime analysis (mostly sorting)

	- Computer representation of numbers

	- Conditioning and stability

---

# Grading scheme

- Written exam: 20%

- Practical exam: 20%

- Assignments: 30%

- Projects: 30%

---

# Exercise

- Think of tasks that cannot be easily done without a computer

- Could be both related and unrelated to what you are studying

---

# Some simple numeric examples

- Problems involving scalar objects only:

	- Is a given natural number $n \in \mathbb{N}$ prime?

	- Given integer $k \geq 0$, compute its factorial $k!$, and $\log k!$

	- Given integers $n, k \geq 0$ such that $k \leq n$, compute $n \choose k$
	
--

- Problems that most likely will need more complicated objects to be solved:

	- Find all prime numbers less than a given number $N$

	- Sort a given collection (vector) of numbers

	- Produce a random permutation of a given vector of numbers

	- Given set $S$ and query object $x$, determine whether $x \in S$ (set membership)

- We will discuss vectors, but probably not more complicated data structures
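
- As a rough illustration of why a vector helps, here is a minimal Python sketch for finding all primes less than $N$. It uses the sieve of Eratosthenes, which is only one possible approach; the function name `primes_below` is made up for this example.

```python
def primes_below(N):
    """Return a list of all primes strictly less than N (sieve of Eratosthenes)."""
    if N < 3:
        return []                      # there are no primes below 2
    # is_candidate[i] stays True until i is found to be a multiple of a smaller prime
    is_candidate = [True] * N
    is_candidate[0] = is_candidate[1] = False
    i = 2
    while i * i < N:
        if is_candidate[i]:
            # cross out the multiples of i, starting from i * i
            for j in range(i * i, N, i):
                is_candidate[j] = False
        i = i + 1
    return [i for i in range(N) if is_candidate[i]]

print(primes_below(20))   # [2, 3, 5, 7, 11, 13, 17, 19]
```

- The vector `is_candidate` has to remember something about every number below $N$, which is why scalar variables alone are not enough here.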

---

# Some examples of simulation

- Simple random walk (each step is +1 with probability $p$ and -1 with probability $1-p$):

	- How long does it take to return to zero for the first time?

	- When was the last return to zero before time $2n$?
--

- Toss a coin (with probability of head $p$) until you get $k$
  consecutive heads. 
  
    - Based on observed value, can you test for $p = \frac12$?
--
  
- Given a game of snakes and ladders, how many throws of the dice does
  it take to reach the end?
--

- Shuffle a deck of cards.

	- How can we probabilistically model a shuffle?

	- How many times do we need to shuffle to make the deck
	  approximately random?

	- How can we "test" for randomness?
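
- To give a flavour of what such a simulation might look like, here is a minimal Python sketch of the coin-tossing example. The function name `tosses_until_run`, the seed, and the number of repetitions are all arbitrary choices made for illustration; the sketch assumes $k \geq 1$ and $p > 0$.

```python
import random

def tosses_until_run(k, p, rng):
    """Toss a coin with P(head) = p until k consecutive heads appear.
    Returns the total number of tosses needed."""
    tosses = 0
    run = 0                       # length of the current run of heads
    while run < k:
        tosses = tosses + 1
        if rng.random() < p:      # head, with probability p
            run = run + 1
        else:                     # tail: the run is broken
            run = 0
    return tosses

rng = random.Random(2024)         # fixed seed, so that results are reproducible
samples = [tosses_until_run(3, 0.5, rng) for _ in range(10000)]
print(sum(samples) / len(samples))
```

- For a fair coin, the expected number of tosses is $2^{k+1} - 2$, so the printed average should be close to 14 when $k = 3$.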

---

# Some generally important problems

$\newcommand{\bs}{\boldsymbol}$

- Given a function $f$, solve $f(x) = 0$ for $x$, e.g.,

	- solve non-linear equations like $e^x + \sin x = 0$

	- solve linear equations $A \bs{x} = \bs{b}$ for
	  vector $\bs{x}$ (e.g., as part of fitting linear models)
--

- Optimization: given a function $f$, find $\bs{x}$ where $f(\bs{x})$ is minimized

	- Sometimes this can be done by solving $\nabla f(\bs{x}) = \bs{0}$

	- Other solutions may be more practical depending on context
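
- As a small illustration, the non-linear example $e^x + \sin x = 0$ could be solved numerically by bisection. This Python sketch is one possible method, not necessarily the one used in this course; `bisect_root`, the tolerance, and the starting interval $[-1, 0]$ are assumptions made for this example.

```python
import math

def bisect_root(f, lo, hi, tol=1e-10):
    """Approximate a root of f in [lo, hi], assuming f is continuous
    and f(lo), f(hi) have opposite signs."""
    if f(lo) * f(hi) > 0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:   # sign change in the left half
            hi = mid
        else:                     # sign change in the right half
            lo = mid
    return (lo + hi) / 2

def f(x):
    return math.exp(x) + math.sin(x)

root = bisect_root(f, -1.0, 0.0)  # f(-1) < 0 < f(0)
print(root, f(root))
```

- The interval is halved at every step, so the root (near $-0.59$) is located to within the tolerance after a few dozen iterations.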

---

# Algorithms

- We will spend a lot of time discussing algorithms

- An algorithm is essentially a set of instructions to solve a problem

- Algorithms usually require some inputs

- Instructions are executed sequentially, finally resulting in an
  output (also called _return value_)

- You can think of an algorithm as a recipe (inputs: ingredients, output: food!)

---

# Example: is a given number $n$ prime?

---

- Basic idea: see if $n$ is divisible by any number between $2$ and $n-1$

- It is enough to check for divisors between $2$ and $\sqrt{n}$: if $n = ab$ with $a \leq b$, then $a \leq \sqrt{n}$

- Intuitively, the second approach is more "efficient"

- Also, we can stop as soon as we find the first divisor

---

- Simple algorithms are often easy to understand as a _flowchart_

![flowchart](extra/isprime-flowchart.svg)

---

- But we will usually write algorithms in the form of _pseudo-code_ as follows:

.algorithm[
.name[`is\_prime(n)`]
i := 2
__while__ (i $\leq$ sqrt(n)) {
    __if__ (n mod i == 0) {
        __return__ FALSE
    }
    i := i + 1
}
__return__ TRUE
]

- Here we skip checking whether $n > 1$ (and that it is an integer)
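
- A sketch of how the skipped checks might be added, written in Python. The name `is_prime_checked` and the decision to raise an error for non-integer input are assumptions made for illustration.

```python
import math

def is_prime_checked(n):
    """is_prime with the input checks that the pseudo-code skips."""
    if not isinstance(n, int):
        raise TypeError("n must be an integer")
    if n <= 1:                    # 0, 1 and negative numbers are not prime
        return False
    i = 2
    while i <= math.isqrt(n):     # isqrt(n) is the integer part of sqrt(n)
        if n % i == 0:
            return False
        i = i + 1
    return True

print(is_prime_checked(1), is_prime_checked(2), is_prime_checked(97))   # False True True
```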

---

# How to interpret an algorithm?

---

- The meaning of this algorithm / pseudo-code should be more or less obvious

- Assumes availability of certain basic operators / functions (mod, sqrt)

- We often employ some _conventions_ and use some _structures_ in pseudo-code

- For example,

.algorithm[
.name[`is\_prime(n)`]
i := 2                    // variable assignment
__while__ (i $\leq$ sqrt(n)) {    // loop while condition holds
    __if__ (n mod i == 0) {   // branch if condition holds
        __return__ FALSE      // exits with output value
    }                     // end of blocks within loops, branches, etc.
    i := i + 1            // update variable value
}
__return__ TRUE
]

---

- It is important to make sure that an algorithm makes sense

- Steps are executed sequentially, so the sequence must be clear
	
- It must be possible to evaluate each step

- All variables used must have been defined in a previous step
	
	- It is OK to call other functions (or algorithms), but they must be clearly defined

- It is even OK for an algorithm to call itself (this is known as _recursion_)
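
- As an illustration of recursion, here is a minimal Python sketch of Euclid's algorithm for the greatest common divisor. This example is not part of the course material above; it is chosen only because the recursive call is easy to follow.

```python
def gcd(a, b):
    """Greatest common divisor of non-negative integers a and b (Euclid's algorithm)."""
    if b == 0:                # base case: stops the recursion
        return a
    return gcd(b, a % b)      # recursive call with a strictly smaller second argument

print(gcd(48, 18))   # calls gcd(18, 12), gcd(12, 6), gcd(6, 0), so the result is 6
```

- Every recursive algorithm needs a base case like `b == 0`; otherwise it would never stop.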

---

# Pseudo-code

---

- The general structure of algorithms is derived from a language called [ALGOL](https://en.wikipedia.org/wiki/ALGOL)

- However, there are no fixed rules that pseudo-code must follow

- An alternative form of our `is_prime` algorithm could be:

.algorithm[
.name[`is\_prime(n)`]
i = 2                  // different assignment operator
__while__ i $\leq$ sqrt(n)     // end of loop indicated by indentation
    __if__ n mod i == 0
        __return__ FALSE
    i = i + 1
__return__ TRUE
]

---

- Another form could be:

.algorithm[
.name[`is\_prime(n)`]
i $\leftarrow$ 2                 // yet another assignment operator
__while__ i $\leq$ sqrt(n)     // end of loop indicated by __end__ keyword
    __if__ n mod i == 0
        __return__ FALSE
    __end__
    i $\leftarrow$ i + 1
__end__
__return__ TRUE
]

- Any of these forms is fine as long as

	- the steps of the algorithm are clearly specified

	- the essential ideas are expressed without ambiguity

---

# Theoretical questions about algorithms

* __Is an algorithm correct?__ To be correct, an algorithm must

	- stop after a finite number of steps, and

	- produce the _correct output_ for _all possible inputs_ (i.e.,
	  all _instances_ of the problem).

* __How efficient is the algorithm?__

	- What resources does the algorithm need to run, typically in
	  terms of time and storage?

	- How does it compare with other algorithms for the same problem?

* To answer such questions, we need a model for computation
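
* One informal way to compare two algorithms is to count how many basic steps each one performs. The Python sketch below does this for the two divisor ranges from the primality example; the helper `count_checks` is hypothetical and is not a substitute for a proper model of computation.

```python
import math

def count_checks(n, limit):
    """Try candidate divisors 2, 3, ..., limit of n, stopping at the first divisor.
    Returns (is_prime, number_of_divisibility_checks)."""
    checks = 0
    i = 2
    while i <= limit:
        checks = checks + 1
        if n % i == 0:
            return (False, checks)
        i = i + 1
    return (True, checks)

n = 101
print(count_checks(n, n - 1))           # (True, 99): all candidates up to n - 1
print(count_checks(n, math.isqrt(n)))   # (True, 9): candidates up to sqrt(n) only
```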

---

# Ingredients of a computational model

* There are actually many different approaches to programming

* We will mostly consider [structured programming](https://en.wikipedia.org/wiki/Structured_programming)

* Characterized by use of various control flow constructs (if, then,
  while, for, etc.) and
  [block structures](https://en.wikipedia.org/wiki/Code_block)
--

* More specifically, we will focus on [procedural programming](https://en.wikipedia.org/wiki/Procedural_programming)

	* Characterized by use of modular procedures (usually called functions)

	* We will be mainly interested in procedures that perform some
	  computations

	* Most algorithms we discuss will directly correspond to
	  procedures or functions when implemented
--

* We will not discuss other kinds of programs (e.g., operating system,
  web browser, editor, etc.)

---

# Questions?

---

# Exercises

1. Write an algorithm to compute $k!$ given $k$

2. Write an algorithm to compute $\log k!$ given $k$

3. Write an algorithm to compute $n \choose k$ given $n, k$

---

# Functions and control flow structures

---

* The main components of our programs are going to be functions

* Functions usually

	- have one or more input arguments,

	- perform some computations, possibly calling other functions, and

	- return one or more output values.

* The second step is the main contribution of a function
--

* Usually a programming language will already have many built-in functions

* These can be called by other functions

* Knowing what is available is an essential part of "learning" a language

---

* The standard model for performing computations is __sequential
  execution__

* In other words, a function executes a set of instructions in a specified sequence

* Some control flow structures may be used to create branches or loops
  in the flow of execution
  
---

* Briefly, the main ingredients used are

	- Declaration of variables (implicit in some languages)

	- Evaluation of expressions. _Can involve variables provided they
	  have been defined in an earlier step_

	- Assignment to variables (to store intermediate results for later use)

	- Logical tests (equal?, less than?, greater than?, is more input available?)

	- Logical operations (AND, OR, NOT, XOR)

	- Branching - take different paths based on result of a logical
	  operation (if-then-else)

	- Loops - repeat sequence of steps, a fixed number of times, or
	  while a condition holds (for / while)
--

* The details of how variables store values, and who can access them
  (scope) are important
  
* But we will not worry about these issues for now
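
* A minimal Python sketch that uses most of these ingredients in one short function. The function `count_in_range` and its inputs are made up for illustration.

```python
def count_in_range(values, lower, upper):
    """Count how many elements of values lie in the interval [lower, upper]."""
    count = 0                            # assignment (declaration is implicit in Python)
    for v in values:                     # loop: repeat once for each element
        if v >= lower and v <= upper:    # two logical tests combined with AND
            count = count + 1            # evaluate an expression, then assign the result
    return count                         # output value

print(count_in_range([3, 8, 1, 10, 5], 3, 8))   # 3
```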

---

# Common operators (may have language-specific variants)

- _Mathematical operators_: 
	- `+` (addition)
	- `-` (subtraction)
	- `*` (multiplication)
	- `/` (division, possibly integer division)
	- `^` (power)
	- `%` (the modulo operation)

- _Logical operators_: 
	- `&` (AND)
	- `|` (OR)
	- `!` (NOT)

- _Comparisons_: 
	- `==` (equality)
	- `!=` ($\neq$)
	- `<`, `>` (strictly less than or greater than) 
	- `<=` `>=` ($\leq$, $\geq$)

- _Mathematical functions_: `round, floor, ceil, abs, sqrt, exp, log, sin, cos, ...`
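
- As an example of language-specific variants, this is how some of these operators look in Python. The particular numbers are arbitrary; other languages differ in other ways (for instance, R writes the modulo operator as `%%`).

```python
import math

print(7 / 2)                  # 3.5  (true division)
print(7 // 2)                 # 3    (integer division)
print(7 % 2)                  # 1    (modulo)
print(2 ** 10)                # 1024 (power; in Python, ^ means bitwise XOR instead)
print(True and not False)     # True (AND, OR, NOT are written as words)
print(3 != 4, 3 <= 3)         # True True
print(math.floor(2.7), math.ceil(2.2), abs(-3), math.sqrt(16.0))   # 2 3 3 4.0
```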

---

# Practical implementation: programming languages

* The algorithms we discuss can be implemented in many programming languages

* Some standard languages suitable for structured programming are

	- [C](https://en.wikipedia.org/wiki/C_%28programming_language%29) (compiled)
	- [C++](https://en.wikipedia.org/wiki/C%2B%2B) (compiled)
	- [R](https://en.wikipedia.org/wiki/R_%28programming_language%29) (interpreted)
	- [Python](https://en.wikipedia.org/wiki/Python_%28programming_language%29) (interpreted)
	- [Julia](https://en.wikipedia.org/wiki/Julia_%28programming_language%29) (just-in-time compiled)

* There are also many others with various relative strengths and weaknesses

* In this course, we will mainly focus on

	- __R__ because it already has an extensive collection of statistical tools that we can use

	- __C__ / __C++__ because it is easy to call C / C++ code from R (useful when R code is inefficient)

---

# Example: The `is_prime` algorithm in various languages

* Recall the `is_prime` algorithm to determine if a number is prime

* With slight modification to use only integer arithmetic

.algorithm[
.name[`is\_prime(n)`]
i := 2
__while__ (i * i $\leq$ n) {
    __if__ (n mod i == 0) {
        __return__ FALSE
    }
    i := i + 1
}
__return__ TRUE
]

---

# Example: The `is_prime` algorithm in various languages

* Implemented in C, the algorithm would look like this:

```c
int is_prime_c(int n) 
{
	int i = 2;
	while (i * i <= n) {
		if (n % i == 0) {
			return 0;
		}
		i = i + 1;
	}
	return 1;
}
```

* C is a compiled language, so actually running this code involves
  some additional work

* Note that all variable _types_ need to be explicitly declared

* This includes the types of function arguments (inputs) and return value (output)

---

# Example: The `is_prime` algorithm in various languages

* The same algorithm would look like this in R:

```r
is_prime_r <- function(n)
{
	i <- 2
	while (i * i <= n) {
		if (n %% i == 0) {
			return (FALSE)
		}
		i <- i + 1
	}
	return (TRUE)
}
```

* The basic structure is very similar, but with some differences:

	- The assignment operator is `<-` (but `=` also works in R)
	- The function declaration looks like a variable assignment
	- The modulo operator is `%%` instead of `%`
	- Uses `TRUE` and `FALSE` instead of `1` and `0` for logical values
	- Statements do not end with a semicolon (although they could)
	- Variable types are not declared
	- The return value must be put in parentheses, because `return` is a function in R

---

# Example: The `is_prime` algorithm in various languages

* We can call this function after starting R and copy-pasting the function definition

```r
is_prime_r(4)
```

```
[1] FALSE
```

```r
is_prime_r(10)
```

```
[1] FALSE
```

```r
is_prime_r(100)
```

```
[1] FALSE
```

```r
is_prime_r(101)
```

```
[1] TRUE
```

---

# Example: The `is_prime` algorithm in various languages

* The implementation looks a little different in Python:

```python
def is_prime_py(n):
    i = 2
    while i * i <= n:      # only test divisors up to sqrt(n)
        if n % i == 0:     # i divides n, so n is not prime
            return 0
        i = i + 1
    return 1
```

* The main difference is in how code blocks are defined:

	- a block starts after a colon (`:`)

	- its end is determined by indentation (the amount of whitespace at the beginning of each line)

* Changing the indentation can change the meaning of the code, which does not happen in C or R

* However, code in all languages _should be indented properly for readability_
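
* To see how indentation can change meaning in Python, compare the two hypothetical functions below. They differ only in the indentation of one `return` statement.

```python
def sum_below_a(n):
    total = 0
    for i in range(n):
        total = total + i
    return total          # outside the loop: runs after all iterations

def sum_below_b(n):
    total = 0
    for i in range(n):
        total = total + i
        return total      # inside the loop: returns during the first iteration
    return total

print(sum_below_a(4))   # 6, i.e., 0 + 1 + 2 + 3
print(sum_below_b(4))   # 0
```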

---

# Example: The `is_prime` algorithm in various languages

* Again, we can start Python, define the function, and run the following code

```python
print(is_prime_py(4))
```

```
0
```

```python
print(is_prime_py(10))
```

```
0
```

```python
print(is_prime_py(100))
```

```
0
```

```python
print(is_prime_py(101))
```

```
1
```

---

# How can we run C / C++ code?

---

* The code needs to be "compiled" before it is run

* It also needs a `main()` function to be defined

* `main()` is run first when the program is executed

---

```c
#include <stdio.h>
#include <stdlib.h>

int is_prime_c(int n)
{
    int i = 2;
    while (i * i <= n) {
        if (n % i == 0) {
            return 0;
        }
        i = i + 1;
    }
    return 1;
}

int main(int argc, char *argv[])
{
    int i, n;
    if (argc > 1) {    /* one or more arguments supplied */
        for (i = 1; i < argc; i++) {
            n = atoi(argv[i]);    /* converts string to integer */
            printf("%d -> %d\n", n, is_prime_c(n));
        }
    }
    else printf("Usage: %s <n1> <n2> ...\n", argv[0]);
    return 0;
}
```

---

# Compiled code vs interpreted code

* R, Python, etc., are "interpreted" languages: an interpreter reads and evaluates the code directly, often interactively, without a separate compilation step

* Compiled code usually (but not always) runs much faster than interpreted code

---

# Plan

* First semester: We will focus on learning R

* Second semester:

	* Algorithms

	* Using other languages

	* Projects

---

# Questions?