class: center, middle

# Introduction

## Introductory Computer Programming

### Deepayan Sarkar
---

# About this course

- This is a __compulsory non-credit course__
  - Does not count towards composite score
  - But you still need to pass (pass mark is 35%)

--

- Goal: Develop programming skills and learn related theory
  - These will be useful in several other courses

---

# About this course

- Will run over two semesters

- First semester:
  - 9:30-11:00am Tuesdays and Thursdays
  - 3:30-4:30pm Friday (lab session)
    - Only first month!

---

# Official syllabus

- Basics in Programming: flow-charts, logic in programming
- Common syntax
- Handling input/output files
- Sorting
- Iterative algorithms
- Simulations from statistical distributions
- Programming for statistical data analyses: regression, estimation, parametric tests

---

# Tentative plan

- Introduction to basic ideas

- Practice [Semester I]
  - A high-level overview of R
  - Basic usage of R

- Theory [Semester II]
  - Algorithms: correctness and runtime analysis (mostly sorting)
  - Computer representation of numbers
  - Conditioning and stability

---

# Grading scheme

- Written exam: 20%
- Practical exam: 20%
- Assignments: 30%
- Projects: 30%

---

# Exercise

- Think of tasks that cannot be easily done without a computer
- These could be either related or unrelated to what you are studying

---

# Some simple numeric examples

- Problems involving scalar objects only:
  - Is a given natural number $n \in \mathbb{N}$ prime?
  - Given an integer $k \geq 0$, compute its factorial $k!$, and $\log k!$
  - Given integers $n, k \geq 0$ such that $k \leq n$, compute $n \choose k$

--

- Problems that will most likely need more complicated objects to solve:
  - Find all prime numbers less than a given number $N$
  - Sort a given collection (vector) of numbers
  - Produce a random permutation of a given vector of numbers
  - Given a set $S$ and a query object $x$, determine whether $x \in S$ (set membership)

- We will discuss vectors, but probably not more complicated data structures

---

# Some examples of simulation

- Simple random walk (+1 or -1 with probability $p$ and $1-p$):
  - How long does it take to return to zero for the first time?
  - When was the last return to zero before time $2n$?

--

- Toss a coin (with probability of heads $p$) until you get $k$ consecutive heads.
  - Based on the observed number of tosses, can you test whether $p = \frac12$?

--

- Given a game of snakes and ladders, how many throws of the dice does it take to reach the end?

--

- Shuffle a deck of cards.
  - How can we probabilistically model a shuffle?
  - How many times do we need to shuffle to make the deck approximately random?
  - How can we "test" for randomness?
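
---

# Simulation sketch: waiting for $k$ consecutive heads

- The coin-tossing question above can be explored by simulation
- Below is a minimal Python sketch, assuming a pseudo-random number generator is an acceptable stand-in for a real coin; the function `tosses_until_k_heads` and its arguments are hypothetical names chosen for illustration

```python
import random

def tosses_until_k_heads(k, p=0.5, rng=random):
    """Simulate coin tosses with P(heads) = p until k consecutive heads appear.

    Returns the total number of tosses. Assumes k >= 1 and 0 < p <= 1
    (for p = 0 the loop would never end).
    """
    tosses = 0
    run = 0  # length of the current run of consecutive heads
    while run < k:
        tosses += 1
        if rng.random() < p:  # heads with probability p
            run += 1
        else:                 # tails resets the run
            run = 0
    return tosses

# Estimate the expected number of tosses for k = 3 with a fair coin
rng = random.Random(42)
n_rep = 10000
avg = sum(tosses_until_k_heads(3, 0.5, rng) for _ in range(n_rep)) / n_rep
print(avg)  # should be close to 2**(3 + 1) - 2 = 14
```

- For a fair coin, the expected number of tosses needed to see $k$ consecutive heads is $2^{k+1} - 2$ (so $14$ for $k = 3$); comparing the simulated average with this value is a rough check of the code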
---

# Some generally important problems

$\newcommand{\bs}{\boldsymbol}$

- Given a function $f$, solve for $f(x) = 0$, e.g.,
  - solve non-linear equations like $e^x + \sin x = 0$
  - solve linear equations $A \bs{x} = \bs{b}$ for vector $\bs{x}$ (e.g., as part of fitting linear models)

--

- Optimization: given a function $f$, find $\bs{x}$ where $f(\bs{x})$ is minimized
  - Sometimes this can be done by solving $\nabla f(\bs{x}) = \bs{0}$
  - Other approaches may be more practical depending on context

---

# Algorithms

- We will spend a lot of time discussing algorithms
- An algorithm is essentially a set of instructions to solve a problem
- Algorithms usually require some inputs
- Instructions are executed sequentially, finally resulting in an output (also called the _return value_)
- You can think of an algorithm as a recipe (inputs: ingredients, output: food!)

---
layout: true

# Example: is a given number $n$ prime?

---

- Basic idea: see if $n$ is divisible by any number between $2$ and $n-1$

- In fact, it is enough to check whether $n$ is divisible by any number between $2$ and $\sqrt{n}$
  - If $n = ab$ with $2 \leq a \leq b$, then $a^2 \leq ab = n$, so $a \leq \sqrt{n}$

- Intuitively, the second approach is more "efficient"

- Also, we can stop as soon as we find the first divisor

---

- Simple algorithms are often easy to understand as a _flowchart_

---

- But we will usually write algorithms in the form of _pseudo-code_ as follows:

.algorithm[
.name[`is\_prime(n)`]
i := 2
__while__ (i $\leq$ sqrt(n)) {
    __if__ (n mod i == 0) {
        __return__ FALSE
    }
    i := i + 1
}
__return__ TRUE
]

- Here we skip checking whether $n > 1$ (and that it is an integer)

---
layout: true

# How to interpret an algorithm?
---

- The meaning of this algorithm / pseudo-code should be more or less obvious

- It assumes the availability of certain basic operators / functions (mod, sqrt)

- We often employ some _conventions_ and use some _structures_ in pseudo-code

- For example,

.algorithm[
.name[`is\_prime(n)`]
i := 2                           // variable assignment
__while__ (i $\leq$ sqrt(n)) {   // loop while condition holds
    __if__ (n mod i == 0) {      // branch if condition holds
        __return__ FALSE         // exits with output value
    }                            // end of blocks within loops, branches, etc.
    i := i + 1                   // update variable value
}
__return__ TRUE
]

---

- It is important to make sure that an algorithm makes sense

- Steps are executed sequentially, so the sequence must be clear

- It must be possible to evaluate each step
  - All variables used must have been defined in a previous step
  - It is OK to call other functions (or algorithms), but they must be clearly defined
  - It is even OK for an algorithm to call itself (this is known as _recursion_)

---
layout: true

# Pseudo-code

---

- The style of pseudo-code commonly used to describe algorithms is largely derived from a programming language called [ALGOL](https://en.wikipedia.org/wiki/ALGOL)

- However, there are no fixed rules that pseudo-code must follow

- An alternative form of our `is_prime` algorithm could be:

.algorithm[
.name[`is\_prime(n)`]
i = 2                        // different assignment operator
__while__ i $\leq$ sqrt(n)   // end of loop indicated by indentation
    __if__ n mod i == 0
        __return__ FALSE
    i = i + 1
__return__ TRUE
]

---

- Another form could be:

.algorithm[
.name[`is\_prime(n)`]
i $\leftarrow$ 2             // yet another assignment operator
__while__ i $\leq$ sqrt(n)   // end of loop indicated by __end__ keyword
    __if__ n mod i == 0
        __return__ FALSE
    __end__
    i $\leftarrow$ i + 1
__end__
__return__ TRUE
]

- Any of these forms is fine as long as
  - the steps of the algorithm are clearly specified
  - the essential ideas are expressed without ambiguity

---
layout: false

# Theoretical questions about algorithms

* __Is an algorithm correct?__ To be correct, an algorithm must
  - stop after a finite number of steps, and
  - produce the _correct output_ for _all possible inputs_ (i.e., all _instances_ of the problem).

* __How efficient is the algorithm?__
  - What resources does the algorithm need to run, typically in terms of time and storage?
  - How does it compare with other algorithms for the same problem?

--

* To answer such questions, we need a model for computation

---

# Ingredients of a computational model

* There are many different approaches (paradigms) to programming

* We will mostly consider [structured programming](https://en.wikipedia.org/wiki/Structured_programming)
  * Characterized by the use of control flow constructs (if-then-else, while, for, etc.) and [block structures](https://en.wikipedia.org/wiki/Code_block)

--

* More specifically, we will focus on [procedural programming](https://en.wikipedia.org/wiki/Procedural_programming)
  * Characterized by the use of modular procedures (usually called functions)
  * We will be mainly interested in procedures that perform some computations
  * Most algorithms we discuss will directly correspond to procedures or functions when implemented

--

* We will not discuss other kinds of programs (e.g., operating systems, web browsers, editors, etc.)

---
class: center middle

# Questions?

---

# Exercises

1. Write an algorithm to compute $k!$ given $k$

2. Write an algorithm to compute $\log k!$ given $k$

3. Write an algorithm to compute $n \choose k$ given $n, k$

---
layout: true

# Functions and control flow structures

---

* The main components of our programs are going to be functions

* Functions usually
  - have one or more input arguments,
  - perform some computations, possibly calling other functions, and
  - return one or more output values.
* The second step (the computation) is the main contribution of a function

--

* Usually a programming language will already have many built-in functions
  * These can be called by other functions
  * Knowing what is available is an essential part of "learning" a language

---

* The standard model for performing computations is __sequential execution__

* In other words, a function executes a set of instructions in a specified sequence

* Some control flow structures may be used to create branches or loops in the flow of execution

---

* Briefly, the main ingredients used are
  - Declaration of variables (implicit in some languages)
  - Evaluation of expressions. _These can involve variables, provided they have been defined in an earlier step_
  - Assignment to variables (to store intermediate results for later use)
  - Logical tests (equal?, less than?, greater than?, is more input available?)
  - Logical operations (AND, OR, NOT, XOR)
  - Branching: take different paths based on the result of a logical operation (if-then-else)
  - Loops: repeat a sequence of steps, either a fixed number of times or while a condition holds (for / while)

--

* The details of how variables store values, and of who can access them (scope), are important
  * But we will not worry about these issues for now

---
layout: false

# Common operators (may have language-specific variants)

- _Mathematical operators_:
  - `+` (addition)
  - `-` (subtraction)
  - `*` (multiplication)
  - `/` (division; may be integer division in some languages)
  - `^` (power)
  - `%` (the modulo operation)
- _Logical operators_:
  - `&` (AND)
  - `|` (OR)
  - `!` (NOT)
- _Comparisons_:
  - `==` (equality)
  - `!=` ($\neq$)
  - `<`, `>` (strictly less than or greater than)
  - `<=`, `>=` ($\leq$, $\geq$)
- _Mathematical functions_: `round, floor, ceil, abs, sqrt, exp, log, sin, cos, ...`

---

# Practical implementation: programming languages

* The algorithms we discuss can be implemented in many programming languages

* Some standard languages suitable for structured programming are
  - [C](https://en.wikipedia.org/wiki/C_%28programming_language%29) (compiled)
  - [C++](https://en.wikipedia.org/wiki/C%2B%2B) (compiled)
  - [R](https://en.wikipedia.org/wiki/R_%28programming_language%29) (interpreted)
  - [Python](https://en.wikipedia.org/wiki/Python_%28programming_language%29) (interpreted)
  - [Julia](https://en.wikipedia.org/wiki/Julia_%28programming_language%29) (just-in-time compiled)

* There are also many others with various relative strengths and weaknesses

* In this course, we will mainly focus on
  - __R__ because it already has an extensive collection of statistical tools that we can use
  - __C__ / __C++__ because it is easy to call C / C++ code from R (useful when R code is too slow)

---

# Example: The `is_prime` algorithm in various languages

* Recall the `is_prime` algorithm to determine if a number is prime

* With a slight modification so that only integer arithmetic is used (the condition $i \leq \sqrt{n}$ is replaced by the equivalent $i^2 \leq n$):

.algorithm[
.name[`is\_prime(n)`]
i := 2
__while__ (i * i $\leq$ n) {
    __if__ (n mod i == 0) {
        __return__ FALSE
    }
    i := i + 1
}
__return__ TRUE
]

---

# Example: The `is_prime` algorithm in various languages

* Implemented in C, the algorithm would look like this:

```c
int is_prime_c(int n) {
    int i = 2;
    while (i * i <= n) {
        if (n % i == 0) {
            return 0;
        }
        i = i + 1;
    }
    return 1;
}
```

* C is a compiled language, so actually running this code involves some additional work

* Note that all variable _types_ need to be explicitly declared
  * This includes the types of function arguments (inputs) and return value (output)

---

# Example: The `is_prime` algorithm in various languages

* The same algorithm would look like this in R:

```r
is_prime_r <- function(n) {
    i <- 2
    while (i * i <= n) {
        if (n %% i == 0) {
            return (FALSE)
        }
        i <- i + 1
    }
    return (TRUE)
}
```

* The basic structure is very similar, but with some differences:
  - The assignment operator is different (but `=` also works in R)
  - The function declaration looks like a variable assignment
  - The modulo operator is `%%` instead of `%`
  - Uses `TRUE` and `FALSE` instead of `1` and `0` for logical values
  - Statements do not end with a semicolon (although they could)
  - Variable types are not declared
  - The return value must be put in parentheses, because `return()` is a function in R

---

# Example: The `is_prime` algorithm in various languages

* We can call this function after starting R and copy-pasting the function definition:

```r
is_prime_r(4)
```

```
[1] FALSE
```

```r
is_prime_r(10)
```

```
[1] FALSE
```

```r
is_prime_r(100)
```

```
[1] FALSE
```

```r
is_prime_r(101)
```

```
[1] TRUE
```

---

# Example: The `is_prime` algorithm in various languages

* The implementation looks a little different in Python:

```python
def is_prime_py(n):
    i = 2
    while i * i <= n:
        if n % i == 0:
            return 0
        i = i + 1
    return 1
```

* The main difference is in how code blocks are defined:
  - the start of a block is marked by a colon (`:`)
  - the end of a block is determined by indentation (the amount of space at the beginning of a line)

* Changing the indentation can change the meaning of Python code, which does not happen in C or R

* However, code in all languages _should be indented properly for readability_

---

# Example: The `is_prime` algorithm in various languages

* Again, we can start Python, define the function, and run the following code:

```python
print(is_prime_py(4))
```

```
0
```

```python
print(is_prime_py(10))
```

```
0
```

```python
print(is_prime_py(100))
```

```
0
```

```python
print(is_prime_py(101))
```

```
1
```

---
layout: true

# How can we run C / C++ code?

---

* The code needs to be "compiled" before it is run

* To run as a standalone program, it also needs a `main()` function to be defined

* `main()` is run first when the program is executed

---

```c
#include <stdio.h>
#include <stdlib.h>

int is_prime_c(int n) {
    int i = 2;
    while (i * i <= n) {
        if (n % i == 0) {
            return 0;
        }
        i = i + 1;
    }
    return 1;
}

int main(int argc, char *argv[]) {
    int i, n;
    if (argc > 1) { /* one or more arguments supplied */
        for (i = 1; i < argc; i++) {
            n = atoi(argv[i]); /* converts string to integer */
            printf("%d -> %d\n", n, is_prime_c(n));
        }
    }
    else printf("Usage: %s ...\n", argv[0]);
    return 0;
}
```

---
layout: false

# Compiled code vs interpreted code

* R, Python, etc., are "interpreted" languages that read and evaluate code interactively

* Compiled code is usually (but not always) much faster than interpreted code

---

# Plan

* First semester: We will focus on learning R

* Second semester:
  * Algorithms
  * Using other languages
  * Projects

---
class: center middle

# Questions?