# Algorithms, Math and Python

*New Beginnings:* Week 2

*Lecture Notes by Bart Massey*

Copyright © 2014 Bart Massey

- 2014-07-07: Numbers; More Python Programming
- 2014-07-08: Proof, Recursion and Sequences
- 2014-07-09: Introduction To Algorithms; Trees and Graphs
- 2014-07-10: Implementing Algorithms in Python

## 2014-07-07: Numbers; More Python Programming

### Number Representation

History: tally, tally groups, positional

Positional number systems:

- Define
*x*,^{0}= 1*x*^{y}= x x x x [y times] Each "digit" is the count of groups of

*b*next-smaller groupsSo: 123

- base 10: 1 10
^{2}+ 2 10^{1}+ 3 10^{0} - base 2: 1 2
^{2}+ 0 2^{1}+ 1 2^{0} - base 16: 1 16
^{2}+ 2 16^{1}+ 3 16^{0}- Note that we don't have enough digits: use 0-9 and a-f

- base 10: 1 10

- Define

### More About Modulo Arithmetic

Define

*r = a mod b*such that*q = a div b*,*q b + r = a*- Negative numbers are confusing

Some laws we can verify:

*0 <= a mod c < c**(a + b) mod c = (a mod c + b mod c) mod c**(a b) mod c = ((a mod c) (b mod c) mod c)*

Useful for "wraparound" calculations

Algebraically "nice" when modulus is prime

### Base Conversions

2, 16 <-> 10: "do the math"

2 <-> 16: grouping is sufficient

### Why Base 2?

Little switches inside computer again

- on-off / 1-0 / +5V-0V / open-closed

Simplest possible positional system

- Nicely hierarchical

Easy to do arithmetic

### Looping

Conditionals are "forward branching"; looping is "backward branching"

Repeat the same computation over and over

- But with different data
- This is why computers have to be fast

### The `while`

Loop

Repeats a sequence of statements until a condition is false

Condition is tested at the

*start*of each "iteration"Allows for "looping 0 times"

`i = 0 while i < 10: print(i) i = i + 1 # should be i += 1`

### The `for`

Loop

The pattern of the previous problem is very common:

- Repeat a computation with some "index variable" set to successive values (usually increases by 1)

Python provides special syntax for this:

`for i in range(10): print(i) i = 0 while i < len(range(10)): print(range(10)[i]) i += 1`

Makes program easier to read

Makes program easier to write:

- Can't forget to initialize index variable
- Can't forget to increase index variable

### The `range()`

Function

The

`range()`

function is not a special caseTakes an ending value

*n*and returns the sequence*0,1,2...n-1*- Note the
*0*and the*n - 1*! - The sequence is in a weird form in Python 3.3 called a "generator"
You can make the sequence into a list with the

`list()`

function`>>> list(range(3)) [0, 1, 2]`

- Note the
The

`for`

statement can loop over any collection (e.g. sequence, set)You can give more arguments to

`range()`

to get different sequences`>>> list(range(1, 12, 5)) [1, 6, 11] >>> list(range(9, -1, -1)) [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]`

### Exercise: Sum of numbers from 1 to n

### Early Termination: `break`

Sometimes your program should get out of the loop before its scheduled exit

One can let the loop run its course, but this is tedious and inefficient

e.g. "Find the first number in the list greater than 10"

`first_big = None for i in my_list: if first_big == None and i > 10: first_big = i print(first_big)`

This has all kinds of drawbacks

The

`break`

statement cleans this up a bit`first_big = None for i in my_list: if i > 10: first_big = i break print(first_big)`

The

`break`

statement is generally used inside an`if`

. It breaks out of the nearest enclosing`for`

or`while`

.

### Early Looping: `continue`

The

`continue`

statement is like the`break`

statement except that it*starts the next iteration*of the nearest enclosing`for`

or`while`

loop`for i in my_list: if i == 0: print("zero in list") continue print(1 / i)`

### Function Definitions

All those Python built-in functions are very nice, but sometimes you need something they didn't provide

You can define your own functions with the

`def`

statement.Your function can use the

`return`

statement to return a value.- The
`return`

statement is like`break`

: it immediately exits the nearest enclosing function to return the result Normally

`return`

is at the end of the function, but not always`def fact(n): result = 1 for i in range(2, n + 1): result *= i return result`

- The

### Formal and Actual Parameters

The

*n*in the previous example is an example of a "formal parameter" or "argument". You can have your function require as many formals as you want. Order matters hereWhen you call your function, you need to pass a value ("actual parameter") for each formal the function requires

`>>> fact(3 + 2) 120`

Note that each actual will be evaluated to produce a value before the function is called

### Local and Global Variables

A formal is actually a variable: you can store in it

- If you store into a formal, it will only change that storage location, not any other in the program
- Storing into formals is generally considered bad style: hard to read and understand

Any new variable you assign to inside your definition is also "local": it has no relationship to any variable outside the function, even if it has the same name

`x = 3 def f(): x = "hello" print(x) f() print(x)`

If you want to refer to a variable defined outside your function while inside, use the

`global`

statement`x = 3 def f(): global x x = "hello" print(x) f() print(x)`

### Globals Considered Somewhat Harmful

So your function can:

- Refer to data passed as a parameter or available in globals
- Return a result or store into a global

Generally, it is a good idea to use parameters and results rather than global variables to communicate between a function and its caller

If the function is working with data that is part of the well-managed global state of the program, it

*may*be ok to just use`global`

statements- Many (most?) instructors tell beginning programmers to
*never*let functions access global variables - Some go farther: "never
*have*global variables"

- Many (most?) instructors tell beginning programmers to

### Procedures

A function that "doesn't return a result" by using the

`return`

statement is sometimes referred to as a "procedure"- In Python, if you fall off the end of a function, it
is considered to have returned
`None`

- In Python, if you fall off the end of a function, it
is considered to have returned
Procedures are useful for their "side effects": things they do besides compute a value

In general, you should be a bit suspicious of procedures

- Pure output is the most common use of procedures

### Exercise: Define Some Functions

*squaring function**Gauss's sum**Euclidean distance*(four parameters)

## 2014-07-08: Recursion; Formal Proof

### More About Dictionaries

Dictionary: think of it as a function from a "key" to a "value"

- Function: one particular value for each key

Creating a dictionary:

`{1 : 2, "hello" : "world", 12 : "goodbye"} {}`

Indexed and assigned like lists

Can loop over keys with for

Can test presence of a key with in

### More About Lists

Slicing:

`[1, 2, 3][0:2] == [1, 2, 3][:2] == [1, 2] [1, 2, 3][1:] == [1, 2, 3][1:len([1,2,3])+1] == [2, 3] [1, 2, 3][::-1] == [3, 2, 1]`

Append:

`[1, 2] + [3] == [1, 2, 3] l = [] for i in range(0, 5): l += [i]`

### Recursion

Functions are tricky.

In particular, suppose I define

*f(x) = f(x) + 1*- This definition doesn't look sensible; one cannot
find an
*f*that meets this property for*any*x

- This definition doesn't look sensible; one cannot
find an
But recursion is not always a problem:

*f(x) = x f(x)*- The function
*f(x) = 0*satisfies this definition

- The function
We call functions defined in terms of themselves "recursive"

### Base and Recursive Cases

Usually, a well-defined recursive function

*f*will have at least one input for which*f*is not defined in terms of itselfF(0) = 1 F(n) = n * F(n - 1) [n > 0]

We call these the "base case" and "recursive case" of the definition

It is typical that the base case will correspond to a small input, and the recursive case will define larger inputs in terms of smaller ones

G(0) = 1 G(1) = 1 G(n + 2) = G(n + 1) + G(n) [n >= 0]

### Exercise: Recursion In Python

Both

*F()*and*G()*of the previous examples are implementable in PythonCan you say anything about the functions by studying the behavior of their implementations?

### Exercise: Collatz Conjecture

Consider the function

```
f(1, n) = n
f(x, n) = f(x / 2, n + 1) [x > 1 and even]
= f(3 x + 1, n + 1) [x > 1 and odd]
C(x) = f(x, 0)
```

Note that *n* is used as a counter of the number of steps
needed to compute *f*.

The Collatz Conjecture is that the range of the function C is finite. Implement C and try it for various inputs.

### Recursion As A Power Tool

The beauty of recursion is that you can think of it as defining the behavior of a function on "more complicated" (larger) inputs in terms of the behavior on "less complicated" (smaller) inputs

This is a way to break down problems

### Iteration Is A Special Case

Integer log base 2 can be defined as a recursion

`lg(1) = 0 lg(n) = 1 + lg(n div 2)`

This recursive formulation is pretty nice. One can remove the recursion by replacing it with an iteration

`To compute lg(n): a <- 1 while n > 1 n <- n div 2 a <- a + 1 return a`

The iterative version may be faster (next slide) but is more complicated and harder to reason about

### How Functions Are Implemented: Activation Records

When you call a Python function, Python has to remember the values of all the local variables, so it can put them back

- "Re-entrancy" is the property that a function can (directly or indirectly) call itself
- Re-entrancy is a feature of all modern PLs

The way that this is implemented is:

- When any function is called, Python creates an
*activation record*with the new variables in it for parameters and locals - When the function call returns, its activation record is discarded

- When any function is called, Python creates an
Creating an activation record can be kind of expensive

- Performance of allocation / deallocation
- Storage for currently active activation records

### What Is A Proof?

A "social construct" (Eugene Luks)

A watertight argument

Made by piling up trivial arguments

- So can be easily checked for correctness

*Not*necessarily "true in the real world"

### Elements Of A Proof

Axioms: Statements that will be assumed to be true

- One important kind of axiom is the "definition" of some term

Inference rules: Ways to get new true statements from old true statements

### "Informal Proof"

Informal proofs are common in CS and even mathematics

Generally, the axioms and inference rules that you have learned over the years in math classes are fair game

Leaves a lot to be desired (c.f. next week)

### "Lemma": Divisibility

If a number *d* greater than 1 divides *n*, it cannot divide *n + 1*

Proof:

- Assume that
*n rem d = 0*, that*d >= 2*and that*(n + 1) rem d = 0* *(n + 1) rem d*=

*n rem d + 1 rem d*=*1 rem d*= 1

- Assume that
This contradicts our original assumptions. The only assumption that we can change is the third

Thus, the original statement must be true ("Q.E.D.")

### Example: There is no largest prime

(Due to Georgia Reh)

First, take as an axiom the definition of

*prime number*- 2 is a prime
- Any natural number that is not divisible by any smaller prime is a prime

Next define

*composite number*as any natural number greater than 2 that is not primeAssume that some number

*p*is the largest primeLet

*q*be the product of all the numbers from 2 to*p*By our Lemma, none of the numbers from 2 to

*p*can divide*q + 1*Thus

*q + 1*is prime, and larger than*p*

### That Proof Was Kind Of Borked

Yeah, need an additional assumption: that every number can be uniquely factored

This is the danger of informal proofs

This proof can be made right, but not all borked proofs can

### Some Proof Techniques

Direct proof: Usually by instantiation

- e.g. if we know that for all natural numbers
*n*we have*2n >= n*, we can conclude*4 >= 2*

- e.g. if we know that for all natural numbers
Contradiction: What we just saw. Assume the opposite of what we want to prove, and show something that cannot be true

Induction: Use the power of recursion to show that successively smaller and larger cases are true

### Inductive Proof

- Induction looks a lot like recursion
- Show that the proposition is true for some simple "base case"
- Assuming that the proposition is true for some generic case, show that it must also be true for a "more complicated" case

### Example: Gauss's Sum

One of the most common formulae in CS

*sum(1..n) = n (n + 1) / 2*Direct Proof: (Gauss at age 11 or somesuch)

- Add sum(1..n) to sum(n..1)
- Result must be
*(n + 1)*repeated*n*times - Thus, this sum must be
*n (n + 1)* - But this sum also must be
*sum(1..n) + sum(1..n) = 2 sum(1..n)* - Thus
*sum(1..n) = n (n + 1) / 2*

Inductive Proof:

- Base case: When
*n = 1*, we have*sum(1..n) = sum(1..1) = 1* - Inductive case:
- Assume
*sum(1..n-1) = (n - 1) ((n - 1) + 1) / 2 = n (n - 1) / 2* - This means that
sum(1..n) =

sum(1..n-1) + n =

n (n - 1) / 2 + n =

(n^{2}- n + 2n) / 2 =

(n^{2}+ n) / 2 =

n (n + 1) / 2

- Assume

- Base case: When

### Exercise: Sum Of Odd Numbers

- Prove that
*sum(1, 3, 5 ... (2n-1)) = n*^{2}

## 2014-07-09: Introduction To Algorithms; Trees and Graphs

### What Is An Algorithm?

We've spent a lot of time dancing around this. Let's get to it...

An

*algorithm*is instructions for solving any given*instance*of some particular*problem*A

*problem*is a named collection of generic givens and a question to be answered from themAn

*instance*is a particular instantiation of the givens such that the problem can be solved

### Example: Largest Element Of A Set

```
Largest Element of a Set
Given: A set S of integers
Find: The largest element in S
```

OK, this is a simple problem.

A given instance might be something like

*S = {1, 4, 7}*We seek an approach that will yield the largest element of

*S*regardless of how*S*is instantiatedThis problem is a bit ill posed: What about the empty set?

- Change the given to "A finite nonempty set"

### Algorithm: Largest Element Of A Set

An algorithm must be given in terms of fundamental, implementable operations

An algorithm is best described in terms of

*pseudocode*: English-like code that does not refer to a particular programming language- By definition, pseudocode is not a programming language: there is no "correct" syntax, and the operations described may take some effort to implement in yours

Here's an algorithm

`Algorithm: Linear Scan To find the largest element of a set S: let m be any arbitrary element of S for each e in S if e > m m ← e return m`

### Proving Linear Scan "Correct"

Involves the proof techniques discussed yesterday

First, verify that our Linear Scan algorithm always returns a result

- Must, since the for loop is guaranteed to always finish and thus the program must get to the end
- Algorithms that may not return an answer can still be "partially correct"

Next, verify that the answer is always correct

First, show that the result is always <= the maximum element

- Yes, because it is always some element of the set, so it can't be greater than any element in the set

Next, show that the result is always >= the maximum element

- Harder. By contradiction: assume that for some set S, the algorithm returns an element of S smaller than maximum
- Note that each element examined increased the current maximum m to at least its size
- Thus, the result must be some unexamined element
- But the for loop examines
*all*the elements

### Performance Of Linear Scan

We can say things about how fast this algorithm is without ever looking at an implementation

Assume that each step takes a fixed amount of time

Then total running time is

`2 + 2 |S| <= T[S] <= 2 + 3 |S|`

Note that this isn't a terribly interesting difference

- Probably OK to just say that
*T[S]*is linearly proportional to*|S|*

- Probably OK to just say that
This is probably as good as one can do: It's hard to imagine how to find a maximum element of

*S*without examining every element of*S*at least once

### Big-O Notation As A Performance Description

We'll see this again, but here's a preview:

It is reasonable to consider

- Only the worst-case performance of the algorithm on inputs of a given size
- Only the performance of the algorithm as the inputs get arbitrarily large
- Only approximate performance: not worrying about "constant factors"

If this is the plan, let's zero in on it a bit

- Define the input size in terms of the amount of memory required to represent the instance
- Consider only the upper bound on runtime as a function of size
- Throw away additive and multiplicative constants as somewhat irrelevant
- We usually call the input size
*n*and write this approximate function as*O*

For our example, we say that

*T[n]*is in*O(n)*where*n = |S|*

### Algorithm Design

Teaching algorithm design is hard

- Mostly you have to get some experience with ways of thinking about problems

Typical techniques:

- "Brute force": Like our example, just work directly from the definition
- Recursive: Split off the base case and call yourself for the recursive case
- Divide-and-Conquer: A kind of recursion

### Exercises: Algorithm Design

For each of the following

- Formalize problem
- Give pseudocode for an algorithm
- Describe why you think the algorithm is correct
- Estimate the performance of the algorithm

Problems

- Find the minimum element of a set S of integers
- Find a maximum element of a sequence S of integers
- Find the average of a sequence S of integers
- Given sets S and T, print the cross product set S x T
- Given sets S and T, answer true if any element of S divides any element of T, or false otherwise

### Graphs

Unrelated to HS graphs!

Fundamental CS / discrete math structures showing relationships between entities

Consider an arbitrary set V of "vertices" / "nodes" / "bubbles"

It is easy to imagine a set E of "edges" / "arcs" connecting vertices

- E is a subset of V x V
- Self-edges?

Edges can be

*directed*or*undirected*The

*degree*of a vertex is the number of edges connected to it

### More About Graphs

Edges and vertices may be

*labeled*: use a "labeling function"*l : E -> L*or*l : V -> L*There's an obvious graphical representation here

- Directed edges are arrows
- Undirected edges are lines

One can find lots of representations in algorithms / programs

- Vertex + edge set or list
- Adjacency set / list: for each vertex, set of edges / vertices connected to it
- Adjacency matrix: more later

Lots of interesting things to say about these structures!

### Special Case: Linked List

Consider a directed line graph

Each node in the line graph points to its

*successor*This is one natural representation for a sequence

The other natural representation is as an

*array*: successive values are at successive locationsLinked lists vs arrays

- DA: To get to the
*n*-th element, you have to start at the beginning and take*n*steps - DA: Takes twice as much storage: need to store both the element and the edge information
- AD: To insert an element
*e*, you can just change the successor of the element before and set*e*'s successor appropriately

- DA: To get to the

### Special Case: Tree

A tree is a connected undirected graph with no cycles

A rooted tree has one special node designated as the root

- In CS rooted trees are drawn upside-down (why?)

A binary tree has degree at most three

### Other Graph Properties

*Connected:*As mentioned earlier, there is a path from any vertex to any other vertex- More complicated when directed

*Complete:*All vertices are connectedA bajillion other special graphs, each with their own stuff

## 2014-07-10: Implementing Algorithms in Python

### Translating Pseudocode To Python

- This is mostly obvious:
- Give concrete representations to mathematical objects in the pseudocode
- Create auxiliary functions as needed to keep the main code high-level
- Reconsider identifier choices

### Exercise: Voting Machine

Specification

Hardcode 2 or more candidates into program

Each voter must be able to select a candidate

Vote totals for each candidate must be maintained

Vote totals must be reported (but only to authorities)

Mechanism for preventing multiple votes by same person

Each voter must be anonymous (to other voters and officials)

Pseudocode

`set all candidates' vote totals to 0 note that no one has voted yet repeat clear the page prompt for a username u if u is the election official id report vote totals continue if u has already voted report an error continue present choices prompt for a choice increase the vote total of the choice mark u as having voted`

Python

- See
`voting.py`

- See
Verification

- Not performed.

### The `assert`

Statement

Reconsider this algorithm

`To find the earliest most-frequently-occurring element of a nonempty sequence S: m <- 0 d <- new empty dictionary for k in S if k is a key in d d[k] <- d[k] + 1 else d[k] <- 1 if d[k] > m m <- d[k] for k in the keys of d if d[k] = m return k`

Translate into Python

`def earliest_repeated(s): """Find the earliest most-frequently-occurring element of a nonempty list s.""" m = 0 d = {} for k in s: if k in d: d[k] += 1 else: d[k] = 1 if d[k] > m: m = d[k] for k in d: if d[k] == m: return k`

Problem: What if there's a bug in our algorithm? Or what if someone passes an empty list?

- Then the last loop will finish without returning
- Thus the function will return
`None`

- Uggh

Good practice: Have the program crash at that point.

`assert False`

The

`assert`

statement causes your program to crash with a reasonable message unless its condition is trueShould also add an assertion at the beginning that the list is nonempty

`assert s != []`

In general, use assertions liberally

- Anywhere you're not sure things will be right
- Also serve as documentation of how things are supposed to be
- Make debugging much easier

Don't remove assertions gratuitously

### Instrumentation and "`print`

Debugging"

Your programs are starting to be complicated enough to need "debugging"

- Not a part of the "standard engineering process"
- Rather, iteration activity

Debugging reasons backward, from evidence to cause

A good way to have evidence is to

*instrument*your program- Have it print information as it runs so that you can see what is happening

Probably want to have those prints controlled by a global variable so that you can turn them on only when wanted

`# Turn on for instrumentation. debug = True def earliest_repeated(s): """Find the earliest most-frequently-occurring element of a nonempty list s.""" global debug assert s != [] m = 0 d = {} for k in s: if debug: print("Considering", k) if k in d: d[k] += 1 if debug: print("Increased count to", d[k]) else: d[k] = 1 if debug: print("Initialized count") if d[k] > m: if debug: print("Found new maximum", m) m = d[k] for k in d: if d[k] == m: if debug: print("Returning", k) return k assert False`

Obviously, these crud up your program; open question how much you should remove

### Running Programs: The Linux Command Line

So far, most people have mostly been working in IDLE

The Linux command line provides quite a bit of power, so worth getting used to

Short version: Run program with

`$ python3 myprog.py`

I/O:

- Program reads user input from "standard input" (default terminal)
- Program writes user output to "standard output" (default terminal)
- Program also has available a "standard error" channel for writing output not intended to be processed

To provide input from a file, use the "<" command-line operator

`$ python3 myprog.py <myinput.txt`

To store output in a file, use the ">" operator

`$ python3 myprog.py <myinput.txt >results.txt`

Python's stupid

`input()`

prompts are to standard outputThese things are available from Python via the

`sys`

module:`from sys import stderr print("oops", file=stderr)`

### Using `read()`

and `readline()`

`.readline()`

reads a line (like`input()`

)- Includes trailing newline
- Looping over a file

`.read()`

reads a whole file (with newlines)- Fairly viable in 2014

All of this works with

`stdin`

### Stripping Trailing Newlines

You can apply

`.rstrip('\n')`

to a string to remove any trailing newlineA note on dot

- Think of
`x.y()`

as`y(x)`

- You can string dots together; they left-associate
- Style is object-oriented programming; soon

- Think of

### Exercise: Command-Line Programming

- Write a program that counts the number of characters on its input and prints the count as its output. Include newline characters etc in the count.

### Enhancing Your Instrumentation: A `note`

Function

Let's implement a function that does a couple of nice instrumentation things

- Prints your debugging messages to
`stderr`

- Prints them only when debugging is turned on

- Prints your debugging messages to
To do this, we're going to have to explore Python quite a bit

This is the kind of process I go through every week

### Keyword Arguments

Note that

`print()`

takes "keyword arguments" where the actual is passed by formal nameKeyword arguments can help with getting arguments in the right order

### Default Values

Arguments can be given default values

`def mystery(x = 5): print(x)`

Defaulted arguments must be at the end of the formals

### Variable Arguments

Let us figure out how to implement Python functions that take a variable number of arguments, like

`print()`

Python manual 6.3.4 https://docs.python.org/3/reference/expressions.html#calls

`from sys import stderr def note(*args, **kwargs): print(*args, **kwargs) note("hello", 5, file=stderr)`

Now, let's make the default be

`stderr`

`from sys import stdout, stderr def note(*args, **kwargs): if "file" not in kwargs: kwargs["file"] = stderr print(*args, **kwargs) note("hello", 5) note("hello", 6, file=stdout)`

Now, let's make it all controlled by a debug variable

`from sys import stdout, stderr debug = False def note(*args, **kwargs): if not debug: return if "file" not in kwargs: kwargs["file"] = stderr print(*args, **kwargs) note("hello", 5) note("hello", 6, file = stdout)`

### Implementing A Python Module

We would like to make our new

`note`

function available for future programsJust stick it in a file named

`noting.py`

and put it where Python can find it- Current directory
- System standard place
The

`PYTHONPATH`

"environment variable"`$ PYTHONPATH=/tmp $ export PYTHONPATH`

Now can import it just like anything else

`from noting import * debug = True note("noting something")`

However, Python module variables don't work right, so we'll have to fix our interface

`from noting import * import noting noting.debug = True note("noting something")`

### Cleaning Up Our Code

```
from noting import *
noting.debug = True
def earliest_repeated(s):
"""Find the earliest most-frequently-occurring
element of a nonempty list s."""
assert s != []
m = 0
d = {}
for k in s:
note("Considering", k)
if k in d:
d[k] += 1
note("Increased count to", d[k])
else:
d[k] = 1
note("Initialized count")
if d[k] > m:
note("Found new maximum", m)
m = d[k]
for k in d:
if d[k] == m:
note("Returning", k)
return k
assert False
```

### Debugging

From: *Debugging Your Programs*, Bart Massey 2013-01-24

### What is debugging?

During/after coding, before/during/after testing

Bring the program to a state where it appears to be bug-free (but this is a lie)

Estimate 20-40% of programming effort

"Secret": No good books, no chapter in our book, nothin'

### How to debug

Given a failure of the software:

- Find the causes ("faults") leading to that failure
- Find the
*root causes*of those faults - Figure out and apply a
*repair* - Check the repair
- Does it fix the failures?
- Does it cause new failures?

### Key activity: diagnosis

Like in medicine or car repair: "It doesn't work; what's happening and what can be done?"

Diagnosis is hypothesis formation and testing

- What possible reasons
*might*there be for observed symptoms? - Can those reasons be ruled out by what is known so far?
- If not, can we do tests to rule each reason out or increase our belief that it is the correct one?
- Repeat until exactly one possible reason remains, and it looks really likely to be true.

- What possible reasons

### Common bugs

Two basic kinds:

- Bad control flow
- Just plain calculating the wrong thing

Examples

- Off-by-one "fencepost" errors
- Copy-and-paste calculation errors
- Typos/"Thinkos"
- Failure to design to the spec
- Failure to understand/implement the design

### Root Cause Analysis

It's not enough to find the line of code that "causes the bug"

You want to find out how that line got there

In software, faults are caused by mistakes ("errors") that were made by a human (usually you)

With the "root causes" found, you can:

- Correct
*all*the faults caused by that error - Take steps to make that error less likely in the future

- Correct

### Preparing code for debugging

"Real programmers don't comment. It was hard to write--it should be hard to read and harder to understand."

Code should have a spec, simple tests, and pseudocode

Formatting should be as clean as possible

- Consistent indentation
- Consistent liberal use of whitespace
- Good names
- Idiomatic

Code should be instrumented appropriately

### Debugging pre-inspection

Read the code in question carefully. Look for things that are wrong or unclear

Explain the code to someone. Have them look at it too

Most bugs are easily found and fixed by inspection alone

### Debugging tools: your brain

Are you sure the spec and tests are correct?

White-box: what kinds of similar inputs might produce the same program misbehavior?

Black-box: what properties distinguish misbehaving inputs?

Is the timing as expected?

Are your current hypotheses consistent with

*everything*you have observed or can observe?

### Debugging tools: print() function

For a specific hypothesis, stick a print() in that will either disconfirm or confirm the hypothesis

- Works in a huge variety of situations
- But don't spam instrumentation everywhere, or you will get confused by it

Can use print() for exploring program behavior ("tracing"), but beware: one can waste a lot of time doing this without learning anything.

Always best to know what the question is before you start looking for the answer

### Debugging tools: "debugger"

Idle will happily provide you the ability to

- Step through your program one statement at a time
- Run until a given program line is reached
- Examine/change variable values anytime stopped

Except the debugger is really fragile and hard to use

In particular, doesn't interact well with input()

In general, debugger is tool of last resort

### Post-diagnosis

Once you've found the immediate source of a bug, do RCA

Look for other places where faults may have been inserted due to the same root causes

Think hard about how those faults got there. What are you going to do to avoid this in the future?

Craft fixes that fix the faults properly

- This may involve changing the design or revising the specification

Apply the fixes, then test everything carefully

- Did the problem get fixed?
- Are there new problems?

### Backups, versions and source code management

It is really easy to get the buggy version and the fixed version and the version you are working on right now mixed up

Tool called source code management system helps here

It is probably a mistake to have too many backup files around; in any case, use a consistent clear naming scheme for backup files

### Parting thoughts

Don't get stuck!

- Interrupt yourself every few minutes and see if you're making real progress
- If you are stuck, many strategies are available:
- Try a different approach
- Take a break
- Ask for help

Don't get discouraged

The most experienced programmers still make a lot of bugs and have a hard time fixing them

The bugs you will make are all fixable

### Source Code Management: Git

Idea: Keep track of changes to your program as you go

Bit of history...

Let's work through the official tutorial

http://git-scm.com/docs/gittutorial