# Algorithms, Math and Python

New Beginnings: Week 2
Lecture Notes by Bart Massey

## 2014-07-07: Numbers; More Python Programming

### Number Representation

• History: tally, tally groups, positional

• Positional number systems:

• Define x0 = 1, xy = x x x x [y times]
• Each "digit" is the count of groups of b next-smaller groups

• So: 123

• base 10: 1 102 + 2 101 + 3 100
• base 2: 1 22 + 0 21 + 1 20
• base 16: 1 162 + 2 161 + 3 160
• Note that we don't have enough digits: use 0-9 and a-f

• Define r = a mod b such that q = a div b, q b + r = a

• Negative numbers are confusing
• Some laws we can verify:

• 0 <= a mod c < c
• (a + b) mod c = (a mod c + b mod c) mod c
• (a b) mod c = ((a mod c) (b mod c) mod c)
• Useful for "wraparound" calculations

• Algebraically "nice" when modulus is prime

### Base Conversions

• 2, 16 <-> 10: "do the math"

• 2 <-> 16: grouping is sufficient

### Why Base 2?

• Little switches inside computer again

• on-off / 1-0 / +5V-0V / open-closed
• Simplest possible positional system

• Nicely hierarchical
• Easy to do arithmetic

### Looping

• Conditionals are "forward branching"; looping is "backward branching"

• Repeat the same computation over and over

• But with different data
• This is why computers have to be fast

### The `while` Loop

• Repeats a sequence of statements until a condition is false

• Condition is tested at the start of each "iteration"

• Allows for "looping 0 times"

``````    i = 0
while i < 10:
print(i)
i = i + 1   # should be i += 1
``````

### The `for` Loop

• The pattern of the previous problem is very common:

• Repeat a computation with some "index variable" set to successive values (usually increases by 1)
• Python provides special syntax for this:

``````      for i in range(10):
print(i)

i = 0
while i < len(range(10)):
print(range(10)[i])
i += 1
``````
• Makes program easier to read

• Makes program easier to write:

• Can't forget to initialize index variable
• Can't forget to increase index variable

### The `range()` Function

• The `range()` function is not a special case

• Takes an ending value n and returns the sequence 0,1,2...n-1

• Note the 0 and the n - 1!
• The sequence is in a weird form in Python 3.3 called a "generator"
• You can make the sequence into a list with the `list()` function

``````    >>> list(range(3))
[0, 1, 2]
``````
• The `for` statement can loop over any collection (e.g. sequence, set)

• You can give more arguments to `range()` to get different sequences

``````      >>> list(range(1, 12, 5))
[1, 6, 11]

>>> list(range(9, -1, -1))
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
``````

### Early Termination: `break`

• Sometimes your program should get out of the loop before its scheduled exit

• One can let the loop run its course, but this is tedious and inefficient

• e.g. "Find the first number in the list greater than 10"

``````    first_big = None
for i in my_list:
if first_big == None and i > 10:
first_big = i
print(first_big)
``````
• This has all kinds of drawbacks

• The `break` statement cleans this up a bit

``````      first_big = None
for i in my_list:
if i > 10:
first_big = i
break
print(first_big)
``````
• The `break` statement is generally used inside an `if`. It breaks out of the nearest enclosing `for` or `while`.

### Early Looping: `continue`

• The `continue` statement is like the `break` statement except that it starts the next iteration of the nearest enclosing `for` or `while` loop

``````      for i in my_list:
if i == 0:
print("zero in list")
continue
print(1 / i)
``````

### Function Definitions

• All those Python built-in functions are very nice, but sometimes you need something they didn't provide

• You can define your own functions with the `def` statement.

• Your function can use the `return` statement to return a value.

• The `return` statement is like `break`: it immediately exits the nearest enclosing function to return the result
• Normally `return` is at the end of the function, but not always

``````    def fact(n):
result = 1
for i in range(2, n + 1):
result *= i
return result
``````

### Formal and Actual Parameters

• The n in the previous example is an example of a "formal parameter" or "argument". You can have your function require as many formals as you want. Order matters here

• When you call your function, you need to pass a value ("actual parameter") for each formal the function requires

``````      >>> fact(3 + 2)
120
``````
• Note that each actual will be evaluated to produce a value before the function is called

### Local and Global Variables

• A formal is actually a variable: you can store in it

• If you store into a formal, it will only change that storage location, not any other in the program
• Storing into formals is generally considered bad style: hard to read and understand
• Any new variable you assign to inside your definition is also "local": it has no relationship to any variable outside the function, even if it has the same name

``````      x = 3

def f():
x = "hello"
print(x)

f()
print(x)
``````
• If you want to refer to a variable defined outside your function while inside, use the `global` statement

``````      x = 3

def f():
global x
x = "hello"
print(x)

f()
print(x)
``````

### Globals Considered Somewhat Harmful

• Refer to data passed as a parameter or available in globals
• Return a result or store into a global
• Generally, it is a good idea to use parameters and results rather than global variables to communicate between a function and its caller

• If the function is working with data that is part of the well-managed global state of the program, it may be ok to just use `global` statements

• Many (most?) instructors tell beginning programmers to never let functions access global variables
• Some go farther: "never have global variables"

### Procedures

• A function that "doesn't return a result" by using the `return` statement is sometimes referred to as a "procedure"

• In Python, if you fall off the end of a function, it is considered to have returned `None`
• Procedures are useful for their "side effects": things they do besides compute a value

• In general, you should be a bit suspicious of procedures

• Pure output is the most common use of procedures

### Exercise: Define Some Functions

• squaring function
• Gauss's sum
• Euclidean distance (four parameters)

## 2014-07-08: Recursion; Formal Proof

• Dictionary: think of it as a function from a "key" to a "value"

• Function: one particular value for each key
• Creating a dictionary:

``````      {1 : 2, "hello" : "world", 12 : "goodbye"}

{}
``````
• Indexed and assigned like lists

• Can loop over keys with for

• Can test presence of a key with in

• Slicing:

``````      [1, 2, 3][0:2] == [1, 2, 3][:2] == [1, 2]
[1, 2, 3][1:] == [1, 2, 3][1:len([1,2,3])+1] == [2, 3]
[1, 2, 3][::-1] == [3, 2, 1]
``````
• Append:

``````      [1, 2] + [3] == [1, 2, 3]

l = []
for i in range(0, 5):
l += [i]
``````

### Recursion

• Functions are tricky.

• In particular, suppose I define f(x) = f(x) + 1

• This definition doesn't look sensible; one cannot find an f that meets this property for any x
• But recursion is not always a problem: f(x) = x f(x)

• The function f(x) = 0 satisfies this definition
• We call functions defined in terms of themselves "recursive"

### Base and Recursive Cases

• Usually, a well-defined recursive function f will have at least one input for which f is not defined in terms of itself

F(0) = 1 F(n) = n * F(n - 1) [n > 0]

• We call these the "base case" and "recursive case" of the definition

• It is typical that the base case will correspond to a small input, and the recursive case will define larger inputs in terms of smaller ones

G(0) = 1 G(1) = 1 G(n + 2) = G(n + 1) + G(n) [n >= 0]

### Exercise: Recursion In Python

• Both F() and G() of the previous examples are implementable in Python

• Can you say anything about the functions by studying the behavior of their implementations?

### Exercise: Collatz Conjecture

Consider the function

``````    f(1, n) = n
f(x, n) = f(x / 2, n + 1)     [x > 1 and even]
= f(3 x + 1, n + 1)   [x > 1 and odd]
C(x)    = f(x, 0)
``````

Note that n is used as a counter of the number of steps needed to compute f.

The Collatz Conjecture is that the range of the function C is finite. Implement C and try it for various inputs.

### Recursion As A Power Tool

• The beauty of recursion is that you can think of it as defining the behavior of a function on "more complicated" (larger) inputs in terms of the behavior on "less complicated" (smaller) inputs

• This is a way to break down problems

### Iteration Is A Special Case

• Integer log base 2 can be defined as a recursion

``````      lg(1) = 0
lg(n) = 1 + lg(n div 2)
``````
• This recursive formulation is pretty nice. One can remove the recursion by replacing it with an iteration

``````      To compute lg(n):
a <- 1
while n > 1
n <- n div 2
a <- a + 1
return a
``````
• The iterative version may be faster (next slide) but is more complicated and harder to reason about

### How Functions Are Implemented: Activation Records

• When you call a Python function, Python has to remember the values of all the local variables, so it can put them back

• "Re-entrancy" is the property that a function can (directly or indirectly) call itself
• Re-entrancy is a feature of all modern PLs
• The way that this is implemented is:

• When any function is called, Python creates an activation record with the new variables in it for parameters and locals
• When the function call returns, its activation record is discarded
• Creating an activation record can be kind of expensive

• Performance of allocation / deallocation
• Storage for currently active activation records

### What Is A Proof?

• A "social construct" (Eugene Luks)

• A watertight argument

• Made by piling up trivial arguments

• So can be easily checked for correctness
• Not necessarily "true in the real world"

### Elements Of A Proof

• Axioms: Statements that will be assumed to be true

• One important kind of axiom is the "definition" of some term
• Inference rules: Ways to get new true statements from old true statements

### "Informal Proof"

• Informal proofs are common in CS and even mathematics

• Generally, the axioms and inference rules that you have learned over the years in math classes are fair game

• Leaves a lot to be desired (c.f. next week)

### "Lemma": Divisibility

If a number d greater than 1 divides n, it cannot divide n + 1

• Proof:

• Assume that n rem d = 0, that d >= 2 and that (n + 1) rem d = 0
• (n + 1) rem d =
n rem d + 1 rem d = 1 rem d = 1
• This contradicts our original assumptions. The only assumption that we can change is the third

• Thus, the original statement must be true ("Q.E.D.")

### Example: There is no largest prime

(Due to Georgia Reh)

• First, take as an axiom the definition of prime number

• 2 is a prime
• Any natural number that is not divisible by any smaller prime is a prime
• Next define composite number as any natural number greater than 2 that is not prime

• Assume that some number p is the largest prime

• Let q be the product of all the numbers from 2 to p

• By our Lemma, none of the numbers from 2 to p can divide q + 1

• Thus q + 1 is prime, and larger than p

### That Proof Was Kind Of Borked

• Yeah, need an additional assumption: that every number can be uniquely factored

• This is the danger of informal proofs

• This proof can be made right, but not all borked proofs can

### Some Proof Techniques

• Direct proof: Usually by instantiation

• e.g. if we know that for all natural numbers n we have 2n >= n, we can conclude 4 >= 2
• Contradiction: What we just saw. Assume the opposite of what we want to prove, and show something that cannot be true

• Induction: Use the power of recursion to show that successively smaller and larger cases are true

### Inductive Proof

• Induction looks a lot like recursion
• Show that the proposition is true for some simple "base case"
• Assuming that the proposition is true for some generic case, show that it must also be true for a "more complicated" case

### Example: Gauss's Sum

• One of the most common formulae in CS

• sum(1..n) = n (n + 1) / 2

• Direct Proof: (Gauss at age 11 or somesuch)

• Result must be (n + 1) repeated n times
• Thus, this sum must be n (n + 1)
• But this sum also must be sum(1..n) + sum(1..n) = 2 sum(1..n)
• Thus sum(1..n) = n (n + 1) / 2
• Inductive Proof:

• Base case: When n = 1, we have sum(1..n) = sum(1..1) = 1
• Inductive case:
• Assume sum(1..n-1) = (n - 1) ((n - 1) + 1) / 2 = n (n - 1) / 2
• This means that

sum(1..n) =
sum(1..n-1) + n =
n (n - 1) / 2 + n =
(n2 - n + 2n) / 2 =
(n2 + n) / 2 =
n (n + 1) / 2

### Exercise: Sum Of Odd Numbers

• Prove that sum(1, 3, 5 ... (2n-1)) = n2

## 2014-07-09: Introduction To Algorithms; Trees and Graphs

### What Is An Algorithm?

We've spent a lot of time dancing around this. Let's get to it...

• An algorithm is instructions for solving any given instance of some particular problem

• A problem is a named collection of generic givens and a question to be answered from them

• An instance is a particular instantiation of the givens such that the problem can be solved

### Example: Largest Element Of A Set

``````        Largest Element of a Set
Given: A set S of integers
Find: The largest element in S
``````
• OK, this is a simple problem.

• A given instance might be something like S = {1, 4, 7}

• We seek an approach that will yield the largest element of S regardless of how S is instantiated

• This problem is a bit ill posed: What about the empty set?

• Change the given to "A finite nonempty set"

### Algorithm: Largest Element Of A Set

• An algorithm must be given in terms of fundamental, implementable operations

• An algorithm is best described in terms of pseudocode: English-like code that does not refer to a particular programming language

• By definition, pseudocode is not a programming language: there is no "correct" syntax, and the operations described may take some effort to implement in yours
• Here's an algorithm

``````      Algorithm: Linear Scan

To find the largest element of a set S:
let m be any arbitrary element of S
for each e in S
if e > m
m ← e
return m
``````

### Proving Linear Scan "Correct"

• Involves the proof techniques discussed yesterday

• First, verify that our Linear Scan algorithm always returns a result

• Must, since the for loop is guaranteed to always finish and thus the program must get to the end
• Algorithms that may not return an answer can still be "partially correct"
• Next, verify that the answer is always correct

• First, show that the result is always <= the maximum element

• Yes, because it is always some element of the set, so it can't be greater than any element in the set
• Next, show that the result is always >= the maximum element

• Harder. By contradiction: assume that for some set S, the algorithm returns an element of S smaller than maximum
• Note that each element examined increased the current maximum m to at least its size
• Thus, the result must be some unexamined element
• But the for loop examines all the elements

### Performance Of Linear Scan

• We can say things about how fast this algorithm is without ever looking at an implementation

• Assume that each step takes a fixed amount of time

• Then total running time is

``````      2 + 2 |S| <= T[S] <= 2 + 3 |S|
``````
• Note that this isn't a terribly interesting difference

• Probably OK to just say that T[S] is linearly proportional to |S|
• This is probably as good as one can do: It's hard to imagine how to find a maximum element of S without examining every element of S at least once

### Big-O Notation As A Performance Description

• We'll see this again, but here's a preview:

• It is reasonable to consider

• Only the worst-case performance of the algorithm on inputs of a given size
• Only the performance of the algorithm as the inputs get arbitrarily large
• Only approximate performance: not worrying about "constant factors"
• If this is the plan, let's zero in on it a bit

• Define the input size in terms of the amount of memory required to represent the instance
• Consider only the upper bound on runtime as a function of size
• Throw away additive and multiplicative constants as somewhat irrelevant
• We usually call the input size n and write this approximate function as O
• For our example, we say that T[n] is in O(n) where n = |S|

### Algorithm Design

• Teaching algorithm design is hard

• Mostly you have to get some experience with ways of thinking about problems
• Typical techniques:

• "Brute force": Like our example, just work directly from the definition
• Recursive: Split off the base case and call yourself for the recursive case
• Divide-and-Conquer: A kind of recursion

### Exercises: Algorithm Design

• For each of the following

• Formalize problem
• Give pseudocode for an algorithm
• Describe why you think the algorithm is correct
• Estimate the performance of the algorithm
• Problems

• Find the minimum element of a set S of integers
• Find a maximum element of a sequence S of integers
• Find the average of a sequence S of integers
• Given sets S and T, print the cross product set S x T
• Given sets S and T, answer true if any element of S divides any element of T, or false otherwise

### Graphs

• Unrelated to HS graphs!

• Fundamental CS / discrete math structures showing relationships between entities

• Consider an arbitrary set V of "vertices" / "nodes" / "bubbles"

• It is easy to imagine a set E of "edges" / "arcs" connecting vertices

• E is a subset of V x V
• Self-edges?
• Edges can be directed or undirected

• The degree of a vertex is the number of edges connected to it

• Edges and vertices may be labeled: use a "labeling function" l : E -> L or l : V -> L

• There's an obvious graphical representation here

• Directed edges are arrows
• Undirected edges are lines
• One can find lots of representations in algorithms / programs

• Vertex + edge set or list
• Adjacency set / list: for each vertex, set of edges / vertices connected to it
• Lots of interesting things to say about these structures!

• Consider a directed line graph

• Each node in the line graph points to its successor

• This is one natural representation for a sequence

• The other natural representation is as an array: successive values are at successive locations

• DA: To get to the n-th element, you have to start at the beginning and take n steps
• DA: Takes twice as much storage: need to store both the element and the edge information
• AD: To insert an element e, you can just change the successor of the element before and set e's successor appropriately

### Special Case: Tree

• A tree is a connected undirected graph with no cycles

• A rooted tree has one special node designated as the root

• In CS rooted trees are drawn upside-down (why?)
• A binary tree has degree at most three

### Other Graph Properties

• Connected: As mentioned earlier, there is a path from any vertex to any other vertex

• More complicated when directed
• Complete: All vertices are connected

• A bajillion other special graphs, each with their own stuff

## 2014-07-10: Implementing Algorithms in Python

### Translating Pseudocode To Python

• This is mostly obvious:
• Give concrete representations to mathematical objects in the pseudocode
• Create auxiliary functions as needed to keep the main code high-level
• Reconsider identifier choices

### Exercise: Voting Machine

• Specification

• Hardcode 2 or more candidates into program

• Each voter must be able to select a candidate

• Vote totals for each candidate must be maintained

• Vote totals must be reported (but only to authorities)

• Mechanism for preventing multiple votes by same person

• Each voter must be anonymous (to other voters and officials)

• Pseudocode

``````      set all candidates' vote totals to 0
note that no one has voted yet
repeat
clear the page
if u is the election official id
report vote totals
continue
report an error
continue
present choices
prompt for a choice
increase the vote total of the choice
mark u as having voted
``````
• Python

• See `voting.py`
• Verification

• Not performed.

### The `assert` Statement

• Reconsider this algorithm

``````      To find the earliest most-frequently-occurring
element of a nonempty sequence S:
m <- 0
d <- new empty dictionary
for k in S
if k is a key in d
d[k] <- d[k] + 1
else
d[k] <- 1
if d[k] > m
m <- d[k]
for k in the keys of d
if d[k] = m
return k
``````
• Translate into Python

``````      def earliest_repeated(s):
"""Find the earliest most-frequently-occurring
element of a nonempty list s."""
m = 0
d = {}
for k in s:
if k in d:
d[k] += 1
else:
d[k] = 1
if d[k] > m:
m = d[k]
for k in d:
if d[k] == m:
return k
``````
• Problem: What if there's a bug in our algorithm? Or what if someone passes an empty list?

• Then the last loop will finish without returning
• Thus the function will return `None`
• Uggh
• Good practice: Have the program crash at that point.

``````        assert False
``````
• The `assert` statement causes your program to crash with a reasonable message unless its condition is true

• Should also add an assertion at the beginning that the list is nonempty

``````        assert s != []
``````
• In general, use assertions liberally

• Anywhere you're not sure things will be right
• Also serve as documentation of how things are supposed to be
• Make debugging much easier
• Don't remove assertions gratuitously

### Instrumentation and "`print` Debugging"

• Your programs are starting to be complicated enough to need "debugging"

• Not a part of the "standard engineering process"
• Rather, iteration activity
• Debugging reasons backward, from evidence to cause

• A good way to have evidence is to instrument your program

• Have it print information as it runs so that you can see what is happening
• Probably want to have those prints controlled by a global variable so that you can turn them on only when wanted

``````      # Turn on for instrumentation.
debug = True

def earliest_repeated(s):
"""Find the earliest most-frequently-occurring
element of a nonempty list s."""
global debug
assert s != []
m = 0
d = {}
for k in s:
if debug:
print("Considering", k)
if k in d:
d[k] += 1
if debug:
print("Increased count to", d[k])
else:
d[k] = 1
if debug:
print("Initialized count")
if d[k] > m:
if debug:
print("Found new maximum", m)
m = d[k]
for k in d:
if d[k] == m:
if debug:
print("Returning", k)
return k
assert False
``````
• Obviously, these crud up your program; open question how much you should remove

### Running Programs: The Linux Command Line

• So far, most people have mostly been working in IDLE

• The Linux command line provides quite a bit of power, so worth getting used to

• Short version: Run program with

``````      \$ python3 myprog.py
``````
• I/O:

• Program reads user input from "standard input" (default terminal)
• Program writes user output to "standard output" (default terminal)
• Program also has available a "standard error" channel for writing output not intended to be processed
• To provide input from a file, use the "<" command-line operator

``````      \$ python3 myprog.py <myinput.txt
``````
• To store output in a file, use the ">" operator

``````      \$ python3 myprog.py <myinput.txt >results.txt
``````
• Python's stupid `input()` prompts are to standard output

• These things are available from Python via the `sys` module:

``````      from sys import stderr

print("oops", file=stderr)
``````

### Using `read()` and `readline()`

• `.readline()` reads a line (like `input()`)

• Includes trailing newline
• Looping over a file
• `.read()` reads a whole file (with newlines)

• Fairly viable in 2014
• All of this works with `stdin`

### Stripping Trailing Newlines

• You can apply `.rstrip('\n')` to a string to remove any trailing newline

• A note on dot

• Think of `x.y()` as `y(x)`
• You can string dots together; they left-associate
• Style is object-oriented programming; soon

### Exercise: Command-Line Programming

• Write a program that counts the number of characters on its input and prints the count as its output. Include newline characters etc in the count.

### Enhancing Your Instrumentation: A `note` Function

• Let's implement a function that does a couple of nice instrumentation things

• Prints your debugging messages to `stderr`
• Prints them only when debugging is turned on
• To do this, we're going to have to explore Python quite a bit

• This is the kind of process I go through every week

### Keyword Arguments

• Note that `print()` takes "keyword arguments" where the actual is passed by formal name

• Keyword arguments can help with getting arguments in the right order

### Default Values

• Arguments can be given default values

``````      def mystery(x = 5):
print(x)
``````
• Defaulted arguments must be at the end of the formals

### Variable Arguments

• Let us figure out how to implement Python functions that take a variable number of arguments, like `print()`

• Now, let's make the default be `stderr`

``````      from sys import stdout, stderr

def note(*args, **kwargs):
if "file" not in kwargs:
kwargs["file"] = stderr
print(*args, **kwargs)

note("hello", 5)
note("hello", 6, file=stdout)
``````
• Now, let's make it all controlled by a debug variable

``````      from sys import stdout, stderr

debug = False

def note(*args, **kwargs):
if not debug:
return
if "file" not in kwargs:
kwargs["file"] = stderr
print(*args, **kwargs)

note("hello", 5)
note("hello", 6, file = stdout)
``````

### Implementing A Python Module

• We would like to make our new `note` function available for future programs

• Just stick it in a file named `noting.py` and put it where Python can find it

• Current directory
• System standard place
• The `PYTHONPATH` "environment variable"

``````    \$ PYTHONPATH=/tmp
\$ export PYTHONPATH
``````
• Now can import it just like anything else

``````      from noting import *

debug = True

note("noting something")
``````
• However, Python module variables don't work right, so we'll have to fix our interface

``````      from noting import *
import noting

noting.debug = True

note("noting something")
``````

### Cleaning Up Our Code

``````        from noting import *
noting.debug = True

def earliest_repeated(s):
"""Find the earliest most-frequently-occurring
element of a nonempty list s."""
assert s != []
m = 0
d = {}
for k in s:
note("Considering", k)
if k in d:
d[k] += 1
note("Increased count to", d[k])
else:
d[k] = 1
note("Initialized count")
if d[k] > m:
note("Found new maximum", m)
m = d[k]
for k in d:
if d[k] == m:
note("Returning", k)
return k
assert False
``````

### Debugging

From: Debugging Your Programs, Bart Massey 2013-01-24

### What is debugging?

• During/after coding, before/during/after testing

• Bring the program to a state where it appears to be bug-free (but this is a lie)

• Estimate 20-40% of programming effort

• "Secret": No good books, no chapter in our book, nothin'

### How to debug

• Given a failure of the software:

• Find the causes ("faults") leading to that failure
• Find the root causes of those faults
• Figure out and apply a repair
• Check the repair
• Does it fix the failures?
• Does it cause new failures?

### Key activity: diagnosis

• Like in medicine or car repair: "It doesn't work; what's happening and what can be done?"

• Diagnosis is hypothesis formation and testing

• What possible reasons might there be for observed symptoms?
• Can those reasons be ruled out by what is known so far?
• If not, can we do tests to rule each reason out or increase our belief that it is the correct one?
• Repeat until exactly one possible reason remains, and it looks really likely to be true.

### Common bugs

• Two basic kinds:

• Just plain calculating the wrong thing
• Examples

• Off-by-one "fencepost" errors
• Copy-and-paste calculation errors
• Typos/"Thinkos"
• Failure to design to the spec
• Failure to understand/implement the design

### Root Cause Analysis

• It's not enough to find the line of code that "causes the bug"

• You want to find out how that line got there

• In software, faults are caused by mistakes ("errors") that were made by a human (usually you)

• With the "root causes" found, you can:

• Correct all the faults caused by that error
• Take steps to make that error less likely in the future

### Preparing code for debugging

• "Real programmers don't comment. It was hard to write--it should be hard to read and harder to understand."

• Code should have a spec, simple tests, and pseudocode

• Formatting should be as clean as possible

• Consistent indentation
• Consistent liberal use of whitespace
• Good names
• Idiomatic
• Code should be instrumented appropriately

### Debugging pre-inspection

• Read the code in question carefully. Look for things that are wrong or unclear

• Explain the code to someone. Have them look at it too

• Most bugs are easily found and fixed by inspection alone

• Are you sure the spec and tests are correct?

• White-box: what kinds of similar inputs might produce the same program misbehavior?

• Black-box: what properties distinguish misbehaving inputs?

• Is the timing as expected?

• Are your current hypotheses consistent with everything you have observed or can observe?

### Debugging tools: print() function

• For a specific hypothesis, stick a print() in that will either disconfirm or confirm the hypothesis

• Works in a huge variety of situations
• But don't spam instrumentation everywhere, or you will get confused by it
• Can use print() for exploring program behavior ("tracing"), but beware: one can waste a lot of time doing this without learning anything.

• Always best to know what the question is before you start looking for the answer

### Debugging tools: "debugger"

• Idle will happily provide you the ability to

• Step through your program one statement at a time
• Run until a given program line is reached
• Examine/change variable values anytime stopped
• Except the debugger is really fragile and hard to use

• In particular, doesn't interact well with input()

• In general, debugger is tool of last resort

### Post-diagnosis

• Once you've found the immediate source of a bug, do RCA

• Look for other places where faults may have been inserted due to the same root causes

• Think hard about how those faults got there. What are you going to do to avoid this in the future?

• Craft fixes that fix the faults properly

• This may involve changing the design or revising the specification
• Apply the fixes, then test everything carefully

• Did the problem get fixed?
• Are there new problems?

### Backups, versions and source code management

• It is really easy to get the buggy version and the fixed version and the version you are working on right now mixed up

• Tool called source code management system helps here

• It is probably a mistake to have too many backup files around; in any case, use a consistent clear naming scheme for backup files

### Parting thoughts

• Don't get stuck!

• Interrupt yourself every few minutes and see if you're making real progress
• If you are stuck, many strategies are available:
• Try a different approach
• Take a break