Software Engineering Principles

New Beginnings: Week 4
Lecture Notes by Bart Massey
Copyright © 2014 Bart Massey

2014-07-21: Software Engineering; Git; Debugging

The Individual SE Life Cycle

  • As discussed, four phases:

    • What: Requirements

      • User Reqs
      • Reqs Specification
      • Watch out for design constraints
      • Construct system tests
    • How: Design

      • Architectural Design
      • Detailed Design
      • Construct unit tests
    • Do it: Implementation

    • Check it: Validation and Verification

      • Testing
      • Inspection
      • Formal Methods
  • Additional Phases

    • Debugging
    • Delivery
    • Maintenance

Project, Process, Practice

  • Process: Set of meta-activities for executing projects

    • Made up of practices: techniques and methods
  • On individual / small projects, processes for

    • Estimating and Scheduling
    • Backups
    • Change Management
    • Delivery
  • Processes should be carried from project to project

Development Models

  • Everyone knows "waterfall" is fail

    • Incremental: take small vertical cuts
    • Iterative: go up and down
  • Traceability, tracking, progress analysis

    • Know what the status of the project is
    • Know what the remaining work items are
    • Fine granularity wins

Estimating and Scheduling

  • Work Breakdown Structure: hierarchical breakdown into tiny tasks

    • Roughly one-hour blocks
  • Task: has

    • Label
    • Duration
    • Dependencies and Prerequisites
    • Resource requirements
  • Schedule: Label tasks with times

    • Must obey constraints
    • Probabilistic
    • "Critical Path Method"
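The WBS/scheduling ideas above can be sketched in a few lines: earliest finish of a task is the max finish of its prerequisites plus its own duration. The task names and durations here are made up for illustration.

```python
# Sketch of critical-path scheduling. Each task maps to
# (duration in hours, list of prerequisite tasks); all values hypothetical.
tasks = {
    "spec":   (2, []),
    "design": (3, ["spec"]),
    "code":   (4, ["design"]),
    "tests":  (2, ["spec"]),
    "verify": (1, ["code", "tests"]),
}

finish = {}
def finish_time(name):
    # Earliest finish = latest prerequisite finish + own duration (memoized).
    if name not in finish:
        dur, prereqs = tasks[name]
        start = max((finish_time(p) for p in prereqs), default=0)
        finish[name] = start + dur
    return finish[name]

print(finish_time("verify"))  # 10: the spec -> design -> code path dominates
```

The chain spec, design, code, verify is the critical path: shortening "tests" would not shorten the schedule at all.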

User Requirements Gathering

  • Elicitation / elucidation loop

  • Functional requirements vs "-ilities"

    • Quality
    • Reliability / Stability
    • Performance
    • Cost / Time / Effort
  • User validation

Requirements Specification

  • Formalize requirements, esp functional

    • Enough detail and precision to enable tests
    • Therefore tests should be written at this stage
  • Usually just numbered paragraphs

    • But many fancy tools, notations, methodologies exist
    • Z Notation: First-order logic as a specification tool
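"Enough detail and precision to enable tests" can be made concrete: a numbered paragraph, worded carefully, translates directly into a checkable predicate. The requirement text and function below are hypothetical.

```python
# Requirement 4.2 (hypothetical wording): "sort() shall return its input
# in nondecreasing order, containing exactly the same elements."
def satisfies_req_4_2(sort_fn, xs):
    ys = sort_fn(list(xs))
    ordered = all(a <= b for a, b in zip(ys, ys[1:]))
    same_elements = sorted(ys) == sorted(xs)
    return ordered and same_elements

print(satisfies_req_4_2(sorted, [3, 1, 2]))  # True: built-in sort passes
```

Note the requirement is testable precisely because it avoids vague words like "quickly" or "correctly".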

Architectural Design

  • Top-level decomposition, then proceed hierarchically

    • Key ideas: minimize extra work / rework, risk
    • Minimize coupling / interface complexity = maximize cohesion
  • Example from earlier: input / processing / output split

    • Processing phase further decomposed pre / intra / post
    • Some common architectural ideas: pipelining, concurrency
  • Get help from an experienced architect!

    • Very high leverage here

Detailed Design

  • Continue to decompose design until each component can be pseudocoded

  • Write the pseudocode

  • Check that everything is going to be OK

    • Prototyping is a powerful investigative tool here

Prototyping

  • Construction for obtaining information

    • May be reusable, discardable
    • Prefer discardable
  • Prototype should be minimal to answer question

  • Prototyping is powerful; use it

Implementation; V&V

  • Implementation: Not much to say

  • V&V

    • Verification: Was the process performed correctly?
    • Validation: Is the product right?

Three Pillars of V&V

  • Testing

    • Testing does not work: effective oracle, sample size, diagnosis
    • Testing is vital: catches dumb mistakes
    • Whole talk on this subject shortly
  • Inspection

    • Lots of kinds
      • Informal desk-checking by the author
      • Desk-checking by colleague
      • Walkthrough
      • Formal Inspection
    • Again, more shortly
  • Formal Methods

    • "Prove" program correct
    • Hoare assertions
    • Limits of correctness proof
    • Again, more shortly

Git

  • Work through the gittutorial manual page with a running example

2014-07-22: Debugging; Modular Programming

Debugging

  • Crucial skill, but not well understood

  • Practice is necessary, but not sufficient

What is debugging?

  • During/after coding, before/during/after testing

  • Bring the program to a state where it appears to be bug-free (but this is a lie)

  • Estimate 20-40% of programming effort

  • "Secret": No good books, no chapter in our book, nothin'

How to debug

  • Given a failure of the software:

    • Find the causes ("faults") leading to that failure
    • Find the root causes of those faults
    • Figure out and apply a repair
    • Check the repair
      • Does it fix the failures?
      • Does it cause new failures?

Key activity: diagnosis

  • Like in medicine or car repair: "It doesn't work; what's happening and what can be done?"

  • Diagnosis is hypothesis formation and testing

    • What possible reasons might there be for observed symptoms?
    • Can those reasons be ruled out by what is known so far?
    • If not, can we do tests to rule each reason out or increase our belief that it is the correct one?
    • Repeat until exactly one possible reason remains, and it looks really likely to be true.

Common bugs

  • Two basic kinds:

    • Bad control flow
    • Just plain calculating the wrong thing
  • Examples

    • Off-by-one "fencepost" errors
    • Copy-and-paste calculation errors
    • Typos/"Thinkos"
    • Failure to design to the spec
    • Failure to understand/implement the design

Root Cause Analysis

  • It's not enough to find the line of code that "causes the bug"

  • You want to find out how that line got there

  • In software, faults are caused by mistakes ("errors") that were made by a human (usually you)

  • With the "root causes" found, you can:

    • Correct all the faults caused by that error
    • Take steps to make that error less likely in the future

Preparing code for debugging

  • "Real programmers don't comment. It was hard to write--it should be hard to read and harder to understand."

  • Code should have a spec, simple tests, and pseudocode

  • Formatting should be as clean as possible

    • Consistent indentation
    • Consistent liberal use of whitespace
    • Good names
    • Idiomatic
  • Code should be instrumented appropriately
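One reading of "instrumented appropriately": assertions for invariants, plus debug output routed through a single switch so it can be silenced in one place. The function here is a made-up example.

```python
DEBUG = True   # flip to False to silence all instrumentation at once

def debug(*args):
    # All debug output goes through here, so it is easy to find and remove.
    if DEBUG:
        print("debug:", *args)

def mean(xs):
    # The assertion documents (and checks) a precondition.
    assert len(xs) > 0, "mean() requires a nonempty list"
    debug("mean() of", len(xs), "values")
    return sum(xs) / len(xs)

print(mean([2, 4, 6]))  # 4.0
```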

Debugging pre-inspection

  • Read the code in question carefully. Look for things that are wrong or unclear

  • Explain the code to someone. Have them look at it too

  • Most bugs are easily found and fixed by inspection alone

Debugging tools: your brain

  • Are you sure the spec and tests are correct?

  • White-box: what kinds of similar inputs might produce the same program misbehavior?

  • Black-box: what properties distinguish misbehaving inputs?

  • Is the timing as expected?

  • Are your current hypotheses consistent with everything you have observed or can observe?

Debugging tools: print() function

  • For a specific hypothesis, stick a print() in that will either disconfirm or confirm the hypothesis

    • Works in a huge variety of situations
    • But don't spam instrumentation everywhere, or you will get confused by it
  • Can use print() for exploring program behavior ("tracing"), but beware: one can waste a lot of time doing this without learning anything.

  • Always best to know what the question is before you start looking for the answer
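A sketch of the disconfirm-or-confirm style: here the hypothesis is that an empty line is miscounted because "".split(" ") yields [''], not []. The function is hypothetical; the single print targets exactly that question.

```python
def word_count(line):
    words = line.split(" ")
    # Hypothesis: on an empty line, words == [''], so the count is 1, not 0.
    print("DEBUG words =", words)
    return len(words)

print(word_count(""))     # DEBUG output shows [''] -- hypothesis confirmed
print(word_count("a b"))  # 2, as expected for the normal case
```

One targeted print answered the question; tracing every call of word_count would have produced far more output and far less information.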

Debugging tools: "debugger"

  • IDLE will happily provide you the ability to

    • Step through your program one statement at a time
    • Run until a given program line is reached
    • Examine/change variable values anytime stopped
  • Except the debugger is really fragile and hard to use

  • In particular, doesn't interact well with input()

  • In general, debugger is tool of last resort

Post-diagnosis

  • Once you've found the immediate source of a bug, do RCA

  • Look for other places where faults may have been inserted due to the same root causes

  • Think hard about how those faults got there. What are you going to do to avoid this in the future?

  • Craft fixes that fix the faults properly

    • This may involve changing the design or revising the specification
  • Apply the fixes, then test everything carefully

    • Did the problem get fixed?
    • Are there new problems?

Backups, versions and source code management

  • It is really easy to get the buggy version and the fixed version and the version you are working on right now mixed up

  • Tool called source code management system helps here

  • It is probably a mistake to have too many backup files around; in any case, use a consistent clear naming scheme for backup files

Parting thoughts

  • Don't get stuck!

    • Interrupt yourself every few minutes and see if you're making real progress
    • If you are stuck, many strategies are available:
      • Try a different approach
      • Take a break
      • Ask for help
  • Don't get discouraged

    • The most experienced programmers still make a lot of bugs and have a hard time fixing them

    • The bugs you will make are all fixable

Modular Programming

  • Key idea: "Information Hiding" (Parnas); try to reduce the visible complexity by hiding details behind an interface

  • A module provides an API (Applications Programming Interface)

  • A module is often implemented as a library

Modules and Namespaces

  • A module is concerned with the visibility of names

  • In Python, the names that are made visible by a module are controlled solely by the module importer

    • This is bad
  • Two kinds of names:

    • An unqualified name is one that you can just use
    • A qualified name requires a module name as a qualifier to indicate which module it is part of
  • Module best practices:

    • Don't put names in a module that are likely to collide with anything and everything
    • Import qualified and/or selectively when reasonable to do so
    • If you're going to use a large part of an API, feel free to import the whole thing unqualified; will make the code easier to understand

Modules in Python

  • We have already seen most of the syntax

          import math
          from math import sqrt
          from math import *
    
  • A naming convention: Prefixing names with _ indicates they are not intended to be used except by code local to the module

  • Modules are loaded from the current directory, or directories on the PYTHONPATH. Sadly, this means we must discuss "environment variables"
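The underscore convention above gets one piece of language support: from module import * skips names beginning with an underscore. A self-contained demonstration, writing a throwaway module to a temporary directory (the module and its names are invented for this example):

```python
import os
import sys
import tempfile

# Write a throwaway module with one public and one "private" name.
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "demo_mod.py"), "w") as f:
    f.write("def area(r):\n"
            "    return 3.14159 * r * r\n"
            "def _helper(x):\n"
            "    return x + 1\n")
sys.path.insert(0, tmp)

ns = {}
exec("from demo_mod import *", ns)
print("area" in ns)      # True: public name imported
print("_helper" in ns)   # False: underscore names are skipped
```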

Python Module Gotchas

  • Module globals are a little weird in Python

    • Can be read just fine
    • Assignment is trickier: inside the module a function must declare the name global before assigning it, and after "from mod import g" assigning to g rebinds only your local copy
  • Functions can have qualifiers, and naming a function the same as a module causes grief
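A small illustration of the module-global weirdness: inside a function, a bare assignment makes the name local, so assigning a module global requires a declaration.

```python
counter = 0   # a module-level global

def bump():
    # A bare "counter += 1" here would raise UnboundLocalError:
    # assignment makes the name local unless it is declared global.
    global counter
    counter += 1

bump()
bump()
print(counter)   # 2
```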

Python Packages

  • Stick an __init__.py file in a directory, and fill it full of imports for the pieces of the package

  • You can then import by the directory name to get a more complicated module

  • There are all kinds of clever scoping rules for the pieces

API Design

  • Try to make functions reasonably generic

  • Give the clearest possible interfaces

  • Document everything mercilessly

  • Test everything mercilessly

2014-07-23: Object-Oriented Programming; Testing

Basic OOP Idea

  • Group data together with the functions that manipulate it

  • Strong relationship to modules

A Simple Python Class

        class point():

            def __init__(self, p):
                (x, y) = p
                self.x = x
                self.y = y

Creating Objects

        p = point((3, 2))
        print(p.x, p.y)

Class Methods

        def add_point(self, p):
            (x, y) = p
            self.x += x
            self.y += y

Calling Class Methods

        p.add_point((1, 1))

Setters and Accessors

        def set_point(self, p):
            (x, y) = p
            self.x = x
            self.y = y

        def get_point(self):
            return (self.x, self.y)

Inheritance

        class color_point(point):

            def __init__(self, p, c):
                super().__init__(p)
                self.color = c
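Putting the class examples together: this sketch restates point and color_point in full so it runs on its own, then exercises the inherited behavior.

```python
class point:
    def __init__(self, p):
        (x, y) = p
        self.x = x
        self.y = y

    def get_point(self):
        return (self.x, self.y)

class color_point(point):
    def __init__(self, p, c):
        super().__init__(p)   # initialize the inherited x, y fields
        self.color = c

cp = color_point((3, 2), "red")
print(cp.get_point(), cp.color)   # (3, 2) red
```

color_point gets get_point for free from its superclass; only the extra color field needed new code.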

OO Design

  • Strategy is simulation: model real-world entities

  • Keep things as simple as possible

  • Strong cohesion, loose coupling

A Test Is An Input-Output Pair

  • This includes "should fail" negative tests

  • How to get correct outputs? (the "effective oracle" problem)

    • Slow but correct code (but few tests)
    • Working backward from output to input
    • "Easy" inputs
  • Testing doesn't work
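Tests as input-output pairs, including one "should fail" negative case. The function under test is hypothetical.

```python
# Hypothetical function under test.
def clamp(x, lo, hi):
    return max(lo, min(x, hi))

# Each test is an (input, expected-output) pair.
tests = [
    ((5, 0, 10), 5),     # in range: unchanged
    ((-3, 0, 10), 0),    # below range: clamped up
    ((42, 0, 10), 10),   # above range: clamped down
]
for args, expected in tests:
    assert clamp(*args) == expected, (args, expected)

# A negative test: bad input must raise, and the test fails if it doesn't.
try:
    int("not a number")
    assert False, "expected ValueError"
except ValueError:
    pass

print("all tests passed")
```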

Black Box vs White Box Testing

  • Black Box: Tests given without knowledge of implementation

  • White Box: Use implementation knowledge to construct tests

  • Both are valuable

Test Domains

  • Idea: Divide input or output space up such that only one representative in each domain need be tested

    • How to draw domain boundaries?
    • There are still a lot
    • Better idea: formal methods plus testing
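A sketch of domain-based test selection for a hypothetical function whose input space splits cleanly into three domains (negative, zero, positive), with extra attention to the boundaries between them:

```python
def sign(x):
    # Hypothetical function under test; three input domains.
    if x < 0:
        return -1
    if x == 0:
        return 0
    return 1

# One representative per domain, plus values adjacent to the boundaries,
# since off-by-one errors cluster at domain edges.
assert sign(-7) == -1    # interior of the negative domain
assert sign(-1) == -1    # just below the boundary
assert sign(0) == 0      # the boundary itself
assert sign(1) == 1      # just above the boundary
assert sign(7) == 1      # interior of the positive domain
print("domain tests passed")
```

Five tests cover the whole (infinite) input space under the assumption that each domain behaves uniformly; drawing the boundaries correctly is exactly the hard part noted above.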

Testing Kinds

  • What should be tested?

    • System (acceptance)
    • Unit
    • Integration
    • Regression
  • Integration testing can be

    • bottom-up
    • top-down

Testing Methods

  • Random (e.g. "fuzz testing")

  • "User Tests"

  • Code-driven

  • Domain-driven

  • Coverage-driven

Coverage Testing

  • How much of the program has been tested?

    • All statements?
    • All branches (each way)?
    • All code paths?
    • All data patterns?
  • Automated tools (e.g. gcov) exist

  • 100% coverage is impossible

  • Untested code is broken code

Fault Seeding

  • Attempt to find out how good the tests are

  • Use SCMS to reliably remove seeded faults (!)

Testing Infrastructure

  • Tests need to be maintained with code

  • Tests need to be runnable automatically

  • Test failures need to be logged as tickets until fixed

2014-07-24: Exercise: Textual Analysis

Digression: Open Source and Software IP

Textual Analysis

  • Used to find distinctive properties of a text.

    • e.g. "who wrote Shakespeare", "who wrote this leak memo?"
  • Some features to consider:

    • Sentence, word, paragraph length
    • Word usage frequencies
      • Unusual words
    • Idiomatic usage e.g. emoticons, textspeak
    • Punctuation
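Several of the features above take only a few lines to compute; the sample text here is made up for illustration.

```python
import re
from collections import Counter

text = "The cat sat. The cat ran! A dog barked."   # made-up sample

# Sentence and word extraction via crude regex splits.
sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
words = re.findall(r"[a-zA-Z']+", text.lower())

avg_sentence_len = len(words) / len(sentences)
freqs = Counter(words)

print(len(sentences))          # 3
print(avg_sentence_len)        # 3.0 words per sentence
print(freqs.most_common(2))    # [('the', 2), ('cat', 2)]
```

Run over two authors' corpora, feature vectors like these (sentence length, word frequencies) are what get compared to guess authorship.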

Corpus

  • Blog posts from Keith Packard and myself.

  • git://svcs.cs.pdx.edu/git/blog-corpus.git