C/C++ Programming

New Beginnings: Week 9
Lecture Notes by Bart Massey
Copyright (c) 2014 Bart Massey

2014-08-25: Compilation, Assembly, Linking and Loading; C/C++ Basics

Compilation Re-Reviewed

  • Python is an interpreted language: it is a program written in C, then compiled to assembly, then assembled to native code. Your Python program is just data fed to the Python program, which it treats as a computer program and "pretend executes".

  • C and C++ are compiled languages: See above.

  • Typically, a compiled language will cause very few machine instructions to be generated for each statement in the source language. This makes execution much faster.

C vs C++

  • C++ is an extension that adds object-oriented programming to C, along with a whole host of related stuff.

    • C++ is backward-compatible, for the most part: C programs can be compiled with the C++ compiler, and C and C++ can be mixed in the same program.
  • C is typically used for smaller, more low-level programs. Its style of use is like imperative Python programming.

  • C++ is typically used for larger applications. Its style of use is like object-oriented Python programming.

  • Most of what we'll do this week is actually C. I will try to point out the C++ specific stuff.

A First C Program

  • Let's write the same first program everyone else does:

          /* Copyright © 2014 Bart Massey */
          /* Print "hello world". */
    
          #include <stdio.h>
    
          int main(void) {
              printf("hello world\n");
              return 0;
          }
    
  • Things to notice:

    • Different comment syntax.
    • The "C Preprocessor" CPP for include.
    • Funny function definition for main.
      • The return type of main.
      • The return value of main.
      • The name main is special.
    • Curly braces and semicolons.

CPP

  • One surprising step: before translating C/C++ code to assembly, the compiler passes it through a source-to-source "macro preprocessor" called CPP.

  • CPP has a number of interesting features: The one we care about here is the ability to textually "include" the text of another file into the current file.

  • CPP include is the standard mechanism for loading "module interfaces" in C/C++: stdio.h is a system file that includes a declaration of the printf() function, among other things.

Compiling Our Program

        $ gcc -Wall -g -o hello hello.c
        $ ./hello
        hello world
        $
  • We will use gcc as our C compiler. There are lots of C compilers in the world. This one is the best.

  • The program suffix should be .c.

  • The -o flag names the output executable file. The default executable is named a.out, which is not very useful.

  • The -g flag compiles the program in such a way that it can be used with an interactive debugger.

  • The -Wall flag instructs to give the compiler warnings about anything that looks fishy. Always use this.

  • The -O flag (or -O2 -O3 -O4) turns on compiler optimization of your program.

  • The -E flag causes gcc to emit the CPP output to stdout instead of compiling.

  • The -S flag causes gcc to produce hello.S containing the assembly code that will be assembled.

  • The -c flag causes gcc to produce hello.o containing the object code that will be linked.

  • The -v flag causes gcc to show what it is doing.

    • Note in particular the invocations of cc1, as and collect2 in the compilation. collect2 invokes ld to get its work done.

A More Interesting Program

  • Let's write a C recursive Fibonacci program. As we know, this gets really expensive to execute really fast. See fib.c in the nb-misc repo.

    • Compare with fib.py in the repo.
  • This introduces some new things:

    • The type of strings in C is "char *". This is confusing. We'll explain tomorrow.

    • Some of the expression syntax is different, e.g. "||".

    • The printf() function is the standard C formatting printer.

      • Some C++ users use an alternate I/O system. We won't cover that this week, but your tutorial will probably explain it.
    • Argument processing is a little magic. I'll walk through it.

Benchmarking C

  • Let's try our fib implementations with n = 40.

  • The UNIX time command can be used to time program execution.

Variable and Function Declarations; Types

  • In C, variable and function declarations must be explicit, and occur before first use.

    • In C90, the default for gcc, variable declarations must occur at the start of a block.
    • In C99 (-std=c99) they can occur at point of first use.
  • C has some fundamental types:

    • Integers are int and are limited in size; they will silently overflow and "wrap around" if the value gets too large or small.

      • Smaller integers are available as char and short, and maybe larger as long or long long.

      • You can ask that an integral type be non-negative with unsigned.

    • Floating point numbers are double.

      • Smaller floats are available as float.
    • Characters are just small integers. This is different from Python, where they are small strings.

    • Strings are of type char *. This means that they are essentially the address of their first character. As in Python, string constants are read-only.

    • Arrays are like Python lists, except that you can't really do anything with them except read and write their elements. In particular:

      • No slicing.
      • No indexing from the back.

Seperate Compilation and Linking

  • Let's build a "module" containing a factorial function and a choose function.

  • Now let's build a client that uses this module to calculate some poker statistics.

    • Number of poker hands.

    • Probability that a poker hand contains at least one pair.

  • Note that overflow is killing us here. Let's improve the choose function by doing the calculations in a better order.

2014-08-26: Pointers and Address Arithmetic; Memory; Linked Lists

More About char * As Strings

  • A string constant is treated by the C compiler as follows:

    • The characters (bytes) of the string are put into memory (then marked read-only by the OS).

      • The string is "null-terminated" with a 0 byte.
    • The string constant is then referred to by the C program as just its address.

  • You'll want to get familiar with several things about the char * that strings are represented as:

    • You can do "pointer arithmetic": Phrases like

          "hello" + 1
      

      have their natural meaning: the address of the e in hello. So

          printf("hello\n" + 1);
      

      will print "ello" and a newline.

      Try printf("%d\n", "hello"); to see what address the string is at. It will work fine.

    • You can "dereference" an address with the "*" operator. So

          printf("%c\n", *"hello");
      

      will print an h.

    • You can take the address of something with the "&" operator. So

          char *c = "apple";
          printf("%d\n", c);
          printf("%d\n", &c);
      

      will print the storage address of the variable c.

Pointer and Address Arithmetic Is General/Pervasive

  • The C compiler actually translates array references into pointer references internally. So a fragment like

        char a[10];
        a[1] = 'c';
    

    is treated like

        char a[10];
        *(a + 1) = 'c';
    

    because in C an array variable is a pointer: it contains the starting address of the array.

  • It's common to use pointers to provide an address to be stored into:

        void f(char *s) {
            *s = 'a';
        }
    
        int main(void) {
            char x;
            f(&x);
            printf("%c\n", x);
        }
    

    prints an a.

Pointer Arithmetic Depends On The Pointed-To Type

  • When doing address arithmetic, the C compiler takes into account how large the pointed-to thing is. So

        int a[10];
        a[1] = 7;
    

    is translated into

        int a[10];
        *(a + 1) = 7;
    

    which works the way you'd want even though integers are more than one byte long. The C sizeof operator translates a thing into the number of bytes that it is long, so

        int a[10];
        printf("%d\n", sizeof a[0]);
        printf("%d\n", sizeof a);
    

    will print 4 and then 40 on most machines. Note that

        int *ap = &a[1];
        printf("%d\n", sizeof ap);
        printf("%d\n", sizeof *ap);
    

    will print 8 and then 4 on most modern machines: Turns out that addresses on modern machines are 8 bytes (64 bits) long.

Three Kinds Of Storage

  • There's storage on the stack. This is what you get for arrays declared inside a function. The storage will be allocated on the stack when the function is called, and freed from the stack when the function returns.

  • There's storage in global space. This is what you get for arrays declared globally. The storage will be allocated when the program starts, and freed when it exits.

  • There's storage in the "heap". The libraries and operating system will collude to provide memory when you ask for it, and will mark that storage as available for reuse when you give it back.

Managing the Heap With malloc() and free()

  • To get some heap storage, you use the malloc() function. It takes a size in bytes of the storage you want allocated, and returns a pointer to the start of that storage.

  • Pointers can be "null". If malloc() doesn't have any space for you, it will return the address 0, which is special. You want to check for this just in case.

          #include <assert.h>
          #include <stdlib.h>
    
          int main(void) {
              int *dynamic_storage =
                (int *) malloc(10 * sizeof dynamic_storage[0]);
              assert(dynamic_storage);
          }
    
  • The storage returned by malloc() is uninitialized. For arcane reasons, it will usually be a bunch of zero bytes, but you can't count on it.

  • You free heap storage with free(). It takes an argument which is an address previously provided by malloc(), and will free up however much storage it allocated in the first place.

          int main(void) {
              int *dynamic_storage =
                (int *) malloc(10 * sizeof dynamic_storage[0]);
              assert(dynamic_storage);
              dynamic_storage[9] = 100;
              free(dynamic_storage);
          }
    

Managing the Heap With C++ new and delete.

C++ provides a more convenient interface to the heap. Underneath, it still calls malloc() and free(), but since the language keeps track of sizes and things, it's all less error-prone.

        int main(void) {
            int *dynamic_storage = new int[10];
            assert(dynamic_storage);
            dynamic_storage[9] = 100;
            delete dynamic_storage;
        }

Common Gotchas

  • Returning a pointer to stack-allocated storage. By the time the function returns the address, the underlying storage will have been released.

          int *f(void) {
              int a[10];
              return a;
          }
    

    Note that the compiler might not give a warning here, though gcc usually figures out to.

  • Forgetting that malloc() takes bytes in its argument.

          int *a = malloc(10);
    
  • Freeing storage, then using it later anyway.

  • Not freeing storage; not a big deal unless you keep allocating lots of it. We call continuous allocation without free a "memory leak"; it can be trouble in long-running program.

An Aside: struct

  • In C, we can make a collection of named fields without creating a class with the struct declaration.

          struct pair {
              int x;
              int y;
          };
    
          struct pair mypairs[10];
    
          mypairs[0].x = 5;
          mypairs[0].y = 1;
    
  • As you might expect, address arithmetic on structs uses the size of the whole struct, and sizeof understands to add up the sizes of all the fields.

  • The "alignment rules" and "packing rules" for structs are strange: let us not speak of them here.

How To Make A Linked List In C

  • Let's look at a linked-list implementation.

  • See linked.c in the repo.

valgrind

  • This is a tool to check your program's memory use at run time.

  • Let's demo it.

In-Class Exercise

  • Add a delete_min() function that deletes the minimum element in an intlist. Test it.

2014-08-27: Static Typing; OOP; Program Performance

Review Of C Types

  • We have seen some C types now:

    • char, short, int, long, long long and their unsigned variants.

      • Include <inttypes.h> for uint16_t int64_t etc
    • float and double (usually IEEE 754 standard)

    • For any type, you can make a pointer type out of it with *

    • The array types made with []

    • struct types

Function Types and Function Pointers

  • While we haven't paid too much attention to it, function types are a kind of type also:

          int (char *x, char *y)
    

    is at least in principle the type of functions taking two string arguments and returning an int result.

  • In practice, functions are like arrays: the name of the function is the address at which its code starts.

  • Thus, the type we actually want for function variables is a pointer to a function:

          extern int f(char *, char *);
          int (*fp)(char *, char *) = f;
          printf((*fp)("hello", "world"));
    

    Notice the "invisible" parameters here: in function types, we can omit parameter names, since they aren't relevant to the type. Notice also the necessary parentheses to make everything group properly.

Typedef

  • C types can get pretty complicated pretty fast.

          typedef int myfun_t(char *, char *);
          extern myfun_t f;
          myfun_t *fp = f;
    
          struct posn_2d {
              int x, y;
          };
          struct posn_2d line_segments[15][2];
    
          double length(struct posn_2d segment[2]) {
              double x = segment[0].x - segment[1].x
              double y = segment[0].y - segment[1].y
              return sqrt(x*x + y*y);
          }
    
  • It would be nice if we could just give names to types. We can with the typedef keyword. The following "variable declaration" is actually treated as the declaration of a type name.

          typedef struct posn_2d segment_t[2];
          segment_t line_segments[15];
    
          double length(segment_t segment) {
    
  • Using typedefs makes code easier to read and understand, and thus less error-prone.

  • Some folks insist on typedefing every struct to get rid of the struct keyword. (I think this is strange.)

          typedef struct posn_2d posn_2d_t;
    

C++: Class Definitions and OOP

  • The whole point of C++ is to add types representing classes for OOP.

  • A C++ class declaration looks a lot like a C struct declaration, except that it maybe has methods, and maybe has a parent class.

  • See intlist.cc in the repo.

C++: Template Types

  • We can use "generic" types to allow the use of multiple kinds of type with the same code.

  • The syntax has angle brackets.

Program Performance

  • We've done some benchmarking before. Today we'll talk about profiling and program performance.

  • Let's supply our intlist.c with a benchmark driver.

    • Note that I've changed the API slightly to be compatible with what we did yesterday. Keeping APIs up to date and sane is a really good idea.
  • Once we've benchmarked, we compile everything with -pg and look at where the performance bottlenecks are with gprof.

2014-08-28: Finite State Machines; Turing Machines