Final Topics

New Beginnings: Week 10
Lecture Notes by Bart Massey
Copyright (c) 2014 Bart Massey

2014-09-02: Bit Banging

Review: Representation Of Integers

  • Reminder: In a computer's memory, integers are represented by binary switch patterns: e.g., a byte containing 00010111 might reasonably be interpreted as containing the number 23.

  • Common representation sizes are 2**3 = 8 bits, 2**4 = 16 bits, 2**5 = 32 bits and 2**6 = 64 bits.

    • As humans, we often use base 16 ("hexadecimal" or just "hex") to represent these larger sizes, because we have a hard time dealing with long binary digit strings, and base 16 is easily interconvertible with base 2: e.g., 00010111B = 17H.

    • In an unsigned representation, this means that a 64 bit store can represent a number between 0 and FFFFFFFFFFFFFFFFH (18446744073709551615), which is large enough for most (but not all) purposes.

      • 20! = 2432902008176640000 = 21c3677c82b40000H is the largest factorial that will fit.
  • A CPU will almost always have a "natural" word size: that is, the number of bits of storage for each of its internal scratchpad registers. This unit is 8 bits for tiny hobbyist CPUs such as the Arduino AVR, but for real computers is always either 32 bits (older and smaller parts) or 64 bits (newer and larger parts).

Negative Integers

  • We haven't talked much about negative numbers, other than to note briefly that they use part of the storage in this representation for a "sign bit" and that C allows you to declare either "signed" or "unsigned" ints of a given width.

    • "signed" and "unsigned" are strictly speaking not properties of the representation, but of the interpretation of the bits; the whole C system is kind of a bodge.
  • The interesting tactic that all modern computers use is called "twos-complement" representation of negative numbers:

    • Let ~x be the "bitwise negation" of the number x: that is, flip every bit in the binary representation.

    • We will define the representation of "-x" to be "~x + 1". That is, to negate a number, bitwise negate it and then add one to it, discarding any carry from the high bit position.

    • We know that "un-negation" is the same as "negation", so if we start with 0xffff in a 16-bit register, for example, we find out that the negation of this number is 0x0001: that is, the original number was -1.

    • We will interpret a number as negative if its uppermost bit (the "sign bit") is a 1, and non-negative otherwise.

    • Of necessity, since the set of numbers we can represent with n bits has even cardinality, the representation will be "unbalanced" (no separate 0 and -0): in our case we can represent all the numbers in some range -(x+1)..x

  • The cool fact about twos complement is that to subtract y from x, you can just negate y and add, discarding any carry: the result will always be correct. That's why CPUs use twos complement: it makes subtraction really easy to implement.

Endianness

  • OK, so suppose we want to store a 64-bit number in computer memory. Memory is organized into 8-bit bytes, each with its own address. We have a couple of hard decisions to make:

    • Within a byte, is the most significant bit (MSB) the bit on the "left", or the bit on the "right". Because computers hide the ordering of bits within a byte, we always adopt the convention of standard place-value numbering: the LSB is on the right and the MSB on the left.

    • Harder: Do successive bytes represent bits of increasing or decreasing significance?

  • We call the byte-ordering with least significant bytes at lower addresses "little-endian" and the other ordering "big-endian". Things are still all over the place here: Intel processors are little-endian. ARM processors can actually have their endianness configured! Older ARM HW/SW used to mostly run big-endian; newer is all little-endian. MIPS is big-endian. SPARC is big-endian.

    • (Little-endian and Big-endian are from Gulliver's Travels, and are intended to make fun of big fights over trivial differences. Indeed.)
  • Once a value is sitting in a machine register, byte-endianness, like bit-endianness is obscured. So one only has to worry about it when reading or writing memory in big chunks.

Bitsets

  • Another natural way to interpret an n-bit number is as the "characteristic function" of an n-element set. That is, let each bit in the number be a 1 if and only if the corresponding element in our reference set is turned on.

    • For example, the 32-bit number 00104111H might be interpreted as the set of alphabetic characters {a, e, i, o, u}, where each character gets a bit position in alphabetical order starting from the right.
  • Because it is so natural to represent small sets this way, and because they are easy to provide, CPUs almost always provide instructions specifically for manipulating bitsets. In C or Python:

    • ~x is the bitwise negation operator: it effectively produces the complement of a bitset (with respect to a word-sized universe.)

    • x & y is the "bitwise and" operator; it produces a word that has 1 bits in exactly those positions where both x and y had 1 bits. For example, 7fH & f7H == 77H. Bitwise and effectively produces the intersection of two bitsets.

    • x | y is the "bitwise or" operator; it produces a word that has 1 bits in exactly those positions where at least one of x or y had 1 bits. For example, 7fH | f7H == FFH. Bitwise and effectively produces the union of two bitsets.

    • x ^ y is the "bitwise exclusive-or (xor)" operator. It does what you would expect. Its most natural interpretation on sets (not very natural) is "partial complement": flip those bits in one set that are in the other set. (There is a "symmetric set difference" operator; it isn't used much.)

Bit Shifts

  • The last category of instructions that is worth paying attention to in this regard is the "bit shift" instructions. Older CPUs would allow the bits to be shifted one left or right in a single instruction: newer CPUs allow multiple-bit "barrel shifts".

  • One can think of left and right shifts as multiplying and dividing by powers of 2. This is because of how the binary representation of numbers works. But beware negative numbers! See below.

  • x << y is the shift-left operator. The right-hand operand is the number of bits to shift the left-hand operand left by. The missing bits are filled in with zero. For example, 01H << 5 == 20H.

  • x >> y is the shift-right operator. The right-hand operand is the number of bits to shift the left-hand operand right by. It notionally comes in two flavors:

    • A "logical right shift" fills in zeros in the missing position. This corresponds to division for unsigned quantities.

    • An "arithmetic right shift" fills in copies of the sign bit (MSB) in the missing positions. This corresponds to division for signed quantities.

  • In C, it is undefined whether right shift is arithmetic or logical!

  • In Python (and Nickle), with its arbitrary-sized integers, things get weird: negative numbers are defined to be twos-complement. Given that integers have no bound on their length, this corresponds to negative integers having a notional "infinite string of 1 bits" to their left. Thus, Python right shift sort of has to be arithmetic, but it's hard to tell.

Element-wise Bitset Manipulations

  • It is often a thing to want to turn a single bit in a bitset on or off, or test for its state.

  • To turn a bit on, just form a single-element bitset using a left-shifted 1, and use bitwise or to union that bit into the set. For example, to turn on bit 5 of a bitset x (counting from 0):

          x = x | (1 << 5)
    
  • To turn a bit off, use bitwise complement when forming the set, and bitwise and to kill the bit. For example, to turn off bit 5 of a bitset x (counting from 0):

          x = x & ~(1 << 5)
    
  • To test a bit, you can bitwise and a bitset containing the bit to be tested, and then use the fact that C and Python consider non-zero values to be Boolean true.

          if (x & (1 << 5))
    
  • To extract a bit for further processing, you may want to do things the other way around, so that you end up with either 0 or 1 at the end.

          y = (x >> 5) & 1
    

    You can now shift this bit back to the left to "put it somewhere else", etc.

Cardinality

  • To determine the cardinality of a bitset x, we can just shift x right one bit at a time, counting one bits as we go, until x becomes 0 or we know we've run out of machine word. The limited width of C integers vs Python's infinite integers makes a difference here:

          def bitset_size(x):
              assert x >= 0
              n = 0
              while x > 0:
                  n += x & 1
                  x >>= 1
              return n
    
          int bitset_size(int x) {
              int n = 0;
              int i;
    
              for (i = 0; x != 0 && i < 8 * sizeof x; i++) {
                  n += x & 1;
                  x >>= 1;
              }
              return n;
          }
    

Exercise: Integer Logarithm

  • Write a Python function to find the position of the highest 1-bit in a non-negative integer x. This value bears a strong relationship to the integer logarithm of x.

Bit Paralellism: Popcount Revisited

  • Another way to think of bitwise operations is as computing a bunch of boolean results in parallel. On a 64-bit machine word, for example, we can compute x && (y || z) for 64 bits at a time by simply replacing the int-at-a-time operations with bit-at-a-time ones.

  • We can also think of arithmetic operations + and - as being bit-at-a-time, except that they borrow from or carry to the left. For example, 43H + 34H == 77H: the addition being done on the right four bits cannot effect the addition being done on the left four bits, since there is no carry from the right digit to the left digit in the addition.

  • Let us return to our bitset cardinality operation of earlier. We will try to improve it from O(n), where n is the number of bits in the bitset universe, using bit parallelism. We will work with 64-bit bitsets for now.

    • Consider

          x = (x & 0x5555555555555555) + ((x >> 1) & 0x5555555555555555)
      

      This operation adds all the even bits to all the odd bits. Since there is a zero to the left of each one-bit sum, at the end we can think of the result as 32 2-bit sums.

    • Now add the two-bit sums to get four-bit sums.

          x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)
      
    • Now add the four-bit sums to get 8-bit sums. Note that there can be no carry needed here: the largest sum we can have at this point in any position is 4, which is representable in 3 bits. So there must be a zero bit at the left in every position already: the result will still fit in four bits, since it is less than or equal to 8. We will end up with "duplicate" sums without the masking, so we will need to clean up at the end.

          x = (x + (x >> 4)) & 0x00ff00ff00ff00ff
      
    • Now add the 8-bit sums to get 16-bit sums, and the 16-bit sums to get 32-bit sums, and finish with the 64-bit sum. No more masking is needed until the very end: the value 64 fits easily in 8 bits.

          x = (x + ((x >> 8))
          x = (x + ((x >> 16))
          x = (x + ((x >> 32)) & 0x3f
      
  • Obviously, we have reduced an O(n) operation to O(lg n) using bit parallelism. Equally obviously, both operations were really already O(1) anyway on real machines, which have only limited register width available.

  • In Python, it probably makes sense to do some kind of divide-and-conquer regardless of set size, since the number of Python statements executed is the limiting factor on performance. On the other hand, it's hard to see how to generate the bitmasks needed to make this work without knowing some bound on the integer size, and without O(n) time. Hmm. Sounds like a challenge.

Should Have Been In The Reading

2014-09-03: Legal and Ethical Issues

The Law

  • We already covered the basics of IP law:

    • Copyright
    • Patent
    • Trademark
  • In the process, we also covered the basics of civil and criminal law.

  • Contract: An exchange (important) of considerations between parties for mutual benefit. Must be written and signed by both parties to be worth much, except in specific well-defined cases.

    • You can contract around any legal thing or activity. Most commercial transactions have an implicit or explicit contract.

    • Software is often governed by contract between parties.

  • License: A kind of contract in which one party grants legal rights rather than ownership.

    • Most commercial and open source software is governed by license.

    • Licenses may be granted.

Ethics

  • Ethics does not tell you what is right. Ethics gives methodologies for you to figure it out for yourself.

  • There is no one universal ethical framework everyone agrees on. Examples of ethical frameworks:

    • Virtue / religious / moral framework: Defines some absolute notions of right and wrong, without any necessary rationale. Most people use these to some degree: there are things that are widely regarded as "wrong" or "right" in Western societies and taken as absolutes. c.f. 10 Commandments.

    • Utilitarian "good of society" framework: Defines actions as right or wrong depending on how they affect people individually as a whole. Actions with positive outcomes for many people, or strong positive outcomes for a few, and that minimize harms are considered right.

    • Enlightened self interest framework: Actions are right if they are good for the actor, with some obvious exceptions. There is usually an assumption in this framework that if everyone acts in enlightened self interest, it will be better for everyone (to some degree, in the long run).

  • Ethical frameworks raise the question of why they are to be considered, for example:

    • To justify to yourself or others the actions you are going to take anyway.

    • To help substitute reason for emotion in decision-making processes.

    • As a good in their own right.

Software Ethics

  • There isn't really anything super special about software ethics. However:

    • Society has had less time to develop / adapt frameworks around software.

    • Ethical "consensus" is being driven to a very large degree by parties with huge self-interest in setting the rules.

    • Technology is complicated, so the ethical considerations are sometimes more complicated.

Case Study: Software / Music / Book "Piracy"

You are considering whether to download an illegal copy of Adobe Photoshop from a pirate site. You are confident that there is little chance your illegal use will be detected now or in the future. You could not afford Photoshop, so this is the only way you could get it.

  • Case: You plan to use Photoshop to work on advertising images for a charity that searches for missing children.

  • Case: You plan to reverse-engineer Photoshop to identify what image processing algorithms it uses. You do not plan to use it for any practical purpose.

  • Case: You plan to use Photoshop for student projects.

Case Study: Job Application Evaluation Software

A Portland company has made a living for some time with the following strategy:

  • Have a bunch of successful job applicants for low-level positions (e.g. convenience store clerk) fill out a questionnaire full of basically random questions (the "MMPI"). Record the responses.

  • Record, for each person filling out the questionnaire, how long that person stayed on the job from the time they were hired.

  • Feed the results of all this to train a machine learner to try to predict job longevity from questionnaire responses.

  • Use the trained learner to predict the potential job longevity of new applicants.

It turns out this works pretty well. Ethical considerations?

  • Is it "fair" to those rejected for reasons that prima facie have nothing to do with their application proper?

  • One of the claimed advantages is that the process is unbiased against underrepresented groups, because the machine learner does not have that information. Is this true? If it is true, does it make it OK?

Case Study: Military Aircraft Fuel Cost Reduction

In graduate school, I worked on a project to reduce the fuel used by U.S. Air Force transport aircraft, by optimizing aircraft routing. My professor-employers got a lot of funding from the military, but had strict rules against working on projects that involved direct military activity, or that required security clearances. This project fell within their guidelines.

  • Are these guidelines ethical? Is there a sharp ethical boundary that can be drawn here?