# Data Structures

## Part II of II

### Intro

``````   +-----------------------------------+
| applications                      |
| databases                         |
+-----------------------------------+            +------+
| dictionaries (associative arrays) |            |      |
| priority queues                   |            |      |
| stacks, queues                    | containers |      |
|                                   +------------+      |
|                                   +------------+------+
| arrays, hash tables               | contiguous |
+-----------------------------------+------------+
| OS memory abstraction             |
+-----------------------------------+
``````

The horizontal borders represent layers of abstraction on which structures above are built. We're focusing on data structures optimized for storage and retrieval.

• Part I notes

• arrays
• stacks & queues
• priority queues
• trees
• Part II notes

• hash tables
• dictionaries
• block chain

### Hash Tables

• Dream data retrieval scenario: we know the index in an array where our data would be if it were in the array

• A hash function is a mathematical function that maps keys to integers.

• For each key:
• map the key to a large integer
• take the modulo of that integer by a constant
• This his provides a fixed-length hash with the following properties:
• every key can be assigned a hash value
• two equal keys will have the same hash value
• with a good hash function, most unequal keys will have different hash values

A hash table then uses that index to store and retrieve the value of a key. What to do in the event of a collision of hash values from two unequal keys?
Values can be chained together in a bucket: • Pros and Cons of chaining
• It's simple and flexible.
• But pointers take space

Is there a way we could use the empty, adjacent spaces to hold collided values? • The simplest approach is sequential probing
• insert item into the next open space in the table
• this creates a 'run' of adjacent occupied spaces
• upon deletion, re-insert every item following the gap
• Pros and cons of sequential probing
• traversal is faster if there are collisions
• deletion requires O(n) re-hashing or lost space to a tombstone

Efficiency of a chained hash table with doubly linked list
and n keys in a table of m buckets

expected worst case balanced tree
search O(n/m) O(n) O(lg n)
insert O(1) O(1) O(lg n)
delete O(1) O(1) O(lg n)
successor O(n+m) O(n+m) O(lg n)
predecessor O(n+m) O(n+m) O(lg n)
minimum O(n+m) O(n+m) O(lg n)
maximum O(n+m) O(n+m) O(lg n)
• Hashing algorithms can be optimized for different use cases:
• Rabin-Karp
• particularly good at finding any one of a set of sub-strings of a text
• average O(n+m) with text of length n and substrings of combined length m
• worst-case O(nm) with high collision
• Knuth-Morris-Pratt
• faster for single pattern matching, O(n)

#### Dictionaries

• key-value storage structure like the word and definition in a dictionary
• provide the same operations as the other structures we've seen, with focus on
• search
• insert
• delete
• min and max allow dictionary to serve as a priority queue on the values of keys
• predecessor and successor allow iteration through the dictionary

• can be implemented with any of data structures we've seen

• especially well suited to bsts and hash tables

#### Block Chain

We want to transmit ordered blocks of data and know if any of them have been modified or reordered. Hashing can help.

• In the case of block n
• append the identifier of block n-1 to your current block data
• compute the hash of this new concatenated string
• if it below a target value (if it has a certain number of leading zeros)
• this is the identifier of block n
• otherwise, calculate a nonce (a value used on one occasion)
• the hash of block n-1's identifier, the block data, and the nonce is below the target value.
• The resulting hash below that value is the block n's identifier. • Application in cryptocurrency
• blocks can be transmitted openly over a network
• if one is intercepted and altered, the hash of its data, its nonce, plus the identifier of block n-1 will no longer match the identifier of block n recorded in block n+1.
• you could intercept block n+1 but you would need to recompute the nonce for block n to reflect your changes and get a new identifier to place in n+1...
• and do this for every block n in the chain
• because calculating the nonce is very computationally expensive, over a long enough chain, you are provided a degree of confidence that no block has been altered without the need to encrypt any part of the chain
• Applications in networking more generally
• cypher block chaining, invented in 1976, encrypts the block chain with a single key and removes the need for a computation intensive nonce Outline prepared by Sascha Strand from:

S.S. Skiena, The Algorithm Design Manual, 2nd ed., chapter 3
Springer-Verlag London Limited 2008

Image credits:

Sandeep Dayananda, Blockchain Enthusiast, 2017
https://www.quora.com/How-do-I-gain-an-in-depth-understanding-of-Blockchain