# Data Structures

## Part I of II

### Intro

```
+-----------------------------------+
| applications |
| databases |
+-----------------------------------+ +------+
| dictionaries (associative arrays) | | |
| priority queues | | |
+-----------------------------------+------------+ ADTs |
| stacks, queues | containers | |
| +------------+ |
| linked lists, trees | linked | |
| +------------+------+
| arrays, hash tables | contiguous |
+-----------------------------------+------------+
| OS memory abstraction |
+-----------------------------------+
```

The horizontal borders represent layers of abstraction on which structures above are built. We're focusing on data structures optimized for storage and retrieval.

Part I notes

- arrays
- linked lists
- stacks & queues
- trees
- priority queues

Part II notes

- hash tables
- dictionaries
- block chain

### Contiguous vs. Linked Data Structures

Contiguous data structures are single slabs of memory

- arrays
- matrices
- heaps
- hash tables

Linked data structures are chunks of memory bound together by pointers

- lists
- trees
- graph adjacency lists

### Arrays

Benefits of fixed-size array:

- constant-time access given the index
- space efficiency
- memory locality increases cache hit rate

Downsides of fixed-size array:

- fixed in size

Efficiency of unsorted fixed-size array:

efficiency | ||
---|---|---|

search | O(n) | |

insert | O(1) | until it fills up |

delete | O(n) | find target item, move the last item into the gap created |

successor | O(n) | |

predecessor | O(n) | |

minimum | O(n) | |

maximum | O(n) |

- Binary search
- increase performance of all retrieval operations
- decrease performance of all modifications

Efficiency of sorted, fixed-size array:

efficiency | ||
---|---|---|

search | O(lg n) | |

insert | O(n) | find the right spot and scoot everything else over |

delete | O(n) | find item, delete, scoot everything over |

successor | O(lg n) | |

predecessor | O(lg n) | |

minimum | O(1) | |

maximum | O(1) |

- Efficiencies assume:
- we're given the data structure and target value but not index
- we know the length of the array

### Dynamic Arrays

- Start with an array of size m
- When it's full, create a new array of size 2m
Copy contents of initial array to new array and free initial array

Compared to fixed-size array:

- can grow indefinitely
- could shrink if implemented to do so
- some insertions will take O(1) time and some will require running the O(n) re-size operation

Amortization of the re-size operation

*ad*+*mort-*-> 'until death'- how many times would the first element need to be recopied after
*n*insertions?- once after the first, second, fourth, eighth, insertion...
*lg n*doublings

- elements farther down the array will be copied fewer times
- half the elements move once, a quarter move twice, ...
- total number of movements
*m*in array of size*n*is 2*n*, which is in O(*n*) - this total cost of O(
*n*) movements can be spread out over all*n*elements for an amortized time of O(*n*/*n*) or O(1) per element

Efficiency of unsorted dynamic array:

efficiency | ||
---|---|---|

search | O(n) | |

insert | O(1) | O(n) re-size operation amortized over n elements |

delete | O(n) | |

successor | O(n) | |

predecessor | O(n) | |

minimum | O(n) | |

maximum | O(n) |

Efficiency of sorted dynamic array:

efficiency | ||
---|---|---|

search | O(lg n) | |

insert | O(n) | at most n operations to make space for the insert |

delete | O(n) | remove item and move subsequent items over to fill gap |

successor | O(lg n) | |

predecessor | O(lg n) | |

minimum | O(1) | |

maximum | O(1) |

Note that this is the same as the sorted, fixed-size array.

### Linked Structures

- Benefit over dynamic array is its modularity
- more complex data structures can be readily assembled and reconfigured with pointers
- efficiency involves time/space trade-off on insert, otherwise the same as a dynamic array

Efficiency of unsorted linked list:

efficiency | |
---|---|

search | O(n) |

insert | O(1) |

delete | O(n) |

successor | O(n) |

predecessor | O(n) |

minimum | O(n) |

maximum | O(n) |

Efficiency of sorted linked list:

efficiency | ||
---|---|---|

search | O(n) | |

insert | O(n) | |

delete | O(n) | |

successor | O(n) | |

predecessor | O(n) | |

minimum | O(1) | if ascending order |

maximum | O(n) |

### Stacks & Queues

Container: a data structure that permits storage and retrieval of data items independent of content

- distinguished by the retrieval order they support
- for these two "most important" types of containers, retrieval depends on insertion order
- FIFO
- LIFO
- Remember "F" in FIFO for "fair" as a human waiting in line.

Stacks and queues have the same average wait time

- Queues minimize the maximum wait time
- Stacks are simpler to implement
- depending on implementation, they could also be more efficient

### Trees

Wikipedia lists 99 ADTs categorized as trees

- parse tree
- decision tree
- AVL tree
- red-black tree
- k-ary tree
- ...

Binary Search Trees

- Sorted arrays provide fast search
- Linked lists provide flexible update
- We want to use binary search on linked lists
- Binary search trees implement exactly that

- Tree key terms and definitions
**node**: a point at which pathways branch or end of a path**leaf**: a node with no children, end of a path**fringe**: all the leaves**edge**: connection between two nodes**forest**: a set of disjoint treesdescribing node position

**level**: level of a node is 1 + |edges b/w node and root|**height**: height of a node is |edges b/w node and farthest leaf|**depth**: depth of a node is |edges b/w node and root|- node with value 14 is at level 3, with a height of 1 and depth of 2

**binary tree**is- a single node or
- a root node with another binary tree on its left and/or right

**binary search tree**is- binary tree with a value at each node
- all values in the tree to a node's left are less than that node's value
- all values in the tree to a node's right are greater than that node's value
- there are no duplicates (though you could implement a counter at each node)

Full and complete

- A binary tree is
**full**if every node has zero or two children. - A binary tree is
**complete**if every level is full except the bottom level which is filled from left to right.

- A binary tree is

- traversal
- breadth first
- depth first
**pre-order**: parent, left, right**in-order**: left, parent, right**post-order**: left, right, parent- note pre- and post- relative to parent placement

- insert node: always at a leaf
- delete node
- at leaf, just remove leaf
- node with one child, replace with child
- node with two children
- greatest element in left subtree or
- least element in right subtree

- Degeneration
*de-*+*genus*-> 'not of its kind'- creating a BST from simple insertions on sorted list will create a list
- heights over a single data set can range from
*n*to lg*n* - when balanced,
*h*= ceil(lg*n*) - we guarantee this by using insert operations from self-balancing trees.
- AVL tree
- red-black tree

Efficiency of BSTs:

efficiency | ||
---|---|---|

search | O(h) | |

insert | O(h) | |

delete | O(h) | two O(h) operations |

successor | O(h) | |

predecessor | O(h) | |

minimum | O(h) | |

maximum | O(h) |

### Priority Queues

- A queue where every element has a priority value
- enqueue will insert based on priority into the queue
- dequeue will remove the item with the highest priority

Efficiency of priority queues:

unsorted array | sorted array | balanced tree | |
---|---|---|---|

insert | O(1) | O(n) | O(lg n) |

find minimum | O(1) | O(1) | O(1) |

delete minimum | O(n) | O(n) or O(1)* | O(lg n) |

* depending on if you remote the gap created by a deleted element

### Part I Conclusion

Bold type indicates minimum value by row.

unsorted fixed array | sorted fixed array | unsorted dynamic array | sorted dynamic array | unsorted LL | sorted LL | balanced BST | |
---|---|---|---|---|---|---|---|

search | O(n) | O(lg n) |
O(n) | O(lg n) |
O(n) | O(n) | O(lg n) |

insert | O(1) | O(n) | O(n) | O(n) | O(1) | O(n) | O(lg n) |

delete | O(n) | O(n) | O(n) | O(n) | O(n) | O(n) | O(lg n) |

successor | O(n) | O(lg n) |
O(n) | O(lg n) |
O(n) | O(n) | O(lg n) |

predecessor | O(n) | O(lg n) |
O(n) | O(lg n) |
O(n) | O(n) | O(lg n) |

minimum | O(n) | O(1) | O(n) | O(1) | O(n) | O(1) | O(lg n) |

maximum | O(n) | O(1) | O(n) | O(1) | O(n) | O(n) | O(lg n) |

Outline prepared by Sascha Strand from:

S.S. Skiena,

The Algorithm Design Manual, 2nd ed., chapter 3

Springer-Verlag London Limited 2008

Image credits:

Victor S.Adamchik, CMU, 2009 https://www.cs.cmu.edu/~adamchik/15-121/lectures/Trees/trees.html