Data Structures
Part I of II
Intro
+-----------------------------------+
| applications |
| databases |
+-----------------------------------+ +------+
| dictionaries (associative arrays) | | |
| priority queues | | |
+-----------------------------------+------------+ ADTs |
| stacks, queues | containers | |
| +------------+ |
| linked lists, trees | linked | |
| +------------+------+
| arrays, hash tables | contiguous |
+-----------------------------------+------------+
| OS memory abstraction |
+-----------------------------------+
The horizontal borders represent layers of abstraction on which structures above are built. We're focusing on data structures optimized for storage and retrieval.
Part I notes
- arrays
- linked lists
- stacks & queues
- trees
- priority queues
Part II notes
- hash tables
- dictionaries
- block chain
Contiguous vs. Linked Data Structures
Contiguous data structures are single slabs of memory
- arrays
- matrices
- heaps
- hash tables
Linked data structures are chunks of memory bound together by pointers
- lists
- trees
- graph adjacency lists
Arrays
Benefits of fixed-size array:
- constant-time access given the index
- space efficiency
- memory locality increases cache hit rate
Downsides of fixed-size array:
- fixed in size
Efficiency of unsorted fixed-size array:
efficiency | ||
---|---|---|
search | O(n) | |
insert | O(1) | until it fills up |
delete | O(n) | find target item, move the last item into the gap created |
successor | O(n) | |
predecessor | O(n) | |
minimum | O(n) | |
maximum | O(n) |
- Binary search
- increase performance of all retrieval operations
- decrease performance of all modifications
Efficiency of sorted, fixed-size array:
efficiency | ||
---|---|---|
search | O(lg n) | |
insert | O(n) | find the right spot and scoot everything else over |
delete | O(n) | find item, delete, scoot everything over |
successor | O(lg n) | |
predecessor | O(lg n) | |
minimum | O(1) | |
maximum | O(1) |
- Efficiencies assume:
- we're given the data structure and target value but not index
- we know the length of the array
Dynamic Arrays
- Start with an array of size m
- When it's full, create a new array of size 2m
Copy contents of initial array to new array and free initial array
Compared to fixed-size array:
- can grow indefinitely
- could shrink if implemented to do so
- some insertions will take O(1) time and some will require running the O(n) re-size operation
Amortization of the re-size operation
- ad + mort- -> 'until death'
- how many times would the first element need to be recopied after n insertions?
- once after the first, second, fourth, eighth, insertion...
- lg n doublings
- elements farther down the array will be copied fewer times
- half the elements move once, a quarter move twice, ...
- total number of movements m in array of size n is 2n, which is in O(n)
- this total cost of O(n) movements can be spread out over all n elements for an amortized time of O(n/n) or O(1) per element
Efficiency of unsorted dynamic array:
efficiency | ||
---|---|---|
search | O(n) | |
insert | O(1) | O(n) re-size operation amortized over n elements |
delete | O(n) | |
successor | O(n) | |
predecessor | O(n) | |
minimum | O(n) | |
maximum | O(n) |
Efficiency of sorted dynamic array:
efficiency | ||
---|---|---|
search | O(lg n) | |
insert | O(n) | at most n operations to make space for the insert |
delete | O(n) | remove item and move subsequent items over to fill gap |
successor | O(lg n) | |
predecessor | O(lg n) | |
minimum | O(1) | |
maximum | O(1) |
Note that this is the same as the sorted, fixed-size array.
Linked Structures
- Benefit over dynamic array is its modularity
- more complex data structures can be readily assembled and reconfigured with pointers
- efficiency involves time/space trade-off on insert, otherwise the same as a dynamic array
Efficiency of unsorted linked list:
efficiency | |
---|---|
search | O(n) |
insert | O(1) |
delete | O(n) |
successor | O(n) |
predecessor | O(n) |
minimum | O(n) |
maximum | O(n) |
Efficiency of sorted linked list:
efficiency | ||
---|---|---|
search | O(n) | |
insert | O(n) | |
delete | O(n) | |
successor | O(n) | |
predecessor | O(n) | |
minimum | O(1) | if ascending order |
maximum | O(n) |
Stacks & Queues
Container: a data structure that permits storage and retrieval of data items independent of content
- distinguished by the retrieval order they support
- for these two "most important" types of containers, retrieval depends on insertion order
- FIFO
- LIFO
- Remember "F" in FIFO for "fair" as a human waiting in line.
Stacks and queues have the same average wait time
- Queues minimize the maximum wait time
- Stacks are simpler to implement
- depending on implementation, they could also be more efficient
Trees
Wikipedia lists 99 ADTs categorized as trees
- parse tree
- decision tree
- AVL tree
- red-black tree
- k-ary tree
- ...
Binary Search Trees
- Sorted arrays provide fast search
- Linked lists provide flexible update
- We want to use binary search on linked lists
- Binary search trees implement exactly that
- Tree key terms and definitions
- node: a point at which pathways branch or end of a path
- leaf: a node with no children, end of a path
- fringe: all the leaves
- edge: connection between two nodes
- forest: a set of disjoint trees
describing node position
- level: level of a node is 1 + |edges b/w node and root|
- height: height of a node is |edges b/w node and farthest leaf|
- depth: depth of a node is |edges b/w node and root|
- node with value 14 is at level 3, with a height of 1 and depth of 2
binary tree is
- a single node or
- a root node with another binary tree on its left and/or right
binary search tree is
- binary tree with a value at each node
- all values in the tree to a node's left are less than that node's value
- all values in the tree to a node's right are greater than that node's value
- there are no duplicates (though you could implement a counter at each node)
Full and complete
- A binary tree is full if every node has zero or two children.
- A binary tree is complete if every level is full except the bottom level which is filled from left to right.
- traversal
- breadth first
- depth first
- pre-order: parent, left, right
- in-order: left, parent, right
- post-order: left, right, parent
- note pre- and post- relative to parent placement
- insert node: always at a leaf
- delete node
- at leaf, just remove leaf
- node with one child, replace with child
- node with two children
- greatest element in left subtree or
- least element in right subtree
- Degeneration
- de- + genus -> 'not of its kind'
- creating a BST from simple insertions on sorted list will create a list
- heights over a single data set can range from n to lg n
- when balanced, h = ceil(lg n)
- we guarantee this by using insert operations from self-balancing trees.
- AVL tree
- red-black tree
Efficiency of BSTs:
efficiency | ||
---|---|---|
search | O(h) | |
insert | O(h) | |
delete | O(h) | two O(h) operations |
successor | O(h) | |
predecessor | O(h) | |
minimum | O(h) | |
maximum | O(h) |
Priority Queues
- A queue where every element has a priority value
- enqueue will insert based on priority into the queue
- dequeue will remove the item with the highest priority
Efficiency of priority queues:
unsorted array | sorted array | balanced tree | |
---|---|---|---|
insert | O(1) | O(n) | O(lg n) |
find minimum | O(1) | O(1) | O(1) |
delete minimum | O(n) | O(n) or O(1)* | O(lg n) |
* depending on if you remote the gap created by a deleted element
Part I Conclusion
Bold type indicates minimum value by row.
unsorted fixed array | sorted fixed array | unsorted dynamic array | sorted dynamic array | unsorted LL | sorted LL | balanced BST | |
---|---|---|---|---|---|---|---|
search | O(n) | O(lg n) | O(n) | O(lg n) | O(n) | O(n) | O(lg n) |
insert | O(1) | O(n) | O(n) | O(n) | O(1) | O(n) | O(lg n) |
delete | O(n) | O(n) | O(n) | O(n) | O(n) | O(n) | O(lg n) |
successor | O(n) | O(lg n) | O(n) | O(lg n) | O(n) | O(n) | O(lg n) |
predecessor | O(n) | O(lg n) | O(n) | O(lg n) | O(n) | O(n) | O(lg n) |
minimum | O(n) | O(1) | O(n) | O(1) | O(n) | O(1) | O(lg n) |
maximum | O(n) | O(1) | O(n) | O(1) | O(n) | O(n) | O(lg n) |
Outline prepared by Sascha Strand from:
S.S. Skiena, The Algorithm Design Manual, 2nd ed., chapter 3
Springer-Verlag London Limited 2008
Image credits:
Victor S.Adamchik, CMU, 2009 https://www.cs.cmu.edu/~adamchik/15-121/lectures/Trees/trees.html