Data Structures

Part I of II

Intro

   +-----------------------------------+
   | applications                      |
   | databases                         |
   +-----------------------------------+            +------+
   | dictionaries (associative arrays) |            |      |
   | priority queues                   |            |      |
   +-----------------------------------+------------+ ADTs |
   | stacks, queues                    | containers |      |
   |                                   +------------+      |
   | linked lists, trees               | linked     |      |
   |                                   +------------+------+
   | arrays, hash tables               | contiguous |
   +-----------------------------------+------------+
   | OS memory abstraction             |
   +-----------------------------------+

The horizontal borders represent layers of abstraction on which structures above are built. We're focusing on data structures optimized for storage and retrieval.

  • Part I notes

    • arrays
    • linked lists
    • stacks & queues
    • trees
    • priority queues
  • Part II notes

    • hash tables
    • dictionaries
    • block chain

Contiguous vs. Linked Data Structures

  • Contiguous data structures are single slabs of memory

    • arrays
    • matrices
    • heaps
    • hash tables
  • Linked data structures are chunks of memory bound together by pointers

    • lists
    • trees
    • graph adjacency lists

Arrays

  • Benefits of fixed-size array:

    • constant-time access given the index
    • space efficiency
    • memory locality increases cache hit rate
  • Downsides of fixed-size array:

    • fixed in size

Efficiency of unsorted fixed-size array:

efficiency
search O(n)
insert O(1) until it fills up
delete O(n) find target item, move the last item into the gap created
successor O(n)
predecessor O(n)
minimum O(n)
maximum O(n)
  • Binary search
    • increase performance of all retrieval operations
    • decrease performance of all modifications

Efficiency of sorted, fixed-size array:

efficiency
search O(lg n)
insert O(n) find the right spot and scoot everything else over
delete O(n) find item, delete, scoot everything over
successor O(lg n)
predecessor O(lg n)
minimum O(1)
maximum O(1)
  • Efficiencies assume:
    • we're given the data structure and target value but not index
    • we know the length of the array

Dynamic Arrays

  • Start with an array of size m
  • When it's full, create a new array of size 2m
  • Copy contents of initial array to new array and free initial array

  • Compared to fixed-size array:

    • can grow indefinitely
    • could shrink if implemented to do so
    • some insertions will take O(1) time and some will require running the O(n) re-size operation
  • Amortization of the re-size operation

    • ad + mort- -> 'until death'
    • how many times would the first element need to be recopied after n insertions?
      • once after the first, second, fourth, eighth, insertion...
      • lg n doublings
    • elements farther down the array will be copied fewer times
      • half the elements move once, a quarter move twice, ...
      • total number of movements m in array of size n is 2n, which is in O(n)
      • this total cost of O(n) movements can be spread out over all n elements for an amortized time of O(n/n) or O(1) per element

Efficiency of unsorted dynamic array:

efficiency
search O(n)
insert O(1) O(n) re-size operation amortized over n elements
delete O(n)
successor O(n)
predecessor O(n)
minimum O(n)
maximum O(n)

Efficiency of sorted dynamic array:

efficiency
search O(lg n)
insert O(n) at most n operations to make space for the insert
delete O(n) remove item and move subsequent items over to fill gap
successor O(lg n)
predecessor O(lg n)
minimum O(1)
maximum O(1)

Note that this is the same as the sorted, fixed-size array.

Linked Structures

  • Benefit over dynamic array is its modularity
    • more complex data structures can be readily assembled and reconfigured with pointers
    • efficiency involves time/space trade-off on insert, otherwise the same as a dynamic array

Efficiency of unsorted linked list:

efficiency
search O(n)
insert O(1)
delete O(n)
successor O(n)
predecessor O(n)
minimum O(n)
maximum O(n)

Efficiency of sorted linked list:

efficiency
search O(n)
insert O(n)
delete O(n)
successor O(n)
predecessor O(n)
minimum O(1) if ascending order
maximum O(n)

Stacks & Queues

  • Container: a data structure that permits storage and retrieval of data items independent of content

    • distinguished by the retrieval order they support
    • for these two "most important" types of containers, retrieval depends on insertion order
      • FIFO
      • LIFO
      • Remember "F" in FIFO for "fair" as a human waiting in line.
  • Stacks and queues have the same average wait time

  • Queues minimize the maximum wait time
  • Stacks are simpler to implement
    • depending on implementation, they could also be more efficient

Trees

  • Wikipedia lists 99 ADTs categorized as trees

    • parse tree
    • decision tree
    • AVL tree
    • red-black tree
    • k-ary tree
    • ...
  • Binary Search Trees

    • Sorted arrays provide fast search
    • Linked lists provide flexible update
    • We want to use binary search on linked lists
    • Binary search trees implement exactly that

bst

  • Tree key terms and definitions
    • node: a point at which pathways branch or end of a path
    • leaf: a node with no children, end of a path
    • fringe: all the leaves
    • edge: connection between two nodes
    • forest: a set of disjoint trees
    • describing node position

      • level: level of a node is 1 + |edges b/w node and root|
      • height: height of a node is |edges b/w node and farthest leaf|
      • depth: depth of a node is |edges b/w node and root|
      • node with value 14 is at level 3, with a height of 1 and depth of 2
    • binary tree is

      • a single node or
      • a root node with another binary tree on its left and/or right
    • binary search tree is

      • binary tree with a value at each node
      • all values in the tree to a node's left are less than that node's value
      • all values in the tree to a node's right are greater than that node's value
      • there are no duplicates (though you could implement a counter at each node)
    • Full and complete

      • A binary tree is full if every node has zero or two children.
      • A binary tree is complete if every level is full except the bottom level which is filled from left to right.

full_and_complete

  • traversal
    • breadth first
    • depth first
      • pre-order: parent, left, right
      • in-order: left, parent, right
      • post-order: left, right, parent
      • note pre- and post- relative to parent placement

euler_traversal

  • insert node: always at a leaf
  • delete node
    • at leaf, just remove leaf
    • node with one child, replace with child
    • node with two children
      • greatest element in left subtree or
      • least element in right subtree

node_insert node_delete_1 node_delete_2

  • Degeneration
    • de- + genus -> 'not of its kind'
    • creating a BST from simple insertions on sorted list will create a list
    • heights over a single data set can range from n to lg n
    • when balanced, h = ceil(lg n)
    • we guarantee this by using insert operations from self-balancing trees.
      • AVL tree
      • red-black tree

Efficiency of BSTs:

efficiency
search O(h)
insert O(h)
delete O(h) two O(h) operations
successor O(h)
predecessor O(h)
minimum O(h)
maximum O(h)

Priority Queues

  • A queue where every element has a priority value
    • enqueue will insert based on priority into the queue
    • dequeue will remove the item with the highest priority

Efficiency of priority queues:

unsorted array sorted array balanced tree
insert O(1) O(n) O(lg n)
find minimum O(1) O(1) O(1)
delete minimum O(n) O(n) or O(1)* O(lg n)

* depending on if you remote the gap created by a deleted element

Part I Conclusion

Bold type indicates minimum value by row.

unsorted fixed array sorted fixed array unsorted dynamic array sorted dynamic array unsorted LL sorted LL balanced BST
search O(n) O(lg n) O(n) O(lg n) O(n) O(n) O(lg n)
insert O(1) O(n) O(n) O(n) O(1) O(n) O(lg n)
delete O(n) O(n) O(n) O(n) O(n) O(n) O(lg n)
successor O(n) O(lg n) O(n) O(lg n) O(n) O(n) O(lg n)
predecessor O(n) O(lg n) O(n) O(lg n) O(n) O(n) O(lg n)
minimum O(n) O(1) O(n) O(1) O(n) O(1) O(lg n)
maximum O(n) O(1) O(n) O(1) O(n) O(n) O(lg n)

Outline prepared by Sascha Strand from:

S.S. Skiena, The Algorithm Design Manual, 2nd ed., chapter 3
Springer-Verlag London Limited 2008

Image credits:

Victor S.Adamchik, CMU, 2009 https://www.cs.cmu.edu/~adamchik/15-121/lectures/Trees/trees.html