I was curious to learn what has been updated since I took a version of COS 226 at a 🐻 different university, so I am reading through the lecture slides for the Fall 2024 semester.

I use these notes to keep track of my thoughts while reading and to record anything that sticks out to me as interesting.

1. Introduction

  • “Bad programmers worry about the code. Good programmers worry about data structures and their relationships,” by Linus Torvalds, is a great quote

1. Union-find

  • The first pass of designing an algorithm can be simple concentric sweeps of the data. Optimize after fully understanding the simpler but slower algorithm.
  • When designing algorithms, consider creating an extra data structure (typically an array / list, or a map / dictionary) that will map items to information the algorithm needs or will generate (e.g. visited, parent, size)
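  • A minimal weighted quick-union sketch (my own code, not from the slides), using exactly the kind of extra parent and size arrays mentioned above; elements are assumed to be labeled 0 to n-1:

    // Weighted quick-union with path halving.
    // parent[i] is the parent of i; size[i] is the size of the tree rooted at i.
    public class UnionFind {
        private final int[] parent;
        private final int[] size;

        public UnionFind(int n) {
            parent = new int[n];
            size = new int[n];
            for (int i = 0; i < n; i++) { parent[i] = i; size[i] = 1; }
        }

        // Follow parent links to the root, pointing nodes at their grandparents as we go.
        public int find(int p) {
            while (p != parent[p]) {
                parent[p] = parent[parent[p]];
                p = parent[p];
            }
            return p;
        }

        public boolean connected(int p, int q) { return find(p) == find(q); }

        // Link the root of the smaller tree under the root of the larger tree.
        public void union(int p, int q) {
            int rootP = find(p), rootQ = find(q);
            if (rootP == rootQ) return;
            if (size[rootP] < size[rootQ]) { parent[rootP] = rootQ; size[rootQ] += size[rootP]; }
            else                           { parent[rootQ] = rootP; size[rootP] += size[rootQ]; }
        }
    }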

2. Analysis of Algorithms

  • For 3SUM, a first-draft algorithm can be three nested loops. Understand the simpler, slower algorithm, then optimize
  • Algorithms topics focus on system-independent effects, while systems topics focus on system-dependent effects like hardware, systems software, and operating system. I always found the latter super-interesting
  • Reminder that (n choose k) = n! / (k! (n−k)!) = (n · (n−1) · … · (n−k+1)) / k!. This definition often comes in useful
  • There used to be confusion over the size of a kilobyte: 10^3 (1,000) or 2^10 (1,024). Now kilobytes are 10^3 (1,000) bytes, while kibibytes are 2^10 (1,024) bytes. The same holds for other prefixes: mega/mebi, giga/gibi
  • Modern processors in computers and mobile devices typically handle data in 64-bit (8-byte) blocks
  • In Java, byte is 8 bits, short is 16 bits, int is 32 bits, long is 64 bits, float is 32 bits, double is 64 bits, char is a 16-bit Unicode code unit, and boolean uses an unspecified number of bits

3. Stacks & Queues: Resizable Arrays

  • Stacks and queues occur in real life as well as in computer science
  • The fundamental methods of a stack are push, pop, and isEmpty
  • The fundamental methods of a queue are enqueue, dequeue, and isEmpty
  • If implementing a stack or queue on top of an array, set the array entry to null after popping or dequeuing; otherwise the array still holds a reference to the removed object, and the Java garbage collector cannot reclaim it
  • A common tactic when implementing a mutable data structure on top of a fixed-length array is, upon overflow, to create a new array a certain percentage larger than the old one and copy the old array into it
    Java ArrayLists typically expand as necessary to 1.5x the previous array size
  • Python’s implementation of lists as resizeable arrays is mentioned. The time complexity of various Python list operations is examined here
  • A queue can be implemented with two stacks: a push stack and a pop stack. The items on the push stack are moved into the pop stack as necessary.
  • Generic array creation is not allowed: new T[n]. This is because once your code is compiled, type parameters like T are erased (to enable backwards compatibility and reduce runtime overhead), and cannot be used at runtime. Instead, an explicit cast is necessary: T[] a = (T[]) new Object[n], as in the sketch below
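  • Pulling the last few bullets together, a rough resizing-array stack sketch (my own, not the slides' code); it shows the (Item[]) cast, the null-out after pop, and growing on overflow (doubling here, whereas ArrayList grows by 1.5x):

    import java.util.NoSuchElementException;

    public class ResizingArrayStack<Item> {
        private Item[] items = (Item[]) new Object[2];  // generic array creation workaround
        private int n = 0;

        public boolean isEmpty() { return n == 0; }

        public void push(Item item) {
            if (n == items.length) resize(2 * items.length);  // grow on overflow
            items[n++] = item;
        }

        public Item pop() {
            if (isEmpty()) throw new NoSuchElementException("stack underflow");
            Item item = items[--n];
            items[n] = null;  // let the garbage collector reclaim the object
            if (n > 0 && n == items.length / 4) resize(items.length / 2);  // shrink when quarter full
            return item;
        }

        private void resize(int capacity) {
            Item[] copy = (Item[]) new Object[capacity];
            for (int i = 0; i < n; i++) copy[i] = items[i];
            items = copy;
        }
    }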

4. Stacks & Queues: Linked Lists

  • Linked-list implementations of lists are simpler to resize than array implementations, but they take up more memory and are slower both for random access and for in-order traversal (arrays benefit from sequential locality and caching).
  • Typical linked-list iteration code is for (Node n = first; n != null; n = n.next) { ... }
  • Implementations typically need an empty-list case, where first and last pointers are null; and a normal case, where first and last pointers point to nodes
  • To implement the Iterable interface, your iterator method can either use an inner class that implements the Iterator interface, or it can just return an iterator on an instance variable of your class (a minimal sketch appears after the table at the end of this section)
  • A diagram of the Java Collections Framework is located here
  • A table summarizing the Java Collections Framework is located here, and is reproduced below as well
Interfaces | Hash table Impl. | Resizable array Impl. | Tree Impl. | Linked list Impl. | Hash table + Linked list Impl. | Synchronized Version
---------- | ---------------- | --------------------- | ---------- | ----------------- | ------------------------------ | --------------------
Set | HashSet | | TreeSet (implements SortedSet) | | LinkedHashSet |
List | | ArrayList | | LinkedList | |
Queue | | | | LinkedList | | BlockingQueue
Deque (implements Queue) | | ArrayDeque | | LinkedList | |
Map | HashMap | | TreeMap (implements SortedMap) | | LinkedHashMap | ConcurrentMap
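
  • A minimal sketch of the inner-class approach to implementing Iterable (my own illustration; LinkedBag and its methods are made-up names):

    import java.util.Iterator;
    import java.util.NoSuchElementException;

    public class LinkedBag<Item> implements Iterable<Item> {
        private Node first;

        private class Node {
            Item item;
            Node next;
        }

        public void add(Item item) {
            Node oldFirst = first;
            first = new Node();
            first.item = item;
            first.next = oldFirst;
        }

        // iterator() hands back an instance of a private inner class that implements Iterator.
        public Iterator<Item> iterator() { return new ListIterator(); }

        private class ListIterator implements Iterator<Item> {
            private Node current = first;
            public boolean hasNext() { return current != null; }
            public Item next() {
                if (!hasNext()) throw new NoSuchElementException();
                Item item = current.item;
                current = current.next;  // the usual n = n.next walk
                return item;
            }
        }
    }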

5. Elementary Sorts

  • Sorts are defined as requiring a binary relation that is transitive and comparable (a ≤ b and/or b ≤ a). I believe implementing the Java Comparable interface requires a total ordering, where it’s recommended that if a ≤ b and b ≤ a, then a.equals(b)
  • It’s claimed that Python uses first-class functions to enable sorting. However, this was changed in Python 3: “sorted() and list.sort() no longer accept the cmp argument providing a comparison function. Use the key argument instead”
  • An example in the slides uses the raw type Comparable. This bypasses generic type checking and relies on runtime exceptions to determine when something goes wrong with the compareTo method.
  • The slides use a clever trick of defining each sort algorithm in terms of common less (comparison) and exch (exchange) methods, to make analyzing running time easier.
  • Selection sort has O(n²) compares, O(n) exchanges, and O(1) extra space
  • The slides refer to the fun blog post: Extra, Extra - Read All About It: Nearly All Binary Searches and Mergesorts are Broken. It’s very easy to get the fundamental algorithms wrong: a longstanding overflow bug involved using int mid = (low + high) / 2; instead of int mid = low + ((high - low) / 2);
  • 3-Sum in O(n²) time and O(1) extra space uses 3 pointers on a sorted array: for each i, set j to i+1 and k to the end, and move j and k inward, recording all matching sums (see the sketch after this list)
  • Insertion sort has worst-case O(n²) compares and exchanges
  • Binary insertion sort has worst-case O(n log n) compares and O(n²) exchanges
  • Binary search has worst-case O(log n) compares
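  • A rough sketch of that two-pointer 3-Sum (my own code; it assumes the goal is printing every triple that sums to zero, and it does not bother skipping duplicates):

    import java.util.Arrays;

    public class ThreeSum {
        // O(n log n) sort, then an O(n^2) scan with O(1) extra space.
        public static void printTriples(int[] a) {
            Arrays.sort(a);
            for (int i = 0; i < a.length - 2; i++) {
                int j = i + 1, k = a.length - 1;
                while (j < k) {
                    int sum = a[i] + a[j] + a[k];
                    if (sum == 0) {
                        System.out.println(a[i] + " " + a[j] + " " + a[k]);
                        j++;
                        k--;
                    } else if (sum < 0) {
                        j++;  // sum too small: advance the left pointer
                    } else {
                        k--;  // sum too large: pull in the right pointer
                    }
                }
            }
        }
    }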

6. Mergesort

  • The slides provide good advice about not allocating a helper array inside a recursive method, because then you have as many arrays in memory as the depth of the recursive call stack. Instead, allocate the helper array once outside and pass it into the recursive method (see the sketch after this list).
  • Java uses quicksort on primitive types, and merge-sort variants on reference types.
  • Java has two sort methods: Arrays.parallelSort, which, on reference types, is a multithreaded merge sort variant, and Arrays.sort, which is single-threaded and uses Timsort (the combination of natural merge sort and insertion sort originally developed for Python). Arrays.parallelSort is the preferred sort for large arrays, as it calls Arrays.sort for small subarrays. In addition, Collections.sort sorts a List by creating an array from the List, sorting that array with Timsort, then putting the result back into the List.
  • A clever, non-recursive variant of merge-sort is presented
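  • A sketch of mergesort with the helper array allocated once and passed down (my own simplified version, sorting ints):

    public class Merge {
        public static void sort(int[] a) {
            int[] aux = new int[a.length];  // allocated once, outside the recursion
            sort(a, aux, 0, a.length - 1);
        }

        private static void sort(int[] a, int[] aux, int lo, int hi) {
            if (hi <= lo) return;
            int mid = lo + (hi - lo) / 2;   // overflow-safe midpoint
            sort(a, aux, lo, mid);
            sort(a, aux, mid + 1, hi);
            merge(a, aux, lo, mid, hi);
        }

        // Merge the sorted halves a[lo..mid] and a[mid+1..hi] using aux as scratch space.
        private static void merge(int[] a, int[] aux, int lo, int mid, int hi) {
            for (int k = lo; k <= hi; k++) aux[k] = a[k];
            int i = lo, j = mid + 1;
            for (int k = lo; k <= hi; k++) {
                if      (i > mid)         a[k] = aux[j++];
                else if (j > hi)          a[k] = aux[i++];
                else if (aux[j] < aux[i]) a[k] = aux[j++];
                else                      a[k] = aux[i++];
            }
        }
    }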

7. Quicksort

  • The slides imply that newer languages and libraries often add some variation of merge / insertion / Timsort. The reasons seem to be that these merge-based sorts have greatly improved with techniques like merging natural runs, and that they are easier to parallelize, which is useful with large datasets and multiple cores.
  • A median can be estimated by picking 3 items at random and calculating their median
  • The partition algorithm from quicksort can be used to find an item of a certain rank (e.g. the median). This can be done in a while loop without recursion (see the sketch after this list).
  • Java has two sort methods: Arrays.parallelSort and Arrays.sort.
    • For primitive types, they appear to be identical and both use a quicksort variant.
    • For reference types, Arrays.parallelSort is a multithreaded merge sort variant, while Arrays.sort is single-threaded and uses Timsort (the combination of natural merge sort and insertion sort originally developed for Python).
    • Arrays.parallelSort is the preferred sort for large arrays, as it calls Arrays.sort for small subarrays.
  • In addition, Collections.sort sorts a List by creating an array from the List, sorting that array with Timsort, then putting that result back into the List.
  • The slides give the good advice of using the sorts built into the language libraries unless you’re absolutely sure that you need a custom solution. This advice is useful for most algorithms built into language libraries, as the authors of those libraries usually have detailed understanding of the internals of the languages, compilers, and interpreters.
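  • A sketch of the iterative quickselect idea from the rank-finding bullet above (my own code; it returns the element of 0-indexed rank k, shuffling first so the expected running time is linear):

    import java.util.Random;

    public class QuickSelect {
        public static int select(int[] a, int k) {
            shuffle(a);
            int lo = 0, hi = a.length - 1;
            while (lo < hi) {
                int j = partition(a, lo, hi);
                if      (j < k) lo = j + 1;  // rank k lies to the right of j
                else if (j > k) hi = j - 1;  // rank k lies to the left of j
                else            return a[k];
            }
            return a[k];
        }

        // Standard two-index partition around a[lo].
        private static int partition(int[] a, int lo, int hi) {
            int i = lo, j = hi + 1;
            while (true) {
                while (a[++i] < a[lo]) if (i == hi) break;
                while (a[lo] < a[--j]) if (j == lo) break;
                if (i >= j) break;
                swap(a, i, j);
            }
            swap(a, lo, j);
            return j;
        }

        private static void shuffle(int[] a) {
            Random rnd = new Random();
            for (int i = a.length - 1; i > 0; i--) swap(a, i, rnd.nextInt(i + 1));
        }

        private static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }
    }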

8. Priority Queues

  • Java has two priority queues, the non-thread-safe PriorityQueue and the thread-safe PriorityBlockingQueue. For PriorityQueue, “this implementation provides O(log(n)) time for the enqueuing and dequeuing methods (offer, poll, remove() and add); linear time for the remove(Object) and contains(Object) methods; and constant time for the retrieval methods (peek, element, and size)”. A small usage sketch appears after the table below.
  • Heapsorts that use max-heaps sort from lowest to highest, while heapsorts that use min-heaps sort from highest to lowest
  • There’s a good chart in the slides that’s useful enough to partially replicate here:
sort | in-place? | stable? | remarks
---- | --------- | ------- | -------
selection | yes | | n exchanges
insertion | yes | yes | use for small n or partially sorted
merge | | yes | Θ(n log n) guarantee
timsort | | yes | improves merge sort when pre-existing order
quick | yes | | Θ(n log n) probabilistic guarantee; fastest in practice
heap | yes | | Θ(n log n) guarantee
? | yes | yes | holy sorting grail
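
  • A small usage sketch of java.util.PriorityQueue (my own example): keep the k largest values seen so far in a min-oriented heap of size at most k:

    import java.util.PriorityQueue;

    public class TopK {
        public static PriorityQueue<Integer> largestK(int[] values, int k) {
            PriorityQueue<Integer> pq = new PriorityQueue<>();  // min-heap by natural order
            for (int v : values) {
                pq.offer(v);                   // O(log k)
                if (pq.size() > k) pq.poll();  // evict the smallest of the k+1 values
            }
            return pq;  // now holds the k largest values
        }

        public static void main(String[] args) {
            System.out.println(largestK(new int[]{5, 1, 9, 3, 7, 8}, 3));  // [7, 8, 9] in heap order
        }
    }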

9. Elementary Symbol Tables

  • The slides include a quote recommending failing fast rather than silently accepting inputs that may cause problems later
  • Another way to build a counter is to use the Map.merge method: map.merge(key, 1, (count, one) -> count + 1) (a small example follows this list)
  • Java has ordered symbol tables, primarily using red-black trees
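  • A tiny example of that counter idiom (my own; word counting is just an arbitrary use case):

    import java.util.HashMap;
    import java.util.Map;

    public class WordCount {
        public static Map<String, Integer> count(String[] words) {
            Map<String, Integer> counts = new HashMap<>();
            for (String w : words) {
                // Absent key: store 1. Present key: combine the old value and 1 with Integer::sum.
                counts.merge(w, 1, Integer::sum);
            }
            return counts;
        }

        public static void main(String[] args) {
            System.out.println(count(new String[]{"to", "be", "or", "not", "to", "be"}));
            // e.g. {not=1, or=1, to=2, be=2} (HashMap iteration order is unspecified)
        }
    }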

9. Binary Search Trees

  • Tree algorithms are naturally written recursively, but can then be transformed into an iterative version for greater speed and less memory usage. Sometimes you can make the algorithm tail recursive, allowing an easy transition to a while loop. A typical iterative, not recursive, tree algorithm pattern is Node n = root; while (n != null) { ... }. If you can’t make the algorithm tail-recursive, then you can use an explicit stack to hold the work still to be done
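  • A sketch of both patterns (my own code): a BST search whose tail recursion has become a while loop, and an inorder traversal that uses an explicit stack for the work still to be done:

    import java.util.ArrayDeque;
    import java.util.Deque;

    public class BstIterative {
        static class Node {
            int key;
            Node left, right;
            Node(int key) { this.key = key; }
        }

        // Tail-recursive search rewritten as "walk down the tree" in a loop.
        static boolean contains(Node root, int key) {
            Node n = root;
            while (n != null) {
                if      (key < n.key) n = n.left;
                else if (key > n.key) n = n.right;
                else return true;
            }
            return false;
        }

        // Inorder traversal: the explicit stack replaces the recursive call stack.
        static void inorderPrint(Node root) {
            Deque<Node> stack = new ArrayDeque<>();
            Node n = root;
            while (n != null || !stack.isEmpty()) {
                while (n != null) { stack.push(n); n = n.left; }  // go as far left as possible
                n = stack.pop();
                System.out.println(n.key);
                n = n.right;
            }
        }
    }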

10. Balanced Search Trees

13. Geometric Applications of Binary Search Trees

  • I don’t know why this is the case, but when I write tree algorithms in particular, I assume they need to be more complicated than they really are. So I write a complicated first draft, and then simplify as I realize what is going on
  • The algorithms are described at a high-level, because the algorithms and data structures for computational geometry are too complex for a single lecture. The recommended course was last taught in 2022
  • Looking at the code for the flocking boids, it doesn’t appear that the k from the “move toward the center of mass of k nearest boids” rule is the same as the k from the “point away from k nearest boids” rule

14. Hash Tables

  • I believe the Object.hashCode method returns a representation of the memory location of the object. This limits the Object.equals method to “are these two things referencing the same underlying object?” (x == y), because if two objects are equal, then they must have the same hashCode. So user-defined classes should override the hashCode method so they can have meaningful equals methods
  • I was confused by the line “if used only least significant 32 bits [of Double.doubleToLongBits()], all integers between −2^21 and 2^21 would have same hash code (0)”. I read the documentation and remembered how, unlike the bit representation of an integer (3 is 0011, 15 is 1111) where the bits flow from least-significant to most-significant as the number rises, floating point numbers work differently.
    To a very, very rough approximation, floating point numbers are represented as 1.(fraction) x 2^(biased exponent). So for the integer 3 (1.1000… * 2^1), the fraction is 1000…, while for 15 (1.1110… * 2^3), the fraction is 1110… So as the integers (represented as floating point numbers) rise, the bits flow from the most-significant to least-significant bit. So now I understand the initial line in the slides
  • There are three methods that help users quickly build hashCode methods for their classes:
    • Arrays.hashCode takes an array of primitive or reference values, hashes the values, then combines them into a hashCode
    • Objects.hash is a varargs version of Arrays.hashCode that makes it easy to generate hashCode methods for user-defined classes by passing the fields of the class into the method (a sketch appears at the end of this section)
    • Arrays.deepHashCode is a deep version of the shallow Arrays.hashCode for situations where the array to be hashed contains arrays
  • A hashCode in Java is a signed integer, so take particular care when converting it into an array index
  • The great Donald Knuth quote is presented: “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.” If you look up the quote on the internet, some sources say Donald Knuth, some say Donald Knuth quoting Tony Hoare, and some say Donald Knuth quoting Tony Hoare quoting Donald Knuth 🔁
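  • A sketch pulling the hashCode bullets together (my own; PhoneNumber is just a made-up example class):

    import java.util.Objects;

    public class PhoneNumber {
        private final int area, exchange, extension;

        public PhoneNumber(int area, int exchange, int extension) {
            this.area = area; this.exchange = exchange; this.extension = extension;
        }

        @Override public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof PhoneNumber)) return false;
            PhoneNumber that = (PhoneNumber) other;
            return area == that.area && exchange == that.exchange && extension == that.extension;
        }

        // Objects.hash is the varargs helper: pass in the fields and it combines their hashes.
        @Override public int hashCode() {
            return Objects.hash(area, exchange, extension);
        }

        // hashCode() is a signed 32-bit int, so mask off the sign bit before taking the modulus.
        public static int indexFor(Object key, int tableSize) {
            return (key.hashCode() & 0x7fffffff) % tableSize;
        }
    }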

15. Graphs and Digraphs I

  • A path is defined as a “sequence of vertices connected by edges, with no repeated edges.” This is slightly different from how I originally learned it, which was no repeated vertices (and thus no repeated edges). Wikipedia says both definitions are fine as long as you’re clear which you are using
  • The Java implementation of graphs in the slides uses a bag for each vertex to record its adjacent vertices. Bags, which allow duplicate entries, are not in the Java libraries. The difference between a bag and a Set is that a bag can contain duplicate items. Replacing the bag with Sets disallows having multiple edges between the same two vertices, but I don’t think this is a common case. In any case, if that was important, you could create a Map for each vertex a, that mapped each neighbor vertex b to the number of edges between a and b
  • The standard depth-first search algorithm is presented: “Mark vertex v. Recursively visit all unmarked vertices w adjacent from v”. For marking vertex v, you can use a visited Set
  • The Wikipedia version is the following:
    procedure DFS(G, v) is
        label v as discovered
        for all directed edges from v to w that are in G.adjacentEdges(v) do
            if vertex w is not labeled as discovered then
                recursively call DFS(G, w)
    
  • The Wikipedia iterative version is the following:
    procedure DFS_iterative(G, v) is
        let S be a stack
        S.push(v)
        while S is not empty do
            v = S.pop()
            if v is not labeled as discovered then
                label v as discovered
                for all edges from v to w in G.adjacentEdges(v) do
                    if w is not labeled as discovered then
                        S.push(w)
    
  • In the above Wikipedia pseudocode, I wouldn’t use the term “discovered,” as the vertices are marked when the algorithm gets to them, not when they are discovered as a neighbor of the current vertex.
  • In all versions of depth-first search, vertices must be marked after the algorithm processes them, not when they are first discovered as a neighbor and placed in the stack. The latter is how breadth-first search marks vertices. If, in depth-first search, vertices are marked when they are first discovered, then the algorithm will not process the vertices in the correct, depth-first order.

16. Graphs and Digraphs II

  • The standard breadth-first search algorithm is presented: “Add vertex s to FIFO queue and mark s. Repeat until the queue is empty: remove the least recently added vertex v. For each unmarked vertex w adjacent from v: add w to queue and mark”.
  • The Wikipedia version is the following:
    procedure BFS(G, root) is
        let Q be a queue
        label root as explored
        Q.enqueue(root)
        while Q is not empty do
            v := Q.dequeue()
            for all edges from v to w in G.adjacentEdges(v) do
                if w is not labeled as explored then
                    label w as explored
                    Q.enqueue(w)
    
  • In the above Wikipedia pseudocode, I wouldn’t use the term “explored,” as the vertices are marked when they are discovered as a neighbor of the current vertex, not when the algorithm processes them
  • In breadth-first search, vertices are marked right away, when they are first discovered as a neighbor and placed in the queue. This makes sure the algorithm doesn’t place vertices in the queue multiple times
  • The topological sort algorithm works because if a→b is an edge in a directed acyclic graph, then b will always come before a in a depth-first search postorder, so a will come before b when the depth-first search postorder is reversed. As Wikipedia puts it, “each node v is visited [by the postorder] only after all its dependencies are visited”. A reverse postorder is not the same as a preorder, where a and b can appear in any order depending on the rest of the graph (a sketch appears after the table below)
  • There are several good tables in the slides, which I combine and partially reproduce here:
problem | Breadth-first Search | Depth-first Search | description
------- | -------------------- | ------------------ | -----------
s-t path | yes | yes | Find a path between s and t
shortest s-t path | yes | | Find a path with the fewest edges between s and t
shortest directed cycle | yes | | Find the shortest cycle in a directed graph
Euler cycle | | yes | Find a cycle that uses each edge exactly once
Hamilton cycle | | | Find a cycle that uses each vertex exactly once
bipartiteness ([no] odd cycles) | yes | yes |
connected components | yes | yes | Find connected components
strong components | | yes | Every vertex is reachable from every other vertex
planarity | | yes | Draw in the plane with no crossing edges
graph isomorphism | | | Find an isomorphism between two graphs
single-source reachability | yes | yes |
topological sort | | yes |
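
  • A sketch of topological sort as a reverse depth-first-search postorder, as in the bullet above (my own code; the graph is assumed to be a DAG given as adjacency lists, where adj.get(v) lists the vertices w with an edge v→w):

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.List;

    public class TopologicalSort {
        public static Deque<Integer> order(List<List<Integer>> adj) {
            int n = adj.size();
            boolean[] marked = new boolean[n];
            Deque<Integer> reversePostorder = new ArrayDeque<>();  // used as a stack
            for (int v = 0; v < n; v++)
                if (!marked[v]) dfs(adj, v, marked, reversePostorder);
            return reversePostorder;  // front-to-back is a topological order
        }

        private static void dfs(List<List<Integer>> adj, int v, boolean[] marked,
                                Deque<Integer> reversePostorder) {
            marked[v] = true;
            for (int w : adj.get(v))
                if (!marked[w]) dfs(adj, w, marked, reversePostorder);
            reversePostorder.push(v);  // postorder: pushed only after all descendants are done
        }
    }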

17. Minimum Spanning Trees

  • A general algorithm is presented to find minimum spanning trees: “T = ∅. Repeat until T is a spanning tree: Find a cut in G. e ← min-weight crossing edge. T ← T ∪ { e }”
  • The greedy Kruskal’s algorithm is presented: “Consider edges in ascending order of weight. Add next edge to T unless doing so would create a cycle.” The edges are sorted by weight, and union-find is used to examine and combine vertex sets. O(E log E) time, O(E) space.
  • The Wikipedia pseudocode for Kruskal’s algorithm is as follows:
    algorithm Kruskal(G) is
        F := ∅
        for each v in G.V do
            MAKE-SET(v)
        for each {u, v} in G.E ordered by weight({u, v}), increasing do
            if FIND-SET(u) ≠ FIND-SET(v) then
                F := F ∪ { {u, v} }
                UNION(FIND-SET(u), FIND-SET(v))
        return F
    
  • The greedy Prim’s algorithm is presented: “Start with vertex 0 and grow tree T. Repeat until V − 1 edges: add to T the min-weight edge with exactly one endpoint in T.” A priority queue is used to keep track of edges leading out of the tree. O(E log V) time, O(V) space

18. Shortest Paths

  • A general shortest path algorithm is presented: “For each vertex v: distTo[v] = ∞. For each vertex v: edgeTo[v] = null. distTo[s] = 0. Repeat until distTo[v] values converge: Relax any edge”
  • The dynamic programming Bellman-Ford algorithm is presented: “For each vertex v: distTo[v] = ∞. For each vertex v: edgeTo[v] = null. distTo[s] = 0. Repeat V-1 times: Relax each edge”. Negative cycles are not allowed. O(E V) time.
  • The greedy Dijkstra’s algorithm is presented: “For each vertex v: distTo[v] = ∞. For each vertex v: edgeTo[v] = null. T = ∅. distTo[s] = 0. Repeat until all vertices are marked: Select unmarked vertex v with the smallest distTo[] value. Mark v. Relax each edge incident from v.” A priority queue is used to track unmarked vertex v with the smallest distTo[] values. Negative weights are not allowed. O(E log V) time, so faster than Bellman-Ford.
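  • A sketch of a lazy flavor of Dijkstra's algorithm (my own code, using a Java 16+ record; it skips stale priority-queue entries instead of doing a decrease-key, and omits edgeTo[] for brevity):

    import java.util.Arrays;
    import java.util.List;
    import java.util.PriorityQueue;

    public class Dijkstra {
        record Edge(int to, double weight) {}  // directed edge with a non-negative weight

        public static double[] shortestDistances(List<List<Edge>> adj, int s) {
            int n = adj.size();
            double[] distTo = new double[n];
            Arrays.fill(distTo, Double.POSITIVE_INFINITY);
            distTo[s] = 0.0;

            // Entries are {distance, vertex}, smallest tentative distance first.
            PriorityQueue<double[]> pq = new PriorityQueue<>((a, b) -> Double.compare(a[0], b[0]));
            pq.offer(new double[]{0.0, s});

            while (!pq.isEmpty()) {
                double[] entry = pq.poll();
                int v = (int) entry[1];
                if (entry[0] > distTo[v]) continue;  // stale entry: v was already improved
                for (Edge e : adj.get(v)) {          // relax each edge incident from v
                    if (distTo[v] + e.weight() < distTo[e.to()]) {
                        distTo[e.to()] = distTo[v] + e.weight();
                        pq.offer(new double[]{distTo[e.to()], e.to()});
                    }
                }
            }
            return distTo;
        }
    }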

19. Dynamic Programming

  • Make sure you don’t over-optimize, and you store enough information in your dynamic programming algorithm to trace back what path you took
  • Having a dynamic programming solution doesn’t necessarily make it fast, like change-making and the knapsack problem, which run in pseudo-polynomial time (polynomial in the numeric value, not the length, of the input)
  • Make sure the dependency order is dealt with correctly in your algorithm, and that there are no cycles, so the recurrence will terminate
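  • A change-making sketch (my own) that illustrates the points above: the table is filled in dependency order (smaller amounts first), the O(amount × coins.length) running time is pseudo-polynomial, and a lastCoin array stores enough information to trace back which coins were used:

    import java.util.ArrayDeque;
    import java.util.Arrays;
    import java.util.Deque;

    public class CoinChange {
        public static Deque<Integer> fewestCoins(int[] coins, int amount) {
            int[] minCoins = new int[amount + 1];  // minCoins[a] = fewest coins that make amount a
            int[] lastCoin = new int[amount + 1];  // which coin achieved that minimum
            Arrays.fill(minCoins, Integer.MAX_VALUE);
            minCoins[0] = 0;

            for (int a = 1; a <= amount; a++) {    // smaller amounts are solved first
                for (int c : coins) {
                    if (c <= a && minCoins[a - c] != Integer.MAX_VALUE
                               && minCoins[a - c] + 1 < minCoins[a]) {
                        minCoins[a] = minCoins[a - c] + 1;
                        lastCoin[a] = c;
                    }
                }
            }

            Deque<Integer> used = new ArrayDeque<>();
            if (minCoins[amount] == Integer.MAX_VALUE) return used;  // amount cannot be made
            for (int a = amount; a > 0; a -= lastCoin[a]) used.push(lastCoin[a]);
            return used;  // e.g. coins {1, 5, 12}, amount 15 -> [5, 5, 5]; greedy would use 12+1+1+1
        }
    }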

20. Maxflows & Mincuts

  • The greedy Ford-Fulkerson algorithm is presented: “Start with 0 flow. While there exists an augmenting path: Find an augmenting path P. Compute bottleneck capacity of P. Update flow on P by bottleneck capacity.” The Edmonds-Karp algorithm uses breadth-first search to find the shortest path to augment.

21. Randomness

  • The binomial distribution describes the probability of getting a certain number of successes in n independent trials where the probability of success is p. Its probability mass function is given by P(X=k) = (n choose k) * p^k * (1-p)^(n-k). The expected value is represented by E(X) = n*p
  • We would like to determine the number of possible cuts in a graph with V vertices. By how power sets work, there are 2^V possible subsets of V vertices. We remove the empty and the full subset, because a cut has to have at least 1 vertex on each side, giving 2^V - 2 possible subsets. Finally, we divide that number in two, because each cut represents two subsets, giving 2^(V-1) - 1 possible cuts
  • The randomized Karger’s algorithm is presented: “Assign a random weight (uniform between 0 and 1) to each edge. Run Kruskal’s MST algorithm until 2 connected components left. Return cut defined by connected components”
  • Naive shuffling (where each array item can be replaced by any other array item) is biased, as opposed to a Fisher-Yates shuffle. There are n^n possible outcomes of naive shuffling, but n! permutations. Since n! does not divide n^n evenly, the shuffle is biased
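  • A Fisher-Yates shuffle sketch (my own code): each position i is swapped with a uniformly random position in [0, i], which gives every one of the n! permutations equal probability:

    import java.util.Random;

    public class Shuffle {
        private static final Random RANDOM = new Random();

        public static void shuffle(int[] a) {
            for (int i = 1; i < a.length; i++) {
                int r = RANDOM.nextInt(i + 1);  // uniform in [0, i], unlike the naive "any index" swap
                int tmp = a[i];
                a[i] = a[r];
                a[r] = tmp;
            }
        }
    }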

22. Multiplicative Weights

  • It’s interesting that a lecture in the second-level computer science course is dedicated to machine learning topics. That would have been unusual when I took a version of the course
  • A simplified version of the AdaBoost algorithm is presented: “Initialize a double[n] array called weights and set all values to 1/n. Repeat T times: (Train a decision stump with the input weighted according to weights. Double weight of points incorrectly labelled by the decision stump (force the algorithm to do a better job on misclassified points). Normalize weights (divide each entry by the sum of weights).) To predict on new data use all decision stumps and take majority”

23. Intractability

  • The Church-Turing thesis is indirectly referenced. It is defined by Wikipedia as “a function on the natural numbers can be calculated by an effective method if and only if it is computable by a Turing machine,” and by the Stanford Encyclopedia of Philosophy as “every effective computation can be carried out by a Turing machine”
  • Changes or constraints in a problem can change its computational complexity. Also, worst-case complexity may not occur on typical real-world inputs. Finally, intractable problems can have approximation algorithms

Behnam Esfahbod, CC BY-SA 3.0 <https://creativecommons.org/licenses/by-sa/3.0>, via Wikimedia Commons

  • An interesting thought is presented in the slides that, in real life, creating something is harder than verifying something, so it’s not surprising if that’s the case in the computational world, with P != NP
  • X reduces to Y means that if you had an algorithm that solves Y efficiently, then you could use that algorithm to solve X efficiently. Y is at least as hard as X
  • An NP-complete problem is an NP problem where every other NP problem can be reduced to it–it is harder than other NP problems. All NP-complete problems can be reduced to each other (in the slides: “NP-complete problems are different manifestations of the same fundamentally hard problem”), though some reductions are harder than others. To prove an NP problem is NP-complete, reduce a known NP-complete problem to it–it is harder than that NP-complete problem. The usefulness of this procedure is that you then know that your problem is (probably, if P != NP) intractable
  • If there is at least one item in B for each item in A, then |A| <= |B|. One of the proofs in the slides uses this, but stated in a more complex way, so I simplify it for my own thoughts here

24. Algorithm Design

  • Greedy algorithms are described as “rarely lead to provably optimal solutions but often used anyway in practice, especially for intractable problems”
  • Dynamic programming is described as “Break up problem into a series of overlapping subproblems. Build up solutions to larger and larger subproblems”
  • Divide and conquer is described as “Break up problem into two or more independent subproblems. Solve each subproblem recursively. Combine solutions to subproblems to form solution to original problem. Turn brute-force Θ(n²) algorithm into Θ(n log n) one”
  • A randomized algorithm is described as an “algorithm whose performance (or output) depends on the results of random coin flips”