Lecture 22: ArrayLists
Binary search over sorted ArrayLists, sorting ArrayLists
In the last lecture we began implementing several functions over ArrayLists as methods in a helper utility class. We continue that work in this lecture, designing methods to find an item in an ArrayList matching a predicate, and to sort an ArrayList according to some comparator.
22.1 Finding an item in an arbitrary ArrayList
// In ArrayUtils
<T> ??? find(ArrayList<T> arr, IPred<T> whichOne) {
  ???
}
// In ArrayUtils
// Returns the index of the first item passing the predicate,
// or -1 if no such item was found
<T> int find(ArrayList<T> arr, IPred<T> whichOne) {
  ???
}
// In ArrayUtils
// Returns the index of the first item passing the predicate at or after the
// given index, or -1 if no such item was found
<T> int findHelp(ArrayList<T> arr, IPred<T> whichOne, int index) {
  if (whichOne.apply(arr.get(index))) {
    return index;
  } else {
    return this.findHelp(arr, whichOne, index + 1);
  }
}
Do Now!
What’s wrong with this code?
// In ArrayUtils
// Returns the index of the first item passing the predicate at or after the
// given index, or -1 if no such item was found
<T> int findHelp(ArrayList<T> arr, IPred<T> whichOne, int index) {
  if (index >= arr.size()) {
    return -1;
  } else if (whichOne.apply(arr.get(index))) {
    return index;
  } else {
    return this.findHelp(arr, whichOne, index + 1);
  }
}
Do Now!
What would happen if we had used > instead of >=?
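Putting the pieces together, find only needs to start the helper at index 0. The sketch below assumes IPred<T> is an interface with a single method boolean apply(T t), matching how whichOne is used above; the predicate class StartsWith is a hypothetical example invented for this illustration, not part of the lecture's code.

```java
import java.util.ArrayList;

// Assumed shape of IPred<T>, matching its use in findHelp
interface IPred<T> {
  boolean apply(T t);
}

// Hypothetical predicate for illustration: does a string start with the given prefix?
class StartsWith implements IPred<String> {
  String prefix;

  StartsWith(String prefix) {
    this.prefix = prefix;
  }

  public boolean apply(String s) {
    return s.startsWith(this.prefix);
  }
}

class ArrayUtils {
  // Returns the index of the first item passing the predicate,
  // or -1 if no such item was found
  <T> int find(ArrayList<T> arr, IPred<T> whichOne) {
    return this.findHelp(arr, whichOne, 0); // start scanning at the front
  }

  // Returns the index of the first item passing the predicate at or after the
  // given index, or -1 if no such item was found
  <T> int findHelp(ArrayList<T> arr, IPred<T> whichOne, int index) {
    if (index >= arr.size()) {
      return -1; // ran off the end: nothing passed
    } else if (whichOne.apply(arr.get(index))) {
      return index;
    } else {
      return this.findHelp(arr, whichOne, index + 1);
    }
  }
}
```

For the list [kiwi, cherry, apple, cranberry], searching with new StartsWith("c") returns 1 (the first match wins, even though "cranberry" also passes), and new StartsWith("z") returns -1.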
22.2 Finding an item in a sorted ArrayList – version 1
Suppose we happen to know that our ArrayList contains items that are comparable, and that the ArrayList itself is sorted. Can we do better than blindly scanning through the entire ArrayList? For concreteness, let’s assume our ArrayList is an ArrayList<String> and we’ll use the built-in comparisons on Strings. We’ll revisit this decision after we’ve developed the method, and generalize it to arbitrary element types.
 0      1       2       3     4    5      6         7     8
[apple, banana, cherry, date, fig, grape, honeydew, kiwi, watermelon]
Suppose we are searching for “grape”. Think about how we look up a word in a dictionary: words beginning with ‘g’ are not likely to appear at the very front of the dictionary, nor at the very back, so we start our search somewhere in the middle. In this case, the middle of our dictionary is index 4, “fig”. Because the dictionary is alphabetized, and “grape” comes after “fig” in the alphabet, we now know that none of the indices 4 and below can contain the word we seek. Instead, we turn our attention to indices 5 (which is one more than the middle index, 4+1) through 8 (our upper bound on which indices might contain our word).
We could begin blindly scanning through all those items (and indeed, in this particular example, we’d luckily find our target on the very next try!), but since our first approach of checking the “middle” index and eliminating half the dictionary in one shot worked so well, let’s try it again. This time, the middle index is (5 + 8) / 2 = 6, “honeydew”. (The exact midpoint is 6.5, so 6 or 7 would both work; since indices must be integers, we use integer division, which lets Java truncate the fractional part and gives us 6.) Since “grape” precedes “honeydew”, we now know that indices 6 and up cannot contain the word we seek. So we continue with indices 5 (our lower bound) through 5 (which is one less than the middle index, 6-1).
Happily, index 5 contains “grape”, so we return 5 as our answer.
Do Now!
What indices would we check if we were searching for “blueberry”?
Once again, we consider the entire ArrayList, from index 0 through index 8, and start our search at the middle index 4, “fig”, which is greater than our target word. So we eliminate indices 4 and up, and focus on indices 0 (our lower bound on where to find the word) through 3 (which is 4-1).
Our middle index is (0 + 3) / 2, which truncates to 1, corresponding to “banana”, which is less than “blueberry”, so we eliminate indices 1 and below, and focus on indices 2 (which is 1+1) through 3 (our upper bound).
Now our middle index is (2 + 3) / 2, which truncates to 2, “cherry”, which is greater than our target, so we eliminate indices 2 and up, and focus on indices 2 (our lower bound) through 1 (which is 2-1).
Now our bounds have crossed: our lower bound is greater than our upper bound, so no index remains that could hold our target. The target word must not be in our ArrayList, so we return -1.
// In ArrayUtils
// Returns the index of the target string in the given ArrayList, or -1 if the string is not found
// Assumes that the given ArrayList is sorted alphabetically
int binarySearch(ArrayList<String> strings, String target) {
  ???
}
// In ArrayUtils
// Returns the index of the target string in the given ArrayList, or -1 if the string is not found
// Assumes that the given ArrayList is sorted alphabetically
int binarySearchHelp_v1(ArrayList<String> strings, String target, int lowIdx, int highIdx) {
  int midIdx = (lowIdx + highIdx) / 2;
  if (target.compareTo(strings.get(midIdx)) == 0) {
    return midIdx; // found it!
  } else if (target.compareTo(strings.get(midIdx)) > 0) {
    return this.binarySearchHelp_v1(strings, target, midIdx + 1, highIdx); // too low
  } else {
    return this.binarySearchHelp_v1(strings, target, lowIdx, midIdx - 1); // too high
  }
}
Do Now!
What’s wrong with this code?
// In ArrayUtils
// Returns the index of the target string in the given ArrayList, or -1 if the string is not found
// Assumes that the given ArrayList is sorted alphabetically
int binarySearchHelp_v1(ArrayList<String> strings, String target, int lowIdx, int highIdx) {
  int midIdx = (lowIdx + highIdx) / 2;
  if (lowIdx > highIdx) {
    return -1; // not found
  } else if (target.compareTo(strings.get(midIdx)) == 0) {
    return midIdx; // found it!
  } else if (target.compareTo(strings.get(midIdx)) > 0) {
    return this.binarySearchHelp_v1(strings, target, midIdx + 1, highIdx); // too low
  } else {
    return this.binarySearchHelp_v1(strings, target, lowIdx, midIdx - 1); // too high
  }
}
Do Now!
What would happen if we didn’t add or subtract 1 from midIdx in the recursive calls?
Suppose we search for “clementine”, which is not in our list. We start the search between indices 0 and 8. The middle index is 4, and “fig” is bigger than “clementine”, so we search from the lower bound to the middle index.
We search between indices 0 and 4. The middle index is 2, and “cherry” is smaller than “clementine”, so we search from the middle index to the upper bound.
We search between indices 2 and 4. The middle index is 3, and “date” is bigger than “clementine”, so we search from the lower bound to the middle index.
We search between indices 2 and 3. The middle index is 2, and “cherry” is smaller than “clementine”, so we search from the middle index to the upper bound.
We search between indices 2 and 3...
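To watch the search get stuck, here is a hypothetical instrumented copy of the buggy variant (recursing on midIdx itself, with no ±1) that records each pair of bounds it visits. The class BuggySearchTrace, the visited list, and the maxCalls cap are all made up for this sketch; the cap exists only so the demonstration stops instead of overflowing the stack.

```java
import java.util.ArrayList;

// Hypothetical instrumented copy of the buggy search, for demonstration only
class BuggySearchTrace {
  // Records "lowIdx-highIdx" for every call made, in order
  ArrayList<String> visited = new ArrayList<>();

  // Like binarySearchHelp_v1, except the recursive calls pass midIdx itself
  // (the bug under discussion); returns -2 once maxCalls runs out
  int traceSearch(ArrayList<String> strings, String target,
                  int lowIdx, int highIdx, int maxCalls) {
    this.visited.add(lowIdx + "-" + highIdx);
    int midIdx = (lowIdx + highIdx) / 2;
    if (maxCalls <= 1) {
      return -2; // gave up after maxCalls calls
    } else if (lowIdx > highIdx) {
      return -1; // not found
    } else if (target.compareTo(strings.get(midIdx)) == 0) {
      return midIdx; // found it!
    } else if (target.compareTo(strings.get(midIdx)) > 0) {
      return this.traceSearch(strings, target, midIdx, highIdx, maxCalls - 1); // no +1
    } else {
      return this.traceSearch(strings, target, lowIdx, midIdx, maxCalls - 1); // no -1
    }
  }
}
```

On the nine-fruit list, traceSearch(fruits, "clementine", 0, 8, 6) gives up with -2 after visiting 0-8, 0-4, 2-4, and then 2-3 three times in a row: once the bounds reach 2 and 3, every call hands the same bounds to the next.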
Do Now!
What would happen if our exit condition were if (lowIdx >= highIdx)...?
// In ArrayUtils
int binarySearch_v1(ArrayList<String> strings, String target) {
  return this.binarySearchHelp_v1(strings, target, 0, strings.size() - 1);
}
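Assembled into one class, version 1 can be run directly. This sketch makes one small, behavior-preserving change: it checks the base case before computing midIdx and stores the result of compareTo in a local variable cmp (a name introduced here) so the comparison runs only once per call.

```java
import java.util.ArrayList;

class ArrayUtils {
  // Returns the index of the target string in the given ArrayList, or -1 if not found
  // Assumes that the given ArrayList is sorted alphabetically
  int binarySearch_v1(ArrayList<String> strings, String target) {
    return this.binarySearchHelp_v1(strings, target, 0, strings.size() - 1);
  }

  // Searches the closed interval of indices [lowIdx, highIdx]
  int binarySearchHelp_v1(ArrayList<String> strings, String target, int lowIdx, int highIdx) {
    if (lowIdx > highIdx) {
      return -1; // bounds have crossed: not found
    }
    int midIdx = (lowIdx + highIdx) / 2;
    int cmp = target.compareTo(strings.get(midIdx)); // compare only once
    if (cmp == 0) {
      return midIdx; // found it!
    } else if (cmp > 0) {
      return this.binarySearchHelp_v1(strings, target, midIdx + 1, highIdx); // too low
    } else {
      return this.binarySearchHelp_v1(strings, target, lowIdx, midIdx - 1); // too high
    }
  }
}
```

On the nine-fruit list, binarySearch_v1 returns 5 for “grape” and -1 for “blueberry”. An empty list yields -1 immediately, because the initial bounds 0 and -1 have already crossed.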
22.3 Finding an item in a sorted ArrayList – version 2
Functionally, the code above works great: we’ve covered all cases, and it computes the correct answer. Aesthetically, though, it’s a bit...fiddly. All that adding and subtracting of 1s is tricky to get right, and if we miss even one of them, our code could recur forever. Perhaps there’s a cleaner, less brittle way to organize our code that avoids them.
Recall our discussions from Fundies I about semi-open intervals: a semi-open interval \([m, n)\) consists of all numbers \(x\) such that \(m \leq x < n\), i.e. it includes \(m\) (and so is “closed” on the left) and excludes \(n\) (and so is “open” on the right). As a degenerate case, the interval \([m, m)\) is empty, because it must both include and exclude its edge values. How might we use this concept in our binary search?
Do Now!
What kind of intervals were we using in version 1 of our binary search code?
We never actually stated explicitly what lowIdx and highIdx meant in our code above! We just blindly manipulated them arithmetically, but never specifically gave them an interpretation. We can infer their meaning by looking at the initial call to binarySearchHelp_v1 in binarySearch_v1 itself: we pass in 0 for the lower bound, and strings.size() - 1 for the upper bound. Apparently, the lower bound means the lowest possible valid index where the data could be found, and the upper bound means the highest possible valid index where the data could be found. Because lowIdx and highIdx are inclusive bounds, they represent a closed interval.
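Splitting a closed interval such as [lowIdx, highIdx] at midIdx into two closed pieces that neither overlap nor leave a gap forces an adjustment by 1, for example [lowIdx, midIdx] and [midIdx + 1, highIdx]. Ironically, the mathematical terminology here is to say that closed intervals are not “closed under splitting.” Further ironically, semi-open intervals are “closed under splitting.” Mathematicians overload the term “closed” with multiple meanings.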
Do Now!
Confirm this, using the definition of semi-open intervals above.
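One way to write out the confirmation: for integers \(m \le k \le n\), splitting \([m, n)\) at \(k\) produces two semi-open intervals that cover it exactly, with no overlap and no arithmetic on \(k\). Here \(n - m\) counts the integers in \([m, n)\).

```latex
[m, n) = [m, k) \cup [k, n), \qquad
[m, k) \cap [k, n) = \emptyset, \qquad
n - m = (k - m) + (n - k)
```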
// In ArrayUtils
// Returns the index of the target string in the given ArrayList, or -1 if the string is not found
// Assumes that the given ArrayList is sorted alphabetically
// Assumes that [lowIdx, highIdx) is a semi-open interval of indices
int binarySearchHelp_v2(ArrayList<String> strings, String target, int lowIdx, int highIdx) {
  int midIdx = (lowIdx + highIdx) / 2;
  if (lowIdx ??? highIdx) {
    return -1; // not found
  } else if (target.compareTo(strings.get(midIdx)) == 0) {
    return midIdx; // found it!
  } else if (target.compareTo(strings.get(midIdx)) > 0) {
    return this.binarySearchHelp_v2(strings, target, midIdx ???, highIdx); // too low
  } else {
    return this.binarySearchHelp_v2(strings, target, lowIdx, midIdx ???); // too high
  }
}
We need a base case to determine when there are no valid indices left to check. This now falls out of the definition of semi-open intervals: the interval is empty when lowIdx >= highIdx.
Otherwise we split the interval in half. If the target comes before the item at midIdx, then midIdx is too high. We need to exclude it in the recursive call, and since the high index is interpreted as excluded, we can simply pass midIdx directly, with no subtracting 1.
If the target comes after the item at midIdx, then midIdx is too low. We exclude it from the recursive call by adding 1 to it. Sadly, this addition is necessary and can’t be eliminated: the low index is included, and because indices are integers, integer division can produce a midIdx equal to lowIdx, in which case passing midIdx unchanged would recur with exactly the same bounds we started with, forever.
Do Now!
Suppose we didn’t add 1 in the last case. Construct a test case that causes the search to recur forever.
// In ArrayUtils
// Returns the index of the target string in the given ArrayList, or -1 if the string is not found
// Assumes that the given ArrayList is sorted alphabetically
// Assumes that [lowIdx, highIdx) is a semi-open interval of indices
int binarySearchHelp_v2(ArrayList<String> strings, String target, int lowIdx, int highIdx) {
  int midIdx = (lowIdx + highIdx) / 2;
  if (lowIdx >= highIdx) {
    return -1; // not found
  } else if (target.compareTo(strings.get(midIdx)) == 0) {
    return midIdx; // found it!
  } else if (target.compareTo(strings.get(midIdx)) > 0) {
    return this.binarySearchHelp_v2(strings, target, midIdx + 1, highIdx); // too low
  } else {
    return this.binarySearchHelp_v2(strings, target, lowIdx, midIdx); // too high
  }
}
// In ArrayUtils
int binarySearch_v2(ArrayList<String> strings, String target) {
  return this.binarySearchHelp_v2(strings, target, 0, strings.size());
}
22.4 Generalizing to arbitrary element types
// In ArrayUtils
<T> int gen_binarySearch_v2(ArrayList<T> arr, T target, IComparator<T> comp) {
  return this.gen_binarySearchHelp_v2(arr, target, comp, 0, arr.size());
}
<T> int gen_binarySearchHelp_v2(ArrayList<T> arr, T target, IComparator<T> comp,
                                int lowIdx, int highIdx) {
  int midIdx = (lowIdx + highIdx) / 2;
  if (lowIdx >= highIdx) {
    return -1;
  } else if (comp.compare(target, arr.get(midIdx)) == 0) {
    return midIdx;
  } else if (comp.compare(target, arr.get(midIdx)) > 0) {
    return this.gen_binarySearchHelp_v2(arr, target, comp, midIdx + 1, highIdx);
  } else {
    return this.gen_binarySearchHelp_v2(arr, target, comp, lowIdx, midIdx);
  }
}
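A runnable sketch of the generic version follows. It assumes IComparator<T> is an interface with a single method int compare(T t1, T t2) returning a negative number, zero, or a positive number, as its use above suggests; ByLength is a hypothetical comparator invented for this example that orders strings by length only.

```java
import java.util.ArrayList;

// Assumed shape of IComparator<T>: negative if t1 comes first,
// zero if they are "equal", positive if t2 comes first
interface IComparator<T> {
  int compare(T t1, T t2);
}

// Hypothetical comparator for illustration: orders strings by length only
class ByLength implements IComparator<String> {
  public int compare(String s1, String s2) {
    return s1.length() - s2.length();
  }
}

class ArrayUtils {
  // Returns an index whose item comp considers equal to target, or -1
  // Assumes arr is sorted according to comp
  <T> int gen_binarySearch_v2(ArrayList<T> arr, T target, IComparator<T> comp) {
    return this.gen_binarySearchHelp_v2(arr, target, comp, 0, arr.size());
  }

  // Searches the semi-open interval of indices [lowIdx, highIdx)
  <T> int gen_binarySearchHelp_v2(ArrayList<T> arr, T target, IComparator<T> comp,
                                  int lowIdx, int highIdx) {
    int midIdx = (lowIdx + highIdx) / 2;
    if (lowIdx >= highIdx) {
      return -1; // empty interval: not found
    } else if (comp.compare(target, arr.get(midIdx)) == 0) {
      return midIdx; // found it!
    } else if (comp.compare(target, arr.get(midIdx)) > 0) {
      return this.gen_binarySearchHelp_v2(arr, target, comp, midIdx + 1, highIdx); // too low
    } else {
      return this.gen_binarySearchHelp_v2(arr, target, comp, lowIdx, midIdx); // too high
    }
  }
}
```

Because “equal” means whatever the comparator says, searching the length-sorted list [a, bb, ccc, dddd, eeeee] for “xy” returns 1: “xy” and “bb” have the same length. As a design note, (lowIdx + highIdx) / 2 can overflow int if the sum exceeds Integer.MAX_VALUE; for non-negative indices, lowIdx + (highIdx - lowIdx) / 2 computes the same midpoint without that risk.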
22.5 Sorting an ArrayList
Suppose we start with this unsorted ArrayList:
 0     1       2      3     4       5    6           7      8
[kiwi, cherry, apple, date, banana, fig, watermelon, grape, honeydew]
After swapping “kiwi” (index 0) with “apple” (index 2):
 0      1       2     3     4       5    6           7      8
[apple, cherry, kiwi, date, banana, fig, watermelon, grape, honeydew]
Do Now!
How did we decide that “apple” was the appropriate replacement for “kiwi”?
After swapping “cherry” (index 1) with “banana” (index 4):
 0      1       2     3     4       5    6           7      8
[apple, banana, kiwi, date, cherry, fig, watermelon, grape, honeydew]
Do Now!
How did we decide that “banana” was the appropriate replacement for “cherry”?
At this point the list divides into a sorted part and a not-yet-sorted part:
 0      1       || 2     3     4       5    6           7      8
[apple, banana, || kiwi, date, cherry, fig, watermelon, grape, honeydew]
      SORTED <--++--> NOT YET SORTED
                               MIN
 0      1       || 2     3     4       5    6           7      8
[apple, banana, || kiwi, date, cherry, fig, watermelon, grape, honeydew]
      SORTED <--++--> NOT YET SORTED
Swap items at index 2 and index 4...
 0      1       2       || 3     4     5    6           7      8
[apple, banana, cherry, || date, kiwi, fig, watermelon, grape, honeydew]
              SORTED <--++--> NOT YET SORTED
// In ArrayUtils
// EFFECT: Sorts the given list of strings alphabetically
void sort(ArrayList<String> arr) {
  this.sortHelp(arr, 0); // (1)
}
// EFFECT: Sorts the given list of strings alphabetically, starting at the given index
void sortHelp(ArrayList<String> arr, int minIdx) {
  if (minIdx >= arr.size()) { // (2)
    return;
  } else { // (3)
    int idxOfMinValue = ...find minimum value in not-yet-sorted part...
    this.swap(arr, minIdx, idxOfMinValue);
    this.sortHelp(arr, minIdx + 1); // (4)
  }
}
// In ArrayUtils
// EFFECT: Sorts the given list of strings alphabetically
void sort(ArrayList<String> arr) {
  for (int idx = 0;          // (1)
       idx < arr.size();     // (2)
       idx = idx + 1) {      // (4)
    // (3)
    int idxOfMinValue = ...find minimum value in not-yet-sorted part...
    this.swap(arr, idx, idxOfMinValue);
  }
}
A for loop consists of four parts, which are numbered here (and their corresponding parts are numbered in the recursive version of the code). First is the initialization statement, which declares the loop variable and initializes it to its starting value. This is run only once, before the loop begins. Second is the termination condition, which is checked before every iteration of the loop body. As soon as the condition evaluates to false, the loop terminates. Third is the loop body, which is executed every iteration of the loop. Fourth is the update statement, which is executed after each loop body and is used to advance the loop variable to its next value. Read this loop aloud as “For each value of idx starting at 0 and continuing while idx < arr.size(), advancing by 1, execute the body.”
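A counted-for loop need not start at 0 or step by 1. For instance, the first loop below counts down from bigNumber to smallNumber, and the second visits the odd numbers from smallOddNumber up to, but not including, bigNumber:
for (int idx = bigNumber; idx >= smallNumber; idx = idx - 1) { ... }
for (int idx = smallOddNumber; idx < bigNumber; idx = idx + 2) { ... }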
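To see what values idx actually takes, here is a small sketch that plugs hypothetical numbers into the two loop shapes above (bigNumber = 9, smallNumber = 2, smallOddNumber = 1) and collects each value of idx into a list. The class LoopExamples and its method names are made up for this illustration.

```java
import java.util.ArrayList;

// Hypothetical helper class illustrating two counted-for loop shapes
class LoopExamples {
  // Collects bigNumber, bigNumber - 1, ..., smallNumber (counting down)
  ArrayList<Integer> countDown(int bigNumber, int smallNumber) {
    ArrayList<Integer> seen = new ArrayList<>();
    for (int idx = bigNumber; idx >= smallNumber; idx = idx - 1) {
      seen.add(idx);
    }
    return seen;
  }

  // Collects smallOddNumber, smallOddNumber + 2, ... while below bigNumber
  ArrayList<Integer> oddsFrom(int smallOddNumber, int bigNumber) {
    ArrayList<Integer> seen = new ArrayList<>();
    for (int idx = smallOddNumber; idx < bigNumber; idx = idx + 2) {
      seen.add(idx);
    }
    return seen;
  }
}
```

With these values, countDown(9, 2) produces [9, 8, 7, 6, 5, 4, 3, 2] and oddsFrom(1, 9) produces [1, 3, 5, 7]. If the termination condition is false from the start, as in countDown(2, 9), the body never runs and the list stays empty.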
Exercise
Practice using the counted-for loop: design a method
<T> ArrayList<T> interleave(ArrayList<T> arr1, ArrayList<T> arr2) that takes two ArrayLists of the same size, and produces an output ArrayList consisting of one item from arr1, then one from arr2, then another from arr1, etc.
Design a method
<T> ArrayList<T> unshuffle(ArrayList<T> arr) that takes an input ArrayList and produces a new list containing the first, third, fifth ... items of the list, followed by the second, fourth, sixth ... items.
22.6 Finding the minimum value
Exercise
Design the missing method to finish the sort method above: this method should find the minimum value in the not-yet-sorted part of the given ArrayList<String>.
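For reference, here is one possible minimal sketch of the finished sort, written with the for loop. It fills in the elided step with a helper named findMinIdx (a hypothetical name chosen for this sketch), and includes a swap helper of the kind sortHelp assumes; both are assumptions, so compare them against your own design.

```java
import java.util.ArrayList;

class ArrayUtils {
  // EFFECT: Sorts the given list of strings alphabetically (selection sort)
  void sort(ArrayList<String> arr) {
    for (int idx = 0; idx < arr.size(); idx = idx + 1) {
      // indices before idx are sorted, and no later item precedes them
      int idxOfMinValue = this.findMinIdx(arr, idx);
      this.swap(arr, idx, idxOfMinValue);
    }
  }

  // Returns the index of the alphabetically least string at or after startIdx
  // Assumes startIdx < arr.size()
  int findMinIdx(ArrayList<String> arr, int startIdx) {
    int minIdx = startIdx;
    for (int idx = startIdx + 1; idx < arr.size(); idx = idx + 1) {
      if (arr.get(idx).compareTo(arr.get(minIdx)) < 0) {
        minIdx = idx; // smallest string seen so far
      }
    }
    return minIdx;
  }

  // EFFECT: Exchanges the items at the two given indices
  <T> void swap(ArrayList<T> arr, int index1, int index2) {
    T oldValueAtIndex2 = arr.get(index2);
    arr.set(index2, arr.get(index1));
    arr.set(index1, oldValueAtIndex2);
  }
}
```

Running sort on the unsorted list from the start of 22.5 reproduces the intermediate states pictured there: the first pass swaps “kiwi” with “apple”, the second swaps “cherry” with “banana”, the third swaps “kiwi” with “cherry”, and the list ends up fully alphabetized. Writing findMinIdx with a loop is a design choice; a recursive helper that carries the index of the smallest item seen so far would work equally well.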