Lecture 25: Iterator and Iterable

How does a for-each loop actually work?

We’ve discussed how the three loop forms in Java work to repeatedly execute some computation — either once for each item in an ArrayList, once for every value of an integer variable as it counts, or simply indefinitely many times while some condition remains true. But one of these loops should seem a bit oddly specific. While-loops execute as long as some boolean condition—any boolean condition we want—remains true. Counted-for loops count, from whatever starting value we choose to whatever ending value we choose by whatever interval we choose. But for-each loops seem to only work on ArrayLists. What makes ArrayList so special that it deserves its own special syntax in the language? No other class we have seen gets such special treatment!

In fact, ArrayList is not special, and does not get its own syntax. Instead, it is simply one particular example of a far more general notion, one that we can take advantage of for our own classes, too.

25.1 How do for-each loops actually work?

A for-each loop looks like this:

// Assume we have an ArrayList<T> named tList
for (T t : tList) {
  ...body...
}

Recall that when we introduced for-each loops, they were a replacement for a recursive implementation of map over ArrayLists. At the time, their main appeal was the ability to iterate over each item of the list, without having to mess about with manually counting indices. But suppose Java didn’t include for-each loops, and we wanted to reimplement their behavior ourselves using a more general-purpose loop, while still avoiding indices as much as possible. How could that be done?

25.1.1 Introducing Iterators

Clearly we need some form of loop. We’re assuming for-each loops do not exist. And we want to avoid counting indices. That leaves just while loops as our only option. Recall the skeleton of a while loop:

...setup...
while (something is true) {
...body...
}
...use results...

While loops are good for looping as long as necessary, while there’s more to be done. For-each loops essentially keep looping while there are more items to process. So perhaps the essence of our for-each loop’s behavior, when expressed as a while loop, might look like this:

// First attempt
while (hasMoreElements(tList)) {
  T t = nextItemOf(tList);
  ...body...
}

Except that can’t really work, because the hypothetical hasMoreElements and nextItemOf functions would have to keep track of additional state between calls — or else they’d just give the same answers every time. But these two functions seem like a clean, simple way to describe the process of iterating over data: while there’s more data, get the next item and process it, then repeat.

So: we want to preserve these two functions, but we need to keep track of state to make them work properly. Accordingly, we need to create a new object, whose sole responsibility is to help us iterate over tList. Moreover, when we phrase it this way, this behavior doesn’t sound very specific to ArrayLists; we might be able to iterate over almost anything!

Do Now!
What kind of Java abstraction is most appropriate here?

The while loop sketch above will work with any object that exposes these two functions as methods. This is a promise to provide a certain set of behaviors, so we should accordingly define a new interface. This interface is called an Iterator, and it is provided for us by Java itself. (According to our naming conventions, it really ought to be called IIterator, but that’s a clumsy name. At least it does start with a capital I!) Its methods are slightly renamed from the sketch above:

// Represents the ability to produce a sequence of values
// of type T, one at a time
interface Iterator<T> {
  // Does this sequence have at least one more value?
  boolean hasNext();
  // Get the next value in this sequence
  // EFFECT: Advance the iterator to the subsequent value
  T next();
  // EFFECT: Remove the item just returned by next()
  // NOTE: This method may not be supported by every iterator; ignore it for now
  void remove();
}

The job of an Iterator is to keep track of whatever state is necessary to produce values one-at-a-time from a sequence, be it an ArrayList, an IList, a Deque, or whatever else we choose. We’ll see momentarily how to implement this interface; first let’s see how to use it.

We can refine our while loop sketch above as follows:

// Second attempt
Iterator<T> listIter = new ArrayListIterator<T>(tList);
while (listIter.hasNext()) {
  T t = listIter.next();
  ...body...
}

As long as the listIter claims it can produce another item, the loop will repeat and get the next item. Notice the protocol: we always check hasNext before calling next. This ensures that we never overshoot the end of the list.
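
To see why the protocol matters, here is a small sketch using the iterator that Java’s own ArrayList hands out through its iterator() method. The class ProtocolDemo and its two helper methods are made up for this illustration. The first helper follows the protocol; the second deliberately calls next without checking hasNext, and for an exhausted ArrayList iterator that call throws java.util.NoSuchElementException.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical demo class, not part of the lecture code
class ProtocolDemo {
  // Follows the protocol: check hasNext before every call to next.
  // Returns how many items the iterator produced.
  static <T> int countItems(Iterator<T> iter) {
    int count = 0;
    while (iter.hasNext()) {
      iter.next();
      count = count + 1;
    }
    return count;
  }

  // Breaks the protocol on purpose: calls next without asking hasNext.
  // Returns true if the iterator reported that we overshot the end.
  static <T> boolean overshoots(Iterator<T> iter) {
    try {
      iter.next();
      return false;
    } catch (NoSuchElementException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    ArrayList<String> names = new ArrayList<String>(Arrays.asList("a", "b", "c"));
    Iterator<String> iter = names.iterator();
    System.out.println(countItems(iter)); // prints 3
    System.out.println(overshoots(iter)); // prints true: iter is already exhausted
  }
}
```

Note that the same iterator object is passed to both helpers: once countItems has drained it, it stays drained, because the iterator carries its own state.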

25.1.2 Introducing Iterables

So far, so good: we can describe the behavior of a for-each loop in terms of a while loop over an Iterator. But how does Java know what kind of iterator to construct? After all, the ArrayListIterator in the sketch above really is specific to ArrayLists; if we want our for-each loops to work over other kinds of things, we need another mechanism. Let’s say that an object is iterable if there exists some Iterator implementation for it. That sounds like another interface, which is also defined for us by Java:

// Represents anything that can be iterated over
interface Iterable<T> {
  // Returns an iterator over this collection
  Iterator<T> iterator();
}

In the actual Java implementation of ArrayList, we see something like this:

class ArrayList<T> implements Iterable<T> {
  ... lots of other details ...
  // Returns an iterator over the items of this list
  public Iterator<T> iterator() {
    return new ArrayListIterator<T>(this);
  }
}

Among many other details of the implementation, ArrayList declares that it implements the Iterable interface, and therefore can be iterated over.

Now we can finally refine our while-loop sketch completely:

// Final attempt: works for *any* Iterable
Iterator<T> iterator = anyIterable.iterator();
while (iterator.hasNext()) {
  T t = iterator.next();
  ...body...
}

This is far more general than just working over ArrayLists — any class that implements Iterable can be used with a for-each loop. This is yet another example of the power of good abstractions: by recognizing the common elements of iteration and being iterable, we can create a reusable notion that works wherever we choose to implement it.
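
As a quick check of that claim, the sketch below defines a small Iterable of our own and runs it through both a for-each loop and the equivalent while loop. The names Countdown, CountdownIterator and ForEachDemo are invented for this example. The iterator omits remove(), which assumes Java 8 or later, where Iterator supplies a default remove() that throws UnsupportedOperationException.

```java
import java.util.ArrayList;
import java.util.Iterator;

// Hypothetical example: the numbers start, start - 1, ..., 1
class Countdown implements Iterable<Integer> {
  int start;
  Countdown(int start) {
    this.start = start;
  }
  // Each call creates a fresh iterator with its own state
  public Iterator<Integer> iterator() {
    return new CountdownIterator(this.start);
  }
}

class CountdownIterator implements Iterator<Integer> {
  int nextVal; // the next value to be returned
  CountdownIterator(int start) {
    this.nextVal = start;
  }
  public boolean hasNext() {
    return this.nextVal > 0;
  }
  public Integer next() {
    int answer = this.nextVal;
    this.nextVal = this.nextVal - 1;
    return answer;
  }
}

class ForEachDemo {
  // Collects the items using a for-each loop
  static ArrayList<Integer> viaForEach(Iterable<Integer> source) {
    ArrayList<Integer> result = new ArrayList<Integer>();
    for (Integer i : source) {
      result.add(i);
    }
    return result;
  }
  // Collects the items using the while-loop translation
  static ArrayList<Integer> viaWhile(Iterable<Integer> source) {
    ArrayList<Integer> result = new ArrayList<Integer>();
    Iterator<Integer> iterator = source.iterator();
    while (iterator.hasNext()) {
      Integer i = iterator.next();
      result.add(i);
    }
    return result;
  }
  public static void main(String[] args) {
    System.out.println(viaForEach(new Countdown(3))); // prints [3, 2, 1]
    System.out.println(viaWhile(new Countdown(3)));   // prints [3, 2, 1]
  }
}
```

Both helpers accept any Iterable<Integer>, so they would work unchanged on an ArrayList<Integer> as well.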

25.2 Examples of Iterators for different data types

25.2.1 Iterators for ArrayLists — counting indices

Let’s see how to implement an Iterator for an ArrayList. Our initial skeleton will simply create a class that implements the desired interface, and contains a reference to the ArrayList to be iterated over, along with the index of the next item to return:

class ArrayListIterator<T> implements Iterator<T> {
  // the list of items that this iterator iterates over
  ArrayList<T> items;
  // the index of the next item to be returned
  int nextIdx;
  // Construct an iterator for a given ArrayList
  ArrayListIterator(ArrayList<T> items) {
    this.items = items;
    this.nextIdx = 0;
  }

  public boolean hasNext() {
    ...
  }

  public T next() {
    ...
  }

  public void remove() {
    throw new UnsupportedOperationException("Don't do this!");
  }
}

Do Now!
Implement hasNext and next.

Remember the protocol: we always call hasNext before calling next, so the first call to hasNext really means, “Does this sequence have at least one item?”, and the first call to next means “Give me the first item”.

(This may seem a bit confusing, and it is. It is an example of the so-called “fencepost problem”: suppose you need to put up 60 feet of fencing, with a fencepost every 10 feet. How many fenceposts are needed? Answer: seven, because you need a fencepost before the very first section of fence:

X----------X----------X----------X----------X----------X----------X

Likewise, in our Iterator protocol, there will be one more call to hasNext than there will be to next, because we call it before every call to next, and then once more to find out there are no more items.)
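
The off-by-one can be observed directly. The sketch below wraps an iterator in a hypothetical CountingIterator (a name made up here) that tallies the calls it receives, and then runs the standard while loop until the source is exhausted. For a three-item list it should record four calls to hasNext and three to next.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical wrapper: behaves like its source, but counts the calls it receives
class CountingIterator<T> implements Iterator<T> {
  Iterator<T> source;
  int hasNextCalls = 0;
  int nextCalls = 0;
  CountingIterator(Iterator<T> source) {
    this.source = source;
  }
  public boolean hasNext() {
    this.hasNextCalls = this.hasNextCalls + 1;
    return this.source.hasNext();
  }
  public T next() {
    this.nextCalls = this.nextCalls + 1;
    return this.source.next();
  }
}

class FencepostDemo {
  // Runs the standard while loop until the source is exhausted,
  // and returns the wrapper so its counters can be inspected
  static <T> CountingIterator<T> drain(Iterator<T> source) {
    CountingIterator<T> counter = new CountingIterator<T>(source);
    while (counter.hasNext()) {
      counter.next();
    }
    return counter;
  }

  public static void main(String[] args) {
    ArrayList<String> items = new ArrayList<String>(Arrays.asList("a", "b", "c"));
    CountingIterator<String> counter = drain(items.iterator());
    // prints "4 calls to hasNext, 3 calls to next"
    System.out.println(counter.hasNextCalls + " calls to hasNext, "
        + counter.nextCalls + " calls to next");
  }
}
```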

Accordingly, when we call hasNext, the nextIdx field will refer to the index of the next item — starting at 0 — to be returned. Therefore, there is such an item if and only if nextIdx is a valid index of the ArrayList:

// In ArrayListIterator
// Does this sequence (of items in the array list) have at least one more value?
public boolean hasNext() {
  return this.nextIdx < this.items.size();
}

In next, we need to get the next item, and advance nextIdx to the next index to be used:

// In ArrayListIterator
// Get the next value in this sequence
// EFFECT: Advance the iterator to the subsequent value
public T next() {
  T answer = this.items.get(this.nextIdx);
  this.nextIdx = this.nextIdx + 1;
  return answer;
}
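
Putting the pieces together gives a class that can be compiled and run. The sketch below assembles the skeleton with the two method bodies; only the ArrayListIteratorDemo class and its join helper are additions made up for the demonstration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;

class ArrayListIterator<T> implements Iterator<T> {
  ArrayList<T> items; // the list being iterated over
  int nextIdx;        // the index of the next item to be returned
  ArrayListIterator(ArrayList<T> items) {
    this.items = items;
    this.nextIdx = 0;
  }
  public boolean hasNext() {
    return this.nextIdx < this.items.size();
  }
  public T next() {
    T answer = this.items.get(this.nextIdx);
    this.nextIdx = this.nextIdx + 1;
    return answer;
  }
  public void remove() {
    throw new UnsupportedOperationException("Don't do this!");
  }
}

// Hypothetical demo class
class ArrayListIteratorDemo {
  // Concatenates the items in the order the iterator produces them
  static String join(ArrayList<String> items) {
    String result = "";
    Iterator<String> iter = new ArrayListIterator<String>(items);
    while (iter.hasNext()) {
      result = result + iter.next();
    }
    return result;
  }

  public static void main(String[] args) {
    ArrayList<String> letters = new ArrayList<String>(Arrays.asList("a", "b", "c"));
    System.out.println(join(letters)); // prints abc
  }
}
```

Because each ArrayListIterator keeps its own nextIdx, two iterators over the same list advance independently of each other.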

25.2.2 Iterators for ILists — following links

Iterators don’t have to use indices as their state to step through the contents of an ArrayList; we can use whatever state we want. Let’s try implementing an Iterator for our IList.

Do Now!
Try to implement an Iterator<T> for IList<T>. What state should you store instead of indices?

The only information in an IList is the node itself, which is either a Cons (a ConsList) or an Empty (an MtList). So our code will begin:

class IListIterator<T> implements Iterator<T> {
  IList<T> items;
  IListIterator(IList<T> items) {
    this.items = items;
  }
  public boolean hasNext() {
    ...
  }
  public T next() {
    ...
  }
  public void remove() {
    throw new UnsupportedOperationException("Don't do this!");
  }
}

As with the previous iterator, we have a next item when we are currently pointing at a Cons node:

// In IListIterator
public boolean hasNext() {
  return this.items.isCons();
}

And likewise we can return the item in the current Cons for our next value. To update the iterator, we advance items to refer to the current Cons’s rest:

// In IListIterator
public T next() {
  ConsList<T> itemsAsCons = this.items.asCons();
  T answer = itemsAsCons.first;
  this.items = itemsAsCons.rest;
  return answer;
}

Do Now!
Define the isCons() and asCons() methods to complete this code.

Note: Yes, isCons and asCons aren’t great methods from an object-oriented standpoint. We know that double-dispatch would be a better design here. In this case, double-dispatch is overkill, as we’d need two visitor classes — one to implement isCons and one for asCons — which seems painfully heavyweight. Besides, it is more useful to think of Iterators as being tightly connected to the data they are iterating over: if the data definition changes, the iterator will have to change to match. And so, just as we allowed function objects to see fields of their parameters despite that not being a normal part of the template, we allow iterators to determine what type their argument might be.

We can make our ILists be Iterable, too:

// Declare that every IList is an Iterable:
interface IList<T> extends Iterable<T> {
  ... everything as before ...
}

class ConsList<T> implements IList<T> {
  ... everything as before ...
  public Iterator<T> iterator() {
    return new IListIterator<T>(this);
  }
}

class MtList<T> implements IList<T> {
  ... everything as before ...
  public Iterator<T> iterator() {
    return new IListIterator<T>(this);
  }
}

And with those last few definitions, we can now use ILists in for-each loops, exactly as we could with ArrayLists.

for (T item : myList) {
  // iterates forward through myList
  ...
}
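
For reference, here is one way the whole IList example might fit together as a runnable sketch. It strips IList down to just the methods the iterator needs, and it includes one possible answer to the isCons/asCons exercise, so try that exercise first. Having asCons throw a ClassCastException on an empty list is a choice made for this sketch, and IListDemo is a made-up name.

```java
import java.util.Iterator;

interface IList<T> extends Iterable<T> {
  boolean isCons();
  ConsList<T> asCons();
}

class MtList<T> implements IList<T> {
  public boolean isCons() {
    return false;
  }
  // An empty list cannot be viewed as a Cons (assumed behavior for this sketch)
  public ConsList<T> asCons() {
    throw new ClassCastException("An empty list is not a Cons");
  }
  public Iterator<T> iterator() {
    return new IListIterator<T>(this);
  }
}

class ConsList<T> implements IList<T> {
  T first;
  IList<T> rest;
  ConsList(T first, IList<T> rest) {
    this.first = first;
    this.rest = rest;
  }
  public boolean isCons() {
    return true;
  }
  public ConsList<T> asCons() {
    return this;
  }
  public Iterator<T> iterator() {
    return new IListIterator<T>(this);
  }
}

class IListIterator<T> implements Iterator<T> {
  IList<T> items; // the part of the list not yet produced
  IListIterator(IList<T> items) {
    this.items = items;
  }
  public boolean hasNext() {
    return this.items.isCons();
  }
  public T next() {
    ConsList<T> itemsAsCons = this.items.asCons();
    T answer = itemsAsCons.first;
    this.items = itemsAsCons.rest;
    return answer;
  }
  public void remove() {
    throw new UnsupportedOperationException("Don't do this!");
  }
}

// Hypothetical demo class
class IListDemo {
  // Adds up the items using a for-each loop over an IList
  static int sum(IList<Integer> list) {
    int total = 0;
    for (Integer item : list) {
      total = total + item;
    }
    return total;
  }
  public static void main(String[] args) {
    IList<Integer> list = new ConsList<Integer>(1,
        new ConsList<Integer>(2, new ConsList<Integer>(3, new MtList<Integer>())));
    System.out.println(sum(list)); // prints 6
  }
}
```

The iterator only moves its own items reference along the list; the list itself is never modified.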

25.2.3 Iteration in multiple directions

The concept of an Iterator is very flexible. Let’s work through several examples, showing how varied they can be.

Do Now!
Define a DequeForwardIterator that advances through a Deque, just as we did above with an IList.

Do Now!
Define a DequeReverseIterator that walks backward through a Deque from the last item to the first. The code should be very nearly identical to the forward iterator.

Some data structures can meaningfully support iteration in multiple orders. However, if we choose to make those data structures Iterable, then we have to choose a default iteration order to be used with for-each loops, and construct that particular iterator in the iterator() method. For Deques, we probably would choose the forward iteration direction, as it is the “most natural”. If we want to use the reverse iterator, we’d have to explicitly write the while-loop version ourselves:

class Deque<T> implements Iterable<T> {
  public Iterator<T> iterator() {
    // Choose a forward iteration by default
    return new DequeForwardIterator<T>(this.sentinel.next);
  }
  // But...also provide a reverse iterator if needed
  Iterator<T> reverseIterator() {
    return new DequeReverseIterator<T>(this.sentinel.prev);
  }
}

for (T item : myDeque) {
  // iterates forward through myDeque
  ...
}

Iterator<T> revIter = myDeque.reverseIterator();
while (revIter.hasNext()) {
  // iterates backward through myDeque
  T item = revIter.next();
  ...
}
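
Java’s standard library makes the same design choice. The sketch below uses java.util.ArrayDeque (the library’s deque, not the Deque class from this course): its iterator() method, and therefore a for-each loop, walks from first to last, while the separate descendingIterator() method walks from last to first and must be driven by an explicit while loop. The class DequeDirectionsDemo and its helpers are made up for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Iterator;

// Hypothetical demo class
class DequeDirectionsDemo {
  // Uses the default iteration order, via for-each
  static ArrayList<String> forward(ArrayDeque<String> deque) {
    ArrayList<String> result = new ArrayList<String>();
    for (String item : deque) {
      result.add(item);
    }
    return result;
  }

  // Uses the reverse iterator, which for-each cannot reach on its own
  static ArrayList<String> backward(ArrayDeque<String> deque) {
    ArrayList<String> result = new ArrayList<String>();
    Iterator<String> revIter = deque.descendingIterator();
    while (revIter.hasNext()) {
      result.add(revIter.next());
    }
    return result;
  }

  public static void main(String[] args) {
    ArrayDeque<String> deque = new ArrayDeque<String>();
    deque.addLast("a");
    deque.addLast("b");
    deque.addLast("c");
    System.out.println(forward(deque));  // prints [a, b, c]
    System.out.println(backward(deque)); // prints [c, b, a]
  }
}
```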

25.2.4 Iterators for Fibonacci numbers — computing items on demand

Not every iterator needs to store an actual object from which it derives its data. Iterators simply represent sequences of values, and those values might just be computed on demand. Consider the following iterator:

class FibonacciIterator implements Iterator<Integer> {
  int prevVal = 0;
  int curVal = 1;
  // There are always more Fibonacci numbers
  public boolean hasNext() { return true; }
  // Compute the next Fibonacci number
  public Integer next() {
    int answer = this.prevVal + this.curVal;
    this.prevVal = this.curVal;
    this.curVal = answer;
    return answer;
  }
  public void remove() {
    throw new UnsupportedOperationException("Don't do this!");
  }
}

This iterator can produce an infinitely long stream of Fibonacci numbers, without requiring an explicit list to store them all in: it just stores the two most recent values, and computes the next one as required.
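
Since hasNext never returns false, a loop driven only by this iterator would never finish; whoever uses it has to decide when to stop. The sketch below repeats the iterator (with hasNext and next declared public, as implementing an interface method requires) and adds a made-up FibonacciDemo helper that stops after a fixed number of items. It omits remove(), which assumes Java 8 or later, where Iterator provides a default.

```java
import java.util.ArrayList;
import java.util.Iterator;

class FibonacciIterator implements Iterator<Integer> {
  int prevVal = 0;
  int curVal = 1;
  // There are always more Fibonacci numbers
  public boolean hasNext() {
    return true;
  }
  // Compute the next Fibonacci number
  public Integer next() {
    int answer = this.prevVal + this.curVal;
    this.prevVal = this.curVal;
    this.curVal = answer;
    return answer;
  }
}

// Hypothetical demo class
class FibonacciDemo {
  // Collects the first n values the iterator produces.
  // The count, not hasNext, is what ends this loop.
  static ArrayList<Integer> firstN(int n) {
    Iterator<Integer> fibs = new FibonacciIterator();
    ArrayList<Integer> result = new ArrayList<Integer>();
    int produced = 0;
    while (produced < n && fibs.hasNext()) {
      result.add(fibs.next());
      produced = produced + 1;
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(firstN(5)); // prints [1, 2, 3, 5, 8]
  }
}
```

Given the starting values 0 and 1, the first value this particular iterator returns is their sum, 1, followed by 2, 3, 5 and so on. Because the values are stored as int, they will eventually overflow if the iterator is run long enough.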

25.2.5 Higher-order Iterators

Much as we can have higher-order functions (as implemented in Java by higher-order function objects), we can have higher-order iterators, that use the sequence of values from one iterator and produce a different sequence of values. For example, we can define an iterator to take every other value from an iterator:

// Represents the subsequence of the first, third, fifth, etc. items from a given sequence
class EveryOtherIter<T> implements Iterator<T> {
  Iterator<T> source;
  EveryOtherIter(Iterator<T> source) {
    this.source = source;
  }
  public boolean hasNext() {
    // this sequence has a next item if the source does
    return this.source.hasNext();
  }
  public T next() {
    T answer = this.source.next(); // gets the answer, and advances the source
    // We need to have the source "skip" the next value
    if (this.source.hasNext()) {
      this.source.next(); // get the next value and ignore it
    }
    return answer;
  }
  public void remove() {
    // We can remove an item if our source can remove the item
    // CAUTION: the source removes the last value *it* returned, so if next()
    // skipped a value, the skipped value is the one that gets removed
    this.source.remove(); // so just delegate to the source
  }
}
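
A short usage sketch shows the effect, and also that higher-order iterators compose: wrapping an EveryOtherIter in another EveryOtherIter yields every fourth item. EveryOtherDemo and its collect helper are made-up names, and remove() is left out here for brevity (assuming Java 8 or later, where Iterator provides a default).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;

class EveryOtherIter<T> implements Iterator<T> {
  Iterator<T> source;
  EveryOtherIter(Iterator<T> source) {
    this.source = source;
  }
  public boolean hasNext() {
    return this.source.hasNext();
  }
  public T next() {
    T answer = this.source.next();
    // Skip the following value, if there is one
    if (this.source.hasNext()) {
      this.source.next();
    }
    return answer;
  }
}

// Hypothetical demo class
class EveryOtherDemo {
  // Drains any iterator into a list, following the hasNext/next protocol
  static <T> ArrayList<T> collect(Iterator<T> iter) {
    ArrayList<T> result = new ArrayList<T>();
    while (iter.hasNext()) {
      result.add(iter.next());
    }
    return result;
  }

  public static void main(String[] args) {
    ArrayList<Integer> nums =
        new ArrayList<Integer>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8));
    // prints [1, 3, 5, 7]
    System.out.println(collect(new EveryOtherIter<Integer>(nums.iterator())));
    // prints [1, 5]
    System.out.println(collect(
        new EveryOtherIter<Integer>(new EveryOtherIter<Integer>(nums.iterator()))));
  }
}
```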

We can also define an iterator that takes only the first \(n\) items from another iterator:

Do Now!
Do this.

class TakeN<T> implements Iterator<T> {
  Iterator<T> source;
  int howMany;
  int countSoFar;
  TakeN(Iterator<T> source, int n) {
    this.source = source;
    this.howMany = n;
    this.countSoFar = 0;
  }
  public boolean hasNext() {
    ...
  }
  public T next() {
    ...
  }
  public void remove() {
    // We can remove an item if our source can remove the item
    this.source.remove(); // so just delegate to the source
  }
}

When does this TakeN iterator have a next item? Only if we have taken fewer than \(n\) items, and the source iterator has a next item:

// In TakeN:
public boolean hasNext() {
  return (this.countSoFar < this.howMany) && this.source.hasNext();
}

To get the next item, we delegate to the source, but we also must increment the count of items returned so far:

// In TakeN:
public T next() {
  this.countSoFar = this.countSoFar + 1;
  return this.source.next();
}
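
TakeN is especially handy for taming iterators that never end. The sketch below assembles the class and applies it to a made-up Naturals iterator that counts 0, 1, 2, ... forever; Naturals, TakeNDemo and collect are illustrative names. As in the other sketches, remove() is omitted on the assumption of Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;

class TakeN<T> implements Iterator<T> {
  Iterator<T> source;
  int howMany;
  int countSoFar;
  TakeN(Iterator<T> source, int n) {
    this.source = source;
    this.howMany = n;
    this.countSoFar = 0;
  }
  public boolean hasNext() {
    // Checking the count first means we never even ask the source
    // once we have taken enough items
    return (this.countSoFar < this.howMany) && this.source.hasNext();
  }
  public T next() {
    this.countSoFar = this.countSoFar + 1;
    return this.source.next();
  }
}

// Hypothetical infinite iterator: 0, 1, 2, 3, ...
class Naturals implements Iterator<Integer> {
  int nextVal = 0;
  public boolean hasNext() {
    return true;
  }
  public Integer next() {
    int answer = this.nextVal;
    this.nextVal = this.nextVal + 1;
    return answer;
  }
}

// Hypothetical demo class
class TakeNDemo {
  // Drains an iterator into a list; only safe for finite iterators
  static <T> ArrayList<T> collect(Iterator<T> iter) {
    ArrayList<T> result = new ArrayList<T>();
    while (iter.hasNext()) {
      result.add(iter.next());
    }
    return result;
  }

  public static void main(String[] args) {
    // An infinite source becomes finite once wrapped
    System.out.println(collect(new TakeN<Integer>(new Naturals(), 3))); // prints [0, 1, 2]
    // Asking for more items than the source has simply stops early
    ArrayList<String> two = new ArrayList<String>(Arrays.asList("a", "b"));
    System.out.println(collect(new TakeN<String>(two.iterator(), 5))); // prints [a, b]
  }
}
```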

Exercise
Define a higher-order iterator that takes two iterators and alternates items from each of them.

25.2.6 Iterators over tree-shaped data

We can even define iterators over tree-shaped data. Let’s consider binary trees. There are many plausible orders for traversing a tree. For the following example tree (with data at the nodes, and nothing at the leaves):

      A
     / \
    /   \
   B     C
  / \   / \
 D   E F   G
/ \ /\/\  / \

we have (at least) the following standard orders:

A breadth-first traversal, which walks through each level of the tree from top to bottom, left to right:
A, B, C, D, E, F, G
A post-order traversal, which recursively produces all the children of a node before producing the node itself:
D, E, B, F, G, C, A
An in-order traversal, which recursively produces the left subtree of a node, then the node, then recursively produces the right subtree:
D, B, E, A, F, C, G
A pre-order traversal (or a depth-first traversal), which produces the node, then recursively produces the left subtree of the node, then the right subtree:
A, B, D, E, C, F, G

Every one of these can be implemented by a sufficiently clever iterator. Let’s try the breadth-first traversal first. The key challenge in implementing any of these is determining what state information is needed. Unlike with lists, once we process a node we have two nodes to process afterward, and we cannot get from one to the other. So unlike list iterators, which can get away with storing just a single index or a single IList reference, we’ll have to store a whole list of references of items that we have yet to process. This is usually referred to as making a worklist, and it’s a very common algorithmic technique.

So, suppose we are given a reference to the node A above. We know that we have a next item (because we’re not at a leaf), so we produce A as our next. What should go on the worklist? We need to process B and C (in that order), so we add them to our worklist. On the subsequent call to next, we need to process B, so we produce B, and need to process D and E. We still have to process C before we get to D or E (because we are proceeding in breadth-first order), so we add D and E to the back of our worklist. Our next item to process is C, which is at the front of our worklist. It looks like we need to be able to add items to the end of our list, and remove items from the front of our list. Fortunately, we have a data structure that’s perfectly capable of such operations: a Deque! We’ll use the deque to store a list of binary-tree nodes, and process them one at a time. Our implementation of a breadth-first traversal will look like this:

class BreadthFirstIterator<T> implements Iterator<T> {
  Deque<IBinaryTree<T>> worklist;
  BreadthFirstIterator(IBinaryTree<T> source) {
    this.worklist = new Deque<IBinaryTree<T>>();
    this.addIfNotLeaf(source);
  }
  // EFFECT: only adds the given binary-tree if it's not a leaf
  void addIfNotLeaf(IBinaryTree<T> bt) {
    if (bt.isNode()) {
      this.worklist.addAtTail(bt);
    }
  }
  public boolean hasNext() {
    // we have a next item if the worklist isn't empty
    return this.worklist.size() > 0;
  }
  public T next() {
    // Get (and remove) the first item on the worklist --
    // and we know it must be a BTNode
    BTNode<T> node = this.worklist.removeAtHead().asNode();
    // Add the children of the node to the tail of the list
    this.addIfNotLeaf(node.left);
    this.addIfNotLeaf(node.right);
    // return the answer
    return node.data;
  }
  public void remove() {
    throw new UnsupportedOperationException("Don't do this!");
  }
}

We are using our Deque as a queue, where items are added at the end of the queue and removed from the front. (Think of a queue as standing in line at the supermarket: people queue up at the end of the line, and exit from the front of the line.)

Do Now!
Try implementing a PreOrderIterator for a tree. The code is very similar to BreadthFirstIterator.

Following similar reasoning as above, suppose we are given a reference to the node A. We’ll produce A as the first item, and then we need to process B and C. Next we’ll produce B, and then need to process D and E...but we must process them before we get back to C. So instead of adding the items to the tail of our Deque, we’ll add them to the front:

class PreOrderIterator<T> implements Iterator<T> {
  Deque<IBinaryTree<T>> worklist;
  PreOrderIterator(IBinaryTree<T> source) {
    this.worklist = new Deque<IBinaryTree<T>>();
    this.addIfNotLeaf(source);
  }
  // EFFECT: only adds the given binary-tree if it's not a leaf
  void addIfNotLeaf(IBinaryTree<T> bt) {
    if (bt.isNode()) {
      this.worklist.addAtHead(bt); // DIFFERENT FROM ABOVE
    }
  }
  public boolean hasNext() {
    // we have a next item if the worklist isn't empty
    return this.worklist.size() > 0;
  }
  public T next() {
    // Get (and remove) the first item on the worklist --
    // and we know it must be a BTNode
    BTNode<T> node = this.worklist.removeAtHead().asNode();
    // Add the children of the node to the head of the list
    this.addIfNotLeaf(node.right); // SWAPPED
    this.addIfNotLeaf(node.left);  // FROM ABOVE
    // return the answer
    return node.data;
  }
  public void remove() {
    throw new UnsupportedOperationException("Don't do this!");
  }
}

We are now using our Deque as a stack, where items are pushed onto the front of the stack and also removed from the front. (Think of a stack as a pile of dishes: they are piled on top of each other, and removed from the top; the bottommost dish was the first one to be added, and the last one to be removed.) We have to swap the order of adding node.right and node.left because we need to preserve their order when we finally do remove them.
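
To try both traversals on the example tree, here is a self-contained sketch. It makes several assumptions: java.util.ArrayDeque stands in for the course’s Deque (addLast, addFirst and removeFirst play the roles of addAtTail, addAtHead and removeAtHead), the tree classes Leaf and BTNode with isNode/asNode are minimal guesses at the data definition, and TreeIterDemo is a made-up name. Both iterators omit remove(), assuming Java 8 or later.

```java
import java.util.ArrayDeque;
import java.util.Iterator;

interface IBinaryTree<T> {
  boolean isNode();
  BTNode<T> asNode();
}

class Leaf<T> implements IBinaryTree<T> {
  public boolean isNode() { return false; }
  public BTNode<T> asNode() { throw new ClassCastException("A leaf is not a node"); }
}

class BTNode<T> implements IBinaryTree<T> {
  T data;
  IBinaryTree<T> left;
  IBinaryTree<T> right;
  BTNode(T data, IBinaryTree<T> left, IBinaryTree<T> right) {
    this.data = data;
    this.left = left;
    this.right = right;
  }
  public boolean isNode() { return true; }
  public BTNode<T> asNode() { return this; }
}

// Worklist used as a queue: add at the tail, remove from the head
class BreadthFirstIterator<T> implements Iterator<T> {
  ArrayDeque<IBinaryTree<T>> worklist = new ArrayDeque<IBinaryTree<T>>();
  BreadthFirstIterator(IBinaryTree<T> source) { this.addIfNotLeaf(source); }
  void addIfNotLeaf(IBinaryTree<T> bt) {
    if (bt.isNode()) { this.worklist.addLast(bt); }
  }
  public boolean hasNext() { return !this.worklist.isEmpty(); }
  public T next() {
    BTNode<T> node = this.worklist.removeFirst().asNode();
    this.addIfNotLeaf(node.left);
    this.addIfNotLeaf(node.right);
    return node.data;
  }
}

// Worklist used as a stack: add at the head, remove from the head
class PreOrderIterator<T> implements Iterator<T> {
  ArrayDeque<IBinaryTree<T>> worklist = new ArrayDeque<IBinaryTree<T>>();
  PreOrderIterator(IBinaryTree<T> source) { this.addIfNotLeaf(source); }
  void addIfNotLeaf(IBinaryTree<T> bt) {
    if (bt.isNode()) { this.worklist.addFirst(bt); }
  }
  public boolean hasNext() { return !this.worklist.isEmpty(); }
  public T next() {
    BTNode<T> node = this.worklist.removeFirst().asNode();
    this.addIfNotLeaf(node.right); // pushed first, so it comes off second
    this.addIfNotLeaf(node.left);  // pushed last, so it comes off first
    return node.data;
  }
}

// Hypothetical demo class
class TreeIterDemo {
  // A node whose two children are both leaves
  static BTNode<String> bottom(String data) {
    return new BTNode<String>(data, new Leaf<String>(), new Leaf<String>());
  }
  // The example tree: A at the root, B and C below it, then D, E, F, G
  static IBinaryTree<String> exampleTree() {
    return new BTNode<String>("A",
        new BTNode<String>("B", bottom("D"), bottom("E")),
        new BTNode<String>("C", bottom("F"), bottom("G")));
  }
  // Concatenates everything the iterator produces
  static String drain(Iterator<String> iter) {
    String result = "";
    while (iter.hasNext()) { result = result + iter.next(); }
    return result;
  }
  public static void main(String[] args) {
    System.out.println(drain(new BreadthFirstIterator<String>(exampleTree()))); // prints ABCDEFG
    System.out.println(drain(new PreOrderIterator<String>(exampleTree())));     // prints ABDECFG
  }
}
```

The two classes differ only in which end of the worklist receives new items and in the order the children are added, which is exactly the queue-versus-stack distinction.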

Exercise
Try implementing post-order and in-order traversals as iterators. They are somewhat subtler than the two we have done so far; in particular, figuring out what to add to the worklist is tricky.
