Lecture 25: Iterator and Iterable
How does a for-each loop actually work?
We’ve discussed how the three loop forms in Java work to repeatedly execute some computation
— either once for each item in an ArrayList, once for every value of an integer variable
as it counts, or simply indefinitely many times while some condition remains true.
But one of these loops should seem a bit oddly specific. While-loops execute as long as some boolean condition—any
boolean condition we want—remains true. Counted-for loops count, from whatever starting value we choose to whatever
ending value we choose by whatever interval we choose. But for-each loops seem to only work on ArrayLists. What makes
ArrayList so special that it deserves its own special syntax in the language? No other class we have seen
gets such special treatment!
In fact, ArrayList is not special, and does not get its own syntax. Instead, it is simply one
particular example of a far more general notion, one that we can take advantage of for our own classes, too.
25.1 How do for-each loops actually work?
A for-each loop looks like this:
for (T t : tList) { |
...body... |
} |
Recall that when we introduced for-each loops, they were a replacement for a recursive
implementation of map over ArrayLists. At the time, their main appeal
was the ability to iterate over each item of the list, without having to mess about with
manually counting indices. But suppose Java didn’t include for-each loops, and we wanted to reimplement their behavior ourselves
using a more general-purpose loop, while still avoiding indices as much as possible. How could that
be done?
25.1.1 Introducing Iterators
Clearly we need some form of loop. We’re assuming for-each loops do not exist. And we want to
avoid counting indices. That leaves just while loops as our only option. Recall the skeleton of
a while loop:
...setup... |
while (something is true) { |
...body... |
} |
...use results... |
While loops are good for looping as long as necessary, while there’s more to be done.
For-each loops essentially keep looping while there are more items to process. So perhaps the essence of
our for-each loop’s behavior, when expressed as a while loop, might look like this:
while (hasMoreElements(tList)) { |
T t = nextItemOf(tList); |
...body... |
} |
Except that can’t really work, because the hypothetical hasMoreElements and nextItemOf functions
would have to keep track of additional state between calls — or else they’d just give the same answers
every time. But these two functions seem like a clean, simple way to describe the process of iterating
over data: while there’s more data, get the next item and process it, then repeat.
So: we want to preserve these two functions, but we need to keep track of state to make them work properly.
Accordingly, we need to create a new object, whose sole responsibility is to help us iterate over
tList. Moreover, when we phrase it this way, this behavior doesn’t sound very specific
to ArrayLists; we might be able to iterate over almost anything!
What kind of Java abstraction is most appropriate here?
The while loop sketch above will work with any object that exposes these two functions as methods. This is
a promise to provide a certain set of behaviors, so we should accordingly define a new interface.
This interface is called an Iterator, and it is provided for us by Java itself.
(According to our naming conventions, it really ought to be called IIterator, but that’s a clumsy
name. At least it does start with a capital I!) Its methods are slightly renamed from the sketch above:
interface Iterator<T> { |
boolean hasNext(); |
T next(); |
void remove(); |
} |
The job of an Iterator is to keep track of whatever state is necessary to produce
values one-at-a-time from a sequence, be it an ArrayList, an IList, a Deque,
or whatever else we choose. We’ll see momentarily how to implement this interface; first let’s see
how to use it.
We can refine our while loop sketch above as follows:
Iterator<T> listIter = new ArrayListIterator<T>(tList); |
while (listIter.hasNext()) { |
T t = listIter.next(); |
...body... |
} |
As long as the listIter claims it can produce another item, the loop will repeat and get the next
item. Notice the protocol: we always check hasNext before calling next. This ensures
that we never overshoot the end of the list.
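To see the protocol in action, here is a minimal sketch that runs the same computation twice: once with a for-each loop, and once with an explicit Iterator and a while loop. It relies on the fact that Java's built-in ArrayList can hand out an iterator through its iterator() method, which the next section discusses. The class name WhileLoopDemo and its helper methods are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class; the name and helpers are illustrative only.
class WhileLoopDemo {
  // Collects every item using a for-each loop.
  static List<String> viaForEach(ArrayList<String> items) {
    List<String> seen = new ArrayList<String>();
    for (String s : items) {
      seen.add(s);
    }
    return seen;
  }

  // Collects the same items using an explicit Iterator and a while loop.
  static List<String> viaIterator(ArrayList<String> items) {
    List<String> seen = new ArrayList<String>();
    Iterator<String> iter = items.iterator();
    while (iter.hasNext()) {   // always ask before taking
      String s = iter.next();
      seen.add(s);
    }
    return seen;
  }

  public static void main(String[] args) {
    ArrayList<String> items = new ArrayList<String>(Arrays.asList("a", "b", "c"));
    System.out.println(viaForEach(items));   // [a, b, c]
    System.out.println(viaIterator(items));  // [a, b, c]
  }
}
```

Both methods visit the same items in the same order; the while-loop version simply makes the hasNext/next calls visible.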
25.1.2 Introducing Iterables
So far, so good: we can describe the behavior of a for-each loop in terms of a while loop over an Iterator.
But how does Java know what kind of iterator to construct? After all, the ArrayListIterator in the sketch
above really is specific to ArrayLists; if we want our for-each loops to work over other kinds of things,
we need another mechanism. Let’s say that an object is iterable if there exists some Iterator implementation for it.
That sounds like another interface, which is also defined for us by Java:
interface Iterable<T> { |
Iterator<T> iterator(); |
} |
In the actual Java implementation of ArrayList, we see something like this:
class ArrayList<T> implements Iterable<T> { |
... lots of other details ... |
public Iterator<T> iterator() { |
return new ArrayListIterator<T>(this); |
} |
} |
Among many other details of the implementation, ArrayList declares that it implements the Iterable
interface, and therefore can be iterated over.
Now we can finally refine our while-loop sketch completely:
Iterator<T> iterator = anyIterable.iterator(); |
while (iterator.hasNext()) { |
T t = iterator.next(); |
...body... |
} |
This is far more general than just working over ArrayLists — any class
that implements Iterable can be used with a for-each loop. This is yet another
example of the power of good abstractions: by recognizing the common elements of iteration
and being iterable, we can create a reusable notion that works wherever we choose to
implement it.
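As a small illustration of that generality, the sketch below defines a class that has nothing to do with ArrayList and still works in a for-each loop, purely because it implements Iterable. The classes Range, RangeIterator and RangeDemo are hypothetical names invented for this example; they are not part of Java or of our earlier code.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical example: the integers from lo (inclusive) up to hi (exclusive).
class Range implements Iterable<Integer> {
  int lo;
  int hi;

  Range(int lo, int hi) {
    this.lo = lo;
    this.hi = hi;
  }

  // Each call produces a fresh iterator that starts from the beginning.
  public Iterator<Integer> iterator() {
    return new RangeIterator(this.lo, this.hi);
  }
}

class RangeIterator implements Iterator<Integer> {
  int nextVal;
  int hi;

  RangeIterator(int lo, int hi) {
    this.nextVal = lo;
    this.hi = hi;
  }

  public boolean hasNext() {
    return this.nextVal < this.hi;
  }

  public Integer next() {
    if (!this.hasNext()) {
      throw new NoSuchElementException("No more numbers in this range");
    }
    int answer = this.nextVal;
    this.nextVal = this.nextVal + 1;
    return answer;
  }

  public void remove() {
    throw new UnsupportedOperationException("Don't do this!");
  }
}

class RangeDemo {
  // Works for any Iterable<Integer>, not just Range.
  static int sum(Iterable<Integer> nums) {
    int total = 0;
    for (int n : nums) {
      total = total + n;
    }
    return total;
  }

  public static void main(String[] args) {
    System.out.println(sum(new Range(1, 5)));  // 1 + 2 + 3 + 4 = 10
  }
}
```

Note that sum accepts any Iterable<Integer>, so the same method would also accept an ArrayList<Integer>.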
25.2 Examples of Iterators for different data types
25.2.1 Iterators for ArrayLists — counting indices
Let’s see how to implement an Iterator for an ArrayList. Our initial
skeleton will simply create a class that implements the desired interface, and contains
a reference to the ArrayList to be iterated over:
class ArrayListIterator<T> implements Iterator<T> { |
ArrayList<T> items; |
int nextIdx; |
ArrayListIterator(ArrayList<T> items) { |
this.items = items; |
this.nextIdx = 0; |
} |
|
public boolean hasNext() { |
... |
} |
|
public T next() { |
... |
} |
|
public void remove() { |
throw new UnsupportedOperationException("Don't do this!"); |
} |
} |
Implement hasNext and next.
Remember the protocol: we always call hasNext before calling next, so the first
call to hasNext really means, “Does this sequence have at least one item?”, and the first call to next
means “Give me the first item”.
(This may seem a bit confusing, and it is. It is an example of the so-called “fencepost problem”: suppose
you need to put up 60 feet of fencing, with a fencepost every 10 feet. How many fenceposts are needed? Answer: seven,
because you need a fencepost before the very first section of fence:
X----------X----------X----------X----------X----------X----------X
Likewise, in our Iterator protocol, there will be one more call to hasNext than there will be to next,
because we call it before every call to next, and then once more to find out there are no more items.)
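The fencepost count can be checked directly. The following sketch wraps an iterator in a hypothetical CountingIterator (a name invented here) that records how many times each method is called, and then runs the usual while-loop protocol over six items using Java's built-in list iterator.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical wrapper that counts calls to hasNext and next.
class CountingIterator<T> implements Iterator<T> {
  Iterator<T> source;
  int hasNextCalls = 0;
  int nextCalls = 0;

  CountingIterator(Iterator<T> source) {
    this.source = source;
  }

  public boolean hasNext() {
    this.hasNextCalls = this.hasNextCalls + 1;
    return this.source.hasNext();
  }

  public T next() {
    this.nextCalls = this.nextCalls + 1;
    return this.source.next();
  }

  public void remove() {
    throw new UnsupportedOperationException("Don't do this!");
  }
}

class FencepostDemo {
  // Runs the standard protocol until the source is exhausted,
  // then returns the wrapper so its counters can be inspected.
  static <T> CountingIterator<T> drain(Iterator<T> source) {
    CountingIterator<T> counter = new CountingIterator<T>(source);
    while (counter.hasNext()) {
      counter.next();
    }
    return counter;
  }

  public static void main(String[] args) {
    List<Integer> sections = Arrays.asList(10, 20, 30, 40, 50, 60);
    CountingIterator<Integer> counter = drain(sections.iterator());
    System.out.println(counter.nextCalls);     // 6: one per item
    System.out.println(counter.hasNextCalls);  // 7: six "yes" answers and one final "no"
  }
}
```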
Accordingly, when we call hasNext, the nextIdx field will refer to the index of the next item — starting at 0 —
to be returned. Therefore, there is such an item if and only if nextIdx is a valid index of the ArrayList:
public boolean hasNext() { |
return this.nextIdx < this.items.size(); |
} |
In next, we need to get the next item, and advance nextIdx to the next index to be used:
public T next() { |
T answer = this.items.get(this.nextIdx); |
this.nextIdx = this.nextIdx + 1; |
return answer; |
} |
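Putting the pieces together, the complete class and a small usage example might look like the sketch below. The ArrayListIterator class is assembled from the fragments above; the ArrayListIteratorDemo class and its joinAll helper are hypothetical additions for demonstration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;

class ArrayListIterator<T> implements Iterator<T> {
  ArrayList<T> items;
  int nextIdx;  // index of the next item to be returned

  ArrayListIterator(ArrayList<T> items) {
    this.items = items;
    this.nextIdx = 0;
  }

  // There is a next item exactly when nextIdx is a valid index.
  public boolean hasNext() {
    return this.nextIdx < this.items.size();
  }

  // Returns the item at nextIdx, then advances past it.
  public T next() {
    T answer = this.items.get(this.nextIdx);
    this.nextIdx = this.nextIdx + 1;
    return answer;
  }

  public void remove() {
    throw new UnsupportedOperationException("Don't do this!");
  }
}

// Hypothetical demo class, for illustration only.
class ArrayListIteratorDemo {
  // Concatenates all the strings, visiting them with our own iterator.
  static String joinAll(ArrayList<String> items) {
    String result = "";
    Iterator<String> iter = new ArrayListIterator<String>(items);
    while (iter.hasNext()) {
      result = result + iter.next();
    }
    return result;
  }

  public static void main(String[] args) {
    ArrayList<String> items = new ArrayList<String>(Arrays.asList("x", "y", "z"));
    System.out.println(joinAll(items));  // xyz
  }
}
```

If a caller ignores the protocol and calls next without checking hasNext, the call to get in this version fails with an IndexOutOfBoundsException once the index runs past the end.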
25.2.2 Iterators for ILists — following links
Iterators don’t have to use indices as their state to step through the contents of an ArrayList; we can use
whatever state we want. Let’s try implementing an Iterator for our IList.
Try to implement an Iterator<T> for IList<T>. What state should you store instead of indices?
The only information in an IList is the node itself, which is either a Cons or an Empty. So
our code will begin:
class IListIterator<T> implements Iterator<T> { |
IList<T> items; |
IListIterator(IList<T> items) { |
this.items = items; |
} |
public boolean hasNext() { |
... |
} |
public T next() { |
... |
} |
public void remove() { |
throw new UnsupportedOperationException("Don't do this!"); |
} |
} |
As with the previous iterator, we have a next item when we are currently pointing at a Cons node:
public boolean hasNext() { |
return this.items.isCons(); |
} |
And likewise we can return the item in the current Cons for our next value. To update the iterator,
we advance items to refer to the current Cons’s rest:
public T next() { |
ConsList<T> itemsAsCons = this.items.asCons(); |
T answer = itemsAsCons.first; |
this.items = itemsAsCons.rest; |
return answer; |
} |
Define the isCons() and asCons() methods to complete this code.
Note: Yes, isCons and asCons aren’t great methods from an
object-oriented standpoint. We know that double-dispatch would normally be the better
design. In this case, though, double-dispatch is overkill, as we’d need two
visitor classes (one to implement isCons and one for asCons),
which seems painfully heavyweight. Besides, it is more useful to think of
Iterators as being tightly connected to the data they are iterating
over: if the data definition changes, the iterator will have to change to
match. And so, just as we allowed function objects to see fields of their
parameters despite that not being a normal part of the template, we allow
iterators to determine what type their argument might be.
We can make our ILists be Iterable, too:
interface IList<T> extends Iterable<T> { |
... everything as before ... |
} |
class ConsList<T> implements IList<T> { |
... everything as before ... |
public Iterator<T> iterator() { |
return new IListIterator<T>(this); |
} |
} |
class MtList<T> implements IList<T> { |
... everything as before ... |
public Iterator<T> iterator() { |
return new IListIterator<T>(this); |
} |
} |
And with those last few definitions, we can now use ILists in for-each loops, exactly as we could
with ArrayLists.
for (T item : myList) { |
... |
} |
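For reference, here is one possible way the pieces could fit together into a runnable whole. This is only a sketch: the bodies of isCons and asCons shown here are one plausible choice (in particular, throwing a ClassCastException from the empty list is an assumption, not something fixed by the text), and the IListDemo class is a hypothetical addition.

```java
import java.util.Iterator;

interface IList<T> extends Iterable<T> {
  boolean isCons();
  ConsList<T> asCons();
}

class MtList<T> implements IList<T> {
  public boolean isCons() {
    return false;
  }

  // Assumption: asking an empty list to be a Cons is an error.
  public ConsList<T> asCons() {
    throw new ClassCastException("An empty list is not a Cons");
  }

  public Iterator<T> iterator() {
    return new IListIterator<T>(this);
  }
}

class ConsList<T> implements IList<T> {
  T first;
  IList<T> rest;

  ConsList(T first, IList<T> rest) {
    this.first = first;
    this.rest = rest;
  }

  public boolean isCons() {
    return true;
  }

  public ConsList<T> asCons() {
    return this;
  }

  public Iterator<T> iterator() {
    return new IListIterator<T>(this);
  }
}

class IListIterator<T> implements Iterator<T> {
  IList<T> items;  // the part of the list not yet visited

  IListIterator(IList<T> items) {
    this.items = items;
  }

  public boolean hasNext() {
    return this.items.isCons();
  }

  public T next() {
    ConsList<T> itemsAsCons = this.items.asCons();
    T answer = itemsAsCons.first;
    this.items = itemsAsCons.rest;
    return answer;
  }

  public void remove() {
    throw new UnsupportedOperationException("Don't do this!");
  }
}

// Hypothetical demo class, for illustration only.
class IListDemo {
  static int sum(IList<Integer> nums) {
    int total = 0;
    for (int n : nums) {
      total = total + n;
    }
    return total;
  }

  public static void main(String[] args) {
    IList<Integer> nums = new ConsList<Integer>(1,
        new ConsList<Integer>(2, new ConsList<Integer>(3, new MtList<Integer>())));
    System.out.println(sum(nums));  // 6
  }
}
```

Notice that the iterator never modifies the list itself; it only changes which node its own items field refers to.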
25.2.3 Iteration in multiple directions
The concept of an Iterator is very flexible. Let’s work through several examples, showing how varied
they can be.
Define a DequeForwardIterator that advances through a Deque, just as we did above with an IList.
Define a DequeReverseIterator that walks backward through a Deque from the last item to the first. The code should
be very nearly identical to the forward iterator.
Some data structures can meaningfully support iteration in multiple orders. However, if we choose to make those
data structures Iterable, then we have to choose a default iteration order to be used with for-each loops,
and construct that particular iterator in the iterator() method. For Deques, we probably would choose
the forward iteration direction, as it is the “most natural”. If we want to use the reverse iterator,
we’d have to explicitly write the while-loop version ourselves:
class Deque<T> implements Iterable<T> { |
public Iterator<T> iterator() { |
return new DequeForwardIterator<T>(this.sentinel.next); |
} |
Iterator<T> reverseIterator() { |
return new DequeReverseIterator<T>(this.sentinel.prev); |
} |
} |
for (T item : myDeque) { |
... |
} |
Iterator<T> revIter = myDeque.reverseIterator(); |
while (revIter.hasNext()) { |
T item = revIter.next(); |
... |
} |
25.2.4 Iterators for Fibonacci numbers — computing items on demand
Not every iterator needs to store an actual object from which it derives its data. Iterators simply
represent sequences of values, and those values might just be computed on demand. Consider the following iterator:
class FibonacciIterator implements Iterator<Integer> { |
int prevVal = 0; |
int curVal = 1; |
public boolean hasNext() { return true; } |
public Integer next() { |
int answer = this.prevVal + this.curVal; |
this.prevVal = this.curVal; |
this.curVal = answer; |
return answer; |
} |
public void remove() { |
throw new UnsupportedOperationException("Don't do this!"); |
} |
} |
This iterator can produce an infinitely long stream of Fibonacci numbers, without requiring
an explicit list to store them all in: it just stores the two most recent values, and computes the
next one as required.
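Because hasNext always returns true, a plain while loop over this iterator would never finish, so the caller has to decide when to stop. The sketch below uses a counted loop to take just the first few values. The FibonacciDemo class and its firstFew helper are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class FibonacciIterator implements Iterator<Integer> {
  int prevVal = 0;
  int curVal = 1;

  // The sequence never runs out.
  public boolean hasNext() {
    return true;
  }

  public Integer next() {
    int answer = this.prevVal + this.curVal;
    this.prevVal = this.curVal;
    this.curVal = answer;
    return answer;
  }

  public void remove() {
    throw new UnsupportedOperationException("Don't do this!");
  }
}

// Hypothetical demo class, for illustration only.
class FibonacciDemo {
  // Pulls exactly n values from a fresh FibonacciIterator.
  static List<Integer> firstFew(int n) {
    Iterator<Integer> fibs = new FibonacciIterator();
    List<Integer> result = new ArrayList<Integer>();
    for (int i = 0; i < n; i = i + 1) {
      result.add(fibs.next());
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(firstFew(6));  // [1, 2, 3, 5, 8, 13]
  }
}
```

With these starting values the first item produced is 0 + 1 = 1, followed by 2, 3, 5 and so on. Since the fields are ints, the values will eventually overflow once they exceed Integer.MAX_VALUE.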
25.2.5 Higher-order Iterators
Much as we can have higher-order functions (as implemented in Java by higher-order function objects),
we can have higher-order iterators, that use the sequence of values from one iterator and produce a different
sequence of values. For example, we can define an iterator to take every other value from an iterator:
class EveryOtherIter<T> implements Iterator<T> { |
Iterator<T> source; |
EveryOtherIter(Iterator<T> source) { |
this.source = source; |
} |
public boolean hasNext() { |
return this.source.hasNext(); |
} |
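public T next() { |
T answer = this.source.next(); |
// Skip the following item, if there is one |
if (this.source.hasNext()) { |
this.source.next(); |
} |
return answer; |
} |
public void remove() { |
// Caution: if next() skipped an item, the source removes that skipped item |
this.source.remove(); |
} |
} |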
We can also define an iterator that takes only the first \(n\) items from another iterator:
class TakeN<T> implements Iterator<T> { |
Iterator<T> source; |
int howMany; |
int countSoFar; |
TakeN(Iterator<T> source, int n) { |
this.source = source; |
this.howMany = n; |
this.countSoFar = 0; |
} |
public boolean hasNext() { |
... |
} |
public T next() { |
... |
} |
public void remove() { |
this.source.remove(); |
} |
} |
When does this TakeN iterator have a next item? Only if we have taken fewer than \(n\)
items, and the source iterator has a next item:
public boolean hasNext() { |
return (this.countSoFar < this.howMany) && this.source.hasNext(); |
} |
To get the next item, we delegate to the source, but we also must increment the count of items
returned so far:
public T next() { |
this.countSoFar = this.countSoFar + 1; |
return this.source.next(); |
} |
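Higher-order iterators compose: one can wrap another. The sketch below stacks TakeN on top of EveryOtherIter on top of an infinite source, which yields a short, finite sequence. Naturals (an endless 1, 2, 3, ... source) and HigherOrderDemo are hypothetical classes invented for this example; the other two classes follow the definitions above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical infinite source: 1, 2, 3, ...
class Naturals implements Iterator<Integer> {
  int nextVal = 1;

  public boolean hasNext() {
    return true;
  }

  public Integer next() {
    int answer = this.nextVal;
    this.nextVal = this.nextVal + 1;
    return answer;
  }

  public void remove() {
    throw new UnsupportedOperationException("Don't do this!");
  }
}

class EveryOtherIter<T> implements Iterator<T> {
  Iterator<T> source;

  EveryOtherIter(Iterator<T> source) {
    this.source = source;
  }

  public boolean hasNext() {
    return this.source.hasNext();
  }

  public T next() {
    T answer = this.source.next();
    // Skip the following item, if there is one
    if (this.source.hasNext()) {
      this.source.next();
    }
    return answer;
  }

  public void remove() {
    this.source.remove();
  }
}

class TakeN<T> implements Iterator<T> {
  Iterator<T> source;
  int howMany;
  int countSoFar;

  TakeN(Iterator<T> source, int n) {
    this.source = source;
    this.howMany = n;
    this.countSoFar = 0;
  }

  public boolean hasNext() {
    return (this.countSoFar < this.howMany) && this.source.hasNext();
  }

  public T next() {
    this.countSoFar = this.countSoFar + 1;
    return this.source.next();
  }

  public void remove() {
    this.source.remove();
  }
}

// Hypothetical demo class, for illustration only.
class HigherOrderDemo {
  // Drains a (finite!) iterator into a list.
  static <T> List<T> collect(Iterator<T> iter) {
    List<T> result = new ArrayList<T>();
    while (iter.hasNext()) {
      result.add(iter.next());
    }
    return result;
  }

  public static void main(String[] args) {
    Iterator<Integer> odds = new EveryOtherIter<Integer>(new Naturals());
    System.out.println(collect(new TakeN<Integer>(odds, 4)));  // [1, 3, 5, 7]
  }
}
```

The order of wrapping matters: taking four items and then keeping every other one would produce only two values instead of four.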
Define a higher-order iterator that takes two iterators and alternates items from each of them.
25.2.6 Iterators over tree-shaped data
We can even define iterators over tree-shaped data. Let’s consider binary trees. There are many plausible orders for
traversing a tree. For the following example tree (with data at the nodes, and nothing at the leaves):
       A
      / \
     /   \
    B     C
   / \   / \
  D   E F   G
 / \ /\ /\ / \
we have (at least) the following standard orders:
A breadth-first traversal, which walks through
each level of the tree from top to bottom, left to right: A, B, C, D, E, F, G
A post-order traversal, which recursively produces all the children of a node before
producing the node itself: D, E, B, F, G, C, A
An in-order traversal, which recursively produces the left subtree of a node,
then the node, then recursively produces the right subtree: D, B, E, A, F, C, G
A pre-order traversal (or a depth-first traversal), which produces the node, then recursively produces the
left subtree of the node, then the right subtree: A, B, D, E, C, F, G
Every one of these can be implemented by a sufficiently clever iterator. Let’s try the breadth-first traversal first. The key challenge in
implementing any of these is determining what state information is needed. Unlike with lists, once we process a node we have
two nodes to process afterward, and we cannot get from one to the other. So unlike list iterators, which can get away with
storing just a single index or a single IList reference, we’ll have to store a whole list of references of items
that we have yet to process. This is usually referred to as making a worklist, and it’s a very common algorithmic technique.
So, suppose we are given a reference to the node A above. We know that we have a next item (because we’re not at a leaf),
so we produce A as our next. What should go on the worklist? We need to process B and C (in that order), so we add them
to our worklist. On the subsequent call to next, we need to process B, so we produce B, and need to process D and E.
We still have to process C before we get to D or E (because we are proceeding in breadth-first order), so we add
D and E to the back of our worklist. Our next item to process is C, which is at the front of our
worklist. It looks like we need to be able to add items to the end of our list, and remove items from the front of our list.
Fortunately, we have a data structure that’s perfectly capable of such operations: a Deque! We’ll use the deque to
store a list of binary-tree nodes, and process them one at a time. Our implementation of a breadth-first traversal
will look like this:
class BreadthFirstIterator<T> implements Iterator<T> { |
Deque<IBinaryTree<T>> worklist; |
BreadthFirstIterator(IBinaryTree<T> source) { |
this.worklist = new Deque<IBinaryTree<T>>(); |
this.addIfNotLeaf(source); |
} |
void addIfNotLeaf(IBinaryTree<T> bt) { |
if (bt.isNode()) { |
this.worklist.addAtTail(bt); |
} |
} |
public boolean hasNext() { |
return this.worklist.size() > 0; |
} |
public T next() { |
BTNode<T> node = this.worklist.removeAtHead().asNode(); |
this.addIfNotLeaf(node.left); |
this.addIfNotLeaf(node.right); |
return node.data; |
} |
public void remove() { |
throw new UnsupportedOperationException("Don't do this!"); |
} |
} |
We are using our Deque as a queue, where items are added at the end of the queue and removed from the front.
(Think of a queue as standing in line at the supermarket: people queue up at the end of the line, and exit from the front of the line.)
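To try the breadth-first iterator without our own Deque class, the sketch below substitutes Java's built-in java.util.ArrayDeque, using addLast and removeFirst in place of addAtTail and removeAtHead. That substitution is an assumption made for the sake of a self-contained example, as are the definitions of IBinaryTree, BTNode and BTLeaf and the BreadthFirstDemo class.

```java
import java.util.ArrayDeque;
import java.util.Iterator;

// Assumed tree definitions: data at the nodes, nothing at the leaves.
interface IBinaryTree<T> {
  boolean isNode();
  BTNode<T> asNode();
}

class BTLeaf<T> implements IBinaryTree<T> {
  public boolean isNode() { return false; }
  public BTNode<T> asNode() { throw new ClassCastException("A leaf is not a node"); }
}

class BTNode<T> implements IBinaryTree<T> {
  T data;
  IBinaryTree<T> left;
  IBinaryTree<T> right;

  BTNode(T data, IBinaryTree<T> left, IBinaryTree<T> right) {
    this.data = data;
    this.left = left;
    this.right = right;
  }

  public boolean isNode() { return true; }
  public BTNode<T> asNode() { return this; }
}

class BreadthFirstIterator<T> implements Iterator<T> {
  // Stand-in for our Deque: nodes waiting to be processed, oldest first.
  ArrayDeque<IBinaryTree<T>> worklist = new ArrayDeque<IBinaryTree<T>>();

  BreadthFirstIterator(IBinaryTree<T> source) {
    this.addIfNotLeaf(source);
  }

  void addIfNotLeaf(IBinaryTree<T> bt) {
    if (bt.isNode()) {
      this.worklist.addLast(bt);  // join the back of the queue
    }
  }

  public boolean hasNext() {
    return !this.worklist.isEmpty();
  }

  public T next() {
    BTNode<T> node = this.worklist.removeFirst().asNode();  // leave from the front
    this.addIfNotLeaf(node.left);
    this.addIfNotLeaf(node.right);
    return node.data;
  }

  public void remove() {
    throw new UnsupportedOperationException("Don't do this!");
  }
}

// Hypothetical demo class, for illustration only.
class BreadthFirstDemo {
  static BTNode<String> node(String data, IBinaryTree<String> l, IBinaryTree<String> r) {
    return new BTNode<String>(data, l, r);
  }

  static BTNode<String> bottom(String data) {
    return node(data, new BTLeaf<String>(), new BTLeaf<String>());
  }

  // The example tree with A at the root.
  static IBinaryTree<String> exampleTree() {
    return node("A",
        node("B", bottom("D"), bottom("E")),
        node("C", bottom("F"), bottom("G")));
  }

  static String traverse(Iterator<String> iter) {
    String result = "";
    while (iter.hasNext()) {
      result = result + iter.next();
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(traverse(new BreadthFirstIterator<String>(exampleTree())));  // ABCDEFG
  }
}
```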
Try implementing a PreOrderIterator for a tree. The code is very similar to BreadthFirstIterator.
Following similar reasoning as above, suppose we are given a reference to the node A. We’ll produce A as the first item,
and then we need to process B and C. Next we’ll produce B, and then need to process D and E...but
we must process them before we get back to C. So instead of adding the items to the tail of our Deque,
we’ll add them to the front:
class PreOrderIterator<T> implements Iterator<T> { |
Deque<IBinaryTree<T>> worklist; |
PreOrderIterator(IBinaryTree<T> source) { |
this.worklist = new Deque<IBinaryTree<T>>(); |
this.addIfNotLeaf(source); |
} |
void addIfNotLeaf(IBinaryTree<T> bt) { |
if (bt.isNode()) { |
this.worklist.addAtHead(bt); |
} |
} |
public boolean hasNext() { |
return this.worklist.size() > 0; |
} |
public T next() { |
BTNode<T> node = this.worklist.removeAtHead().asNode(); |
// Add right first, so that left ends up in front and is removed first |
this.addIfNotLeaf(node.right); |
this.addIfNotLeaf(node.left); |
return node.data; |
} |
public void remove() { |
throw new UnsupportedOperationException("Don't do this!"); |
} |
} |
We are now using our Deque as a stack, where items are pushed onto the front of the stack and also removed from the front.
(Think of a stack as a pile of dishes: they are piled on top of each other, and removed from the top; the bottommost dish was the
first one to be added, and the last one to be removed.) We have to swap the order of adding node.right and node.left
because we need to preserve their order when we finally do remove them.
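The same experiment works for the pre-order iterator. As a sketch, the code below again substitutes Java's built-in java.util.ArrayDeque for our Deque, this time using addFirst and removeFirst so that it behaves as a stack. The tree classes and the PreOrderDemo class are assumptions made so the example is self-contained.

```java
import java.util.ArrayDeque;
import java.util.Iterator;

// Assumed tree definitions: data at the nodes, nothing at the leaves.
interface IBinaryTree<T> {
  boolean isNode();
  BTNode<T> asNode();
}

class BTLeaf<T> implements IBinaryTree<T> {
  public boolean isNode() { return false; }
  public BTNode<T> asNode() { throw new ClassCastException("A leaf is not a node"); }
}

class BTNode<T> implements IBinaryTree<T> {
  T data;
  IBinaryTree<T> left;
  IBinaryTree<T> right;

  BTNode(T data, IBinaryTree<T> left, IBinaryTree<T> right) {
    this.data = data;
    this.left = left;
    this.right = right;
  }

  public boolean isNode() { return true; }
  public BTNode<T> asNode() { return this; }
}

class PreOrderIterator<T> implements Iterator<T> {
  // Stand-in for our Deque: nodes waiting to be processed, newest first.
  ArrayDeque<IBinaryTree<T>> worklist = new ArrayDeque<IBinaryTree<T>>();

  PreOrderIterator(IBinaryTree<T> source) {
    this.addIfNotLeaf(source);
  }

  void addIfNotLeaf(IBinaryTree<T> bt) {
    if (bt.isNode()) {
      this.worklist.addFirst(bt);  // push onto the top of the stack
    }
  }

  public boolean hasNext() {
    return !this.worklist.isEmpty();
  }

  public T next() {
    BTNode<T> node = this.worklist.removeFirst().asNode();  // pop from the top
    // Push right first, so that left ends up on top and is popped first
    this.addIfNotLeaf(node.right);
    this.addIfNotLeaf(node.left);
    return node.data;
  }

  public void remove() {
    throw new UnsupportedOperationException("Don't do this!");
  }
}

// Hypothetical demo class, for illustration only.
class PreOrderDemo {
  static BTNode<String> node(String data, IBinaryTree<String> l, IBinaryTree<String> r) {
    return new BTNode<String>(data, l, r);
  }

  static BTNode<String> bottom(String data) {
    return node(data, new BTLeaf<String>(), new BTLeaf<String>());
  }

  // The example tree with A at the root.
  static IBinaryTree<String> exampleTree() {
    return node("A",
        node("B", bottom("D"), bottom("E")),
        node("C", bottom("F"), bottom("G")));
  }

  static String traverse(Iterator<String> iter) {
    String result = "";
    while (iter.hasNext()) {
      result = result + iter.next();
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(traverse(new PreOrderIterator<String>(exampleTree())));  // ABDECFG
  }
}
```

The only differences from the breadth-first version are which end of the worklist receives new nodes and the order in which the two children are added.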
Try implementing post-order and in-order traversals as iterators. They are somewhat subtler than the two we have done so far;
in particular, figuring out what to add to the worklist is tricky.