Introduction Interfaces
The Collection Interface
The Set Interface
The List Interface
The Map Interface
Object Ordering
The SortedSet Interface
The SortedMap Interface
Implementations
General Purpose Implementations
Wrapper Implementations
Convenience Implementations
Algorithms Custom Implementations Interoperability
Compatibility
API Design
Solving Common Collections Problems
Lesson: Introduction
What Is a Collection?
A collection (sometimes called a container) is simply an object that groups multiple elements into a single unit.
Collections are used to store, retrieve and manipulate data, and to transmit data from one method to another.
Collections typically represent data items that form a natural group, like a poker hand (a collection of cards), a mail folder (a collection of letters), or a telephone directory (a collection of name-to-phone-number mappings).
If you've used Java -- or just about any other programming language -- you're already familiar with collections. Collection implementations in earlier versions of Java included Vector , Hashtable , and array. While earlier versions of Java contained collection implementations, they did not contain a collections framework.
What Is a Collections Framework?
A collections framework is a unified architecture for representing and manipulating collections.
All collections frameworks contain three things:
o Interfaces: abstract data types representing collections. Interfaces allow collections to be manipulated independently of the details of their representation. In object-oriented languages like Java, these interfaces generally form a hierarchy.
o Implementations: concrete implementations of the collection interfaces. In essence, these are reusable data structures.
o Algorithms: methods that perform useful computations, like searching and sorting, on objects that implement collection interfaces. These algorithms are said to be polymorphic because the same method can be used on many different implementations of the appropriate collections interface. In essence, algorithms are reusable functionality.
The best known examples of collections frameworks are the C++ Standard Template Library (STL), and Smalltalk's collection classes.
What are the Benefits of a Collections Framework?
• It reduces programming effort: By providing useful data structures and algorithms, a collections framework frees you to concentrate on the important parts of your program, rather than the low-level plumbing required to make it work. By facilitating interoperability among unrelated APIs (as described below), the collections framework frees you from writing oodles of adapter objects or conversion code to connect APIs.
• It increases program speed and quality: The collections framework does this primarily by providing high-performance, high-quality implementations of useful data structures and algorithms. Also, because the various implementations of each interface are interchangeable, programs can be easily tuned by switching collection implementations. Finally, because you're freed from the drudgery of writing your own data structures, you'll have more time to devote to improving the quality and performance of the rest of the program.
• It allows interoperability among unrelated APIs: The collections interfaces will become the "lingua franca" by which APIs pass collections back and forth. If my network administration API furnishes a Collection of node names, and your GUI toolkit expects a Collection of column headings, our APIs will interoperate seamlessly even though they were written independently.
• It reduces the effort to learn and use new APIs: Many APIs naturally take collections on input and output. In the past, each such API had a little "sub-API" devoted to manipulating its collections. There was little consistency among these ad-hoc collections sub-APIs, so you had to learn each one from scratch and it was easy to make mistakes when using them. With the advent of standard collections interfaces, the problem goes away.
• It reduces effort to design new APIs: This is the flip-side of the previous advantage: designers and implementers don't have to reinvent the wheel each time they create an API that relies on collections. They just use the standard collections interfaces.
• It fosters software reuse: New data structures that conform to the standard collection interfaces are by nature reusable. The same goes for new algorithms that operate on objects that implement these interfaces.
Are there any Drawbacks?
Historically, collections frameworks have been quite complex, which gave them a reputation for having a steep learning curve. We believe that Java's new collections framework breaks with this tradition, as you will learn for yourself in the following lessons.
Lesson: Interfaces
The core collection interfaces are the interfaces used to manipulate collections, and to pass them from one method to another.
The basic purpose of these interfaces is to allow collections to be manipulated independently of the details of their representation.
The core collection interfaces are the heart and soul of the collections framework.
When you understand how to use these interfaces, you know most of what there is to know about the framework. The core collections interfaces are shown below:
The core collection interfaces form a hierarchy: a Set is a special kind of Collection, and a SortedSet is a special kind of Set, and so forth.
Note also that the hierarchy consists of two distinct trees: a Map is not a true Collection.
To keep the number of core collection interfaces manageable, the JDK doesn't provide separate interfaces for each variant of each collection type. (Among the possible variants are immutable, fixed-size, and append-only.) Instead, the modification operations in each interface are designated optional: a given implementation may not support some of these operations. If an unsupported operation is invoked, a collection throws an UnsupportedOperationException . Implementations are responsible for documenting which of the optional operations they support. All of the JDK's general purpose implementations support all of the optional operations.
The four sections that follow teach you how to use each of the four basic core collection interfaces. In particular, they describe the idioms that allow you to use these interfaces effectively.
Collection
The Collection interface is the root of the collection hierarchy.
A Collection represents a group of objects, known as its elements.
Some Collection implementations allow duplicate elements and others do not.
Some are ordered and others unordered.
The JDK doesn't provide any direct implementations of this interface: It provides implementations of more specific subinterfaces like Set and List.
This interface is the least common denominator that all collections implement.
Collection is used to pass collections around and manipulate them when maximum generality is desired.
Set
A Set is a collection that cannot contain duplicate elements.
As you might expect, this interface models the mathematical set abstraction. It is used to represent sets like the cards comprising a poker hand, the courses making up a student's schedule, or the processes running on a machine.
List
A List is an ordered collection (sometimes called a sequence).
Lists can contain duplicate elements.
The user of a List generally has precise control over where in the List each element is inserted.
The user can access elements by their integer index (position).
If you've used Vector , you're already familiar with the general flavor of List.
Map
A Map is an object that maps keys to values.
Maps cannot contain duplicate keys:
Each key can map to at most one value.
If you've used Hashtable , you're already familiar with the general flavor of Map.
The last two core collection interfaces (SortedSet and SortedMap) are merely sorted versions of Set and Map. In order to understand these interfaces, you have to know how order is maintained among objects. Even if you don't plan to use SortedSet or SortedMap, read the following section if you plan to sort Lists.
Object Ordering
There are two ways to order objects:
The Comparable interface provides automatic natural order on classes that implement it, while the
Comparator interface gives the programmer complete control over object ordering. Note that these are not core collection interfaces, but underlying infrastructure.
Now that you know all about object ordering, here are the last two core collection interfaces:
SortedSet
A SortedSet is a Set that maintains its elements in ascending order.
Several additional operations are provided to take advantage of the ordering.
The SortedSet interface is used for things like word lists and membership rolls.
SortedMap
A SortedMap is a Map that maintains its mappings in ascending key order.
It is the Map analogue of SortedSet.
The SortedMap interface is used for apps like dictionaries and telephone directories.
The Collection Interface
A Collection represents a group of objects, known as its elements.
The primary use of the Collection interface is to pass around collections of objects where maximum generality is desired. For example, by convention all general-purpose collection implementations (which typically implement some subinterface of Collection like Set or List) have a constructor that takes a Collection argument. This constructor initializes the new Collection to contain all of the elements in the specified Collection. This constructor allows the caller to create a Collection of a desired implementation type, initially containing all of the elements in any given Collection, whatever its subinterface or implementation type. Suppose you have a Collection, c, which may be a List, a Set, or some other kind of Collection.
The following one-liner creates a new ArrayList (an implementation of the List interface), initially containing all of the elements in c: List l = new ArrayList(c);
The Collection interface is shown below:
public interface Collection {
// Basic Operations
int size();
boolean isEmpty();
boolean contains(Object element);
boolean add(Object element); // Optional
boolean remove(Object element); // Optional
Iterator iterator();
// Bulk Operations
boolean containsAll(Collection c);
boolean addAll(Collection c); // Optional
boolean removeAll(Collection c); // Optional
boolean retainAll(Collection c); // Optional
void clear(); // Optional
// Array Operations
Object[] toArray();
Object[] toArray(Object a[]);
}
The interface does about what you'd expect, given that a Collection represents a group of objects.
It has methods to tell you how many elements are in the collection (size, isEmpty), to check if a given object is in the collection (contains), to add and remove an element from the collection (add, remove), and to provide an iterator over the collection (iterator).
The add method is defined generally enough so that it makes sense for collections that allow duplicates as well as those that don't. It guarantees that the Collection will contain the specified element after the call completes, and returns true if the Collection changes as a result of the call.
Similarly, the remove method is defined to remove a single instance of the specified element from the Collection, assuming the Collection contains the element, and to return true if the Collection was modified as a result.
Iterators
The object returned by the iterator method deserves special mention.
It is an Iterator , which is very similar to an Enumeration , but differs in two respects:
• Iterator allows the caller to remove elements from the underlying collection during the iteration with well-defined semantics.
• Method names have been improved.
The first point is important:
There was no safe way to remove elements from a collection while traversing it with an Enumeration. The semantics of this operation were ill-defined, and differed from implementation to implementation.
The Iterator interface is shown below:
public interface Iterator {
boolean hasNext();
Object next();
void remove(); // Optional
}
The hasNext method is identical in function to Enumeration.hasMoreElements, and the next method is identical in function to Enumeration.nextElement.
The remove method removes from the underlying Collection the last element that was returned by next. The remove method may be called only once per call to next, and throws an exception if this condition is violated.
Note that Iterator.remove is the only safe way to modify a collection during iteration; the behavior is unspecified if the underlying collection is modified in any other way while the iteration is in progress.
The following snippet shows you how to use an Iterator to filter a Collection, that is, to traverse the collection, removing every element that does not satisfy some condition:
static void filter(Collection c) {
for (Iterator i = c.iterator(); i.hasNext(); )
if (!cond(i.next()))
i.remove();
}
Two things should be kept in mind when looking at this simple piece of code:
• The code is polymorphic: it works for any Collection that supports element removal, regardless of implementation. That's how easy it is to write a polymorphic algorithm under the collections framework!
• It would have been impossible to write this using Enumeration instead of Iterator, because there's no safe way to remove an element from a collection while traversing it with an Enumeration.
Bulk Operations
The bulk operations perform some operation on an entire Collection in a single shot.
They are shorthands in the sense that each of them can be simulated, perhaps less efficiently, using the operations described above.
o containsAll: Returns true if the target Collection contains all of the elements in the specified Collection (c).
o addAll: Adds all of the elements in the specified Collection to the target Collection.
o removeAll: Removes from the target Collection all of its elements that are also contained in the specified Collection.
o retainAll: Removes from the target Collection all of its elements that are not also contained in the specified Collection. That is to say, it retains only those elements in the target Collection that are also contained in the specified Collection.
o clear: Removes all elements from the Collection.
The addAll, removeAll, and retainAll methods all return true if the target Collection was modified in the process of executing the operation.
As a simple example of the power of the bulk operations, consider following idiom to remove all instances of a specified element, e from a Collection, c.:
c.removeAll(Collections.singleton(e));
More specifically, suppose that you want to remove all of the null elements from a Collection:
c.removeAll(Collections.singleton(null));
This idiom uses Collections.singleton, which is a static factory method that returns an immutable Set containing only the specified element.
Array Operations
The toArray methods are provided as a bridge between collections and older APIs that expect arrays on input.
They allow the contents of a Collection to be translated into an array.
The simple form with no arguments creates a new array of Object.
The more complex form allows the caller to provide an array or to choose the runtime type of the output array.
For example, suppose c is a Collection The following snippet dumps the contents of c into a newly allocated array of Object whose length is identical to the number of elements in c:
Object[] a = c.toArray();
Suppose c is known to contain only strings. The following snippet dumps the contents of c into a newly allocated array of String whose length is identical to the number of elements in c:
String[] a = (String[]) c.toArray(new String[0]);
The Set Interface
A Set is a Collection that cannot contain duplicate elements. Set models the mathematical set abstraction.
The Set interface extends Collection and contains no methods other than those inherited from Collection.
It adds the restriction that duplicate elements are prohibited.
Set also adds a stronger contract on the behavior of the equals and hashCode operations, allowing Set objects with different implementation types to be compared meaningfully.
Two Set objects are equal if they contain the same elements.
The Set interface is shown below:
public interface Set {
// Basic Operations
int size();
boolean isEmpty();
boolean contains(Object element);
boolean add(Object element); // Optional
boolean remove(Object element); // Optional
Iterator iterator();
// Bulk Operations
boolean containsAll(Collection c);
boolean addAll(Collection c); // Optional
boolean removeAll(Collection c); // Optional
boolean retainAll(Collection c); // Optional
void clear(); // Optional
// Array Operations
Object[] toArray();
Object[] toArray(Object a[]);
}
The JDK contains two general-purpose Set implementations.
o HashSet , which stores its elements in a hash table, is the best-performing implementation.
o TreeSet , which stores its elements in a red-black tree, guarantees the order of iteration.
Here's a simple but useful Set idiom.
o Suppose you have a Collection, c, and you want to create another Collection containing the same elements, but with all duplicates eliminated. The following one-liner does the trick:
o Collection noDups = new HashSet(c);
o It works by creating a Set (which, by definition, cannot contain duplicates) initially containing all the elements in c. It uses the "standard Collection constructor" described in the Collection interface lesson.
Basic Operations
The size operation returns the number of elements in the Set (its cardinality).
The isEmpty method does exactly what you think it does.
The add method adds the specified element to the Set if it's not already present, and returns a boolean indicating whether the element was added.
Similarly, the remove method removes the specified element from the Set if it's present, and returns a boolean indicating whether the element was present.
The iterator method returns an Iterator over the Set.
Here's a little program that takes the words in its argument list and prints out any duplicate words, the number of distinct words, and a list of the words with duplicates eliminated:
import java.util.*;
public class FindDups {
public static void main(String args[]) {
Set s = new HashSet();
for (int i=0; i
System.out.println("Duplicate detected: "+args[i]);
System.out.println(s.size()+" distinct words detected: "+s);
}
}
Now let's run the program:
% java FindDups i came i saw i left
Duplicate detected: i
Duplicate detected: i
4 distinct words detected: [came, left, saw, i]
Note that the example code always refers to the collection by its interface type (Set), rather than by its implementation type (HashSet).
This is a strongly recommended programming practice, as it gives you the flexibility to change implementations merely by changing the constructor. If the variables used to store a collection, or the parameters used to pass it around, are declared to be of the collection's implementation type rather than its interface type, then all such variables and parameters must be changed to change the collection's implementation type.
Furthermore, there's no guarantee that the resulting program will work; if the program uses any non-standard operations that are present in the original implementation type but not the new one, the program will fail. Referring to collections only by their interface keeps you honest, in the sense that it prevents you from using any non-standard operations.
The implementation type of the Set in the example above is HashSet, which makes no guarantees as to the order of the elements in the Set.
If you want the program to print the word list in alphabetical order, all you have to do is to change the set's implementation type from HashSet to TreeSet.
Making this trivial one-line change causes the command line in the previous example to generate the following output:
% java FindDups i came i saw i left
Duplicate word detected: i
Duplicate word detected: i
4 distinct words detected: [came, i, left, saw]
Bulk Operations
The bulk operations are particularly well suited to Sets: they perform standard set-algebraic operations. Suppose s1 and s2 are Sets. Here's what the bulk operations do:
s1.containsAll(s2): Returns true if s2 is a subset of s1. (For example, set s1 is a subset of s2 if set s2 contains all the elements in s1.)
s1.addAll(s2): Transforms s1 into the union of s1 and s2. (The union of two sets is the set containing all the elements contained in either set.)
s1.retainAll(s2): Transforms s1 into the intersection of s1 and s2. (The intersection of two sets is the set containing only the elements that are common in both sets.)
s1.removeAll(s2): Transforms s1 into the (asymmetric) set difference of s1 and s2. (For example, the set difference of s1 - s2 is the set containing all the elements found in s1 but not in s2.)
To calculate the union, intersection, or set difference of two sets non-destructively (without modifying either set), the caller must copy one set before calling the appropriate bulk operation. The resulting idioms are shown below:
Set union = new HashSet(s1);
union.addAll(s2);
Set intersection = new HashSet(s1);
intersection.retainAll(s2);
Set difference = new HashSet(s1);
difference.removeAll(s2);
The implementation type of the result Set in the above idioms is HashSet, which is, as mentioned above, the best all-around Set implementation in the JDK. However, any general-purpose Set implementation could be substituted.
Let's revisit the FindDups example program above. Suppose you want to know which words in the argument list occur only once and which occur more than once, but you do not want any duplicates printed out repeatedly. This effect can be achieved by generating two sets, one containing every word in the argument list, and the other containing only the duplicates. The words that occur only once are the set difference of these two sets, which we know how to compute. Here's how the resulting program looks:
import java.util.*;
public class FindDups2 {
public static void main(String args[]) {
Set uniques = new HashSet();
Set dups = new HashSet();
for (int i=0; i
dups.add(args[i]);
uniques.removeAll(dups); // Destructive set-difference
System.out.println("Unique words: " + uniques);
System.out.println("Duplicate words: " + dups);
}
}
Now let's run the revised program with the same same argument list we used before:
% java FindDups2 i came i saw i left
Unique words: [came, left, saw]
Duplicate words: [i]
A less common set-algebraic operation is the symmetric set difference: the set of elements contained in either of two specified sets, but not contained in both of them. The following code calculates the symmetric set difference of two sets non-destructively:
Set symmetricDiff = new HashSet(s1);
symmetricDiff.addAll(s2);
Set tmp = new HashSet(s1);
tmp.retainAll(s2);
symmetricDiff.removeAll(tmp);
Array Operations
The array operations don't do anything special for Sets beyond what they do for any other Collection. They are described in the Collection interface lesson.
The List Interface
A List is an ordered Collection (sometimes called a sequence).
Lists may contain duplicate elements.
In addition to the operations inherited from Collection, the List interface includes operations for:
• Positional Access: manipulate elements based on their numerical position in the list.
• Search: search for a specified object in the list and return its numerical position.
• List Iteration: extend Iterator semantics to take advantage of the list's sequential nature.
• Range-view: perform arbitrary range operations on the list.
The List interface is shown below:
public interface List extends Collection {
// Positional Access
Object get(int index);
Object set(int index, Object element); // Optional
void add(int index, Object element); // Optional
Object remove(int index); // Optional
abstract boolean addAll(int index, Collection c); // Optional
// Search
int indexOf(Object o);
int lastIndexOf(Object o);
// Iteration
ListIterator listIterator();
ListIterator listIterator(int index);
// Range-view
List subList(int from, int to);
}
The JDK contains two general-purpose List implementations.
ArrayList , which is generally the best-performing implementation, and
LinkedList which offers better performance under certain circumstances.
Also, Vector has been retrofitted to implement List. For more information on implementations, see the Implementations lesson.
Comparison to Vector
If you've used Vector , you're already familiar with the general flavor of List. (Of course, List is an interface, while Vector is a concrete implementation.)
List fixes several minor API deficiencies in Vector.
For starters, commonly used Vector operations such as elementAt and setElementAt have been given much shorter names. When you consider that these two operations are the List analogue of square brackets for arrays, it becomes apparent that shorter names are highly desirable.
Consider the following assignment statement: a[i] = a[j].times(a[k]);
The Vector equivalent is: v.setElementAt(v.elementAt(j).times(v.elementAt(k)), i);
The List equivalent is: v.set(i, v.get(j).times(v.get(k)));
You may already have noticed that the set method, which replaces setElementAt, reverses the order of the arguments so that they match the corresponding array operation.
Consider this assignment statement: beatle[5] = "Billy Preston";
The Vector equivalent is: beatle.setElementAt("Billy Preston", 5);
The List equivalent is: beatle.set(5, "Billy Preston");
For consistency's sake, the add(int, Object) method, which replaces insertElementAt(Object, int), also reverses the order of the arguments.
The various range operations in Vector (indexOf, lastIndexOf(setSize) have been replaced by a single range-view operation (subList), which is far more powerful and consistent.
Collection Operations
The operations inherited from Collection all do about what you'd expect them to do, assuming you're already familiar with them from Collection.
The remove operation always removes the first occurrence of the specified element from the list.
The add and addAll operations always append the new element(s) to the end of the list. Thus, the following idiom concatenates one list to another: list1.addAll(list2);
Here's a non-destructive form of this idiom, which produces a third List consisting of the second list appended to the first:
List list3 = new ArrayList(list1);
list3.addAll(list2);
Note that the idiom, in its non-destructive form, takes advantage of ArrayList's standard Collection constructor.
Like the Set interface, List strengthens the requirements on the equals and hashCode methods so that two List objects can be compared for logical equality without regard to their implementation classes.
Two List objects are equal if they contain the same elements in the same order.
Positional Access and Search Operations
The basic positional access operations (get, set, add and remove) behave just like their longer-named counterparts in Vector (elementAt, setElementAt, insertElementAt and removeElementAt) with one noteworthy exception.
The set and remove operations return the old value that is being overwritten or removed; the Vector counterparts (setElementAt and removeElementAt) return nothing (void).
The search operations indexOf and lastIndexOf behave exactly like the identically named operations in Vector.
The addAll operation inserts all of the elements of the specified Collection starting at the specified position. The elements are inserted in the order they are returned by the specified Collection's iterator. This call is the positional access analogue of Collection's addAll operation.
Here's a little function to swap two indexed values in a List. It should look familiar from Programming 101 (assuming you stayed awake):
private static void swap(List a, int i, int j) {
Object tmp = a.get(i);
a.set(i, a.get(j));
a.set(j, tmp);
}
Of course there's one big difference. This is a polymorphic algorithm: it swaps two elements in any List, regardless of its implementation type. "Big deal," you say, "What's it good for?" Funny you should ask. Take a look at this:
public static void shuffle(List list, Random rnd) {
for (int i=list.size(); i>1; i--)
swap(list, i-1, rnd.nextInt(i));
}
This algorithm (which is included in the JDK's Collections class) randomly permutes the specified List using the specified source of randomness. It's a bit subtle: It runs up the list from the bottom, repeatedly swapping a randomly selected element into the current position. Unlike most naive attempts at shuffling, it's fair (all permutations occur with equal likelihood, assuming an unbiased source of randomness) and fast (requiring exactly list.size()-1 iterations). The following short program uses this algorithm to print the words in its argument list in random order:
import java.util.*;
public class Shuffle {
public static void main(String args[]) {
List l = new ArrayList();
for (int i=0; i
Collections.shuffle(l, new Random());
System.out.println(l);
}
}
In fact, we can make this program even shorter and faster. The Arrays class has a static factory method called asList that allows an array to be viewed as a List. This method does not copy the array; changes in the List write through to the array, and vice-versa. The resulting List is not a general-purpose List implementation, in that it doesn't implement the (optional) add and remove operations: arrays are not resizable. Taking advantage of Arrays.asList and calling an alternate form of shuffle that uses a default source of randomness, you get the following tiny program , whose behavior is identical to the previous program:
import java.util.*;
public class Shuffle {
public static void main(String args[]) {
List l = Arrays.asList(args);
Collections.shuffle(l);
System.out.println(l);
}
}
Iterators
As you'd expect, the Iterator returned by List's iterator operation returns the elements of the list in proper sequence. Additionally, List provides a richer iterator, called a ListIterator, that allows you to traverse the list in either direction, modify the list during iteration, and obtain the current position of the iterator. The ListIterator interface is summarized below (including the three methods it inherits from Iterator):
public interface ListIterator extends Iterator {
boolean hasNext();
Object next();
boolean hasPrevious();
Object previous();
int nextIndex();
int previousIndex();
void remove(); // Optional
void set(Object o); // Optional
void add(Object o); // Optional
}
The three methods that ListIterator inherits from Iterator (hasNext, next, and remove) are intended to do exactly the same thing in both interfaces. The hasPrevious and previous operations are exact analogues of hasNext and next. The former operations refer to the element before the (implicit) cursor, whereas the latter refer to the element after the cursor. The previous operation moves the cursor backwards whereas next moves it forwards.
Here's the standard idiom for iterating backwards through a list:
for (ListIterator i=l.listIterator(l.size()); i.hasPrevious(); ) {
Foo f = (Foo) i.previous();
...
}
Note the argument to listIterator in the above idiom. The List interface has two forms of the listIterator method. The form with no arguments returns a ListIterator positioned at the beginning of the list, and the form with an int argument returns a ListIterator positioned at the specified index. The index refers to the element that would be returned by an initial call to next. If the index value of n is used, then the initial call to next would return null, since it would point just past the end of the list. An initial call to previous would return the element whose index was index-1. In a list of length n, there are n+1 valid values for index, from 0 to n, inclusive.
Intuitively speaking, the cursor is always between two elements, the one that would be returned by a call to previous and the one that would be returned by a call to next. The n+1 valid index values correspond to the n+1 "gaps" between elements, from the gap before the first element to the gap after the last one. The diagram below shows the five possible cursor positions in a list containing four elements.
Element(0) Element(1) Element(2) Element(3)
^ ^ ^ ^ ^
Index: 0 1 2 3 4
Calls to next and previous can be intermixed, but you have to be a bit careful. The first call to previous after a sequence of calls to next returns the same element as the last call to next. Similarly, the first call to next after a sequence of calls to previous returns the same element as the last call to previous. This will become obvious if you stare at the foregoing text long enough. If it doesn't, go get yourself a steaming hot mug of Java, and try again.
It should come as no surprise that the nextIndex method returns the index of the element that would be returned by a subsequent call to next, and previousIndex returns the index of the element that would be returned by a subsequent call to previous. These calls are typically used for one of two purposes: To report the position where something was found, or to record the position of the ListIterator so that another ListIterator with identical position can be created.
It should also come as no surprise that the number returned by nextIndex is always one greater than the number returned by previousIndex. This implies the behavior of the two boundary cases: a call to previousIndex when the cursor is before the initial element returns -1, and a call to nextIndex when the cursor is after the final element returns list.size()+1. To make all of this concrete, here's a possible implementation of List.indexOf:
public int indexOf(Object o) {
for (ListIterator i = listIterator(); i.hasNext(); )
if (o==null ? i.next()==null : o.equals(i.next()))
return i.previousIndex();
return -1; // Object not found
}
Note that the indexOf method returns i.previousIndex() though it is traversing the list in the forward direction. This is because i.nextIndex() would return the index of the element that we are about to examine, and we want to return the index of the element that we just examined.
The Iterator interface provides the remove operation to remove from the Collection the last element returned by next. The ListIterator interface provides two additional operations to modify the list: set and add. The set method "overwrites" the last element returned by next or previous with the specified element. It is demonstrated by the following polymorphic algorithm to replace all occurrences of one specified value with another:
public static void replace(List l, Object val, Object newVal) {
for (ListIterator i = l.listIterator(); i.hasNext(); )
if (val==null ? i.next()==null : val.equals(i.next()))
i.set(newVal);
}
The only bit of trickiness in this example is the equality test between val and i.next. We have to special-case an val value of null in order to prevent a NullPointerException.
The add method inserts a new element into the list, immediately before the current cursor position. This method is illustrated in the following polymorphic algorithm to replace all occurrences of a specified value with the sequence of values contained in the specified list:
public static void replace(List l, Object val, List newVals) {
for (ListIterator i = l.listIterator(); i.hasNext(); ) {
if (val==null ? i.next()==null : val.equals(i.next())) {
i.remove();
for (Iterator j = newVals.iterator(); j.hasNext(); )
i.add(j.next());
}
}
}
Range-view Operation
The range-view operation, subList(int fromIndex, int toIndex), returns a List view of the portion of this list whose indices range from fromIndex, inclusive, to toIndex, exclusive. This half-open range mirrors the typical for-loop:
for (int i=fromIndex; i
}
As the term view implies, the returned List is backed by the List on which subList was called, so changes in the former List are reflected in the latter.
This method eliminates the need for explicit range operations (of the sort that commonly exist for arrays). Any operation that expects a List can be used as a range operation by passing a subList view instead of a whole List. For example, the following idiom removes a range of elements from a list:
list.subList(fromIndex, toIndex).clear();
Similar idioms may be constructed to search for an element in a range:
int i = list.subList(fromIndex, toIndex).indexOf(o);
int j = list.subList(fromIndex, toIndex).lastIndexOf(o);
Note that the above idioms return the index of the found element in the subList, not the index in the backing List.
Any polymorphic algorithm that operates on a List (like the replace and shuffle examples, above) works with the List returned by subList.
Here's a polymorphic algorithm whose implementation uses subList to deal a hand from a deck. That is to say, it returns a new List (the "hand") containing the specified number of elements taken from the end of the specified List (the "deck"). The elements returned in the hand are removed from the deck.
public static List dealHand(List deck, int n) {
int deckSize = deck.size();
List handView = deck.subList(deckSize-n, deckSize);
List hand = new ArrayList(handView);
handView.clear();
return hand;
}
The literal-minded might say that this program deals from the bottom of the deck, but I prefer to think that the computer is holding the deck upside down. At any rate, for many common List implementations, like ArrayList, the performance of removing elements from the end of the list is substantially better than that of removing elements from the beginning.
Here's a program using the dealHand method in combination with Collections.shuffle to generate hands from a normal 52-card deck. The program takes two command line arguments: the number of hands to deal and the number of cards in each hand.
import java.util.*;
class Deal {
public static void main(String args[]) {
int numHands = Integer.parseInt(args[0]);
int cardsPerHand = Integer.parseInt(args[1]);
// Make a normal 52-card deck
String[] suit = new String[] {"spades", "hearts",
"diamonds", "clubs"};
String[] rank = new String[]
{"ace","2","3","4","5","6","7","8",
"9","10","jack","queen","king"};
List deck = new ArrayList();
for (int i=0; i
Collections.shuffle(deck);
for (int i=0; i
}
}
Let's run the program:
% java Deal 4 5
[8 of hearts, jack of spades, 3 of spades, 4 of spades, king of diamonds]
[4 of diamonds, ace of clubs, 6 of clubs, jack of hearts, queen of hearts]
[7 of spades, 5 of spades, 2 of diamonds, queen of diamonds, 9 of clubs]
[8 of spades, 6 of diamonds, ace of spades, 3 of hearts, ace of hearts]
While the subList operation is extremely powerful, some care must be exercised when using it. The semantics of the List returned by subList become undefined if elements are added to or removed from the backing List in any way other than via the returned List. Thus, it's highly recommended that you use the List returned by subList only as a transient object, to perform one or a sequence of range operations on the backing List. The longer you use the subList object, the greater the probability that you'll compromise it by modifying the backing List directly (or through another subList object).
Algorithms
Most of the polymorphic algorithms in the Collections class apply specifically to List. Having all of these algorithms at your disposal makes it very easy to manipulate lists. Here's a summary of these algorithms, which are described in more detail in the Algorithms lesson.
• sort(List): Sorts a List using a merge sort algorithm, which provides a fast, stable sort. (A stable sort is one that does not reorder equal elements.)
• shuffle(List): Randomly permutes the elements in a List. (Shown above.)
• reverse(List): Reverses the order of the elements in a List.
• fill(List, Object): Overwrites every element in a List with the specified value.
• copy(List dest, List src): Copies the source List into the destination List.
• binarySearch(List, Object): Searches for an element in an ordered List using the binary search algorithm.
The Map Interface
A Map is an object that maps keys to values.
A map cannot contain duplicate keys:
Each key can map to at most one value.
The Map interface is shown below:
public interface Map {
// Basic Operations
Object put(Object key, Object value);
Object get(Object key);
Object remove(Object key);
boolean containsKey(Object key);
boolean containsValue(Object value);
int size();
boolean isEmpty();
// Bulk Operations
void putAll(Map t);
void clear();
// Collection Views
public Set keySet();
public Collection values();
public Set entrySet();
// Interface for entrySet elements
public interface Entry {
Object getKey();
Object getValue();
Object setValue(Object value);
}
}
The JDK contains two new general-purpose Map implementations.
HashMap , which stores its entries in a hash table, is the best-performing implementation.
TreeMap , which stores its entries in a red-black tree, guarantees the order of iteration. Also,
Hashtable has been retrofitted to implement Map. For more information on implementations, see the Implementations lesson.
Comparison to Hashtable
If you've used Hashtable, you're already familiar with the general flavor of Map. (Of course Map is an interface, while Hashtable is a concrete implementation.) Here are the major differences:
• Map provides Collection-views in lieu of direct support for iteration via Enumeration objects. Collection-views greatly enhance the expressiveness of the interface, as discussed later in this lesson.
• Map allows you to iterate over keys, values, or key-value pairs; Hashtable did not provide the third option.
• Map provides a safe way to remove entries in the midst of iteration; Hashtable did not.
• Further, Map fixes a minor deficiency in the Hashtable interface. Hashtable has a method called contains, which returns true if the Hashtable contains a given value. Given its name, you'd expect this method to return true if the Hashtable contained a given key, as the key is the primary access mechanism for a Hashtable. The Map interface eliminates this source of confusion by renaming the method containsValue. Also, this improves the consistency of the interface: containsValue parallels containsKey nicely.
Basic Operations
The basic operations (put, get, remove, containsKey, containsValue, size, and isEmpty) behave exactly like their counterparts in Hashtable. Here's a simple program to generate a frequency table of the words found in its argument list. The frequency table maps each word to the number of times it occurs in the argument list.
import java.util.*;
public class Freq {
private static final Integer ONE = new Integer(1);
public static void main(String args[]) {
Map m = new HashMap();
// Initialize frequency table from command line
for (int i=0; i
m.put(args[i], (freq==null ? ONE :
new Integer(freq.intValue() + 1)));
}
System.out.println(m.size()+" distinct words detected:");
System.out.println(m);
}
}
The only thing even slightly tricky about this program is the second argument of the put statement. It's a conditional expression that has the effect of setting the frequency to one if the word has never been seen before, or one more than its current value if the word has already been seen. Let's run the program:
% java Freq if it is to be it is up to me to delegate
8 distinct words detected:
{to=3, me=1, delegate=1, it=2, is=2, if=1, be=1, up=1}
Suppose you'd prefer to see the frequency table in alphabetical order. All you have to do is change the implementation type of the Map from HashMap to TreeMap. Making this four character change causes the program to generate the following output from the same command line:
8 distinct words detected:
{be=1, delegate=1, if=1, is=2, it=2, me=1, to=3, up=1}
Are interfaces cool, or what?
Like the Set and List interfaces, Map strengthens the requirements on the equals and hashCode methods so that two Map objects can be compared for logical equality without regard to their implementation types. Two Map objects are equal if they represent the same key-value mappings.
By convention, all Map implementations provide constructors that take a Map object and initialize the new Map to contain all of the key-value mappings in the specified Map. This standard Map constructor is entirely analogous to the standard collection constructor for Collection implementations. It allows the caller to create a Map of a desired implementation type that initially contains all of the mappings in another Map, regardless of the other Map's implementation type. For example, suppose you have a Map, named m. The following one-liner creates a new HashMap initially containing all of the same key-value mappings as m:
Map copy = new HashMap(m);
Bulk Operations
The clear operation does exactly what you think it does: it removes all of the mappings from the Map. The putAll operation is the Map analogue of the Collection interface's addAll operation. In addition to its obvious use of dumping one Map into another, it has a second, more subtle, use. Suppose a Map is used to represent a collection of attribute-value pairs; the putAll operation, in combination with the standard Map constructor, provides a neat way to implement attribute map creation with default values. Here's a static factory method demonstrating this technique:
static Map newAttributeMap(Map defaults, Map overrides) {
Map result = new HashMap(defaults);
result.putAll(overrides);
return result;
}
Collection Views
The Collection-view methods allow a Map to be viewed as a Collection in three ways:
• keySet: the Set of keys contained in the Map.
• values: The Collection of values contained in the Map. This Collection is not a Set, as multiple keys can map to the same value.
• entrySet: The Set of key-value pairs contained in the Map. The Map interface provides a small nested interface called Map.Entry that is the type of the elements in this Set.
The Collection-views provide the only means to iterate over a Map. Here's an example illustrating the standard idiom for iterating over the keys in a Map:
for (Iterator i=m.keySet().iterator(); i.hasNext(); )
System.out.println(i.next());
The idiom for iterating over values is analogous. Here's the idiom for iterating over key-value pairs:
for (Iterator i=m.entrySet().iterator(); i.hasNext(); ) {
Map.Entry e = (Map.Entry) i.next();
System.out.println(e.getKey() + ": " + e.getValue());
}
When first presented with these idioms, many people worry that they may be slow because the Map has to create a new Collection object each time a Collection-view operation is called. Rest easy: This is not the case. There's no reason that a Map can't always return the same object each time it is asked for a given Collection-view. This is precisely what all of the JDK's Map implementations do.
With all three Collection-views, calling an Iterator's remove operation removes the associated entry from the backing Map (assuming the Map supports element removal to begin with). With the entrySet view, it is also possible to change the value associated with a key, by calling a Map.Entry's setValue method during iteration (again, assuming the Map supports value modification to begin with). Note that these are the only safe ways to modify a Map during iteration; the behavior is unspecified if the underlying Map is modified in any other way while the iteration is in progress.
The Collection-views support element removal in all its many forms: the remove, removeAll, retainAll, and clear operations, as well as the Iterator.remove operation. (Yet again, this assumes that the backing Map supports element removal.)
The Collection-views do not support element addition under any circumstances. It would make no sense for the keySet and values views, and it's unnecessary for the entrySet view, as the backing Map's put and putAll provide the same functionality.
Fancy Uses of Collection-Views: Map Algebra
When applied to the Collection-views, the bulk operations (containsAll, removeAll and retainAll) are a surprisingly potent tool. For starters, suppose you want to know whether one Map is a submap of another, that is, whether the first Map contains all of the key-value mappings in the second. The following idiom does the trick:
if (m1.entrySet().containsAll(m2.entrySet())) {
...
}
Along similar lines, suppose that you want to know if two Map objects contain mappings for all the same keys:
if (m1.keySet().equals(m2.keySet())) {
...
}
Suppose you have a map that represents a collection of attribute-value pairs, and two sets, representing required attributes and permissible attributes. (The permissible attributes include the required attributes.) The following snippet determines whether the attribute map conforms to these constraints, and prints a detailed error message if it doesn't:
boolean valid = true;
Set attributes = attributeMap.keySet();
if(!attributes.containsAll(requiredAttributes)) {
Set missing = new HashSet(requiredAttributes);
missing.removeAll(attributes);
System.out.println("Missing required attributes: "+missing);
valid = false;
}
if (!permissibleAttributes.containsAll(attributes)) {
Set illegal = new HashSet(attributes);
illegal.removeAll(permissibleAttributes);
System.out.println("Contains illegal attributes: "+illegal);
valid = false;
}
if (valid)
System.out.println("OK");
Suppose you want to know all the keys common to two Map objects:
Set commonKeys = new HashSet(a.keySet());
commonKeys.retainAll(b.keySet());
A similar idiom gets you the common values, and the common key-value pairs. Extra care is needed if you get the common key-value pairs, as the elements of the resulting Set, which are Map.Entry objects, may become invalid if the Map is modified.
All the idioms presented thus far have been "non-destructive": They don't modify the backing Map. Here are a few that do. Suppose you want to remove all the key-value pairs that one Map has in common with another:
m1.entrySet().removeAll(m2.entrySet());
Suppose you want to remove from one Map all the keys that have mappings in another:
m1.keySet().removeAll(m2.keySet());
And what happens when you start mixing keys and values in the same bulk operation? Suppose you have a Map called managers that maps each employee in a company to the employee's manager. We'll be deliberately vague about the types of the key and value objects. It doesn't matter, so long as they're the same. Now suppose you want to know who all the "individual contributors" are. (This is corporate-speak for employees who are not managers.) The following one-liner tells you exactly what you want to know:
Set individualContributors = new HashSet(managers.keySet());
individualContributors.removeAll(managers.values());
Suppose you want to fire all of the employees who report directly to some manager (we'll call him herbert):
Employee herbert = ... ;
managers.values().removeAll(Collections.singleton(herbert));
Note that this idiom makes use of Collections.singleton, a static factory method that returns an immutable Set with the single, specified element.
Once you've done this, you may have a bunch of employees whose managers no longer work for the company (if any of herbert's direct-reports were themselves managers). The following code tells you all of the employees whose manager no longer works for the company:
Map m = new HashMap(managers);
m.values().removeAll(managers.keySet());
Set slackers = m.keySet();
This example is a bit tricky. First it makes a temporary copy of the Map. Then it removes from the temporary copy all entries whose (manager) value is a key in the original Map. Recall that the original Map contains an entry for every employee. Thus, the remaining entries in the temporary Map comprise all the entries from the original Map whose (manager) values are no longer employees. The keys in the temporary copy, then, represent precisely the employees that we're looking for. If you stare at the example for a little while, this should all become clear. If it doesn't, now would be a good time to get a steaming hot mug of freshly brewed Java.
There are many, many more idioms like the ones contained in this section but it would be impractical and tedious to list them all. Once you get the hang of it, it's not that hard to come up with the right one when you need it.
Multimaps
• A multimap is like a map, except it can map each key to multiple values.
• The Collections Framework doesn't include an interface for multimaps, because they aren't used all that commonly. It's a fairly simple matter to use a Map whose values are List objects as a multimap. This technique is demonstrated in the next code example, which reads a dictionary containing one word per line (all lowercase) and prints out all of the permutation groups in the dictionary that meet a size criterion. A permutation group is a bunch of words all of which contain exactly the same letters, but in a different order. The program takes two arguments on the command line: the name of the dictionary file, and the minimum size of permutation group to print out. Permutation groups containing fewer words than the specified minimum are not printed.
There is a standard trick for finding permutation groups: for each word in the dictionary, alphabetize the letters in the word (that is, reorder the word's letters into alphabetical order) and put an entry into a multimap, mapping the alphabetized word to the original word. For example, the word "bad" causes an entry mapping "abd" into "bad" to be put into the multimap. A moment's reflection will show that all of the words to which any given key maps form a permutation group. It's a simple matter to iterate over the keys in the multimap, printing out each permutation group that meets the size constraint.
The following program is a straightforward implementation of this technique. The only tricky part is the alphabetize method, which returns a string containing the same characters as its argument, in alphabetical order. This routine (which has nothing to do with the Collections Framework) implements a slick bucket sort. It assumes that the word being alphabetized consists entirely of lowercase alphabetic characters.
import java.util.*;
import java.io.*;
public class Perm {
public static void main(String[] args) {
int minGroupSize = Integer.parseInt(args[1]);
// Read words from file and put into simulated multimap
Map m = new HashMap();
try {
BufferedReader in =
new BufferedReader(new FileReader(args[0]));
String word;
while((word = in.readLine()) != null) {
String alpha = alphabetize(word);
List l = (List) m.get(alpha);
if (l==null)
m.put(alpha, l=new ArrayList());
l.add(word);
}
} catch(IOException e) {
System.err.println(e);
System.exit(1);
}
// Print all permutation groups above size threshold
for (Iterator i = m.values().iterator(); i.hasNext(); ) {
List l = (List) i.next();
if (l.size() >= minGroupSize)
System.out.println(l.size() + ": " + l);
}
}
private static String alphabetize(String s) {
int count[] = new int[256];
int len = s.length();
for (int i=0; i
StringBuffer result = new StringBuffer(len);
for (char c='a'; c<='z'; c++)
for (int i=0; i
return result.toString();
}
}
Running the program on an 80,000 word dictionary takes about 16 seconds on an aging UltraSparc 1. With a minimum permutation group size of eight, it produces the following output:
% java Perm dictionary.txt 8
9: [estrin, inerts, insert, inters, niters, nitres, sinter,
triens, trines]
8: [carets, cartes, caster, caters, crates, reacts, recast, traces]
9: [capers, crapes, escarp, pacers, parsec, recaps, scrape, secpar,
spacer]
8: [ates, east, eats, etas, sate, seat, seta, teas]
12: [apers, apres, asper, pares, parse, pears, prase, presa, rapes,
reaps, spare, spear]
9: [anestri, antsier, nastier, ratines, retains, retinas, retsina,
stainer, stearin]
10: [least, setal, slate, stale, steal, stela, taels, tales, teals,
tesla]
8: [arles, earls, lares, laser, lears, rales, reals, seral]
8: [lapse, leaps, pales, peals, pleas, salep, sepal, spale]
8: [aspers, parses, passer, prases, repass, spares, sparse, spears]
8: [earings, erasing, gainers, reagins, regains, reginas, searing,
seringa]
11: [alerts, alters, artels, estral, laster, ratels, salter, slater,
staler, stelar, talers]
9: [palest, palets, pastel, petals, plates, pleats, septal, staple,
tepals]
8: [enters, nester, renest, rentes, resent, tenser, ternes, treens]
8: [peris, piers, pries, prise, ripes, speir, spier, spire]
Many of these words seem a bit bogus, but that's not the program's fault; they're in the dictionary file. Here's the dictionary file we used. It is derived from the Public Domain ENABLE benchmark reference word list, at http://personal.riverusers.com/~thegrendel/
Object Ordering
A List l may be sorted as follows:
Collections.sort(l);
If the list consists of String elements, it will be sorted into lexicographic (alphabetical) order. If it consists of Date elements, it will be sorted into chronological order. How does Java know how to do this? It's magic! Well, no. Actually String and Date both implement the Comparable interface. The Comparable interfaces provides a natural ordering for a class, which allows objects of that class to be sorted automatically. The following table summarizes the JDK classes that implement Comparable:
Class Natural Ordering
Byte signed numerical
Character unsigned numerical
Long signed numerical
Integer signed numerical
Short signed numerical
Double signed numerical
Float signed numerical
BigInteger signed numerical
BigDecimal signed numerical
File system-dependent lexicographic on pathname.
String lexicographic
Date chronological
CollationKey locale-specific lexicographic
If you try to sort a list whose elements do not implement Comparable, Collections.sort(list) will throw a ClassCastException . Similarly, if you try to sort a list whose elements cannot be compared to one another, Collections.sort will throw a ClassCastException. Elements that can be compared to one another are called mutually comparable. While it is possible to have elements of different types be mutually comparable, none of the JDK types listed above permit inter-class comparison.
This is all you really need to know about the Comparable interface if you just want to sort lists of comparable elements, or create sorted collections of them. The next section will be of interest to you if you want to implement your own Comparable type.
Writing Your Own Comparable Types
The Comparable interface consists of a single method:
public interface Comparable {
public int compareTo(Object o);
}
The compareTo method compares the receiving object with the specified object, and returns a negative integer, zero, or a positive integer as the receiving object is less than, equal to, or greater than the specified Object. If the specified object cannot be compared to the receiving object, the method throws a ClassCastException.
Here's a class representing a person's name that implements Comparable:
import java.util.*;
public class Name implements Comparable {
private String firstName, lastName;
public Name(String firstName, String lastName) {
if (firstName==null || lastName==null)
throw new NullPointerException();
this.firstName = firstName;
this.lastName = lastName;
}
public String firstName() {return firstName;}
public String lastName() {return lastName;}
public boolean equals(Object o) {
if (!(o instanceof Name))
return false;
Name n = (Name)o;
return n.firstName.equals(firstName) &&
n.lastName.equals(lastName);
}
public int hashCode() {
return 31*firstName.hashCode() + lastName.hashCode();
}
public String toString() {return firstName + " " + lastName;}
public int compareTo(Object o) {
Name n = (Name)o;
int lastCmp = lastName.compareTo(n.lastName);
return (lastCmp!=0 ? lastCmp :
firstName.compareTo(n.firstName));
}
}
To keep the example short, the class is somewhat limited: It doesn't support middle names, it demands both a first and a last name, and it is not internationalized in any way. Nonetheless, it illustrates several important points:
• Name objects are immutable. All other things being equal, immutable types are the way to go, especially for objects that will be used as elements in Sets, or keys in Maps. These collections will break if you modify their elements or keys while they're in the collection.
• The constructor checks its arguments for null. This ensures that all Name objects are well-formed, so that none of the other methods will ever throw a NullPointerException.
• The hashCode method is redefined. This is essential for any class that redefines the equals method. It is required by the general contract for Object.equals. (Equal objects must have equal hash codes.)
• The equals method returns false if the specified object is null, or of an inappropriate type. The compareTo method throws a runtime exception under these circumstances. Both of these behaviors are required by the general contracts of the respective methods.
• The toString method has been redefined to print the Name in human-readable form. This is always a good idea, especially for objects that are going to get put into collections. The various collection types' toString methods depend on the toString methods of their elements, keys and values.
Since this section is about element ordering, let's talk a bit more about Name's compareTo method. It implements the standard name-ordering algorithm, where last names take precedence over first names. This is exactly what you want in a natural ordering. It would be very confusing if the natural ordering were unnatural!
Take a look at how compareTo is implemented, because it's quite typical. First, you cast the Object argument to the appropriate type. This throws the appropriate exception (ClassCastException) if the arguments type is inappropriate. Then you compare the most significant part of the object (in this case, the last name). Often, you can just use the natural ordering of the part's type. In this case, the part is a String, and the natural (lexicographic) ordering is exactly what's called for. If the comparison results in anything other than zero (which represents equality), you're done: you just return the result. If the most significant parts are equal, you go on to compare the next-most-significant parts. In this case, there are only two parts (first name and last name). If there were more parts, you'd proceed in the obvious fashion, comparing parts until you found two that weren't equal (or you were comparing the least-significant parts), at which point you'd return the result of the comparison.
Just to show that it all works, here's a little program that builds a list of Name objects and sorts them :
import java.util.*;
class NameSort {
public static void main(String args[]) {
Name n[] = {
new Name("John", "Lennon"),
new Name("Karl", "Marx"),
new Name("Groucho", "Marx"),
new Name("Oscar", "Grouch")
};
List l = Arrays.asList(n);
Collections.sort(l);
System.out.println(l);
}
}
If you run this program, here's what it prints:
[Oscar Grouch, John Lennon, Groucho Marx, Karl Marx]
There are four restrictions on the behavior of the compareTo method, which we won't go over now because they're fairly technical and boring and are better left in the API documentation. It's really important that all classes that implement Comparable obey these restrictions, so read the documentation for Comparable if you're writing a class that implements it. Attempting to sort a list of objects that violate these restrictions has undefined behavior. Technically speaking, these restrictions ensure that the natural ordering is a partial order on the objects of a class that implements it; this is necessary to ensure that sorting is well-defined.
Comparators
OK, so now you know about natural ordering. But what if you want to sort some objects in some order other than their natural order? Or what if you want to sort some objects that don't implement Comparable? To do either of these things, you'll need to provide a Comparator . A Comparator is simply an object that encapsulates an ordering. Like the Comparable interface, the Comparator interface consists of a single method:
public interface Comparator {
int compare(Object o1, Object o2);
}
The compare method compares its two arguments, returning a negative integer, zero, or a positive integer as the first argument is less than, equal to, or greater than the second. If either of the arguments has an inappropriate type for the Comparator, the compare method throws a ClassCastException.
Much of what was said about Comparable in the previous section applies to Comparator as well. Writing a compare method is nearly identical to writing a compareTo method, except that the former gets both objects passed in as arguments. The compare method has to obey the same four "technical restrictions" as Comparable's compareTo method, for the same reason: a Comparator must induce a partial order on the objects it compares.
Suppose you have a class called EmployeeRecord:
public class EmployeeRecord implements Comparable {
public Name name();
public int employeeNumber();
public Date hireDate();
...
}
Let's assume that the natural ordering of EmployeeRecord objects is Name-ordering (as defined in the previous example) on employee name. Unfortunately the boss has asked us for a list of employees in order of seniority. This means we actually have to do some work, but not much. Here's a program that will produce the required list:
import java.util.*;
class EmpSort {
static final Comparator SENIORITY_ORDER = new Comparator() {
public int compare(Object o1, Object o2) {
EmployeeRecord r1 = (EmployeeRecord) o1;
EmployeeRecord r2 = (EmployeeRecord) o2;
return r2.hireDate().compareTo(r1.hireDate());
}
};
static final Collection employees = ... ; // Employee Database
public static void main(String args[]) {
List emp = new ArrayList(employees);
Collections.sort(emp, SENIORITY_ORDER);
System.out.println(emp);
}
}
The Comparator in the above program is reasonably straightforward. It casts its arguments to EmployeeRecord, and relies on the natural ordering of Date applied to the hireDate() accessor method. Note that the Comparator passes the hire-date of its second argument to its first, rather than vice-versa. This is because the employee who was hired most recently is least senior: sorting in order of hire-date would put the list in reverse seniority-order. Another way to achieve the same effect would be to maintain the argument order but negate the result of the comparison:
return -r1.hireDate().compareTo(r2.hireDate());
The two techniques are equally preferable. Use whichever looks best to you.
The Comparator in the above program works fine for sorting a List, but it does have one deficiency: it cannot be used to order a sorted collection (such as TreeSet ) because it generates a strictly partial ordering. What this means is that this comparator equates unequal objects. In particular, any two employees who were hired on the same date will compare as equal. When you're sorting a List, this doesn't matter, but when you're using the Comparator to order a sorted collection, it's fatal. If you insert multiple employees who were hired on the same date into a TreeSet with this Comparator, only the first one will be added to the set. The second will be seen as a duplicate element, and ignored.
To fix this problem, all you have to do is tweak the Comparator so that it produces a total ordering. In other words, tweak it so that the only elements that are seen as equal when using compare are those that are also seen as equal when compared using equals. The way to do this is to do a two-part comparison (like we did for Name) where the first part is the one that we're actually interested in (in this case, the hire-date), and the second part is attribute that uniquely identifies the object. In this case, the employee number is the obvious attribute to use as the second part. Here's the Comparator that results:
static final Comparator SENIORITY_ORDER = new Comparator() {
public int compare(Object o1, Object o2) {
EmployeeRecord r1 = (EmployeeRecord) o1;
EmployeeRecord r2 = (EmployeeRecord) o2;
int dateCmp = r2.hireDate().compareTo(r1.hireDate());
if (dateCmp != 0)
return dateCmp;
return (r1.employeeNumber() < r2.employeeNumber() ? -1 :
(r1.employeeNumber() == r2.employeeNumber() ? 0 : 1));
}
};
One last note. You might be tempted to replace the final return statement in the Comparator with the simpler:
return r1.employeeNumber() - r2.employeeNumber();
Don't do it unless you're absolutely sure that no one will ever have a negative employee number! This trick does not work in general, as the signed integer type is not big enough to represent the difference of two arbitrary signed integers. If i is a large positive integer and j is a large negative integer, i-j will overflow and return a negative integer. The resulting Comparator violates one of the four technical restrictions that we keep talking about (transitivity), and produces horrible, subtle bugs. This is not a purely theoretical concern; people get burned by it.
The SortedSet Interface
A SortedSet is a Set that maintains its elements in ascending order, sorted according to the elements' natural order, or according to a Comparator provided at SortedSet creation time. (Natural order and Comparators are discussed in the previous section, on Object Ordering.) In addition to the normal Set operations, the Set interface provides operations for:
• Range-view: Performs arbitrary range operations on the sorted set.
• Endpoints: Returns the first or last element in the sorted set.
• Comparator access: Returns the Comparator used to sort the set (if any).
The SortedSet interface is shown below:
public interface SortedSet extends Set {
// Range-view
SortedSet subSet(Object fromElement, Object toElement);
SortedSet headSet(Object toElement);
SortedSet tailSet(Object fromElement);
// Endpoints
Object first();
Object last();
// Comparator access
Comparator comparator();
}
Set Operations
The operations that SortedSet inherits from Set behave identically on sorted sets and normal sets with two exceptions:
• The Iterator returned by the iterator operation traverses the sorted set in order.
• The array returned by toArray contains the sorted set's elements in order.
Although the interface doesn't guarantee it, the toString method of the JDK's SortedSet implementations returns a string containing all the elements of the sorted set, in order.
Standard Constructors
By convention, all Collection implementations provide a standard constructor that takes a Collection, and SortedSet implementations are no exception. This constructor creates a SortedSet object that orders its elements according to their natural order. Additionally, by convention, SortedSet implementations provide two other standard constructors:
• One that takes a Comparator and returns a new (empty) SortedSet sorted according to the specified Comparator.
• One that takes a SortedSet and returns a new SortedSet containing the same elements as the given SortedSet, and sorted according to the same Comparator (or using the elements' natural ordering, if the specified SortedSet did too). Note that the compile-time type of the argument determines whether this constructor is invoked in preference to the ordinary Set constructor, and not the runtime type!
The first of these standard constructors is the normal way to create an empty SortedSet with an explicit Comparator. The second is similar in spirit to the standard Collection constructor: it creates a copy of a SortedSet with the same ordering, but with a programmer-specified implementation type.
Range-view Operations
The Range-view operations are somewhat analogous to those provided by the List interface, but there is one big difference. Range-views of a sorted set remain valid even if the backing sorted set is modified directly. This is feasible because the endpoints of a range view of a sorted set are absolute points in the element-space, rather than specific elements in the backing collection (as is the case for lists). A range-view of a sorted set is really just a window onto whatever portion of the set lies in the designated part of the element-space. Changes to the range-view write back to the backing sorted set and vice-versa. Thus, it's OK to use range-views on sorted sets for long periods of time (unlike range-views on lists).
Sorted sets provide three range-view operations. The first, subSet takes two endpoints (like subList). Rather than indices, the endpoints are objects. They must be comparable to the elements in the sorted set (using the set's Comparator or the natural ordering of its elements, whichever the set uses to order itself). Like subList the range is half-open, including its low endpoint but excluding the high one.
Thus, the following one-liner tells you how many words between "doorbell" and "pickle", including "doorbell" but excluding "pickle", are contained in a SortedSet of strings called dictionary:
int count = dictionary.subSet("doorbell", "pickle").size();
Similarly, the following one-liner removes all of the elements beginning with the letter "f" (a rather heavy-handed approach to censorship?):
dictionary.subSet("f", "g").clear();
A similar trick can be used to print a table telling you how many words begin with each letter:
for (char ch='a'; ch<='z'; ch++) {
String from = new String(new char[] {ch});
String to = new String(new char[] {(char)(ch+1)});
System.out.println(from + ": " +
dictionary.subSet(from, to).size());
}
Suppose that you want to view a closed interval (which contains both its endpoints) instead of an open interval. If the element type allows for the calculation of the successor a given value (in the element-space), merely request the subSet from lowEndpoint to successor(highEndpoint) . Although it isn't entirely obvious, the successor of a string s in String's natural ordering is s+"\0" (that is, s with a null character appended).
Thus, the following one-liner tells you how many words between "doorbell" and "pickle," including "doorbell" and "pickle," are contained in a the dictionary:
int count = dictionary.subSet("doorbell", "pickle\0").size();
A similarly technique can be used to view an open interval (which contains neither endpoint). The open interval view from lowEndpoint to highEndpoint is the half-open interval from successor(lowEndpoint) to highEndpoint. To calculate the number of words between "doorbell" and "pickle", excluding both:
int count = dictionary.subSet("doorbell\0", "pickle").size();
The SortedSet interface contains two more range-view operations, headSet and tailSet, both of which take a single Object argument. The former returns a view of the initial portion of the backing SortedSet, up to but not including the specified object. The latter returns a view of the final portion of the the backing SortedSet, beginning with the specified object, and continuing to the end of the backing SortedSet Thus, the following code allows you to view the dictionary as two disjoint "volumes" (a-m and n-z):
SortedSet volume1 = dictionary.headSet("n");
SortedSet volume2 = dictionary.tailSet("n");
Endpoint Operations
The SortedSet interface contains operations to return the first and last elements in the sorted set, called (not surprisingly) first and last. In addition to their obvious uses, last allows a workaround for a deficiency in the SortedSet interface. One thing you'd like to do with a SortedSet is to go into the interior of the set and iterate forwards or backwards. It's easy enough to go forwards from the interior: Just get a tailSet and iterate over it. Unfortunately, there's no easy way to go backwards.
The following idiom obtains the first element in a sorted set that is less than a specified object o in the element-space:
Object predecessor = ss.headSet(o).last();
This is a fine way to go one element backwards from a point in the interior of a sorted set. It could be applied repeatedly to iterate backwards, but unfortunately this is very inefficient, requiring a lookup for each element returned.
Comparator Accessor
The SortedSet interface contains an accessor method called comparator that returns the Comparator used to sort the set, or null if the set is sorted according to the natural order of its elements. This method is provided so that sorted sets can be copied into new sorted sets with the same ordering. It is used by the standard SortedSet constructor, described above.
The SortedMap Interface
A SortedMap is a Map that maintains its entries in ascending order, sorted according to the keys' natural order, or according to a Comparator provided at SortedMap creation time. (Natural order and Comparators are discussed in the section on Object Ordering.) In addition to the normal Map operations, the Map interface provides operations for:
• Range-view: Performs arbitrary range operations on the sorted map.
• Endpoints: Returns the first or last key in the sorted map.
• Comparator access: Returns the Comparator used to sort the map (if any).
public interface SortedMap extends Map {
Comparator comparator();
SortedMap subMap(Object fromKey, Object toKey);
SortedMap headMap(Object toKey);
SortedMap tailMap(Object fromKey);
Object firstKey();
Object lastKey();
}
This interface is the Map analogue of SortedSet .
Map Operations
The operations that SortedMap inherits from Map behave identically on sorted maps and normal maps with two exceptions:
• The Iterator returned by the iterator operation on any of the sorted map's Collection-views traverse the collections in key-order.
• The arrays returned by the Collection-views' toArray operations contain the keys, values, or entries in key-order.
Although it isn't guaranteed by the interface, the toString method of the Collection-views in all the JDK's SortedMap implementations returns a string containing all the elements of the view, in key-order.
For example, consider the following snippet:
SortedMap m = new TreeMap();
m.put("Sneezy", "common cold");
m.put("Sleepy", "narcolepsy");
m.put("Grumpy", "seasonal affective disorder");
System.out.println(m.keySet());
System.out.println(m.values());
System.out.println(m.entrySet());
Running this snippet produces this output:
[Grumpy, Sleepy, Sneezy]
[seasonal affective disorder, narcolepsy, common cold]
[Grumpy=seasonal affective disorder, Sleepy=narcolepsy, Sneezy=common cold]
Standard Constructors
By convention, all Map implementations provide a standard constructor that takes a Map, and SortedMap implementations are no exception. This constructor creates a SortedMap object that orders its entries according to their keys' natural order. Additionally, by convention, SortedMap implementations provide two other standard constructors:
• One that takes a Comparator and returns a new (empty) SortedMap sorted according to the specified Comparator.
• One that takes a SortedMap and returns a new SortedMap containing the same mappings as the given SortedMap, and sorted according to the same Comparator (or using the elements' natural ordering, if the specified SortedMap did too). Note that it is the compile-time type of the argument that determines whether this constructor is invoked in preference to the ordinary Map constructor, and not its runtime type!
The first of these standard constructors is the normal way to create an empty SortedMap with an explicit Comparator. The second is similar in spirit to the standard Map constructor: It creates a copy of a SortedMap with the same ordering, but with a programmer specified implementation type.
The following snippet illustrates how this works:
final Comparator FUNNY_COMPARATOR = ... ;
Map m = new TreeMap(FUNNY_COMPARATOR);
// ... code to populate m
Map m2 = new TreeMap(m); // invokes TreeMap(Map)
Map m3 = new TreeMap((SortedMap)m) // invokes TreeMap(SortedMap)
Note that m2.comparator() will return null while m3.comparator() will return FUNNY_COMPARATOR. In other words, m2 is sorted according to its keys' natural ordering, while m3 is sorted according to the ordering induced by FUNNY_COMPARATOR.
Comparison to SortedSet
Because this interface is a precise Map analogue of SortedSet, all of the idioms and code examples in the SortedSet section apply to SortedMap, with only trivial modifications.
Lesson: Implementations
Implementations are the actual data objects used to store collections, which implement the core collection interfaces described in the previous lesson. The sections that follow describe three kinds of implementations:
General-purpose Implementations
General-purpose implementations are the public classes that provide the primary implementations of the core collection interfaces.
Wrapper Implementations
Wrapper implementations are used in combination with other implementations (often the general-purpose implementations) to provide added functionality.
Convenience Implementations
Convenience implementations are mini-implementations, typically made available via static factory methods that provide convenient, efficient alternatives to the general-purpose implementations for special collections (like singleton sets).
Additionally, you can build your own implementations, based on the JDK's abstract implementations. This is described in a separate lesson because it's an advanced topic. It's not particularly hard, but relatively few people will need to do it.
General Purpose Implementations
The general-purpose implementations are summarized in the table below. The table highlights their regular naming pattern: names are all of the form
Implementations
Hash Table Resizable Array Balanced Tree Linked List
Interfaces Set HashSet TreeSet
List ArrayList LinkedList
Map HashMap TreeMap
JDK 1.2 provides two implementations of each interface (with the exception of Collection , which has no direct implementations, but serves as a least common denominator for the other collection interfaces). In each case, one implementation is clearly the primary implementation: the one to use, all other things being equal. The primary implementations are HashSet, ArrayList and HashMap. Note that the SortedSet and SortedMap interfaces do not have rows in the table above. Each of these interfaces has one implementation and these implementations (TreeSet and TreeMap) are listed in the Set and Map rows.
Not only do the implementations have consistent names, but they have consistent behavior as well. All of them implement all the optional operations contained in their interfaces. All permit null elements, keys and values. Each one is unsynchronized. All have fail-fast iterators, which detect illegal concurrent modification during iteration and fail quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future. All are Serializable, and all support a public clone method.
The fact that the new implementations are unsynchronized represents a break with the past: Vector and Hashtable, which were introduced in JDK 1.0, are synchronized. The new approach was taken because it was recognized that collections are frequently used in a manner where the synchronization is of no benefit. Such uses include single-threaded use, read-only use, and use as part of a larger data object that does its own synchronization. In general, it is good API design practice not to make users pay for a feature that they generally don't use. Further, unnecessary synchronization can result in deadlock under certain circumstances.
If you need a synchronized collection, the synchronization wrappers, described in the next section, allow any collection to be transformed into a synchronized collection. Thus, synchronization is optional for the new collection implementations where it was mandatory for the old.
As a rule of thumb, you should be thinking about the interfaces rather than the implementations. That is why there are no programming examples in this lesson. For the most part, the choice of implementation affects only performance. The preferred style, as was mentioned in the interfaces lesson, is to choose an implementation when a collection is created, and immediately assign the new collection to a variable of the corresponding interface type (or pass the collection to a method expecting an argument of the interface type). In this way, the program does not become dependent on any added methods in a given implementation, leaving the programmer or maintainer with the freedom to change implementations at the drop of a hat, if performance concerns dictate that it is the right thing to do.
The general purposes implementations are briefly discussed below. The performance of the implementations is described using words like constant, log, linear, n log(n) and quadratic. These words refer to the asymptotic upper bound on the time complexity of performing the operation. All of this is quite a mouthful, and it doesn't matter much if you don't know what it means. If you're interested, any good algorithms textbook explains this stuff. One thing to keep in mind is that this sort of performance metric has its limitations. Sometimes the nominally slower implementation may be faster for the collection size that you're actually using. When in doubt, measure the performance.
Set
The two general purpose Set implementations are HashSet and TreeSet . It's very straightforward to decide which of these two to use. HashSet is much faster (constant time vs. log time for most operations), but offers no ordering guarantees. If you need to use the operations in the SortedSet, or in-order iteration is important to you, use TreeSet. Otherwise, use HashSet. It's a fair bet that you'll end up using HashSet most of the time.
One thing worth keeping in mind about HashSet is that iteration is linear in the sum of the number of entries and the number of buckets (the capacity). Thus, it's important to choose an appropriate initial capacity if iteration performance is important. Choosing a capacity that's too high can waste space as well as time. The default initial capacity is 101, and that's often more than you need. The initial capacity may be specified using the int constructor. To allocate a HashSet whose initial capacity is 17:
Set s= new HashSet(17);
HashSets have one other "tuning parameter" called the load factor. If you care deeply about the space consumption of your HashSet, read the HashSet documentation for more information. Otherwise just live with the default. If you accept the default load factor but you do want to specify an initial capacity, pick a number that's about twice the size that you expect the Set to grow to. If your guess is way off, it may have to grow or you may waste a bit of space, but either way it's no big problem. If you know a prime number of about the right size, use it. If not, use an odd number. Or use an even number. It doesn't really matter much; these things might make the HashSet perform a wee bit better, but nothing to write home about.
TreeSet has no tuning parameters. With the exception of clone, neither HashSet nor TreeSet have any operations other than those required by their respective interfaces (Set and TreeSet).
List
The two general purpose List implementations are ArrayList and LinkedList . Most of the time, you'll probably use ArrayList. It offers constant time positional access, and it's just plain fast, because it does not have to allocate a node object for each element in the List, and it can take advantage of the native method System.arraycopy when it has to move multiple elements at once. Think of ArrayList as Vector without the synchronization overhead.
If you frequently add elements to the beginning of the List, or iterate over the List deleting elements from its interior, you might want to consider LinkedList. These operations are constant time in a LinkedList but linear time in an ArrayList. But you pay a big price! Positional access is linear time in a LinkedList and constant time in an ArrayList. Furthermore, the constant factor for LinkedList is much worse. If you think that you want to use a LinkedList, measure the performance with both LinkedList and ArrayList. You may be surprised.
ArrayList has one tuning parameter, the initial capacity. It refers to the number of elements the ArrayList can hold before it has to grow. There's not much to say about it. The only ArrayList operations that are not required by List are ensureCapacity and trimToSize (which alter the excess capacity), and clone.
LinkedList has no tuning parameters, and seven optional operations, one of which is clone. The other six are addFirst, getFirst, removeFirst, addLast, getLast, and removeLast; I have very mixed feelings about them. They make it a bit more convenient to use a LinkedList as a queue or a double-ended queue (dequeue), but they prevent you from easily switching representations when you discover that ArrayList is faster.
If you need synchronization, a Vector will be slightly faster than an ArrayList synchronized with Collections.synchronizedList, but Vector has loads of legacy operations, so be extra careful to always manipulate the Vector with the List interface, or you'll be stuck with it for life.
If your List is fixed in size (that is, you'll never use remove, add or any of the bulk operations other than containsAll) you have a third option that's definitely worth considering. See Arrays.asList in the convenience implementations section.
Map
The two general purpose Map implementations are HashMap and TreeMap . The situation for Map is exactly analogous to Set. If you need SortedMap operations or in-order Collection-view iteration, go for TreeMap; otherwise, go for HashMap. Everything else in the Set section also applies to Map so just re-read it.
Completeness requires that we mention Hashtable . As with Vector and ArrayList, if you need synchronization, a Hashtable will be slightly faster than a HashMap synchronized with Collections.synchronizedMap. Again, Hashtable has loads of legacy operations, so be extra careful always to manipulate it with the Map interface, or you'll be stuck with it for life.
Wrapper Implementations
Wrapper implementations are implementations that delegate all of their real work to a specified collection, but add some extra functionality on top of what this collection offers. For design patterns fans, this is an example of the decorator pattern. While it may seem a bit exotic, it's really pretty straightforward.
These implementations are anonymous: rather than providing a public class, the JDK provides a static factory method. All of these implementations are found in the Collections API which consists solely of static methods.
Synchronization Wrappers
The synchronization wrappers add automatic synchronization (thread-safety) to an arbitrary collection. There is one static factory method for each of the six core collection interfaces:
public static Collection synchronizedCollection(Collection c);
public static Set synchronizedSet(Set s);
public static List synchronizedList(List list);
public static Map synchronizedMap(Map m);
public static SortedSet synchronizedSortedSet(SortedSet s);
public static SortedMap synchronizedSortedMap(SortedMap m);
Each of these methods returns a synchronized (thread-safe) Collection backed by the specified collection. In order to guarantee serial access, it is critical that all access to the backing collection is accomplished through the returned collection. The easy way to guarantee this is to not to keep a reference to the backing collection. Creating the synchronized collection like this does the trick:
List list = Collections.synchronizedList(new ArrayList());
A collection created in this fashion is every bit as thread-safe as a "normally" synchronized collection like a Vector .
In the face of concurrent access, it is imperative that the user manually synchronize on the returned collection when iterating over it. This is because iteration is accomplished via multiple calls into the collection, which must be composed into a single atomic operation. The idiom to iterate over a wrapper-synchronized collection is:
Collection c = Collections.synchronizedCollection(myCollection);
synchronized(c) {
Iterator i = c.iterator(); // Must be in synchronized block!
while (i.hasNext())
foo(i.next());
}
Failure to follow this advice may result in non-deterministic behavior.
The idiom for iterating over a Collection-view of a synchronized Map is similar, but with one wrinkle. It is imperative that the user manually synchronize on the synchronized Map when iterating over any of its Collection-views, rather than synchronizing on the Collection-view itself:
Map m = Collections.synchronizedMap(new HashMap());
...
Set s = m.keySet(); // Needn't be in synchronized block
...
synchronized(m) { // Synchronizing on m, not s!
Iterator i = s.iterator(); // Must be in synchronized block
while (i.hasNext())
foo(i.next());
}
One minor downside of the wrapper implementation approach is that you do not have the ability to execute any non-interface operations of a wrapped implementation. So, for instance, in the List example above, one cannot call ArrayList's ensureCapacity operation on the wrapped ArrayList.
Unmodifiable Wrappers
The unmodifiable wrappers are conceptually similar to the synchronization wrappers, but simpler. Rather than adding functionality to the wrapped collection, they take it away. In particular, they take away the ability to modify the collection, by intercepting all of the operations that would modify the collection, and throwing an UnsupportedOperationException. The unmodifiable wrappers have two main uses:
• To make a collection immutable once it has been built. In this case, it's good practice not to maintain a reference to the backing collection. This absolutely guarantees immutability.
• To allow "second-class citizens" read-only access to your data structures. You keep a reference to the backing collection, but hand out a reference to the wrapper. In this way, the second-class citizens can look but not touch, while you maintain full access.
Like the synchronization wrappers, there is one static factory method for each of the six core collection interfaces:
public static Collection unmodifiableCollection(Collection c);
public static Set unmodifiableSet(Set s);
public static List unmodifiableList(List list);
public static Map unmodifiableMap(Map m);
public static SortedSet unmodifiableSortedSet(SortedSet s);
public static SortedMap unmodifiableSortedMap(SortedMap m);
Convenience Implementations
This section describes several mini-implementations that can be more convenient and more efficient then the general purpose implementations when you don't need the full power of a general purpose implementation. All of the implementations in this section are made available via static factory methods or exported constants, rather than public classes.
List-view of an Array
The Arrays.asList method returns a List-view of its array argument. Changes to the List write through to the array and vice-versa. The size of the collection is that of the array, and cannot be changed. If the add or remove method is called on the List, an UnsupportedOperationException will result.
The normal use of this implementation is as a bridge between array-based and collection-based APIs. It allows you to pass an array to a method expecting a Collection or a List. However, this implementation has another use. If you need a fixed-size List, it's more efficient than any general-purpose List implementation. Here's the idiom:
List l = Arrays.asList(new Object[size]);
Note that a reference to the backing array is not retained.
Immutable Multiple-Copy List
Occasionally you'll need an immutable List consisting of multiple copies of the same element. The Collections.nCopies method returns such a List. This implementation has two main uses. The first is to initialize a newly created List. For example, suppose you want an ArrayList initially consisting of 1000 null elements. The following incantation does the trick:
List l = new ArrayList(Collections.nCopies(1000, null));
Of course, the initial value of each element needn't be null. The second main use is to grow an existing List. For example, suppose you want to add 69 copies of the string fruit bat to the end of a List. It' not clear why you'd want to do such a thing, but let's just suppose you did. Here's how you'd do it:
lovablePets.addAll(Collections.nCopies(69, "fruit bat"));
By using the form of addAll that takes an index as well as a Collection, you can add the new elements to the middle of a List instead of at the end.
Immutable Singleton Set
Sometimes you'll need an immutable singleton Set, which consists of a single, specified element. The Collections.singleton method returns such a Set. One use of this implementation is this idiom to remove all occurrences of a specified element from a Collection:
c.removeAll(Collections.singleton(e));
There's a related idiom to remove from a Map all elements that map to a specified value. For example, suppose you have a Map, profession, that maps people to their line of work. Suppose you want to eliminate all of the lawyers. This one-liner will do the deed:
profession.values().removeAll(Collections.singleton(LAWYER));
One more use of this implementation is to provide a single input value to a method that is written to accept a Collection of values.
Empty Set and Empty List Constants
The Collections class provides two constants, representing the empty Set and the empty List, Collections.EMPTY_SET and Collections.EMPTY_LIST . It's not clear that these constants really qualify as implementations, but this lesson seemed like as good a place to mention them as any. The main use of these constants is as input to methods that take a Collection of values, when you don't want to provide any values at all.
Lesson: Algorithms
The polymorphic algorithms described in this section are pieces of reusable functionality provided by the JDK. All of them come from the Collections class. All take the form of static methods whose first argument is the collection on which the operation is to be performed. The great majority of the algorithms provided by the Java platform operate on List objects, but a couple of them (min and max) operate on arbitrary Collection objects. The algorithms are described below.
Sorting
The sort algorithm reorders a List so that its elements are ascending order according to some ordering relation. Two forms of the operation are provided. The simple form just takes a List and sorts it according to its elements' natural ordering. If you're unfamiliar with the concept of natural ordering, now would be a good time to read the Object Ordering section.
The sort operation uses a slightly optimized merge sort algorithm. If you don't know what this means but you do care, see any basic textbook on algorithms. The important things to know about this algorithm are that it is:
• Fast: This algorithm is guaranteed to run in n log(n) time, and runs substantially faster on nearly sorted lists. Empirical studies showed it to be as fast as a highly optimized quicksort. Quicksort is generally regarded to be faster than merge sort, but isn't stable, and doesn't guarantee n log(n) performance.
• Stable: That is to say, it doesn't reorder equal elements. This is important if you sort the same list repeatedly on different attributes. If a user of a mail program sorts his in-box by mailing date, and then sorts it by sender, the user naturally expects that the now-contiguous list of messages from a given sender will (still) be sorted by mailing date. This is only guaranteed if the second sort was stable.
Here's a trivial little program that prints out its arguments in lexicographic (alphabetical) order.
import java.util.*;
public class Sort {
public static void main(String args[]) {
List l = Arrays.asList(args);
Collections.sort(l);
System.out.println(l);
}
}
Let's run the program:
% java Sort i walk the line
[i, line, the, walk]
The program was included only to show you that I have nothing up my sleeve: The algorithms really are as easy to use as they appear to be. I won't insult your intelligence by including any more silly examples.
The second form of sort takes a Comparator in addition to a List and sorts the elements with the Comparator. Remember the permutation group example at the end of the Map section? It printed out the permutation groups in no particular order. Suppose you wanted to print them out in reverse order of size, largest permutation group first. The following example below shows you how to achieve this with the help of the second form of the sort method.
Recall that the permutation groups are stored as values in a Map, in the form of List objects. The revised printing code iterates through the Map's values-view, putting every List that passes the minimum-size test into a List of Lists. Then, the code sorts this List using a Comparator that expects List objects, and implements reverse-size ordering. Finally, the code iterates through the now-sorted List, printing its elements (the permutation groups). This code replaces the printing code at the end of Perm's main method:
// Make a List of all permutation groups above size threshold
List winners = new ArrayList();
for (Iterator i = m.values().iterator(); i.hasNext(); ) {
List l = (List) i.next();
if (l.size() >= minGroupSize)
winners.add(l);
}
// Sort permutation groups according to size
Collections.sort(winners, new Comparator() {
public int compare(Object o1, Object o2) {
return ((List)o2).size() - ((List)o1).size();
}
});
// Print permutation groups
for (Iterator i=winners.iterator(); i.hasNext(); ) {
List l = (List) i.next();
System.out.println(l.size() + ": " + l);
}
Running this program on the same dictionary in the Map section, with the same minimum permutation group size (eight) produces the following output:
% java Perm2 dictionary.txt 8
12: [apers, apres, asper, pares, parse, pears, prase, presa, rapes,
reaps, spare, spear]
11: [alerts, alters, artels, estral, laster, ratels, salter, slater,
staler, stelar, talers]
10: [least, setal, slate, stale, steal, stela, taels, tales, teals,
tesla]
9: [estrin, inerts, insert, inters, niters, nitres, sinter, triens,
trines]
9: [capers, crapes, escarp, pacers, parsec, recaps, scrape, secpar,
spacer]
9: [anestri, antsier, nastier, ratines, retains, retinas, retsina,
stainer, stearin]
9: [palest, palets, pastel, petals, plates, pleats, septal, staple,
tepals]
8: [carets, cartes, caster, caters, crates, reacts, recast, traces]
8: [ates, east, eats, etas, sate, seat, seta, teas]
8: [arles, earls, lares, laser, lears, rales, reals, seral]
8: [lapse, leaps, pales, peals, pleas, salep, sepal, spale]
8: [aspers, parses, passer, prases, repass, spares, sparse, spears]
8: [earings, erasing, gainers, reagins, regains, reginas, searing,
seringa]
8: [enters, nester, renest, rentes, resent, tenser, ternes, treens]
8: [peris, piers, pries, prise, ripes, speir, spier, spire]
Shuffling
The shuffle algorithm does the opposite of what sort does: it destroys any trace of order that may have been present in a List. That is to say, it reorders the List, based on input from a source of randomness, such that all possible permutations occur with equal likelihood (assuming a fair source of randomness). This algorithm is useful in implementing games of chance. For example, it could be used to shuffle a List of Card objects representing a deck. Also, it's useful for generating test cases.
There are two forms of this operation. The first just takes a List and uses a default source of randomness. The second requires the caller to provide a Random object to use as a source of randomness. The actual code for this algorithm is used as an example in the List section.
Routine Data Manipulation
The Collections class provides three algorithms for doing routine data manipulation on List objects. All of these algorithms are pretty straightforward:
• reverse: Reverses the order of the elements in a List.
• fill: Overwrites every element in a List with the specified value. This operation is useful for re-initializing a List.
• copy: Takes two arguments, a destination List and a source List, and copies the elements of the source into the destination, overwriting its contents. The destination List must be at least as long as the source. If it is longer, the remaining elements in the destination List are unaffected.
Searching
The binarySearch algorithm searches for a specified element in a sorted List using the binary search algorithm. There are two forms of this algorithm. The first takes a List and an element to search for (the "search key"). This form assumes that the List is sorted into ascending order according to the natural ordering of its elements. The second form of the call takes a Comparator in addition to the List and the search key, and assumes that the List is sorted into ascending order according to the specified Comparator. The sort algorithm (described above) can be used to sort the List prior to calling binarySearch.
The return value is the same for both forms: if the List contains the search key, its index is returned. If not, the return value is (-(insertion point) - 1), where the insertion point is defined as the point at which the value would be inserted into the List: the index of the first element greater than the value, or list.size() if all elements in the List are less than the specified value. This admittedly ugly formula was chosen to guarantee that the return value will be >= 0 if and only if the search key is found. It's basically a hack to combine a boolean ("found") and an integer ("index") into a single int return value.
The following idiom, usable with both forms of the binarySearch operation, looks for the specified search key, and inserts it at the appropriate position if it's not already present:
int pos = Collections.binarySearch(l, key);
if (pos < 0)
l.add(-pos-1, key);
Finding Extreme Values
The min and max algorithms return, respectively, the minimum and maximum element contained in a specified Collection. Both of these operations come in two forms. The simple form takes only a Collection, and returns the minimum (or maximum) element according to the elements' natural ordering. The second form takes a Comparator in addition to the Collection and returns the minimum (or maximum) element according to the specified Comparator.
These are the only algorithms provided by the Java platform that work on arbitrary Collection objects, as opposed to List objects. Like the fill algorithm above, these algorithms are quite straightforward to implement. They are included in the Java platform solely as a convenience to programmers.
Lesson: Custom Implementations
Many programmers will never need to implement their own collections classes. You can go pretty far using the implementations described in the previous sections of this lesson. Someday, however, you might find yourself wanting to write your own implementation of a core collection interface. It turns out that it's easy to do with the abstract implementations provided by the Java platform. But before we discuss how to write an implementation, let's discuss why you might want to do such a thing.
Reasons to Write Your Own Implementation
This list enumerates several kinds of collections you might want to implement. It is not intended to be exhaustive.
• Persistent: All of the built-in collection implementations reside in main memory, and vanish when the VM exits. Suppose you want a collection that will still be present the next time the VM starts. One fine way to implement such a collection is to build a veneer over an external database. Such a collection might conceivably be concurrently accessible by multiple VMs, since it resides outside the VM.
• Application-specific: This is a very broad category. One example is an unmodifiable Map containing real-time telemetry data. The keys might represent locations, and the values could be read from sensors at these locations in response to the get operation.
• Highly Concurrent: The built-in collections are not designed to support high concurrency. The synchronization wrappers (and the legacy implementations) lock the entire collection every time it's accessed. Suppose you're building a server and you need a Map implementation that can be accessed by many threads concurrently. It is reasonably straightforward to build a hash table that locks each bucket separately, allowing multiple threads to access the table concurrently (assuming they're accessing keys that hash to different buckets).
• High-performance, Special-purpose: There are many data structures that take advantage of restricted usage to offer better performance than is possible with general-purpose implementations. For example, consider a Set whose elements are restricted to a small, fixed universe. Such a Set can be represented as a bit-vector, which offers blindingly fast performance as well low memory usage. Another example concerns a List containing long runs of identical element values. Such lists, which occur frequently in text processing, can be run-length encoded: runs can be represented as a single object containing the repeated element and the number of consecutive repetitions. This example is interesting because it trades off two aspects of performance: It requires far less space than an ArrayList, but more time.
• High-performance, General-purpose: The engineers who designed the collections framework tried to provide the best general-purpose implementations for each interface, but there are many, many data structures that could have been used, and new ones are invented every day. Maybe you can come up with something faster!
• Enhanced functionality: Suppose you need a Map (or Set) implementation that offers constant time access, as well as insertion-order iteration. This combination can be achieved with a hash table, all of whose elements are further joined, in insertion order, into a doubly-linked list. Alternatively, suppose you need an efficient bag implementation (also known as a multiset): a Collection that offers constant time access while allowing duplicate elements. It's reasonably straightforward to implement such a collection atop a HashMap.
• Convenience: You may want additional convenience implementations beyond those offered by the Java platform. For instance, you may have a frequent need for immutable Map objects representing a single key-value mapping, or List objects representing a contiguous range of Integers, or whatever.
• Adapter: Suppose you are using some legacy API that has its own ad hoc collections API. You can write an adapter implementation that permits these collections to operate in the Java Collections Framework. An adapter implementation is a thin veneer that wraps objects of one type and makes them behave like objects of another type by translating operations on the latter type into operations on the former.
How to Write a Custom Implementation
Writing a custom implementation is surprisingly easy with the aid of the abstract implementations furnished by the Java platform. Abstract implementations are skeletal implementations of the core collection interfaces designed expressly to facilitate custom implementations. We'll start with an example. Here's an implementation of Arrays.asList .
public static List asList(Object[] a) {
return new ArrayList(a);
}
private static class ArrayList extends AbstractList
implements java.io.Serializable
{
private Object[] a;
ArrayList(Object[] array) {
a = array;
}
public Object get(int index) {
return a[index];
}
public Object set(int index, Object element) {
Object oldValue = a[index];
a[index] = element;
return oldValue;
}
public int size() {
return a.length;
}
}
Believe it or not, this is almost exactly the implementation contained in the JDK. It's that simple! You provide a constructor, the get, set, and size methods, and AbstractList does all the rest. You get the ListIterator, bulk operations, search operations, hash code computation, comparison and string representation for free.
Suppose you want to make the implementation a bit faster. The API documentation for the abstract implementations describes precisely how each method is implemented so you'll know which methods to override in order to get the performance that you want. The performance of the implementation above is fine, but it can be improved a bit. In particular, the toArray method iterates over the List copying one element at a time. Given the internal representation, it's a lot faster and more sensible just to clone the array:
public Object[] toArray() {
return (Object[]) a.clone();
}
With the addition of this override, and a similar one for toArray(Object[]), this implementation is exactly the one found in the JDK. In the interests of full disclosure, it's a bit tougher to use the other abstract implementations because they require you to write your own iterator, but it's still not that hard.
The abstract implementations are summarized below:
• AbstractCollection : A Collection that is neither a Set nor a List, such as a bag. At a minimum, you must provide the iterator and the size method.
• AbstractSet : A Set. Use is identical to AbstractCollection.
• AbstractList : A List backed by a random-access data store (such as an array). At a minimum, you must provide the positional access methods (get(int) and, optionally, set(int), remove(int), and add(int)) and the size method. The abstract class takes care of listIterator (and iterator).
• AbstractSequentialList : A List backed by a sequential-access data store (such as a linked list). At a minimum, you must provide the listIterator and size methods. The abstract class takes care of the positional access methods. (This is the opposite of AbstractList.)
• AbstractMap : A Map. At a minimum you must provide the entrySet view. This is typically implemented with the AbstractSet class. If the Map is modifiable, you must also provide the put method.
The process of writing a custom implementation is summarized below:
1. Choose the appropriate abstract implementation class from the list above.
2. Provide implementations for all of the class's abstract methods. If your custom collection is to be modifiable, you'll have to override one or more concrete methods as well. The API documentation for the abstract implementation class will tell you which methods to override.
3. Test and, if necessary, debug the implementation. You now have a working custom collection implementation!
4. If you're concerned about performance, read the abstract implementation class's API documentation for all of the methods whose implementations you're inheriting. If any of them seem too slow, override them. If you override any methods, be sure to measure the performance of the method before and after the override! How much effort you put into tweaking the performance should be a function of how much use the implementation will get, and how performance-critical the use. (Often, this step is best omitted.)
No comments:
Post a Comment