Imagine a bacon-wrapped Ferrari. Still not better than our free technical reports.
See all our reports

"Copy-on-Iterate" Java Idiom considered broken

This is a story of an interesting support request handled by our guru Lauri Tulmin. The inquiry was about a rare occurring ArrayIndexOutOfBoundsException that JRebel seemed to cause in Wicket. After some investigation he discovered that the root exception was thrown from the following Wicket code:

private final Map<imodifiable, Entry> modifiableToEntry =
   new ConcurrentHashMap();
 
  public void start(Duration pollFrequency) {
    this.task = new Task("ModificationWatcher");
 
    this.task.run(pollFrequency, new ICode() {
      public void run(Logger log) {
        Iterator iter = new ArrayList(
          ModificationWatcher.this.modifiableToEntry.values()).iterator();
        while (iter.hasNext())

Specifically the exception was thrown from the constructor of ArrayList. How was that possible?

Let’s take a step back and review the reasons for this code. One issue with collections is that when iterating a ConcurrentModificationException is thrown if a collection is changed (usually by a different thread). This is done to protect the Iterator from unpredictable behaviour. A common idiom to avoid it is to copy the collection before iteration like that:

for(Iterator i = new ArrayList(collection).iterator(); i.hasNext();) {...}

Note that the collection must either to be synchronized or otherwise suitable for a multi-threaded environment.

This idiom is a very common one, as easily proven by a simple Google Code search. In fact we used it several times in the JRebel code and is present in several places in Wicket as well. So how could this cause an ArrayIndexOutOfBoundsException?

When Lauri investigated in depth he found out that this idiom is inherently unsafe in a multi-threaded environment (even when the underlying collection is synchronized!). The reason for that is the way ArrayList was constructed before Java 1.6. In my 1.5 Java SDK source code it looks like this:

public ArrayList(Collection<? extends E> collection) {
  int size = collection.size();
  firstIndex = 0;
  array = newElementArray(size + (size / 10));
  collection.toArray(array);
  lastIndex = size;
  modCount = 1;
}

The issue is that there is a race condition between the time the size is recorded and the collection.toArray(array) is called. During that time it is conceivable (and indeed possible) that the size of the collection is changed by a different thread. If the size is now bigger, the array copying in toArray() fails with the dreaded ArrayIndexOutOfBoundsException. When researching the issue further we found that the constructor in Java 1.6 ArrayList has been changed to avoid this issue, so at least on Oracle Java 1.6 or later you are safe. Unfortunately I couldn’t find a specific bug referring to that issue, so likely as not it’s an accidental fix.

So what should you do to avoid this issue? One way is to put the synchronized block around the whole loop, but this requires that you only access that collection from other threads in the same synchronized block. An easier solution is to use the toArray()method like this:

for (Iterator i = Arrays.asList(collection.toArray()).iterator(); i.hasNext();)
  • http://www.als-bi.com/ashirley/ Andrew Shirley

    Thanks for an interesting post, however, it did pique our curiosity as collection.toArray(array) should cope with being given an undersized array.

    Instead collection.toArray(array) should return a new array of the correct size (which would result in a list of nulls, not an ArrayIndexOutOfBoundsException as the result is ignored and the List's internal array left untouched).

    The bug is with the implementation of ConcurrentHashMap's value's toArray which is covered by the following bug.

    http://bugs.sun.com/view_bug.do?bug_id=5067025

  • http://zeroturnaround.com Jevgeni Kabanov

    toArray() does cope with it, but if you watch the code in the version of ArrayList() that I had, it calls collection.toArray(array) and ignores the return value. The resized array would be the return value and it's not the CHM that's at fault here.

  • Andrey

    Isn't it obvious that reading of the collection modifiableToEntry.values() must be synchronized?

  • http://zeroturnaround.com Jevgeni Kabanov

    It's a ConcurrentHashMap, it's already is synchronized. And this issue occurs when you use a synchronized map (if you don't use a synchronized map in a concurrent environment this issue is the last thing you should worry about).

  • Andrey

    ConcurrentHashMap and synchronized map are synchronized but the code reading it is not.
    Nobody guaranties that ArrayList constructor behaves correctly as it's not supposed to work in multi-threaded environment.

    Code like this (copy on iterate) is usually used to modify the original collection during iteration without ConcurrentModificationException. But all this should be done in the same thread.

    If you are already using ConcurrentMap what's the point in trying to get a snapshot for iteration?
    You can safely iterate over it without copying.

  • http://zeroturnaround.com Jevgeni Kabanov

    >If you are already using ConcurrentMap what's the point in trying to get a snapshot for iteration?

    For ConcurrentMap in particular I think the Wicket folks got overexcited, as it doesn't throw the ConcurrentModificationException. But if it'd be a synchronized map, you wouldn't be able to safely iterate over it just because it's synchronized. In between the method calls another thread could change the map and then the ConcurrentModificationException would be thrown.

  • martin-g

    I just filed a ticket for improvement : https://issues.apache.org/jira/browse/WICKET-2964 . Probably it will be improved for the next version.

  • ckbe

    ConcurrentHashMap is not “synchronized” as in Collections.synchronizedMap(new HashMap()).
    It's concurrent, it does some kind of internal synchronization but holds no mutex on it's instance.

  • guest

    Actually, this is not a problem of the “Copy-on-Iterate” idiom. This is a matter of either the ArrayList constructor, or a matter of the use of the arraylist constructor. This problem could easily happened in other context. It is well known that the Arraylist class does not fit well with concurrent environments. In my point of view, it is not a good idea to use it without any protection in such an environment. That's why I am not surprised when the kind of bugs that you have just mentioned are discovered.

  • http://zeroturnaround.com Jevgeni Kabanov

    Fair enough :)

  • http://zeroturnaround.com Jevgeni Kabanov

    What difference does it make? Yes, CHM uses multiple monitors to sync, but semantically the result is the same, if faster.

  • Jeffrey Smith

    The Collections.toArray() methods must be internally synchronized, or else they
    will suffer the same fate. I suggest you look at their 1.6 implementations to verify
    there is no internal race condition. I believe there is a race condition in the
    ArrayList.toArray() methods, because that class is specifically documented as
    not thread-safe. The internal changes to “elementData” and “size” fields are
    separated in time without synchronization, and therefore are not thread-safe.

    If there is no way to freeze changes from all threads for the collection while a copy
    is constructed, then using the Iterator is clearly a client design flaw. The Iterator
    API is not intended for multi-threaded access. The Collections.toArray() methods
    are not atomic, unless the implementation specifically states that it is atomic.

    ConcurrentModificationException thrown by the Iterator is intended solely to detect client
    design flaws and bugs during application development. The ArrayIndexOutOfBoundsException
    clearly shows that the client application is incorrectly using an ArrayList in a multi-threaded
    context.

  • Jeffrey Smith

    By the way, your purported solution won't work, because toArray() is not atomic.

    Changing:
    collection.toArray(array);

    to:
    array = collection.toArray(array);

    will not avoid an array indexing error within the toArray() method. The exposure
    to a concurrent modification is still present.

    The assignments:
    collection.toArray(array);
    lastIndex = size;

    should at least be changed to:
    array = collection.toArray(array);
    lastIndex = array.length;

    but there is still an exposure within toArray() while constructing the
    copy.

    The simple solution is not to use unsynchronized collections within a
    multi-threaded context.

  • ckbe

    The difference is, that if you decide at one point to use a synchronized() on a ConcurrentHashMap, you must synchronize each and every access explicitly and loose the benefit of the ConcurrentHashMap (The concurrency).
    Whereas if you use the simpler synchronizedMap() and just do a synchronized() block around for example the iteration loop (or the constructor call of the ArrayList) everything works fine.

  • ckbe

    Not that easy. Each and every use (modification or read) of the ConcurrentHashMap must then be synchronized, since the CHM doesn't use itself as a mutex. And then you got a “NonConcurrentButTwiceSynchronizedHashMap” .. ;-)

  • ckbe

    Easiest fix would be:
    Use a Collections.synchronizedMap(modifiableToEntry) and a synchronized(modifiableToEntry) {} block round the ArrayList Constructor.

  • Andrey

    I meant reading/modifying :)
    Any way what for to copy the collection values? why not to iterate directly over CHM?

  • http://zeroturnaround.com Jevgeni Kabanov

    In this case the underlying collection is assumed to be synchronized, so toArray() is also synchronized and isn't exposed to anything.

  • http://www.google.com/profiles/108393075678473797769 ron

    I do not see the need for any copying here, after all it is a CHM we are working on. As the doc says:

    “The view's iterator is a “weakly consistent” iterator that will never throw ConcurrentModificationException, and guarantees to traverse elements as they existed upon construction of the iterator, and may (but is not guaranteed to) reflect any modifications subsequent to construction.”

    So i won't have any trouble directly iterating on modifiableToEntry.values(). Of course, the above implies, that the underlying collection might get modified during iteration, which might have implications on application logic, but it will not throw any Exception.

    Basically this means that if an entry is inserted or removed at a segment or index in the CHM which the iterator has already seen, nothing happens. If an entry is inserted or removed at a segment or index in the CHM which the iterator has not seen yet, the iterator will just deliver the new entry (or not deliver a removed entry).

    I also want to mention, that Collections.synchronizedCollection does not provide a synchronized iterator() method, either. The user has to manually synchronize iterator creation, which, as opposed to CHM, is easily possible by syncronizing on the Collection itself (or providing a mutex in the Collections.synchronizedCollection call)

  • ckbe

    Well the only upside of a copy is that you get a snapshot of the CHM at one point in time.

    But this idea of an snapshot isn't really supported by the CHM.

    So best is, not use a CHM or use it's iterator an be aware, that the iterator returns Objects that have been in the CHM between construction of the iterator and the end of the loop.