Java has offered a variety of collections classes for many years now, all designed to address common programming problems.
When we needed synchronized collections, we used to simply wrap our regular collections with a call to java.util.Collections.synchronizedList() or one of the other similar methods. Sometimes, though, these methods don't scale, as they are a very primitive and unoptimized way of controlling access.
In this post we will cover the implementation details of ConcurrentHashMap, and we will also discuss ConcurrentSkipListMap later on. ConcurrentHashMap implements the normal Map interface and shows the same behaviour as HashMap, so it is a nice drop-in replacement for other maps. Most importantly, it offers vastly superior concurrent performance over Collections.synchronizedMap() or even your own customized synchronized blocks.
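To illustrate the drop-in nature, here is a minimal sketch; the key and value types and the variable names are just for illustration:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class DropInExample {
    public static void main(String[] args) {
        // before: a synchronized wrapper, one lock guarding the whole map
        // Map<String, Integer> hits = Collections.synchronizedMap(new HashMap<String, Integer>());

        // after: the same Map interface, far better concurrent throughput
        Map<String, Integer> hits = new ConcurrentHashMap<String, Integer>();
        hits.put("/index.html", 1);
        System.out.println(hits.get("/index.html")); // prints 1
    }
}
```

No calling code needs to change, since both variants are used purely through the Map interface.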
How does it do this? Well, in the average application, reads typically outnumber writes by a huge margin, and this applies to most critical sections as well – we end up locking out other threads for a lot of reads during which no writes are going on. How can this be improved?
Well, the designers of ConcurrentHashMap recognized this problem and designed the map so that, beyond having a certain number of buckets, it is split into several segments, each with its own set of buckets. Hashing the key is still used to determine the appropriate segment, just as it is for buckets. The segments also take advantage of volatile fields to ensure consistent reads during get() and other read operations.
Two nice properties come out of this:
- Each segment can actually be written to without locking the rest of the hash table
- Reads which are not concurrent with a write to the same bucket will not require a lock (due to volatile fields)
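As a quick sketch of the first property, several threads can write to the map at once with no external synchronization; the thread and key counts here are arbitrary:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentWriters {
    public static void main(String[] args) throws InterruptedException {
        final Map<String, Integer> map = new ConcurrentHashMap<String, Integer>();
        Thread[] writers = new Thread[4];
        for (int t = 0; t < writers.length; t++) {
            final int id = t;
            writers[t] = new Thread(new Runnable() {
                public void run() {
                    // writes to different keys mostly land in different segments,
                    // so the threads rarely block one another
                    for (int i = 0; i < 1000; i++) {
                        map.put("thread-" + id + "-key-" + i, i);
                    }
                }
            });
            writers[t].start();
        }
        for (Thread w : writers) {
            w.join();
        }
        System.out.println(map.size()); // prints 4000, one entry per distinct key
    }
}
```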
Tip: If you want the equivalent of a ConcurrentHashSet, use Collections.newSetFromMap() with a ConcurrentHashMap as the argument.
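A short sketch of that tip in action:

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentSetExample {
    public static void main(String[] args) {
        // a thread-safe Set backed by a ConcurrentHashMap
        Set<String> concurrentSet =
            Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
        concurrentSet.add("alpha");
        concurrentSet.add("alpha"); // duplicate, ignored as with any Set
        System.out.println(concurrentSet.size()); // prints 1
    }
}
```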
Tuning ConcurrentHashMap
Now that we understand the mechanics behind the ConcurrentHashMap, how do we tune it for specific cases?
Well, as a general rule, the sparser the map, the less contention there will be. The only real way to improve concurrency within the map is to adjust the concurrency level argument supplied upon construction. This controls how many segments are created, which directly corresponds to how well the map will scale with many concurrent writes (or reads while concurrent writes are happening). The examples below show various constructions that result in different performance characteristics.
// default settings: initial capacity of 16, load factor of 0.75,
// and a concurrency level of 16 segments
Map<String, String> map = new ConcurrentHashMap<String, String>();

// for a larger number of writer threads, you can specify a higher concurrency level
int concurrencyLevel = 50;
Map<String, String> moreConcurrentMap =
    new ConcurrentHashMap<String, String>(16, 0.75f, concurrencyLevel);

// a sparser map (lower load factor) will use more memory but result in faster
// gets and puts; it won't reduce contention for writes, since those still lock each segment
float sparse = 0.25f;
Map<String, String> sparserMap = new ConcurrentHashMap<String, String>(16, sparse);
ConcurrentSkipListMap
Java 1.6 introduced the ConcurrentSkipListMap, which is a good alternative to the ConcurrentHashMap when you need a NavigableMap implementation. NavigableMap instances are sorted maps, and provide operations to traverse and navigate through the entries in the map. When considering ConcurrentHashMap versus ConcurrentSkipListMap, your choice should depend on whether you need a sorted map or not.
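A small sketch of those navigation operations; the keys and values here are just illustrative:

```java
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class SkipListExample {
    public static void main(String[] args) {
        ConcurrentNavigableMap<String, Integer> scores =
            new ConcurrentSkipListMap<String, Integer>();
        scores.put("carol", 72);
        scores.put("alice", 95);
        scores.put("bob", 88);

        // entries are always kept in sorted key order
        System.out.println(scores.firstKey()); // prints alice
        System.out.println(scores.lastKey());  // prints carol

        // navigation: a view of everything strictly before "carol"
        System.out.println(scores.headMap("carol").keySet()); // prints [alice, bob]
    }
}
```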
The ConcurrentSkipListMap should not, in general, be preferred over ConcurrentHashMap, but it is ideal if you need a sorted map. Skip lists, the data structure on which this map implementation is based, use probabilistic balancing to avoid the need for rebalancing operations. Real-life performance of skip lists is exceptional, and having a scalable concurrent sorted map is a boon for those designing high-volume applications in Java.
If you have a bit of time, I suggest you read up on how skip lists work – it is a fascinating computer science subject and they are a relatively new invention, dating back only to 1990. There is a nice write-up of skip lists here.