Java 7 – Fork/Join – Obsidian Scheduler

While we lamented how feature-poor Java 7 turned out to be, one addition that did make it in has proven a boon to high-performance concurrent development: the new Fork/Join framework. It targets multi-processor systems (really almost all hardware today) in situations where a batch of work can be broken into smaller recursive calls. More than that, it uses a work-stealing algorithm in which threads with no work can steal available work from threads that are busy. That matters because, even when you try to break your work into fairly small, roughly equal pieces, some pieces can take longer than others, and work stealing still keeps your processing resources well used.
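
To give a feel for why uneven pieces are not a problem, here is a minimal sketch of our own. The names UnevenWork, BusyTask and cost, and the made-up workload in which item i takes roughly i * 1000 loop iterations, are assumptions for illustration only. The only framework calls used are ForkJoinPool.invoke, fork, join and getStealCount, the last of which reports an estimate of how many tasks were stolen between worker threads.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class UnevenWork {

  // Hypothetical workload: item i costs roughly i * 1000 loop iterations,
  // so later items take far longer than earlier ones.
  static long cost(int i) {
    long acc = 0;
    for (int k = 0; k < i * 1000; k++) {
      acc += k % 7;
    }
    return acc;
  }

  // Splits the half-open range [from, to) until single items remain.
  static class BusyTask extends RecursiveTask<Long> {
    private final int from;
    private final int to;

    BusyTask(int from, int to) {
      this.from = from;
      this.to = to;
    }

    @Override
    protected Long compute() {
      if (to - from <= 1) {
        // Leaf: at most one item left, so do the work directly.
        return (to > from) ? cost(from) : 0L;
      }
      int mid = (from + to) >>> 1;
      BusyTask left = new BusyTask(from, mid);
      left.fork();                                   // make the cheap half available to idle threads
      long right = new BusyTask(mid, to).compute();  // work on the expensive half ourselves
      return right + left.join();
    }
  }

  public static void main(String[] args) {
    ForkJoinPool pool = new ForkJoinPool();
    long result = pool.invoke(new BusyTask(0, 64));
    System.out.println("result = " + result);
    // An estimate only; the value varies from run to run and machine to machine.
    System.out.println("steals = " + pool.getStealCount());
    pool.shutdown();
  }
}
```

The halves are equal in item count but not in cost, yet nothing in the task has to account for that; idle workers simply pick up whatever forked pieces are still waiting.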

The Fork/Join framework builds on the ExecutorService we discussed in our Concurrency series. The implementation is ForkJoinPool, which executes ForkJoinTask instances; these are usually implemented as subclasses of either RecursiveTask (returns a result) or RecursiveAction (returns no result).
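
To make the difference between the two base classes concrete, here is a small sketch. The class names ForkJoinKinds, SumTask and DoubleAction, and the THRESHOLD of 1000 elements, are our own placeholders rather than anything the framework prescribes. SumTask returns a value from compute, while DoubleAction changes an array in place and returns nothing.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.RecursiveTask;

class ForkJoinKinds {

  // Hypothetical cutoff below which we stop splitting; tune for real workloads.
  static final int THRESHOLD = 1000;

  // RecursiveTask: compute() returns a result (the sum of data[from..to)).
  static class SumTask extends RecursiveTask<Long> {
    private final long[] data;
    private final int from;
    private final int to;

    SumTask(long[] data, int from, int to) {
      this.data = data;
      this.from = from;
      this.to = to;
    }

    @Override
    protected Long compute() {
      if (to - from <= THRESHOLD) {
        long sum = 0;
        for (int i = from; i < to; i++) {
          sum += data[i];
        }
        return sum;
      }
      int mid = (from + to) >>> 1;
      SumTask left = new SumTask(data, from, mid);
      left.fork();                                        // run the left half asynchronously
      long right = new SumTask(data, mid, to).compute();  // run the right half in this thread
      return right + left.join();
    }
  }

  // RecursiveAction: compute() returns nothing; it doubles data[from..to) in place.
  static class DoubleAction extends RecursiveAction {
    private final long[] data;
    private final int from;
    private final int to;

    DoubleAction(long[] data, int from, int to) {
      this.data = data;
      this.from = from;
      this.to = to;
    }

    @Override
    protected void compute() {
      if (to - from <= THRESHOLD) {
        for (int i = from; i < to; i++) {
          data[i] *= 2;
        }
        return;
      }
      int mid = (from + to) >>> 1;
      // invokeAll forks both halves and waits for both to finish.
      invokeAll(new DoubleAction(data, from, mid), new DoubleAction(data, mid, to));
    }
  }

  public static void main(String[] args) {
    long[] data = new long[10000];
    for (int i = 0; i < data.length; i++) {
      data[i] = i + 1;
    }
    ForkJoinPool pool = new ForkJoinPool();
    pool.invoke(new DoubleAction(data, 0, data.length));
    long sum = pool.invoke(new SumTask(data, 0, data.length));
    System.out.println("sum after doubling = " + sum);
    pool.shutdown();
  }
}
```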

As is our custom, let’s use a sample task and some applicable code. Let’s assume you want to write a class that will calculate the size on disk of a given directory and all its children. And in our case we’ll just make the assumption we’re running some wicked fast SSDs so we can actually benefit from concurrent scans.

Here’s our code:

import java.util.concurrent.*;
import java.io.*;
import java.util.*;
public class DirectorySizer extends RecursiveTask<Long> {
  
  private List<File> mFiles;
  private boolean mAllFiles = true;
  
  public DirectorySizer(List<File> files) {
    mFiles = files;
    for (File file : files) {
      if (file.isDirectory()) {
        mAllFiles = false;
      }
    }
  }
  
  protected Long compute() {
    if (mFiles.size() <=4 && mAllFiles) {
      return computeLocal();
    } else {
      return forkAndJoin();
    }  
  }
  
  private Long computeLocal() {
    long length = 0;
    for (File file : mFiles) {
      length += file.length();
    }
    return length;
  }
  
  private Long forkAndJoin() {
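    List<File> dirsAndFiles = new ArrayList<>();
    for (File file : mFiles) {
      if (file.isFile()) {
        dirsAndFiles.add(file);
      } else {
        // listFiles() returns null if the path is not a readable directory
        File[] children = file.listFiles();
        if (children != null) {
          dirsAndFiles.addAll(Arrays.asList(children));
        }
      }
    }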
    int rightSize = dirsAndFiles.size() / 2;
    int leftSize = dirsAndFiles.size() - rightSize;
    List<File> leftList = dirsAndFiles.subList(0, leftSize);
    List<File> rightList= dirsAndFiles.subList(leftSize, leftSize+rightSize);
    DirectorySizer d1 = new DirectorySizer(leftList);
    d1.fork();
    DirectorySizer d2 = new DirectorySizer(rightList);
    return d2.compute() + d1.join();
  }
  
  public static void main(String[] args) throws Exception {
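    // Pass the root itself; the task lists directories as it recurses,
    // which also avoids a null from listFiles() when args[0] is not a directory.
    List<File> files = Arrays.asList(new File(args[0]));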
    DirectorySizer sizer = new DirectorySizer(files);
    ForkJoinPool pool = new ForkJoinPool();
    Long size = pool.invoke(sizer);
    System.out.println(args[0] + " is " + size + " bytes ");
  }
}

Let's break down the usage of the Fork/Join framework. By extending RecursiveTask, all we really need to do is implement the compute method: it calculates the size directly when the amount of work is small enough to suit our needs, and it forks and joins when the chunk of work is still too large. In our main method we get all the benefit of the framework simply by creating a new ForkJoinPool and passing our top-level instance to the invoke method.
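
The invoke method blocks until the result is ready, but it is not the only way to hand work to the pool. Here is a sketch under a few assumptions of ours: PoolUsage and CountTask are hypothetical names, the task just counts the integers in a range, and the parallelism of 4 is arbitrary (the no-argument constructor uses the number of available processors instead).

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveTask;

class PoolUsage {

  // Hypothetical stand-in task: counts the integers in [from, to) by splitting.
  static class CountTask extends RecursiveTask<Integer> {
    private final int from;
    private final int to;

    CountTask(int from, int to) {
      this.from = from;
      this.to = to;
    }

    @Override
    protected Integer compute() {
      if (to - from <= 16) {
        return to - from;   // small enough: answer directly
      }
      int mid = (from + to) >>> 1;
      CountTask left = new CountTask(from, mid);
      left.fork();
      int right = new CountTask(mid, to).compute();
      return left.join() + right;
    }
  }

  public static void main(String[] args) throws Exception {
    // Explicit parallelism level instead of the default.
    ForkJoinPool pool = new ForkJoinPool(4);
    try {
      // invoke: submit the task and wait for its result.
      Integer blocking = pool.invoke(new CountTask(0, 1000));

      // submit: returns immediately; collect the result later with get().
      ForkJoinTask<Integer> pending = pool.submit(new CountTask(0, 500));
      Integer later = pending.get();

      System.out.println(blocking + " and " + later);
    } finally {
      pool.shutdown();
    }
  }
}
```

Which form to use mostly depends on whether the calling thread has anything else to do while the pool works; for a one-shot command-line tool like ours, invoke is the simplest fit.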

While it's true the potential uses of this framework are limited and require a fairly narrow problem scope, it's nice to see continued advancement in Java in the concurrency space to build on all the goodies we got in Java 5. If you have any concurrency problems you'd like us to tackle in our blog, drop us a line. We enjoy talking shop with fellow developers.
