Cross Stitching: Elegant Concurrency Patterns for JavaScript
This post first appeared on the Big Nerd Ranch blog.
"JavaScript is single-threaded, so it doesn't scale. JavaScript is a toy language because it doesn't support multithreading." Outside (and inside) the web community, statements like these are common.
And in a way, it's true: JavaScript’s event loop means your program does one thing at a time. This intentional design decision shields us from an entire class of multithreading woes, but it has also birthed the misconception that JavaScript can’t handle concurrency.
But in fact, JavaScript's design is well-suited for solving a plethora of concurrency problems without succumbing to the "gotchas" of other multithreaded languages. You might say that JavaScript is single-threaded… just so it can be multithreaded!
Recap: Concurrency
You may want to do some homework if "concurrency" and "parallelism" are new to your vocabulary. TL;DR: for simple programs, we usually write "sequential" (or "serial") code: one step executes at a time and must complete before the next step begins. If JavaScript could perform a "blocking" AJAX request with ajaxSync(), serial code might look like this:
console.log('About to make a request.');
let json = ajaxSync('https://api.google.com/search.json');
console.log(json);
console.log('Finished the request.');
/*
=> About to make a request.
... AJAX request runs ...
... a couple seconds later ...
... AJAX request finishes ...
=> { all: ['the', 'things'] }
=> Finished the request.
*/
Until the AJAX request completes, JavaScript pauses (or "blocks") any lines below from executing. In contrast, concurrency is when the execution of one series of steps can overlap another series of steps. In JavaScript, concurrency is often accomplished with async Web APIs and a callback:
console.log('About to make a request.');
ajaxAsync('https://api.google.com/search.json', json => {
  console.log(json);
  console.log('Finished the request.');
});
console.log('Started the request.');
/*
=> About to make a request.
... AJAX request runs in the background ...
=> Started the request.
... a couple seconds later ...
... AJAX request finishes ...
=> { all: ['the', 'things'] }
=> Finished the request.
*/
In this second version, the AJAX request only "blocks" the code inside the callback (logging the AJAX response), but the JavaScript runtime will go on executing lines after the AJAX request.
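To see that ordering without a server, here is a runnable sketch. ajaxAsync() is still hypothetical: this version fakes the request with setTimeout() and a canned response, and it records messages in an array instead of logging them so the order can be checked.

```javascript
// Collected messages, in the order they happen.
const output = [];

// Hypothetical stand-in for ajaxAsync(): no real request is made. It waits
// `ms` milliseconds via setTimeout() and then calls back with fake JSON.
function ajaxAsync(url, callback, ms = 50) {
  setTimeout(() => callback({ all: ['the', 'things'] }), ms);
}

output.push('About to make a request.');
ajaxAsync('https://api.google.com/search.json', json => {
  output.push(JSON.stringify(json));
  output.push('Finished the request.');
});
output.push('Started the request.');

// Right now `output` holds only 'About to make a request.' and
// 'Started the request.'; the callback runs later, after the timer fires
// and the call stack is empty.
```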
Recap: Event Loop
The JavaScript runtime uses a mechanism called the "event loop" to keep track of all in-progress async operations so it can notify your program when an operation finishes. If you are unfamiliar with the event loop, check out Philip Roberts' exceptional 20-minute overview from ScotlandJS: "Help, I'm stuck in an event-loop."
Thanks to the event loop, a single thread can perform an admirable amount of work concurrently. But why not just reach for multithreading?
Software is harder to write (and debug) when it constantly switches between different tasks through multithreading. So unlike many languages, JavaScript finishes one thing at a time—a constraint called "run-to-completion"—and queues up other things to do in the background. Once the current task is done, it grabs the next chunk of work off the queue and executes to completion.
Since the JavaScript runtime never interrupts code that is already executing on the call stack, you can be sure that shared state (like global variables) won't randomly change partway through a synchronous function; there is simply no preemption to guard against. Run-to-completion makes it easy to reason about highly concurrent code, which is one reason Node.js is so popular for backend programming.
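A small sketch of run-to-completion: even a timer scheduled with a 0 ms delay has to wait until the current synchronous code has finished, so it can never observe the loop half-done. The runToCompletionDemo() function is purely illustrative.

```javascript
const log = [];

function runToCompletionDemo() {
  let counter = 0;
  setTimeout(() => {
    // By the time this callback runs, the loop below has already finished.
    log.push(`timer sees counter = ${counter}`);
  }, 0);
  for (let i = 0; i < 1e6; i++) {
    counter++; // no other callback can interleave with this loop
  }
  log.push(`loop finished, counter = ${counter}`);
}

runToCompletionDemo();
```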
Although your JavaScript code is single-threaded and only does one thing at a time, the JavaScript runtime and Web APIs are multithreaded! When you pass a callback function to setTimeout() or start an AJAX request with fetch(), you are essentially spinning up a background thread in the runtime. Once that background work completes, and once the current call stack finishes executing, your callback function is pushed onto the (now empty) call stack and run to completion. So your JavaScript code itself is single-threaded, but it orchestrates legions of threads!
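A sketch of that orchestration, assuming a hypothetical delay() helper built on setTimeout(): three timers are started in one synchronous turn, slowest first. They finish fastest-first, which can only happen if the runtime was waiting on all three at once, while each .then() callback still runs one at a time on the single JavaScript thread.

```javascript
// Hypothetical helper: resolves with `value` after `ms` milliseconds. The
// waiting happens in the runtime's timer machinery, not on our call stack.
const delay = (ms, value) =>
  new Promise(resolve => setTimeout(() => resolve(value), ms));

const finished = [];

// Start three background operations in the same synchronous turn,
// slowest one first.
const all = Promise.all(
  [30, 20, 10].map(ms =>
    delay(ms, ms).then(value => {
      finished.push(value); // runs on the single JS thread, one callback at a time
      return value;
    })
  )
);
```

Promise.all() still reports results in input order ([30, 20, 10]), even though `finished` records the completion order ([10, 20, 30]).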
However, we need some patterns to write concurrent code that is performant and readable.
Recap: Promise Chaining
Suppose we are building a media library app in the browser and are writing a function called updateMP3Meta() that will read in an MP3 file, parse out some ID3 metadata (e.g., song title, composer, artist) and update a matching Song record in the database. Assuming the read(), parseMP3() and Song.findByName() functions return Promises, we could implement it like this:
let read = (path) => { /* ... */ }; // returns a Promise
let parseMP3 = (file) => { /* ... */ }; // returns a Promise
let Song = {
  findByName(name) { /* ... */ } // returns a Promise
};
let updateMP3Meta = (path) => {
  return read(path)
    .then(file => {
      return parseMP3(file).then(meta => {
        return Song.findByName(file.name).then(song => {
          Object.assign(song, meta);
          return song.save();
        });
      });
    });
};
It does the job, but nested .then() callbacks quickly turn into callback hell and obscure intent… and bugs. We might try using Promise chaining to flatten the callback chain:
let updateMP3Meta = (path) => {
  return read(path)
    .then(file => parseMP3(file))
    .then(meta => Song.findByName(file.name)) // `file` is out of scope here
    .then(song => {
      Object.assign(song, meta); // `meta` is out of scope here
      return song.save();
    });
};
This reads nicely, but unfortunately it won't work: we can't access the file variable from the second .then() callback, nor meta from the third .then() anymore! Promise chaining can tame callback hell, but only by forfeiting JavaScript's closure superpowers. It's hardly ideal: local variables are the bread-and-butter of state management in functional programming.
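A common workaround is to bundle earlier results into the value each callback returns, so the next .then() can destructure them. The sketch below shows that workaround with hypothetical in-memory stubs for read(), parseMP3() and Song; it runs, but it reintroduces small nested .then() calls and manual bookkeeping, which is exactly the cost of losing closures.

```javascript
// Hypothetical in-memory stand-ins; each returns a Promise like the originals.
const read = path => Promise.resolve({ name: 'Song A', path });
const parseMP3 = file => Promise.resolve({ artist: 'Artist', title: file.name });
const Song = {
  findByName: name => Promise.resolve({
    name,
    save() { return Promise.resolve(this); }
  })
};

// Each .then() callback has its own scope, so earlier results must be
// carried forward by hand inside the resolved values.
const updateMP3Meta = path =>
  read(path)
    .then(file => parseMP3(file).then(meta => ({ file, meta })))
    .then(({ file, meta }) =>
      Song.findByName(file.name).then(song => ({ song, meta })))
    .then(({ song, meta }) => {
      Object.assign(song, meta);
      return song.save();
    });
```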
Recap: Async Functions
Luckily, ES2017 async functions merge the benefits of both approaches. Rewriting our updateMP3Meta() as an async function yields:
let updateMP3Meta = async (path) => {
  let file = await read(path);
  let meta = await parseMP3(file);
  let song = await Song.findByName(file.name);
  Object.assign(song, meta);
  return song.save();
};
Hurray! async functions give us local scoping back without descending into callback hell.
However, updateMP3Meta() unnecessarily forces some things to run serially. In particular, MP3 parsing and searching the database for a matching Song can actually be done concurrently, but the await operator forces Song.findByName() to run only after parseMP3() finishes.
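The serialization is easy to observe with timed stand-ins. In this sketch, traced(), the delays and the stub implementations of read(), parseMP3() and Song.findByName() are all hypothetical; each stub records when it starts and ends, and the trace shows findByName() starting only after parseMP3() has ended.

```javascript
// Event log used to inspect execution order.
const events = [];
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

// Hypothetical helper: wraps a step so it records when it starts and ends,
// waits `ms` milliseconds, then returns fn(...args).
const traced = (name, ms, fn) => async (...args) => {
  events.push(`${name}:start`);
  await delay(ms);
  events.push(`${name}:end`);
  return fn(...args);
};

// Hypothetical stand-ins for the functions in the example above.
const read = traced('read', 10, path => ({ name: 'Song A', path }));
const parseMP3 = traced('parseMP3', 30, file => ({ title: file.name }));
const Song = {
  findByName: traced('findByName', 30, name => ({
    name,
    save() { return Promise.resolve(this); }
  }))
};

// Same shape as the async version above: each await blocks the next line.
const updateMP3Meta = async path => {
  let file = await read(path);
  let meta = await parseMP3(file);
  let song = await Song.findByName(file.name);
  Object.assign(song, meta);
  return song.save();
};

const done = updateMP3Meta('song-a.mp3');
```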
Working in Parallel
To get the most out of our single-threaded program, we need to invoke JavaScript's event loop superpowers. We can queue two async operations and wait for both to complete:
let updateMP3Meta = (path) => {
  return read(path)
    .then(file => {
      return Promise.all([
        parseMP3(file),
        Song.findByName(file.name)
      ]);
    })
    .then(([meta, song]) => {
      Object.assign(song, meta);
      return song.save();
    });
};
We used Promise.all() to wait for concurrent operations to finish, then aggregated the results to update the Song. Promise.all() works just fine for a few concurrent spots, but code quickly devolves when you alternate between chunks of code that can be executed concurrently and others that are serial. This intrinsic ugliness is not much improved with async functions:
let updateMP3Meta = async (path) => {
  let file = await read(path);
  let metaPromise = parseMP3(file);
  let songPromise = Song.findByName(file.name);
  let meta = await metaPromise;
  let song = await songPromise;
  Object.assign(song, meta);
  return song.save();
};
Instead of using an inline await, we used the metaPromise and songPromise local variables to begin each operation without blocking, then awaited both promises. While async functions make concurrent code easier to read, there is an underlying structural ugliness: we are manually telling JavaScript what parts can run concurrently, and when it should block for serial code. It's okay for a spot or two, but when multiple chunks of serial code can be run concurrently, it gets incredibly unruly.
We are essentially deriving the evaluation order of a dependency tree… and hardcoding the solution. This means "minor" changes, like swapping out a synchronous API for an async one, will cause drastic rewrites. That's a code smell!
Real Code
To demonstrate this underlying ugliness, let's try a more complex example. I recently worked on an MP3 importer in JavaScript that involved a fair amount of async work. (Check out my blog post or the parser source code if you're interested in working with binary data and text encodings.)
The main function takes in a File object (from drag-and-drop), loads it into an ArrayBuffer, parses MP3 metadata, computes the MP3's duration, creates an Album in IndexedDB if one doesn't already exist, and finally creates a new Song:
import parser from 'id3-meta';
import read from './file-reader';
import getDuration from './duration';
import { mapSongMeta, mapAlbumMeta } from './meta';
import importAlbum from './album-importer';
import importSong from './song-importer';
export default async (file) => {
  // Read the file
  let buffer = await read(file);
  // Parse out the ID3 metadata
  let meta = await parser(file);
  let songMeta = mapSongMeta(meta);
  let albumMeta = mapAlbumMeta(meta);
  // Compute the duration
  let duration = await getDuration(buffer);
  // Import the album
  let albumId = await importAlbum(albumMeta);
  // Import the song
  let songId = await importSong({
    ...songMeta, albumId, file, duration, meta
  });
  return songId;
};
This looks straightforward enough, but we're forcing some async operations that could run concurrently to run sequentially instead. In particular, we could compute getDuration() at the same time that we parse the MP3 and import a new album. However, both operations will need to finish before invoking importSong().
Our first try might look like this:
export default async (file) => {
  // Read the file
  let buffer = await read(file);
  // Compute the duration
  let durationPromise = getDuration(buffer);
  // Parse out the ID3 metadata
  let metaPromise = parser(file);
  let meta = await metaPromise;
  let songMeta = mapSongMeta(meta);
  let albumMeta = mapAlbumMeta(meta);
  // Import the album
  let albumIdPromise = importAlbum(albumMeta);
  let duration = await durationPromise;
  let albumId = await albumIdPromise;
  // Import the song
  let songId = await importSong({
    ...songMeta, albumId, file, duration, meta
  });
  return songId;
};
That took a fair amount of brain Tetris to get the order of awaits right: if we hadn't moved getDuration() up a few lines in the function, we would have created a poor solution, since importAlbum() only depends on albumMeta, which only depends on meta. But this solution is still suboptimal! getDuration() depends on buffer, but parser() doesn't need buffer at all, so it could be executing at the same time as read(). To get the best solution, we would have to use Promise.all() and .then()s.
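For comparison, here is one way that fully concurrent version could be written with Promise.all() and .then(). Everything except the overall shape is an assumption: the stub modules are hypothetical stand-ins that resolve after short delays, and importFile() is an illustrative name. Note that the dependency graph is still solved by hand; each follow-up step is chained onto exactly the promise it depends on.

```javascript
// Hypothetical stubs standing in for the importer's modules.
const delay = (ms, value) => new Promise(resolve => setTimeout(() => resolve(value), ms));
const read = file => delay(20, new ArrayBuffer(8));
const parser = file => delay(20, { title: file.name, album: 'Album A' });
const mapSongMeta = meta => ({ title: meta.title });
const mapAlbumMeta = meta => ({ name: meta.album });
const getDuration = buffer => delay(10, buffer.byteLength * 10);
const importAlbum = albumMeta => delay(10, `album:${albumMeta.name}`);
const importSong = song => delay(10, `song:${song.title}:${song.albumId}:${song.duration}`);

// Hand-derived evaluation order: read() and parser() start together.
const importFile = file => {
  const bufferPromise = read(file);
  const metaPromise = parser(file);
  const durationPromise = bufferPromise.then(buffer => getDuration(buffer));
  const albumIdPromise = metaPromise.then(meta => importAlbum(mapAlbumMeta(meta)));
  return Promise.all([metaPromise, albumIdPromise, durationPromise])
    .then(([meta, albumId, duration]) =>
      importSong({ ...mapSongMeta(meta), albumId, file, duration, meta }));
};
```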
To solve the underlying problem without evaluating a dependency graph by hand, we need to define groups of serial steps (which execute one-by-one in a blocking fashion), and combine those groups concurrently.
What if there were a way to define such a dependency graph that's readable, doesn't break closures, doesn't resort to .then(), and doesn't require a library?
Async IIFEs
That's where async IIFEs (immediately invoked async function expressions) come in. For every group of serial (dependent) operations, we'll wrap them up into a micro API called a "task":
let myTask = (async () => {
  let other = await otherTask;
  let result = await doCompute(other.thing);
  return result;
})();
Since all async functions return a Promise, the myTask local variable contains a Promise that will resolve to result. I prefer to name these variables *Task instead of *Promise. Inside the async IIFE, operations are sequential, but outside we aren't blocking anything. Furthermore, inside a task we can wait on other tasks to finish, like otherTask, which could be another async IIFE.
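Because a task is just a Promise, any number of other tasks can await it, and the work behind it runs only once. The sketch below fills in the snippet above with hypothetical values: doCompute() doubles a number and counts its calls, and otherTask resolves to { thing: 21 }. Two extra tasks (doubledTask and labelTask, also hypothetical) both depend on myTask.

```javascript
// A hypothetical expensive step that counts how often it actually runs.
let computeCalls = 0;
const doCompute = async thing => {
  computeCalls++;
  await new Promise(resolve => setTimeout(resolve, 10));
  return thing * 2;
};

// otherTask is itself an async IIFE: it starts immediately.
let otherTask = (async () => ({ thing: 21 }))();

let myTask = (async () => {
  let other = await otherTask;
  let result = await doCompute(other.thing);
  return result;
})();

// Awaiting a Promise never re-runs the work that produced it, so
// doCompute() is called exactly once no matter how many tasks await myTask.
let doubledTask = (async () => (await myTask) * 2)();
let labelTask = (async () => `result=${await myTask}`)();
```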
Let's turn the getDuration() section into a task called durationTask:
let durationTask = (async () => {
  let buffer = await readTask;
  let duration = await getDuration(buffer);
  return duration;
})();
Since these tasks are defined inline, they have access to variables in the outer closure, including other tasks!
Refactoring into Async Tasks
Let's refactor the entire importer with async IIFEs, or "tasks":
export default async (file) => {
  // Read the file
  let readTask = read(file);
  // Parse out the ID3 metadata
  let metaTask = (async () => {
    let meta = await parser(file);
    let songMeta = mapSongMeta(meta);
    let albumMeta = mapAlbumMeta(meta);
    return { meta, songMeta, albumMeta };
  })();
  // Import the album
  let albumImportTask = (async () => {
    let { albumMeta } = await metaTask;
    let albumId = await importAlbum(albumMeta);
    return albumId;
  })();
  // Compute the duration
  let durationTask = (async () => {
    let buffer = await readTask;
    let duration = await getDuration(buffer);
    return duration;
  })();
  // Import the song
  let songImportTask = (async () => {
    let albumId = await albumImportTask;
    let { meta, songMeta } = await metaTask;
    let duration = await durationTask;
    let songId = await importSong({
      ...songMeta, albumId, file, duration, meta
    });
    return songId;
  })();
  let songId = await songImportTask;
  return songId;
};
Now reading the file, computing the duration, parsing metadata and querying the database will automatically run concurrently or serially as their dependencies require; we were even able to leave getDuration() in its original spot! By declaring tasks and awaiting them inside other tasks, we defined a dependency graph for the runtime and let it discover the optimal solution for us.
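A runnable sketch of the same task graph, with hypothetical timed stand-ins for the importer's modules (not the real id3-meta parser or IndexedDB code) and an illustrative importFile() name, confirms the schedule the runtime discovers: parser() and importAlbum() start while read() is still in flight, getDuration() waits for read(), and importSong() waits for everything.

```javascript
const events = [];
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

// Hypothetical helper: records when a step starts and ends, waits `ms`
// milliseconds, then returns fn(...args).
const traced = (name, ms, fn) => async (...args) => {
  events.push(`${name}:start`);
  await delay(ms);
  events.push(`${name}:end`);
  return fn(...args);
};

// Hypothetical stand-ins for the importer's modules.
const read = traced('read', 40, () => new ArrayBuffer(8));
const parser = traced('parser', 10, file => ({ title: file.name, album: 'Album A' }));
const mapSongMeta = meta => ({ title: meta.title });
const mapAlbumMeta = meta => ({ name: meta.album });
const getDuration = traced('getDuration', 10, buffer => buffer.byteLength * 10);
const importAlbum = traced('importAlbum', 20, albumMeta => `album:${albumMeta.name}`);
const importSong = traced('importSong', 10,
  song => `song:${song.title}:${song.albumId}:${song.duration}`);

// Same task structure as the refactored importer above.
const importFile = async file => {
  let readTask = read(file);
  let metaTask = (async () => {
    let meta = await parser(file);
    return { meta, songMeta: mapSongMeta(meta), albumMeta: mapAlbumMeta(meta) };
  })();
  let albumImportTask = (async () => {
    let { albumMeta } = await metaTask;
    return importAlbum(albumMeta);
  })();
  let durationTask = (async () => {
    let buffer = await readTask;
    return getDuration(buffer);
  })();
  let songImportTask = (async () => {
    let albumId = await albumImportTask;
    let { meta, songMeta } = await metaTask;
    let duration = await durationTask;
    return importSong({ ...songMeta, albumId, file, duration, meta });
  })();
  return songImportTask;
};

const imported = importFile({ name: 'Track 1' });
```

One caveat worth knowing: if a task rejects before any other task has started awaiting it, Node.js (and browsers, via the unhandledrejection event) may report an unhandled rejection even though the error is awaited later, so it can be worth handling errors inside each task or awaiting a task's dependencies together with Promise.all().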
Suppose we wanted to add another step to the import process, like retrieving album artwork from a web service:
// Look up album artwork from a web service
let albumArtwork = await fetchAlbumArtwork(albumMeta);
Prior to the async IIFE refactor, adding this feature would have triggered a lot of changes throughout the file, but now we can add it with just a small isolated chunk of additions!
+// Look up album artwork from a web service
+let artworkTask = (async () => {
+  let { albumMeta } = await metaTask;
+  let artwork = await fetchAlbumArtwork(albumMeta);
+  return artwork;
+})();
 // Import the album
 let albumImportTask = (async () => {
+  let artwork = await artworkTask;
   let { albumMeta } = await metaTask;
-  let albumId = await importAlbum(albumMeta);
+  let albumId = await importAlbum({ artwork, ...albumMeta });
   return albumId;
 })();
Tasks are declarative, so managing concurrent vs. serial execution order becomes an "execution detail" instead of an "implementation detail"!
What if we revamped our parser() function so it could synchronously parse an ArrayBuffer instead of a File object? Before, this would have triggered a cascade of line reordering, but now the change is trivial:
 // Parse out the ID3 metadata
 let metaTask = (async () => {
+  let buffer = await readTask;
-  let meta = await parser(file);
+  let meta = parser(buffer);
   let songMeta = mapSongMeta(meta);
   let albumMeta = mapAlbumMeta(meta);
   return { meta, songMeta, albumMeta };
 })();
Objections
It's tempting to take shortcuts and solve the dependency graph yourself. For example, after our changes to parser() above, all of the tasks depend on the file being read in, so you could block the entire function with await read(file) to save a few lines. However, these areas are likely to change, and organizing into serial tasks provides other benefits: these micro APIs make it easier to read, debug, extract and reason about a complex chunk of concurrency.
Since we wrapped these tasks into async IIFEs, why not extract them into dedicated functions? For the same reason we couldn't use Promise chaining: we would have to give up nested closures and lexically scoped variables. Extracting tasks into top-level functions also raises a design question: if all these operations were synchronous, would we still perform this extraction?
If you find yourself extracting async functions (as we did with importAlbum() and importSong()) because of their complexity or reusability, bravo! But ultimately, design principles for breaking down functions should be independent of whether the code is async or sync.
Also, splitting functions or moving them too far from their context makes code harder to grasp, as Josh discusses in his post about extracting methods.
More to Come
Functional programming is well-suited to multithreading because it minimizes shared state and opts for local variables as the de facto state mechanism. And thanks to JavaScript's event loop, we can deal with shared state by merging results inside a single thread.
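To make the last point concrete, here is a sketch in which fifty concurrent tasks merge their results into one shared object without any locks. importOne() and its delays are hypothetical. The safety comes from the absence of an await between reading and writing `totals`: shared state can change across an await, but never in the middle of a synchronous block.

```javascript
const totals = { songs: 0, seconds: 0 };
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

// Hypothetical per-song import whose delay varies, so tasks finish out of order.
const importOne = async (duration, i) => {
  await delay((i * 7) % 20);
  // No await between the read and the write below, so this block runs to
  // completion before any other task's callback can touch `totals`.
  totals.songs += 1;
  totals.seconds += duration;
  return duration;
};

const durations = Array.from({ length: 50 }, (_, i) => 100 + i);
const allImported = Promise.all(durations.map(importOne));
```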
Next time, we'll examine functional patterns for throttling concurrency on a single thread, then wrap up with techniques for efficiently managing a cluster of Web Workers… without worrying a shred about "thread safety."