Manual

Motivation

Dispatcher.jl is designed to distribute and manage execution of a graph of computations. These computations are specified in a manner as close to regular imperative Julia code as possible. Using a parallel executor with several processes, a central controller manages execution, but data is transported only among processes which will use it. This avoids having one large process where all data currently being used is stored.

Design

Overview

Using Dispatcher, run! builds and runs a computation graph of DispatchNodes. DispatchNodes represent units of computation that can be run. The most common DispatchNode is Op, which represents a function call on some arguments. Some of those arguments may exist when building the graph, and others may represent the results of other DispatchNodes. An Executor builds and executes a whole DispatchGraph. Two Executors are provided. AsyncExecutor executes computations asynchronously using Julia Tasks. ParallelExecutor executes computations in parallel using all available Julia processes (by calling @spawn).

Here is an example defining and executing a graph:

filenames = ["mydata-$d.dat" for d in 1:100]
data = [(@op load(filename)) for filename in filenames]

reference = @op load_from_sql("sql://mytable")
processed = [(@op process(d, reference)) for d in data]

rolled = map(1:(length(processed) - 2)) do i
    a = processed[i]
    b = processed[i + 1]
    c = processed[i + 2]
    roll_result = @op roll(a, b, c)
    return roll_result
end

compared = map(1:200) do i
    a = rand(rolled)
    b = rand(rolled)
    compare_result = @op compare(a, b)
    return compare_result
end

best = @op reduction(CollectNode(compared))

executor = ParallelExecutor()
(run_best,) = run!(executor, [best])

The components of this example will be discussed below. This example is based on a Dask example.

Dispatch Nodes

A DispatchNode generally represents a unit of computation that can be run. DispatchNodes are constructed when defining the graph and are run as part of graph execution. CollectNode from the above example is a subtype of DispatchNode.

Any arguments to DispatchNode constructors (including in @op) which are DispatchNodes are recorded as dependencies in the graph.

Op

An Op is a DispatchNode which represents some function call to be run as part of graph execution. This is the most common type of DispatchNode. The @op macro deconstructs a function call to construct an Op. The following code:

roll_result = @op roll(a, b, c)

is equivalent to:

roll_result = Op(roll, a, b, c)

Note that code in the argument list gets evaluated immediately; only the function call is delayed.

Executors

An Executor runs a DispatchGraph. This package currently provides two Executors: AsyncExecutor and ParallelExecutor. They work the same way, except AsyncExecutor runs nodes using @async and ParallelExecutor uses @spawn.

This call:

(run_best,) = run!(executor, [best])

takes an Executor and a Vector{DispatchNode}, creates a DispatchGraph of those nodes and all of their ancestors, runs it, and returns a collection of DispatchResults (in this case containing only the DispatchResult for best). A DispatchResult is a ResultType containing either a DispatchNode or a DependencyError (an error that occurred when attempting to satisfy the requirements for running that node).

It is also possible to feed in inputs in place of nodes in the graph; see run! for more.