Workers
Julia workers were developed to integrate with the python dask-scheduler, and hence follow many of the same patterns that the python dask-workers do.
Notable Differences
The julia workers don't execute computations in a thread pool but rather do so asynchronously. The recommended way to set up the workers is to use addprocs and spawn at least one Worker per julia process added to the cluster. Currently the julia workers do not support specifying resources needed by computations or spilling excess data onto disk.
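For example, a minimal sketch of this setup (assuming the dask-scheduler is already running locally on its default port and that three extra julia processes are wanted):

pids = addprocs(3)                           # add three julia processes to the cluster
@everywhere using DaskDistributedDispatcher  # load the package on every process

# spawn one Worker on each added process; each connects to the dask-scheduler asynchronously
for pid in pids
    @spawnat pid Worker()
end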
Tasks
Usually computations submitted to a worker go through task states in the following order:
waiting -> ready -> executing -> memory
Computations that throw errors are caught and the error is saved in memory. Workers communicate with each other to gather dependencies, and with the dask-scheduler.
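As a rough illustration only (assuming direct access to the task_state and data fields documented below, with key standing in for a task identifier assigned by the scheduler):

worker = Worker()
# once the scheduler has assigned a task identified by `key`:
worker.task_state[key]   # progresses through :waiting, :ready, :executing, :memory
worker.data[key]         # holds the task's result (or its caught error) once in :memory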
API
DaskDistributedDispatcher.Worker
— Type.Worker
A Worker
represents a worker endpoint in the distributed cluster. It accepts instructions from the scheduler, fetches dependencies, executes computations, stores data, and communicates state to the scheduler.
Fields
status::Symbol: status of this worker
address::Address: ip address and port that this worker is listening on
listener::Base.TCPServer: tcp server that listens for incoming connections
scheduler_address::Address: the dask-distributed scheduler ip address and port info
batched_stream::Nullable{BatchedSend}: batched stream for communication with the scheduler
scheduler::Rpc: manager for discrete send/receive open connections to the scheduler
connection_pool::ConnectionPool: manages connections to peers
handlers::Dict{String, Function}: handlers for operations requested by open connections
compute_stream_handlers::Dict{String, Function}: handlers for compute stream operations
transitions::Dict{Tuple, Function}: valid transitions that a task can make
data_needed::Deque{String}: keys whose data we still lack
ready::PriorityQueue{String, Tuple, Base.Order.ForwardOrdering}: keys ready to run
data::Dict{String, Any}: maps keys to the results of function calls (actual values)
tasks::Dict{String, Tuple}: maps keys to the function, args, and kwargs of a task
task_state::Dict{String, Symbol}: maps keys to their state (waiting, executing, memory)
priorities::Dict{String, Tuple}: run time order priority of a key given by the scheduler
priority_counter::Int: used to prioritize tasks by their order of arrival
dep_transitions::Dict{Tuple, Function}: valid transitions that a dependency can make
dep_state::Dict{String, Symbol}: maps dependencies to their state (waiting, flight, memory)
dependencies::Dict{String, Set}: maps a key to the data it needs to run
dependents::Dict{String, Set}: maps a dependency to the keys that use it
waiting_for_data::Dict{String, Set}: maps a key to the data it needs that we don't have
pending_data_per_worker::DefaultDict{String, Deque}: data per worker that we want
who_has::Dict{String, Set}: maps keys to the workers believed to have their data
has_what::DefaultDict{String, Set{String}}: maps workers to the data they have
in_flight_workers::Dict{String, Set}: workers from which we are fetching data
missing_dep_flight::Set{String}: missing dependencies
DaskDistributedDispatcher.Worker
— Method.Worker(scheduler_address::String="127.0.0.1:8786")
Create a Worker
that listens on a random port between 1024 and 9000 for incoming messages. If the scheduler's address is not provided, it is assumed that the dask-scheduler is running on the same machine on its default port 8786.
NOTE: Workers must be started in the same julia cluster as the DaskExecutor (and its Client).
Usage
Worker() # The dask-scheduler is being run on the same machine on its default port 8786.
or, equivalently:
Worker("$(getipaddr()):8786") # Scheduler is running on the same machine
If running the dask-scheduler on a different machine or port:
First start the dask-scheduler and inspect its startup logs:
$ dask-scheduler
distributed.scheduler - INFO - -----------------------------------------------
distributed.scheduler - INFO - Scheduler at: tcp://127.0.0.1:8786
distributed.scheduler - INFO - etc.
distributed.scheduler - INFO - -----------------------------------------------
Then start workers with its printed address:
Worker("tcp://127.0.0.1:8786")
No further actions are needed directly on the Workers themselves as they will communicate with the dask-scheduler independently. New Workers can be added/removed at any time during execution. There should usually be at least one Worker to run computations.
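For instance, a sketch of adding one more Worker while computations are running (the scheduler address shown is the one printed in the startup logs above; adjust as needed):

new_pid = addprocs(1)[1]                      # add a fresh julia process to the cluster
@everywhere using DaskDistributedDispatcher
@spawnat new_pid Worker("tcp://127.0.0.1:8786")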
Cleanup
To explicitly shut down a worker and delete its information use:
worker = Worker()
shutdown([worker.address])
It is more effective to explicitly reset the DaskExecutor
or shut down a Client
rather than a Worker
because the dask-scheduler will automatically re-schedule the lost computations on other Workers
if it thinks that a Client
still needs the lost data.
Workers are lost if they were spawned on a julia process that exits or is removed via rmprocs from the julia cluster. It is cleaner, but not necessary, to explicitly call shutdown if planning to remove a Worker.
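For example, a sketch of removing a Worker cleanly (worker and pid are illustrative names for a Worker and the julia process it was spawned on in an earlier setup):

shutdown([worker.address])   # optionally terminate the Worker and delete its information
rmprocs(pid)                 # then remove the julia process it was running on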
DaskDistributedDispatcher.shutdown
— Method.shutdown(workers::Vector{Address})
Connect to and terminate all workers in workers
.
Base.show
— Method.show(io::IO, worker::Worker)
Print a representation of the worker and its state.
Internals
DaskDistributedDispatcher.start
— Method.start(worker::Worker)
Coordinate a worker's startup.
DaskDistributedDispatcher.register
— Method.register(worker::Worker)
Register a Worker
with the dask-scheduler process.
DaskDistributedDispatcher.handle_comm
— Method.handle_comm(worker::Worker, comm::TCPSocket)
Listen for incoming messages on an established connection.
Base.close
— Method.Base.close(worker::Worker; report::String="true")
Close the worker and all the connections it has open.
DaskDistributedDispatcher.get_data
— Method.get_data(worker::Worker; keys::Array=String[], who::String="") -> Dict
Send the results of keys
back over the stream they were requested on.
Returns
Dict{String, Vector{UInt8}}: dictionary mapping keys to their serialized data for communication
DaskDistributedDispatcher.gather
— Method.gather(worker::Worker; who_has::Dict=Dict{String, Vector{String}}())
Gather the results for various keys.
DaskDistributedDispatcher.update_data
— Method.update_data(worker::Worker; data::Dict=Dict(), report::String="true") -> Dict
Update the worker data.
DaskDistributedDispatcher.delete_data
— Method.delete_data(worker::Worker; keys::Array=String[], report::String="true")
Delete the data associated with each key of keys
in worker.data
.
DaskDistributedDispatcher.terminate
— Method.terminate(worker::Worker; report::String="true")
Shut down the worker and close all its connections.
DaskDistributedDispatcher.get_keys
— Method.get_keys(worker::Worker) -> Vector{String}
Get a list of all the keys held by this worker for communication with scheduler and other workers.
DaskDistributedDispatcher.add_task
— Method.add_task(worker::Worker; kwargs...)
Add a task to the worker's list of tasks to be computed.
Keywords
key::String: The task's unique identifier. Throws an exception if blank.
priority::Array: The priority of the task. Throws an exception if blank.
who_has::Dict: Map of dependent keys and the addresses of the workers that have them.
func::Union{String, Vector{UInt8}}: The callable function for the task, serialized.
args::Union{String, Vector{UInt8}}: The arguments for the task, serialized.
kwargs::Union{String, Vector{UInt8}}: The keyword arguments for the task, serialized.
future::Union{String, Vector{UInt8}}: The task's serialized DeferredFuture.
DaskDistributedDispatcher.release_key
— Method.release_key(worker::Worker; key::String="", cause::String="", reason::String="")
Delete a key and its data.
DaskDistributedDispatcher.release_dep
— Method.release_dep(worker::Worker, dep::String)
Delete a dependency key and its data.
ensure_computing(worker::Worker)
Make sure the worker is computing available tasks.
DaskDistributedDispatcher.execute
— Method.execute(worker::Worker, key::String)
Execute the task identified by key
.
put_key_in_memory(worker::Worker, key::String, value; should_transition::Bool=true)
Store the result (value
) of the task identified by key
.
ensure_communicating(worker::Worker)
Ensure the worker is communicating with its peers to gather dependencies as needed.
DaskDistributedDispatcher.gather_dep
— Method.gather_dep(worker::Worker, worker_addr::String, dep::String, deps::Set; cause::String="")
Gather the dependency with identifier "dep" from worker_addr
.
handle_missing_dep(worker::Worker, deps::Set{String})
Handle a missing dependency that can't be found on any peers.
DaskDistributedDispatcher.update_who_has
— Method.update_who_has(worker::Worker, who_has::Dict{String, Vector{String}})
Ensure who_has
is up to date and accurate.
select_keys_for_gather(worker::Worker, worker_addr::String, dep::String)
Select which keys to gather from peer at worker_addr
.
gather_from_workers(who_has::Dict, connection_pool::ConnectionPool) -> Tuple
Gather data directly from who_has
peers.
DaskDistributedDispatcher.transition
— Method.transition(worker::Worker, key::String, finish_state::Symbol; kwargs...)
Transition task with identifier key
to finish_state from its current state.
DaskDistributedDispatcher.transition_dep
— Method.transition_dep(worker::Worker, dep::String, finish_state::Symbol; kwargs...)
Transition the dependency with identifier dep
to finish_state from its current state.
send_task_state_to_scheduler(worker::Worker, key::String)
Send the state of task key
to the scheduler.
deserialize_task(func, args, kwargs, future) -> Tuple
Deserialize task inputs and regularize to func, args, kwargs.
Returns
Tuple: The deserialized function, arguments, keyword arguments, and DeferredFuture for the task.
DaskDistributedDispatcher.apply_function
— Method.apply_function(key::String, func::Base.Callable, args::Any, kwargs::Any)
Run a function and return collected information.