NeurIPS 2019 in Review

Author: Invenia team

NeurIPS 2019

Introduction

In early December 2019, part of the Invenia team headed to Vancouver, Canada, for the annual Neural Information Processing Systems (NeurIPS, previously NIPS) conference. It was the biggest NeurIPS conference yet, with an estimated 13,000 attendees, 6,743 paper submissions, and almost 80 official meetups.

We sent developers, researchers, and people operations representatives from both Cambridge and Winnipeg to attend talks, meet people, and run our booth at the expo. Below are some of our highlights from the week.

Yoshua Bengio and Blaise Aguera y Arcas Keynotes

The keynote talks by Blaise Aguera y Arcas and Yoshua Bengio offered interesting perspectives and discussed a new, or arguably old, direction for the learning community. Bengio outlined how current deep learning is excelling with “system 1 learning”, tasks that we as humans do quickly and unconsciously. At the same time, Aguera y Arcas discussed AI’s superhuman performance in such tasks. Punctuating the message of both talks, Aguera y Arcas described just how little territory this covers if we are interested in building a more human-like intelligence.

Current learning methods are lacking in what Bengio refers to as “system 2 learning” - slow, conscious, and logical problem-solving. He underlined some of the missing pieces of current deep learning, such as better out-of-distribution generalisation, high-level semantic representation, and an understanding of causality. The notion that state-of-the-art methods may be inadequate seemed even clearer when Aguera y Arcas questioned whether natural intelligence ever performs optimisation. He highlighted that “evolution decides on what is good and bad” rather than having some predefined loss to optimise.

It’s interesting to see that, even after the many recent advances in deep learning, we are still at a loss when it comes to replicating a natural form of intelligence – which was the initial motivation for neural networks. Both speakers addressed meta-learning, a prominent research area at NeurIPS this year, as a step in the right direction. However, the concluding message from Aguera y Arcas was that, to build natural intelligence, we may need to explore brains with evolved architectures, cellular automata, and social or cooperative learning among agents.

On a completely unrelated note, Blaise Aguera y Arcas also mentioned a great comic on the nascent field of federated learning. For anyone not familiar with the term, it’s a cute introduction to the topic.

Tackling Climate Change with Machine Learning

Among the workshops, this year’s “Tackling Climate Change with Machine Learning” was particularly interesting for us.

Jeff Dean from Google AI gave an interesting talk about using Bayesian inference for physical systems. One application discussed was using TensorFlow Probability to help stabilize the plasma flows in a type of fusion reactor. Another was the use of TPUs for simulating partial differential equations, such as hydraulic models for flood forecasting, and combining the results with neural nets for weather prediction. He also mentioned that Google data centers have been “net zero-carbon” since 2017. However, this does not mean they consume no fossil fuels – they offset that consumption by producing renewable energy which is then made available to the grid – so there is still room for improvement.

In addition, Qinghu Tang discussed the use of semi-supervised learning to create a fine-grained mapping of the low-voltage distribution grid from street view images. Their ambitious plan is to create “a comprehensive global database containing information on conventional and renewable energy systems ranging from supply, transmission, to demand-side.” This is particularly exciting for those working on cleaner energy sources to combat climate change, as such a database would be very useful for planning.

Earlier in 2019, at the ICML 2019 Workshop, we presented a poster in which we proposed an algorithm for reducing the computational time of solving Optimal Power Flow (OPF) problems. As a follow-up we presented another poster, in which we apply meta-optimisation in combination with a classifier that predicts the active constraints of the OPF problem. Our approach reduces the computational time significantly while still guaranteeing feasibility. A promising future direction is to combine it with the work of Frederik Diehl using graph neural networks (GNN) for predicting the solution of AC-OPF problems.

Responsible use of computing resources

On the last day of the conference, the Deep Reinforcement Learning (RL) panel was asked about the responsible use of computing resources. The answers highlighted the many different stances on what can be a contentious matter.

Some panellists expressed a sense of owing the community something, recognising that they are in a privileged position, and said they choose carefully which computationally expensive experiments to run. It was also mentioned that there is a need to provide all parts of the community with access to extensive computing resources, instead of just a few privileged research teams in a handful of industry labs. However, the challenge lies in how to achieve that.

A different point of view is that we should not be convinced that all research requires large amounts of computing resources. Novel research can still be done with tabular environment reinforcement learning experiments, for instance. It is important to avoid bias towards compute-heavy research.

There have been some attempts at limiting the available resources in the hope of sparking new research insights, but we have yet to see the results of these initiatives. Ultimately, there are cases in which the adoption of compute-heavy approaches might simply be unavoidable, such as reinforcement learning in large continuous control tasks.

Advances in Approximate Bayesian Inference (AABI) Workshop

One of the hidden gems of NeurIPS is the co-located workshop Advances in Approximate Bayesian Inference (AABI), which provides a venue for both established academics and more junior researchers to meet and discuss all things Bayesian. This year the event was the largest to date - with 120 accepted papers compared to just 40 last year - and consisted of 11 talks and two poster sessions, followed by a panel discussion.

Sergey Levine, from UC Berkeley, discussed the mathematical connections between control and Bayesian inference (the control-as-inference framework). More specifically, Levine presented an equivalence between maximum-entropy reinforcement learning (MaxEnt RL) and inference in a particular class of probabilistic graphical models. This highlighted the usefulness of a principled approach to the design of reward functions, and of addressing the issue of robustness in RL.

In the poster session, we presented GP-ALPS, our work on Bayesian model selection for linear multi-output Gaussian Process (GP) models, which finds a parsimonious yet descriptive model. This is done in an automated way by switching off latent processes that do not meaningfully contribute to explaining the data, using latent Bernoulli variables and approximate inference.

The day finished with a panel discussion that featured, among others, Radford Neal and James Hensman. The participants exchanged opinions on where Bayesian machine learning can add value in the age of big data and deep learning, for example, on tasks that require reasoning about uncertainty. They also discussed the trade-offs between modern Monte Carlo techniques and variational inference, as well as pointed out the current lack of standard benchmarks for uncertainty quantification.

Dealing with real-world data

It was nice to see some papers that specifically looked at modelling with asynchronous and incomplete data, data available at different granularities, and dataset shift - all scenarios that arise in real-world practice. These interesting studies make a refreshing change from the more numerous papers on efficient inference in Bayesian learning and include some fresh, outside-the-box ideas.

GPs were widely applied in order to deal with real-world data issues. One eye-catching poster was Modelling dynamic functional connectivity with latent factor GPs. This paper made clever use of transformations of the approximated covariance matrix to Euclidean space to efficiently adapt covariance estimates. A paper that discussed the problem of missing data in time-series was Deep Amortized Variational Inference for Multivariate Time Series Imputation with Latent Gaussian Process Models. There were also papers that looked at aggregated data issues such as Multi-task learning for aggregated data using Gaussian processes, Spatially aggregated Gaussian processes with multivariate areal outputs, and Multi-resolution, multi-task Gaussian processes. The latter contained some fancy footwork to incorporate multi-resolution data in a principled way within both shallow and deep GP frameworks.

Naturally, there were also several unsupervised learning works on the subject. A highlight was Sparse Variational Inference: Bayesian coresets from scratch, which integrates data compression via Bayesian coresets into SVI with a neat framework for the exponential family. Additionally, Multiway clustering via tensor block models presented an algorithm for clustering in high-dimensional feature spaces, with convergence and consistency properties set out clearly in the paper. Finally, we must mention Missing not at random in matrix completion: the effectiveness of estimating missingness probabilities under a low nuclear norm assumption, a paper that proposes a principled approach to filling in missing data by exploiting an assumption that the ‘missingness’ process has low nuclear norm structure.

Research engineering

Research engineering is often overlooked in academic circles, even though it is a key role in most companies doing any amount of machine learning research. NeurIPS was a good venue for discussion with research engineers from several companies, which provided some interesting insights into how workflows vary across the industry. We all had a lot in common, such as figuring out how to match research engineers to research projects and trying to build support tools for everyday research tasks.

In many companies the main output is research (papers rather than products), and it seemed surprisingly rare for the same people who do the novel research to also work on putting machine learning into production. This split can lead to the significant problem of having two different languages – for example, PyTorch for researchers and TensorFlow for production – which puts extra strain on research engineers, who become responsible for translating code and dealing with each language’s particularities. We are fortunate to avoid this kind of issue, both by enforcing high standards for code written by researchers and, more importantly, by keeping research code never far from being able to run in production.

Research Retrospectives

A new addition to the workshops this year was Retrospectives in Machine Learning. The core idea of a retrospective is to reflect on previous work and to discuss shortcomings in the initial intuition, analysis, or results. The goal of the workshop was to “improve the science, openness, and accessibility of the machine learning field, by widening what is publishable and helping to identify opportunities for improvement.” In a way, it asks what readers of important papers should know now that was not in the original publication.

There were many great talks, with one of the most honest and direct being David Duvenaud’s “Bullshit that I and others have said about Neural ODEs”. The talk offered many insights on the science itself, as well as on what happened after the Neural ODEs paper was published at NeurIPS 2018. An overarching theme was the sociology of the very human institutions of research. As an example, the talk suggests thinking about how science journalists have distinct incentives, how these interact with researcher and academic incentives, and how this can – even inadvertently – result in hype.

Program transformations workshop

The program transformations workshop was primarily focused on automatic differentiation (AD), which has been a key element in the development of modern machine learning techniques.

The workshop was a good size, with a substantial portion of the people currently doing interesting things in AD – such as the developers of Jax, Swift4TF and ChainRules.jl – in attendance. Also present were some of the fathers of AD, such as Laurent Hascoët and Andreas Griewank. Their general attitude towards recent work seemed to be that there have not been significant theoretical advances since the ‘80s. Still, there has been considerable progress toward making AD more useful and accessible.

One of the major topics was AD theory: the investigation of the mathematical basis of current methods. As a highlight, the Differentiable Curry presentation introduced a diagrammatic notation which made the type-theory in the original paper much clearer and more intuitive.

Various new tools were also presented, such as Jax, which made its first public appearance at a major conference. There have been many Jax-related discussions in the Julia community, and it seems to be the best of the Python JITs for differentiable programming. Also of note was TorchScript, a tool that allows running PyTorch outside of Python.

There were also more traditional AD talks, such as high-performance computing (HPC) AD in Fortran and using MPI. A nice exception was Jesse Bettencourt’s work on Taylor-Mode Automatic Differentiation for Higher-Order Derivatives, which is an implementation of Griewank and Walther (2008) chapter 13 in Jax.

Conclusion

As usual, NeurIPS was a great venue for staying connected with the newest advances and trends across the many subfields of ML. Both academia and industry are well represented, which creates a unique forum. We look forward to joining the rest of the community again at NeurIPS 2020, and invite everyone to stop by our booth and meet our team.


The Emergent Features of JuliaLang: Part II - Traits

Author: Lyndon White

Introduction

This blog post is based on a talk originally given at a Cambridge PyData Meetup, and also at a London Julia Users Meetup.

This is the second of a two-part series of posts on the emergent features of JuliaLang: that is to say, features that were not designed, but came into existence from a combination of other features. While Part I was more or less a grab-bag of interesting tricks, Part II (this post) focuses solely on traits, which are, in practice, much more useful than anything discussed in Part I. This post is fully independent from Part I.

Traits

Inheritance can be a gun with only one bullet, at least in the case of single inheritance as Julia uses. Single inheritance means that once something has a supertype, there can’t be another. Traits are one way to get something similar to multiple inheritance under such conditions. They allow for a form of polymorphism that is orthogonal to the inheritance tree, and can be implemented using functions of types. An earlier blog post explains this with a slightly different take.

Sometimes people claim that Julia doesn’t have traits, which is not correct. Julia does not have syntactic sugar for traits, nor the ubiquitous use of traits that some languages feature. But it does have traits, and they are even used in the standard library: in particular for iterators (IteratorSize and IteratorEltype), as well as for several other interfaces.
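
For example, querying these iterator traits in the REPL (output shown is from a recent Julia 1.x version and may vary slightly):

julia> Base.IteratorSize(1:10)        # ranges know their shape
Base.HasShape{1}()

julia> Base.IteratorSize(Iterators.repeated(1))   # an infinite iterator
Base.IsInfinite()

julia> Base.IteratorEltype((x^2 for x in 1:3))    # generators don't know their eltype
Base.EltypeUnknown()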

These are commonly called Holy traits, not out of religious glorification, but after Tim Holy. They were originally proposed to make StridedArrays extensible. Ironically, even though they are fairly well established today, StridedArray still does not use them. There are on-going efforts to add more traits to arrays, which one day will no doubt lead to powerful and general BLAS-type functionality.

There are a few ways to implement traits, though all are broadly similar. Here we will focus on the implementation based on concrete types. Similar things can be done with Type{<:SomeAbstractType} (ugly, but flexible), or even with values if they are of types that constant-fold (like Bool), particularly if you are happy to wrap them in Val when dispatching on them (an example of this will be shown later).

There are a few advantages to using traits:

  • Can be defined after the type is declared (unlike a supertype).
  • Don’t have to be created up-front, so types can be added later (unlike a Union).
  • Can be used with otherwise-unrelated types (unlike a supertype).

A motivating example from Python: the AsList function

In Python TensorFlow, _AsList is a helper function:

def _AsList(x):
    return x if isinstance(x, (list, tuple)) else [x]

This converts scalars into single-item lists, which is useful because the rest of the code then only needs to handle lists. This is not really idiomatic Python code, but TensorFlow uses it in the wild. However, _AsList fails for numpy arrays:

>>> _AsList(np.asarray([1,2,3]))
[array([1, 2, 3])]

This can be fixed in the following way:

def _AsList(x):
    return x if isinstance(x, (list, tuple, np.ndarray)) else [x]

But where will it end? What if other packages want to extend this? What about other functions that also depend on whether something is a list or not? The answer here is traits, which give the ability to mark types as having particular properties. In Julia, in particular, dispatch works on these traits. Often they will compile out of existence (due to static dispatch, during specialization).

Note that at some point we have to document what properties a trait requires (e.g. what methods must be implemented). There is no static type-checker enforcing this, so it may be helpful to write a test suite for anything that has a trait which checks that it works properly.
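
For instance, a rough sketch of such a test, using the Base iterator traits mentioned earlier (the helper function name is made up for illustration):

using Test

# Anything whose IteratorSize trait claims a length should actually implement `length`.
function check_iteratorsize_contract(x)
    @testset "IteratorSize contract for $(typeof(x))" begin
        if Base.IteratorSize(x) isa Union{Base.HasLength, Base.HasShape}
            @test length(x) isa Integer
        end
    end
end

check_iteratorsize_contract(1:10)
check_iteratorsize_contract("some string")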

The Parts of Traits

Traits have a few key parts:

  • Trait types: the different traits a type can have.
  • Trait function: what traits a type has.
  • Trait dispatch: using the traits.

To understand how traits work, it is important to understand the type of types in Julia. Types are values, so they have a type themselves: DataType. However, they also have the special pseudo-supertype Type, so a type T acts like T<:Type{T}.

typeof(String) === DataType
String isa Type{String} === true
String isa Type{<:AbstractString} === true

We can dispatch on Type{T} and have it resolved at compile time.
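
For example (kindof is a made-up name, just to illustrate dispatching on the type itself rather than on an instance):

kindof(::Type{String}) = "the String type itself"
kindof(::Type{<:Integer}) = "some integer type"

kindof(String)  # "the String type itself"
kindof(Int)     # "some integer type"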

Trait Type

This is the type that is used to attribute a particular trait. In the following example we will consider a trait that highlights the properties of a type for statistical modelling. This is similar to MLJ’s ScientificTypes.jl, or the StatsModels schema.

abstract type StatQualia end

struct Continuous <: StatQualia end
struct Ordinal <: StatQualia end
struct Categorical <: StatQualia end
struct Normable <: StatQualia end

Trait Function

The trait function takes a type as input, and returns an instance of the trait type. We use the trait function to declare what traits a particular type has. For example, we can say things like floats are continuous, booleans are categorical, etc.

statqualia(::Type{<:AbstractFloat}) = Continuous()
statqualia(::Type{<:Integer}) = Ordinal()

statqualia(::Type{<:Bool}) = Categorical()
statqualia(::Type{<:AbstractString}) = Categorical()

statqualia(::Type{<:Complex}) = Normable()

Using Traits

To use a trait we need to re-dispatch upon it. This is where we take the type of an input, and invoke the trait function on it to get objects of the trait type, then dispatch on those.

In the following example we are going to define a bounds function, which will define some indication of the range of values a particular type has. It will be defined on a collection of objects with a particular trait, and it will be defined differently depending on which statqualia they have.

using LinearAlgebra

# This is the trait re-dispatch; get the trait from the type
bounds(xs::AbstractVector{T}) where T = bounds(statqualia(T), xs)

# These functions dispatch on the trait
bounds(::Categorical, xs) = unique(xs)
bounds(::Normable, xs) = maximum(norm.(xs))
bounds(::Union{Ordinal, Continuous}, xs) = extrema(xs)

Using the above:

julia> bounds([false, false, true])
2-element Array{Bool,1}:
 false
 true

julia> bounds([false, false, false])
1-element Array{Bool,1}:
 false

julia> bounds([1,2,3,2])
(1, 3)

julia> bounds([1+1im, -2+4im, 0+-2im])
4.47213595499958

We can also extend traits after the fact: for example, if we want to add the property that vectors have norms defined, we could define:

julia> statqualia(::Type{<:AbstractVector}) = Normable()
statqualia (generic function with 6 methods)

julia> bounds([[1,1], [-2,4], [0,-2]])
4.47213595499958

Back to AsList

First, we define the trait type and trait function:

struct List end
struct Nonlist end

islist(::Type{<:AbstractVector}) = List()
islist(::Type{<:Tuple}) = List()
islist(::Type{<:Number}) = Nonlist()

Then we define trait dispatch:

aslist(x::T) where T = aslist(islist(T), x)
aslist(::List, x) = x
aslist(::Nonlist, x) = [x]

This allows aslist to work as expected.

julia> aslist(1)
1-element Array{Int64,1}:
 1

julia> aslist([1,2,3])
3-element Array{Int64,1}:
 1
 2
 3

julia> aslist([1])
1-element Array{Int64,1}:
 1

As discussed above this is fully extensible.

Dynamic dispatch as fallback

All the traits discussed so far have been fully static, and they compile away. We can also write runtime code, at a small runtime cost. The following makes a runtime call to hasmethod to look up whether the given type has an iterate method defined. (There are plans to make hasmethod resolve at compile time, but for now it can only be done at runtime.) Similar code can be used to dispatch on the values of objects.

islist(T) = hasmethod(iterate, Tuple{T}) ? List() : Nonlist()

We can see that this works on strings, as it does not wrap the following into an array.

julia> aslist("ABC")
"ABC"

Traits on functions

We can also attach traits to functions, because functions are instances of singleton types, e.g. foo::typeof(foo). We can use this idea for declarative input transforms.
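
A minimal illustration of this (myfunc and whoami are made-up names):

myfunc() = nothing

# This method only matches the singleton type of `myfunc`, i.e. the function itself.
whoami(::typeof(myfunc)) = "a method that only accepts the function `myfunc`"

whoami(myfunc)  # "a method that only accepts the function `myfunc`"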

As an example, different functions may expect the arrangement of observations to differ. More specifically, different machine learning models might expect the inputs to be:

  • Iterator of Observations.
  • Matrix with Observations in Rows.
  • Matrix with Observations in Columns.

This isn’t a matter of personal preference or different field standards; there are good performance-related reasons to choose one of the above options depending on what operations are needed. As a user of a model, however, we should not have to think about this.

The following examples use LIBSVM.jl, and DecisionTree.jl. In practice we can just use the MLJ interface instead, which takes care of this kind of thing.

To demonstrate simple use of a few ML libraries, first we need some basic functions to deal with our data. In particular, we want to define get_true_classes, which, as it would naturally be written, expects an iterator of observations.

using Statistics
is_large(x) = mean(x) > 0.5
get_true_classes(xs) = map(is_large, xs)

inputs = rand(100, 1_000);  # 100 features, 1000 observations
labels = get_true_classes(eachcol(inputs));

For simplicity, we will simply test on our training data, which should be avoided in general (outside of simple validation that training is working).

The first library we will consider is LIBSVM, which expects the data to be a matrix with 1 observation per column.

using LIBSVM
svm = svmtrain(inputs, labels)

estimated_classes_svm, probs = svmpredict(svm, inputs)
mean(estimated_classes_svm .== labels)

Next, let’s try DecisionTree.jl, which expects data to be a matrix with 1 observation per row.

using DecisionTree
tree = DecisionTreeClassifier(max_depth=10)
fit!(tree, permutedims(inputs), labels)

estimated_classes_tree = predict(tree, permutedims(inputs))
mean(estimated_classes_tree .== labels)

As we can see above, we had to know what each function needed, and use eachcol and permutedims to modify the data. The user should not need to remember these details; they should simply be encoded in the program.

Traits to the Rescue

We will attach a trait to each function that needs to rearrange its inputs. There is a more sophisticated version of this, which also attaches traits to inputs saying how the observations are currently arranged, or lets the user specify. For simplicity, we will assume the data starts out as a matrix with 1 observation per column.

We are considering three possible ways a function might like its data to be arranged. Each of these will define a different trait type.

abstract type ObsArrangement end

struct IteratorOfObs <: ObsArrangement end
struct MatrixColsOfObs <: ObsArrangement end
struct MatrixRowsOfObs <: ObsArrangement end

We can encode this information about each function expectation into a trait, rather than force the user to look it up from the documentation.

# Our initial code:
obs_arrangement(::typeof(get_true_classes)) = IteratorOfObs()

# LIBSVM
obs_arrangement(::typeof(svmtrain)) = MatrixColsOfObs()
obs_arrangement(::typeof(svmpredict)) = MatrixColsOfObs()

# DecisionTree
obs_arrangement(::typeof(fit!)) = MatrixRowsOfObs()
obs_arrangement(::typeof(predict)) = MatrixRowsOfObs()

We are also going to attach some simple traits to the data types to say whether or not they contain observations. We will use value types for this, rather than fully declaring the trait types, so we can skip straight to declaring the trait functions:

# All matrices contain observations
isobs(::AbstractMatrix) = Val{true}()

# Otherwise, it contains observations only if it is an iterator of vectors
isobs(::T) where T = Val{eltype(T) <: AbstractVector}()

Next, we can define model_call: a function which uses the traits to decide how to rearrange the observations before calling the function, based on the type of the function and on the type of the argument.

function model_call(func, args...; kwargs...)
    return func(maybe_organise_obs.(func, args)...; kwargs...)
end

# trait re-dispatch: don't rearrange things that are not observations
maybe_organise_obs(func, arg) = maybe_organise_obs(func, arg, isobs(arg))
maybe_organise_obs(func, arg, ::Val{false}) = arg
function maybe_organise_obs(func, arg, ::Val{true})
    organise_obs(obs_arrangement(func), arg)
end

# The heavy lifting for rearranging the observations
organise_obs(::IteratorOfObs, obs_iter) = obs_iter
organise_obs(::MatrixColsOfObs, obsmat::AbstractMatrix) = obsmat

organise_obs(::IteratorOfObs, obsmat::AbstractMatrix) = eachcol(obsmat)
function organise_obs(::MatrixColsOfObs, obs_iter)
    reduce(hcat, obs_iter)
end

function organise_obs(::MatrixRowsOfObs, obs)
    permutedims(organise_obs(MatrixColsOfObs(), obs))
end

Now, rather than calling things directly, we can use model_call, which takes care of rearranging things. Notice that the code no longer needs to be aware of the particular cases for each library, which makes things much easier for the end-user: just use model_call and don’t worry about how the data is arranged.

inputs = rand(100, 1_000);  # 100 features, 1000 observations
labels = model_call(get_true_classes, inputs);
using LIBSVM
svm = model_call(svmtrain, inputs, labels)

estimated_classes_svm, probs = model_call(svmpredict, svm, inputs)
mean(estimated_classes_svm .== labels)
using DecisionTree
tree = DecisionTreeClassifier(max_depth=10)
model_call(fit!, tree, inputs, labels)

estimated_classes_tree = model_call(predict, tree, inputs)
mean(estimated_classes_tree .== labels)

Conclusion: JuliaLang is not magic

In this series of posts we have seen a few examples of how certain features can give rise to other features. Unit syntax, closure-based objects, contextual compiler passes, and traits all just fall out of the combination of other features. We have also seen that types turn out to be very powerful, especially with multiple dispatch.


The Emergent Features of JuliaLang: Part I

Author: Lyndon White

Introduction

This blog post is based on a talk originally given at a Cambridge PyData Meetup, and also at a London Julia Users Meetup.

Julia is known to be a very expressive language with many interesting features. It is worth considering that some features were not planned, but simply emerged from combinations of other features. This post will describe how several interesting features are implemented:

  1. Unit syntactic sugar,
  2. Pseudo-OO objects with public/private methods,
  3. Dynamic Source Transformation / Custom Compiler Passes (Cassette),
  4. Traits.

Some of these (1 and 4) should be used when appropriate, while others (2 and 3) are rarely appropriate, but are instructive.

Part I of this post (which you are reading now) will focus on the first three. Part II is about traits, which deserve their own post because they are one of the most powerful and interesting Julia features. They emerge from types, multiple dispatch, and the ability to dispatch on the types themselves (rather than just instances). We will review both the common use of traits on types, and also the very interesting, but less common, use of traits on functions.

There are many other emergent features which are not discussed in this post. For example, creating a vector using Int[] is an overload of getindex, and constructors are overloads of ::Type{<:T}().
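
As a small illustration of the first of these, T[a, b, c] is simply a call to Base.getindex(T, a, b, c), so defining getindex on a type customises that syntax (Label is a made-up type, used only to show the mechanism):

struct Label end
Base.getindex(::Type{Label}, xs...) = string.(collect(xs))

Label[1, 2, 3]  # returns ["1", "2", "3"]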

Juxtaposition Multiplication: Convenient Syntax for Units

Using units in certain kinds of calculations can be very helpful for protecting against arithmetic mistakes. The package Unitful provides the ability to do calculations using units in a very natural way, and makes it easy to write things like “2 meters” as 2m. Here is an example of using Unitful units:

julia> using Unitful.DefaultSymbols

julia> 1m * 2m
2 m^2

julia> 10kg * 15m / 1s^2
150.0 kg m s^-2

julia> 150N == 10kg * 15m / 1s^2
true

How does this work? The answer is Juxtaposition Multiplication—a literal number placed before an expression results in multiplication:

julia> x = 0.5π
1.5707963267948966

julia> 2x
3.141592653589793

julia> 2sin(x)
2.0

To make this work, we need to overload multiplication to invoke the constructor of the unit type. Below is a simplified version of what goes on under the hood of Unitful.jl:

abstract type Unit<:Real end
struct Meter{T} <: Unit
    val::T
end

Base.:*(x::Any, U::Type{<:Unit}) = U(x)

Here we are overloading multiplication with a unit subtype, not an instance of a unit subtype (Meter(2)), but with the subtype itself (Meter). This is what ::Type{<:Unit} refers to. We can see this if we write:

julia> 5Meter
Meter{Int64}(5)

This shows that we create a Meter object with val=5.

To get to a full units system, we then need to define methods for everything that numbers need to work with, such as addition and multiplication. The final result is units-style syntactic sugar.
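
For instance, continuing the simplified Meter example above (a minimal sketch, not how Unitful.jl itself is implemented), a few such methods might look like:

# Enough arithmetic and printing for the example below to work:
Base.:+(a::Meter, b::Meter) = Meter(a.val + b.val)
Base.:*(a::Real, b::Meter) = Meter(a * b.val)   # scaling by a plain number
Base.show(io::IO, a::Meter) = print(io, a.val, " m")

2Meter + 3Meter  # displays as: 5 m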

Closures give us “Classic Object-Oriented Programming”

First, it is important to emphasize: don’t do this in real Julia code. It is unidiomatic, and likely to hit edge-cases the compiler doesn’t optimize well (see for example the infamous closure boxing bug). This is all the more important because it often requires boxing (see below).

“Classic OO” has classes with member functions (methods) that can see all the fields and methods, but that outside the methods of the class, only public fields and methods can be seen. This idea was originally posted for Julia in one of Jeff Bezanson’s rare Stack Overflow posts, which uses a classic functional programming trick.

Let’s use closures to define a Duck type as follows:

function newDuck(name)
    # Declare fields:
    age=0

    # Declare methods:
    get_age() = age
    inc_age() = age+=1

    quack() = println("Quack!")
    speak() = quack()

    #Declare public:
    ()->(get_age, inc_age, speak, name)
end

This implements various things we would expect from “classic OO” encapsulation. We can construct an object and call public methods:

julia> bill_the_duck = newDuck("Bill")
#7 (generic function with 1 method)

julia> bill_the_duck.get_age()
0

Public methods can change private fields:

julia> bill_the_duck.inc_age()
1

julia> bill_the_duck.get_age()
1

Outside the class we can’t access private fields:

julia> bill_the_duck.age
ERROR: type ##7#12 has no field age

We also can’t access private methods:

julia> bill_the_duck.quack()
ERROR: type ##7#12 has no field quack

However, accessing their functionality via public methods works:

julia> bill_the_duck.speak()
Quack!

How does this work?

Closures return singleton objects, with the directly-referenced closed-over variables as fields. All the public fields/methods are directly referenced, but the private fields (e.g. age, quack) are not; they are closed over by the methods that use them. We can see those private methods and fields by accessing the public method closures:

julia> bill_the_duck.speak.quack
(::var"#quack#10") (generic function with 1 method)

julia> bill_the_duck.speak.quack()
Quack!
julia> bill_the_duck.inc_age.age
Core.Box(1)

An Aside on Boxing

The Box type is similar to the Ref type in function and purpose. Box is the type Julia uses for variables that are closed over, but which might be rebound to a new value. This is the case for primitives (like Int) which are rebound whenever they are incremented. It is important to be clear on the difference between mutating the contents of a variable, and rebinding that variable name.

julia> x = [1, 2, 3, 4];

julia> objectid(x)
0x79eedc509237c203

julia> x .= [10, 20, 30, 40];  # mutating contents

julia> objectid(x)
0x79eedc509237c203

julia> x = [100, 200, 300, 400];  # rebinding the variable name

julia> objectid(x)
0xfa2c022285c148ed

In closures, boxing applies only to rebinding, though the closure bug means Julia will sometimes over-eagerly box variables because it believes that they might be rebound. This has no bearing on what the code does, but it does impact performance.

While this kind of code itself should never be used (since Julia has a perfectly functional system for dispatch and works well without “Classic OO”-style encapsulation), knowing how closures work opens other opportunities to see how they can be used. In our ChainRules.jl project, we are considering the use of closures as callable named tuples as part of a difficult problem in extensibility and defaults.

Cassette, etc.

Cassette is built around a notable Julia feature, which goes by several names: Custom Compiler Passes, Contextual Dispatch, Dynamically-scoped Macros, or Dynamic Source Rewriting. The same trick is used in IRTools.jl, which is at the heart of the Zygote automatic differentiation package. This is an incredibly powerful and general feature, and it grew out of a very specific issue (#21146) and a very casual suggestion in a PR (#22440) that it might be useful for one particular case. Those discussions describe everything about Cassette, but only in passing.

Among other things, the Custom Compiler Pass feature is essential for call overloading.

Call Overloading

One of the core uses of Cassette is the ability to effectively overload what it means to call a function. Call overloading is much more general than operator overloading, as it applies to every call (special-cased as appropriate), whereas operator overloading applies to just one function call and just one set of types.

To give a concrete example of how call overloading is more general: operator overloading/dispatch (multiple or otherwise) would allow us to overload, for example, sin(::T) for different types T. This way sin(::DualNumber) could be specialised to differ from sin(::Float64), so that with a DualNumber as input it could calculate the non-real part using cos (the derivative of sin). This is exactly how DualNumbers operate. However, operator overloading can’t express the notion that, for all functions f, if the input type is DualNumber, then f should calculate the dual part using the derivative of f. Call overloading allows for much more expressivity and massively simplifies the implementation of automatic differentiation.

The above written out in code:

function Cassette.overdub(::AutoDualCtx, f, d::DualNumber)
    res = ChainRules.frule(f, val(d))
    if res !== nothing
        f_val, pushforward = res
        f_diff = pushforward(partial(d))
        return Dual(f_val, extern(f_diff))
    else
        return f(d)
    end
end

That is just one of the more basic things that can be done with Cassette. Before we can explain how Cassette works, we need to understand how generated functions work, and before that we need to understand the different ways code is presented in Julia, as it is compiled.

Layers of Representation

Julia has many representations of the code it moves through during compilation.

  • Source code: CodeTracking.definition(String, ...),
  • Abstract Syntax Tree (AST): CodeTracking.definition(Expr, ...),
  • Untyped Intermediate Representation (IR): @code_lowered,
  • Typed Intermediate Representation: @code_typed,
  • LLVM: @code_llvm,
  • ASM: @code_native.

We can retrieve the different representations using the functions/macros indicated.

The Untyped IR is of particular interest as this is what is needed for Cassette. This is a linearization of the AST, with the following properties:

  • Only 1 operation per statement (nested expressions get broken up);
  • the return values for each statement are accessed as SSAValue(index);
  • variables become Slots;
  • control-flow becomes jumps (like Goto);
  • function names become qualified as GlobalRef(mod, func).

Untyped IR is stored in a CodeInfo object. It is not particularly pleasant to write, though for an intermediate representation it is fairly readable. The particularly hard part is that expression return values are SSAValues, which are defined by position; so if code is added, the return values need to be renumbered. Core.Compiler, Cassette and IRTools define various helpers to make this easier, but for our simple example we don’t need them.
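
For concreteness, here is roughly what the Untyped IR of a small function looks like (exact printing varies a little between Julia versions):

julia> bar(x) = 2 * (x + 1)
bar (generic function with 1 method)

julia> @code_lowered bar(3)
CodeInfo(
1 ─ %1 = x + 1
│   %2 = 2 * %1
└──      return %2
)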

Generated Functions

Generated functions take types as inputs and return the AST (Abstract Syntax Tree) for the code that should run, based on information in those types. This is a kind of metaprogramming. Take, for example, a function f whose input is an N-dimensional array (type AbstractArray{T, N}). A generated function for f can construct code with N nested loops to process each element. Generating code that is specialised on the information carried in the types like this can give substantial performance improvements.
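
As a minimal sketch of that idea (simplified, and not from the original talk), a generated function can build the N nested loops from the dimensionality alone:

# Build N nested loops based only on N, which is known from the array's type.
@generated function sum_all(A::AbstractArray{T,N}) where {T,N}
    idxs = [Symbol(:i, d) for d in 1:N]
    body = :(total += A[$(idxs...)])              # innermost statement
    for d in 1:N                                  # wrap one loop per dimension
        body = :(for $(idxs[d]) in axes(A, $d); $body; end)
    end
    return quote
        total = zero($T)
        $body
        return total
    end
end

sum_all(rand(3, 4))  # sums all 12 elements, with the loops generated from N = 2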

For a more detailed example, let’s consider merging NamedTuples. This could be done with a regular function:

function merge(a::NamedTuple{an}, b::NamedTuple{bn}) where {an, bn}
    names = merge_names(an, bn)
    types = merge_types(names, typeof(a), typeof(b))
    NamedTuple{names,types}(map(n->getfield(sym_in(n, bn) ? b : a, n), names))
end

This checks at runtime what fields each NamedTuple has, in order to decide what will be in the merge. However, we actually know all this information based on the types alone, because a list of fields is stored as part of the type (that is the an and bn type-parameters.)

@generated function merge(a::NamedTuple{an}, b::NamedTuple{bn}) where {an, bn}
    names = merge_names(an, bn)
    types = merge_types(names, a, b)
    vals = Any[ :(getfield($(sym_in(n, bn) ? :b : :a), $(QuoteNode(n)))) for n in names ]
    :( NamedTuple{$names,$types}(($(vals...),)) )
end

Recall that a generated function returns an AST, which is the case in the examples above. However, it is also allowed to return a CodeInfo for Untyped IR.

Making Cassette

There are two key facts mentioned above:

  1. Generated Functions are allowed to return CodeInfo for Untyped IR.
  2. @code_lowered f(args...) provides the ability to retrieve the CodeInfo for the given method of f.

The core of Cassette is to use a generated function to return a new CodeInfo of Untyped IR, based on a modified copy of the code returned by @code_lowered. To properly understand how this works, let’s run through a manual example (originally from a JuliaCon talk on MagneticReadHead.jl).

We define a generated function rewritten that makes a copy of the Untyped IR (a CodeInfo object that it gets back from @code_lowered) and then mutates it, replacing each call with a call to the function call_and_print. Finally, this returns the new CodeInfo to be run when it is called.

call_and_print(f, args...) = (println(f, " ", args); f(args...))


@generated function rewritten(f)
    ci_orig = @code_lowered f.instance()
    ci = ccall(:jl_copy_code_info, Ref{Core.CodeInfo}, (Any,), ci_orig)
    for ii in eachindex(ci.code)
        if ci.code[ii] isa Expr && ci.code[ii].head==:call
            func = GlobalRef(Main, :call_and_print)
            ci.code[ii] = Expr(:call, func, ci.code[ii].args...)
        end
    end
    return ci
end

(Note: this exact version of the code works on Julia 1.0, 1.1, and 1.2, and prints a ton of warnings because it hurts type-inference a lot. Slightly improved versions of it exist in Cassette that don’t make type-inference cry and that work on all Julia versions; this one just gets the point across.)

We can see that this works:

julia> foo() = 2*(1+1)
foo (generic function with 1 method)

julia> rewritten(foo)
+ (1, 1)
* (2, 2)
4

Rather than replacing each call with call_and_print, we could instead call a function that does the work we are interested in, and then calls rewritten on the function that would have been called (this sketch assumes rewritten has been extended to accept and pass along the arguments):

function work_and_recurse(f, args...)
    println(f, " ", args)
    rewritten(f, args...)
end

So, not only does the function we call get rewritten, but so does every function it calls, all the way down.

This is how Cassette and IRTools work. There are a few complexities and special cases that need to be taken care of, but this is the core of it: recursive invocation of generated functions that rewrite the IR, similar to what is returned by @code_lowered.

Wrapping up Part I

In this part we covered the basics of:

  1. Unit syntactic sugar,
  2. Pseudo-OO objects with public/private methods,
  3. Dynamic Source Transformation / Custom Compiler Passes (Cassette).

In Part II, we will discuss Traits!


Memristive dynamics: a heuristic optimization tool

Author: Ana Zegarac and Francesco Caravelli

\(\newcommand{\Ron}{R_{\text{on}}}\) \(\newcommand{\Roff}{R_{\text{off}}}\) \(\newcommand{\dif}{\mathop{}\!\mathrm{d}}\) \(\newcommand{\coloneqq}{\mathrel{\vcenter{:}}=}\)

Introduction

In this post we will explore an interesting area of modern circuit theory and computing: how exotic properties of certain nanoscale circuit components (analogous to tungsten filament lightbulbs) can be used to solve certain optimization problems. Let’s start with a simple example: imagine a circuit made up of resistors, all of which have the same resistance.

circuit

If we connect points A and B of the circuit above to a battery, current will flow through the conductors (let’s say from A to B), according to Kirchhoff’s laws. Because of the symmetry of the problem, the current splits at each intersection according to the inverse resistance rule, and so the currents will flow equally in all directions to reach B. This implies that if we follow the maximal current at each intersection, we can use this circuit to solve the minimum distance (or cost) problem on a graph. This can be seen as an example of a physical system performing analog computation. More generally, there is a deep connection between graph theory and resistor networks.

In 2008, researchers from Hewlett Packard introduced what is now called the “nanoscale memristor” 1. This is a type of resistor made of certain metal oxides, such as those of tungsten or titanium, for which the resistance changes as a function of time in a peculiar way. For the case of titanium dioxide, the resistance changes between two limiting values according to a simple convex law depending on an internal parameter, \(w\), which is constrained between \(0\) and \(1\):

\[R(w)= (1-w) \Ron + w \Roff,\]

where,

\[\frac{\dif}{\dif t} w(t)=\alpha w- \Ron \frac{I}{\beta},\]

and \(I\) is the current in the device. This is a simple prototypical model of a memory-resistor, or memristor.

In the general case of a network of memristors, the above differential equation becomes 2 3 4 5:

\[\frac{\dif}{\dif t}\vec{w}(t)=\alpha\vec{w}(t)-\frac{1}{\beta} \left(I+\frac{\Roff-\Ron}{\Ron} \Omega W(t)\right)^{-1} \Omega \vec S(t),\]

where \(\Omega\) is a matrix representing the circuit topology, \(\vec S(t)\) are the applied voltages, and \(W\) a diagonal matrix of the values of \(w\) for each memristor in the network.

It is interesting to note that the equation above is related to a class of optimization problems, similarly to the case of a circuit of simple resistors. In the memristor case, this class is called quadratically unconstrained binary optimization (QUBO). The solution of a QUBO problem is the set of binary parameters, \(w_i\), that minimize (or maximize) a function of the form:

\[F(\vec w)= -\frac p 2 \sum_{ij} w_i J_{ij} w_j + \sum_j h_j w_j.\]

This is because the dynamics of memristors, described by the equation above, are such that a QUBO functional is minimized. For realistic circuits, \(\Omega\) satisfies certain properties which we will discuss below, but from the point of view of optimization theory, the system of differential equations can in principle be simulated for arbitrary \(\Omega\). Therefore, the memristive differential equation can serve as a heuristic solution method for an NP-complete problem such as QUBO 4. QUBO is NP-complete, and in the worst case no known algorithm does substantially better than exhaustive search: because the variables are binary, we may have to explore all \(2^N\) possible assignments of \(w\) to find the extrema. In a sense, the memristive differential equation is a relaxation of the QUBO problem to continuous variables.

Let’s look at an application to a standard problem: given the expected returns and the covariance matrix between prices of some financial assets, which assets should an investor allocate capital to? This setup, with binary decision variables, differs from the typical portfolio allocation question, in which the decision variables are real-valued (fractions of wealth to allocate to assets). Here the decision is combinatorial: whether or not to invest in a given asset, rather than how much. This is the combinatorial Markowitz problem. The heuristic solution we propose was introduced in the appendix of this paper. It can also be used as a screening procedure before the real portfolio optimization is performed, i.e., as a dimensionality reduction technique. More formally, the objective is to maximize:

\[M(W)=\sum_i \left(r_i-\frac{p}{2}\Sigma_{ii} \right)W_i-\frac{p}{2} \sum_{i\neq j} W_i \Sigma_{ij} W_j,\]

where \(r_i\) are the expected returns and \(\Sigma\) the covariance matrix. The mapping between the above and the equivalent memristive equation is given by:

\[\begin{align} \Sigma &= \Omega, \\ \frac{\alpha}{2} +\frac{\alpha\xi}{3}\Omega_{ii} -\frac{1}{\beta} \sum_j \Omega_{ij} S_j &= r_i-\frac{p}{2}\Sigma_{ii}, \\ \frac{p}{2} &= \alpha \xi. \end{align}\]

The solution vector \(S\) is obtained through inversion of the matrix \(\Sigma\) (if it is invertible). There are some conditions for this method to be suitable for heuristic optimization (related to the spectral properties of the matrix \(J\)), but as a heuristic method it is much cheaper than exhaustive search: it requires a single matrix inversion, which scales as \(O(N^3)\), plus the simulation of a first-order differential equation that also requires a matrix inversion at each step, and thus scales as \(O(T \cdot N^3)\), where \(T\) is the number of time steps. We propose two approaches to this problem: one based on the common Metropolis-Hastings-type algorithm (at each time step we flip a single weight, which can be referred to as “Gibbs sampling”), and the other on the heuristic memristive equation.

Sample implementation

Below we include Julia code for both algorithms.

using FillArrays
using LinearAlgebra

"""
    monte_carlo(
        expected_returns::Vector{Float64},
        Σ::Matrix{Float64},
        p::Float64,
        weights_init::Vector{Bool};
        λ::Float64=0.999995,
        effective_its::Int=3000,
        τ::Real=0.75,
    )

Execute Monte Carlo in order to find the optimal portfolio composition,
considering an asset covariance matrix `Σ` and a risk parameter `p`. `λ` is the
annealing factor by which the temperature `τ` decreases at each step, and
`effective_its` determines how many times, on average, each asset will have its
weight changed.
"""
function monte_carlo(
    expected_returns::Vector{Float64},
    Σ::Matrix{Float64},
    p::Float64,
    weights_init::Vector{Bool};
    λ::Float64=0.999995,
    effective_its::Int=3000,
    τ::Real=0.75,
)
    weights = weights_init
    n = size(weights_init, 1)

    total_time = effective_its * n
    energies = Vector{Float64}(undef, total_time)

    # Compute energy of initial configuration
    energy_min = energy(expected_returns, Σ, p, weights_init)

    for t in 1:total_time
        # Anneal temperature
        τ = τ * λ
        # Randomly draw asset
        rand_index = rand(1:n)
        weights_current = copy(weights)
        # Flip weight from 0 to 1 and vice-versa
        weights_current[rand_index] = 1 - weights_current[rand_index]
        # Compute new energy
        energy_current = energy(expected_returns, Σ, p, weights_current)

        # Compute Boltzmann factor
        β = exp(-(energy_current - energy_min) / τ)

        # Randomly accept update
        if rand() < min(1, β)
            energy_min = energy_current
            weights = weights_current
        end

        # Store current best energy
        energies[t] = energy_min
    end
    return weights, energies
end

"""
    memristive_opt(
        expected_returns::Vector{Float64},
        Σ::Matrix{Float64},
        p::Float64,
        weights_init::Vector{<:Real};
        α=0.1,
        β=10,
        δt=0.1,
        total_time=3000,
    )

Execute optimisation via the heuristic "memristive" equation in order to find the
optimal portfolio composition, considering an asset covariance matrix `Σ` and a
risk parameter `p`. `α` and `β` parametrise the memristor dynamics (see the
memristor network equation above), `δt` is the size of the time step for the
dynamical updates and `total_time` is the number of time steps for which the
dynamics will be run.
"""
function memristive_opt(
    expected_returns::Vector{Float64},
    Σ::Matrix{Float64},
    p::Float64,
    weights_init::Vector{<:Real};
    α=0.1,
    β=10,
    δt=0.1,
    total_time=3000,
)
    n = size(weights_init, 1)

    weights_series = Matrix{Float64}(undef, n, total_time)
    weights_series[:, 1] = weights_init
    energies = Vector{Float64}(undef, total_time)
    energies[1] = energy(expected_returns, Σ, p, weights_series[:, 1])

    # Compute resistance change ratio
    ξ = p / 2α
    # Compute Σ times applied voltages matrix
    ΣS = β * (α/2 * ones(n) + (p/2 + α * ξ/3) * diag(Σ) - expected_returns)

    for t in 1:total_time-1
        update = δt * (α * weights_series[:, t] - 1/β * (Eye(n) + ξ *
            Σ * Diagonal(weights_series[:, t])) \ ΣS)
        weights_series[:, t+1] = weights_series[:, t] + update

        # Clamp the weights to the physical range [0, 1]
        weights_series[weights_series[:, t+1] .> 1, t+1] .= 1.0
        weights_series[weights_series[:, t+1] .< 0, t+1] .= 0.0

        energies[t + 1] = energy(expected_returns, Σ, p, weights_series[:, t+1])
    end

    weights_final = weights_series[:, end]

    return weights_final, energies
end

"""
    energy(
        expected_returns::Vector{Float64},
        Σ::Matrix{Float64},
        p::Float64,
        weights::Vector{Float64},
    )

Return minus the expected return corrected by the variance of the portfolio,
according to the Markowitz approach. `Σ` represents the covariance of the assets,
`p` controls the risk tolerance and `weights` represent the (here binary)
portfolio composition.
"""
function energy(
    expected_returns::Vector{Float64},
    Σ::Matrix{Float64},
    p::Float64,
    weights::Vector{<:Real},
)
    -dot(expected_returns, weights) + p/2 * weights' * Σ * weights
end
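
As a rough sketch of how these routines might be exercised on synthetic data (the seed, risk parameter, and covariance construction below are illustrative choices, not the exact setup used for the comparison that follows):

using Random
Random.seed!(42)

n = 200                        # number of assets
expected_returns = rand(n)
A = randn(n, n)
Σ = A * A' / n                 # a random positive semi-definite covariance matrix
p = 0.5                        # risk-tolerance parameter

# Simulated-annealing Monte Carlo, starting from a random binary portfolio
w_mc, energies_mc = monte_carlo(expected_returns, Σ, p, rand(Bool, n))

# Memristive dynamics, starting from random weights in [0, 1]
w_mem, energies_mem = memristive_opt(expected_returns, Σ, p, rand(n))

# Best corrected expected return found by each method (the energy is its negative)
-minimum(energies_mc), -minimum(energies_mem)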

A simple comparison between the two approaches shows the superiority of the memristor-equation-based approach for this problem. We used a portfolio of 200 assets, with randomly generated initial allocations, expected returns, and covariances. After 3000 effective iterations (that is, 3000 times 200, the number of assets), the Monte Carlo approach leads to a corrected expected portfolio return of 3.9, while the memristive approach, in only a few steps, already converges to a solution with a corrected expected portfolio return of 19.5! Moreover, the solution found via Monte Carlo allocated investment to 91 different assets, while the memristor-based one only invested in 71 assets, thus showing an even better return on investment. There are obvious caveats here, as each of these methods has its own parameters which can be tuned to improve performance (which we did try doing), and Monte Carlo methods are known to be ill-suited to combinatorial problems. Yet, such a stark difference shows how powerful memristors can be for optimization purposes (in particular for quickly generating sufficiently good, though not necessarily optimal, solutions).

The equation for memristors is also interesting in other ways, for example, it has graph theoretical underpinnings 2 3 4. Further, the memristor network equation is connected to optimization 5 6 7. These are summarized in 8. For a general overview of the field of memristors, see the recent review 9.


  1. D. B. Strukov, G. S. Snider, D. R. Stewart, R. S. Williams, Nature 453, pages 80–83 (01 May 2008) 

  2. F. Caravelli, F. L. Traversa, M. Di Ventra, Phys. Rev. E 95, 022140 (2017) - https://arxiv.org/abs/1608.08651  2

  3. F. Caravelli, International Journal of Parallel, Emergent and Distributed Systems, 1-17 (2017) - https://arxiv.org/abs/1611.02104  2

  4. F. Caravelli, Phys. Rev. E 96, 052206 (2017) - https://arxiv.org/abs/1705.00244  2 3

  5. F. Caravelli, Entropy 2019, 21(8), 789 - https://www.mdpi.com/1099-4300/21/8/789, https://arxiv.org/abs/1712.07046  2

  6. Bollobás, B., 2012. Graph theory: an introductory course (Vol. 63). Springer Science & Business Media. 

  7. http://www.gipsa-lab.fr/~francis.lazarus/Enseignement/compuTopo3.pdf 

  8. A. Zegarac, F. Caravelli, EPL 125 10001, 2019 - https://iopscience.iop.org/article/10.1209/0295-5075/125/10001/pdf 

  9. F. Caravelli and J. P. Carbajal, Technologies 2018, 6(4), 118 - https://engrxiv.org/c4qr9 


JuliaCon 2019

Author: Eric Davies, Fernando Chorney, Glenn Moynihan, Lyndon White, Mary Jo Ramos, Nicole Epp, Sean Lovett, Will Tebbutt

Some of us at Invenia recently attended JuliaCon 2019, which was held at the University of Maryland, Baltimore, on July 21-26.

It was the biggest JuliaCon yet, with more than 370 participants, but still had a welcoming atmosphere as in previous years. There were plenty of opportunities to catch up with people that you work with regularly but almost never actually have a face-to-face conversation with.

We sent 16 developers and researchers from our Winnipeg and Cambridge offices. As far as we know, this was the largest contingent of attendees after Julia Computing! We had a lot of fun and thought it would be nice to share some of our favourite parts.

Julia is more than scientific computing

For someone who is relatively new to Julia, it may be difficult to know what to expect from a JuliaCon. Consider someone who has used Julia before in a few ways, but who has yet to fully grasp the power of the language.

One of the first things to notice is just how diverse the language is. It may initially seem like a specialised tool for the scientific community, but the more one learns about it, the more evident it becomes that it could be used to tackle all sorts of problems.

Workshops were tailored for individuals with a range of Julia knowledge. Beginners had the opportunity to play around with Julia in basic workshops, and more advanced workshops covered topics from differential equations to parallel computing. There was something for everyone and no reason to feel intimidated.

The Intermediate Julia for Scientific Computing workshop highlighted Julia’s multiple dispatch and metaprogramming capabilities. Both are very useful and interesting tools. As one learns more about Julia and sees the growth of the community, it becomes easier to understand why some people love the language so much: it serves scientific computing without compromising on speed, readability, or general-purpose usefulness.
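
As a small, self-contained illustration (not taken from the workshop materials), the sketch below shows multiple dispatch selecting a method based on the types of all arguments, and a tiny macro, mimicking Base’s @show, generating code at parse time. The types and names are invented for the example.

# Multiple dispatch: which method runs depends on the types of *all* the arguments.
abstract type Asset end
struct Stock <: Asset; ticker::String; end
struct Bond  <: Asset; issuer::String; end

describe(a::Stock, b::Stock) = "two stocks: $(a.ticker) and $(b.ticker)"
describe(a::Stock, b::Bond)  = "a stock ($(a.ticker)) and a bond ($(b.issuer))"
describe(a::Bond,  b::Stock) = describe(b, a)
describe(a::Bond,  b::Bond)  = "two bonds: $(a.issuer) and $(b.issuer)"

println(describe(Stock("AAPL"), Bond("US Treasury")))

# Metaprogramming: a macro receives the unevaluated expression and can transform it
# before compilation. This one mimics Base's @show.
macro show_expr(ex)
    return quote
        println($(string(ex)), " = ", $(esc(ex)))
    end
end

@show_expr 2 + 3 * 4    # prints "2 + 3 * 4 = 14"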

Julia is welcoming!

The various workshops and talks that encouraged diverse groups to utilize and contribute to the language were also great to see. It is uplifting to witness such a strong initiative to make everyone feel welcome.

The Diversity and Inclusion BoF provided a space for anyone to voice opinions, concerns, and suggestions on how to improve the Julia experience for everyone, while the Diversity and Inclusion session showcased ways to foster the community’s growth. The speakers were educators who shared their experiences using Julia as a tool to enhance education for women, minorities, and students from lower socioeconomic backgrounds. The inspirational work done by these individuals – and everyone at JuliaCon – proves that anyone can use Julia and succeed with the community’s support.

The community has a big role to play

Heather Miller’s keynote on Scala’s experience as an open-source project was another highlight. It is staggering to see what the adoption of open-source software in industry looks like, and to learn about the issues that come with growth and the importance of the community. Although Heather’s snap polls at the end seemed broadly positive about the state of Julia’s community, they suggested that the level to which we mentor newer members of the community is lower than ideal.

Birds of a Feather Sessions

This year Birds of a Feather (BoF) sessions were a big thing. In previous JuliaCons there were only one or two, but this year we had twelve. Each session was unique and interesting, and in almost every case people wished they could have continued after the hour was up. The BoFs were an excellent chance to discuss and explore topics in a way that is difficult in a formal talk. In many ways this is the best reason to go to a conference: to connect with people, make plans, and learn deeply. Normally this happens in cramped corridors or during coffee breaks, which certainly still happened this year, but the BoFs gave the whole process a little more structure, and also tables.

Each BoF was different and good in its own ways, and none would have been half as good without getting everyone in the same room. We mentioned the Diversity BoF earlier. The Parallelism BoF ended with everyone breaking into four groups, each of which produced notes on future directions for parallelism in Julia. Our own head of development, Curtis Vogt, ran the “Julia In Production” BoF which showed that both new and established companies are adopting Julia, and led to some useful discussion about better support for Julia in Cloud computing environments.

The Cassette BoF was particularly good, with everyone talking about what they were using Cassette for. Our own Lyndon White presented his new Cassette-like project, Arborist, and some of the challenges it faces, including understanding why the compiler behaves the way it does. We also learned that neither Jarrett Revels (the creator of Cassette) nor Valentin Churavy (its current maintainer) knows exactly what Tagging in Cassette does.
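
For readers who have not used it, Cassette lets you run code inside a “context” that can intercept every function call made along the way. Below is a minimal sketch along the lines of the examples in Cassette’s documentation; the context name is our own.

using Cassette

# Define a context; overdubbing a call in this context recursively intercepts every
# function call made while executing it.
Cassette.@context TraceCtx

# The prehook runs before each intercepted call; here it just logs what is called.
Cassette.prehook(::TraceCtx, f, args...) = println("calling ", f, args)

Cassette.overdub(TraceCtx(), x -> x^2 + 1, 3)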

Support tools are becoming well established

With the advent of Julia 1.0 at JuliaCon 2018 we saw the release of a new Pkg.jl, a far more robust and user-friendly package manager. However, many of the core tools needed to support the maturing package ecosystem did not emerge until JuliaCon 2019. This year we saw the introduction of a new debugger, a formatter, and an impressive documentation generator.

For the past year, the lack of a debugger to match the pleasure of using Pkg.jl remained an open gap in the toolbox. An answer was unveiled in Debugger.jl, by the venerable Tim Holy, Sebastian Pfitzner, and Kristoffer Carlsson. They discussed the ease with which Debugger.jl can be used to set breakpoints, step into functions, traverse lines of code, and modify variables in the local scope, all from within a new debugger REPL mode! This is a very welcome addition to the Julia toolbox.
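
As a rough sketch of the workflow, assuming the command set described in the Debugger.jl README at the time (check the package documentation for the current interface), with a made-up function to debug:

using Debugger

# A toy function to step through (invented for this example).
function clamp_sum(xs, lo, hi)
    total = 0.0
    for x in xs
        total += clamp(x, lo, hi)
    end
    return total
end

# @enter drops you into the debugger REPL mode at this call. From there,
# `n` steps to the next line, `s` steps into a call, `c` continues to the next
# breakpoint, `q` quits, and the backtick key switches to an evaluation mode
# that runs expressions in the current frame.
@enter clamp_sum(randn(5), -1.0, 1.0)

# Breakpoints can also be set programmatically before running:
breakpoint(clamp_sum)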

Style guides are easy to understand, but it is all too easy to misstep when writing code. Dominique Luna gave a simple walkthrough of his JuliaFormatter.jl package, which formats source code, including line widths, according to specified parameters. The formatter spruces up and nests lines of code to produce more aesthetically pleasing text. It was only recently registered as a package and is not a comprehensive linter, but it is a good step in the right direction, and such a simple package will save countless hours of code review.
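
A minimal sketch of how we understand the package is used, with illustrative keyword values (see the JuliaFormatter.jl documentation for the authoritative interface):

using JuliaFormatter

# Format a string of source code; `indent` and `margin` control the indentation
# width and the maximum line width (the values here are just examples).
messy = "function f(x) return x+  1 end"
println(format_text(messy; indent = 4, margin = 92))

# Files or whole directories can be formatted in place, e.g.
# format("src"; indent = 4, margin = 92)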

Code is only as useful as its documentation, and Documenter.jl is the canonical tool for generating package documentation. Morten Piibeleht gave an excellent overview of the API, including docstring generation, example code evaluation, and using custom CSS for online manuals. A great feature is the inclusion of doctests as part of a unit-testing setup, to ensure that examples match function outputs. While Documenter has had doctests for a long time, they are now much easier to trigger: just add using Documenter, MyPackage; doctest(MyPackage) to your runtests.jl file. Coupled with Invenia’s own PkgTemplates.jl, creating a maintainable package framework has never been easier.
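
For example, a docstring in a hypothetical MyPackage can carry a jldoctest block, and the test suite can then exercise it; the package and function names below are invented for illustration.

# In MyPackage's source, a docstring with a jldoctest block is documentation
# that doubles as a test.
"""
    double(x)

Return twice `x`.

```jldoctest
julia> using MyPackage

julia> double(21)
42
```
"""
double(x) = 2x

# In test/runtests.jl, run every doctest in the package alongside the unit tests:
using Documenter, MyPackage
doctest(MyPackage)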

Probabilistic Programming: Julia sure does do a lot of it

Another highlight was the somewhat unexpected probabilistic programming track on Thursday afternoon. There were 5 presentations, each on a different framework and take on what probabilistic programming in Julia can look like. These included a talk on Stheno.jl from our own Will Tebbutt, which also contained a great introduction to Gaussian Processes.

Particularly interesting was the Gen for data cleaning talk by Alex Law. This puts the problem of data cleaning – correcting mis-entered or incomplete data – into a probabilistic programming setting. Normally this is done via deterministic heuristics, for example by correcting spelling using the Damerau–Levenshtein distance to the nearest word in a dictionary. However, such approaches can have issues: when correcting town names, for instance, the nearest by spelling may be a tiny town with a very small population, which is probably wrong. More complicated heuristics can be written to handle such cases, but they quickly become unwieldy. An alternative is to write statements about the data in a probabilistic programming language: for example, that there is some chance of one typo and a smaller chance of two typos, and that towns with higher populations are more likely to occur. Inference can then be run in the probabilistic model to find the most likely cleaned field values. This is a neat idea based on a few recent publications. We’re very excited about all of this work, and look forward to further discussions with the authors of the various frameworks.
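
To make the idea concrete, here is a toy sketch of that kind of inference written in plain Julia rather than Gen: candidate towns are scored by a prior proportional to population and a likelihood that decays with edit distance. The town names, populations, and penalty below are invented, and the actual work uses Gen with richer models.

# Simple Levenshtein edit distance via dynamic programming.
function edit_distance(a::AbstractString, b::AbstractString)
    m, n = length(a), length(b)
    d = [i + j for i in 0:m, j in 0:n]               # border rows/columns are correct as-is
    for (i, ca) in enumerate(a), (j, cb) in enumerate(b)
        d[i+1, j+1] = min(d[i, j+1] + 1,              # deletion
                          d[i+1, j] + 1,              # insertion
                          d[i, j] + (ca == cb ? 0 : 1))  # substitution
    end
    return d[m+1, n+1]
end

# Hypothetical reference data: town names and populations.
towns = ["Winnipeg" => 750_000, "Winkler" => 13_000, "Cambridge" => 125_000]

# Posterior over the intended town: prior ∝ population, likelihood decays with the
# number of typos needed to produce the observed string.
function clean(observed; typo_penalty = 2.0)
    logscores = [log(pop) - typo_penalty * edit_distance(observed, name)
                 for (name, pop) in towns]
    probs = exp.(logscores .- maximum(logscores))
    probs ./= sum(probs)
    return sort(collect(zip(first.(towns), probs)); by = last, rev = true)
end

clean("Winipeg")   # "Winnipeg" wins despite the typo, thanks to its large population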

Composable Parallelism

A particularly exciting announcement was composable multi-threaded parallelism for Julia v1.3.0-alpha.

Safe and efficient thread parallelism has been on the horizon for a while now. Previously, multi-threaded parallelism was experimental and generally very limited (see the Base.Threads.@threads macro). It involved dividing the work into blocks that execute independently and are then joined back into Julia’s main thread. Limitations included not being able to do I/O or task switching inside the threaded for-loop of parallel work.

All that is changing in Julia v1.3. One of the most exciting changes is that all system-level I/O operations are now thread-safe: functions like print can be used inside @spawn blocks and @threads for-loops. Additionally, the @spawn construct (which is like @async, but parallel) was introduced; unlike the pre-existing Distributed.@spawn, which distributes work across processes, the new Threads.@spawn schedules a task on an available thread.
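
A minimal sketch of what this looks like on v1.3 (started with, say, JULIA_NUM_THREADS=4); the function here is our own toy example:

using Base.Threads

# A toy parallel sum: Threads.@spawn schedules each half on any available thread and
# returns a Task; fetch waits for its result.
function psum(xs)
    mid = length(xs) ÷ 2
    left = @spawn begin
        println("left half running on thread ", threadid())   # I/O inside a task is safe in v1.3
        sum(@view(xs[1:mid]))
    end
    right = @spawn sum(@view(xs[mid+1:end]))
    return fetch(left) + fetch(right)
end

psum(collect(1:1_000_000))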

Taking advantage of hardware parallelism can lead to very large speedups for certain workflows, and it will now be much easier to get started with threads. The usual multithreading pitfalls of race conditions and potential deadlocks still exist, but these can generally be worked around with locks and atomic operations where needed. There are many other features and improvements to look forward to.

Neural differential equations: machine learning and physics join forces

We were excited to see DiffEqFlux.jl presented at JuliaCon this year. It combines the excellent DifferentialEquations.jl and Flux.jl packages to implement neural ODEs, SDEs, PDEs, DDEs, and more, all using state-of-the-art time-integration methods.

The Neural Ordinary Differential Equations paper caused a stir at NeurIPS 2018 for its successful combination of two seemingly-distinct techniques: differential equations and neural networks. More generally, there has been a recent resurgence of interest in combining modern machine learning with traditional scientific modelling techniques (see Hidden Physics Models, among others). There are many possible applications for these methods: they have the potential both to enrich black-box machine learning models with physical insights and, conversely, to use data to learn unknown terms in structured physical models. For the former, it is possible to use a differential equation solver as a layer embedded within a neural network and train the resulting model end-to-end. For the latter, it is possible to replace a term in a differential equation with a neural network. Both of these use-cases, and many others, are now possible in DiffEqFlux.jl with just a few lines of code. The generic programming capabilities of Julia really shine through in the flexibility and composability of these tools, and we’re excited to see where this field goes next.
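
As a conceptual sketch of the “neural network as the right-hand side of an ODE” idea, here is a hand-rolled version using DifferentialEquations.jl directly. DiffEqFlux.jl wraps this pattern with real Flux models and the machinery needed to differentiate through the solver, so the code below is illustrative rather than the package’s API.

using DifferentialEquations

# A tiny hand-rolled one-hidden-layer network as the ODE right-hand side, du/dt = f(u).
# The weights here are random and untrained, purely to show the shape of the construction.
W1, b1 = 0.1 .* randn(16, 2), zeros(16)
W2, b2 = 0.1 .* randn(2, 16), zeros(2)
neural_rhs(u, p, t) = W2 * tanh.(W1 * u .+ b1) .+ b2

u0 = [2.0, 0.0]
prob = ODEProblem(neural_rhs, u0, (0.0, 1.5))
sol = solve(prob, Tsit5(); saveat = 0.1)

In DiffEqFlux.jl the right-hand side is a genuine Flux model whose parameters are trained end-to-end against data, as described in the talk.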

The compiler has come a long way!

The Julia compiler and standard library have come a long way in the last two years. An interesting case came up while Eric Davies was working on IndexedDims.jl at the hackathon.

Take for example the highly-optimized code from base/multidimensional.jl:

@generated function _unsafe_getindex!(dest::AbstractArray, src::AbstractArray, I::Vararg{Union{Real, AbstractArray}, N}) where N
   quote
       Base.@_inline_meta
       D = eachindex(dest)
       Dy = iterate(D)
       @inbounds Base.Cartesian.@nloops $N j d->I[d] begin
           # This condition is never hit, but at the moment
           # the optimizer is not clever enough to split the union without it
           Dy === nothing && return dest
           (idx, state) = Dy
           dest[idx] = Base.Cartesian.@ncall $N getindex src j
           Dy = iterate(D, state)
       end
       return dest
   end
end

It is interesting that this now has speed within a factor of two of the following simplified version:

function _unsafe_getindex_modern!(dest::AbstractArray, src::AbstractArray, I::Vararg{Union{Real, AbstractArray}, N}) where N
   @inbounds for (i, j) in zip(eachindex(dest), Iterators.product(I...))
       dest[i] = src[j...]
   end
   return dest
end

In Julia 0.6, this was five times slower than the highly-optimized version above!

It turns out that the type parameter N matters a lot. Removing this explicit type parameter causes performance to degrade by an order of magnitude. While Julia generally specializes on the types of the arguments passed to a method, there are a few cases in which Julia will avoid that specialization unless an explicit type parameter is added: the three main cases are Function, Type, and as in the example above, Vararg arguments.
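
A minimal illustration of that heuristic, with made-up function names, along the lines of the performance tips in the Julia manual:

# Julia heuristically avoids specializing on Function, Type, and Vararg arguments
# that are merely passed through to another call.
passthrough(f, args...) = f(args...)

# Explicit type parameters (F and N below) force specialization on both the callee
# and the number and types of the varargs, as the Base code above does with `N`.
passthrough_spec(f::F, args::Vararg{Any, N}) where {F, N} = f(args...)

# The performance difference can be observed by timing concrete calls, for example
# with BenchmarkTools.@btime.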

Some of our other favourite talks

Here are some of our other favourite talks not discussed above:

  1. Heterogeneous Agent DSGE Models
  2. Solving Cryptic Crosswords
  3. Differentiate All The Things!
  4. Building a Debugger with Cassette
  5. FilePaths
  6. Ultimate Datetime
  7. Smart House with JuliaBerry
  8. Why writing C interfaces in Julia is so easy
  9. Open Source Power System Modeling
  10. What’s Bad About Julia
  11. The Unreasonable Effectiveness of Multiple Dispatch

It is always impressive how much ground is covered at every JuliaCon, considering its size. It goes to show both how fast the community is growing and how versatile the language is. We look forward to another one in 2020, even bigger and broader.