# NeurIPS 2019 in Review

Author: Invenia team

## Introduction

In early December 2019, part of the Invenia team headed to Vancouver, Canada, for the annual Neural Information Processing Systems (NeurIPS, previously NIPS) conference. It was the biggest NeurIPS conference yet, with an estimated 13,000 attendees, 6,743 paper submissions, and almost 80 official meetups.

We sent developers, researchers, and people operations representatives from both Cambridge and Winnipeg to attend talks, meet people, and run our booth at the expo. Below are some of our highlights from the week.

## Yoshua Bengio and Blaise Aguera y Arcas Keynotes

The keynote talks by Blaise Aguera y Arcas and Yoshua Bengio offered interesting perspectives and discussed a new, or arguably old, direction for the learning community. Bengio outlined how current deep learning is excelling with “system 1 learning”, tasks that we as humans do quickly and unconsciously. At the same time, Aguera y Arcas discussed AI’s superhuman performance in such tasks. Punctuating the message of both talks, Aguera y Arcas described just how little territory this covers if we are interested in building a more human-like intelligence.

Current learning methods are lacking in what Bengio refers to as “system 2 learning” - slow, conscious, and logical problem-solving. He underlined some of the missing pieces of current deep learning, such as better out-of-distribution generalisation, high-level semantic representation, and an understanding of causality. The notion that state-of-the-art methods may be inadequate seemed even clearer when Aguera y Arcas questioned whether natural intelligence ever performs optimisation. He highlighted that “evolution decides on what is good and bad” rather than having some predefined loss to optimise.

It’s interesting to see that, even after the many recent advances in deep learning, we are still at a loss when it comes to replicating a natural form of intelligence – which was the initial motivation for neural networks. Both speakers addressed meta-learning, a prominent research area at NeurIPS this year, as a step in the right direction. However, the concluding message from Aguera y Arcas was that, to build natural intelligence, we may need to explore brains with evolved architectures, cellular automata, and social or cooperative learning among agents.

On a completely unrelated note, Blaise Aguera y Arcas also mentioned a great comic on the nascent field of federated learning. For anyone not familiar with the term, it’s a cute introduction to the topic.

## Tackling Climate Change with Machine Learning

Among the workshops, this year’s “Tackling Climate Change with Machine Learning” was particularly interesting for us.

Jeff Dean from Google AI gave an interesting talk about using Bayesian inference for physical systems. One application discussed was using TensorFlow Probability to help stabilize the plasma flows in a type of fusion reactor. Another topic was using TPUs to simulate partial differential equations, such as hydraulic models for flood forecasting, and combining the results with neural nets for weather prediction. He also mentioned that Google data centers have been “net zero-carbon” since 2017. However, this does not mean they consume no fossil fuels – they offset that consumption by producing renewable energy which is then made available to the grid – so there is still room for improvement.

In addition to this, Qinghu Tang discussed the use of semi-supervised learning to create a fine-grained mapping of the low voltage distribution grid using street view images. Their ambitious plan is to create “a comprehensive global database containing information on conventional and renewable energy systems ranging from supply, transmission, to demand-side.” This announcement is particularly exciting to those working on cleaner energy sources to combat climate change, as such a database would be very useful for planning.

Earlier in 2019, at the ICML 2019 edition of this workshop, we presented a poster proposing an algorithm for reducing the computational time of solving Optimal Power Flow (OPF) problems. As a follow-up, we presented another poster in which we apply meta-optimisation in combination with a classifier that predicts the active constraints of the OPF problem. Our approach reduces the computational time significantly while still guaranteeing feasibility. A promising future direction is to combine it with the work of Frederik Diehl, which uses graph neural networks (GNNs) to predict the solution of AC-OPF problems.

## Responsible use of computing resources

On the last day of the conference, the Deep Reinforcement Learning (RL) panel was asked about the responsible use of computing resources. The answers highlighted the many different stances on what can be a contentious matter.

Some showed a sense of owing the community something, realising that they are in a privileged position, and choosing wisely which computationally expensive experiments to run. It was also mentioned that there is a need to provide all parts of the community with access to extensive computing resources, instead of just a few privileged research teams in a handful of industry labs. However, the challenge lies in how to achieve that.

A different point of view is that we should not be convinced that all research requires large amounts of computing resources. Novel research can still be done with tabular environment reinforcement learning experiments, for instance. It is important to avoid bias towards compute-heavy research.

There have been some attempts at limiting the available resources in the hope of sparking new research insights, but we have yet to see the results of these initiatives. Ultimately, there are cases in which the adoption of compute-heavy approaches might simply be unavoidable, such as reinforcement learning in large continuous control tasks.

## Advances in Approximate Bayesian Inference (AABI) Workshop

One of the hidden gems of NeurIPS is the co-located workshop Advances in Approximate Bayesian Inference (AABI), which provides a venue for both established academics and more junior researchers to meet up and discuss all things Bayesian. This year the event was the largest to date - with 120 accepted papers, compared to just 40 last year - and consisted of 11 talks and two poster sessions, followed by a panel discussion.

Sergey Levine, from UC Berkeley, discussed the mathematical connections between control and Bayesian inference (the control-as-inference framework). More specifically, Levine presented an equivalence between maximum-entropy reinforcement learning (MaxEnt RL) and inference in a particular class of probabilistic graphical models. This highlighted the usefulness of a principled approach to designing reward functions and to addressing the issue of robustness in RL.

In the poster session, we presented GP-ALPS, our work on Bayesian model selection for linear multi-output Gaussian Process (GP) models, which finds a parsimonious yet descriptive model. This is done in an automated way by switching off latent processes that do not meaningfully contribute to explaining the data, using latent Bernoulli variables and approximate inference.

The day finished with a panel discussion that featured, among others, Radford Neal and James Hensman. The participants exchanged opinions on where Bayesian machine learning can add value in the age of big data and deep learning, for example on tasks that require reasoning about uncertainty. They also discussed the trade-offs between modern Monte Carlo techniques and variational inference, and pointed out the current lack of standard benchmarks for uncertainty quantification.

## Dealing with real-world data

It was nice to see some papers that specifically looked at modelling with asynchronous and incomplete data, data available at different granularities, and dataset shift - all scenarios that arise in real-world practice. These interesting studies make a refreshing change from the more numerous papers on efficient inference in Bayesian learning and include some fresh, outside-the-box ideas.

GPs were widely applied in order to deal with real-world data issues. One eye-catching poster was Modelling dynamic functional connectivity with latent factor GPs. This paper made clever use of transformations of the approximated covariance matrix to Euclidean space to efficiently adapt covariance estimates. A paper that discussed the problem of missing data in time-series was Deep Amortized Variational Inference for Multivariate Time Series Imputation with Latent Gaussian Process Models. There were also papers that looked at aggregated data issues such as Multi-task learning for aggregated data using Gaussian processes, Spatially aggregated Gaussian processes with multivariate areal outputs, and Multi-resolution, multi-task Gaussian processes. The latter contained some fancy footwork to incorporate multi-resolution data in a principled way within both shallow and deep GP frameworks.

Naturally, there were also several unsupervised learning works on the subject. A highlight was Sparse Variational Inference: Bayesian coresets from scratch, which integrates data compression via Bayesian coresets into SVI with a neat framework for the exponential family. Additionally, Multiway clustering via tensor block models presented an algorithm for clustering in high-dimensional feature spaces, with convergence and consistency properties set out clearly in the paper. Finally, we must mention Missing not at random in matrix completion: the effectiveness of estimating missingness probabilities under a low nuclear norm assumption, a paper that proposes a principled approach to filling in missing data by exploiting an assumption that the ‘missingness’ process has a low nuclear norm structure.

## Research engineering

Research Engineering is often overlooked in academic circles, even though it is a key role in most companies doing any amount of machine learning research. NeurIPS was a good venue for discussion with research engineers from several companies, which provided some interesting insights into how workflows vary across the industry. We all had lots of things in common, such as figuring out how to match research engineers to research projects and trying to build support tools for everyday research tasks.

In many companies the main output is research (papers, not products), and it seemed surprisingly rare for the people doing novel research to also work on putting machine learning into production. This split can lead to the significant problem of having two different languages – for example, PyTorch for researchers and TensorFlow for production – which puts extra strain on research engineers, who become responsible for translating code and dealing with language particularities. We are fortunate to avoid this kind of issue, both by enforcing high standards on code written by researchers and, more importantly, by keeping research code never far from being able to run in production.

## Research Retrospectives

A new addition to the workshops this year was Retrospectives in Machine Learning. The core idea of a retrospective is to reflect on previous work and to discuss shortcomings in the initial intuition, analysis, or results. The goal of the workshop was to “improve the science, openness, and accessibility of the machine learning field, by widening what is publishable and helping to identify opportunities for improvement.” In a way, it asks what readers of important papers should know now that was not in the original publication.

There were many great talks, with one of the most honest and direct discussions being David Duvenaud’s “Bullshit that I and others have said about Neural ODEs”. The talk offered many insights on the science side, as well as on what happened after the Neural ODEs paper was published at NeurIPS 2018. An overarching theme of the talk was the sociology of the very human institutions of research. As an example, the talk suggested thinking about how science journalists have distinct incentives, how these interact with researcher and academic incentives, and how this can – even inadvertently – result in hype.

## Program transformations workshop

The program transformations workshop was primarily focused on automatic differentiation (AD), which has been a key element in the development of modern machine learning techniques.

The workshop was a good size, with a substantial portion of the people currently doing interesting things in AD – such as those behind Jax, Swift4TF, and ChainRules.jl – in attendance. Also present were some of the fathers of AD, such as Laurent Hascoët and Andreas Griewank. Their general attitude towards recent work seemed to be that there have not been significant theoretical advances since the ‘80s. Still, there has been considerable progress toward making AD more useful and accessible.

One of the major topics was AD theory: the investigation of the mathematical basis of current methods. As a highlight, the Differentiable Curry presentation introduced a diagrammatic notation which made the type-theory in the original paper much clearer and more intuitive.

Various new tools were also presented, such as Jax, which made its first public appearance at a major conference. There have been many Jax-related discussions in the Julia community, and it seems to be the best of the Python JITs for differentiable programming. Also of note was TorchScript, a tool that allows running PyTorch code outside of Python.

There were also more traditional AD talks, such as high-performance computing (HPC) AD in Fortran and using MPI. A nice exception was Jesse Bettencourt’s work on Taylor-Mode Automatic Differentiation for Higher-Order Derivatives, which implements chapter 13 of Griewank and Walther (2008) in Jax.

## Conclusion

As usual, NeurIPS is a great venue for staying connected with the newest advances and trends in the most diverse subfields of ML. Both academia and industry are well represented, which creates a unique forum. We look forward to joining the rest of the community again at NeurIPS 2020, and invite everyone to stop by our booth and meet our team.

# The Emergent Features of JuliaLang: Part II - Traits

Author: Lyndon White

## Introduction

This blog post is based on a talk originally given at a Cambridge PyData Meetup, and also at a London Julia Users Meetup.

This is the second of a two-part series of posts on the Emergent Features of JuliaLang: that is to say, features that were not designed, but came into existence from a combination of other features. While Part I was more or less a grab-bag of interesting tricks, Part II (this post) is focused solely on traits, which are, in practice, much more useful than anything discussed in Part I. This post is fully independent from Part I.

## Traits

Inheritance can be a gun with only one bullet, at least in the case of single inheritance as used in Julia. Single inheritance means that once something has a supertype, there can’t be another. Traits are one way to get something similar to multiple inheritance under such conditions. They allow for a form of polymorphism that is orthogonal to the inheritance tree, and they can be implemented using functions of types. In an earlier blog post this is explained using a slightly different take.

Sometimes people claim that Julia doesn’t have traits, which is not correct. Julia does not have syntactic sugar for traits, nor the ubiquitous use of traits that some languages feature. But it does have traits, and they are even used in the standard library: in particular for iterators (IteratorSize and IteratorEltype), and for several other interfaces.
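
To give a flavour of how these are used, here is a small sketch built on Base’s IteratorSize trait (the _reserve helper is hypothetical, not part of Base):

Base.IteratorSize([1, 2, 3])               # Base.HasShape{1}()
Base.IteratorSize(Iterators.countfrom(1))  # Base.IsInfinite()

# Collect-like code can branch on the trait, preallocating only when a length is known:
_reserve(itr) = _reserve(Base.IteratorSize(itr), itr)
_reserve(::Union{Base.HasLength, Base.HasShape}, itr) = sizehint!(eltype(itr)[], length(itr))
_reserve(::Base.SizeUnknown, itr) = eltype(itr)[]
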

These are commonly called Holy traits, not out of religious glorification, but after Tim Holy. They were originally proposed to make StridedArrays extensible. Ironically, even though they are fairly well established today, StridedArray still does not use them. There are on-going efforts to add more traits to arrays, which one day will no doubt lead to powerful and general BLAS-type functionality.

There are a few ways to implement traits, though all are broadly similar. Here we will focus on the implementation based on concrete types. Similar things can be done with Type{<:SomeAbstractType} (ugly, but flexible), or even with values if they are of types that constant-fold (like Bool), particularly if you are happy to wrap them in Val when dispatching on them (an example of this will be shown later).
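
As a rough sketch of how those alternatives look side by side (isiterable and Iterable are hypothetical names; only the first style is used in the rest of this post):

# Style 1 - concrete trait types (used throughout this post):
struct Iterable end
isiterable(::Type{<:AbstractVector}) = Iterable()    # methods then dispatch on ::Iterable

# Style 2 - the types themselves (commented out, as it clashes with style 1):
# isiterable(::Type{<:AbstractVector}) = Iterable    # methods then dispatch on ::Type{Iterable}

# Style 3 - constant-foldable values wrapped in Val (also commented out):
# isiterable(::Type{<:AbstractVector}) = Val{true}() # methods then dispatch on ::Val{true}
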

There are a few advantages to using traits:

• Can be defined after the type is declared (unlike a supertype).
• Don’t have to be created up-front, so types can be added later (unlike a Union).
• Otherwise-unrelated types (unlike a supertype) can be used.

### A motivating example from Python: the AsList function

In Python TensorFlow, _AsList is a helper function:

def _AsList(x):
    return x if isinstance(x, (list, tuple)) else [x]


This converts scalars into single-item lists, which is useful because downstream code then only needs to deal with lists. This is not really idiomatic Python, but TensorFlow uses it in the wild. However, _AsList fails for numpy arrays:

>>> _AsList(np.asarray([1,2,3]))
[array([1, 2, 3])]


This can be fixed in the following way:

def _AsList(x):
    return x if isinstance(x, (list, tuple, np.ndarray)) else [x]


But where will it end? What if other packages want to extend this? What about other functions that also depend on whether something is a list or not? The answer here is traits, which give the ability to mark types as having particular properties. In Julia, in particular, dispatch works on these traits. Often they will compile out of existence (due to static dispatch, during specialization).

Note that at some point we have to document what properties a trait requires (e.g. what methods must be implemented). There is no static type-checker enforcing this, so it may be helpful to write a test suite for anything that has a trait which checks that it works properly.
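
As a sketch of what such a test might look like (test_list_trait is a hypothetical helper; the islist trait and List type are defined later in this post):

using Test

# Anything whose trait says it is a List should actually support iteration.
function test_list_trait(x)
    @testset "List trait conformance for $(typeof(x))" begin
        if islist(typeof(x)) isa List
            @test applicable(iterate, x)
        end
    end
end
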

### The Parts of Traits

Traits have a few key parts:

• Trait types: the different traits a type can have.
• Trait function: what traits a type has.
• Trait dispatch: using the traits.

To understand how traits work, it is important to understand the type of types in Julia. Types are values, so they have a type themselves: DataType. However, they also have the special pseudo-supertype Type, so a type T also acts like T isa Type{T}.

julia> typeof(String)
DataType

julia> String isa Type{String}
true

julia> String isa Type{<:AbstractString}
true


We can dispatch on Type{T} and have it resolved at compile time.
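
As a small, self-contained sketch of dispatching on types themselves (describe is a made-up function, not from Base):

describe(::Type{String}) = "the String type itself"
describe(::Type{<:AbstractString}) = "some other string type"
describe(x) = "a value of type $(typeof(x))"

julia> describe(String)
"the String type itself"

julia> describe(SubString)
"some other string type"

julia> describe("hello")
"a value of type String"
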

#### Trait Type

This is the type that is used to attribute a particular trait. In the following example we will consider a trait that highlights the properties of a type for statistical modeling. This is similar to MLJ’s ScientificTypes.jl, or the StatsModels schema.

abstract type StatQualia end

struct Continuous <: StatQualia end
struct Ordinal <: StatQualia end
struct Categorical <: StatQualia end
struct Normable <: StatQualia end


#### Trait Function

The trait function takes a type as input, and returns an instance of the trait type. We use the trait function to declare what traits a particular type has. For example, we can say things like floats are continuous, booleans are categorical, etc.

statqualia(::Type{<:AbstractFloat}) = Continuous()
statqualia(::Type{<:Integer}) = Ordinal()

statqualia(::Type{<:Bool}) = Categorical()
statqualia(::Type{<:AbstractString}) = Categorical()

statqualia(::Type{<:Complex}) = Normable()


#### Using Traits

To use a trait we need to re-dispatch upon it. This is where we take the type of an input, and invoke the trait function on it to get objects of the trait type, then dispatch on those.

In the following example we are going to define a bounds function, which will give some indication of the range of values in a collection. It will be defined on collections of objects with a particular trait, and it will behave differently depending on which statqualia those objects have.

using LinearAlgebra

# This is the trait re-dispatch; get the trait from the type
bounds(xs::AbstractVector{T}) where T = bounds(statqualia(T), xs)

# These functions dispatch on the trait
bounds(::Categorical, xs) = unique(xs)
bounds(::Normable, xs) = maximum(norm.(xs))
bounds(::Union{Ordinal, Continuous}, xs) = extrema(xs)


Using the above:

julia> bounds([false, false, true])
2-element Array{Bool,1}:
false
true

julia> bounds([false, false, false])
1-element Array{Bool,1}:
false

julia> bounds([1,2,3,2])
(1, 3)

julia> bounds([1+1im, -2+4im, 0+-2im])
4.47213595499958


We can also extend traits after the fact: for example, if we want to add the property that vectors have norms defined, we could define:

julia> statqualia(::Type{<:AbstractVector}) = Normable()
statqualia (generic function with 6 methods)

julia> bounds([[1,1], [-2,4], [0,-2]])
4.47213595499958


### Back to AsList

First, we define the trait type and trait function:

struct List end
struct Nonlist end

islist(::Type{<:AbstractVector}) = List()
islist(::Type{<:Tuple}) = List()
islist(::Type{<:Number}) = Nonlist()


Then we define trait dispatch:

aslist(x::T) where T = aslist(islist(T), x)
aslist(::List, x) = x
aslist(::Nonlist, x) = [x]


This allows aslist to work as expected.

julia> aslist(1)
1-element Array{Int64,1}:
1

julia> aslist([1,2,3])
3-element Array{Int64,1}:
1
2
3

julia> aslist([1])
1-element Array{Int64,1}:
1


As discussed above, this is fully extensible.

### Dynamic dispatch as fallback

All the traits discussed so far have been fully static, and they compile away. We can also write runtime code, at a small runtime cost. The following makes a runtime call to hasmethod to check whether the given type has an iterate method defined. (There are plans to make hasmethod resolvable at compile time, but for now it can only be done at runtime.) Similar code to this can be used to dispatch on the values of objects.

islist(T) = hasmethod(iterate, Tuple{T}) ? List() : Nonlist()


We can see that this works on strings: since strings are iterable, aslist does not wrap them in an array.

julia> aslist("ABC")
"ABC"


### Traits on functions

We can also attach traits to functions, because functions are instances of singleton types, e.g. foo::typeof(foo). We can use this idea for declarative input transforms.

As an example, we could have different functions that expect the arrangement of observations to be different. More specifically, different machine learning models might expect the inputs to be:

• Iterator of Observations.
• Matrix with Observations in Rows.
• Matrix with Observations in Columns.

This isn’t a matter of personal preference or different field standards; there are good performance-related reasons to choose one of the above options depending on what operations are needed. As a user of a model, however, we should not have to think about this.

The following examples use LIBSVM.jl, and DecisionTree.jl. In practice we can just use the MLJ interface instead, which takes care of this kind of thing.

In this demonstration of the simple use of a few ML libraries, we first need some basic functions to deal with our data. In particular, we want to define get_true_classes, which, as it would naturally be written, expects an iterator of observations.

using Statistics
is_large(x) = mean(x) > 0.5
get_true_classes(xs) = map(is_large, xs)

inputs = rand(100, 1_000);  # 100 features, 1000 observations
labels = get_true_classes(eachcol(inputs));


For simplicity, we will test on our training data, which should generally be avoided (outside of simple validation that training is working).

The first library we will consider is LIBSVM, which expects the data to be a matrix with 1 observation per column.

using LIBSVM
svm = svmtrain(inputs, labels)

estimated_classes_svm, probs = svmpredict(svm, inputs)
mean(estimated_classes_svm .== labels)


Next, let’s try DecisionTree.jl, which expects data to be a matrix with 1 observation per row.

using DecisionTree
tree = DecisionTreeClassifier(max_depth=10)
fit!(tree, permutedims(inputs), labels)

estimated_classes_tree = predict(tree, permutedims(inputs))
mean(estimated_classes_tree .== labels)


As we can see above, we had to know what each function needed, and use eachcol and permutedims to modify the data. The user should not need to remember these details; they should simply be encoded in the program.

### Traits to the Rescue

We will attach a trait to each function that needs to rearrange its inputs. There is a more sophisticated version of this, which also attaches traits to inputs saying how the observations are currently arranged, or lets the user specify. For simplicity, we will assume the data starts out as a matrix with 1 observation per column.

We are considering three possible ways a function might like its data to be arranged. Each of these will define a different trait type.

abstract type ObsArrangement end

struct IteratorOfObs <: ObsArrangement end
struct MatrixColsOfObs <: ObsArrangement end
struct MatrixRowsOfObs <: ObsArrangement end


We can encode this information about each function’s expectations into a trait, rather than forcing the user to look it up in the documentation.

# Our initial code:
obs_arrangement(::typeof(get_true_classes)) = IteratorOfObs()

# LIBSVM
obs_arrangement(::typeof(svmtrain)) = MatrixColsOfObs()
obs_arrangement(::typeof(svmpredict)) = MatrixColsOfObs()

# DecisionTree
obs_arrangement(::typeof(fit!)) = MatrixRowsOfObs()
obs_arrangement(::typeof(predict)) = MatrixRowsOfObs()


We are also going to attach some simple traits to the data types to say whether or not they contain observations. We will use value types for this, rather than fully declaring the trait types, so we can skip straight to declaring the trait functions:

# All matrices contain observations
isobs(::AbstractMatrix) = Val{true}()

# It must be an iterator of vectors, else it doesn't contain observations
isobs(::T) where T = Val{eltype(T) <: AbstractVector}()


Next, we can define model_call: a function which uses the traits to decide how to rearrange the observations before calling the function, based on the type of the function and on the type of the argument.

function model_call(func, args...; kwargs...)
    return func(maybe_organise_obs.(func, args)...; kwargs...)
end

# trait re-dispatch: don't rearrange things that are not observations
maybe_organise_obs(func, arg) = maybe_organise_obs(func, arg, isobs(arg))
maybe_organise_obs(func, arg, ::Val{false}) = arg
function maybe_organise_obs(func, arg, ::Val{true})
    organise_obs(obs_arrangement(func), arg)
end

# The heavy lifting for rearranging the observations
organise_obs(::IteratorOfObs, obs_iter) = obs_iter
organise_obs(::MatrixColsOfObs, obsmat::AbstractMatrix) = obsmat

organise_obs(::IteratorOfObs, obsmat::AbstractMatrix) = eachcol(obsmat)
function organise_obs(::MatrixColsOfObs, obs_iter)
    reduce(hcat, obs_iter)
end

function organise_obs(::MatrixRowsOfObs, obs)
    permutedims(organise_obs(MatrixColsOfObs(), obs))
end


Now, rather than calling things directly, we can use model_call, which takes care of rearranging things. Notice that the code no longer needs to be aware of the particular cases for each library, which makes things much easier for the end-user: just use model_call and don’t worry about how the data is arranged.

inputs = rand(100, 1_000);  # 100 features, 1000 observations
labels = model_call(get_true_classes, inputs);

using LIBSVM
svm = model_call(svmtrain, inputs, labels)

estimated_classes_svm, probs = model_call(svmpredict, svm, inputs)
mean(estimated_classes_svm .== labels)

using DecisionTree
tree = DecisionTreeClassifier(max_depth=10)
model_call(fit!, tree, inputs, labels)

estimated_classes_tree = model_call(predict, tree, inputs)
mean(estimated_classes_tree .== labels)


## Conclusion: JuliaLang is not magic

In this series of posts we have seen a few examples of how certain features can give rise to other features. Unit syntax, Closure-based Objects, Contextual Compiler Passes, and Traits, all just fall out of the combination of other features. We have also seen that Types turn out to be very powerful, especially with multiple dispatch.

# The Emergent Features of JuliaLang: Part I

Author: Lyndon White

## Introduction

This blog post is based on a talk originally given at a Cambridge PyData Meetup, and also at a London Julia Users Meetup.

Julia is known to be a very expressive language with many interesting features. It is worth considering that some features were not planned, but simply emerged from combinations of other features. This post will describe how several interesting features are implemented:

1. Unit syntactic sugar,
2. Pseudo-OO objects with public/private methods,
3. Dynamic Source Transformation / Custom Compiler Passes (Cassette),
4. Traits.

Some of these (1 and 4) should be used when appropriate, while others (2 and 3) are rarely appropriate, but are instructive.

Part I of this post (which you are reading now) will focus on the first three. Part II is about traits, which deserve their own post because they are one of the most powerful and interesting Julia features. They emerge from types, multiple dispatch, and the ability to dispatch on the types themselves (rather than just instances). We will review both the common use of traits on types, and also the very interesting – but less common – use of traits on functions.

There are many other emergent features which are not discussed in this post. For example, creating a vector using Int[] is an overload of getindex, and constructors are overloads of ::Type{<:T}().

## Juxtaposition Multiplication: Convenient Syntax for Units

Using units in certain kinds of calculations can be very helpful for protecting against arithmetic mistakes. The package Unitful provides the ability to do calculations using units in a very natural way, and makes it easy to write things like “2 meters” as 2m. Here is an example of using Unitful units:

julia> using Unitful.DefaultSymbols

julia> 1m * 2m
2 m^2

julia> 10kg * 15m / 1s^2
150.0 kg m s^-2

julia> 150N == 10kg * 15m / 1s^2
true


How does this work? The answer is Juxtaposition Multiplication—a literal number placed before an expression results in multiplication:

julia> x = 0.5π
1.5707963267948966

julia> 2x
3.141592653589793

julia> 2sin(x)
2.0


To make this work, we need to overload multiplication to invoke the constructor of the unit type. Below is a simplified version of what goes on under the hood of Unitful.jl:

abstract type Unit <: Real end

struct Meter{T} <: Unit
    val::T
end

Base.:*(x::Any, U::Type{<:Unit}) = U(x)


Here we are overloading multiplication with a unit subtype, not an instance of a unit subtype (Meter(2)), but with the subtype itself (Meter). This is what ::Type{<:Unit} refers to. We can see this if we write:

julia> 5Meter
Meter{Int64}(5)


This shows that we create a Meter object with val=5.

To get to a full units system, we then need to define methods for everything that numbers need to work with, such as addition and multiplication. The final result is units-style syntactic sugar.
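
To give a flavour, here is a minimal sketch of that for the toy Meter type above (the real Unitful.jl is, of course, far more general and careful):

Base.:+(a::Meter, b::Meter) = Meter(a.val + b.val)
Base.show(io::IO, m::Meter) = print(io, m.val, " m")

julia> 2Meter + 3Meter
5 m
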

## Closures give us “Classic Object-Oriented Programming”

First, it is important to emphasize: don’t do this in real Julia code. It is unidiomatic, and likely to hit edge-cases the compiler doesn’t optimize well (see for example the infamous closure boxing bug). This is all the more important because it often requires boxing (see below).

“Classic OO” has classes with member functions (methods) that can see all the fields and methods, while from outside the class only the public fields and methods can be seen. This idea was originally posted for Julia in one of Jeff Bezanson’s rare Stack Overflow posts, and it uses a classic functional programming trick.

Let’s use closures to define a Duck type as follows:

function newDuck(name)
    # Declare fields:
    age = 0

    # Declare methods:
    get_age() = age
    inc_age() = age += 1

    quack() = println("Quack!")
    speak() = quack()

    # Declare public:
    () -> (get_age, inc_age, speak, name)
end


This implements various things we would expect from “classic OO” encapsulation. We can construct an object and call public methods:

julia> bill_the_duck = newDuck("Bill")
#7 (generic function with 1 method)

julia> bill_the_duck.get_age()
0


Public methods can change private fields:

julia> bill_the_duck.inc_age()
1

julia> bill_the_duck.get_age()
1


Outside the class we can’t access private fields:

julia> bill_the_duck.age
ERROR: type ##7#12 has no field age


We also can’t access private methods:

julia> bill_the_duck.quack()
ERROR: type ##7#12 has no field quack


However, accessing their functionality via public methods works:

julia> bill_the_duck.speak()
Quack!


### How does this work?

Closures return objects of anonymous types, with the closed-over variables that are referenced directly stored as fields. All the public fields/methods are directly referenced, but the private fields and methods (e.g. age, quack) are not – they are instead closed over by the other methods that use them. We can still reach those private methods and fields by accessing the closures of the public methods:

julia> bill_the_duck.speak.quack
(::var"#quack#10") (generic function with 1 method)

julia> bill_the_duck.speak.quack()
Quack!

julia> bill_the_duck.inc_age.age
Core.Box(1)


### An Aside on Boxing

The Box type is similar to the Ref type in function and purpose. Box is the type Julia uses for variables that are closed over, but which might be rebound to a new value. This is the case for primitives (like Int) which are rebound whenever they are incremented. It is important to be clear on the difference between mutating the contents of a variable, and rebinding that variable name.

julia> x = [1, 2, 3, 4];

julia> objectid(x)
0x79eedc509237c203

julia> x .= [10, 20, 30, 40];  # mutating contents

julia> objectid(x)
0x79eedc509237c203

julia> x = [100, 200, 300, 400];  # rebinding the variable name

julia> objectid(x)
0xfa2c022285c148ed


In closures, boxing applies only to rebinding, though the closure bug means Julia will sometimes over-eagerly box variables because it believes that they might be rebound. This has no bearing on what the code does, but it does impact performance.
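
For contrast, here is a small sketch showing that a variable which is captured but never rebound is stored directly in the closure, not in a Box (make_getter is just an illustrative name):

function make_getter(x)
    get_x() = x   # x is only read, never rebound, so it is not boxed
    get_x
end

julia> g = make_getter([1, 2, 3]);

julia> g.x   # stored directly as a field, not as a Core.Box
3-element Array{Int64,1}:
1
2
3
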

While this kind of code should never be used in practice (Julia has a perfectly good dispatch system and works well without “Classic OO”-style encapsulation), knowing how closures work opens up other ways to use them. In our ChainRules.jl project, we are considering the use of closures as callable named tuples as part of a difficult problem in extensibility and defaults.

## Cassette, etc.

Cassette is built around a notable Julia feature that goes by several names: Custom Compiler Passes, Contextual Dispatch, Dynamically-scoped Macros, or Dynamic Source Rewriting. The same trick is used in IRTools.jl, which is at the heart of the Zygote automatic differentiation package. It is an incredibly powerful and general feature, and it was the result of a very specific issue (#21146) and a very casual PR (#22440) suggesting it might be useful for one particular case. Those discussions describe everything about Cassette, but only in passing.

The Custom Compiler Pass feature is essential for a range of tools, including the automatic differentiation systems mentioned above.

One of the core uses of Cassette is the ability to effectively overload what it means to call a function. Call overloading is much more general than operator overloading, as it applies to every call (special-cased as appropriate), whereas operator overloading applies to just one function and just one set of types.

To give a concrete example of how call overloading is more general, operator overloading/dispatch (multiple or otherwise) would allow us to overload, for example, sin(::T) for different types T. This way sin(::DualNumber) could be specialized to differ from sin(::Float64), so that with a DualNumber as input it calculates the dual part using cos (which is the derivative of sin). This is exactly how DualNumbers operate. However, operator overloading can’t express the notion that, for all functions f, if the input type is DualNumber, then f should calculate the dual part using the derivative of f. Call overloading allows for much more expressivity and massively simplifies the implementation of automatic differentiation.

The above written out in code:

function Cassette.overdub(::AutoDualCtx, f, d::DualNumber)
    res = ChainRules.frule(f, val(d))
    if res !== nothing
        f_val, pushforward = res
        f_diff = pushforward(partial(d))
        return Dual(f_val, extern(f_diff))
    else
        return f(d)
    end
end


That is just one of the more basic things that can be done with Cassette. Before we can explain how Cassette works, we need to understand how generated functions work, and before that we need to understand the different ways code is presented in Julia, as it is compiled.

### Layers of Representation

Julia has many representations of the code it moves through during compilation.

• Source code (CodeTracking.definition(String, ...)),
• Abstract Syntax Tree (AST): CodeTracking.definition(Expr, ...),
• Untyped Intermediate Representation (IR): @code_lowered,
• Typed Intermediate Representation: @code_typed,
• LLVM: @code_llvm,
• ASM: @code_native.

We can retrieve the different representations using the functions/macros indicated.

The Untyped IR is of particular interest as this is what is needed for Cassette. This is a linearization of the AST, with the following properties:

• Only 1 operation per statement (nested expressions get broken up);
• the return values for each statement are accessed as SSAValue(index);
• variables become Slots;
• control-flow becomes jumps (like Goto);
• function names become qualified as GlobalRef(mod, func).

Untyped IR is stored in a CodeInfo object. It is not particularly pleasant to write, though for an intermediate representation it is fairly readable. The particularly hard part is that expression return values are SSAValues, which are defined by position; so if code is added, the return values need to be renumbered. Core.Compiler, Cassette, and IRTools define various helpers to make this easier, but for our simple example we don’t need them.
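
As a small illustration of these properties (output from a Julia 1.x REPL; the exact printing varies a little between versions):

julia> foo(x) = 2x + 1
foo (generic function with 1 method)

julia> @code_lowered foo(3)
CodeInfo(
1 ─ %1 = 2 * x
│   %2 = %1 + 1
└──      return %2
)
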

### Generated Functions

Generated functions take types as inputs and return the AST (Abstract Syntax Tree) for the code that should run, based on information in those types. This is a kind of metaprogramming. Take, for example, a function f whose input is an N-dimensional array (type AbstractArray{T, N}). A generated function for f might construct code with N nested loops to process each element. It is then possible to generate code that does exactly what is needed for those types and nothing more, which gives substantial performance improvements.
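
As a sketch of that idea, here is a made-up nested_sum that uses Base.Cartesian to generate the N loops from the type parameter (this is illustrative, not from Base):

@generated function nested_sum(A::AbstractArray{T,N}) where {T,N}
    quote
        s = zero(T)
        # Generate N nested loops, one per array dimension:
        Base.Cartesian.@nloops $N i A begin
            s += Base.Cartesian.@nref $N A i
        end
        return s
    end
end

julia> nested_sum(ones(2, 3))
6.0
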

For a more detailed example, let’s consider merging NamedTuples. This could be done with a regular function:

function merge(a::NamedTuple{an}, b::NamedTuple{bn}) where {an, bn}
    names = merge_names(an, bn)
    types = merge_types(names, typeof(a), typeof(b))
    NamedTuple{names,types}(map(n->getfield(sym_in(n, bn) ? b : a, n), names))
end


This checks at runtime what fields each NamedTuple has, in order to decide what will be in the merge. However, we actually know all of this information from the types alone, because the list of fields is stored as part of the type (that is, the an and bn type parameters).

@generated function merge(a::NamedTuple{an}, b::NamedTuple{bn}) where {an, bn}
    names = merge_names(an, bn)
    types = merge_types(names, a, b)
    vals = Any[ :(getfield($(sym_in(n, bn) ? :b : :a), $(QuoteNode(n)))) for n in names ]
    :( NamedTuple{$names, $types}(($(vals...),)) )
end


A more involved example from Base is _unsafe_getindex!, the generated function behind indexing an array with several indices. It uses the type parameter N (the number of indices) together with Base.Cartesian to unroll the indexing loops; slightly simplified, it looks like this:

@generated function _unsafe_getindex!(dest::AbstractArray, src::AbstractArray, I::Vararg{Union{Real, AbstractArray}, N}) where N
    quote
        D = eachindex(dest)
        Dy = iterate(D)
        @inbounds Base.Cartesian.@nloops $N j d -> I[d] begin
            # This condition is never hit, but at the moment
            # the optimizer is not clever enough to split the union without it
            Dy === nothing && return dest
            (idx, state) = Dy
            dest[idx] = Base.Cartesian.@ncall $N getindex src j
            Dy = iterate(D, state)
        end
        return dest
    end
end


It is interesting that this now has speed within a factor of two of the following simplified version:

function _unsafe_getindex_modern!(dest::AbstractArray, src::AbstractArray, I::Vararg{Union{Real, AbstractArray}, N}) where N
    @inbounds for (i, j) in zip(eachindex(dest), Iterators.product(I...))
        dest[i] = src[j...]
    end
    return dest
end
end


In Julia 0.6, this was five times slower than the highly-optimized version above!

It turns out that the type parameter N matters a lot. Removing this explicit type parameter causes performance to degrade by an order of magnitude. While Julia generally specializes on the types of the arguments passed to a method, there are a few cases in which it will avoid that specialization unless an explicit type parameter is added: the three main cases are Function, Type, and, as in the example above, Vararg arguments.
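
A rough sketch of the difference (these definitions are illustrative, not taken from the example above):

# Julia may compile one generic method body here, shared across argument counts:
count_args(x...) = length(x)

# The explicit Vararg type parameter forces specialization on the number of arguments,
# so N is known at compile time:
count_args_specialized(x::Vararg{Any, N}) where {N} = N
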
