NeurIPS 2019 in Review24 Jan 2020
In early December 2019, part of the Invenia team headed to Vancouver, Canada, for the annual Neural Information Processing Systems (NeurIPS, previously NIPS) conference. It was the biggest NeurIPS conference yet, with an estimated 13,000 attendees, 6,743 paper submissions, and almost 80 official meetups.
We sent developers, researchers, and people operation representatives from both Cambridge and Winnipeg to attend talks, meet people, and run our booth at the expo. Below are some of our highlights from the week.
Yoshua Bengio and Blaise Aguera y Arcas Keynotes
The keynote talks by Blaise Aguera y Arcas and Yoshua Bengio offered interesting perspectives and discussed a new, or arguably old, direction for the learning community. Bengio outlined how current deep learning is excelling with “system 1 learning”, tasks that we as humans do quickly and unconsciously. At the same time, Aguera y Arcas discussed AI’s superhuman performance in such tasks. Punctuating the message of both talks, Aguera y Arcas described just how little territory this covers if we are interested in building a more human-like intelligence.
Current learning methods are lacking in what Bengio refers to as “system 2 learning” - slow, conscious, and logical problem-solving. He underlined some of the missing pieces of current deep learning, such as better out-of-distribution generalisation, high-level semantic representation, and an understanding of causality. The notion that state-of-the-art methods may be inadequate seemed even clearer when Aguera y Arcas questioned whether natural intelligence ever performs optimisation. He highlighted that “evolution decides on what is good and bad” rather than having some predefined loss to optimise.
It’s interesting to see that, even after the many recent advances in deep learning, we are still at a loss when it comes to replicating a natural form of intelligence – which was the initial motivation for neural networks. Both speakers addressed meta-learning, a prominent research area at NeurIPS this year, as a step in the right direction. However, the concluding message from Aguera y Arcas was that, to build natural intelligence, we may need to explore brains with evolved architectures, cellular automata, and social or cooperative learning among agents.
On a completely unrelated note, Blaise Aguera also mentioned a great comic on the nascent field of federated learning. For anyone not familiar with the term, it’s a cute introduction to the topic.
Tackling Climate Change with Machine Learning
Among the workshops, this year’s “Tackling Climate Change with Machine Learning” was particularly interesting for us.
Jeff Dean from Google AI gave an interesting talk about using Bayesian inference for physical systems. One such application discussed was using TensorFlow Probability to help stabilize the plasma flows in a type of fusion reactor. Another topic was on using TPUs for simulating partial differential equations, such as hydraulic model simulation for flood forecast, and combining the results with neural nets for weather prediction. He also mentioned that Google data centers are “net zero-carbon” since 2017. However, this does not mean they consume no fossil fuels – they offset that consumption by producing renewable energy which is then made available to the grid – so there is still room for improvement.
In addition to this, Qinghu Tang discussed the use of semi-supervised learning to create a fine-grained mapping of the low voltage distribution grid using street view images. Their ambitious plan is to create “a comprehensive global database containing information on conventional and renewable energy systems ranging from supply, transmission, to demand-side.” This announcement is particularly exciting to those using cleaner energy sources to combat climate change, as this database would be largely useful for planning.
Earlier in 2019, at the ICML 2019 Workshop, we presented a poster in which we proposed an algorithm for reducing the computational time of solving Optimal Power Flow (OPF) problems. As a follow-up we presented another poster, in which we apply meta-optimisation in combination with a classifier that predicts the active constraints of the OPF problem. Our approach reduces the computational time significantly while still guaranteeing feasibility. A promising future direction is to combine it with the work of Frederik Diehl using graph neural networks (GNN) for predicting the solution of AC-OPF problems.
Responsible use of computing resources
On the last day of the conference, the Deep Reinforcement Learning (RL) panel was asked about the responsible use of computing resources. The answers highlighted the many different stances on what can be a contentious matter.
Some showed a sense of owing the community something, realising that they are in a privileged position, and choosing wisely which computationally expensive experiments to run. It was also mentioned that there is a need to provide all parts of the community with access to extensive computing resources, instead of just a few privileged research teams in a handful of industry labs. However, the challenge lies in how to achieve that.
A different point of view is that we should not be convinced that all research requires large amounts of computing resources. Novel research can still be done with tabular environment reinforcement learning experiments, for instance. It is important to avoid bias towards compute-heavy research.
There have been some attempts at limiting the available resources, in hopes of sparking new research insights, but we still have to see the results of this initiative. Ultimately, there are cases in which the adoption of compute-heavy approaches might simply be unavoidable, such as for reinforcement learning in large continuous control tasks.
Advances in Approximate Bayesian Inference (AABI) Workshop
One of the hidden gems of NeurIPS is the co-located workshop Advances in Approximate Bayesian Inference (AABI), which provides a venue for both established academics as well as more junior researchers to meet up and discuss all things Bayesian. This year the event was the largest to-date - with 120 accepted papers compared to just 40 last year - and consisted of 11 talks and two poster sessions, followed by a panel discussion.
Sergey Levine, from UC Berkeley discussed the mathematical connections between control and Bayesian inference (aka control-as-inference framework). More specifically, Levine presented an equivalence between maximum-entropy reinforcement learning (MaxEnt RL) and inference in a particular class of probabilistic graphical models. This highlighted the usefulness of finding a principled approach to the design of reward functions and addressing the issue of robustness in RL.
In the poster session, we presented GP-ALPS, our work on Bayesian model selection for linear multi-output Gaussian Process (GP) models, which find a parsimonious, yet descriptive model. This is done in an automated way by switching off latent processes that do not meaningfully contribute to explaining the data, using latent Bernoulli variables and approximate inference.
The day finished with a panel discussion that featured, among others, Radford Neal and James Hensman. The participants exchanged opinions on where Bayesian machine learning can add value in the age of big data and deep learning, for example, on tasks that require reasoning about uncertainty. They also discussed the trade-offs between modern Monte Carlo techniques and variational inference, as well as pointed out the current lack of standard benchmarks for uncertainty quantification.
Dealing with real-world data
It was nice to see some papers that specifically looked at modelling with asynchronous and incomplete data, data available at different granularities, and dataset shift - all scenarios that arise in real-world practice. These interesting studies make a refreshing change from the more numerous papers on efficient inference in Bayesian learning and include some fresh, outside-the-box ideas.
GPs were widely applied in order to deal with real-world data issues. One eye-catching poster was Modelling dynamic functional connectivity with latent factor GPs. This paper made clever use of transformations of the approximated covariance matrix to Euclidean space to efficiently adapt covariance estimates. A paper that discussed the problem of missing data in time-series was Deep Amortized Variational Inference for Multivariate Time Series Imputation with Latent Gaussian Process Models. There were also papers that looked at aggregated data issues such as Multi-task learning for aggregated data using Gaussian processes, Spatially aggregated Gaussian processes with multivariate areal outputs, and Multi-resolution, multi-task Gaussian processes. The latter contained some fancy footwork to incorporate multi-resolution data in a principled way within both shallow and deep GP frameworks.
Naturally, there were also several unsupervised learning works on the subject. A highlight was Sparse Variational Inference: Bayesian coresets from scratch, presenting the integration of data compression via Bayesian coresets in SVI with a neat framework for the exponential family. Additionally, the paper discussing Multiway clustering via tensor block modules, presented an algorithm for clustering in high-dimensional feature spaces, with convergence and consistency properties set out clearly in the paper. Finally, we must mention Missing not at random in matrix completion: the effectiveness of estimating missingness probabilities under a low nuclear norm assumption, a paper that proposes a principled approach to filling in missing data by exploiting an assumption that the ‘missingness’ process has a low nuclear norm structure.
Research Engineering is often overlooked in academic circles, even though this is a key role in most companies doing any amount of machine learning research. NeurIPS was a good venue for discussion with research engineers from several companies, which provided some interesting insights into how workflow varies across industry. We all had lots of things in common, such as figuring out how to match research engineers to research projects and trying to build support tools for everyday research tasks.
In many companies, the main output is often research (papers, not products), and having the same people who do novel research resulting in papers also work on putting machine learning into production seemed surprisingly rare. This choice can lead to the significant problem of having two different languages – for example, PyTorch for researchers and TensorFlow for production – which causes extra strain on research engineers who become responsible for translating code and dealing with language particularities. We are fortunate to avoid this kind of issue, both by enforcing high standards on code written by researchers, but more importantly having research code never too far from being able to run in production.
A new addition to the workshops this year was Retrospectives in Machine Learning . The core idea of a retrospective is to reflect on previous work and to discuss shortcomings in the initial intuition, analysis, or results. The goal of the workshop was to “improve the science, openness, and accessibility of the machine learning field, by widening what is publishable and helping to identify opportunities for improvement.” In a way, we can say that it discusses which things readers of important papers should know now, that were not in the original publication.
There were many great talks, with one of the most honest and direct discussions being David Duvenaud’s “Bullshit that I and others have said about Neural ODEs”. The talk discussed many insights on the science side, as well as around what happened after the Neural ODEs paper was published at NeurIPS 2018. An overarching theme of the talk is the sociology of the very human institutions of research. As an example, the talk suggests thinking about how science journalists have distinct incentives, how these interact with researcher/academic incentives, and how this can – even inadvertently – result in hype.
Program transformations workshop
The program transformations workshop was primarily focused on automatic differentiation (AD), which has been a key element in the development of modern machine learning techniques.
The workshop was a good size, with a substantial portion of all people who are currently doing interesting things in AD, such as Jax, Swift4TF and ChainRules.jl in attendance. Also present were some of the fathers of AD, such as Laurent Hascoët and Andreas Griewank. Their general attitude towards recent work seemed to be that there have not been significant theoretical advances since the ‘80s. Still, there has been considerable progress toward making AD more useful and accessible.
One of the major topics was AD theory: the investigation of the mathematical basis of current methods. As a highlight, the Differentiable Curry presentation introduced a diagrammatic notation which made the type-theory in the original paper much clearer and more intuitive.
Various new tools were also presented, such as Jax, which made its first public appearance at a major conference. There have been many Jax-related discussions in the Julia community, and it seems to be the best of all Python JITs for differentiable programming. Also of note was Torchscript, a tool that allows running Pytorch outside of Python.
There were also more traditional AD talks, such as high-performance computing (HPC) AD in Fortran and using MPI. A nice exception was Jesse Bettencourt’s work on Taylor-Mode Automatic Differentiation for Higher-Order Derivatives, which is an implementation of Griewank and Walther (2008) chapter 13 in Jax.
As usual, NeurIPS represents a great venue to stay connected with the newest advances and trends in the most diverse subfields of ML. Both academia and industry are largely represented, and that creates an unique forum. We look forward to joining the rest of the community again in the upcoming NeurIPS 2020 and invite everyone to stop by our booth and meet our team.