<h1 id="scaling-multi-output-gaussian-process-models-with-exact-inference">Scaling multi-output Gaussian process models with exact inference</h1>

<p><em>2021-03-19</em></p>

<p>In our <a href="/blog/2021/02/19/OILMM-pt1/">previous post</a>, we explained that multi-output Gaussian processes (MOGPs) are not fundamentally different from their single-output counterparts. We also introduced the <em>Mixing Model Hierarchy</em> (MMH), a broad class of MOGPs that covers several popular and powerful models from the literature. In this post, we will take a closer look at the central model of the MMH, the <em>Instantaneous Linear Mixing Model</em> (ILMM). We will discuss the linear algebra tricks that make inference in this model much cheaper than for general MOGPs. Then, we will take an alternative, more intuitive approach and use it as motivation for an even better-scaling model, the <em><a href="http://proceedings.mlr.press/v119/bruinsma20a.html">Orthogonal Instantaneous Linear Mixing Model (OILMM)</a></em>. Like most linear MOGPs, the OILMM represents data in a lower-dimensional subspace; but contrary to most linear MOGPs, in practice the OILMM scales <em>linearly</em> with the dimensionality of this subspace while retaining exact inference.</p>
<p>Check out our <a href="/blog/2021/02/19/OILMM-pt1/">previous post</a> for a brief intro to MOGPs. We also recommend reading the definition of the MMH from that post, but this post should be self-contained for those familiar with MOGPs. Some familiarity with linear algebra is also assumed. We start with the definition of the ILMM.</p>
<h2 id="the-instantaneous-linear-mixing-model">The Instantaneous Linear Mixing Model</h2>
<p>Some data sets exhibit low-dimensional structure: their <a href="https://en.wikipedia.org/wiki/Intrinsic_dimension">intrinsic dimensionality</a> is significantly lower than the dimensionality of the data itself. Imagine a data set consisting of coordinates in a 3D space. If the points in this data set form a single straight line, then the data is intrinsically one-dimensional, because each point can be represented by a single number once the supporting line is known. Mathematically, we say that the data lives in a lower-dimensional (linear) subspace.</p>
<p>While this lower-dimensionality property may not be exactly true for most real data sets, many large real data sets frequently exhibit approximately low-dimensional structure, as discussed by <a href="https://epubs.siam.org/doi/pdf/10.1137/18M1183480">Udell & Townsend (2019)</a>. In such cases, we can represent the data in a lower-dimensional space without losing a significant part of the information. There is a large field in statistics dedicated to identifying suitable lower-dimensional representations of data (e.g. <a href="https://cseweb.ucsd.edu/~saul/papers/sde_cvpr04.pdf">Weinberger & Saul (2004)</a>) and assessing their quality (e.g. <a href="https://ieeexplore.ieee.org/abstract/document/8017645">Xia <em>et al.</em> (2017)</a>). These <a href="https://en.wikipedia.org/wiki/Dimensionality_reduction">dimensionality reduction</a> techniques play an important role in computer vision (see <a href="https://ieeexplore.ieee.org/document/1177153">Basri & Jacobs (2003)</a> for an example), and in other fields (a <a href="https://www.sciencedirect.com/science/article/abs/pii/S0005109807003950">paper by Markovsky (2008)</a> contains an overview).</p>
<p>The Instantaneous Linear Mixing Model (ILMM) is a simple model that appears in many fields, e.g. machine learning (<a href="https://papers.nips.cc/paper/2007/file/66368270ffd51418ec58bd793f2d9b1b-Paper.pdf">Bonilla <em>et al.</em> (2007)</a> and <a href="https://arxiv.org/abs/1702.08530">Dezfouli <em>et al.</em> (2017)</a>), signal processing (<a href="https://ieeexplore.ieee.org/abstract/document/4505467">Osborne <em>et al.</em> (2008)</a>), and geostatistics (<a href="https://www.researchgate.net/publication/224839861_Geostatistics_for_Natural_Resource_Evaluation">Goovaerts (1997)</a>). The model represents data in a lower-dimensional linear subspace—not unlike <a href="https://en.wikipedia.org/wiki/Principal_component_analysis">PCA</a>— which implies the model’s covariance matrix is low-rank. As we will discuss in the next sections, this low-rank structure can be exploited for efficient inference. In the ILMM, the observations are described as a linear combination of <em>latent processes</em> (i.e. unobserved stochastic processes), which are modelled as single-output GPs.</p>
<p>If we denote our observations as \(y\), the ILMM models the data according to the following generative model: \(y(t) \cond x, H = Hx(t) + \epsilon\). Here, \(y(t) \cond x, H\) is used to denote the value of \(y(t)\) given a known \(H\) and \(x(t)\), \(H\) is a matrix of weights, which we call the <em>mixing matrix</em>, \(\epsilon\) is Gaussian noise, and \(x(t)\) represents the (time-dependent) latent processes, described as independent GPs—note that we use \(x\) here to denote an unobserved (latent) stochastic process, not an input, which is represented by \(t\). Using an ILMM with \(m\) latent processes to model \(p\) outputs, \(y(t)\) is \(p\)-dimensional, \(x(t)\) is \(m\)-dimensional, and \(H\) is a \(p \times m\) matrix. Since the latent processes are GPs, and <a href="https://www.statlect.com/probability-distributions/normal-distribution-linear-combinations">Gaussian random variables are closed under linear combinations</a>, \(y(t)\) is also a GP. This means that the usual closed-form <a href="https://en.wikipedia.org/wiki/Gaussian_process#Gaussian_process_prediction,_or_Kriging">formulae for inference</a> in GPs can be used. However, naively computing these formulae is not computationally efficient, because a large covariance matrix has to be inverted, which is a significant bottleneck. In the next section we discuss tricks to speed up this inversion.</p>
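<p>To make the generative model concrete, below is a minimal sketch of sampling from an ILMM prior. All sizes, kernels, and parameter values are hypothetical choices for illustration, not part of the model definition:</p>

```python
import numpy as np

# Illustrative ILMM generative model: y(t) = H x(t) + eps.
# Hypothetical sizes: p = 4 outputs, m = 2 latent processes, n = 30 inputs.
rng = np.random.default_rng(0)
p, m, n = 4, 2, 30
t = np.linspace(0, 10, n)

def se_kernel(t1, t2, lengthscale=1.0):
    # Squared-exponential kernel: an arbitrary choice for the latent GPs.
    return np.exp(-0.5 * (t1[:, None] - t2[None, :]) ** 2 / lengthscale ** 2)

K = se_kernel(t, t) + 1e-6 * np.eye(n)   # jitter for numerical stability
L = np.linalg.cholesky(K)

# Draw m independent latent GP samples; x has shape (m, n).
x = (L @ rng.standard_normal((n, m))).T

H = rng.standard_normal((p, m))           # mixing matrix, p x m
noise = 0.1 * rng.standard_normal((p, n)) # Gaussian observation noise
y = H @ x + noise                         # p outputs observed at n inputs
print(y.shape)  # (4, 30)
```

<p>Each column of \(H\) determines how strongly the corresponding latent process contributes to each of the outputs.</p>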
<h2 id="leveraging-the-matrix-inversion-lemma">Leveraging the matrix inversion lemma</h2>
<p>When working with GPs, the computational bottleneck is typically the inversion of the covariance matrix over the training input locations. For the case of the ILMM, observing \(p\) outputs, each at \(n\) input locations, we require the inversion of an \(np \times np\) matrix. This operation quickly becomes computationally intractable. What sets the ILMM apart from the general MOGP case is that the covariance matrices generated via an ILMM have some structure that can be exploited.</p>
<p>There is a useful and widely used result from linear algebra that allows us to exploit this structure, known as the <a href="https://en.wikipedia.org/wiki/Woodbury_matrix_identity">matrix inversion lemma</a> (also known as the Sherman–Morrison–Woodbury formula, or simply as the Woodbury matrix formula). This lemma comes in handy whenever we want to invert a matrix that can be written as the sum of a low-rank matrix and a diagonal one.<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> Whereas typically the inversion of such matrices scales with the size of the matrix, the lemma cleverly allows the inversion operation to scale with the rank of the low-rank matrix instead. Therefore, if the rank of the low-rank matrix is much smaller than the size of the matrix, the lemma can enable significant computational speed-ups. We can show that the covariance of the ILMM can be written as a sum of a low-rank and diagonal matrix, which means that the matrix inversion lemma can be applied.</p>
<p>For an ILMM that has \(n\) observations for each of \(p\) outputs and uses \(m\) latent processes, the covariance matrix has size \(np \times np\), but the low-rank part has rank \(nm\). Thus, by choosing an \(m\) that is smaller than \(p\) and using the matrix inversion lemma, we can effectively decrease the memory and time costs associated with the matrix inversion.</p>
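<p>The lemma itself is easy to check numerically. The sketch below uses made-up sizes and matrices: it inverts a "diagonal plus low-rank" matrix via the Woodbury formula, so that only a small rank-sized matrix is ever inverted directly:</p>

```python
import numpy as np

# Woodbury identity:
# (D + U C U^T)^{-1} = D^{-1} - D^{-1} U (C^{-1} + U^T D^{-1} U)^{-1} U^T D^{-1}
rng = np.random.default_rng(1)
size, rank = 200, 10  # illustrative: a 200 x 200 matrix with a rank-10 part

U = rng.standard_normal((size, rank))
C = np.eye(rank)                   # covariance of the low-rank factor
d = rng.uniform(0.5, 1.5, size)    # diagonal part, e.g. observation noise
D_inv = np.diag(1.0 / d)

A = np.diag(d) + U @ C @ U.T

# Only the rank x rank "inner" matrix is inverted here.
inner = np.linalg.inv(np.linalg.inv(C) + U.T @ D_inv @ U)
A_inv = D_inv - D_inv @ U @ inner @ U.T @ D_inv

print(np.allclose(A_inv, np.linalg.inv(A)))  # True
```

<p>With careful implementation, the dominant cost becomes the inversion of the \(\text{rank} \times \text{rank}\) inner matrix rather than the full-size one.</p>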
<p>This is not the first time we have leveraged the matrix inversion lemma to make computations more efficient; see <a href="/blog/2021/01/19/linear-models-with-stheno-and-jax/#making-inference-fast">our post on linear models from a GP point of view</a> for another example. The ubiquity of models that represent data in lower-dimensional linear subspaces makes the use of this lemma widespread. However, this approach requires careful application of linear algebra tricks, and it obfuscates the <em>reason</em> why such a speed-up is even possible. In the next section we present an alternative view, which is more intuitive and leads to the same performance improvements.</p>
<h2 id="an-alternative-formulation">An alternative formulation</h2>
<p>Instead of focusing on linear algebra tricks, let’s try to understand <em>why</em> we can even reduce the complexity of the inversion. The ILMM, as a general GP, scales poorly with the number of observations; and the larger the number of outputs, the larger the number of observations. However, the ILMM makes the modelling assumption that the observations can be represented in a lower-dimensional space. Intuitively, that means that every observation contains a lot of redundant information, because it can be summarised by a much lower-dimensional representation. If it were possible to somehow extract these lower-dimensional representations and use them as observations instead, then that could lead to very appealing computational gains. The challenge here is that we don’t have access to the lower-dimensional representations, but we can try to estimate them.</p>
<p>Recall that the ILMM is a probabilistic model that connects every observation \(y(t)\) to a set of latent, unobserved variables \(x(t)\), defining the lower-dimensional representation. It is this lower-dimensional \(x(t)\) that we want to estimate. A natural choice is to find the <a href="https://en.wikipedia.org/wiki/Maximum_likelihood_estimation">maximum likelihood estimate</a> (MLE) \(T(y(t))\) of \(x(t)\) given the observations \(y(t)\):</p>
\[\begin{equation}
T(y(t)) = \underset{x(t)}{\mathrm{argmax}} \, p(y(t) \cond x(t)).
\end{equation}\]
<p>The solution to the equation above is \(T = (H^\top \Sigma^{-1} H)^{-1} H^\top \Sigma^{-1}\) (see prop. 2 of appendix D from <a href="http://proceedings.mlr.press/v119/bruinsma20a.html">Bruinsma <em>et al.</em> (2020)</a>), where \(H\) is the mixing matrix, and \(\Sigma\) is the noise covariance.</p>
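<p>As a sanity check, \(T\) can be computed directly from this closed form. The sketch below uses hypothetical sizes and a made-up noise covariance, and verifies that \(T\) maps a noise-free observation \(Hx\) back to its latent representation \(x\) (indeed, \(TH = I\) by construction):</p>

```python
import numpy as np

rng = np.random.default_rng(2)
p, m = 10, 3  # illustrative: 10 outputs, 3 latent processes

H = rng.standard_normal((p, m))   # mixing matrix
Sigma = 0.1 * np.eye(p)           # noise covariance (made up)
Sigma_inv = np.linalg.inv(Sigma)

# T = (H^T Sigma^{-1} H)^{-1} H^T Sigma^{-1}, an m x p matrix.
T = np.linalg.inv(H.T @ Sigma_inv @ H) @ H.T @ Sigma_inv

x = rng.standard_normal(m)
print(np.allclose(T @ (H @ x), x))  # True: T recovers the representation
```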
<p>The advantage of working with \(Ty(t)\) instead of \(y(t)\) is that \(y(t)\) comprises \(p\) outputs, while \(Ty(t)\) comprises only \(m\). Thus, conditioning on \(Ty(t)\) is computationally cheaper because \(m < p\). When conditioning on \(n\) observations, this approach brings the memory cost down from \(\mathcal{O}(n^2p^2)\) to \(\mathcal{O}(n^2m^2)\), and the time cost down from \(\mathcal{O}(n^3p^3)\) to \(\mathcal{O}(n^3m^3)\).<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup> These savings are identical to those we get by using the matrix inversion lemma, as discussed in the previous section.</p>
<p>In general, we cannot arbitrarily transform observations and then use the transformed data as observations instead. We must show that our proposed procedure is valid, i.e. that conditioning on \(Ty(t)\) is equivalent to conditioning on \(y(t)\). We do this by showing that (see <a href="http://proceedings.mlr.press/v119/bruinsma20a.html">Bruinsma <em>et al.</em> (2020)</a>),<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup> under the ILMM, \(Ty(t)\) is a <em><a href="https://en.wikipedia.org/wiki/Sufficient_statistic">sufficient statistic</a></em> for \(x(t)\) given \(y\)—and this is only possible because of the particular structure of the ILMM.</p>
<p>A sufficient statistic \(T(y)\) is a function of the data which is defined with respect to a probabilistic model, \(p(y \cond \theta)\), and an associated unknown parameter \(\theta\). When a statistic is sufficient, it means that computing it over a set of observations, \(y\), provides us with all the information about \(\theta\) that can be extracted from those observations (under that probabilistic model). Thus, there is no other quantity we can compute over \(y\) that will increase our knowledge of \(\theta\). Formally, that is to say that \(p(\theta \cond y) = p(\theta \cond T(y))\). For the ILMM, as \(Ty(t)\) is a sufficient statistic for \(x(t)\), we have that \(p(x(t) \cond y(t)) = p(x(t) \cond Ty(t))\). This property guarantees that the procedure of conditioning the model on the summary of the observations is mathematically valid.</p>
<p>The choice of \(m < p\) is exactly the choice of imposing a low-rank structure on the model; and the lower the rank (controlled by \(m\), the number of latent processes), the more parsimonious the summary \(Ty\) of our observations becomes.</p>
<p>Besides being more intuitive, this sufficient-statistic approach makes it easy to see how introducing a simple constraint on the mixing matrix \(H\) allows us to scale the model even further and obtain linear scaling in the number of latent processes, \(m\).<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup> That is what we discuss next.</p>
<h2 id="the-orthogonal-instantaneous-linear-mixing-model">The Orthogonal Instantaneous Linear Mixing Model</h2>
<p>Although the ILMM scales much better than a general MOGP, it still scales cubically in \(m\) for time and quadratically in \(m\) for memory, which quickly becomes intractable for large systems. We therefore want to identify a subset of ILMMs that scales even more favourably. We show that a very simple restriction on the mixing matrix \(H\) leads to linear scaling in \(m\) for both time and memory. We call this model the <em>Orthogonal Instantaneous Linear Mixing Model</em> (OILMM). The plots below compare the time and memory scaling of the ILMM and the OILMM as \(m\) grows.</p>
<p><img src="/blog/public/images/oilmm2_scaling.png" alt="OILMM_scaling" />
Figure 1: Runtime (left) and memory usage (right) of the ILMM and OILMM for computing the evidence of \(n = 1500\) observations for \(p = 200\) outputs.</p>
<p>Let’s return to the definition of the ILMM, as discussed in the first section: \(y(t) \cond x(t), H = Hx(t) + \epsilon(t)\). Because \(\epsilon(t)\) is Gaussian noise, we can write that \(y(t) \cond x(t), H \sim \mathrm{GP}(Hx(t), \delta_{tt'}\Sigma)\), where \(\delta_{tt'}\) is the <a href="https://en.wikipedia.org/wiki/Kronecker_delta">Kronecker delta</a>. Because we know that \(Ty(t)\) is a sufficient statistic for \(x(t)\) under this model, we know that \(p(x \cond y(t)) = p(x \cond Ty(t))\). Then what is the distribution of \(Ty(t)\)? A simple calculation gives \(Ty(t) \cond x(t), H \sim \mathrm{GP}(THx(t), \delta_{tt'}T\Sigma T^\top)\). The crucial thing to notice here is that the summarised observations \(Ty(t)\) <em>only</em> couple the latent processes \(x\) via the noise matrix \(T\Sigma T^\top\). If that matrix were diagonal, then observations would not couple the latent processes, and we could condition each of them individually, which is <em>much</em> more computationally efficient.</p>
<p>This is the key insight behind the <em>Orthogonal Instantaneous Linear Mixing Model</em> (OILMM). If we let \(\Sigma = \sigma^2I_p\), then it can be shown that \(T\Sigma T^\top\) is diagonal if and only if the columns of \(H\) are orthogonal (prop. 6 from the paper by <a href="http://proceedings.mlr.press/v119/bruinsma20a.html">Bruinsma <em>et al.</em> (2020)</a>), which means that \(H\) can be written as \(H = US^{1/2}\), with \(U\) a matrix with orthonormal columns and \(S > 0\) diagonal. It is this orthogonality of the columns of \(H\) that gives the OILMM its name. In summary: by restricting the columns of the mixing matrix in an ILMM to be orthogonal, we make it possible to <strong>treat each latent process as an independent, single-output GP problem</strong>.</p>
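<p>This condition is straightforward to verify numerically. In the sketch below (illustrative sizes throughout), we build \(H = US^{1/2}\) from a QR decomposition and check that the projected noise \(T\Sigma T^\top\) is diagonal, so the latent processes decouple:</p>

```python
import numpy as np

rng = np.random.default_rng(3)
p, m, sigma2 = 10, 3, 0.1

# U with orthonormal columns (via QR), S > 0 diagonal, and H = U S^{1/2}.
U, _ = np.linalg.qr(rng.standard_normal((p, m)))
S = np.diag(rng.uniform(0.5, 2.0, m))
H = U @ np.sqrt(S)

Sigma = sigma2 * np.eye(p)
Sigma_inv = np.linalg.inv(Sigma)
T = np.linalg.inv(H.T @ Sigma_inv @ H) @ H.T @ Sigma_inv

projected_noise = T @ Sigma @ T.T  # equals sigma^2 S^{-1}, a diagonal matrix
off_diag = projected_noise - np.diag(np.diag(projected_noise))
print(np.allclose(off_diag, 0.0))  # True: the latent processes decouple
```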
<p>The result is actually a bit more general: we can allow any observation noise of the form \(\sigma^2I_p + H D H^\top\), with \(D>0\) diagonal. Thus, it is possible to have a non-diagonal noise matrix, i.e. noise that is correlated across different outputs, and still be able to decouple the latent processes and retain all the computational gains of the OILMM (which we discuss next).</p>
<p>Computationally, the OILMM approach allows us to go from a cost of \(\mathcal{O}(n^3m^3)\) in time and \(\mathcal{O}(n^2m^2)\) in memory, for a regular ILMM, to \(\mathcal{O}(n^3m)\) in time and \(\mathcal{O}(n^2m)\) in memory. This is because now the problem reduces to \(m\) independent single-output GP problems.<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup> The figure below explains the inference process under the ILMM and the OILMM, with the possible paths and the associated costs.</p>
<p><img src="/blog/public/images/oilmm2_inference.png" alt="inference_process" />
Figure 2: Commutative diagrams depicting that conditioning on \(Y\) in the ILMM (left) and OILMM (right) is equivalent to conditioning respectively on \(TY\) and independently every \(x_i\) on \((TY)_{i:}\), but yield different computational complexities. The reconstruction costs assume computation of the marginals.</p>
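<p>As a sketch of the decoupled inference step, the snippet below conditions each latent process as a separate single-output GP regression on the corresponding row of the projected observations. The kernel, noise level, and data are hypothetical placeholders:</p>

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 3, 50
t = np.linspace(0, 10, n)

def se_kernel(t1, t2):
    # Squared-exponential kernel, an arbitrary choice for illustration.
    return np.exp(-0.5 * (t1[:, None] - t2[None, :]) ** 2)

projected_y = rng.standard_normal((m, n))  # stand-in for the rows of T y
latent_noise = 0.1                         # a diagonal entry of T Sigma T^T

# m independent n x n problems instead of one nm x nm problem.
K = se_kernel(t, t)
posterior_means = np.stack(
    [K @ np.linalg.solve(K + latent_noise * np.eye(n), projected_y[i])
     for i in range(m)]
)
print(posterior_means.shape)  # (3, 50)
```

<p>Each solve involves only an \(n \times n\) matrix, which is where the linear (rather than cubic) scaling in \(m\) comes from.</p>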
<p>If we take the view of an (O)ILMM representing every data point \(y(t)\) via a set of basis vectors \(h_1, \ldots, h_m\) (the columns of the mixing matrix) and a set of time-dependent coefficients \(x_1(t), \ldots, x_m(t)\) (the latent processes), the difference between an ILMM and an OILMM is that in the latter the coordinate system is chosen to be orthogonal, as is common practice in most fields. This insight is illustrated below.</p>
<p><img src="/blog/public/images/oilmm2_basis.png" alt="basis_sets" style="width:400px;margin:0 auto 0 auto;" />
Figure 3: Illustration of the difference between the ILMM and OILMM. The trajectory of a particle (dashed line) in two dimensions is modelled by the ILMM (blue) and OILMM (orange). The noise-free position \(f(t)\) is modelled as a linear combination of basis vectors \(h_1\) and \(h_2\) with coefficients \(x_1(t)\) and \(x_2(t)\) (two independent GPs). In the OILMM, the basis vectors \(h_1\) and \(h_2\) are constrained to be orthogonal; in the ILMM, \(h_1\) and \(h_2\) are unconstrained.</p>
<p>Another important difference between a general ILMM and an OILMM is that, while in both cases the latent processes are independent <em>a priori</em>, only in an OILMM do they remain so <em>a posteriori</em>. Besides the computational gains already mentioned, this property also improves interpretability, as the posterior marginal distributions of the latent processes can be inspected (and plotted) independently. In comparison, inspecting only the marginal distributions in a general ILMM would neglect the correlations between them, obscuring the interpretation.</p>
<p>Finally, the fact that an OILMM problem is really just a set of single-output GP problems makes the OILMM immediately compatible with any single-output GP approach. This allows us to trivially use powerful approaches, such as sparse GPs (as detailed in the paper by <a href="http://proceedings.mlr.press/v5/titsias09a/titsias09a.pdf">Titsias (2009)</a>) or state-space approximations (as presented in the freely available book by <a href="https://users.aalto.fi/~asolin/sde-book/sde-book.pdf">Särkkä & Solin</a>), for scaling to extremely large data sets. We have illustrated this by using the OILMM, combined with state-space approximations, to model 70 million data points (see <a href="http://proceedings.mlr.press/v119/bruinsma20a.html">Bruinsma <em>et al.</em> (2020)</a> for details).</p>
<h2 id="conclusion">Conclusion</h2>
<p>In this post we have taken a deeper look at the <em>Instantaneous Linear Mixing Model</em> (ILMM), a widely used multi-output GP (MOGP) model that stands at the base of the <em>Mixing Model Hierarchy</em> (MMH)—which was described in detail in our <a href="/blog/2021/02/19/OILMM-pt1/">previous post</a>. We discussed how the <em>matrix inversion lemma</em> can be used to make computations much more efficient. We then showed an alternative but equivalent (and more intuitive) view based on a <em>sufficient statistic</em> for the model. This alternative view gives us a better understanding of <em>why</em> and <em>how</em> these computational gains are possible.</p>
<p>From the sufficient statistic formulation of the ILMM, we showed how a simple constraint on one of the model parameters decouples the MOGP problem into a set of independent single-output GP problems, greatly improving scalability. We call this model the <em>Orthogonal Instantaneous Linear Mixing Model</em> (OILMM), a subset of the class of ILMMs.</p>
<p>In the next and last post in this series, we will discuss implementation details of the OILMM and show some of our implementations in Julia and in Python.</p>
<!-- Footnotes themselves at the bottom. -->
<h2 id="notes">Notes</h2>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>The lemma can also be useful in case the matrix can be written as the sum of a low-rank matrix and a <em>block</em>-diagonal one. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
    <p>It is true that computing \(Ty\) also has an associated cost: \(\mathcal{O}(nmp)\) in time and \(\mathcal{O}(mp)\) in memory.
These costs are usually dominated by the others, as the number of observations \(n\) tends to be much larger than the number of outputs \(p\). <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>See prop. 3 of the paper by <a href="http://proceedings.mlr.press/v119/bruinsma20a.html">Bruinsma <em>et al.</em> (2020)</a>. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>It also shows that \(H\) can be trivially made time-dependent. This comes as a direct consequence of the MLE problem which we solve to determine \(T\).
If we adopt a time-dependent mixing matrix \(H(t)\), the solution still has the same form, with the only difference that it will also be time-varying: \(T(t) = (H(t)^\top \Sigma^{-1} H(t))^{-1} H(t)^\top \Sigma^{-1}\). <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>There are costs associated with computing the projector \(T\) and executing the projections.
However, these costs are dominated by the ones related to storing and inverting the covariance matrix in practical scenarios (see appendix C of the paper by <a href="http://proceedings.mlr.press/v119/bruinsma20a.html">Bruinsma <em>et al.</em> (2020)</a>). <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>

<p>Eric Perim, Wessel Bruinsma, and Will Tebbutt</p>

<h1 id="gaussian-processes-from-one-to-many-outputs">Gaussian Processes: from one to many outputs</h1>

<p><em>2021-02-19</em></p>

<p>This is the first post in a three-part series we are preparing on multi-output Gaussian Processes. Gaussian Processes (GPs) are a popular tool in machine learning, and a technique that we routinely use in our work.
Essentially, GPs are a powerful Bayesian tool for regression problems (which can be extended to classification problems through some modifications).
As a Bayesian approach, GPs provide a natural and automatic mechanism to construct and calibrate uncertainties.
Naturally, getting <em>well</em>-calibrated uncertainties is not easy and depends on a combination of how well the model matches the data and on how much data is available.
Predictions made using GPs are not just point predictions: they are whole probability distributions, which is convenient for downstream tasks. There are several good references for those interested in learning more about the benefits of Bayesian methods, from <a href="https://towardsdatascience.com/an-introduction-to-bayesian-inference-e6186cfc87bc">introductory</a> <a href="https://towardsdatascience.com/what-is-bayesian-statistics-used-for">blog posts</a> to <a href="https://probml.github.io/pml-book/book0.html">classical</a> <a href="http://www.stat.columbia.edu/~gelman/book/">books</a>.</p>
<p>In this post (and in the forthcoming ones in this series), we are going to assume that the reader has some level of familiarity with GPs in the single-output setting.
We will try to keep the maths to a minimum, but will rely on mathematical notation whenever that helps make the message clear.
For those who are interested in an introduction to GPs—or just a refresher—we point towards <a href="https://distill.pub/2019/visual-exploration-gaussian-processes/">other</a> <a href="https://medium.com/analytics-vidhya/intuitive-intro-to-gaussian-processes-328740cdc37f">resources</a>.
For a rigorous and in-depth introduction, the <a href="http://www.gaussianprocess.org/gpml/">book by Rasmussen and Williams</a> stands as one of the best references (and it is made freely available in electronic form by the authors).</p>
<p>We will start by discussing the extension of GPs from one to multiple dimensions, and review popular (and powerful) approaches from the literature.
In the following posts, we will look further into some powerful tricks that bring improved scalability and will also share some of our code.</p>
<h2 id="multi-output-gps">Multi-output GPs</h2>
<p>While most people with a background in machine learning or statistics are familiar with GPs, it is not uncommon to have only encountered their single-output formulation.
However, many interesting problems require the modelling of multiple outputs instead of just one.
Fortunately, it is simple to extend single-output GPs to multiple outputs, and there are a few different ways of doing so. We will call these constructions multi-output GPs (MOGPs).</p>
<p>An example application of a MOGP might be to predict both temperature and humidity as a function of time. Sometimes we might want to include binary or categorical outputs as well, but in this article we will limit the discussion to real-valued outputs.
(MOGPs are also sometimes called multi-task GPs, in which case an output is instead referred to as a task. But the idea is the same.)
Moreover, we will refer to inputs as time, as in the time series setting, but all the discussion here is valid for any kind of input.</p>
<p>The simplest way to extend GPs to multi-output problems is to model each of the outputs independently, with single-output GPs.
We call this model the IGP (for independent GPs). While conceptually simple, computationally cheap, and easy to implement, this approach fails to account for correlations between outputs.
If the outputs are correlated, knowing one can provide useful information about the others (as we illustrate below), so assuming independence can hurt performance and, in many cases, relegates this approach to serving as a baseline.</p>
<p>To define a general MOGP, all we have to do is to also specify how the outputs covary.
Perhaps the simplest way of doing this is by prescribing an additional covariance function (kernel) over outputs, \(k_{\mathrm{o}}(i, j)\), which specifies the covariance between outputs \(i\) and \(j\).
Combining this kernel over outputs with a kernel over inputs, e.g. \(k_{\mathrm{t}}(t, t')\), the full kernel of the MOGP is then given by</p>
\[\begin{equation}
k((i, t), (j, t')) = \operatorname{cov}(f_i(t), f_j(t')) = k_{\mathrm{o}}(i, j) k_{\mathrm{t}}(t, t'),
\end{equation}\]
<p>which says that the covariance between output \(i\) at input \(t\) and output \(j\) at input \(t'\) is equal to the product \(k_{\mathrm{o}}(i, j) k_{\mathrm{t}}(t, t')\).
When the kernel \(k((i, t), (j, t'))\) is a product between a kernel over outputs \(k_{\mathrm{o}}(i, j)\) and a kernel over inputs \(k_{\mathrm{t}}(t,t')\), the kernel \(k((i, t), (j, t'))\) is called <em>separable</em>.
In the general case, the kernel \(k((i, t), (j, t'))\) does not have to be separable, i.e. it can be any arbitrary <a href="https://en.wikipedia.org/wiki/Positive-definite_function">positive-definite function</a>.</p>
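<p>As a small illustration: when all \(p\) outputs are observed at the same \(n\) inputs, a separable kernel yields a full covariance matrix that is the Kronecker product of the output covariance and the input covariance. The factor matrices below are made-up choices:</p>

```python
import numpy as np

p, n = 3, 5
t = np.linspace(0, 1, n)
rng = np.random.default_rng(4)

B = rng.standard_normal((p, p))
K_outputs = B @ B.T  # k_o(i, j): an arbitrary positive semi-definite matrix
K_time = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2)  # k_t(t, t')

# Full covariance over all (output, input) pairs: an np x np matrix.
K_full = np.kron(K_outputs, K_time)
print(K_full.shape)  # (15, 15)
```

<p>The Kronecker structure is itself exploitable for fast inference, which is one reason separable kernels are popular in practice.</p>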
<p>Contrary to IGPs, general MOGPs do model correlations between outputs, which means that they are able to use observations from one output to better predict another output.
We illustrate this below by contrasting the predictions for two of the outputs in the <a href="https://sccn.ucsd.edu/~arno/fam2data/publicly_available_EEG_data.html">EEG dataset</a>, one observed and one not observed, using IGPs and another flavour of MOGPs, the ILMM, which we will discuss in detail in the next section. Contrary to the independent GP (IGP), the ILMM is able to successfully predict F2 by exploiting the observations for F3 (and other outputs not shown).</p>
<p><img src="/blog/public/images/eeg.png" alt="IGP_vs_MOGP" />
Figure 1: Predictions for two of the outputs in the EEG dataset using two distinct MOGP approaches, the ILMM and the IGP.
All outputs are modelled jointly, but we only plot two of them for clarity.</p>
<h3 id="equivalence-to-single-output-gps">Equivalence to single-output GPs</h3>
<p>An interesting thing to notice is that a general MOGP kernel is just another kernel, like those used in single-output GPs, but one that now operates on an <em>extended</em> input space (because it also takes in \(i\) and \(j\) as input).
Mathematically, say one wants to model \(p\) outputs over some input space \(\mathcal{T}\).
By also letting the index of the output be part of the input, we can construct this extended input space: \(\mathcal{T}_{\mathrm{ext}} = \{1,...,p\} \times \mathcal{T}\). Then, a multi-output Gaussian process (MOGP) can be defined via a mean function, \(m\colon \mathcal{T}_{\mathrm{ext}} \to \mathbb{R}\), and a kernel, \(k\colon \mathcal{T}_{\mathrm{ext}}^2 \to \mathbb{R}\).
Under this construction it is clear that any property of single-output GPs immediately transfers to MOGPs, because MOGPs can simply be seen as single-output GPs on an extended input space.</p>
<p>An equivalent formulation of MOGPs can be obtained by stacking the multiple outputs into a vector, creating a <em>vector-valued GP</em>.
It is sometimes helpful to view MOGPs from this perspective, in which the multiple outputs are viewed as one multidimensional output.
We can use this equivalent formulation to define MOGP via a <em>vector-valued</em> mean function, \(m\colon \mathcal{T} \to \mathbb{R}^p\), and a <em>matrix-valued</em> kernel, \(k\colon\mathcal{T}^2 \to \mathbb{R}^{p \times p}\). This mean function and kernel are <em>not</em> defined on the extended input space; rather, in this equivalent formulation, they produce <em>multi-valued outputs</em>.
The vector-valued mean function corresponds to the mean of the vector-valued GP, \(m(t) = \mathbb{E}[f(t)]\), and the matrix-valued kernel to the covariance matrix of the vector-valued GP, \(k(t, t') = \mathbb{E}[(f(t) - m(t))(f(t') - m(t'))^\top]\).
When the matrix-valued kernel is evaluated at \(t = t'\), the resulting matrix \(k(t, t) = \mathbb{E}[(f(t) - m(t))(f(t) - m(t))^\top]\) is sometimes called the <em>instantaneous spatial covariance</em>: it describes the covariance between different outputs at a given point in time \(t\).</p>
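To make the extended-input-space view concrete, here is a minimal Python sketch (all names are ours, and the constant inter-output covariance matrix B is a simple "intrinsic coregionalisation" assumption made purely for illustration) that builds an ordinary covariance matrix from a kernel operating on pairs of output index and time:

```python
import numpy as np

def base_kernel(t1, t2, length_scale=1.0):
    """Squared-exponential kernel on the original input space T."""
    return np.exp(-0.5 * (t1 - t2) ** 2 / length_scale ** 2)

def mogp_kernel(x1, x2, B):
    """A kernel on the extended input space {1,...,p} x T.

    Each input is a pair (i, t) of output index and time. B is a p x p
    positive semi-definite matrix of inter-output covariances (an
    illustrative choice, not the only possibility).
    """
    (i, t1), (j, t2) = x1, x2
    return B[i, j] * base_kernel(t1, t2)

# Covariance over observations of p = 2 outputs at 3 times each:
B = np.array([[1.0, 0.8],
              [0.8, 1.0]])
inputs = [(i, t) for i in range(2) for t in (0.0, 0.5, 1.0)]
K = np.array([[mogp_kernel(a, b, B) for b in inputs] for a in inputs])
print(K.shape)  # (6, 6): just an ordinary GP covariance matrix
```

Inference then proceeds exactly as in the single-output case, treating `K` as the covariance of one six-dimensional Gaussian.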
<p>Because MOGPs can be viewed as single-output GPs on an extended input space, inference works exactly the same way.
However, by extending the input space we exacerbate the scaling issues inherent to GPs, because the total number of observations is the sum of the numbers of observations for each output, and GPs scale badly in the number of observations.
While inference in the single-output setting requires the inversion of an \(n \times n\) matrix (where \(n\) is the number of data points), in the case of \(p\) outputs, assuming that at all times all outputs are observed,<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> this turns into the inversion of an \(np \times np\) matrix (assuming the same number of input points for each output as in the single output case), which can quickly become computationally intractable (i.e. not feasible to compute with limited computing power and time).
That is because the inversion of a \(q \times q\) matrix takes \(\mathcal{O}(q^3)\) time and \(\mathcal{O}(q^2)\) memory, meaning that time and memory requirements will scale, respectively, cubically and quadratically with the number of points in time, \(n\), and outputs, \(p\).
In practice this scaling characteristic limits the application of this general MOGP formulation to data sets with very few outputs and data points.</p>
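As a back-of-the-envelope illustration of this scaling (the numbers below are hypothetical and track only the asymptotic costs, ignoring constant factors):

```python
def inference_cost(n, p):
    """Asymptotic cost of exact inference with p fully observed outputs
    at n input points: inverting an (np) x (np) covariance matrix."""
    q = n * p
    return {"time": q ** 3, "memory": q ** 2}

single = inference_cost(1000, 1)   # a single-output GP
multi = inference_cost(1000, 10)   # the same time points, but 10 outputs

print(multi["time"] // single["time"])      # 1000: cubic blow-up in p
print(multi["memory"] // single["memory"])  # 100: quadratic blow-up in p
```

Ten outputs instead of one already means a thousand times the compute and a hundred times the memory, which is why the naive approach breaks down so quickly.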
<h3 id="low-rank-approximations">Low-rank approximations</h3>
<p>A popular and powerful approach to making MOGPs computationally tractable is to impose a <a href="https://en.wikipedia.org/wiki/Rank_(linear_algebra)">low-rank</a> structure over the covariance between outputs.
That is equivalent to assuming that the data can be described by a set of latent (unobserved) Gaussian processes in which the number of these <em>latent processes</em> is fewer than the number of outputs.
This builds a simpler, lower-dimensional representation of the data. The structure that this representation imposes over the covariance matrices can be exploited to perform the inversion operation more efficiently (we are going to discuss one of these cases in detail in the next post of this series).
There are a variety of different ways in which this kind of structure can be imposed, leading to an interesting class of models which we discuss in the next section.</p>
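As a flavour of how such structure helps, here is a generic numpy sketch using the matrix inversion lemma (Woodbury identity) under a hypothetical low-rank-plus-diagonal covariance; this is a standard trick assumed here for illustration, not necessarily the exact mechanism of any particular model discussed below:

```python
import numpy as np

rng = np.random.default_rng(0)
p, m = 200, 3                    # p outputs, m << p latent processes
H = rng.normal(size=(p, m))      # mixing matrix
noise = 0.1

# Low-rank-plus-diagonal covariance between outputs: S = H H^T + noise * I
S = H @ H.T + noise * np.eye(p)

# Woodbury: (D + H H^T)^{-1} = D^{-1} - D^{-1} H (I_m + H^T D^{-1} H)^{-1} H^T D^{-1}
# Only an m x m system needs solving, instead of a p x p one.
Dinv = (1 / noise) * np.eye(p)
small = np.eye(m) + H.T @ Dinv @ H            # m x m
S_inv = Dinv - Dinv @ H @ np.linalg.solve(small, H.T @ Dinv)

print(np.allclose(S_inv @ S, np.eye(p)))  # True
```

Solving the small \(m \times m\) system costs \(\mathcal{O}(m^3)\) rather than the \(\mathcal{O}(p^3)\) of inverting the full matrix directly, which is the kind of saving that makes low-rank MOGPs tractable.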
<p>This kind of assumption is typically used to make the method computationally cheaper.
However, these assumptions do bring extra <a href="https://en.wikipedia.org/wiki/Inductive_bias">inductive bias</a> to the model. Introducing inductive bias <a href="https://towardsdatascience.com/supercharge-your-model-performance-with-inductive-bias-48559dba5133">can be a powerful tool</a> in making the model more data-efficient and better-performing in practice, provided that such assumptions are appropriate for the particular problem at hand.
For instance, <a href="https://epubs.siam.org/doi/pdf/10.1137/18M1183480">low-rank data</a> <a href="https://ieeexplore.ieee.org/document/1177153">occurs naturally</a> in <a href="https://www.sciencedirect.com/science/article/abs/pii/S0005109807003950">different settings</a>.
This also happens to be true in electricity grids, due to the <a href="https://lib.dr.iastate.edu/cgi/viewcontent.cgi?article=1031&context=econ_las_conf">mathematical structure</a> of the price-forming process.
To make good choices about the kind of inductive bias to use, experience, domain knowledge, and familiarity with the data can be helpful.</p>
<h2 id="the-mixing-model-hierarchy">The Mixing Model Hierarchy</h2>
<p>Models that use a lower-dimensional representation for data have been present in the literature for a long time.
Well-known examples include <a href="https://en.wikipedia.org/wiki/Factor_analysis">factor analysis</a>, <a href="https://en.wikipedia.org/wiki/Principal_component_analysis">PCA</a>, and <a href="https://en.wikipedia.org/wiki/Autoencoder#Variational_autoencoder_(VAE)">VAEs</a>.
Even if we restrict ourselves to GP-based models there are still a significant number of notable and powerful models that make this simplifying assumption in one way or another.
However, models in the class of MOGPs that explain the data with a lower-dimensional representation are often framed in many distinct ways, which may obscure their relationship and overarching structure.
Thus, it is useful to try to look at all these models under the same light, forming a well-defined family that highlights their similarities and differences.</p>
<p>In one of the appendices of a <a href="http://proceedings.mlr.press/v119/bruinsma20a.html">recent paper of ours</a>,<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup> we presented what we called the <em>Mixing Model Hierarchy (MMH)</em>, an attempt at a unifying presentation of this large class of MOGP models.
It is important to stress that our goal with the Mixing Model Hierarchy is only to organise a large number of pre-existing models from the literature in a reasonably general way, and not to present new models nor claim ownership over important contributions made by other researchers.
Further down in this article we present a diagram that connects several relevant papers to the models from the MMH.</p>
<p>The simplest model from the Mixing Model Hierarchy is what we call the <em>Instantaneous Linear Mixing Model</em> (ILMM), which, despite its simplicity, is still a rather general way of describing low-rank covariance structures. In this model the observations are described as a linear combination of the <em>latent processes</em> (i.e. unobserved stochastic processes), given by a set of constant weights.
That is, we can write \(f(t) | x, H = Hx(t)\),<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup> where \(f\) are the observations, \(H\) is a matrix of weights, which we call the <em>mixing matrix</em>, and \(x(t)\) represents the (time-dependent) latent processes—note that we use \(x\) here to denote an unobserved (latent) stochastic process, not an input, which is represented by \(t\).
If the latent processes \(x(t)\) are described as GPs, then due to the fact that linear combinations of Gaussian variables are also Gaussian, \(f(t)\) will also be a (MO)GP.</p>
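As a small illustration (a hypothetical numpy sketch with arbitrary choices of kernel, dimensions, and mixing matrix), sampling from an ILMM-style prior amounts to sampling the m independent latent GPs and mixing them linearly:

```python
import numpy as np

rng = np.random.default_rng(0)
ts = np.linspace(0, 5, 100)   # time grid
p, m = 4, 2                   # p outputs, m < p latent processes

# Squared-exponential covariance over time, shared by the latent GPs.
K = np.exp(-0.5 * (ts[:, None] - ts[None, :]) ** 2)
L = np.linalg.cholesky(K + 1e-6 * np.eye(len(ts)))  # jitter for stability

x = L @ rng.normal(size=(len(ts), m))  # samples of the m latent GPs
H = rng.normal(size=(p, m))            # the mixing matrix
f = x @ H.T                            # f(t) = H x(t), one row per time

print(f.shape)  # (100, 4): p correlated output series
```

Because every row of `f` is a fixed linear combination of jointly Gaussian values, `f` is itself a draw from a (rank-m) MOGP.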
<p>The top-left part of the figure below illustrates the graphical model for the ILMM.<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>
The graphical model highlights the two restrictions imposed by the ILMM when compared with a general MOGP: <em>(i)</em> the <em>instantaneous spatial covariance</em> of \(f\), \(\mathbb{E}[f(t) f^\top(t)] = H H^\top\), does not vary with time, because neither \(H\) nor \(K(t, t) = I_m\) varies with time; and <em>(ii)</em> the noise-free observation \(f(t)\) is a function of \(x(t')\) for \(t'=t\) only, meaning that, for example, \(f\) cannot be \(x\) with a delay or a smoothed version of \(x\). Reflecting this, we call the ILMM a <em>time-invariant</em> (due to <em>(i)</em>) and <em>instantaneous</em> (due to <em>(ii)</em>) MOGP.</p>
<p><img src="/blog/public/images/MMH-graphical_model.png" alt="MMH_graphical_model" />
Figure 2: Graphical model for different models in the MMH.<sup id="fnref:4:1" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup></p>
<p>There are three general ways in which we can generalise the ILMM within the MMH.
<em>The first one is to allow the mixing matrix \(H\) to vary in time</em>.
That means that the amount each latent process contributes to each output varies in time.
Mathematically, \(H \in \mathbb{R}^{p \times m}\) becomes a matrix-valued function \(H\colon \mathcal{T} \to \mathbb{R}^{p \times m}\), and the mixing mechanism becomes</p>
\[\begin{equation}
f(t) \mid H, x = H(t) x(t).
\end{equation}\]
<p>We call such MOGP models <em>time-varying</em>.
Their graphical model is shown in the figure above, on the top right corner.</p>
<p><em>A second way to generalise the ILMM is to assume that \(f(t)\) depends on \(x(t')\) for all \(t' \in \mathcal{T}\).</em>
That is to say that, at a given time, each output may depend on the values of the latent processes at any other time.
We say that these models become <em>non-instantaneous</em>.
The mixing matrix \(H \in \mathbb{R}^{p \times m}\) becomes a matrix-valued time-invariant filter \(H\colon \mathcal{T} \to \mathbb{R}^{p \times m}\), and the mixing mechanism becomes</p>
\[\begin{equation}
f(t) \mid H, x = \int H(t - \tau)\, x(\tau) \,\mathrm{d}\tau.
\end{equation}\]
<p>We call such MOGP models <em>convolutional</em>.
Their graphical model is shown in the figure above, in the bottom left corner.</p>
<p><em>A third generalising assumption that we can make is that \(f(t)\) depends on \(x(t')\) for all \(t' \in \mathcal{T}\) <span style="text-decoration:underline;">and</span> this relationship may vary with time.</em>
This is similar to the previous case, in that both models are non-instantaneous, but with the difference that this one is also time-varying.
The mixing matrix \(H \in \mathbb{R}^{p \times m}\) becomes a matrix-valued time-varying filter \(H\colon \mathcal{T}\times\mathcal{T} \to \mathbb{R}^{p \times m}\), and the mixing mechanism becomes</p>
\[\begin{equation}
f(t) \mid H, x = \int H(t, \tau)\, x(\tau) \,\mathrm{d}\tau.
\end{equation}\]
<p>We call such MOGP models <em>time-varying</em> and <em>convolutional</em>.
Their graphical model is shown in the figure above in the bottom right corner.</p>
<p>Besides these generalising assumptions, a further extension is to adopt a prior over \(H\).
Using such a prior allows for a principled way of further imposing inductive bias by, for instance, encouraging sparsity.
This extension and the three generalisations discussed above together form what we call the <em><a href="http://proceedings.mlr.press/v119/bruinsma20a.html">Mixing Model Hierarchy (MMH)</a></em>, which is illustrated in the figure below.
The MMH organises multi-output Gaussian process models according to their distinctive modelling assumptions.
The figure below shows how twenty-one MOGP models from the machine learning and geostatistics literature can be recovered as special cases of the various generalisations of the ILMM.</p>
<p><img src="/blog/public/images/MMH-Zoubins_cube.png" alt="MMH" />
Figure 3: Diagram relating several models from the literature to the MMH, based on their properties.</p>
<p>Naturally, these different members of the MMH vary in complexity, and each brings its own set of challenges.
In particular, exact inference is computationally expensive or even intractable for many models in the MMH, requiring the use of approximate inference methods such as <a href="https://en.wikipedia.org/wiki/Variational_Bayesian_methods">variational inference</a> (VI) or even <a href="https://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo">Markov chain Monte Carlo</a> (MCMC).
We recommend reading the original papers if you are interested in all the clever ways their authors perform inference efficiently.</p>
<p>Although the MMH defines a large and powerful family of models, not all multi-output Gaussian process models are covered by it.
For example, <a href="https://arxiv.org/abs/1211.0358">Deep GPs</a> and their <a href="https://papers.nips.cc/paper/2018/hash/2974788b53f73e7950e8aa49f3a306db-Abstract.html">variations</a> are excluded because they transform the latent processes <em>nonlinearly</em> to generate the observations.</p>
<h2 id="conclusion">Conclusion</h2>
<p>In this post we have briefly discussed how to extend regular, single-output Gaussian processes (GPs) to multi-output Gaussian processes (MOGPs), and argued that MOGPs are really just single-output GPs after all.
We have also introduced the Mixing Model Hierarchy (MMH), which classifies a large number of models from the MOGP literature based on the way they generalise a particular base model, the Instantaneous Linear Mixing Model (ILMM).</p>
<p>In the next post of this series, we are going to discuss the ILMM in more detail and show how some simple assumptions can lead to a much more scalable model, which is applicable to extremely large systems that not even the simplest members of the MMH can tackle in general.</p>
<!-- Footnotes themselves at the bottom. -->
<h3 id="notes">Notes</h3>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>When every output is observed at each timestamp, we say the outputs are fully observed. If the outputs are not fully observed, only a subset of them might be available at certain times (for example, due to faulty sensors).
In this case, the number of data points will be smaller than \(np\), but will still scale proportionally with \(p\).
Thus, the scaling issues will still be present. In the case where only a single output is observed at any given time, the number of observations will be \(n\), and the MOGP would have the same time and memory scaling as a single-output GP. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p><a href="http://proceedings.mlr.press/v119/bruinsma20a.html">Bruinsma, Wessel, et al. “Scalable Exact Inference in Multi-Output Gaussian Processes.” International Conference on Machine Learning. PMLR, 2020</a>. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>Here \(f(t)\mid x,H\) is used to denote the value of \(f(t)\) given a known \(H\) and \(x(t)\). <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>Although we illustrate the GPs in the graphical models as a Markov chain, that is just to improve clarity.
In reality, GPs are much more general than Markov chains, as there is no conditional independence between timestamps. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:4:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
</ol>
</div>Eric Perim, Wessel Bruinsma, and Will TebbuttThis is the first post in a three-part series we are preparing on multi-output Gaussian Processes. Gaussian Processes (GPs) are a popular tool in machine learning, and a technique that we routinely use in our work. Essentially, GPs are a powerful Bayesian tool for regression problems (which can be extended to classification problems through some modifications). As a Bayesian approach, GPs provide a natural and automatic mechanism to construct and calibrate uncertainties. Naturally, getting well-calibrated uncertainties is not easy and depends on a combination of how well the model matches the data and on how much data is available. Predictions made using GPs are not just point predictions: they are whole probability distributions, which is convenient for downstream tasks. There are several good references for those interested in learning more about the benefits of Bayesian methods, from introductory blog posts to classical books.How to Start Contributing to Open Source Software2021-01-29T00:00:00+00:002021-01-29T00:00:00+00:00https://invenia.github.io/blog/2021/01/29/contribute-open-source<p>If you are someone who feels comfortable using code to solve a problem, answer a question, or just implement something for fun, chances are you are relying on <a href="https://opensource.com/resources/what-open-source">open source software</a>.
If you want to contribute to open source software, but don’t know how and where to start, this guide is for you.</p>
<p>In this post, we will first discuss some of the possible mental and technical barriers standing between you and your first meaningful contribution.
Then, we will go through the steps involved in making the first contribution.</p>
<p>For the sake of brevity, we assume some basic familiarity with <a href="https://guides.github.com/introduction/git-handbook/">git</a> (if you know how to commit changes and what a branch is, you’ll be fine).
Because many open source projects are hosted on GitHub, we also discuss <a href="https://guides.github.com/features/issues">GitHub Issues</a>, GitHub functionality that allows discussion about bugs and new features, and the term “pull/merge request” (PR/MR; the two are the same thing), which refers to a mechanism for submitting your changes to be incorporated into the existing repository.</p>
<h2 id="mental-and-technical-barriers">Mental and Technical Barriers</h2>
<p>It is easy to observe the world of open source and think:
“Oh look at all these smart people doing meaningful work and developing meaningful relationships, building humanity’s digital heritage that helps to run our civilisation.
I wish I could join the party, <em>but</em> <something I believe to be true>.”</p>
<p>Some misconceptions I and other people have had are listed below, along with what we think about it now.</p>
<blockquote>
<p>“I need to be an expert in <the language> AND <the package> to even consider reporting a bug, making a fix, or implementing new functionality.”</p>
</blockquote>
<p>If you are using the package and something doesn’t work as expected, you should report it, for example by opening a GitHub issue.
You don’t need to know how the package works internally.
All you need to do is take a quick look at the documentation to see if anything is written about your problem, and check if a similar issue has been opened already.
If you want to fix something yourself, that’s fantastic! Just jump in and try to figure it out.</p>
<blockquote>
<p>“I know about <a thing>, but other people know much more, and it’s much better if they implemented it.”</p>
</blockquote>
<p>This is almost always the case, unless you are the world’s leading expert on <a thing>.
However, they are likely busy with other important work, and the issue is not a priority for them.
So either you implement it, or it may not get done at all. Go ahead!
The best part, however, is that these other more knowledgeable people will usually be happy to review your solution and suggest how to make it even better, which means that a great thing gets built and you learn something in the process.</p>
<blockquote>
<p>“I don’t know anyone contributing to the package and they look like a team, isn’t it weird if I just jump in and open an issue/PR?”</p>
</blockquote>
<p>No, it’s not weird.
Teams make their code and issues public because they’re looking for new contributors like you.</p>
<blockquote>
<p>“I won’t be able to create a perfect solution, and people will point out flaws and ask me to change it.”</p>
</blockquote>
<p>Solutions to issues usually come in the form of pull requests.
However, opening a PR is best thought of as a conversation about a solution, rather than a finished product that is either approved or rejected.
Experienced contributors often open PRs to solicit feedback about an idea, because an open PR on GitHub offers convenient tools for discussion about code.
Even if you think your solution is complete, people will likely ask you to make changes, and that’s alright!
If it isn’t explicitly mentioned (it should be), ask why the changes are needed—these are valuable opportunities to learn.</p>
<blockquote>
<p>“I would like to make a contribution, but don’t know where to start.”</p>
</blockquote>
<p>Finding the right place to start can be challenging, but see advice below.
Once you make a contribution or two they will lead you on to others, so you typically only have to overcome this barrier once.</p>
<blockquote>
<p>“I know what is broken and I think I know how to fix it, but don’t know the steps to publish these to the official repository.”</p>
</blockquote>
<p>That’s fantastic! See the rest of the guide.</p>
<blockquote>
<p>“What if people ask for changes? How do I implement those?”</p>
</blockquote>
<p>I used to think that implementing review feedback would be hard and messy, but, in practice, it’s as easy as adding more commits to a branch.</p>
<h2 id="steps-to-first-contribution">Steps to First Contribution</h2>
<p>Now that we have gone through some of the concerns you may have, here is the step-by-step guide to your first contribution.</p>
<h3 id="1-learn-the-mechanics-of-a-pull-request">1) Learn the mechanics of a pull request</h3>
<p>The workflow is described in this <a href="https://github.com/firstcontributions/first-contributions">excellent repository on GitHub</a>, built just for learning the mechanics of making a pull request.
I recommend not just reading it, but actually going through all the steps yourself.
That exercise should make you comfortable with the process.</p>
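If you want to rehearse the commit mechanics without touching a real project, the core workflow can be sketched with a throwaway local repository (no GitHub account required; every repository, branch, and file name below is made up for the demo):

```shell
# Create a throwaway repository to practise in.
mkdir demo-repo
git -C demo-repo init --quiet
git -C demo-repo config user.email "you@example.com"
git -C demo-repo config user.name "Your Name"
git -C demo-repo commit --allow-empty -m "Initial commit" --quiet

# Create a feature branch, as you would for a PR.
git -C demo-repo checkout -b fix-typo --quiet

# Make a change and commit it.
echo "corrected text" > demo-repo/README.md
git -C demo-repo add README.md
git -C demo-repo commit -m "Fix typo in README" --quiet

# Addressing review feedback later is just more commits on the same branch.
echo "clearer corrected text" > demo-repo/README.md
git -C demo-repo commit -am "Address review feedback" --quiet

git -C demo-repo log --oneline   # the branch now holds all three commits
```

On a real project, the branch would live on your fork and `git push -u origin fix-typo` would publish it so that you can open a PR against the original repository.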
<h3 id="2-find-something-you-want-to-fix">2) Find something you want to fix</h3>
<p>A good first project might be solving a bug that affects you, as that means you already have a test case and you will be more motivated to find a solution.
However, if the bug is in a large and complicated library or requires a lot of code refactoring to fix, it is probably better to start somewhere else.</p>
<p>It may be more enjoyable to start with smaller or medium-sized packages because they can be easier to understand.
When you find a package you would like to modify, make sure that it is the original (not a fork) and it is being maintained, which you can check by looking at the issues and pull requests on its GitHub page.</p>
<p>Then, look through the issues and see if there is something that you find interesting.
Pay attention to <code class="language-plaintext highlighter-rouge">good-first-issue</code> labels, which indicate issues that people think are appropriate for first-time contributors.
This usually means that they are nice and not too hard to solve.
You don’t have to restrict yourself to issues with <code class="language-plaintext highlighter-rouge">good-first-issue</code> labels; feel free to tackle anything you feel motivated and able to do.
Keep in mind that it might be better to start with a smaller PR and get that merged first, before tackling a bigger issue.
You don’t want to submit a week’s worth of work only to find out that the package has been abandoned and there is nobody willing to review and merge your PR.</p>
<p>When you find an interesting issue and decide you want to work on it, it is a good idea to comment on the issue first and ask whether anyone is willing to review a potential PR.
Commenting will also create a feeling of responsibility and ownership of the issue which will motivate you and help you finish the PR.</p>
<p>As a few ideas, here are some concrete Julia packages that Invenia is involved with: <a href="https://github.com/JuliaCloud/AWS.jl">AWS.jl</a> for interacting with AWS, <a href="https://github.com/invenia/Intervals.jl">Intervals.jl</a> for working with intervals over ordered types, <a href="https://github.com/invenia/BlockDiagonals.jl">BlockDiagonals.jl</a> for working with block-diagonal matrices, and <a href="https://github.com/JuliaDiff/ChainRules.jl">ChainRules.jl</a> for automatic differentiation.
We are happy to help you contribute to these!</p>
<h3 id="3-implement-your-solution-and-open-a-pr">3) Implement your solution, and open a PR</h3>
<p>While you should be familiar with the mechanics of pull requests after step 1, there are some additional social/etiquette considerations.
Generally, the authors of open source packages are delighted when someone uses their package, opens issues about bugs or potential improvements, and especially so when someone opens a pull request with a solution to a known problem.
That said, they will appreciate it if you make things easy for them by linking the issue your PR solves and briefly explaining why you have chosen this approach.
If you are unsure about whether something was a good choice, point it out in the description.</p>
<p>If your background isn’t in computer science or software engineering, you might not have heard of <a href="https://ocw.mit.edu/ans7870/6/6.005/s16/classes/03-testing/index.html">unit testing</a>.
Testing is a way of ensuring the correctness of code by checking its output for a number of inputs.
In packages with unit tests, every new feature is typically expected to come with tests.
When fixing a bug, a test that fails using the old code and passes using the new code may also be expected.</p>
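For example, a bug-fix PR for a hypothetical `mean` function (everything here is invented for illustration, in plain Python with no test framework) might pair the fix with a regression test that fails on the old code:

```python
def mean(values):
    """Arithmetic mean, returning 0.0 for an empty input (the bug fix).

    The hypothetical old version divided by len(values) unconditionally,
    raising ZeroDivisionError on an empty list.
    """
    if not values:
        return 0.0
    return sum(values) / len(values)

def test_mean_basic():
    assert mean([1.0, 2.0, 3.0]) == 2.0

def test_mean_empty_input():
    # This test fails on the old code and passes with the fix.
    assert mean([]) == 0.0

test_mean_basic()
test_mean_empty_input()
print("all tests passed")
```

Including such a test in the PR both demonstrates the bug and prevents it from silently coming back later.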
<p>You can make the review process more efficient and pleasant by quickly examining the PR yourself.
What needs to be included in the PR depends on the issue at hand, but there are some general questions you can think about:</p>
<ul>
<li>Does my code work in corner cases, did I include reasonable tests?</li>
<li>Did I add or change documentation to match my changes?</li>
<li>Does my code formatting follow the rest of the package? Some packages follow code style guides, such as <a href="https://www.python.org/dev/peps/pep-0008/">PEP8</a> or <a href="https://github.com/invenia/BlueStyle">BlueStyle</a>.</li>
<li>Did I include any lines or files by mistake?</li>
</ul>
<p>Don’t worry if you can’t answer these questions.
It is perfectly fine to ask!
You can also self-review your PR and add some thoughts as comments to the code.</p>
<p>The <a href="http://colprac.sciml.ai">contributor’s guide on collaborative practices</a> is a great resource about the best practices regarding collaboration on open source projects.
The packages that follow it display this badge: <a href="https://github.com/SciML/ColPrac"><img src="https://img.shields.io/badge/ColPrac-Contributor's%20Guide-blueviolet" alt="ColPrac: Contributor's Guide on Collaborative Practices for Community Packages" /></a>
Other packages typically have their own guidelines outlined in a file named <code class="language-plaintext highlighter-rouge">CONTRIBUTING.md</code>.</p>
<h3 id="4-address-feedback-and-wait-for-the-merge">4) Address feedback, and wait for the merge</h3>
<p>Once the PR is submitted someone will likely respond to it in a few days.
If it doesn’t happen, feel free to “bump” it by adding a comment to the PR asking for it to be reviewed.
Most maintainers do not mind if you bump the PR every ten days or so, and in fact find it useful in case it has slipped under their radar.</p>
<p>Ideally the feedback will be constructive, actionable, and educational.
Sometimes it isn’t, and if you are very unlucky the reviewer might come across as stern and critical.
It helps to remember that such feedback might have been unintentional and that you are in fact on the same side, both wanting the best code to be merged.
Using plural first-person pronouns (we) is a good way to convey this sentiment (and remind the reviewer about it), for example:
“What are the benefits if we implement <a feature> in this way?” is better than “Why do you think <a feature> should be implemented this way?”.</p>
<p>Addressing feedback is easy: simply add more commits to the branch that the PR is for.
Once you think you have addressed all the feedback, let the reviewer know explicitly, as they don’t know whether you plan to add more commits or not.
If all went well the reviewer will then merge the PR, hooray!</p>
<h2 id="conclusions">Conclusions</h2>
<p>Unless you have started programming very recently, you likely already have the technical/programming ability to contribute to open source projects.
You have valuable contributions to make, but psychological/sociological barriers may be holding you back.
Hopefully reading this post will help you overcome them: and we are looking forward to welcoming you to the community and seeing what you come up with!</p>Miha ZgubicIf you are someone who feels comfortable using code to solve a problem, answer a question, or just implement something for fun, chances are you are relying on open source software. If you want to contribute to open source software, but don’t know how and where to start, this guide is for you.Navigating Your First Technical Internship2021-01-22T00:00:00+00:002021-01-22T00:00:00+00:00https://invenia.github.io/blog/2021/01/22/navigating-first-internship<p>During the final weeks of my internship with Invenia, while looking back on my time here,
I had the idea to share some thoughts on my experience and what advice I wish I had been given leading up to my start date.
There are countless recipes for a successful internship, and I hope what follows can help guide you towards making yours great.</p>
<p>First and foremost, congratulations on landing your first technical
internship! After hours of fine-tuning your resume, countless cover
letters written, and (if you are in the same boat I was in) having faced
your fair share of rejections, you’ve made it. Before digging into the
rest of this post, take a moment to be proud of the work you’ve put in
and appreciate where it has led you. If perhaps you are reading this in
the middle of your application process and are yet to receive an offer,
don’t worry. The first summer I applied for technical internships, out
of roughly 20 applications sent, I received a single interview and zero
offers. It never feels like it is going to happen until it does. Here’s
a <a href="https://www.freecodecamp.org/news/how-to-land-a-top-notch-tech-job-as-a-student-5c97fec82f3d/">useful
guide</a>
if you’re looking for tips on the application process.</p>
<p>Hopefully the advice that follows in this post will give you some
insight that helps you make a strong impression throughout the course
of your internship. Some more technical subjects or toolkits will be
mentioned for the sake of example, but we won’t get into the details of
these in this post. There are plenty of online resources that provide
great introductions to
<a href="https://guides.github.com/introduction/git-handbook/">git</a>,
<a href="https://www.zdnet.com/article/what-is-cloud-computing-everything-you-need-to-know-about-the-cloud/">the
cloud</a>,
and countless other new topics you might encounter.</p>
<h2 id="before-you-start">Before you start</h2>
<p>The most important thing you can do leading up to your internship is <strong>try
to relax</strong>. It’s common to feel a nervous excitement as your first day
approaches. Remember that you will have plenty of time (give or take 40
hours per week) to focus on this soon enough. Enjoy your other
hobbies or interests while you still have the chance.</p>
<p>If all distractions fail, a great way to spend any built up excitement
is to do some research. You likely already have a good sense of what the
company does, the general culture and what your role might entail from
the application and interview processes. Feel free to dig deeper into
these topics. See if there’s information on the company website about
your coworkers, the impact your work might drive and the industry as a
whole. This will help you feel more prepared when you start, as well as
settle some jitters in the weeks leading up.</p>
<h2 id="hitting-the-ground-running">Hitting the ground running</h2>
<p>The day has finally arrived, you’ve made it! Surely your first week will
be filled with lots of meetings, introductions, and learning. My advice
for this information overload is about as basic as it gets: <strong>take
notes, and lots of them</strong>. Ironically, being a software engineer, I am a
big fan of handwriting my notes. I also think it looks better from the
perspective of the person speaking to see someone writing notes by hand
versus typing away at a laptop, unable to see the screen.</p>
<p>In the same vein, any buzzwords you hear (any term, phrase, acronym or
title that you don’t fully understand) should be added to a personal
dictionary. This was really important for me, and I made reference to it
weekly throughout my 8-month internship. You should keep this up
throughout the entire internship; however, the first two weeks are likely
when it will grow the most. <em>What is a Docker container, and what makes
it helpful? Who knows what [insert some arbitrary industry acronym
here] stands for?</em> These are questions that are super easy to answer by
asking coworkers or even the internet. Maintaining your own shortlist of
important definitions will help fast track your learning and be a great
tool to pass along to new interns that join during your stay.</p>
<p>One of the best ways you can get to know the company and those you are
working with is simply to <strong>reach out to your coworkers</strong>. This can be
challenging depending on the size of the company you are with. It
becomes even more intimidating if you are in the middle of a global
pandemic and are onboarding from home. Try connecting via Slack, email
or set up calls to get to know those both in and outside of your team.
Learning about the different teams, individuals’ career paths and
building your network is one of the best ways to make the most of your
internship, both personally and professionally. Reaching out like this
can be intimidating, especially early on, but it will show that you are
interested in the company as a whole and highlight that you have strong
initiative. People generally are more than happy to talk about their
work and personal interests. There is no reason not to start this during
your first couple weeks.</p>
<h2 id="the-bulk">The Bulk</h2>
<p>Throughout my internship, I discovered three main takeaways that should
be considered by anyone working through an internship, especially if it
is your first.</p>
<h3 id="1-who-is-this-imposter">1. Who is this imposter?</h3>
<p>Imposter Syndrome is the feeling of doubting your own skills,
accomplishments and thinking of yourself as some kind of fraud. For me,
this manifested itself as a stream of questions such as <em>When will they
realize they’ve made a huge mistake in hiring me?</em> and thoughts similar
to <em>I am definitely not smart enough for this!</em> Temper this by
remembering <strong>they hired you for a reason</strong>. After interviewing many
candidates and having spent hours speaking to you and testing your
skills, they picked you. This reminder certainly won’t make all these
anxieties disappear, but can hopefully help mitigate any unnecessary
stress. Being a little nervous can help motivate you to work hard and
push you to succeed. It is worth remembering that all your coworkers
likely went through the same experience, or even still feel this way
from time to time. Feel free to get their insights or experience with
this if you feel comfortable doing so.</p>
<p>These thoughts and feelings might start well before your first day and
last well into the internship itself, as they did in my case. They will
fade with time and experience. Until that happens, just
remember to use your available sources of support and try to channel
those feelings into motivation.</p>
<h3 id="2-asking-for-help">2. Asking for help</h3>
<p>A big fear I had during my internship was coming across as naive and
inexperienced. I was very worried about asking a question and getting
“<em>How do you not know that? Don’t you know anything</em>?” as a response.
While this is certainly a normal thought process, it is misguided for a
few reasons. First off, my coworkers are great people, as I am sure are
yours. The odds of someone saying that are slim to none, and if they do,
it tells you a lot more about them than it does about you. Secondly, and
this is an important thing to keep in mind: <strong>no one expects you to know
everything and be great at everything,</strong> especially early on as an
intern. Asking questions is an important part of learning and
internships are no exception. This one comes in two parts: <em>when</em> and <em>how</em> to ask for help.</p>
<p>Let me save you the time I wasted trying to figure out when is the
perfect point to ask a question. While you definitely do not want to
just ask before thinking or doing any digging yourself, no one wants you
endlessly spinning your wheels. Take the time to think about the
problem, see if there are any reliable resources or answers in some
documentation, attempt a couple of solutions, but don’t fuss until the
end of time out of fear of looking dumb.</p>
<p>How you ask for help is the easy one. You’ve done all the work already.
Avoid questions like “<em>Hey how do you do [insert problem]?</em>”, or even
worse “<em>Hey I know this is probably SUPER stupid but I don’t get
[insert problem here] haha!</em>”. Do say something along the lines of
“<em>Hey [insert name]. I have been trying to figure out how to solve
[insert problem] and seem to be stuck. I have tried [insert attempted
solutions] with no success and was hoping you could point me in the
right direction.</em>” You can also frame it as a leading question, such as
“<em>So in order to do X, we have to do Y because of Z</em>?” It doesn’t have
to be a lengthy breakdown of every thought you had, it really shouldn’t
be. People generally prefer concise messages, just show that you have
put some thought and effort into it.</p>
<h3 id="3-make-your-voice-heard">3. Make your voice heard</h3>
<p>The scope of this point may vary depending on the size of your company,
its culture and your role; however, the point remains the same. Share
your thoughts on what you are working on. Share which subjects interest
you, both within and outside your role.</p>
<p>I had the incredible opportunity to contribute to my organization in
ways beyond the scope of my job description. Yes, this is because of the
supportive nature of the company and the flexibility that comes with
working at a smaller organization, but it also would not have happened
had I not shared what I was interested in. I gained fantastic experience
in my role, but also developed an appreciation and better understanding
for other work being done at the company.</p>
<p>Share your thoughts at the weekly team meeting, don’t be afraid to
review code or improve documentation, and bounce ideas off coworkers
during coffee breaks (oh, and try not to drink too much coffee). They
hired you in large part for your brain, so don’t be afraid to use it!</p>
<h2 id="final-impressions">Final Impressions</h2>
<p>You’ve made it to the final few weeks of your internship, congrats!
Hopefully you have had a fantastic experience, learning a lot and making
lasting relationships. Now is the time to think of who you would like to
connect with for a chat or call before you finish. This can be for any
number of reasons: giving a little extra thanks, asking for career
advice, or even just saying farewell to the friends
you’ve made along the way!</p>
<p>Regardless of whether or not you follow any of this advice, I wish you
the best of luck in your internship. While the advice above worked well
for me, it is by no means a one-size-fits-all magical recipe to the
perfect internship. There will certainly be hurdles along the way,
anxieties to overcome, and inevitable mistakes made, all of which will
contribute to making your internship a great learning experience. Good
luck and enjoy the ride.</p>
<p><em>Tom Wright</em></p>
<h1 id="linear-models-from-a-gaussian-process-point-of-view-with-stheno-and-jax">Linear Models from a Gaussian Process Point of View with Stheno and JAX</h1>
<p><em>Published 2021-01-19 at <a href="https://invenia.github.io/blog/2021/01/19/linear-models-with-stheno-and-jax">invenia.github.io</a>. Cross-posted at <a href="https://wesselb.github.io/2021/01/19/linear-models-with-stheno-and-jax.html">wesselb.github.io</a>.</em></p>
<p>A linear model prescribes a linear relationship between inputs and outputs.
Linear models are amongst the simplest of models, but they are ubiquitous across science.
A linear model with Gaussian distributions on the coefficients forms one of the simplest instances of a <em><a href="https://en.wikipedia.org/wiki/Gaussian_process">Gaussian process</a></em>.
In this post, we will give a brief introduction to linear models from a Gaussian process point of view.
We will see how a linear model can be implemented with <em>Gaussian process probabilistic programming</em> using <a href="https://github.com/wesselb/stheno">Stheno</a>, and how this model can be used to denoise noisy observations.
(Disclosure: <a href="https://willtebbutt.github.io/">Will Tebbutt</a> and Wessel are the authors of Stheno;
Will maintains a <a href="https://github.com/willtebbutt/Stheno.jl">Julia version</a>.)
In short, <a href="https://en.wikipedia.org/wiki/Probabilistic_programming">probabilistic programming</a> is a programming paradigm that brings powerful probabilistic models to the comfort of your programming language, which often comes with tools to automatically perform inference (make predictions).
We will also use <a href="https://github.com/google/jax">JAX</a>’s just-in-time compiler to make our implementation extremely efficient.</p>
<h2 id="linear-models-from-a-gaussian-process-point-of-view">Linear Models from a Gaussian Process Point of View</h2>
<p>Consider a data set \((x_i, y_i)_{i=1}^n \subseteq \R \times \R\) consisting of \(n\) real-valued input–output pairs.
Suppose that we wish to estimate a linear relationship between the inputs and outputs:</p>
\[\label{eq:ax_b}
y_i = a \cdot x_i + b + \e_i,\]
<p>where \(a\) is an unknown slope, \(b\) is an unknown offset, and \(\e_i\) is some error/noise associated with the observation \(y_i\).
To implement this model with Gaussian process probabilistic programming, we need to cast the problem into a <em>functional form</em>.
This means that we will assume that there is some underlying, random function \(y \colon \R \to \R\) such that the observations are evaluations of this function: \(y_i = y(x_i)\).
The model for the random function \(y\) will embody the structure of the linear model \eqref{eq:ax_b}.
This may sound hard, but it is not difficult at all.
We let the random function \(y\) be of the following form:</p>
\[\label{eq:ax_b_functional}
y(x) = a(x) \cdot x + b(x) + \e(x)\]
<p>where \(a\colon \R \to \R\) is a <em>random constant function</em>.
An example of a <em>constant function</em> \(f\) is \(f(x) = 5\).
<em>Random</em> means that the value \(5\) is not fixed, but modelled with a random value drawn from some probability distribution, because we don’t know the true value.
We let \(b\colon \R \to \R\) also be a random <em>constant function</em>, and \(\e\colon \R \to \R\) a random <em>noise function</em>.
Do you see the similarities between \eqref{eq:ax_b} and \eqref{eq:ax_b_functional}?
If all that doesn’t fully make sense, don’t worry; things should become more clear as we implement the model.</p>
<p>To model random constant functions and random noise functions, we will use <a href="https://github.com/wesselb/stheno">Stheno</a>, which is a Python library for Gaussian process modelling.
We also have a <a href="https://github.com/willtebbutt/Stheno.jl">Julia version</a>, but in this post we’ll use the Python version.
To install Stheno, run the command</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip <span class="nb">install</span> <span class="nt">--upgrade</span> <span class="nt">--upgrade-strategy</span> eager stheno
</code></pre></div></div>
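<p>The snippets that follow also use NumPy and Matplotlib through their usual aliases. These imports are assumed, but not shown, in the examples below:</p>

```python
import matplotlib.pyplot as plt
import numpy as np
```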
<p>In Stheno, a Gaussian process can be created with <code class="language-plaintext highlighter-rouge">GP(kernel)</code>, where <code class="language-plaintext highlighter-rouge">kernel</code> is the so-called <a href="https://en.wikipedia.org/wiki/Gaussian_process#Covariance_functions"><em>kernel</em> or <em>covariance function</em> of the Gaussian process</a>.
The kernel determines the properties of the function that the Gaussian process models.
For example, the kernel <code class="language-plaintext highlighter-rouge">EQ()</code> models smooth functions, and the kernel <code class="language-plaintext highlighter-rouge">Matern12()</code> models functions that look jagged.
See the <a href="https://www.cs.toronto.edu/~duvenaud/cookbook/">kernel cookbook</a> for an overview of commonly used kernels and the <a href="https://wesselb.github.io/stheno/docs/_build/html/readme.html#available-kernels">documentation of Stheno</a> for the corresponding classes.
For constant functions, you can set the kernel to simply a constant, for example <code class="language-plaintext highlighter-rouge">1</code>, which then models the constant function with a value drawn from \(\mathcal{N}(0, 1)\). (By default, in Stheno, all means are zero; but, if you like, <a href="https://wesselb.github.io/stheno/docs/_build/html/readme.html#available-means">you can also set a mean</a>.)</p>
<p>Let’s start out by creating a Gaussian process for the random constant function \(a(x)\) that models the slope.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="kn">from</span> <span class="nn">stheno</span> <span class="kn">import</span> <span class="n">GP</span>
<span class="o">>>></span> <span class="n">a</span> <span class="o">=</span> <span class="n">GP</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">a</span>
<span class="n">GP</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
</code></pre></div></div>
<p>You can see how the Gaussian process looks by simply sampling from it.
To sample from the Gaussian process <code class="language-plaintext highlighter-rouge">a</code> at some inputs <code class="language-plaintext highlighter-rouge">x</code>, evaluate it at those inputs, <code class="language-plaintext highlighter-rouge">a(x)</code>, and call the method <code class="language-plaintext highlighter-rouge">sample</code>: <code class="language-plaintext highlighter-rouge">a(x).sample()</code>.
This shows that you can really think of a Gaussian process just like you think of a function:
pass it some inputs to get (the model for) the corresponding outputs.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linspace</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">plt</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">a</span><span class="p">(</span><span class="n">x</span><span class="p">).</span><span class="n">sample</span><span class="p">(</span><span class="mi">20</span><span class="p">));</span> <span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="/blog/public/images/linear-models-constant-functions.png" alt="Samples of a Gaussian process that models a constant function" />
Figure 1: Samples of a Gaussian process that models a constant function.</p>
<p>We’ve sampled a bunch of constant functions.
Sweet!
The next step in the model \eqref{eq:ax_b_functional} is to multiply the slope function \(a(x)\) by \(x\).
To multiply <code class="language-plaintext highlighter-rouge">a</code> by \(x\), we multiply <code class="language-plaintext highlighter-rouge">a</code> by the function <code class="language-plaintext highlighter-rouge">lambda x: x</code>, which also casts \(x\) as a function:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="n">f</span> <span class="o">=</span> <span class="n">a</span> <span class="o">*</span> <span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">f</span>
<span class="n">GP</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="o"><</span><span class="k">lambda</span><span class="o">></span><span class="p">)</span>
</code></pre></div></div>
<p>This will give rise to functions like \(x \mapsto 0.1x\) and \(x \mapsto -0.4x\), depending on the value that \(a(x)\) takes.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="n">plt</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">f</span><span class="p">(</span><span class="n">x</span><span class="p">).</span><span class="n">sample</span><span class="p">(</span><span class="mi">20</span><span class="p">));</span> <span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="/blog/public/images/linear-models-slope-functions.png" alt="Samples of a Gaussian process that models functions with a random slope" />
Figure 2: Samples of a Gaussian process that models functions with a random slope.</p>
<p>This is starting to look good!
The only ingredient that is missing is an offset.
We model the offset just like the slope, but here we set the kernel to <code class="language-plaintext highlighter-rouge">10</code> instead of <code class="language-plaintext highlighter-rouge">1</code>, which models the offset with a value drawn from \(\mathcal{N}(0, 10)\).</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="n">b</span> <span class="o">=</span> <span class="n">GP</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">f</span> <span class="o">=</span> <span class="n">a</span> <span class="o">*</span> <span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="p">)</span> <span class="o">+</span> <span class="n">b</span>
<span class="nb">AssertionError</span><span class="p">:</span> <span class="n">Processes</span> <span class="n">GP</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="o"><</span><span class="k">lambda</span><span class="o">></span><span class="p">)</span> <span class="ow">and</span> <span class="n">GP</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">10</span> <span class="o">*</span> <span class="mi">1</span><span class="p">)</span> <span class="n">are</span> <span class="n">associated</span> <span class="n">to</span> <span class="n">different</span> <span class="n">measures</span><span class="p">.</span>
</code></pre></div></div>
<p>Something went wrong.
Stheno has an abstraction called <em>measures</em>, where only <code class="language-plaintext highlighter-rouge">GP</code>s that are part of the same measure can be combined into new <code class="language-plaintext highlighter-rouge">GP</code>s;
the abstraction of measures is there to keep things safe and tidy.
What goes wrong here is that <code class="language-plaintext highlighter-rouge">a</code> and <code class="language-plaintext highlighter-rouge">b</code> are not part of the same measure.
Let’s explicitly create a new measure and attach <code class="language-plaintext highlighter-rouge">a</code> and <code class="language-plaintext highlighter-rouge">b</code> to it.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="kn">from</span> <span class="nn">stheno</span> <span class="kn">import</span> <span class="n">Measure</span>
<span class="o">>>></span> <span class="n">prior</span> <span class="o">=</span> <span class="n">Measure</span><span class="p">()</span>
<span class="o">>>></span> <span class="n">a</span> <span class="o">=</span> <span class="n">GP</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">measure</span><span class="o">=</span><span class="n">prior</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">b</span> <span class="o">=</span> <span class="n">GP</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="n">measure</span><span class="o">=</span><span class="n">prior</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">f</span> <span class="o">=</span> <span class="n">a</span> <span class="o">*</span> <span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="p">)</span> <span class="o">+</span> <span class="n">b</span>
<span class="o">>>></span> <span class="n">f</span>
<span class="n">GP</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="o"><</span><span class="k">lambda</span><span class="o">></span> <span class="o">+</span> <span class="mi">10</span> <span class="o">*</span> <span class="mi">1</span><span class="p">)</span>
</code></pre></div></div>
<p>Let’s see what samples from <code class="language-plaintext highlighter-rouge">f</code> look like.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="n">plt</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">f</span><span class="p">(</span><span class="n">x</span><span class="p">).</span><span class="n">sample</span><span class="p">(</span><span class="mi">20</span><span class="p">));</span> <span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="/blog/public/images/linear-models-linear-functions.png" alt="Samples of a Gaussian process that models linear functions" />
Figure 3: Samples of a Gaussian process that models linear functions.</p>
<p>Perfect!
We will use <code class="language-plaintext highlighter-rouge">f</code> as our linear model.</p>
<p>In practice, observations are corrupted with noise.
We can add some noise to the lines in Figure 3 by adding a Gaussian process that models noise.
You can construct such a Gaussian process by using the kernel <code class="language-plaintext highlighter-rouge">Delta()</code>, which models the noise with independent \(\mathcal{N}(0, 1)\) variables.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="kn">from</span> <span class="nn">stheno</span> <span class="kn">import</span> <span class="n">Delta</span>
<span class="o">>>></span> <span class="n">noise</span> <span class="o">=</span> <span class="n">GP</span><span class="p">(</span><span class="n">Delta</span><span class="p">(),</span> <span class="n">measure</span><span class="o">=</span><span class="n">prior</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">y</span> <span class="o">=</span> <span class="n">f</span> <span class="o">+</span> <span class="n">noise</span>
<span class="o">>>></span> <span class="n">y</span>
<span class="n">GP</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="o"><</span><span class="k">lambda</span><span class="o">></span> <span class="o">+</span> <span class="mi">10</span> <span class="o">*</span> <span class="mi">1</span> <span class="o">+</span> <span class="n">Delta</span><span class="p">())</span>
<span class="o">>>></span> <span class="n">plt</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">(</span><span class="n">x</span><span class="p">).</span><span class="n">sample</span><span class="p">(</span><span class="mi">20</span><span class="p">));</span> <span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="/blog/public/images/linear-models-noisy-linear-functions.png" alt="Samples of a Gaussian process that models noisy linear functions" />
Figure 4: Samples of a Gaussian process that models noisy linear functions.</p>
<p>That looks more realistic, but perhaps that’s a bit too much noise.
We can tune down the amount of noise, for example, by scaling <code class="language-plaintext highlighter-rouge">noise</code> by <code class="language-plaintext highlighter-rouge">0.5</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="n">y</span> <span class="o">=</span> <span class="n">f</span> <span class="o">+</span> <span class="mf">0.5</span> <span class="o">*</span> <span class="n">noise</span>
<span class="o">>>></span> <span class="n">y</span>
<span class="n">GP</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="o"><</span><span class="k">lambda</span><span class="o">></span> <span class="o">+</span> <span class="mi">10</span> <span class="o">*</span> <span class="mi">1</span> <span class="o">+</span> <span class="mf">0.25</span> <span class="o">*</span> <span class="n">Delta</span><span class="p">())</span>
<span class="o">>>></span> <span class="n">plt</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">(</span><span class="n">x</span><span class="p">).</span><span class="n">sample</span><span class="p">(</span><span class="mi">20</span><span class="p">));</span> <span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="/blog/public/images/linear-models-noisy-linear-functions-2.png" alt="Samples of a Gaussian process that models noisy linear functions" />
Figure 5: Samples of a Gaussian process that models noisy linear functions.</p>
<p>Much better.</p>
<p>To summarise, our linear model is given by</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">prior</span> <span class="o">=</span> <span class="n">Measure</span><span class="p">()</span>
<span class="n">a</span> <span class="o">=</span> <span class="n">GP</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">measure</span><span class="o">=</span><span class="n">prior</span><span class="p">)</span> <span class="c1"># Model for slope
</span><span class="n">b</span> <span class="o">=</span> <span class="n">GP</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="n">measure</span><span class="o">=</span><span class="n">prior</span><span class="p">)</span> <span class="c1"># Model for offset
</span><span class="n">f</span> <span class="o">=</span> <span class="n">a</span> <span class="o">*</span> <span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="p">)</span> <span class="o">+</span> <span class="n">b</span> <span class="c1"># Noiseless linear model
</span>
<span class="n">noise</span> <span class="o">=</span> <span class="n">GP</span><span class="p">(</span><span class="n">Delta</span><span class="p">(),</span> <span class="n">measure</span><span class="o">=</span><span class="n">prior</span><span class="p">)</span> <span class="c1"># Model for noise
</span><span class="n">y</span> <span class="o">=</span> <span class="n">f</span> <span class="o">+</span> <span class="mf">0.5</span> <span class="o">*</span> <span class="n">noise</span> <span class="c1"># Noisy linear model
</span></code></pre></div></div>
<p>We call a program like this a <em>Gaussian process probabilistic program</em> (GPPP).
Let’s generate some noisy synthetic data, <code class="language-plaintext highlighter-rouge">(x_obs, y_obs)</code>, that will make up an example data set \((x_i, y_i)_{i=1}^n\).
We also save the observations without noise added—<code class="language-plaintext highlighter-rouge">f_obs</code>—so we can later check how good our predictions really are.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="n">x_obs</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linspace</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">50_000</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">f_obs</span> <span class="o">=</span> <span class="mf">0.8</span> <span class="o">*</span> <span class="n">x_obs</span> <span class="o">-</span> <span class="mf">2.5</span>
<span class="o">>>></span> <span class="n">y_obs</span> <span class="o">=</span> <span class="n">f_obs</span> <span class="o">+</span> <span class="mf">0.5</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">randn</span><span class="p">(</span><span class="mi">50_000</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">plt</span><span class="p">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">x_obs</span><span class="p">,</span> <span class="n">y_obs</span><span class="p">);</span> <span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="/blog/public/images/linear-models-observations.png" alt="Some observations" />
Figure 6: Some observations.</p>
<p>We will see next how we can fit our model to this data.</p>
<h2 id="inference-in-linear-models">Inference in Linear Models</h2>
<p>Suppose that we wish to remove the noise from the observations in Figure 6.
We carefully phrase this problem in terms of our GPPP:
the observations <code class="language-plaintext highlighter-rouge">y_obs</code> are realisations of the <em>noisy</em> linear model <code class="language-plaintext highlighter-rouge">y</code> at <code class="language-plaintext highlighter-rouge">x_obs</code>—realisations of <code class="language-plaintext highlighter-rouge">y(x_obs)</code>—and we wish to make predictions for the <em>noiseless</em> linear model <code class="language-plaintext highlighter-rouge">f</code> at <code class="language-plaintext highlighter-rouge">x_obs</code>—predictions for <code class="language-plaintext highlighter-rouge">f(x_obs)</code>.</p>
<p>In Stheno, we can make predictions based on observations by <em>conditioning</em> the measure of the model on the observations.
In our GPPP, the measure is given by <code class="language-plaintext highlighter-rouge">prior</code>, so we aim to condition <code class="language-plaintext highlighter-rouge">prior</code> on the observations <code class="language-plaintext highlighter-rouge">y_obs</code> for <code class="language-plaintext highlighter-rouge">y(x_obs)</code>.
Mathematically, this process of incorporating information by conditioning happens through <a href="https://en.wikipedia.org/wiki/Bayes%27_theorem">Bayes’ rule</a>.
Programmatically, we first make an <code class="language-plaintext highlighter-rouge">Observations</code> object, which represents the information—the observations—that we want to incorporate, and then condition <code class="language-plaintext highlighter-rouge">prior</code> on this object:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="kn">from</span> <span class="nn">stheno</span> <span class="kn">import</span> <span class="n">Observations</span>
<span class="o">>>></span> <span class="n">obs</span> <span class="o">=</span> <span class="n">Observations</span><span class="p">(</span><span class="n">y</span><span class="p">(</span><span class="n">x_obs</span><span class="p">),</span> <span class="n">y_obs</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">post</span> <span class="o">=</span> <span class="n">prior</span><span class="p">.</span><span class="n">condition</span><span class="p">(</span><span class="n">obs</span><span class="p">)</span>
</code></pre></div></div>
<p>You can also more concisely perform these two steps at once, as follows:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="n">post</span> <span class="o">=</span> <span class="n">prior</span> <span class="o">|</span> <span class="p">(</span><span class="n">y</span><span class="p">(</span><span class="n">x_obs</span><span class="p">),</span> <span class="n">y_obs</span><span class="p">)</span>
</code></pre></div></div>
<p>This mimics the mathematical notation used for conditioning.</p>
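<p>For completeness, this is the standard Gaussian conditioning identity that Bayes’ rule reduces to in this setting (a sketch: \(m\) and \(K = k(x_{\text{obs}}, x_{\text{obs}})\) denote the prior mean vector and kernel matrix of \(f\) at the observed inputs, and \(0.25 = 0.5^2\) is the noise variance of our model):</p>
<p>
\[
\mathbb{E}[f(x_{\text{obs}}) \mid y_{\text{obs}}] = m + K \, (K + 0.25\, I)^{-1} (y_{\text{obs}} - m),
\]
\[
\operatorname{cov}[f(x_{\text{obs}}) \mid y_{\text{obs}}] = K - K \, (K + 0.25\, I)^{-1} K.
\]
</p>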
<p>With our updated measure <code class="language-plaintext highlighter-rouge">post</code>, which is often called the <em>posterior</em> measure, we can make a prediction for <code class="language-plaintext highlighter-rouge">f(x_obs)</code> by passing <code class="language-plaintext highlighter-rouge">f(x_obs)</code> to <code class="language-plaintext highlighter-rouge">post</code>:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="n">pred</span> <span class="o">=</span> <span class="n">post</span><span class="p">(</span><span class="n">f</span><span class="p">(</span><span class="n">x_obs</span><span class="p">))</span>
<span class="o">>>></span> <span class="n">pred</span><span class="p">.</span><span class="n">mean</span>
<span class="o"><</span><span class="n">dense</span> <span class="n">matrix</span><span class="p">:</span> <span class="n">shape</span><span class="o">=</span><span class="mi">50000</span><span class="n">x1</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">float64</span>
<span class="n">mat</span><span class="o">=</span><span class="p">[[</span><span class="o">-</span><span class="mf">2.498</span><span class="p">]</span>
<span class="p">[</span><span class="o">-</span><span class="mf">2.498</span><span class="p">]</span>
<span class="p">[</span><span class="o">-</span><span class="mf">2.498</span><span class="p">]</span>
<span class="p">...</span>
<span class="p">[</span> <span class="mf">5.501</span><span class="p">]</span>
<span class="p">[</span> <span class="mf">5.502</span><span class="p">]</span>
<span class="p">[</span> <span class="mf">5.502</span><span class="p">]]</span><span class="o">></span>
<span class="o">>>></span> <span class="n">pred</span><span class="p">.</span><span class="n">var</span>
<span class="o"><</span><span class="n">low</span><span class="o">-</span><span class="n">rank</span> <span class="n">matrix</span><span class="p">:</span> <span class="n">shape</span><span class="o">=</span><span class="mi">50000</span><span class="n">x50000</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">float64</span><span class="p">,</span> <span class="n">rank</span><span class="o">=</span><span class="mi">2</span>
<span class="n">left</span><span class="o">=</span><span class="p">[[</span><span class="mf">1.e+00</span> <span class="mf">0.e+00</span><span class="p">]</span>
<span class="p">[</span><span class="mf">1.e+00</span> <span class="mf">2.e-04</span><span class="p">]</span>
<span class="p">[</span><span class="mf">1.e+00</span> <span class="mf">4.e-04</span><span class="p">]</span>
<span class="p">...</span>
<span class="p">[</span><span class="mf">1.e+00</span> <span class="mf">1.e+01</span><span class="p">]</span>
<span class="p">[</span><span class="mf">1.e+00</span> <span class="mf">1.e+01</span><span class="p">]</span>
<span class="p">[</span><span class="mf">1.e+00</span> <span class="mf">1.e+01</span><span class="p">]]</span>
<span class="n">middle</span><span class="o">=</span><span class="p">[[</span> <span class="mf">2.001e-05</span> <span class="o">-</span><span class="mf">2.995e-06</span><span class="p">]</span>
<span class="p">[</span><span class="o">-</span><span class="mf">2.997e-06</span> <span class="mf">6.011e-07</span><span class="p">]]</span>
<span class="n">right</span><span class="o">=</span><span class="p">[[</span><span class="mf">1.e+00</span> <span class="mf">0.e+00</span><span class="p">]</span>
<span class="p">[</span><span class="mf">1.e+00</span> <span class="mf">2.e-04</span><span class="p">]</span>
<span class="p">[</span><span class="mf">1.e+00</span> <span class="mf">4.e-04</span><span class="p">]</span>
<span class="p">...</span>
<span class="p">[</span><span class="mf">1.e+00</span> <span class="mf">1.e+01</span><span class="p">]</span>
<span class="p">[</span><span class="mf">1.e+00</span> <span class="mf">1.e+01</span><span class="p">]</span>
<span class="p">[</span><span class="mf">1.e+00</span> <span class="mf">1.e+01</span><span class="p">]]</span><span class="o">></span>
</code></pre></div></div>
<p>The prediction <code class="language-plaintext highlighter-rouge">pred</code> is a <a href="https://en.wikipedia.org/wiki/Multivariate_Gaussian_distribution">multivariate Gaussian distribution</a> with a particular mean and variance, which are displayed above.
You should view <code class="language-plaintext highlighter-rouge">post</code> as a function that assigns a probability distribution—the prediction—to every part of our GPPP, like <code class="language-plaintext highlighter-rouge">f(x_obs)</code>.
Note that the variance of the prediction is a <em>massive</em> matrix of size 50k \(\times\) 50k.
Under the hood, Stheno uses <a href="https://github.com/wesselb/matrix">structured representations for matrices</a> to compute and store matrices in an efficient way.</p>
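<p>To get a feel for the savings, we can compare the memory footprints by hand (plain arithmetic, not Stheno code): the dense variance would need roughly 20 GB, while the rank-2 representation fits in about 1.6 MB.</p>

```python
# Back-of-the-envelope arithmetic: storage for the dense 50k x 50k
# predictive variance versus the rank-2 factors that Stheno keeps.
# float64 entries take 8 bytes each.
n, r = 50_000, 2

dense_bytes = n * n * 8                       # full dense matrix
low_rank_bytes = (n * r + r * r + r * n) * 8  # left, middle, and right factors

print(dense_bytes / 1e9)     # 20.0 (GB)
print(low_rank_bytes / 1e6)  # 1.600032 (MB)
```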
<p>Let’s see what the prediction <code class="language-plaintext highlighter-rouge">pred</code> for <code class="language-plaintext highlighter-rouge">f(x_obs)</code> looks like.
The prediction <code class="language-plaintext highlighter-rouge">pred</code> exposes the method <code class="language-plaintext highlighter-rouge">marginals</code>, which conveniently computes the mean and the associated lower and upper error bounds for you.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="n">mean</span><span class="p">,</span> <span class="n">error_bound_lower</span><span class="p">,</span> <span class="n">error_bound_upper</span> <span class="o">=</span> <span class="n">pred</span><span class="p">.</span><span class="n">marginals</span><span class="p">()</span>
<span class="o">>>></span> <span class="n">mean</span>
<span class="n">array</span><span class="p">([</span><span class="o">-</span><span class="mf">2.49818708</span><span class="p">,</span> <span class="o">-</span><span class="mf">2.49802708</span><span class="p">,</span> <span class="o">-</span><span class="mf">2.49786708</span><span class="p">,</span> <span class="p">...,</span> <span class="mf">5.50148996</span><span class="p">,</span>
<span class="mf">5.50164996</span><span class="p">,</span> <span class="mf">5.50180997</span><span class="p">])</span>
<span class="o">>>></span> <span class="n">error_bound_upper</span> <span class="o">-</span> <span class="n">error_bound_lower</span>
<span class="n">array</span><span class="p">([</span><span class="mf">0.01753381</span><span class="p">,</span> <span class="mf">0.01753329</span><span class="p">,</span> <span class="mf">0.01753276</span><span class="p">,</span> <span class="p">...,</span> <span class="mf">0.01761883</span><span class="p">,</span> <span class="mf">0.01761935</span><span class="p">,</span>
<span class="mf">0.01761988</span><span class="p">])</span>
</code></pre></div></div>
<p>The error bounds are very tight: their width is on the order of \(10^{-2}\), which means that Stheno predicted <code class="language-plaintext highlighter-rouge">f(x_obs)</code> with high confidence.</p>
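<p>As a sanity check, we can reproduce the bound widths from the variance shown earlier (under the assumption that the bounds are central 95% credible intervals, i.e. mean \(\pm\) 1.96 standard deviations):</p>

```python
import math

# Sanity check on the bound widths (assumption: `marginals` returns central
# 95% credible intervals, i.e. mean ± 1.96 standard deviations).
# The first row of `left` in `pred.var` above is [1, 0], so the first
# marginal variance is simply `middle[0, 0] = 2.001e-05`.
first_var = 2.001e-05
width = 2 * 1.96 * math.sqrt(first_var)
print(width)  # ≈ 0.0175, matching the first bound width (0.01753…) above
```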
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="n">plt</span><span class="p">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">x_obs</span><span class="p">,</span> <span class="n">y_obs</span><span class="p">);</span> <span class="n">plt</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x_obs</span><span class="p">,</span> <span class="n">mean</span><span class="p">);</span> <span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="/blog/public/images/linear-models-denoised-observations.png" alt="Mean of the prediction (blue line) for the denoised observations" />
Figure 7: Mean of the prediction (blue line) for the denoised observations.</p>
<p>The blue line in Figure 7 shows the mean of the predictions.
This line appears to nicely pass through the observations with the noise removed.
But let’s see how good the predictions really are by comparing to <code class="language-plaintext highlighter-rouge">f_obs</code>, which we previously saved.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="n">f_obs</span> <span class="o">-</span> <span class="n">mean</span>
<span class="n">array</span><span class="p">([</span><span class="o">-</span><span class="mf">0.00181292</span><span class="p">,</span> <span class="o">-</span><span class="mf">0.00181292</span><span class="p">,</span> <span class="o">-</span><span class="mf">0.00181292</span><span class="p">,</span> <span class="p">...,</span> <span class="o">-</span><span class="mf">0.00180997</span><span class="p">,</span>
<span class="o">-</span><span class="mf">0.00180997</span><span class="p">,</span> <span class="o">-</span><span class="mf">0.00180997</span><span class="p">])</span>
<span class="o">>>></span> <span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">((</span><span class="n">f_obs</span> <span class="o">-</span> <span class="n">mean</span><span class="p">)</span> <span class="o">**</span> <span class="mi">2</span><span class="p">)</span> <span class="c1"># Compute the mean square error.
</span><span class="mf">3.281323087544209e-06</span>
</code></pre></div></div>
<p>That’s pretty close!
Not bad at all.</p>
<p>We wrap up this section by encapsulating everything that we’ve done so far in a function <code class="language-plaintext highlighter-rouge">linear_model_denoise</code>, which denoises noisy observations from a linear model:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">linear_model_denoise</span><span class="p">(</span><span class="n">x_obs</span><span class="p">,</span> <span class="n">y_obs</span><span class="p">):</span>
<span class="n">prior</span> <span class="o">=</span> <span class="n">Measure</span><span class="p">()</span>
<span class="n">a</span> <span class="o">=</span> <span class="n">GP</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">measure</span><span class="o">=</span><span class="n">prior</span><span class="p">)</span> <span class="c1"># Model for slope
</span> <span class="n">b</span> <span class="o">=</span> <span class="n">GP</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="n">measure</span><span class="o">=</span><span class="n">prior</span><span class="p">)</span> <span class="c1"># Model for offset
</span> <span class="n">f</span> <span class="o">=</span> <span class="n">a</span> <span class="o">*</span> <span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="p">)</span> <span class="o">+</span> <span class="n">b</span> <span class="c1"># Noiseless linear model
</span> <span class="n">noise</span> <span class="o">=</span> <span class="n">GP</span><span class="p">(</span><span class="n">Delta</span><span class="p">(),</span> <span class="n">measure</span><span class="o">=</span><span class="n">prior</span><span class="p">)</span> <span class="c1"># Model for noise
</span> <span class="n">y</span> <span class="o">=</span> <span class="n">f</span> <span class="o">+</span> <span class="mf">0.5</span> <span class="o">*</span> <span class="n">noise</span> <span class="c1"># Noisy linear model
</span>
<span class="n">post</span> <span class="o">=</span> <span class="n">prior</span> <span class="o">|</span> <span class="p">(</span><span class="n">y</span><span class="p">(</span><span class="n">x_obs</span><span class="p">),</span> <span class="n">y_obs</span><span class="p">)</span> <span class="c1"># Condition on observations.
</span> <span class="n">pred</span> <span class="o">=</span> <span class="n">post</span><span class="p">(</span><span class="n">f</span><span class="p">(</span><span class="n">x_obs</span><span class="p">))</span> <span class="c1"># Make predictions.
</span> <span class="k">return</span> <span class="n">pred</span><span class="p">.</span><span class="n">marginals</span><span class="p">()</span> <span class="c1"># Return the mean and associated error bounds.
</span></code></pre></div></div>
<p></p>
<p><!-- Prevent tabs. --></p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="n">linear_model_denoise</span><span class="p">(</span><span class="n">x_obs</span><span class="p">,</span> <span class="n">y_obs</span><span class="p">)</span>
<span class="p">(</span><span class="n">array</span><span class="p">([</span><span class="o">-</span><span class="mf">2.49818708</span><span class="p">,</span> <span class="o">-</span><span class="mf">2.49802708</span><span class="p">,</span> <span class="o">-</span><span class="mf">2.49786708</span><span class="p">,</span> <span class="p">...,</span> <span class="mf">5.50148996</span><span class="p">,</span>
<span class="mf">5.50164996</span><span class="p">,</span> <span class="mf">5.50180997</span><span class="p">]),</span> <span class="n">array</span><span class="p">([</span><span class="o">-</span><span class="mf">2.50695399</span><span class="p">,</span> <span class="o">-</span><span class="mf">2.50679372</span><span class="p">,</span> <span class="o">-</span><span class="mf">2.50663346</span><span class="p">,</span> <span class="p">...,</span> <span class="mf">5.49268055</span><span class="p">,</span>
<span class="mf">5.49284029</span><span class="p">,</span> <span class="mf">5.49300003</span><span class="p">]),</span> <span class="n">array</span><span class="p">([</span><span class="o">-</span><span class="mf">2.48942018</span><span class="p">,</span> <span class="o">-</span><span class="mf">2.48926044</span><span class="p">,</span> <span class="o">-</span><span class="mf">2.4891007</span> <span class="p">,</span> <span class="p">...,</span> <span class="mf">5.51029937</span><span class="p">,</span>
<span class="mf">5.51045964</span><span class="p">,</span> <span class="mf">5.51061991</span><span class="p">]))</span>
<span class="o">>>></span> <span class="o">%</span><span class="n">timeit</span> <span class="n">linear_model_denoise</span><span class="p">(</span><span class="n">x_obs</span><span class="p">,</span> <span class="n">y_obs</span><span class="p">)</span>
<span class="mi">233</span> <span class="n">ms</span> <span class="err">±</span> <span class="mf">12.6</span> <span class="n">ms</span> <span class="n">per</span> <span class="n">loop</span> <span class="p">(</span><span class="n">mean</span> <span class="err">±</span> <span class="n">std</span><span class="p">.</span> <span class="n">dev</span><span class="p">.</span> <span class="n">of</span> <span class="mi">7</span> <span class="n">runs</span><span class="p">,</span> <span class="mi">1</span> <span class="n">loop</span> <span class="n">each</span><span class="p">)</span>
</code></pre></div></div>
<p>To denoise 50k observations, <code class="language-plaintext highlighter-rouge">linear_model_denoise</code> takes about 230 ms.
Not terrible, but we can do much better, which is important if we want to scale to larger numbers of observations.
In the next section, we will make this function really fast.</p>
<h2 id="making-inference-fast">Making Inference Fast</h2>
<p>To make <code class="language-plaintext highlighter-rouge">linear_model_denoise</code> fast, firstly, the linear algebra that happens under the hood when <code class="language-plaintext highlighter-rouge">linear_model_denoise</code> is called should be simplified as much as possible.
Fortunately, this happens automatically, due to <a href="https://github.com/wesselb/matrix">the structured representation of matrices</a> that Stheno uses.
For example, when making predictions with Gaussian processes, the main computational bottleneck is usually the construction and inversion of <code class="language-plaintext highlighter-rouge">y(x_obs).var</code>, the variance associated with the observations:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="n">y</span><span class="p">(</span><span class="n">x_obs</span><span class="p">).</span><span class="n">var</span>
<span class="o"><</span><span class="n">Woodbury</span> <span class="n">matrix</span><span class="p">:</span> <span class="n">shape</span><span class="o">=</span><span class="mi">50000</span><span class="n">x50000</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">float64</span>
<span class="n">diag</span><span class="o">=<</span><span class="n">diagonal</span> <span class="n">matrix</span><span class="p">:</span> <span class="n">shape</span><span class="o">=</span><span class="mi">50000</span><span class="n">x50000</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">float64</span>
<span class="n">diag</span><span class="o">=</span><span class="p">[</span><span class="mf">0.25</span> <span class="mf">0.25</span> <span class="mf">0.25</span> <span class="p">...</span> <span class="mf">0.25</span> <span class="mf">0.25</span> <span class="mf">0.25</span><span class="p">]</span><span class="o">></span>
<span class="n">lr</span><span class="o">=<</span><span class="n">low</span><span class="o">-</span><span class="n">rank</span> <span class="n">matrix</span><span class="p">:</span> <span class="n">shape</span><span class="o">=</span><span class="mi">50000</span><span class="n">x50000</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">float64</span><span class="p">,</span> <span class="n">rank</span><span class="o">=</span><span class="mi">2</span>
<span class="n">left</span><span class="o">=</span><span class="p">[[</span><span class="mf">1.e+00</span> <span class="mf">0.e+00</span><span class="p">]</span>
<span class="p">[</span><span class="mf">1.e+00</span> <span class="mf">2.e-04</span><span class="p">]</span>
<span class="p">[</span><span class="mf">1.e+00</span> <span class="mf">4.e-04</span><span class="p">]</span>
<span class="p">...</span>
<span class="p">[</span><span class="mf">1.e+00</span> <span class="mf">1.e+01</span><span class="p">]</span>
<span class="p">[</span><span class="mf">1.e+00</span> <span class="mf">1.e+01</span><span class="p">]</span>
<span class="p">[</span><span class="mf">1.e+00</span> <span class="mf">1.e+01</span><span class="p">]]</span>
<span class="n">middle</span><span class="o">=</span><span class="p">[[</span><span class="mf">10.</span> <span class="mf">0.</span><span class="p">]</span>
<span class="p">[</span> <span class="mf">0.</span> <span class="mf">1.</span><span class="p">]]</span>
<span class="n">right</span><span class="o">=</span><span class="p">[[</span><span class="mf">1.e+00</span> <span class="mf">0.e+00</span><span class="p">]</span>
<span class="p">[</span><span class="mf">1.e+00</span> <span class="mf">2.e-04</span><span class="p">]</span>
<span class="p">[</span><span class="mf">1.e+00</span> <span class="mf">4.e-04</span><span class="p">]</span>
<span class="p">...</span>
<span class="p">[</span><span class="mf">1.e+00</span> <span class="mf">1.e+01</span><span class="p">]</span>
<span class="p">[</span><span class="mf">1.e+00</span> <span class="mf">1.e+01</span><span class="p">]</span>
<span class="p">[</span><span class="mf">1.e+00</span> <span class="mf">1.e+01</span><span class="p">]]</span><span class="o">>></span>
</code></pre></div></div>
<p>Observe that this matrix indeed has a particular structure:
it is the sum of a diagonal matrix and a low-rank matrix.
In Stheno, the sum of a diagonal and a low-rank matrix is called a <em>Woodbury</em> matrix, because the <a href="https://en.wikipedia.org/wiki/Woodbury_matrix_identity">Sherman–Morrison–Woodbury formula</a> can be used to efficiently invert it.
Let’s see how long it takes to construct <code class="language-plaintext highlighter-rouge">y(x_obs).var</code> and then invert it.
We invert <code class="language-plaintext highlighter-rouge">y(x_obs).var</code> using <a href="https://github.com/wesselb/lab">LAB</a>, which is automatically installed alongside Stheno and exposes the API to efficiently work with structured matrices.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="kn">import</span> <span class="nn">lab</span> <span class="k">as</span> <span class="n">B</span>
<span class="o">>>></span> <span class="o">%</span><span class="n">timeit</span> <span class="n">B</span><span class="p">.</span><span class="n">inv</span><span class="p">(</span><span class="n">y</span><span class="p">(</span><span class="n">x_obs</span><span class="p">).</span><span class="n">var</span><span class="p">)</span>
<span class="mf">28.5</span> <span class="n">ms</span> <span class="err">±</span> <span class="mf">1.69</span> <span class="n">ms</span> <span class="n">per</span> <span class="n">loop</span> <span class="p">(</span><span class="n">mean</span> <span class="err">±</span> <span class="n">std</span><span class="p">.</span> <span class="n">dev</span><span class="p">.</span> <span class="n">of</span> <span class="mi">7</span> <span class="n">runs</span><span class="p">,</span> <span class="mi">10</span> <span class="n">loops</span> <span class="n">each</span><span class="p">)</span>
</code></pre></div></div>
<p>That’s only 30 ms! Not bad for such a big matrix. Without exploiting structure, a 50k \(\times\) 50k matrix takes 20 GB of memory to store and about an hour to invert.</p>
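<p>To see why the Woodbury structure makes inversion so cheap, here is a small NumPy sketch of the Sherman–Morrison–Woodbury identity (an illustration only; LAB does this for you, and the sizes and factors below are made up):</p>

```python
import numpy as np

# Toy illustration of the Sherman–Morrison–Woodbury identity: inverting
# D + U M Uᵀ only requires inverting the diagonal D and a small r x r matrix.
rng = np.random.default_rng(0)
n, r = 500, 2
d = 0.25 * np.ones(n)             # diagonal part, like the noise variance
U = rng.standard_normal((n, r))   # low-rank factor
M = np.diag([10.0, 1.0])          # middle factor, like `middle` above
A = np.diag(d) + U @ M @ U.T      # dense Woodbury matrix

# Woodbury: (D + U M Uᵀ)⁻¹ = D⁻¹ - D⁻¹ U (M⁻¹ + Uᵀ D⁻¹ U)⁻¹ Uᵀ D⁻¹
D_inv = np.diag(1 / d)
small = np.linalg.inv(np.linalg.inv(M) + U.T @ D_inv @ U)  # only r x r!
A_inv = D_inv - D_inv @ U @ small @ U.T @ D_inv

print(np.allclose(A_inv, np.linalg.inv(A)))  # True
```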
<p>Secondly, we would like the code implemented by <code class="language-plaintext highlighter-rouge">linear_model_denoise</code> to be as efficient as possible.
To achieve this, we will use <a href="https://github.com/google/jax">JAX</a> to compile <code class="language-plaintext highlighter-rouge">linear_model_denoise</code> with <a href="https://www.tensorflow.org/xla">XLA</a>, which generates blazingly fast code.
We start out by importing JAX and loading the JAX extension of Stheno.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="kn">import</span> <span class="nn">jax</span>
<span class="o">>>></span> <span class="kn">import</span> <span class="nn">jax.numpy</span> <span class="k">as</span> <span class="n">jnp</span>
<span class="o">>>></span> <span class="kn">import</span> <span class="nn">stheno.jax</span> <span class="c1"># JAX extension for Stheno
</span></code></pre></div></div>
<p>We use JAX’s just-in-time (JIT) compiler <code class="language-plaintext highlighter-rouge">jax.jit</code> to compile <code class="language-plaintext highlighter-rouge">linear_model_denoise</code>:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="n">linear_model_denoise_jitted</span> <span class="o">=</span> <span class="n">jax</span><span class="p">.</span><span class="n">jit</span><span class="p">(</span><span class="n">linear_model_denoise</span><span class="p">)</span>
</code></pre></div></div>
<p>Let’s see what happens when we run <code class="language-plaintext highlighter-rouge">linear_model_denoise_jitted</code>.
We must pass <code class="language-plaintext highlighter-rouge">x_obs</code> and <code class="language-plaintext highlighter-rouge">y_obs</code> as JAX arrays to use the compiled version.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="n">linear_model_denoise_jitted</span><span class="p">(</span><span class="n">jnp</span><span class="p">.</span><span class="n">array</span><span class="p">(</span><span class="n">x_obs</span><span class="p">),</span> <span class="n">jnp</span><span class="p">.</span><span class="n">array</span><span class="p">(</span><span class="n">y_obs</span><span class="p">))</span>
<span class="n">Invalid</span> <span class="n">argument</span><span class="p">:</span> <span class="n">Cannot</span> <span class="n">bitcast</span> <span class="n">types</span> <span class="k">with</span> <span class="n">different</span> <span class="n">bit</span><span class="o">-</span><span class="n">widths</span><span class="p">:</span> <span class="n">F64</span> <span class="o">=></span> <span class="n">S32</span><span class="p">.</span>
</code></pre></div></div>
<p>Oh no!
What went wrong is that the JIT compiler wasn’t able to deal with the complicated control flow from the automatic linear algebra simplifications.
Fortunately, there is a simple way around this:
we can run the function once with NumPy to see how the control flow should go, <em>cache that control flow</em>, and then use this cache to run <code class="language-plaintext highlighter-rouge">linear_model_denoise</code> with JAX.
Sounds complicated, but it’s really just a bit of boilerplate:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="kn">import</span> <span class="nn">lab</span> <span class="k">as</span> <span class="n">B</span>
<span class="o">>>></span> <span class="n">control_flow_cache</span> <span class="o">=</span> <span class="n">B</span><span class="p">.</span><span class="n">ControlFlowCache</span><span class="p">()</span>
<span class="o">>>></span> <span class="n">control_flow_cache</span>
<span class="o"><</span><span class="n">ControlFlowCache</span><span class="p">:</span> <span class="n">populated</span><span class="o">=</span><span class="bp">False</span><span class="o">></span>
</code></pre></div></div>
<p>Here <code class="language-plaintext highlighter-rouge">populated=False</code> means that the cache is not yet populated.
Let’s populate it by running <code class="language-plaintext highlighter-rouge">linear_model_denoise</code> once with NumPy:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="k">with</span> <span class="n">control_flow_cache</span><span class="p">:</span>
<span class="n">linear_model_denoise</span><span class="p">(</span><span class="n">x_obs</span><span class="p">,</span> <span class="n">y_obs</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">control_flow_cache</span>
<span class="o"><</span><span class="n">ControlFlowCache</span><span class="p">:</span> <span class="n">populated</span><span class="o">=</span><span class="bp">True</span><span class="o">></span>
</code></pre></div></div>
<p>We now construct a compiled version of <code class="language-plaintext highlighter-rouge">linear_model_denoise</code> that uses the control flow cache:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">@</span><span class="n">jax</span><span class="p">.</span><span class="n">jit</span>
<span class="k">def</span> <span class="nf">linear_model_denoise_jitted</span><span class="p">(</span><span class="n">x_obs</span><span class="p">,</span> <span class="n">y_obs</span><span class="p">):</span>
<span class="k">with</span> <span class="n">control_flow_cache</span><span class="p">:</span>
<span class="k">return</span> <span class="n">linear_model_denoise</span><span class="p">(</span><span class="n">x_obs</span><span class="p">,</span> <span class="n">y_obs</span><span class="p">)</span>
</code></pre></div></div>
<p></p>
<p><!-- Prevent tabs. --></p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="n">linear_model_denoise_jitted</span><span class="p">(</span><span class="n">jnp</span><span class="p">.</span><span class="n">array</span><span class="p">(</span><span class="n">x_obs</span><span class="p">),</span> <span class="n">jnp</span><span class="p">.</span><span class="n">array</span><span class="p">(</span><span class="n">y_obs</span><span class="p">))</span>
<span class="p">(</span><span class="n">DeviceArray</span><span class="p">([</span><span class="o">-</span><span class="mf">2.4981871</span> <span class="p">,</span> <span class="o">-</span><span class="mf">2.4980271</span> <span class="p">,</span> <span class="o">-</span><span class="mf">2.49786709</span><span class="p">,</span> <span class="p">...,</span> <span class="mf">5.50149004</span><span class="p">,</span>
<span class="mf">5.50165005</span><span class="p">,</span> <span class="mf">5.50181005</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="n">float64</span><span class="p">),</span> <span class="n">DeviceArray</span><span class="p">([</span><span class="o">-</span><span class="mf">2.5069514</span> <span class="p">,</span> <span class="o">-</span><span class="mf">2.50679114</span><span class="p">,</span> <span class="o">-</span><span class="mf">2.50663087</span><span class="p">,</span> <span class="p">...,</span> <span class="mf">5.4927699</span> <span class="p">,</span>
<span class="mf">5.49292964</span><span class="p">,</span> <span class="mf">5.49308938</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="n">float64</span><span class="p">),</span> <span class="n">DeviceArray</span><span class="p">([</span><span class="o">-</span><span class="mf">2.4894228</span> <span class="p">,</span> <span class="o">-</span><span class="mf">2.48926306</span><span class="p">,</span> <span class="o">-</span><span class="mf">2.48910332</span><span class="p">,</span> <span class="p">...,</span> <span class="mf">5.51021019</span><span class="p">,</span>
<span class="mf">5.51037046</span><span class="p">,</span> <span class="mf">5.51053072</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="n">float64</span><span class="p">))</span>
</code></pre></div></div>
<p>Nice!
Let’s see how much faster <code class="language-plaintext highlighter-rouge">linear_model_denoise_jitted</code> is:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="o">%</span><span class="n">timeit</span> <span class="n">linear_model_denoise</span><span class="p">(</span><span class="n">x_obs</span><span class="p">,</span> <span class="n">y_obs</span><span class="p">)</span>
<span class="mi">233</span> <span class="n">ms</span> <span class="err">±</span> <span class="mf">12.6</span> <span class="n">ms</span> <span class="n">per</span> <span class="n">loop</span> <span class="p">(</span><span class="n">mean</span> <span class="err">±</span> <span class="n">std</span><span class="p">.</span> <span class="n">dev</span><span class="p">.</span> <span class="n">of</span> <span class="mi">7</span> <span class="n">runs</span><span class="p">,</span> <span class="mi">1</span> <span class="n">loop</span> <span class="n">each</span><span class="p">)</span>
<span class="o">>>></span> <span class="o">%</span><span class="n">timeit</span> <span class="n">linear_model_denoise_jitted</span><span class="p">(</span><span class="n">jnp</span><span class="p">.</span><span class="n">array</span><span class="p">(</span><span class="n">x_obs</span><span class="p">),</span> <span class="n">jnp</span><span class="p">.</span><span class="n">array</span><span class="p">(</span><span class="n">y_obs</span><span class="p">))</span>
<span class="mf">1.63</span> <span class="n">ms</span> <span class="err">±</span> <span class="mf">16.5</span> <span class="n">µs</span> <span class="n">per</span> <span class="n">loop</span> <span class="p">(</span><span class="n">mean</span> <span class="err">±</span> <span class="n">std</span><span class="p">.</span> <span class="n">dev</span><span class="p">.</span> <span class="n">of</span> <span class="mi">7</span> <span class="n">runs</span><span class="p">,</span> <span class="mi">1000</span> <span class="n">loops</span> <span class="n">each</span><span class="p">)</span>
</code></pre></div></div>
<p>The compiled function <code class="language-plaintext highlighter-rouge">linear_model_denoise_jitted</code> only takes 2 ms to denoise 50k observations!
Compared to <code class="language-plaintext highlighter-rouge">linear_model_denoise</code>, that’s a speed-up of two orders of magnitude.</p>
<h2 id="conclusion">Conclusion</h2>
<p>We’ve seen how a linear model can be implemented with a Gaussian process probabilistic program (GPPP) using <a href="https://github.com/wesselb/stheno">Stheno</a>.
Stheno allows us to focus on model construction, and takes away the distraction of the technicalities that come with making predictions.
This flexibility, however, comes at the cost of some complicated machinery that happens in the background, such as structured representations of matrices.
Fortunately, we’ve seen that this overhead can be completely avoided by compiling your program using <a href="https://github.com/google/jax">JAX</a>, which can result in extremely efficient implementations.
To close this post and to warm you up for <a href="https://github.com/wesselb/stheno#examples">what’s further possible with Gaussian process probabilistic programming using Stheno</a>, the linear model that we’ve built can easily be extended to, for example, include a <em>quadratic</em> term:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">quadratic_model_denoise</span><span class="p">(</span><span class="n">x_obs</span><span class="p">,</span> <span class="n">y_obs</span><span class="p">):</span>
<span class="n">prior</span> <span class="o">=</span> <span class="n">Measure</span><span class="p">()</span>
<span class="n">a</span> <span class="o">=</span> <span class="n">GP</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">measure</span><span class="o">=</span><span class="n">prior</span><span class="p">)</span> <span class="c1"># Model for slope
</span> <span class="n">b</span> <span class="o">=</span> <span class="n">GP</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">measure</span><span class="o">=</span><span class="n">prior</span><span class="p">)</span> <span class="c1"># Model for coefficient of quadratic term
</span> <span class="n">c</span> <span class="o">=</span> <span class="n">GP</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="n">measure</span><span class="o">=</span><span class="n">prior</span><span class="p">)</span> <span class="c1"># Model for offset
</span> <span class="c1"># Noiseless quadratic model
</span> <span class="n">f</span> <span class="o">=</span> <span class="n">a</span> <span class="o">*</span> <span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="p">)</span> <span class="o">+</span> <span class="n">b</span> <span class="o">*</span> <span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span> <span class="o">**</span> <span class="mi">2</span><span class="p">)</span> <span class="o">+</span> <span class="n">c</span>
<span class="n">noise</span> <span class="o">=</span> <span class="n">GP</span><span class="p">(</span><span class="n">Delta</span><span class="p">(),</span> <span class="n">measure</span><span class="o">=</span><span class="n">prior</span><span class="p">)</span> <span class="c1"># Model for noise
</span> <span class="n">y</span> <span class="o">=</span> <span class="n">f</span> <span class="o">+</span> <span class="mf">0.5</span> <span class="o">*</span> <span class="n">noise</span> <span class="c1"># Noisy quadratic model
</span>
<span class="n">post</span> <span class="o">=</span> <span class="n">prior</span> <span class="o">|</span> <span class="p">(</span><span class="n">y</span><span class="p">(</span><span class="n">x_obs</span><span class="p">),</span> <span class="n">y_obs</span><span class="p">)</span> <span class="c1"># Condition on observations.
</span> <span class="n">pred</span> <span class="o">=</span> <span class="n">post</span><span class="p">(</span><span class="n">f</span><span class="p">(</span><span class="n">x_obs</span><span class="p">))</span> <span class="c1"># Make predictions.
</span> <span class="k">return</span> <span class="n">pred</span><span class="p">.</span><span class="n">marginals</span><span class="p">()</span> <span class="c1"># Return the mean and associated error bounds.
</span></code></pre></div></div>
<p>To use Gaussian process probabilistic programming for your specific problem, the main challenge is to figure out which model you need to use.
Do you need a quadratic term?
Maybe you need an exponential term!
But, using Stheno, implementing the model and making predictions should then be simple.</p>
<p><em>Wessel Bruinsma and James Requeima. Cross-posted at wesselb.github.io.</em></p>
<h1 id="a-gentle-introduction-to-power-flow"><a href="https://invenia.github.io/blog/2020/12/04/pf-intro">A Gentle Introduction to Power Flow</a></h1>
<p><em>2020-12-04</em></p>
<p>Although governed by simple physical laws, power grids are among the most complex human-made systems.
The main source of this complexity is the large number of components of the power system that interact with each other: one needs to maintain a balance between power injections and withdrawals while satisfying certain physical, economic, and environmental conditions.
For instance, a central task of daily planning and operations of electricity grid operators<sup id="fnref:Tong04" role="doc-noteref"><a href="#fn:Tong04" class="footnote" rel="footnote">1</a></sup> is to dispatch generation in order to meet demand at minimum cost, while respecting reliability and security constraints.
These tasks require solving a challenging constrained optimization problem, often referred to as some form of optimal power flow (OPF).<sup id="fnref:Cain12" role="doc-noteref"><a href="#fn:Cain12" class="footnote" rel="footnote">2</a></sup></p>
<p>In a series of two blog posts, we are going to discuss the basics of power flow and optimal power flow problems.
In this first post, we focus on the most important component of OPF: the power flow (PF) equations.
For this, first we introduce some basic definitions of power grids and AC circuits, then we define the power flow problem.
<br />
<br /></p>
<p><img src="/blog/public/images/us_power_grid.jpg" alt="us_power_grid" />
Figure 1. Complexity of electricity grids: the electric power transmission grid of the United States (source: FEMA and <a href="https://en.wikipedia.org/wiki/North_American_power_transmission_grid">Wikipedia</a>).</p>
<p><br /></p>
<h2 id="power-grids-as-graphs">Power grids as graphs</h2>
<p>Power grids are networks that include two main components: buses that represent important locations of the grid (e.g. generation points, load points, substations) and transmission (or distribution) lines that connect these buses.
It is pretty straightforward, therefore, to look at power grid networks as graphs: buses and transmission lines can be represented by nodes and edges of a corresponding graph.
There are two equivalent graph models that can be used to derive the basic power flow equations<sup id="fnref:Low14" role="doc-noteref"><a href="#fn:Low14" class="footnote" rel="footnote">3</a></sup>:</p>
<ul>
<li>directed graph representation (left panel of Figure 2): \(\mathbb{G}_{D}(\mathcal{N}, \mathcal{E})\);</li>
<li>undirected graph representation (right panel of Figure 2): \(\mathbb{G}_{U}(\mathcal{N}, \mathcal{E} \cup \mathcal{E}^{R})\),</li>
</ul>
<p>where \(\mathcal{N}\), \(\mathcal{E} \subseteq \mathcal{N} \times \mathcal{N}\) and \(\mathcal{E}^{R} \subseteq \mathcal{N} \times \mathcal{N}\) denote the set of nodes (buses), and the forward and reverse orientations of directed edges (branches) of the graph, respectively.
<br />
<br /></p>
<p><img src="/blog/public/images/power_grid_graphs.png" alt="power_grid_graphs" />
Figure 2. Directed graph representation of synthetic grid 14-ieee (left) and undirected graph representation of synthetic grid 30-ieee (right). Red and blue circles denote generator and load buses, respectively.</p>
<p><br /></p>
<h2 id="complex-power-in-ac-circuits">Complex power in AC circuits</h2>
<p>Power can be transmitted more efficiently at high voltages, as high <a href="https://en.wikipedia.org/wiki/Voltage">voltage</a> (or equivalently, low <a href="https://en.wikipedia.org/wiki/Electric_current">current</a>) reduces the power lost to dissipation in transmission lines.
Power grids generally use <a href="https://en.wikipedia.org/wiki/Alternating_current">alternating current</a> (AC) since the AC voltage can be altered (from high to low) easily via transformers.
Therefore, we start with some notation and definitions for AC circuits.</p>
<p>The most important characteristic of AC circuits is that, unlike in <a href="https://en.wikipedia.org/wiki/Direct_current">direct current</a> (DC) circuits, the currents and voltages are not constant in time: both their <em>magnitude</em> and <em>direction</em> vary periodically.
For several technical reasons (such as low losses and disturbances), power generators use sinusoidal alternating quantities that can be straightforwardly modeled by <a href="https://en.wikipedia.org/wiki/Complex_number">complex numbers</a>.</p>
<p>We will consistently use capital and small letters to denote complex and real-valued quantities, respectively.
For instance, let us consider two buses, \(i, j \in \mathcal{N}\), that are directly connected by a transmission line \((i, j) \in \mathcal{E}\).
The <a href="https://en.wikipedia.org/wiki/AC_power">complex power</a> flowing from bus \(i\) to bus \(j\) is denoted by \(S_{ij}\) and it can be decomposed into its active (\(p_{ij}\)) and reactive (\(q_{ij}\)) components:</p>
\[\begin{equation}
S_{ij} = p_{ij} + \mathrm{j}q_{ij},
\end{equation}\]
<p>where \(\mathrm{j} = \sqrt{-1}\).
The complex power flow can be expressed as the product of the complex voltage at bus \(i\), \(V_{i}\) and the complex conjugate of the current flowing between the buses, \(I_{ij}^{*}\):</p>
\[\begin{equation}
S_{ij} = V_{i}I_{ij}^{*}.
\label{power_flow}
\end{equation}\]
<p>It is well known that transmission lines have power losses due to their <a href="https://en.wikipedia.org/wiki/Electrical_resistance_and_conductance">resistance</a> (\(r_{ij}\)), which is a measure of the opposition to the flow of the current.
For AC-circuits, a dynamic effect caused by the line <a href="https://en.wikipedia.org/wiki/Electrical_reactance">reactance</a> (\(x_{ij}\)) also plays a role.
Unlike resistance, reactance does not cause any loss of power but has a delayed effect by storing and later returning power to the circuit.
The effect of resistance and reactance together can be represented by a single complex quantity, the <a href="https://en.wikipedia.org/wiki/Electrical_impedance">impedance</a>: \(Z_{ij} = r_{ij} + \mathrm{j}x_{ij}\).
Another useful complex quantity is the <a href="https://en.wikipedia.org/wiki/Admittance">admittance</a>, which is the reciprocal of the impedance: \(Y_{ij} = \frac{1}{Z_{ij}}\).
Similarly to the impedance, the admittance can also be decomposed into its real component, the <a href="https://en.wikipedia.org/wiki/Electrical_resistance_and_conductance">conductance</a> (\(g_{ij}\)), and its imaginary component, the <a href="https://en.wikipedia.org/wiki/Susceptance">susceptance</a> (\(b_{ij}\)): \(Y_{ij} = g_{ij} + \mathrm{j}b_{ij}\).</p>
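<p>As a quick numerical sanity check, with illustrative (made-up) per-unit line parameters, the relations between the impedance and admittance components can be verified in a few lines of Python:</p>

```python
import numpy as np

# Illustrative branch parameters in per unit (not from a real grid).
r, x = 0.02, 0.06        # resistance and reactance
Z = r + 1j * x           # impedance Z = r + jx
Y = 1 / Z                # admittance Y = 1 / Z = g + jb
g, b = Y.real, Y.imag    # conductance and susceptance

# Expanding 1 / (r + jx) gives g = r / (r^2 + x^2) and b = -x / (r^2 + x^2).
assert np.isclose(g, r / (r**2 + x**2))
assert np.isclose(b, -x / (r**2 + x**2))
```

<p>Note that for a line with positive resistance and reactance, the conductance is positive while the susceptance is negative.</p>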
<p>Therefore, the current can be written as a function of the line voltage drop and the admittance between the two buses, which is an alternative form of <a href="https://en.wikipedia.org/wiki/Ohm%27s_law">Ohm’s law</a>:</p>
\[\begin{equation}
I_{ij} = Y_{ij}(V_{i} - V_{j}).
\end{equation}\]
<p>Replacing the above expression for the current in the power flow equation (eq. \(\ref{power_flow}\)), we get</p>
\[\begin{equation}
S_{ij} = Y_{ij}^{*}V_{i}V_{i}^{*} - Y_{ij}^{*}V_{i}V_{j}^{*} = Y_{ij}^{*} \left( |V_{i}|^{2} - V_{i}V_{j}^{*} \right).
\end{equation}\]
<p>The above power flow equation can be expressed by using the polar form of voltage, i.e. \(V_{i} = v_{i}e^{\mathrm{j} \delta_{i}} = v_{i}(\cos\delta_{i} + \mathrm{j}\sin\delta_{i})\) (where \(v_{i}\) and \(\delta_{i}\) are the voltage magnitude and angle of bus \(i\), respectively), and the admittance components:</p>
\[\begin{equation}
S_{ij} = \left(g_{ij} - \mathrm{j}b_{ij}\right) \left(v_{i}^{2} - v_{i}v_{j}\left(\cos\delta_{ij} + \mathrm{j}\sin\delta_{ij}\right)\right),
\end{equation}\]
<p>where for brevity we introduced the voltage angle difference \(\delta_{ij} = \delta_{i} - \delta_{j}\).
Similarly, using a simple algebraic identity of \(g_{ij} - \mathrm{j}b_{ij} = \frac{g_{ij}^{2} + b_{ij}^{2}}{g_{ij} + \mathrm{j}b_{ij}} = \frac{|Y_{ij}|^{2}}{Y_{ij}} = \frac{Z_{ij}}{|Z_{ij}|^{2}} = \frac{r_{ij} + \mathrm{j}x_{ij}}{r_{ij}^{2} + x_{ij}^{2}}\), the impedance components-based expression has the following form:</p>
\[\begin{equation}
S_{ij} = \frac{r_{ij} + \mathrm{j}x_{ij}}{r_{ij}^{2} + x_{ij}^{2}} \left( v_{i}^{2} - v_{i}v_{j}\left(\cos\delta_{ij} + \mathrm{j}\sin\delta_{ij}\right)\right).
\end{equation}\]
<p>Finally, the corresponding real equations can be written as</p>
\[\begin{equation}
\left\{
\begin{aligned}
p_{ij} & = g_{ij} \left( v_{i}^{2} - v_{i} v_{j} \cos\delta_{ij} \right) - b_{ij} \left( v_{i} v_{j} \sin\delta_{ij} \right) \\
q_{ij} & = b_{ij} \left( -v_{i}^{2} + v_{i} v_{j} \cos\delta_{ij} \right) - g_{ij} \left( v_{i} v_{j} \sin\delta_{ij} \right), \\
\end{aligned}
\right.
\label{power_flow_y}
\end{equation}\]
<p>and</p>
\[\begin{equation}
\left\{
\begin{aligned}
p_{ij} & = \frac{1}{r_{ij}^{2} + x_{ij}^{2}} \left[ r_{ij} \left( v_{i}^{2} - v_{i} v_{j} \cos\delta_{ij} \right) + x_{ij} \left( v_{i} v_{j} \sin\delta_{ij} \right) \right] \\
q_{ij} & = \frac{1}{r_{ij}^{2} + x_{ij}^{2}} \left[ x_{ij} \left( v_{i}^{2} - v_{i} v_{j} \cos\delta_{ij} \right) - r_{ij} \left( v_{i} v_{j} \sin\delta_{ij} \right) \right]. \\
\end{aligned}
\right.
\label{power_flow_z}
\end{equation}\]
<h2 id="power-flow-models">Power flow models</h2>
<p>In the previous section we presented the power flow between two connected buses and established a relationship between complex power flow and complex voltages.
In power flow problems, the entire power grid is considered and the task is to calculate certain quantities based on some other specified ones.
There are two equivalent power flow models depending on the graph model used: the bus injection model (based on the undirected graph representation) and the branch flow model (based on the directed graph representation).
First, we introduce the basic formulations.
Then, we show the most widely used technique to solve power flow problems.
Finally, we extend the basic equations and derive more sophisticated models including additional components for real power grids.</p>
<h3 id="bus-injection-model">Bus injection model</h3>
<p>The bus injection model (BIM) uses the undirected graph model of the power grid, \(\mathbb{G}_{U}\).
For each bus \(i\), we denote by \(\mathcal{N}_{i} \subset \mathcal{N}\) the set of buses directly connected to bus \(i\).
Also, for each bus we introduce the following quantities<sup id="fnref:Low14:1" role="doc-noteref"><a href="#fn:Low14" class="footnote" rel="footnote">3</a></sup><sup id="fnref:Wood14" role="doc-noteref"><a href="#fn:Wood14" class="footnote" rel="footnote">4</a></sup> (Figure 3):</p>
<ul>
<li>\(S_{i}^{\mathrm{gen}}\): generated power flowing into bus \(i\).</li>
<li>\(S_{i}^{\mathrm{load}}\): demand power or load flowing out of the bus \(i\).</li>
<li>\(S_{i}\): net power injection at bus \(i\), i.e. \(S_{i} = S_{i}^{\mathrm{gen}} - S_{i}^{\mathrm{load}}\).</li>
<li>\(S_{i}^{\mathrm{trans}}\): transmitted power flowing between bus \(i\) and its adjacent buses.
<br />
<br /></li>
</ul>
<p><img src="/blog/public/images/power_quantities.png" alt="..." style="width:30em; height:auto; margin:0 auto;" />
Figure 3. Power balance and quantities of a bus connected to three adjacent buses and including a single generator and a single load.</p>
<p><br />
<a href="https://en.wikipedia.org/wiki/Tellegen%27s_theorem">Tellegen’s theorem</a> establishes a simple relationship between these power quantities:</p>
\[\begin{equation}
S_{i} = S_{i}^{\mathrm{gen}} - S_{i}^{\mathrm{load}} = S_{i}^{\mathrm{trans}} \ \ \ \ \forall i \in \mathcal{N}.
\label{power_balance}
\end{equation}\]
<p>Eq. \(\ref{power_balance}\) expresses the law of conservation of power (energy): the power injected (\(S_{i}^{\mathrm{gen}}\)) to bus \(i\) must be equal to the power going out from the bus, i.e. the sum of the withdrawn (\(S_{i}^{\mathrm{load}}\)) and transmitted power (\(S_{i}^{\mathrm{trans}}\)).
In the most basic model, a bus can represent either a generator (i.e. \(S_{i} = S_{i}^{\mathrm{gen}}\)) or a load (i.e. \(S_{i} = -S_{i}^{\mathrm{load}}\)).
For a given bus, the transmitted power can be obtained simply as the sum of the powers flowing from that bus to its adjacent buses, \(S_{i}^{\mathrm{trans}} = \sum \limits_{j \in \mathcal{N}_{i}} S_{ij}\).
Therefore, the basic BIM has the following concise form:</p>
\[\begin{equation}
S_{i} = \sum \limits_{j \in \mathcal{N}_{i}} Y_{ij}^{*} \left( |V_{i}|^{2} - V_{i}V_{j}^{*} \right) \ \ \ \ \forall i \in \mathcal{N} .
\label{bim_basic_concise}
\end{equation}\]
<h3 id="branch-flow-model">Branch flow model</h3>
<p>We briefly describe an alternative formulation of the power flow problem, the branch flow model (BFM).<sup id="fnref:Farivar13" role="doc-noteref"><a href="#fn:Farivar13" class="footnote" rel="footnote">5</a></sup>
The BFM is based on the directed graph representation of the grid network, \(\mathbb{G}_{D}\), and directly models branch flows and currents.
Let us fix an arbitrary orientation of \(\mathbb{G}_{D}\) and let \((i, j) \in \mathcal{E}\) denote an edge pointing from bus \(i\) to bus \(j\).
Then, the BFM is defined by the following set of complex equations:</p>
\[\begin{equation}
\left\{
\begin{aligned}
S_{i} & = \sum \limits_{(i, j) \in \mathcal{E}} S_{ij} - \sum \limits_{(k, i) \in \mathcal{E}} \left( S_{ki} - Z_{ki} |I_{ki}|^{2} \right) & \forall i \in \mathcal{N}, \\
I_{ij} & = Y_{ij} \left( V_{i} - V_{j} \right) & \forall (i, j) \in \mathcal{E}, \\
S_{ij} & = V_{i}I_{ij}^{*} & \forall (i, j) \in \mathcal{E}, \\
\end{aligned}
\right.
\label{bfm_basic_concise}
\end{equation}\]
<p>where the first, second and third sets of equations correspond to the power balance, Ohm’s law, and branch power definition, respectively.</p>
<h3 id="the-power-flow-problem">The power flow problem</h3>
<p>The power flow problem is to find a unique solution of a power system (given certain input variables) and, therefore, it is central to power grid analysis.</p>
<p>First, let us consider the BIM equations (eq. \(\ref{bim_basic_concise}\)) that define a complex non-linear system with \(N = |\mathcal{N}|\) complex equations, and \(2N\) complex variables, \(\left\{S_{i}, V_{i}\right\}_{i=1}^{N}\).
Equivalently, either using the admittance (eq. \(\ref{power_flow_y}\))
or the impedance (eq. \(\ref{power_flow_z}\))
components we can construct \(2N\) real equations with \(4N\) real variables: \(\left\{p_{i}, q_{i}, v_{i}, \delta_{i}\right\}_{i=1}^{N}\), where \(S_{i} = p_{i} + \mathrm{j}q_{i}\).
The power flow problem is the following: for each bus we specify two of the four real variables and then using the \(2N\) equations we derive the remaining variables.
Depending on which variables are specified there are three basic types of buses:</p>
<ul>
<li><em>Slack</em> (or \(V\delta\) bus) is usually a reference bus, where the voltage angle and magnitude are specified. Slack buses are also used to make up any mismatch between generation and demand caused by line losses. The voltage angle is usually set to 0 and the magnitude to 1.0 per unit.</li>
<li><em>Load</em> (or \(PQ\) bus) is the most common bus type, where only demand but no power generation takes place. For such buses the active and reactive powers are specified.</li>
<li><em>Generator</em> (or \(PV\) bus) specifies the active power and voltage magnitude variables.
<br />
<br /></li>
</ul>
<p>Table 1. Basic bus types in power flow problem.</p>
<table>
<thead>
<tr>
<th style="text-align: left">Bus type</th>
<th style="text-align: center">Code</th>
<th style="text-align: center">Specified variables</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">Slack</td>
<td style="text-align: center">\(V\delta\)</td>
<td style="text-align: center">\(v_{i}\), \(\delta_{i}\)</td>
</tr>
<tr>
<td style="text-align: left">Load</td>
<td style="text-align: center">\(PQ\)</td>
<td style="text-align: center">\(p_{i}\), \(q_{i}\)</td>
</tr>
<tr>
<td style="text-align: left">Generator</td>
<td style="text-align: center">\(PV\)</td>
<td style="text-align: center">\(p_{i}\), \(v_{i}\)</td>
</tr>
</tbody>
</table>
<p><br />
Further, let \(E = |\mathcal{E}|\) denote the number of directed edges.
The BFM formulation (eq. \(\ref{bfm_basic_concise}\)) includes \(N + 2E\) complex equations with \(2N + 2E\) complex variables: \(\left\{ S_{i}, V_{i} \right\}_{i \in \mathcal{N}} \cup \left\{ S_{ij}, I_{ij} \right\}_{(i, j) \in \mathcal{E}}\) or equivalently, \(2N + 4E\) real equations with \(4N + 4E\) real variables.</p>
<h3 id="bim-vs-bfm">BIM vs BFM</h3>
<p>The BIM and the BFM are equivalent formulations, i.e. they define the same physical problem and provide the same solution.<sup id="fnref:Subhonmesh12" role="doc-noteref"><a href="#fn:Subhonmesh12" class="footnote" rel="footnote">6</a></sup>
Although the formulations are equivalent, in practice we might prefer one model over the other.
Depending on the actual problem and the structure of the system, it might be easier to obtain results and derive an exact solution or an approximate relaxation from one formulation than from the other.
<br />
<br /></p>
<p>Table 2. Comparison of basic BIM and BFM formulations: complex variables, number of complex variables and equations. Corresponding real variables, their numbers as well as the number of real equations are also shown in parentheses.</p>
<table>
<thead>
<tr>
<th style="text-align: center">Formulation</th>
<th style="text-align: center">Variables</th>
<th style="text-align: center">Number of variables</th>
<th style="text-align: center">Number of equations</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center">BIM</td>
<td style="text-align: center">\(V_{i} \ (v_{i}, \delta_{i})\) <br /> \(S_{i} \ (p_{i}, q_{i})\)</td>
<td style="text-align: center">\(2N\) <br /> \((4N)\)</td>
<td style="text-align: center">\(N\) <br /> \((2N)\)</td>
</tr>
<tr>
<td style="text-align: center">BFM</td>
<td style="text-align: center">\(V_{i} \ (v_{i}, \delta_{i})\) <br /> \(S_{i} \ (p_{i}, q_{i})\) <br /> \(I_{ij} \ (i_{ij}, \gamma_{ij})\) <br /> \(S_{ij} \ (p_{ij}, q_{ij})\)</td>
<td style="text-align: center">\(2N + 2E\) <br /> \((4N + 4E)\)</td>
<td style="text-align: center">\(N + 2E\) <br /> \((2N + 4E)\)</td>
</tr>
</tbody>
</table>
<p><br /></p>
<h3 id="solving-the-power-flow-problem">Solving the power flow problem</h3>
<p>Power flow problems (formulated as either a BIM or a BFM) define a non-linear system of equations.
There are multiple approaches to solve power flow systems but the most widely used technique is the Newton–Raphson method.
Below we demonstrate how it can be applied to the BIM formulation.
First, we rearrange eq. \(\ref{bim_basic_concise}\):</p>
\[\begin{equation}
F_{i} = S_{i} - \sum \limits_{j \in \mathcal{N}_{i}} Y_{ij}^{*} \left( |V_{i}|^{2} - V_{i}V_{j}^{*} \right) = 0 \ \ \ \ \forall i \in \mathcal{N}.
\end{equation}\]
<p>The above set of equations can be expressed simply as \(F(X) = 0\), where \(X\) denotes the \(N\) complex or more conveniently, the \(2N\) real unknown variables and \(F\) represents the \(N\) complex or \(2N\) real equations.</p>
<p>In the <a href="https://en.wikipedia.org/wiki/Newton%27s_method">Newton–Raphson method</a>, the solution is sought iteratively until a convergence criterion is satisfied.
In the \((n+1)\)th iteration we obtain:</p>
\[\begin{equation}
X_{n+1} = X_{n} - J_{F}(X_{n})^{-1} F(X_{n}),
\end{equation}\]
<p>where \(J_{F}\) is the Jacobian matrix with \(\left[J_{F}\right]_{ij} = \frac{\partial F_{i}}{\partial X_{j}}\) elements.
We also note that, instead of computing the inverse of the Jacobian matrix, a numerically more stable approach is to first solve the linear system \(J_{F}(X_{n}) \Delta X_{n} = -F(X_{n})\) and then obtain \(X_{n+1} = X_{n} + \Delta X_{n}\).</p>
<p>Since the Newton–Raphson method requires only the first derivatives of the functions \(F_{i}\) with respect to the variables, this technique is in general very efficient, even for large, realistically sized grids.</p>
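<p>To illustrate the method, here is a minimal sketch, using hypothetical per-unit values, that solves a two-bus system, one slack bus and one PQ bus joined by a single line, via Newton–Raphson iterations on the BIM mismatch. Following the bus-type table, the unknowns are the voltage magnitude and angle at the PQ bus. For brevity it uses a finite-difference Jacobian; practical solvers use analytic derivatives and sparse linear algebra:</p>

```python
import numpy as np

# Hypothetical 2-bus system in per unit: bus 1 is the slack (V_1 = 1.0,
# delta_1 = 0) and bus 2 is a PQ bus with a specified net injection.
Y = 1 / (0.02 + 0.06j)     # admittance of the single branch
V1 = 1.0 + 0j              # slack bus voltage
p2, q2 = -0.5, -0.2        # net injection at bus 2 (a load, hence negative)

def residual(X):
    """Mismatch F(X) = S_2 - Y* (|V_2|^2 - V_2 V_1*) as two real equations."""
    v2, d2 = X
    V2 = v2 * np.exp(1j * d2)
    F = (p2 + 1j * q2) - np.conj(Y) * (abs(V2) ** 2 - V2 * np.conj(V1))
    return np.array([F.real, F.imag])

X = np.array([1.0, 0.0])   # "flat start": v_2 = 1, delta_2 = 0
for _ in range(20):
    F = residual(X)
    if np.max(np.abs(F)) < 1e-10:
        break
    # Central finite-difference Jacobian; real solvers use analytic forms.
    J = np.empty((2, 2))
    for j in range(2):
        h = np.zeros(2)
        h[j] = 1e-7
        J[:, j] = (residual(X + h) - residual(X - h)) / 2e-7
    # Solve J dX = -F instead of inverting J, then update X.
    X = X + np.linalg.solve(J, -F)

v2, d2 = X  # converged voltage magnitude and angle at bus 2
```

<p>For this small load, the iteration converges from the flat start in a handful of steps, with the load-bus voltage magnitude slightly below 1.0 and a small negative angle, as expected when power flows from the slack bus to the load.</p>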
<h3 id="towards-more-realistic-power-flow-models">Towards more realistic power flow models</h3>
<p>In the previous sections we introduced the main concepts needed to understand and solve the power flow problem.
However, the derived models were rather basic ones, and real systems require more reasonable approaches.
Below we will construct more sophisticated models by extending the basic equations considering the following:</p>
<ul>
<li>multiple generators and loads included in buses;</li>
<li>shunt elements connected to buses;</li>
<li>modeling transformers and phase shifters;</li>
<li>modeling asymmetric line charging.</li>
</ul>
<p>We note that actual power grids can include additional elements besides the ones we discuss.</p>
<h4 id="improving-the-bus-model">Improving the bus model</h4>
<p>We start with the generated (\(S_{i}^{\mathrm{gen}}\)) and load (\(S_{i}^{\mathrm{load}}\)) powers in the power balance equations, i.e. \(S_{i} = S_{i}^{\mathrm{gen}} - S_{i}^{\mathrm{load}} = S_{i}^{\mathrm{trans}}\) for each bus \(i\).
Electric grid buses can actually include multiple generators and loads together.
In order to take this structure into account we introduce \(\mathcal{G}_{i}\) and \(\mathcal{L}_{i}\), denoting the set of generators and the set of loads at bus \(i\), respectively.
Also, \(\mathcal{G} = \bigcup \limits_{i \in \mathcal{N}} \mathcal{G}_{i}\) and \(\mathcal{L} = \bigcup \limits_{i \in \mathcal{N}} \mathcal{L}_{i}\) indicate the sets of all generators and loads, respectively.
Then we can write the following complex-valued expressions:</p>
\[\begin{equation}
S_{i}^{\mathrm{gen}} = \sum \limits_{g \in \mathcal{G}_{i}} S_{g}^{G} \ \ \ \ \forall i \in \mathcal{N},
\end{equation}\]
\[\begin{equation}
S_{i}^{\mathrm{load}} = \sum \limits_{l \in \mathcal{L}_{i}} S_{l}^{L} \ \ \ \ \forall i \in \mathcal{N},
\end{equation}\]
<p>where \(\left\{S_{g}^{G}\right\}_{g \in \mathcal{G}}\) and \(\left\{S_{l}^{L}\right\}_{l \in \mathcal{L}}\) are the complex powers of generator dispatches and load consumptions, respectively.
It is easy to see from the equations above that having multiple generators and loads per bus increases the total number of variables, while the number of equations does not change.
If the system includes \(N = \lvert \mathcal{N} \rvert\) buses, \(G = \lvert \mathcal{G} \rvert\) generators and \(L = \lvert \mathcal{L} \rvert\) loads, then the total number of complex variables changes to \(N + G + L\) and \(N + G + L + 2E\) for BIM and BFM, respectively.</p>
<p>Buses can also include shunt elements that are used to model connected capacitors and inductors.
They are represented by individual admittances resulting in additional power flowing out of the bus:</p>
\[\begin{equation}
S_{i} = S_{i}^{\mathrm{gen}} - S_{i}^{\mathrm{load}} - S_{i}^{\mathrm{shunt}} = \sum \limits_{g \in \mathcal{G}_{i}} S_{g}^{G} - \sum \limits_{l \in \mathcal{L}_{i}} S_{l}^{L} - \sum \limits_{s \in \mathcal{S}_{i}} S_{s}^{S} \ \ \ \ \forall i \in \mathcal{N},
\end{equation}\]
<p>where \(\mathcal{S}_{i}\) is the set of shunts attached to bus \(i\) and \(S^{S}_{s} = \left( Y_{s}^{S} \right)^{*} \lvert V_{i} \rvert^{2}\) with admittance \(Y_{s}^{S}\) of shunt \(s \in \mathcal{S}_{i}\).
Shunt elements do not introduce additional variables in the system.</p>
<h4 id="improving-the-branch-model">Improving the branch model</h4>
<p>In real power grids, branches can be transmission lines, transformers, or phase shifters.
Transformers are used to control voltages, active and, sometimes, reactive power flows.
In this section we also take transformers and line charging into account via a general branch model (Figure 4).
<br />
<br /></p>
<p><img src="/blog/public/images/branch_model.png" alt="branch_model" />
Figure 4. General branch model including transformers and \(\pi\)-section model (source: MATPOWER manual<sup id="fnref:Zimmerman20" role="doc-noteref"><a href="#fn:Zimmerman20" class="footnote" rel="footnote">7</a></sup>).</p>
<p><br />
Transformers (and also phase shifters) are represented by the complex tap ratio \(T_{ij}\), or equivalently, by its magnitude \(t_{ij}\) and phase angle \(\theta_{ij}\).
Also, line charging is usually modeled by the \(\pi\) transmission line (or \(\pi\)-section) model, which places shunt admittances at the two ends of the transmission line, in addition to its series admittance.
We will treat the shunt admittances in a fairly general way, i.e. \(Y_{ij}^{C}\) and \(Y_{ji}^{C}\) are not necessarily equal, although a widely adopted practice is to assume equal values and to consider only their susceptance components.
Since this model is not symmetric anymore, for consistency we introduce the following convention: we select the orientation of each branch \((i, j) \in \mathcal{E}\) that matches the structure presented in Figure 4.</p>
<p>In order to derive the corresponding power flow equations of this model (for both directions, due to the asymmetric arrangement), we first look for expressions for the currents \(I_{ij}\) and \(I_{ji}\). <br />
The transformer alters the input voltage \(V_{i}\) and current \(I_{ij}\).
Using the complex tap ratio, the output voltage and current are \(\frac{V_{i}}{T_{ij}}\) and \(T_{ij}^{*}I_{ij}\), respectively.
Let \(I_{ij}^{s}\) denote the series current flowing between point \(A\) and \(B\) in Figure 4.
Based on <a href="https://en.wikipedia.org/wiki/Kirchhoff%27s_circuit_laws">Kirchhoff’s current law</a> the net current flowing into node \(A\) is equal to the net current flowing out from it, i.e. \(T_{ij}^{*} I_{ij} = I_{ij}^{s} + Y_{ij}^{C} \frac{V_{i}}{T_{ij}}\).
Rearranging this expression for \(I_{ij}\) we get:</p>
\[\begin{equation}
I_{ij} = \frac{I_{ij}^{s}}{T_{ij}^{*}} + Y_{ij}^{C} \frac{V_{i}}{\lvert T_{ij} \rvert^{2}} \quad \forall (i, j) \in \mathcal{E}.
\end{equation}\]
<p>Similarly, applying Kirchhoff’s current law for node \(B\) we can obtain \(I_{ji}\):</p>
\[\begin{equation}
I_{ji} = -I_{ij}^{s} + Y_{ji}^{C} V_{j} \quad \forall (j, i) \in \mathcal{E}^{R}.
\end{equation}\]
<p>Finally, using again Ohm’s law, the series current has the following simple expression: \(I_{ij}^{s} = Y_{ij} \left( \frac{V_{i}}{T_{ij}} - V_{j} \right)\).
Now we can easily obtain the corresponding power flows:</p>
\[\begin{equation}
\begin{aligned}
S_{ij} & = V_{i}I_{ij}^{*} = V_{i} \left( \frac{I_{ij}^{s}}{T_{ij}^{*}} + Y_{ij}^{C} \frac{V_{i}}{\lvert T_{ij} \rvert^{2}}\right)^{*} & \\
& = Y_{ij}^{*} \left( \frac{\lvert V_{i} \rvert^{2}}{\lvert T_{ij} \rvert^{2}} - \frac{V_{i} V_{j}^{*}}{T_{ij}} \right) + \left( Y_{ij}^{C} \right)^{*} \frac{\lvert V_{i} \rvert^{2}}{\lvert T_{ij} \rvert^{2}} \ \ \ \ & \forall (i, j) \in \mathcal{E} \\
S_{ji} & = V_{j}I_{ji}^{*} = V_{j} \left( -I_{ij}^{s} + Y_{ji}^{C}V_{j} \right)^{*} & \\
& = Y_{ij}^{*} \left( \lvert V_{j} \rvert^{2} - \frac{V_{j}V_{i}^{*}}{T_{ij}^{*}} \right) + \left( Y_{ji}^{C} \right)^{*} \lvert V_{j} \rvert^{2} \ \ \ \ & \forall (j, i) \in \mathcal{E}^{R} \\
\end{aligned}
\end{equation}\]
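<p>These two flow expressions also translate directly into code. In the Julia sketch below (function and argument names are our own, not from any power systems library), <code>T</code> is the complex tap ratio \(T_{ij}\), <code>Y</code> the series admittance, and <code>Yc_from</code>/<code>Yc_to</code> the shunt admittances at the two ends of the branch:</p>

```julia
# Asymmetric branch flows for the general branch model:
#   I_s  = Y * (V_i / T - V_j)                         (series current)
#   S_ij = V_i * conj(I_s / conj(T) + Yc_from * V_i / |T|^2)
#   S_ji = V_j * conj(-I_s + Yc_to * V_j)
function branch_flows(Vi::Complex, Vj::Complex, Y::Complex, T::Complex,
                      Yc_from::Complex, Yc_to::Complex)
    Is = Y * (Vi / T - Vj)
    Sij = Vi * conj(Is / conj(T) + Yc_from * Vi / abs2(T))
    Sji = Vj * conj(-Is + Yc_to * Vj)
    return Sij, Sji
end
```

<p>As a sanity check, with a unit tap ratio and no line charging, equal voltages at both ends give zero flow in both directions, and \(S_{ij} + S_{ji}\) reduces to the series loss.</p>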
<h3 id="improved-bim-and-bfm-models">Improved BIM and BFM models</h3>
<p>Putting everything together from the previous sections we present the improved BIM and BFM models:</p>
<p>BIM:</p>
\[\begin{equation}
\begin{aligned}
\sum \limits_{g \in \mathcal{G}_{i}} S_{g}^{G} - \sum \limits_{l \in \mathcal{L}_{i}} S_{l}^{L} - \sum \limits_{s \in \mathcal{S}_{i}} \left( Y_{s}^{S} \right)^{*} \lvert V_{i} \rvert^{2} = \sum \limits_{(i, j) \in \mathcal{E}} Y_{ij}^{*} \left( \frac{\lvert V_{i} \rvert^{2}}{\lvert T_{ij} \rvert^{2}} - \frac{V_{i} V_{j}^{*}}{T_{ij}} \right) + \left( Y_{ij}^{C} \right)^{*} \frac{\lvert V_{i} \rvert^{2}}{\lvert T_{ij} \rvert^{2}} \\
+ \sum \limits_{(k, i) \in \mathcal{E}^{R}} Y_{ik}^{*} \left( \lvert V_{k} \rvert^{2} - \frac{V_{k}V_{i}^{*}}{T_{ik}^{*}} \right) + \left( Y_{ki}^{C} \right)^{*} \lvert V_{k} \rvert^{2} \ \ \ \ \forall i \in \mathcal{N}, \\
\end{aligned}
\label{bim_concise}
\end{equation}\]
<p>where, on the left-hand side, the net power at bus \(i\) is computed from the power injection of multiple generators and power withdrawals of multiple loads and shunt elements, while, on the right-hand side, the transmitted power is a sum of the outgoing (first term) and incoming (second term) power flows to and from the corresponding adjacent buses.</p>
<p>BFM:</p>
\[\begin{equation}
\begin{aligned}
& \sum \limits_{g \in \mathcal{G}_{i}} S_{g}^{G} - \sum \limits_{l \in \mathcal{L}_{i}} S_{l}^{L} - \sum \limits_{s \in \mathcal{S}_{i}} \left( Y_{s}^{S} \right)^{*} \lvert V_{i} \rvert^{2} = \sum \limits_{(i, j) \in \mathcal{E}} S_{ij} + \sum \limits_{(k, i) \in \mathcal{E}^{R}} S_{ki} & \forall i \in \mathcal{N} \\
& I_{ij}^{s} = Y_{ij} \left( \frac{V_{i}}{T_{ij}} - V_{j} \right) & \forall (i, j) \in \mathcal{E} \\
& S_{ij} = V_{i} \left( \frac{I_{ij}^{s}}{T_{ij}^{*}} + Y_{ij}^{C} \frac{V_{i}}{\lvert T_{ij} \rvert^{2}}\right)^{*} & \forall (i, j) \in \mathcal{E} \\
& S_{ji} = V_{j} \left( -I_{ij}^{s} + Y_{ji}^{C}V_{j} \right)^{*} & \forall (j, i) \in \mathcal{E}^{R} \\
\end{aligned}
\label{bfm_concise}
\end{equation}\]
<p>As before, BFM is an equivalent formulation to BIM (first set of equations), with the only difference being that BFM treats the series currents (second set of equations) and power flows (third and fourth sets of equations) as explicit variables.
<br />
<br /></p>
<p>Table 3. Comparison of improved BIM and BFM formulations: complex variables, number of complex variables and equations. Corresponding real variables, their numbers as well as the number of real equations are also shown in parentheses.</p>
<table>
<thead>
<tr>
<th style="text-align: center">Formulation</th>
<th style="text-align: center">Variables</th>
<th style="text-align: center">Number of variables</th>
<th style="text-align: center">Number of equations</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center">BIM</td>
<td style="text-align: center">\(V_{i} \ (v_{i}, \delta_{i})\) <br /> \(S_{g}^{G} \ (p_{g}^{G}, q_{g}^{G})\) <br /> \(S_{l}^{L} \ (p_{l}^{L}, q_{l}^{L})\)</td>
<td style="text-align: center">\(N + G + L\) <br /> \((2N + 2G + 2L)\)</td>
<td style="text-align: center">\(N\) <br /> \((2N)\)</td>
</tr>
<tr>
<td style="text-align: center">BFM</td>
<td style="text-align: center">\(V_{i} \ (v_{i}, \delta_{i})\) <br /> \(S_{g}^{G} \ (p_{g}^{G}, q_{g}^{G})\) <br /> \(S_{l}^{L} \ (p_{l}^{L}, q_{l}^{L})\) <br /> \(I_{ij}^{s} \ (i_{ij}^{s}, \gamma_{ij}^{s})\) <br /> \(S_{ij} \ (p_{ij}, q_{ij})\) <br /> \(S_{ji} \ (p_{ji}, q_{ji})\)</td>
<td style="text-align: center">\(N + G + L + 3E\) <br /> \((2N + 2G + 2L + 6E)\)</td>
<td style="text-align: center">\(N + 3E\) <br /> \((2N + 6E)\)</td>
</tr>
</tbody>
</table>
<p><br /></p>
<h2 id="conclusions">Conclusions</h2>
<p>In this blog post, we introduced the main concepts of the power flow problem and derived the power flow equations for a basic and a more realistic model using two equivalent formulations.
We also showed that the resulting mathematical problems are non-linear systems that can be solved by the Newton–Raphson method.</p>
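<p>As an illustration of the solver, here is a minimal, generic Newton–Raphson iteration in Julia for a system \(F(x) = 0\) with Jacobian \(J(x)\). This is a bare sketch (no damping or step control), not a production power flow solver:</p>

```julia
# Generic Newton–Raphson for F(x) = 0 with Jacobian J(x).
# At each step, solve J(x) Δx = F(x) and update x ← x - Δx.
function newton_raphson(F, J, x0; tol=1e-10, maxiter=50)
    x = copy(x0)
    for _ in 1:maxiter
        fx = F(x)
        maximum(abs, fx) < tol && return x
        x = x - J(x) \ fx
    end
    error("Newton–Raphson did not converge in $maxiter iterations")
end
```

<p>For example, applying it to \(F(x) = x^2 - 2\) recovers \(\sqrt{2}\) in a handful of iterations; the power flow case is the same iteration with the mismatch equations as \(F\).</p>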
<p>Solving power flow problems is essential for the analysis of power grids.
Based on a set of specified variables, the power flow solution provides the unknown ones.
Consider a power grid with \(N\) buses, \(G\) generators and \(L\) loads, and let us specify only the active and reactive powers of the loads, which means \(2L\) number of known real-valued variables.
Whichever formulation we use, this system is underdetermined, and we would need to specify \(2G\) additional variables (Table 3) to obtain a unique solution.
However, generators have distinct power supply capacities and cost curves, and we can ask an interesting question: <em>among the many physically possible specifications of the generator variables which is the one that would minimize the total economic cost?</em></p>
<p>This question leads to the mathematical model of economic dispatch, which is one of the most widely used <em>optimal power flow</em> problems.
Optimal power flow problems are constrained optimization problems that are based on the power flow equations with additional physical constraints and an objective function.
In our next post, we will discuss the main types and components of optimal power flow, and show how exact and approximate solutions can be obtained for these significantly more challenging problems.</p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:Tong04" role="doc-endnote">
<p>J. Tong, “Overview of PJM energy market design, operation and experience”, <em>2004 IEEE International Conference on Electric Utility Deregulation, Restructuring and Power Technologies. Proceedings</em>, <strong>1</strong>, pp. 24, (2004). <a href="#fnref:Tong04" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:Cain12" role="doc-endnote">
<p>M. B. Cain, R. P. Oneill, and A. Castillo, “History of optimal power flow and formulations”, <em>Federal Energy Regulatory Commission</em>, <strong>1</strong>, pp. 1, (2012). <a href="#fnref:Cain12" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:Low14" role="doc-endnote">
<p>S. H. Low, “Convex Relaxation of Optimal Power Flow—Part I: Formulations and Equivalence,” <em>IEEE Transactions on Control of Network Systems</em>, <strong>1</strong>, pp. 15, (2014). <a href="#fnref:Low14" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:Low14:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
<li id="fn:Wood14" role="doc-endnote">
<p>A. J. Wood, B. F. Wollenberg and G. B. Sheblé, “Power generation, operation, and control,” <em>Hoboken, New Jersey: Wiley-Interscience</em>, (2014). <a href="#fnref:Wood14" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:Farivar13" role="doc-endnote">
<p>M. Farivar and S. H. Low, “Branch Flow Model: Relaxations and Convexification—Part I,” <em>IEEE Transactions on Power Systems</em>, <strong>28</strong>, pp. 2554, (2013). <a href="#fnref:Farivar13" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:Subhonmesh12" role="doc-endnote">
<p>B. Subhonmesh, S. H. Low and K. M. Chandy, “Equivalence of branch flow and bus injection models,” <em>2012 50th Annual Allerton Conference on Communication, Control, and Computing</em>, pp. 1893, (2012). <a href="#fnref:Subhonmesh12" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:Zimmerman20" role="doc-endnote">
<p>R. D. Zimmerman, C. E. Murillo-Sanchez, “MATPOWER User’s Manual, Version 7.1. 2020.”, (2020). <a href="#fnref:Zimmerman20" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Letif MonesAlthough governed by simple physical laws, power grids are among the most complex human-made systems. The main source of the complexity is the large number of components of the power systems that interact with each other: one needs to maintain a balance between power injections and withdrawals while satisfying certain physical, economic, and environmental conditions. For instance, a central task of daily planning and operations of electricity grid operators1 is to dispatch generation in order to meet demand at minimum cost, while respecting reliability and security constraints. These tasks require solving a challenging constrained optimization problem, often referred to as some form of optimal power flow (OPF).2 J. Tong, “Overview of PJM energy market design, operation and experience”, 2004 IEEE International Conference on Electric Utility Deregulation, Restructuring and Power Technologies. Proceedings, 1, pp. 24, (2004). ↩ M. B. Cain, R. P. Oneill, and A. Castillo, “History of optimal power flow and formulations”, Federal Energy Regulatory Commission, 1, pp. 1, (2012). ↩Development with Interface Packages2020-11-06T00:00:00+00:002020-11-06T00:00:00+00:00https://invenia.github.io/blog/2020/11/06/interfacetesting<p>Over the last two years, our Julia codebase has grown in size and complexity, and is now the centerpiece of both our operations and research.
This implies that we need to routinely replace parts of the system like puzzle pieces, and carefully test if the results lead to improvements along various dimensions of performance.
However, those pieces are not designed to work in isolation, and thus cannot be tested in a vacuum.
They also tend to be quite large and complex, and therefore often need to be independent packages, which need to “fit” together in precise ways determined by the rest of the system.
This is where interfaces come in handy: they are tools for isolating and precisely testing specific pieces of a large system.</p>
<p>In order to have the option of reusing an interface without pulling in a lot of extra code, we avoid embedding the interface into any package using it, allowing it to exist in a package on its own.
This approach also makes it easier to keep track of the changes to the interface, when these are not tied to unrelated changes in other packages.
This results in a set of very slim interface packages that define what functional pieces need to be filled in and how they should fit.</p>
<h2 id="interface-packages">Interface Packages</h2>
<p>For our purposes, an interface package is any Julia package that contains an abstract type with no concrete implementation—a sort of IOU in code.
Most functions defined for the type are either un-implemented or are based on other, un-implemented functions.
Despite having very little code, the docstrings form a sort of contract laying out what methods a concrete implementation should have and what they should do in order for the implementation to be considered valid.
A package implementing the interface but missing methods, or using function signatures different from those in the interface package, can be said to break the contract.
As long as the contract between the concrete and abstract interface remains unbroken, a package using the interface should be able to swap out any concrete implementation for any other and still run as expected, provided the user package is using the interface as documented.</p>
<p>In order to improve the modularity of the code, interface packages are kept separate from the code that uses the abstract type.
This results in interface packages essentially looking like design documents.
For example, below we sketch an interface package for a database.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="n">AbstractDB</span>
<span class="s">"""
AbstractDB
A simple API for read only access to a database.
"""</span>
<span class="k">abstract type</span><span class="nc"> AbstractDB</span> <span class="k">end</span>
<span class="s">"""
AbstractResponse
Query data needed for fetch.
"""</span>
<span class="k">abstract type</span><span class="nc"> AbstractResponse</span> <span class="k">end</span>
<span class="s">"""
query(
db::AbstractDB,
table::String,
features::Vector{Feature}
) -> AbstractResponse
Submit an appropriate query to `table` in `db` asking for results matching
`features`.
"""</span>
<span class="k">function</span><span class="nf"> query</span> <span class="k">end</span>
<span class="s">"""
fetch(
db::AbstractDB,
response::Union{AbstractResponse,Vector{<:AbstractResponse}}
) -> DataFrame
Retrieve the results from `response` and parses them into a `DataFrame`.
"""</span>
<span class="k">function</span><span class="nf"> fetch</span> <span class="k">end</span>
<span class="s">"""
easy_query(
db::AbstractDB,
table::String,
features::Vector{Feature}
) -> DataFrame
Submit and fetch a query to `table` in `db`, asking for results matching
`features`.
"""</span>
<span class="k">function</span><span class="nf"> easy_query</span><span class="x">(</span><span class="n">db</span><span class="o">::</span><span class="n">AbstractDB</span><span class="x">,</span> <span class="n">table</span><span class="o">::</span><span class="kt">String</span><span class="x">,</span> <span class="n">features</span><span class="o">::</span><span class="kt">Vector</span><span class="x">{</span><span class="n">Feature</span><span class="x">})</span>
<span class="k">return</span> <span class="n">fetch</span><span class="x">(</span><span class="n">db</span><span class="x">,</span> <span class="n">query</span><span class="x">(</span><span class="n">db</span><span class="x">,</span> <span class="n">table</span><span class="x">,</span> <span class="n">features</span><span class="x">))</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>For our purposes, we will assume a <code class="language-plaintext highlighter-rouge">Feature</code> type exists and corresponds to a column in a database table with some criteria for selecting rows.
We will also assume that <code class="language-plaintext highlighter-rouge">filter</code>ing a <code class="language-plaintext highlighter-rouge">DataFrame</code> by some <code class="language-plaintext highlighter-rouge">Feature</code>s works, but is too inefficient to be used in practice.</p>
<h3 id="a-problem-with-interface-packages">A problem with interface packages</h3>
<p>While simple to write, interface packages can fall victim to a common failure.
Since interface packages provide little more than function names and docstrings, it’s easy for the packages around them to “go rogue”, changing the function signatures for the interface as they see fit without updating the actual interface package.
This ends up creating a secret, undocumented interface known only to the implemented types and whichever packages manage to use them.</p>
<p>For example, let’s say a concrete subtype of our <code class="language-plaintext highlighter-rouge">AbstractDB</code> database example above changes to let a user define a fancy transform to join up results during <code class="language-plaintext highlighter-rouge">fetch</code> in a new version.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># RogueDB.jl</span>
<span class="n">fetch</span><span class="x">(</span><span class="n">db</span><span class="o">::</span><span class="n">RogueDB</span><span class="x">,</span> <span class="n">response</span><span class="o">::</span><span class="kt">Vector</span><span class="x">{</span><span class="n">Response</span><span class="x">},</span> <span class="n">fancy_transform</span><span class="o">::</span><span class="n">Callable</span><span class="x">)</span>
<span class="n">easy_query</span><span class="x">(</span>
<span class="n">db</span><span class="o">::</span><span class="n">RogueDB</span><span class="x">,</span>
<span class="n">table</span><span class="o">::</span><span class="kt">String</span><span class="x">,</span>
<span class="n">features</span><span class="o">::</span><span class="kt">Vector</span><span class="x">{</span><span class="n">Feature</span><span class="x">},</span>
<span class="n">fancy_transform</span><span class="o">=</span><span class="n">donothing</span><span class="x">()</span>
<span class="x">)</span>
</code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">fancy_transform</code> has a fallback and does not break the contract the interface sets out, but it can cause problems if other packages only refer back to <code class="language-plaintext highlighter-rouge">RogueDB</code>.
If another package <code class="language-plaintext highlighter-rouge">X</code> claims to use an <code class="language-plaintext highlighter-rouge">AbstractDB</code> but makes use of the <code class="language-plaintext highlighter-rouge">fancy_transform</code> in both functions and doesn’t fall back to the documented version of <code class="language-plaintext highlighter-rouge">fetch</code>, it now can’t switch to another <code class="language-plaintext highlighter-rouge">AbstractDB</code>.
This goes unnoticed, as <code class="language-plaintext highlighter-rouge">X</code> is only tested against <code class="language-plaintext highlighter-rouge">RogueDB</code>.</p>
<p>Say we now want to hook up another DB to <code class="language-plaintext highlighter-rouge">X</code>, so we write the following <code class="language-plaintext highlighter-rouge">fetch</code> method when creating <code class="language-plaintext highlighter-rouge">NewDB</code>.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># NewDB.jl</span>
<span class="n">fetch</span><span class="x">(</span><span class="n">db</span><span class="o">::</span><span class="n">NewDB</span><span class="x">,</span> <span class="n">response</span><span class="o">::</span><span class="kt">Vector</span><span class="x">{</span><span class="n">Response</span><span class="x">},</span> <span class="n">fancy_transform</span><span class="o">::</span><span class="n">Callable</span><span class="x">)</span>
</code></pre></div></div>
<p>The above <code class="language-plaintext highlighter-rouge">fetch</code> <em>does</em> break the contract with the interface package, but <code class="language-plaintext highlighter-rouge">X</code> has been used for several months and nobody questions the API, so <code class="language-plaintext highlighter-rouge">fetch</code> without the <code class="language-plaintext highlighter-rouge">fancy_transform</code> is not implemented.
<code class="language-plaintext highlighter-rouge">X</code> uses the default <code class="language-plaintext highlighter-rouge">easy_query</code> with <code class="language-plaintext highlighter-rouge">NewDB</code>, which does not include the <code class="language-plaintext highlighter-rouge">fancy_transform</code> argument.
Calling <code class="language-plaintext highlighter-rouge">easy_query</code> will cause a <code class="language-plaintext highlighter-rouge">MethodError</code>, but <code class="language-plaintext highlighter-rouge">X</code> only calls that function in an untested (or mocked) special case and the bug goes unseen.</p>
<p>Further suppose that <code class="language-plaintext highlighter-rouge">AbstractDB</code> has been at v0.1 for 2 years and still has the original definition of <code class="language-plaintext highlighter-rouge">fetch</code>.
There are several other DB packages being used in other packages that use the documented <code class="language-plaintext highlighter-rouge">fetch</code>, but nobody has noticed they are different.
Now the interface package doesn’t serve its purpose, as other packages using the interface as documented will fail to run the rogue implementations.
New implementations will have to guess at the secret interface set up by the rogue packages in order to fit into the existing ecosystem.
The version of the interface being used is dependent upon multiple packages, potentially causing chaos!</p>
<h2 id="test-utils">Test Utils</h2>
<p>Having a robust set of test utilities gives an interface package enough muscle to enforce the contract it sets out.</p>
<p>A good test utility for an interface package contains two key components:</p>
<ul>
<li>A test fake: a simplified implementation of the type for user packages to play with.</li>
<li>A test suite codifying a minimum set of constraints an implementation must obey to be considered legitimate.</li>
</ul>
<h3 id="the-test-suite">The Test Suite</h3>
<h4 id="how">How?</h4>
<p>The test suite is a function or set of functions that work as a validation test for a concrete implementation of a type.
It should codify the expectations of the abstract type (e.g., function signatures, return types) as precisely as possible.
The tests should check that a new implementation does not break the contract that an abstract type sets out.</p>
<p>The test suite should simply test that the interface for the type works as expected.
It does not replace unit tests for the correctness of output or any special-cased behaviour, nor can it be expected to check any implementation details or corner cases.
The test suites should be run as part of the unit tests for any implementation of the type.
If the API changes, it should give instant feedback when the expectations of the abstract type are not met.</p>
<p>A minimal test suite for the database example might look like the following:</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="s">"""
test_interface(db::AbstractDB, table::String, features::Vector{Feature})
Check if an `AbstractDB` follows the documented structure.
The arguments supplied to this function should be identical to those for a
successful `query`.
"""</span>
<span class="k">function</span><span class="nf"> test_interface</span><span class="x">(</span><span class="n">db</span><span class="o">::</span><span class="n">AbstractDB</span><span class="x">,</span> <span class="n">table</span><span class="o">::</span><span class="kt">String</span><span class="x">,</span> <span class="n">features</span><span class="o">::</span><span class="kt">Vector</span><span class="x">{</span><span class="n">Feature</span><span class="x">})</span>
<span class="n">resp</span> <span class="o">=</span> <span class="n">query</span><span class="x">(</span><span class="n">db</span><span class="x">,</span> <span class="n">table</span><span class="x">,</span> <span class="n">features</span><span class="x">)</span>
<span class="nd">@test</span> <span class="n">resp</span> <span class="k">isa</span> <span class="n">AbstractResponse</span>
<span class="n">df</span> <span class="o">=</span> <span class="n">fetch</span><span class="x">(</span><span class="n">db</span><span class="x">,</span> <span class="n">resp</span><span class="x">)</span>
<span class="nd">@test</span> <span class="n">df</span> <span class="k">isa</span> <span class="n">DataFrame</span>
<span class="nd">@test</span> <span class="n">filter</span><span class="x">(</span><span class="n">df</span><span class="x">,</span> <span class="n">features</span><span class="x">)</span> <span class="o">==</span> <span class="n">df</span>
<span class="c"># Check easy_query gives the same results as long form</span>
<span class="nd">@test</span> <span class="n">easy_query</span><span class="x">(</span><span class="n">db</span><span class="x">,</span> <span class="n">table</span><span class="x">,</span> <span class="n">features</span><span class="x">)</span> <span class="o">==</span> <span class="n">df</span>
<span class="c"># Check fetching a vector of responses also returns a DataFrame</span>
<span class="n">df2</span> <span class="o">=</span> <span class="n">fetch</span><span class="x">(</span><span class="n">db</span><span class="x">,</span> <span class="x">[</span><span class="n">resp</span><span class="x">,</span> <span class="n">resp</span><span class="x">])</span>
<span class="nd">@test</span> <span class="n">df2</span> <span class="k">isa</span> <span class="n">DataFrame</span>
<span class="k">end</span>
</code></pre></div></div>
<h4 id="why">Why?</h4>
<p>The test suite puts the power to define the API back into the hands of the interface package.
A package using these tests can be trusted to have defined the functions we need in the form that we expect, so long as the tests pass.
When the interface package changes, the failing tests will serve as a sign to update all related packages to stay consistent with the interface.</p>
<p>The test suite is also handy for creating new implementations.
The test suite is ready-made to be used in <a href="https://en.wikipedia.org/wiki/Test-driven_development">test-driven development</a> and helps clear up any ambiguities that might exist in the docstrings.</p>
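<p>The pattern is small enough to sketch end to end. Below is a self-contained miniature in Julia, with all names invented for illustration: the "interface package" half defines an abstract type, the contract functions, and a test suite, while the "implementer package" half provides a concrete type and runs the suite in its own tests.</p>

```julia
using Test

# Interface-package side: abstract type, contract functions, and a test suite.
abstract type AbstractCounter end
function bump! end   # contract: increment the counter and return it
function value end   # contract: read the current count as an Int

function test_interface(c::AbstractCounter)
    v0 = value(c)
    @test v0 isa Int
    bump!(c)
    @test value(c) == v0 + 1
end

# Implementer-package side: a concrete type validated against the suite.
mutable struct IntCounter <: AbstractCounter
    n::Int
end
bump!(c::IntCounter) = (c.n += 1; c)
value(c::IntCounter) = c.n

@testset "AbstractCounter interface" begin
    test_interface(IntCounter(0))
end
```

<p>If <code>IntCounter</code> ever stops returning an <code>Int</code> from <code>value</code>, or changes the signature of <code>bump!</code>, the suite fails immediately rather than leaving the drift to be discovered by downstream packages.</p>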
<h3 id="the-test-fake">The Test Fake</h3>
<h4 id="how-1">How?</h4>
<p>Test fakes are a kind of <a href="https://en.wikipedia.org/wiki/Test_double">test double</a>.
They have a working implementation but take some shortcuts, which makes them suitable only for testing functionality.
Ideally, a test fake should:</p>
<ul>
<li>Be easy to construct.</li>
<li>Only enact the minimum expectations of the API for the abstract type.</li>
<li>Give easily verifiable outputs.</li>
</ul>
<p>Test fakes should be used in tests for any package using the abstract type in order to avoid depending on any “real” version.</p>
<p>Below we show what a lazy implementation of a test fake for the database example might look like:</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span><span class="nc"> FakeDB</span> <span class="o"><:</span> <span class="n">AbstractDB</span>
<span class="n">data</span><span class="o">::</span><span class="kt">Dict</span><span class="x">{</span><span class="kt">String</span><span class="x">,</span> <span class="n">DataFrame</span><span class="x">}</span>
<span class="k">end</span>
<span class="k">struct</span><span class="nc"> FakeResponse</span> <span class="o"><:</span> <span class="n">AbstractResponse</span>
<span class="n">fetched</span><span class="o">::</span><span class="n">DataFrame</span>
<span class="k">end</span>
<span class="n">FakeDB</span><span class="x">(</span><span class="n">data_dir</span><span class="o">::</span><span class="n">Path</span><span class="x">)</span> <span class="o">=</span> <span class="n">FakeDB</span><span class="x">(</span><span class="n">populate_data</span><span class="x">(</span><span class="n">data_dir</span><span class="x">))</span>
<span class="k">function</span><span class="nf"> query</span><span class="x">(</span><span class="n">db</span><span class="o">::</span><span class="n">FakeDB</span><span class="x">,</span> <span class="n">table</span><span class="o">::</span><span class="kt">String</span><span class="x">,</span> <span class="n">features</span><span class="o">::</span><span class="kt">Vector</span><span class="x">{</span><span class="n">Feature</span><span class="x">})</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">db</span><span class="x">[</span><span class="n">table</span><span class="x">]</span>
<span class="k">return</span> <span class="n">FakeResponse</span><span class="x">(</span><span class="n">filter</span><span class="x">(</span><span class="n">data</span><span class="x">,</span> <span class="n">features</span><span class="x">))</span>
<span class="k">end</span>
<span class="n">fetch</span><span class="x">(</span><span class="o">::</span><span class="n">FakeDB</span><span class="x">,</span> <span class="n">r</span><span class="o">::</span><span class="n">FakeResponse</span><span class="x">)</span> <span class="o">=</span> <span class="n">r</span><span class="o">.</span><span class="n">fetched</span>
<span class="n">fetch</span><span class="x">(</span><span class="n">db</span><span class="o">::</span><span class="n">FakeDB</span><span class="x">,</span> <span class="n">rs</span><span class="o">::</span><span class="kt">Vector</span><span class="x">{</span><span class="n">FakeResponse</span><span class="x">})</span> <span class="o">=</span> <span class="n">join</span><span class="x">(</span><span class="n">fetch</span><span class="o">.</span><span class="x">(</span><span class="n">Ref</span><span class="x">(</span><span class="n">db</span><span class="x">),</span> <span class="n">rs</span><span class="x">)</span><span class="o">...</span><span class="x">)</span>
</code></pre></div></div>
<p>To make sure things are consistent, the <code class="language-plaintext highlighter-rouge">test_interface</code> function from the section above should be run on <code class="language-plaintext highlighter-rouge">FakeDB</code> as part of the tests for <code class="language-plaintext highlighter-rouge">AbstractDB</code>.</p>
<h4 id="why-1">Why?</h4>
<p>Because the test fake is included as part of the interface package, it can be trusted to enact the interface as advertised.
Any code using the fake can be expected to work with any legitimate implementation of the interface that is already passing tests with the test suite.
Since they are minimal examples, they are unlikely to stray from the interface-defined API by adding extra functions that user packages may mistakenly begin to depend on.</p>
<p>Test fakes also help to keep user packages in line with the interface package versions.
When the interface changes, the fakes change and the user code can be fixed to stay consistent with the interface package.
Test fakes can be especially useful shortcuts if a “real” implementation would be awkward to construct or would rely on network access, as they are usually locally-hosted, simplified examples.</p>
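<p>The same miniature style shows the user-package side of the bargain. Every name below is invented; the point is that the user-package logic only ever touches the abstract type, so an in-memory fake can stand in for a backend that would normally require network access:</p>

```julia
using Test

# A fake backend: an in-memory dictionary standing in for a real store.
abstract type AbstractStore end
struct FakeStore <: AbstractStore
    data::Dict{String,Int}
end
lookup(s::FakeStore, k::String) = s.data[k]

# "User package" code, written purely against the abstract type.
double_lookup(store::AbstractStore, k::String) = 2 * lookup(store, k)

# In the user package's tests, the fake replaces the real implementation.
@testset "user logic against the fake" begin
    @test double_lookup(FakeStore(Dict("a" => 21)), "a") == 42
end
```

<p>Because the fake ships with the interface package, a passing test here gives some confidence that <code>double_lookup</code> will also work against any real store that passes the interface's own test suite.</p>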
<h2 id="conclusion">Conclusion</h2>
<p>Robust testing for interface packages helps reduce confusion about how abstract types work.
People who are new to the interface can look at the test suite to know how information will flow and can use the included tests as a sanity check while making changes to a concrete implementation.
The test fakes give users a toy example to play with and a minimum set of requirements for what a “real” implementation must do.</p>
<p>The test utilities also create a clear separation of concerns between the packages implementing the code and the packages using the code.
Combined with the docstrings, the test suite should provide implementer packages with all the structure the code needs.
Likewise, the test fakes should let user packages test their code without having to depend on a possibly incorrect implementer package.
Neither side should have to look at the details of the other to know how to use the interface itself.</p>
<p>Lastly, forcing both user and implementer packages to test against the interface package guarantees that the version number of the interface package dictates the version of the interface API both can use.
If an interface adds or changes a function and the implementer hasn’t added it, the test suite will fail.
If an interface removes or changes a function and the user package is testing against the test fake, the tests will fail.
If two packages are added using conflicting versions of the API, assuming both packages have bounded the interface package, we get a useful version conflict rather than unexpected behaviour.</p>
<p>Adding interface testing helps interface packages do their job, helping users be sure that when they run their code, a piece will fit where it should without the whole puzzle falling apart.</p>Sam MorrisonOver the last two years, our Julia codebase has grown in size and complexity, and is now the centerpiece of both our operations and research. This implies that we need to routinely replace parts of the system like puzzle pieces, and carefully test if the results lead to improvements along various dimensions of performance. However, those pieces are not designed to work in isolation, and thus cannot be tested in a vacuum. They also tend to be quite large and complex, and therefore often need to be independent packages, which need to “fit” together in precise ways determined by the rest of the system. This is where interfaces come in handy: they are tools for isolating and precisely testing specific pieces of a large system.JuliaCon 2020 in Retrospective2020-08-12T00:00:00+00:002020-08-12T00:00:00+00:00https://invenia.github.io/blog/2020/08/12/juliacon<p>We at Invenia are heavy users of Julia, and are proud to <a href="/blog/2019/08/09/juliacon/">once again</a> have been a part of <a href="https://juliacon.org/2020/">this year’s JuliaCon</a>. This was the first year the conference was fully online, with about 10,000 registrations and 26,000 people tuning in. Besides being <a href="https://www.youtube.com/watch?v=xN6ZXfKHDPI">sponsors</a> of the conference, Invenia also had several team members attending, helping host sessions, and presenting some of their work.</p>
<p>This year we had five presentations: <a href="https://www.youtube.com/watch?v=XI58hlGA7Is">“Design documents are great, here’s why you should consider one”</a>, by Matt Brzezinski; <a href="https://www.youtube.com/watch?v=B4NfkkkJ7rs">“ChainRules.jl”</a>, by Lyndon White; <a href="https://www.youtube.com/watch?v=xUpX-k0oZmo">“HydroPowerModels.jl: Impacts of Network Simplifications”</a>, by Andrew Rosemberg; <a href="https://www.youtube.com/watch?v=nq6X-w5xgLo">“Convolutional Conditional Neural Processes in Flux”</a>, by Wessel Bruinsma; and <a href="https://www.youtube.com/watch?v=dysmEpX1QoE">“Fast Gaussian processes for time series”</a>, by Will Tebbutt.</p>
<p>JuliaCon always brings some really exciting work, and this year it was no different. We are eager to share some of our highlights.</p>
<h3 id="juliacon-is-not-just-about-research">JuliaCon is not just about research</h3>
<p>There were a lot of good talks and workshops at JuliaCon this year, but one which stood out was “Building microservices and applications in Julia”, by Jacob Quinn. This workshop was about creating a music album management microservice, and provided useful information for both beginners and more experienced users. Jacob explained how to define the architectural layers, solving common problems such as authentication and caching, as well as deploying the service to Google Cloud Platform.</p>
<p>A very interesting aspect of the talk was that it exposed Julia users to the field of software engineering. JuliaCon usually has a heavy emphasis on academic and research-focused talks, so it was nice to see the growth of a less represented field within the community. There were a few other software engineering related talks, but having a hands-on practical approach is a great way to showcase a different approach to architecting code.</p>
<p>Among the other software engineering talks and posters, we can highlight <a href="https://live.juliacon.org/talk/KCP9NT">“Reproducible environments with Singularity”</a>, by Steffen Ridderbusch; the aforementioned <a href="https://youtu.be/XI58hlGA7Is">“Design documents are great, here’s why you should consider one”</a>, by Matt Brzezinski; <a href="https://youtu.be/nkSuEkmsB28">“Dispatching Design Patterns”</a>, by Aaron Christianson; and <a href="https://live.juliacon.org/uploads/posters/M8KTBL.pdf">“Fantastic beasts and how to show them”</a>, by Joris Kraak.</p>
<h3 id="but-it-stays-strong-in-the-machine-learning-community">But it stays strong in the machine learning community</h3>
<p>The conference kicked off with a brief and fun session on work related to Gaussian processes, including our own Will Tebbutt who talked about <a href="https://www.youtube.com/watch?v=dysmEpX1QoE">TemporalGPs.jl</a>, which provides fast inference for certain types of GP models for time series, as well as Théo Galy-Fajou’s talk on <a href="https://www.youtube.com/watch?v=0fKGICZrk3w">KernelFunctions.jl</a>. Although there was no explicit talk on the topic, there were productive discussions about the move towards a common set of abstractions provided by <a href="https://github.com/JuliaGaussianProcesses/AbstractGPs.jl/">AbstractGPs.jl</a>.</p>
<p>It was also great to see so many people at the <a href="https://discourse.julialang.org/t/juliacon-2020-birds-of-a-feather/39181">Probabilistic Programming Bird of a Feather</a>, and it feels like there is a proper community in Julia working on various approaches to problems in Probabilistic Programming. There were discussions around helpful abstractions, and whether there are common ones that can be more widely shared between projects. A commitment was made to having monthly discussions aimed at understanding how the wider community is approaching Probabilistic Programming.</p>
<p>Another interesting area, which ties into our work on <a href="https://github.com/JuliaDiff/ChainRules.jl/">ChainRules.jl</a>, the AD ecosystem, and the Probabilistic Programming world, is Keno Fischer’s work. He has been working on improving the degree to which you can manipulate the compiler and on changing the points at which you can inject additional compiler passes. This aims to mitigate the type-inference issues that plague <a href="https://github.com/jrevels/Cassette.jl">Cassette.jl</a> and <a href="https://github.com/FluxML/IRTools.jl">IRTools.jl</a>, which in turn cause problems in <a href="https://github.com/FluxML/Zygote.jl/">Zygote.jl</a> (and other tools). We expect great things from changes to how compiler pass injection works with the compiler’s usual optimisation passes.</p>
<p>Finally, Chris Elrod’s work on <a href="https://github.com/chriselrod/LoopVectorization.jl">LoopVectorization.jl</a> is very exciting for performance. <a href="https://www.youtube.com/watch?v=qz2kJdVDWi0">His talk</a> contained an interesting example involving Automatic Differentiation (AD), and we’re hoping to help him integrate this insight into <a href="https://github.com/JuliaDiff/ChainRules.jl/">ChainRules.jl</a> in the upcoming months.</p>
<h3 id="as-well-as-in-the-engineering-community">As well as in the engineering community</h3>
<p>This year we saw a significant number of projects on direct applications to engineering, including interesting work on <a href="https://pretalx.com/juliacon2020/talk/review/APWY839YWNAYXCG9GXSVWJJLP7LQ98DW">steel truss design</a> and <a href="https://pretalx.com/juliacon2020/talk/review/KY87TTQHX9BSHQPDT8HHSTDVZ3G8CJJG">structural engineering</a>. Part of why the engineering community is fond of Julia is the type structure paired with multiple dispatch, which allows developers to easily extend types and functions from other packages, and build complex frameworks in a Lego-like manner.</p>
<p>A direct application of Julia in engineering that leverages the existing ecosystem is <a href="https://www.youtube.com/watch?v=xUpX-k0oZmo">HydroPowerModels.jl</a>, developed by our own Andrew Rosemberg. HydroPowerModels.jl is a tool for planning and simulating the operation of hydro-dominated power systems. It builds on three main dependencies (PowerModels.jl, SDDP.jl, and JuMP.jl) to efficiently construct and solve the desired problem.</p>
<p>The pipeline for HydroPowerModels.jl uses <a href="https://github.com/lanl-ansi/PowerModels.jl">PowerModels.jl</a>—a package for parsing system data and modeling optimal power flow (OPF) problems—to build the OPF problem as a JuMP.jl model. Then the model is modified in <a href="https://github.com/jump-dev/JuMP.jl">JuMP.jl</a> to receive the appropriate hydro variables and constraints. Lastly, it is passed to <a href="https://github.com/odow/SDDP.jl">SDDP.jl</a>, which builds the multistage problem and provides a solution algorithm (SDDP) to solve it.</p>
<h3 id="there-were-several-tools-for-working-with-networks-and-graphs">There were several tools for working with networks and graphs</h3>
<p>As a company that works on problems related to electricity grids, new developments on how to deal with networks and graphs are always interesting. Several talks this year featured useful new tools.</p>
<p><a href="https://github.com/yuehhua/GeometricFlux.jl">GeometricFlux.jl</a> adds to <a href="https://github.com/FluxML/Flux.jl">Flux.jl</a> the capability to perform deep learning on graph-structured data. This area of research is <a href="https://arxiv.org/abs/1611.08097">opening up</a> new opportunities in diverse applications such as social network analysis, protein folding, and natural language processing. GeometricFlux.jl defines several types of graph-convolutional layers. Also of particular interest is the ability to define a <code class="language-plaintext highlighter-rouge">FeaturedGraph</code>, where you specify not just the structure of the graph, but can also provide feature vectors for individual nodes and edges.</p>
<p>Practical applications of networks were shown in talks on economics and energy systems.</p>
<p>Work done by the Federal Reserve Bank of New York on <a href="https://www.youtube.com/watch?v=q3KoMloafwY">Estimation of Macroeconomic Models</a> showed how Julia is being applied to speed up calculations on equilibrium models, which are a classic way of simulating the interconnections in the economy and how interventions such as policy changes can have rippling impacts through the system. Similarly, work by the National Renewable Energy Laboratory (NREL) on <a href="https://www.youtube.com/watch?v=IU4PVKTVNTI">Intertwined Economic and Energy Analysis using Julia</a> demonstrated equilibrium models that couple economic and energy systems.</p>
<p>Quite a few talks dealt specifically with power networks. These systems can be computationally challenging to model, particularly when considering the complexity of actual large-scale power grids and not simple test cases. <a href="https://www.youtube.com/watch?v=GrmnbDYr6mM">NetworkDynamics.jl</a> allows for modelling dynamic systems on networks, by bridging existing work in LightGraphs.jl and DifferentialEquations.jl. This has, in turn, been used to help build <a href="https://juliaenergy.github.io/PowerDynamics.jl/stable/">PowerDynamics.jl</a>. Approaches to speed up power simulations were discussed in <a href="https://www.youtube.com/watch?v=RKtIxZfhdXU">A Parallel Time-Domain Power System Simulation Toolbox in Julia</a>. Finally, another talk by NREL on a <a href="https://www.youtube.com/watch?v=kQNOG4tGJdg">Crash Course in Energy Systems Modeling & Analysis with Julia</a> showed off a collection of packages for power simulations they have been developing.</p>
<h3 id="this-year-the-whole-event-happened-online">This year the whole event happened online</h3>
<p>It may not have been the JuliaCon we envisioned, but the organisers this year did an incredible job in adjusting to extraordinary circumstances and hosting an entirely virtual conference.</p>
<p>A distinct silver lining in moving online is that attendance was free, which opened the conference up to a much larger community. The boost in attendance no doubt increased the engagement with contributors to the Julia project and provided presenters with a much wider audience than would otherwise be possible in a lecture hall.</p>
<p>Even with the usual initialization issues with conference calls (“Can you hear me now?”), the technical set-up of the conference was superb. In previous years, JuliaCon had the talks swiftly available on YouTube and this year they outdid themselves by simultaneously live-streaming multiple tracks. Being able to pause and rewind live talks and switch between tracks without leaving the room made for a convenient viewing experience. The Discord forum also proved great for interacting with others and for asking questions in a manner that may have appealed to the more shy audience members.</p>
<p>Perhaps the most pivotal, yet inconspicuous, benefit of hosting JuliaCon online is the considerably reduced carbon footprint. Restricted international movement has brought to light the travel industry’s impact on the planet and international conferences have their role to play. Maybe the time has come for communities that are underpinned by strong social and scientific principles, like the Julia community, to make the reduction of emissions an explicit priority in future gatherings.</p>
<p>In spite of JuliaCon’s overall success, there are still kinks to iron out in the online conference experience: the digital interface makes it difficult to spontaneously engage with other participants, which tends to be one of the main reasons to attend conferences in the first place, and the lack of “water cooler”-talk (although <a href="https://gather.town/rBrwIUqeDkb5JTxu/juliacon2020">Gather.Town</a> certainly helped in providing a similar experience) means missed connections and opportunities for ideas to cross-pollinate. Not for a lack of trying, JuliaCon seemed to miss an atmosphere that can only be captured by being in the same physical space as the community. We don’t doubt that the online experience will improve in the future one way or the other, but JuliaCon certainly hit the ground running.</p>
<p>We look forward to seeing what awaits for JuliaCon 2021, and we’ll surely be part of it once more, however it happens.</p>Andrew Rosemberg, Chris Davis, Glenn Moynihan, Matt Brzezinski, and Will TebbuttWe at Invenia are heavy users of Julia, and are proud to once again have been a part of this year’s JuliaCon. This was the first year the conference was fully online, with about 10,000 registrations and 26,000 people tuning in. Besides being sponsors of the conference, Invenia also had several team members attending, helping host sessions, and presenting some of their work.The Hitchhiker’s Guide to Research Software Engineering: From PhD to RSE2020-07-07T00:00:00+00:002020-07-07T00:00:00+00:00https://invenia.github.io/blog/2020/07/07/software-engineering<p>In 2017, the twilight days of my PhD in computational physics, I found myself ready to leave academia behind.
While my research was interesting, it was not what I wanted to pursue full time.
However, I was happy with the type of work I was doing, contributing to research software, and I wanted to apply myself in a more industrial setting.</p>
<p>Many postgraduates face a similar decision.
A <a href="https://royalsociety.org/-/media/Royal_Society_Content/policy/publications/2010/4294970126.pdf">study conducted by the Royal Society</a> in 2010 reported that only 3.5% of PhD graduates end up in permanent research positions in academia.
Leaving aside the roots of the <a href="https://jakevdp.github.io/blog/2013/10/26/big-data-brain-drain/">brain drain</a> on Universities, it is a compelling statistic that the <em>vast</em> majority of post-graduates end up leaving academia for industry at some point in their career.
It comes as no surprise that there are a growing number of bootcamps like <a href="https://www.s2ds.org/index.html">S2DS</a>, <a href="https://faculty.ai/">faculty.ai</a>, and <a href="https://insightfellows.com/data-science">Insight</a> that have sprung up in response to this trend, for machine learning and data science especially.
There is also no shortage of helpful <a href="https://news.ycombinator.com/item?id=17944306">forum discussions</a> and <a href="https://pascalbugnion.net/blog/from-academia-to-data-science.html">blog posts</a> outlining what you should do in order to “break into the industry”, as well as many that relate the personal experiences of those who ultimately made the switch.</p>
<p>While the advice that follows in this blog post is directed at those looking to change careers, it would equally benefit those who opt to remain in the academic track.
Since the environment and incentives around building academic research software are very different to those of industry, the workflows around the former are, in general, not guided by the same engineering practices that are valued in the latter.</p>
<p>That is to say: <em>there is a difference between what is important in writing software for research, and for a user-focused, software product</em>.
Academic research software prioritises scientific correctness and flexibility to experiment above all else in pursuit of the researchers’ end product: published papers.
Industry software, on the other hand, prioritises maintainability, robustness, and testing as the software (generally speaking) <em>is</em> the product.</p>
<p>However, the two tracks share many common goals as well, such as catering to “users”, emphasising performance and <a href="https://codecheck.org.uk/">reproducibility</a>, but most importantly both ventures are <em>collaborative</em>.
Arguably then, both sets of principles are needed to write and maintain high-quality research software.
Incidentally, the <a href="https://society-rse.org/about/">Research Software Engineering</a> group at Invenia is uniquely tasked with incorporating all these incentives into the development of our research packages in order to get the best of both worlds.
But I digress.</p>
<h2 id="what-i-wish-i-knew-in-my-phd">What I wish I knew in my PhD</h2>
<p>Most postgrads are self-taught programmers and learn from the same resources as their peers and collaborators, which are ostensibly adequate for academia.
Many also tend to work in isolation on their part of the code base and don’t require merging with other contributors’ work very frequently.
In industry, however, <a href="https://en.wikipedia.org/wiki/Continuous_integration">continuous integration</a> underpins many development workflows.
Under a continuous delivery cycle, a developer benefits from the prompt feedback and cooperation of a full team of professional engineers and can, therefore, learn to implement engineering best practices more efficiently.</p>
<p>As such, it feels like a missed opportunity for universities not to promote good engineering practices more and teach them to their students.
Not least because stable and maintainable tools are, in a sense, <a href="https://en.wikipedia.org/wiki/Commons#Digital_commons">“public goods”</a> in academia as much as in industry.
Yet, while everyone gains from improving the tools, researchers are not generally incentivised to invest their precious time or effort on these tasks unless it is part of some well-funded, high-impact initiative.
As Jake VanderPlas <a href="https://jakevdp.github.io/blog/2013/10/26/big-data-brain-drain/">remarked</a>: “any time spent building and documenting software tools is time spent not writing research papers, which are the primary currency of the academic reward structure”.</p>
<p>Speaking personally, I learned a great deal about conducting research and scientific computing in my PhD; I could read and write code, squash bugs, and I wasn’t afraid of getting my hands dirty in monolithic code bases.
As such, I felt comfortable at the command line but I failed to learn the basic tenets of proper code maintenance, unit testing, code review, version control, etc., that underpin good software engineering.
While I had enough coding experience to have a sense of this at the time, I lacked the awareness of what I needed to know in order to improve or even where to start looking.</p>
<p>As is clear from the earlier statistic, this experience is likely not unique to me.
It prompted me to share what I’ve learned since joining Invenia 18 months ago, so that it might guide those looking to make a similar move.
The advice I provide is organised into three sections: the first recommends ways to learn a new programming language efficiently<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>; the second describes some best practices you can adopt to improve the quality of the code you write; and the last commends the social aspect of community-driven software collaborations.</p>
<h2 id="lesson-1-hone-your-craft">Lesson 1: Hone your craft</h2>
<p><strong>Practice</strong>: While clichéd, there is no avoiding the fact that it takes consistent practice <a href="https://norvig.com/21-days.html">over many many years</a> to become masterful at anything, and programming is no exception.</p>
<p><strong>Have personal projects</strong>: Practicing is easier said than done if your job doesn’t revolve around programming.
A good way to get started either way is to undertake personal side-projects as a fun way to get to grips with a language, for instance via <a href="https://projecteuler.net/">Project Euler</a>, <a href="https://www.kaggle.com/">Kaggle Competitions</a>, etc.
These should be enough to get you off the ground and familiar with the syntax of the language.</p>
<p><strong>Read code</strong>: Personal projects on their own are not enough to improve.
If you really want to get better, you’ve got to read other people’s code: a lot of it.
Check out the repositories of some of your favourite or most used packages—particularly if they are considered “high quality”<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>.
See how the package is organised, how the documentation is written, and how the code is structured.
Look at the open issues and pull requests.
Who are the main contributors? Get a sense of what is being worked on and how the open-source community operates.
This will give you an idea of the open issues facing the package and the language, and of the direction each is taking.
It will also show you how to write <a href="https://stackoverflow.com/questions/84102/what-is-idiomatic-code"><em>idiomatic code</em></a>, that is, in a way that is natural for that language.</p>
<p><strong>Contribute</strong>: You should actually contribute to the code base you use.
This is by far the most important advice for improving and I cannot overstate how instructive an experience this is.
By getting your code reviewed you get prompt and informative feedback on what you’re doing wrong and how you can do better.
It gives you the opportunity to try out what you’ve learned, learn something new, and improves your confidence in your ability.
Contributing to open source and seeing your features being used is also rewarding, and that starts a positive feedback loop where you feel like contributing more.
Further, when you start applying for jobs in industry people can see your work, and so know that you are good at what you do (I say this as a person who is now involved in reviewing these applications).</p>
<p><strong>Study</strong>: Learning by experience is great but—at least for me—it takes a deliberate approach to formalise and cement new ideas.
Read well-reviewed books on your language (appropriate for your level) and reinforce what you learn by tackling more complex tasks and venturing <a href="https://www.geeksaresexy.net/2016/12/20/comfort-zone-comic/">outside your comfort zone</a>.
Reading blog posts and articles about the language is also a great idea.</p>
<p><strong>Ask for help:</strong> Sometimes a bug just stumps you, or you just don’t know how to implement a feature.
In these circumstances, it’s quicker to reach out to experts who can help and maybe teach you something at the same time.
More often than not, someone has had the same problem or they’re happy to point you in the right direction.
I’m fortunate to work with Julia experts at Invenia, so when I have a problem they are always most helpful.
But posting on public fora like <a href="https://slackinvite.julialang.org/">Slack</a>, <a href="https://discourse.julialang.org/">Discourse</a>, or <a href="https://stackoverflow.com/">StackOverflow</a> is an option we all have.</p>
<h2 id="lesson-2-software-engineering-practices">Lesson 2: Software Engineering Practices</h2>
<p>With respect to the environment and incentives in industry surrounding code maintainability, robustness, and testing, there are certain practices in place to encourage, enable, and ensure these qualities are met.
These key practices can turn a collection of scripts into a fully implemented package one can use and rely upon with high confidence.</p>
<p>While there are without doubt many universities and courses that teach these practices to their students, I find they are often neglected by coding novices and academics alike, to their own disadvantage.</p>
<p><strong>Take version control seriously:</strong> <a href="https://git-scm.com/">Git</a> is a programming staple for version control, and while it is tempting to disregard it when working alone, without it you soon find yourself creating <a href="https://uidaholib.github.io/get-git/1why.html">convoluted naming schemes</a> for your files; frequently losing track of progress; and wasting time looking through email attachments for the older version of the code to replace the one you just messed up.</p>
<p><a href="https://git-scm.com/">Git</a> can be <a href="https://xkcd.com/1597/">a little intimidating</a> to get started, but once you are comfortable with the basic commands (fetch, add, commit, push, pull, merge) and a few others (checkout, rebase, reset) you will never look back.
<a href="https://github.com/">GitHub</a>’s utility, meanwhile, extends far beyond that of a programmatic hosting service; it provides <a href="https://guides.github.com/features/wikis/">documentation hosting</a>, <a href="https://help.github.com/en/actions/building-and-testing-code-with-continuous-integration/about-continuous-integration">CI/CD pipelines</a>, and many other features that enable efficient cross-party collaboration on an <em>enterprise</em> scale.</p>
<p>It cannot be overstated how truly indispensable <a href="https://git-scm.com/">Git</a> and <a href="https://github.com/">GitHub</a> are when it comes to turning your code into functional packages, and the earlier you adopt these the better.
It also helps to know how <a href="https://semver.org/">semantic versioning</a> works, so you will know what it means to increment a package version from 1.2.3 to 1.3 and why.</p>
<p><strong>Organise your code</strong>: In terms of packaging your code, get to know the typical package folder structure.
Packages often contain src, docs, and test directories, as well as standard artefacts like a README, to explain what the package is about, and a list of dependencies, e.g.
Project and Manifest files in Julia, or requirements.txt in Python.
Implementing the familiar package structure keeps things organised and enables yourself and other users to navigate the contents more easily.</p>
<p><strong>Practice code hygiene</strong>: This relates to the readability and maintainability of the code itself.
It’s important to practice <a href="https://medium.com/@anishmahapatra/code-hygiene-dont-laugh-it-off-2a5aebcdd84b">good hygiene</a> if you want your code to be used, extended, and maintained by others.
Bad code hygiene will turn off other contributors—and eventually yourself—leaving the package unused and unmaintained.
Here are some tips for ensuring good hygiene:</p>
<ul>
<li>Take a <strong>design-first</strong> approach when creating your package.
Think about the intended user(s) and what their requirements are—this may be others in your research group or your future self.
Sometimes this can be difficult to know in advance but working iteratively is better than trying to capture all possible use cases at once.</li>
<li>Think about how the <a href="https://en.wikipedia.org/wiki/Application_programming_interface">API</a> should work and how it integrates with other packages or applications.
Are you building on something that already exists or is your package creating something entirely new?</li>
<li>There should be a style guide for writing in the language, for example, <a href="https://github.com/invenia/BlueStyle/">BlueStyle</a> in Julia and <a href="https://www.python.org/dev/peps/pep-0008/">PEP 8</a> in Python.
You should adhere to it so that your code follows the same standard as everyone else.</li>
<li>Give your variables and functions meaningful, and memorable names.
There is no advantage to obfuscating your code for the sake of brevity.</li>
<li>Furthermore, read up on the language’s <a href="https://en.wikipedia.org/wiki/Software_design_pattern">Design Patterns</a>.
These are the common approaches or techniques used in the language, which you will recognise from reading the code.
These will help you write better, more idiomatic code.</li>
</ul>
<p><strong>Write good documentation</strong>: The greatest package ever written would never be used if nobody knew how it worked.
At the very least your code should be commented and a README accompanying the package explaining to your users (and your future self) what it does and how to install and use it.
You should also attach docstrings to all user-facing (aka public) functions to explain what they do, what inputs they take, what data types they return, etc.
This also applies to some internal functions, to remind maintainers (including you) what they do and how they are used.
Some minimum working examples of how to use the package features are also a welcome addition.</p>
<p>Lastly, documentation should evolve with the package; when the API changes or new use-cases get added these should be reflected in the latest documentation.</p>
<p><strong>Write good tests</strong>: Researchers in computational fields may be familiar with the practice of running “canonical experiments” or “reproducibility tests” that check whether the code produces the correct result for some pipeline and is therefore “calibrated”.
But these don’t necessarily provide good or meaningful <a href="https://en.wikipedia.org/wiki/Code_coverage">test coverage</a>.
For instance, canonical experiments, by definition, test the software within the limits of its intended use.
This will not reveal latent bugs that only manifest under certain conditions, e.g.
when encountering corner cases.</p>
<p>To capture these you need to write adequate <em>Unit and Integration Tests</em> that cover all expected corner cases to be reasonably sure your code is doing what it should.
Even then you can’t guarantee there isn’t a corner case you haven’t considered, but testing certainly helps.</p>
<p>If you do catch a bug it’s not enough to fix it and call it a day; you need to write a new test to replicate it, and you will only have fixed the bug when that new test passes.
This new test prevents <a href="https://stackoverflow.com/questions/3464629/what-does-regression-test-mean">regressions</a> in behaviour if the bug ever returns.</p>
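<p>The fix-plus-regression-test workflow can be sketched as follows (the function and the bug are invented for illustration):</p>

```python
# Hypothetical example: a mean() that used to crash on an empty list.
def safe_mean(xs):
    # The fix: previously this raised ZeroDivisionError when xs == [].
    if not xs:
        return 0.0
    return sum(xs) / len(xs)

# Regression test replicating the original bug report. It must keep
# passing forever, so the corner case can never silently break again.
def test_safe_mean_empty_input():
    assert safe_mean([]) == 0.0

test_safe_mean_empty_input()
```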
<h2 id="lesson-3-take-part-in-the-community">Lesson 3: Take Part in the Community</h2>
<p>Undertaking a fraction of the points above would be more than enough to boost your ability to develop software.
But the return on investment is compounded by taking part in the community forums on <a href="https://slackinvite.julialang.org/">Slack</a> and <a href="https://discourse.julialang.org/">Discourse</a>; joining organizations on <a href="https://github.com/">GitHub</a>; and attending <a href="https://www.meetup.com/London-Julia-User-Group/">Meetups</a> and <a href="https://juliacon.org/">conferences</a>.
Taking part in a collaboration (and meeting your co-developers) fosters a strong sense of community that supports continual learning and encouragement to go and do great things.
In smaller communities related to a particular tool or niche language, you may even become well-known such that your potential future employer (or some of their engineers) are already familiar with who you are before you apply.</p>
<h2 id="takeaway">Takeaway</h2>
<p>Personal experience has taught me that the incentives in academic research can be qualitatively different from those in industry, despite the overlap they share.
However, the practices that are instilled in one track don’t necessarily translate off-the-shelf to the other, and switching gears between these (often competing) frameworks can initially induce an all-too-familiar sense of <a href="https://ardalis.com/the-more-you-know-the-more-you-realize-you-dont-know">imposter syndrome</a>.</p>
<p>It’s important to remember that what you learn and internalise in a PhD is, in a sense, “selected for” according to the incentives of that environment, as outlined above.
However, under the auspices of a supportive community and the proper guidelines, it’s possible to become more well-rounded in your skillset, as I have.
And while I still have much more to learn, it’s encouraging to reflect on what I have learned during my time at Invenia and share it with others.</p>
<p>Although this post could not possibly relay everything there is to know about software engineering, my hope is that simply being exposed to the lexicon will serve as a springboard to further learning.
To those looking down such a path, I say: you will make many many mistakes, as one always does at the outset of a new venture, but that’s all part of learning.</p>
<h2 id="notes">Notes</h2>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>While these tips are language-agnostic, they would be particularly helpful for anyone interested in learning or improving with <a href="https://julialang.org/">Julia</a>. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>Examples of high quality packages include <a href="https://github.com/psf/requests">Requests</a> in Python, and <a href="https://github.com/invenia/NamedDims.jl">NamedDims.jl</a> in Julia. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Glenn MoynihanImpacts of statewide lockdowns on the United States power grid2020-05-13T00:00:00+00:002020-05-13T00:00:00+00:00https://invenia.github.io/blog/2020/05/13/covid-part3<p>In the first two parts of this series we looked at the effects of national lockdowns on electricity usage and production, and in particular how European energy demand has <a href="https://invenia.github.io/blog/2020/03/31/covid-part1/">decreased significantly</a>. This decrease has caused a sharp decline in power production from fossil fuels compared to renewable sources and led to emissions reductions equivalent to the annual carbon footprint of approximately <a href="https://invenia.github.io/blog/2020/04/17/covid-part2/">1 million people</a>.<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup></p>
<p>In this post, we turn our attention to the United States and review how the changing patterns of energy use have impacted planning and the environmental efficiency of electricity systems. We’ll then look at how the generation fuel mix has changed due to a rare abundance of available energy, and the effects this has on wholesale electricity markets.</p>
<p>As of May 7th 2020, the United States as a whole has been the country <a href="https://coronavirus.jhu.edu/map.html">worst affected</a> by Covid-19. However, statewide lockdowns have differed in both when they were introduced, and the severity of the imposed restrictions. The first states to issue stay-at-home orders did so on March 21st, and by April 5th a large majority of the US was under partial or full <a href="https://en.wikipedia.org/wiki/U.S._state_and_local_government_response_to_the_COVID-19_pandemic">lockdown</a>. With this in mind we first look at how these restrictions have affected the demand for electricity.</p>
<h2 id="declining-demand">Declining Demand</h2>
<p><img src="/blog/public/images/covid3-demand_heatmap_long.png" alt="Percentage demand change" />
Figure 1: Percentage difference in demand by hour of day in March and April 2020 compared to 2019. A linear temperature adjustment has been applied to correct for demand reductions due to increases in temperature.<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup></p>
<p>As was the case for many European countries, demand for electricity has decreased across most of the United States during April 2020 (Figure 1). Electricity demand recorded by the transmission operator for New York (<a href="https://www.nyiso.com">NYISO</a>) shows the largest decreases between the hours of 8am and 7pm, when many people would usually be either commuting or at their workplace. This data suggests that the <em>decrease</em> in industrial and commercial electricity demand from the closure of non-essential businesses likely outweighs the <em>increase</em> in residential energy use we would expect now that many people are working from home.</p>
<p>For <a href="https://www.misoenergy.org">MISO</a> and <a href="https://www.pjm.com">PJM</a>, who oversee electricity systems in central and eastern states, we see reduced demand across April, with MISO showing the largest decline during the morning peak hours. The demand reported by the California Independent System Operator (<a href="http://www.caiso.com/Pages/default.aspx">CAISO</a>) also shows reductions between the hours of 10am and 7pm. However, it is likely that a large part of this effect can be attributed to <em>increases</em> in <em>behind-the-meter</em> solar generation<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup> rather than to the statewide lockdown itself. In Texas and the southwest, the figures reported by <a href="http://www.ercot.com">ERCOT</a> and <a href="https://spp.org">SPP</a> break the trend and show <em>increased</em> demand over several weeks in April. Part of this increase is likely due to higher April temperatures in the south causing a rise in the use of residential air conditioning.</p>
<p><img src="/blog/public/images/covid3-Demand_drops_long.png" alt="Average of electricity demand" />
Figure 2: 7 day rolling average of electricity demand in March and April 2020 (purple), 2019 (orange).</p>
<p>Taking a step back to look at the long term trend in electricity demand (see Figure 2), we observe a clear turning point for some of the ISOs, coinciding with the order of statewide lockdowns. For MISO and SPP, we see a similar demand for electricity during the majority of March, after which the demand this year begins plummeting in comparison to 2019 levels. In California, CAISO were initially reporting greater demand than in 2019; however, as <a href="https://en.wikipedia.org/wiki/U.S._state_and_local_government_response_to_the_COVID-19_pandemic">restrictions were introduced</a> during the latter half of March, a declining trend emerges which puts the 2020 demand well below what we saw last year. For the eastern ISOs, namely NYISO and PJM, the effect of lockdown on demand is less apparent, and in Texas, the demand reported by ERCOT in 2020 looks to be above the levels seen in 2019. Again, we can attribute much of this observed demand increase to a greater need for residential air conditioning, as the eastern and southern states have been experiencing a <a href="https://www.ncdc.noaa.gov/sotc/national/202003">hot</a> start to spring.</p>
<p>Whilst demand for electricity has decreased in the US, the effect of statewide lockdowns is not as striking as that seen in <a href="https://invenia.github.io/blog/2020/03/31/covid-part1/">Europe</a>. This may be because different states overseen by the same ISO introduced restrictions varying in both severity and timing, in contrast to the national lockdowns implemented in Europe. Nevertheless, daily patterns of energy consumption are far from the norm, and we next go on to explore how these changes are affecting electricity system planning.</p>
<h2 id="how-are-the-isos-responding-to-the-unusual-changes-to-demand-for-electricity">How are the ISOs responding to the unusual changes to demand for electricity?</h2>
<p>With the uncertainties in demand for electricity caused by statewide lockdowns, combined with the usual uncertainties inherent to the weather, demand forecasting has become more difficult for many of the ISOs. We find that this difficulty has led to considerably larger demand forecast errors (see Figure 3) than we have seen in previous years.</p>
<p><img src="/blog/public/images/covid3-ISO_forecast_error.png" alt="Average demand forecast error" />
Figure 3: Hourly average weekday percent demand forecast error (\(\frac{\text{demand} - \text{forecast}}{\text{demand}}\)) over the final weeks in March and April. The 2017–2019 average <em>over the same period</em> is shown in blue.</p>
<p>The errors for MISO, PJM and NYISO show a clear negative spike at 6am. This spike is due to over-forecasting the ramp up towards the morning demand peak, which occurs around 9am/10am. With the shelter-in-place orders issued, these ISOs are struggling to adjust to the new morning patterns of a population now working at home. SPP looks to be less affected until the week beginning April 20th, when we see chronic over-forecasting throughout all hours of the day. This pattern is also apparent in Figure 1, where the demand for electricity in SPP was relatively unaffected until the final two weeks of April.</p>
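<p>For concreteness, the error metric from Figure 3 can be computed as follows; the hourly values below are made up for illustration and are not the ISOs’ actual figures:</p>

```python
def percent_forecast_error(demand, forecast):
    """Percent demand forecast error: (demand - forecast) / demand."""
    return [(d - f) / d * 100 for d, f in zip(demand, forecast)]

# Illustrative hourly values (MW). Over-forecasting produces negative
# errors, like the 6am spikes seen for MISO, PJM and NYISO.
demand = [10_000, 9_500, 11_000]
forecast = [10_200, 10_000, 10_800]
errors = percent_forecast_error(demand, forecast)
```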
<h2 id="how-are-these-uncertainties-affecting-the-efficiency-of-electricity-systems">How are these uncertainties affecting the efficiency of electricity systems?</h2>
<p>As we discussed in the <a href="https://invenia.github.io/blog/2020/03/31/covid-part1/">first post</a> of this series, incorrect demand forecasts can reduce both the economic and environmental efficiency of electricity grids by causing divergence between the <a href="https://learn.pjm.com/three-priorities/buying-and-selling-energy/energy-markets.aspx">day ahead and real-time energy markets</a>, and increasing the volume of energy <a href="https://leveltenenergy.com/blog/ppa-risk-management/renewable-energy-curtailment/">curtailed</a> to mitigate line congestion. Whilst we do not observe unusually large deviations between day ahead and real-time energy prices, we are seeing <em>record-breaking</em> curtailment of combined wind and solar energy in California. Curtailment by CAISO has been growing year on year due to rising solar capacity, and <a href="https://www.greentechmedia.com/articles/read/california-renewable-curtailments-spike-as-coronavirus-reduces-demand">previous reports</a> indicated that this year would be no different.</p>
<p>Figure 4 (right) shows that curtailments in CAISO have reached ~300GWh in April 2020, soaring above the <a href="https://www.spglobal.com/platts/en/market-insights/latest-news/electric-power/022520-curtailments-rising-with-renewables-increasing-on-the-cal-iso-grid">previous record</a> of ~223GWh set in May 2019. To put this volume of curtailed energy into perspective, the average annual household consumption in the <a href="https://www.eia.gov/tools/faqs/faq.php?id=97&t=3">US</a> is ~11,000kWh, which means the volume of curtailed renewable power in April is equivalent to the annual electricity usage of almost 30,000 homes. Looking at Figure 4 (left) we see that the volume of solar power produced in the first four months of 2020 is only marginally higher than the production levels seen in 2019. It is more likely that the combination of <em>increased</em> intermittent power production with <em>decreased</em> demand due to statewide lockdown has forced CAISO to curtail more renewable power than ever before.</p>
<p><img src="/blog/public/images/covid3-curtail.png" alt="Solar generation" />
Figure 4: Left: monthly solar generation in CAISO, right: Solar and wind curtailments in CAISO throughout 2019 and 2020.</p>
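<p>The “almost 30,000 homes” figure quoted above is a simple unit conversion, which we can check back-of-envelope:</p>

```python
# Back-of-envelope check of the curtailment figures quoted above.
curtailed_gwh = 300              # ~300 GWh curtailed by CAISO in April 2020
household_kwh_per_year = 11_000  # average annual US household consumption

curtailed_kwh = curtailed_gwh * 1_000_000  # 1 GWh = 1,000,000 kWh
homes_equivalent = curtailed_kwh / household_kwh_per_year
# ~27,000 homes, i.e. "almost 30,000" as stated in the text.
```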
<p>Growing curtailment of power has long been a <a href="https://www.greentechmedia.com/articles/read/californias-flood-of-green-energy-could-drive-a-record-6-to-8-gigawatts-of">concern</a> for those in the control room at CAISO. This increase in wasted power highlights a lack of flexibility in the system in dealing with oversupply, and the effect of the current pandemic on reducing energy demand has amplified this further. To mitigate against additional growth in curtailment of renewable energy, and ensure less energy is wasted, utilities in California have set <a href="https://www.greentechmedia.com/articles/read/southern-california-edison-picks-770mw-of-energy-storage-projects-to-be-built-by-next-year">ambitious targets</a> to increase grid storage capacity.</p>
<h2 id="how-is-the-generation-mix-changing-to-accommodate-reduced-demand">How is the generation mix changing to accommodate reduced demand?</h2>
<p>As we saw in <a href="https://invenia.github.io/blog/2020/04/17/covid-part2/">Europe</a>, coal and gas based generation of electricity has declined due to national lockdowns. We find a similar pattern for PJM and MISO (see Figure 5), where the hourly average production from coal has decreased by ~7GW<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup> for both markets. Whilst some of this decrease is due to <a href="https://www.pjm.com/planning/services-requests/gen-deactivations.aspx">planned coal plant retirements</a> making the way for cheaper generation technologies, the demand reductions we observe across the US are most likely to impact coal production before any other resource. This is because electricity markets are based on a type of auction, with the lowest cost generators receiving priority to serve demand. Coal powered generation is more <a href="https://eu.usatoday.com/story/news/2019/06/04/climate-change-coal-now-more-expensive-than-wind-solar-energy/1277637001/">expensive</a> compared to natural gas and renewable production, which means coal power plants will be edged out of the auction when there is less demand for electricity.</p>
<p><img src="/blog/public/images/covid3-gen_drops_US.png" alt="Generation changes" />
Figure 5: Average hourly change in production by fuel type between April 2020 and April 2019.</p>
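<p>The auction logic described above is often called merit-order dispatch: generators are stacked from cheapest to most expensive until demand is met. A toy sketch (the generator names, capacities, and costs here are invented for illustration) shows how lower demand edges out the most expensive resource:</p>

```python
# Illustrative generator stack: costs in $/MWh, capacities in MW.
generators = [
    {"name": "wind", "capacity_mw": 3_000, "cost": 0},
    {"name": "gas",  "capacity_mw": 5_000, "cost": 25},
    {"name": "coal", "capacity_mw": 4_000, "cost": 35},
]

def dispatch(demand_mw, generators):
    """Return MW dispatched per generator, cheapest first."""
    schedule = {}
    remaining = demand_mw
    for g in sorted(generators, key=lambda g: g["cost"]):
        dispatched = min(g["capacity_mw"], remaining)
        schedule[g["name"]] = dispatched
        remaining -= dispatched
    return schedule

# With demand at 10 GW, coal runs at 2 GW; drop demand to 7.5 GW and
# coal is edged out entirely -- the pattern described in the text.
high = dispatch(10_000, generators)
low = dispatch(7_500, generators)
```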
<p>Looking at real time market wide energy prices, for most ISOs we can see that leading up to the outbreak of Covid-19, electricity prices were already down compared to 2019 levels (Figure 6). Coming into April, most ISOs had lower prices than at any time this year, and whilst prices continue to be low for some of the ISOs, an upwards trend emerges throughout the lockdown period.</p>
<p>The marketwide energy price is dependent on the least-cost generators available, and there are several factors, such as changing weather conditions and fluctuations in fuel supply, which can affect the generators available to serve demand. As an example, natural gas is one of the <a href="https://www.bbc.co.uk/news/business-45881551">cheapest fuels</a> for electricity generation. However, fluctuations in crude oil prices can have large impacts on the cost of natural gas generation, because a form of gas fuel, known as associated gas, is a <a href="https://energy.economictimes.indiatimes.com/news/oil-and-gas/irony-oil-price-collapse-helps-u-s-natural-gas-market/74559815">by-product of crude oil</a>. With the recent <a href="https://www.bbc.co.uk/news/business-52350082">crash</a> in oil prices where we saw negative oil futures for the first time in history, the production of associated gas has likely decreased throughout April. In the US, where close to <a href="https://www.eia.gov/tools/faqs/faq.php?id=427&t=3">40%</a> of generation is based on natural gas sources, a reduced supply of natural gas may be causing rises in electricity prices.</p>
<p><img src="/blog/public/images/covid3-MEC_rol_average.png" alt="Marginal energy cost" />
Figure 6: 7 day rolling average of the real time market-wide marginal energy cost for 2020 (purple) and 2019 (orange).</p>
<h2 id="conclusions">Conclusions</h2>
<p>In this final post in our series, we have seen how electricity systems in the United States have responded to the changing patterns of electricity use induced by statewide lockdowns. We find demand forecasting has become more difficult for many of the ISOs, with CAISO forced to curtail <em>record</em> amounts of renewable power to manage oversupply and mitigate line overloading. Further, as was the case for Europe, we see coal generation taking the brunt of the decline in energy production, as lower cost generators are able to serve the depleted load. Finally, we explored how wholesale electricity prices have been affected by these changes. Whilst prices during lockdown have been the lowest seen this year, confounding factors such as the recent crash in oil prices may have led to rising wholesale electricity prices throughout April.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Our estimates only account for nine of the European countries most heavily affected by Covid-19. A later report performs a similar analysis for the whole of the EU27+UK, finding emissions reductions equivalent to that of <a href="https://www.carbonbrief.org/analysis-coronavirus-has-cut-co2-from-europes-electricity-system-by-39-per-cent">roughly 5 million people</a>. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>The linear temperature correction we apply is valid in mild temperatures when energy demand decreases as the demand for heating is reduced. When temperature exceeds ~20˚C, the relationship is <a href="https://www.sciencedirect.com/science/article/pii/S0306261918311437">more complex</a>, and was not adjusted for in this analysis. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>Behind-the-meter generation is power production that is not controlled by the electricity system operator. In the case of solar power, this is often from residential solar panels. This shows up as demand reductions, as more people are providing power for themselves, using their onsite power generation systems. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>The largest coal plants in the US have <a href="https://www.carbonbrief.org/mapped-worlds-coal-power-plants">~3.5GW capacity</a>, so the reductions we see correspond to two of the largest plants halting operations. The annual emissions of these plants is roughly 16Mt of CO₂, meaning that closure for one month would correspond to 1.3Mt of CO₂ emissions saved per coal plant. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Ian Goddard