How is the vector-Jacobian product invoked in Neural ODEs

This post tries to explicate the claim in Deriving the Adjoint Equation for Neural ODEs Using Lagrange Multipliers that the vector-Jacobian product $\lambda^\intercal \frac{\partial f}{\partial z}$ can be calculated efficiently without explicitly constructing the Jacobian $\frac{\partial f}{\partial z}$. The claim is made in the Solving PL, PG, PM with Good Lagrange Multiplier section, and this post is inspired by a question asked about it in the comments of that post....
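
As a rough illustration of that claim (a minimal sketch, not the post's code, assuming PyTorch and placeholder shapes and dynamics), reverse-mode autodiff returns $\lambda^\intercal \frac{\partial f}{\partial z}$ in a single backward pass by setting `grad_outputs` to $\lambda$, without ever materializing $\frac{\partial f}{\partial z}$:

```python
import torch

def f(z, t):
    # Stand-in for the Neural ODE dynamics; any differentiable map works here.
    return torch.tanh(z) * t

z = torch.randn(5, requires_grad=True)
t = torch.tensor(0.5)
lam = torch.randn(5)  # the adjoint / Lagrange-multiplier vector

out = f(z, t)
# grad_outputs=lam makes autograd compute lam^T (df/dz) directly, at roughly
# the cost of one extra evaluation of f -- no 5x5 Jacobian is ever built.
vjp, = torch.autograd.grad(out, z, grad_outputs=lam)
print(vjp.shape)  # torch.Size([5])
```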

February 21, 2020 · 9 min · 1787 words · Vaibhav Patel

Deriving the Adjoint Equation for Neural ODEs using Lagrange Multipliers

A Neural ODE [1] expresses its output as the solution to a dynamical system whose evolution function is a learnable neural network. In other words, a Neural ODE models the transformation from input to output as a learnable ODE. Since our model is a learnable ODE, we use an ODE solver to evolve the input to an output in the forward pass and calculate a loss. For the backward pass, we would like to simply store the function evaluations of the ODE solver and then backprop through them to calculate the loss gradient....
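
A minimal sketch of that naive forward/backward pattern (assuming PyTorch, a fixed-step Euler solver, and an illustrative network and loss that are not the post's code): every solver step stays in the autograd graph, so `loss.backward()` differentiates through all stored function evaluations.

```python
import torch
import torch.nn as nn

# Learnable dynamics dz/dt = dynamics(z); sizes are arbitrary for illustration.
dynamics = nn.Sequential(nn.Linear(2, 16), nn.Tanh(), nn.Linear(16, 2))

def func(z, t):
    return dynamics(z)

def odeint_euler(func, z0, t0=0.0, t1=1.0, steps=20):
    # Fixed-step Euler solve; each evaluation of func stays in the autograd graph.
    dt = (t1 - t0) / steps
    z = z0
    for i in range(steps):
        t = t0 + i * dt
        z = z + dt * func(z, t)
    return z

z0 = torch.randn(8, 2)            # batch of inputs
target = torch.randn(8, 2)
z1 = odeint_euler(func, z0)       # forward pass: solve the ODE
loss = ((z1 - target) ** 2).mean()
loss.backward()                   # backprop through every stored solver step
print(dynamics[0].weight.grad.shape)  # torch.Size([16, 2])
```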

February 4, 2020 · 14 min · 2932 words · Vaibhav Patel