How is the vector-Jacobian product invoked in Neural ODEs?
This post unpacks a claim made in "Deriving the Adjoint Equation for Neural ODEs Using Lagrange Multipliers": that the vector-Jacobian product $\lambda^\intercal \frac{\partial f}{\partial z}$ can be computed efficiently without explicitly constructing the Jacobian $\frac{\partial f}{\partial z}$. The claim appears in that post's "Solving PL, PG, PM with Good Lagrange Multiplier" section, and this post was prompted by a question about it in the comments there....
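To make the claim concrete, here is a minimal sketch in plain Python. It assumes a toy dynamics function $f(z) = W_2 \tanh(W_1 z)$; the names `W1`, `W2`, `f`, `vjp` and the numbers are hypothetical and chosen only for illustration, not taken from the original post. The idea is that $\lambda^\intercal \frac{\partial f}{\partial z} = \lambda^\intercal W_2 \,\mathrm{diag}(1 - \tanh^2(W_1 z))\, W_1$ can be evaluated left to right, so every step is a vector-matrix product and the full Jacobian is never formed. The explicit Jacobian is built here only to check the result.

```python
import math

def matvec(M, v):
    """Matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def vecmat(v, M):
    """Vector-matrix product v^T M."""
    return [sum(v[i] * M[i][j] for i in range(len(M))) for j in range(len(M[0]))]

def f(W1, W2, z):
    """Hypothetical toy dynamics f(z) = W2 tanh(W1 z)."""
    h = [math.tanh(a) for a in matvec(W1, z)]
    return matvec(W2, h)

def vjp(W1, W2, z, lam):
    """Compute lam^T (df/dz) by pushing lam backward through each operation."""
    h = [math.tanh(a) for a in matvec(W1, z)]        # forward pass, keep hidden state
    u = vecmat(lam, W2)                              # back through W2: lam^T W2
    s = [u_k * (1.0 - h_k * h_k) for u_k, h_k in zip(u, h)]  # back through tanh
    return vecmat(s, W1)                             # back through W1

def explicit_jacobian(W1, W2, z):
    """Reference only: form the full Jacobian W2 diag(1 - h^2) W1."""
    h = [math.tanh(a) for a in matvec(W1, z)]
    D_W1 = [[(1.0 - h_k * h_k) * w for w in row] for h_k, row in zip(h, W1)]
    return [vecmat(row, D_W1) for row in W2]         # matrix-matrix product

# Assumed example values: z in R^2, hidden width 3.
W1 = [[0.5, -1.0], [2.0, 0.3], [-0.7, 0.9]]
W2 = [[1.0, 0.0, -0.5], [0.2, -1.5, 0.4]]
z = [0.1, -0.2]
lam = [1.0, -0.5]

fast = vjp(W1, W2, z, lam)                            # no Jacobian built
slow = vecmat(lam, explicit_jacobian(W1, W2, z))      # Jacobian built first
print(fast)
print(slow)
```

The two printed vectors should agree up to floating-point rounding. Reverse-mode automatic differentiation performs the same backward sweep mechanically for an arbitrary differentiable $f$, which is why one backward pass suffices where building the Jacobian would take one pass per output dimension.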