PIPS: PATH INTEGRAL STOCHASTIC OPTIMAL CONTROL FOR PATH SAMPLING IN MOLECULAR DYNAMICS

Abstract

We consider the problem of sampling transition paths: given two metastable conformational states of a molecular system, e.g. a folded and an unfolded protein, we aim to sample the most likely transition path between the two states. Sampling such a transition path is computationally expensive because high free energy barriers separate the two states. To circumvent this, previous work has focused on simplifying the trajectories so that they occur along specific molecular descriptors called Collective Variables (CVs). However, finding CVs is nontrivial and requires chemical intuition. For larger molecules, where intuition is not sufficient, using these CV-based methods biases the transition along possibly irrelevant dimensions. In this work, we propose a method for sampling transition paths that considers the entire geometry of the molecules. We achieve this by relating the problem to recent work on the Schrödinger bridge problem and stochastic optimal control. Using this relation, we construct a path integral method that incorporates important characteristics of molecular systems such as second-order dynamics and invariance to rotations and translations. We demonstrate our method on commonly studied protein structures like Alanine Dipeptide, and also consider larger proteins such as Polyproline and Chignolin.

1. INTRODUCTION

Modeling non-equilibrium systems in the natural sciences involves analyzing dynamical behaviour that occurs with very low probability, known as rare events, i.e. particular instances of the dynamical system that are atypical. The kinetics of many important molecular processes, such as phase transitions, protein folding, conformational changes, and chemical reactions, are all dominated by these rare events. One way to sample these rare events is to follow the time evolution of the underlying dynamical system using Molecular Dynamics (MD) simulations until a reasonable number of events have been observed. However, this is computationally highly inefficient due to the large time-scales involved in MD simulations, which are typically related to the presence of high energy or entropy barriers between the metastable states. Thus, the main problem is: how can we efficiently sample trajectories between metastable states that give rise to these rare but interesting transition events? Numerous enhanced sampling methods, such as steered MD (Jarzynski, 1997), umbrella sampling (Torrie and Valleau, 1977), constrained MD (Carter et al., 1989), transition path sampling (Dellago and Bolhuis, 2009), and many more, have been developed to deal with the problem of rare events in molecular simulation. Most of these methods bias the dynamical system with well-chosen geometric descriptors of the transition (analogous to lower-dimensional features), called collective variables (CVs), that allow the system to overcome high-energy transition barriers and sample these rare events. The performance of these enhanced sampling techniques depends critically on the choice of these CVs. However, choosing appropriate CVs for all but the simplest molecular systems is fraught with difficulty, as it relies on human intuition, insights about the molecular system, and trial and error.
A key alternative for sampling these rare transition paths is to model an alternate dynamical system that allows sampling these rare trajectories in an optimal manner (Ahamed et al., 2006; Jack, 2020; Todorov, 2009), or to learn an optimal RL policy for such a transition system (Rose et al., 2021). In this paper, we consider the problem of sampling rare transition paths by developing an alternative dynamical system using path integral stochastic optimal control (Kappen, 2005; 2007; Kappen and Ruiz, 2016; Theodorou et al., 2010). Our method models the alternative dynamics of the system by applying an external control policy to each of the atoms in the molecule. We learn the external control policy such that it minimizes the amount of external work needed to overcome the lowest energy barrier and transition the molecular system from an initial metastable state to a final one. The method does not require any knowledge of CVs to sample these rare trajectories. Furthermore, we draw connections between sampling rare transition paths and the Schrödinger bridge problem (Schrödinger, 1931; 1932). Subsequently, we show that stochastic optimal control is well suited to solving these problems by extending the work of Kappen and Ruiz (2016) to molecular systems, incorporating Hamiltonian dynamics and equivariance constraints in our path integral SOC method. Our main contributions in this paper are:
• We demonstrate the equivalence between the problem of sampling transition paths, the Schrödinger bridge problem, and path integral stochastic optimal control (SOC) (§2).
• We develop PIPS, a path integral SOC method that incorporates second-order Hamiltonian dynamics with clear physical interpretations of the system (§3).
• In contrast to earlier work, PIPS does not require any knowledge of CVs, which is important for modeling large and complex molecular transitions for which CVs are unknown (§2-3).

2. PRELIMINARIES AND PROBLEM SETUP

Consider a system evolving over time, where $\pi(x)$ is the distribution of states $x$ and $\pi_i(x_i \mid x_{i-1})$ is a Markovian transition kernel. The distribution of trajectories generated by such a system is given by:

$$\pi(x(\tau)) := \pi(x_0) \cdot \prod_{i=1}^{\tau} \pi_i(x_i \mid x_{i-1}),$$

where $x(\tau)$ denotes a trajectory of length $\tau$, discretized over time into an ordered sequence of states $x(\tau) = \{x_0, x_1, \cdots, x_\tau\}$. The problem of sampling transition paths involves sampling trajectories from this distribution, $\pi(x(\tau))$, with the boundary condition that the initial state $x_0$ and terminal state $x_\tau$ are drawn from pre-specified marginal distributions $\pi_0$ and $\pi_\tau$, respectively. These marginal distributions describe the stable states of the molecular system located at the local minima of the free energy surface, e.g. these stable states can be the reactants and products of a chemical reaction, or the native and unfolded states of a protein. Thus, the marginal distributions defining the stable states can be viewed as Dirac delta distributions. Unfortunately, these stable states are often separated by high free energy barriers, which makes it unlikely that a trajectory $x(\tau)$ sampled starting from $x_0$ terminates in the target state $x_\tau$. In this paper, we construct a sampling approach that generates trajectories that are still likely under the distribution $\pi(x(\tau))$ while also adhering to the boundary conditions, crossing the high free energy barrier by incorporating relevant inductive biases of the system. Formally, we find an alternate dynamical system $\hat{\pi}(x(\tau))$ with marginals $\pi_0$ and $\pi_\tau$ that is as close to $\pi(x(\tau))$ as possible, i.e.

$$\pi^*(x(\tau)) := \arg\min_{\hat{\pi}(x(\tau)) \in D(\pi_0, \pi_\tau)} D_{KL}\big(\hat{\pi}(x(\tau)) \,\|\, \pi(x(\tau))\big),$$

where $D(\pi_0, \pi_\tau)$ is the space of path measures with marginals $\pi_0$ and $\pi_\tau$. This problem of learning an alternative dynamical system is also known as the Schrödinger Bridge Problem (SBP) (Schrödinger, 1931; 1932).
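To make the trajectory factorization concrete, the following minimal sketch evaluates the log-probability of a discretized path as the initial-state term plus a sum of Markov transition terms. Everything specific in it is an assumption chosen for illustration and is not taken from the paper: the one-dimensional double-well potential, the Gaussian transition kernel obtained from an Euler-Maruyama discretization of overdamped Langevin dynamics (the paper itself targets second-order dynamics), the parameter values, and the function names.

```python
import numpy as np

def potential_grad(x):
    # Gradient of a hypothetical double-well potential U(x) = (x^2 - 1)^2,
    # with minima at x = -1 and x = +1 and a barrier at x = 0.
    return 4.0 * x * (x**2 - 1.0)

def log_transition(x_next, x_prev, dt=1e-2, kT=0.1):
    # Assumed Gaussian kernel pi_i(x_i | x_{i-1}) from an Euler-Maruyama step:
    # x_i = x_{i-1} - grad U(x_{i-1}) * dt + sqrt(2 * kT * dt) * noise
    mean = x_prev - potential_grad(x_prev) * dt
    var = 2.0 * kT * dt
    return -0.5 * np.log(2.0 * np.pi * var) - (x_next - mean) ** 2 / (2.0 * var)

def log_path_prob(path, log_pi0, dt=1e-2, kT=0.1):
    # log pi(x(tau)) = log pi(x_0) + sum_i log pi_i(x_i | x_{i-1})
    path = np.asarray(path, dtype=float)
    return log_pi0(path[0]) + np.sum(log_transition(path[1:], path[:-1], dt, kT))

# The initial state is fixed (Dirac-like marginal), so its log term is set to 0.
log_pi0 = lambda x0: 0.0

stay = np.full(101, -1.0)             # path that remains in the left well
cross = np.linspace(-1.0, 1.0, 101)   # straight-line path over the barrier

lp_stay = log_path_prob(stay, log_pi0)
lp_cross = log_path_prob(cross, log_pi0)
print("log-prob (stay): ", lp_stay)
print("log-prob (cross):", lp_cross)
```

Under these assumed dynamics, the path that crosses the barrier receives a lower log-probability than the path that stays in the initial well, which is the sense in which unbiased sampling rarely reaches the target state.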
We thus take inspiration from recent computational advances for solving the SBP (Vargas et al., 2021a; De Bortoli et al., 2021) to develop, in §3, our solution to the problem of sampling transition paths that can efficiently cross high free energy barriers. Additionally, in this work, we propose an alternative approach to solving the SBP using path integral stochastic optimal control that lends itself well to modelling the chemical nature of our problem. In the next section, we will set the stage for this novel approach by first formulating the problem of sampling transition paths as a path integral stochastic optimal control problem. Subsequently, we will



• Because it uses second-order Hamiltonian dynamics, PIPS integrates seamlessly with common molecular dynamics frameworks such as OpenMM (Eastman et al., 2017).
• We demonstrate the efficacy of PIPS on conformational transitions in three molecular systems of varying complexity, namely Alanine Dipeptide, Polyproline, and Chignolin (§4).

