FLOW ANNEALED IMPORTANCE SAMPLING BOOTSTRAP

Abstract

Normalizing flows are tractable density models that can approximate complicated target distributions, e.g. Boltzmann distributions of physical systems. However, current methods for training flows either suffer from mode-seeking behavior, use samples from the target generated beforehand by expensive MCMC methods, or use stochastic losses that have high variance. To avoid these problems, we augment flows with annealed importance sampling (AIS) and minimize the mass-covering α-divergence with α = 2, which minimizes importance weight variance. Our method, Flow AIS Bootstrap (FAB), uses AIS to generate samples in regions where the flow is a poor approximation of the target, facilitating the discovery of new modes. We apply FAB to multimodal targets and show that we can approximate them very accurately where previous methods fail. To the best of our knowledge, we are the first to learn the Boltzmann distribution of the alanine dipeptide molecule using only the unnormalized target density, without access to samples generated via Molecular Dynamics (MD) simulations: FAB produces better results than training via maximum likelihood on MD samples while using 100 times fewer target evaluations. After reweighting the samples, we obtain unbiased histograms of dihedral angles that are almost identical to the ground truth.

1. INTRODUCTION

Approximating intractable distributions is a challenging task with relevance to many real-world applications. A prominent example is approximating the Boltzmann distribution of a given molecule. In this case, the unnormalized density can be obtained by physical modeling and is given by e^{-u(x)}, where x are the 3D atomic coordinates and u(·) returns the dimensionless energy of the system. Drawing independent samples from this distribution is difficult (Lelièvre et al., 2010). It is typically done by running expensive Molecular Dynamics (MD) simulations (Leimkuhler & Matthews, 2015), which yield highly correlated samples and require long simulation times.

An alternative is given by normalizing flows: tractable density models parameterized by neural networks. They can generate a batch of independent samples with a single forward pass, and any bias in the samples can be eliminated by reweighting via importance sampling. Flows that approximate Boltzmann distributions are called Boltzmann generators (Noé et al., 2019). Recently, there has been growing interest in these methods (Dibak et al., 2022; Köhler et al., 2021; Liu et al., 2022), as they have the potential to avoid the limitations of MD simulations. Most current approaches to training Boltzmann generators rely on MD samples, since these are required to estimate the flow parameters by maximum likelihood (ML) (Wu et al., 2020). Alternatively, flows can be trained without MD samples by minimizing the Kullback-Leibler (KL) divergence with respect to the target distribution. Wirnsberger et al. (2022) followed this approach to approximate the
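The reweighting step mentioned above can be sketched concretely. Below is a minimal, self-contained illustration of self-normalized importance sampling against an unnormalized Boltzmann density e^{-u(x)}: it uses a toy 1D double-well energy and a Gaussian stand-in for a trained flow, so all functions and constants here are illustrative assumptions, not the paper's actual model or target.

```python
import numpy as np

rng = np.random.default_rng(0)

def u(x):
    # Toy dimensionless energy: a 1D double well with minima at x = ±1
    # (an illustrative stand-in for a molecular energy function).
    return (x**2 - 1.0) ** 2 / 0.2

# Stand-in for a trained flow q(x): a broad Gaussian with a tractable log-density.
SIGMA = 1.5

def flow_sample(n):
    return rng.normal(0.0, SIGMA, size=n)

def flow_log_prob(x):
    return -0.5 * (x / SIGMA) ** 2 - np.log(SIGMA * np.sqrt(2 * np.pi))

# Draw a batch of independent samples in one shot, then compute
# self-normalized importance weights w_i ∝ exp(-u(x_i)) / q(x_i).
x = flow_sample(100_000)
log_w = -u(x) - flow_log_prob(x)
w = np.exp(log_w - log_w.max())  # stabilize before normalizing
w /= w.sum()

# Reweighted estimate of E[x^2] under the Boltzmann target; the weights
# remove the bias introduced by sampling from the approximate flow.
est = np.sum(w * x**2)
```

Because the weights are normalized within the batch, the estimator is consistent even though the target's normalizing constant is unknown; this is exactly why flow samples can be debiased after training.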

