PRE-TRAINING VIA DENOISING FOR MOLECULAR PROPERTY PREDICTION

Abstract

Many important problems involving molecular property prediction from 3D structures have limited data, posing a generalization challenge for neural networks. In this paper, we describe a pre-training technique based on denoising that achieves a new state-of-the-art in molecular property prediction by utilizing large datasets of 3D molecular structures at equilibrium to learn meaningful representations for downstream tasks. Relying on the well-known link between denoising autoencoders and score-matching, we show that the denoising objective corresponds to learning a molecular force field (arising from approximating the Boltzmann distribution with a mixture of Gaussians) directly from equilibrium structures. Our experiments demonstrate that using this pre-training objective significantly improves performance on multiple benchmarks, achieving a new state-of-the-art on the majority of targets in the widely used QM9 dataset. Our analysis then provides practical insights into the effects of different factors on pre-training: dataset sizes, model size and architecture, and the choice of upstream and downstream datasets.

1. INTRODUCTION

The success of the best performing neural networks in vision and natural language processing (NLP) relies on pre-training the models on large datasets to learn meaningful features for downstream tasks (Dai & Le, 2015; Simonyan & Zisserman, 2014; Devlin et al., 2018; Brown et al., 2020; Dosovitskiy et al., 2020). For molecular property prediction from 3D structures (a point cloud of atomic nuclei in R^3), the problem of how to similarly learn such representations remains open. For example, none of the best models on the widely used QM9 benchmark use any form of pre-training (e.g. Klicpera et al., 2020a; Liu et al., 2022b; Schütt et al., 2021; Thölke & De Fabritiis, 2022), in stark contrast with vision and NLP. Effective methods for pre-training could have a significant impact on fields such as drug discovery and materials science. In this work, we focus on the problem of how large datasets of 3D molecular structures can be utilized to improve performance on downstream molecular property prediction tasks that also rely on 3D structures as input. We address the question: how can one exploit large datasets like PCQM4Mv2,¹ which contains over 3 million structures, to improve performance on datasets such as DES15K that are orders of magnitude smaller? Our answer is a form of self-supervised pre-training that generates useful representations for downstream prediction tasks, leading to state-of-the-art (SOTA) results. Inspired by recent advances in noise regularization for graph neural networks (GNNs) (Godwin et al., 2022), our pre-training objective is based on denoising in the space of structures (and is hence self-supervised). Unlike existing pre-training methods, which largely focus on 2D graphs, our approach targets the setting where the downstream task involves 3D point clouds defining the molecular structure.
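To make the denoising objective concrete, the following is a minimal sketch (not the paper's implementation): perturb the equilibrium 3D coordinates with Gaussian noise and train a model to predict the noise from the perturbed structure. The `predict_noise` callable is a hypothetical stand-in for the equivariant GNN used in practice.

```python
import numpy as np

def denoising_loss(positions, predict_noise, sigma=0.05, rng=None):
    """Self-supervised denoising objective on 3D molecular coordinates.

    positions:     (N, 3) array of equilibrium atomic coordinates.
    predict_noise: callable mapping noisy (N, 3) coordinates to a per-atom
                   (N, 3) noise estimate (a GNN in a real implementation).
    sigma:         standard deviation of the coordinate perturbation.
    """
    rng = np.random.default_rng(rng)
    noise = rng.normal(scale=sigma, size=positions.shape)  # eps ~ N(0, sigma^2 I)
    noisy = positions + noise                              # perturbed structure
    pred = predict_noise(noisy)                            # model's noise estimate
    return float(np.mean((pred - noise) ** 2))             # MSE denoising loss

# Toy predictor standing in for a trained GNN: it always predicts zero noise,
# so the loss reduces to the mean squared noise magnitude (roughly sigma^2).
coords = np.zeros((8, 3))
loss = denoising_loss(coords, lambda x: np.zeros_like(x), sigma=0.05, rng=0)
```

In practice the loss is averaged over a large dataset of equilibrium structures (e.g. PCQM4Mv2), and the learned encoder is then fine-tuned on the downstream property prediction task.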
Relying on the well-known connection between denoising and score-matching (Vincent, 2011; Song & Ermon, 2019; Ho et al., 2020), we show that the denoising objective is equivalent to learning a particular force field, adding a new interpretation of denoising in the context of molecules and shedding light on how it aids representation learning.
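The connection can be sketched as follows (following Vincent, 2011; the notation here, e.g. the score network $s_\theta$, is illustrative rather than the paper's). Perturbing an equilibrium structure $x$ with Gaussian noise gives

$$q_\sigma(\tilde{x} \mid x) = \mathcal{N}(\tilde{x};\, x,\, \sigma^2 I), \qquad \nabla_{\tilde{x}} \log q_\sigma(\tilde{x} \mid x) = -\frac{\tilde{x} - x}{\sigma^2},$$

and denoising score matching minimizes

$$\mathbb{E}_{x}\, \mathbb{E}_{\tilde{x} \sim q_\sigma(\cdot \mid x)} \left\| s_\theta(\tilde{x}) + \frac{\tilde{x} - x}{\sigma^2} \right\|^2,$$

so predicting the noise is, up to scaling, estimating the score of the Gaussian mixture centered at the equilibrium structures. If that mixture is taken as an approximation to the Boltzmann distribution $p(x) \propto \exp(-E(x)/k_B T)$, the estimated score is proportional to $-\nabla_x E(x)$, i.e., a force field.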



¹ Note that PCQM4Mv2 is a new version of PCQM4M that now offers 3D structures.

