POLYRETRO: FEW-SHOT POLYMER RETROSYNTHESIS VIA DOMAIN ADAPTATION

Abstract

Polymers appear everywhere in our daily lives (fabrics, plastics, rubbers, etc.), and we could hardly live without them. To make polymers, chemists develop processes that combine smaller building blocks (monomers) to form long chains or complex networks (polymers). These processes are called polymerizations and usually take a great deal of human effort to develop. Although machine learning models for small molecules have produced many promising results, the prediction problem for polymerization is new and suffers from the scarcity of available polymerization datasets. The problem is made even more challenging by the large size of polymers and by additional recursive constraints, neither of which arises in the small-molecule setting. In this paper, we take an initial step towards this challenge and propose a learning-based search framework that automatically identifies a sequence of reactions leading to the polymerization of a target polymer, using minimal polymerization data. Our method transfers retrosynthesis models trained on small-molecule datasets to check the validity of polymerization reactions. It also incorporates a template prior, learned from a limited amount of polymer data, to adapt the model from the small-molecule domain to the polymer domain. We demonstrate that our method proposes high-quality polymerization plans for a dataset of 52 real-world polymers, and for more than 50% of them it recovers the polymerization process currently used in practice.

1. INTRODUCTION

We live in a world of chemical products, among which one category, polymers, plays an essential role. From fabrics to plastics to rubbers, polymers appear in every corner of our daily lives. Different applications call for polymers with different properties, and chemists have spent tremendous effort designing and synthesizing new polymers in pursuit of better properties. To make polymers, chemists develop processes that combine small building blocks, called monomers, to form longer chains or complex networks. Such processes are called polymerizations and take a significant amount of human effort to develop.

Since the rise of deep learning (LeCun et al., 2015), applying these models to scientific problems, such as those in biology and chemistry, has gradually gathered attention. In particular, AI methods for the retrosynthetic design of chemical compounds have recently become very popular (Segler et al., 2018; Coley et al.). While most work focuses on synthesizing drug-like small molecules, the study of polymer retrosynthesis is still in its infancy. The reasons are manifold, but one of the most important is the lack of available polymerization datasets, which makes it difficult for existing learning-based methods to learn meaningful patterns for polymerization reactions. Moreover, polymers usually have a chain or network structure with repeat units, which is very different from small molecules. This additional constraint also introduces difficulties in the formulation and modeling of polymer design and retrosynthesis.

In this paper, we focus specifically on the polymer retrosynthesis problem. While there has been a series of works focusing on small-molecule retrosynthesis (Corey & Wipke, 1969; Gasteiger et al., 1992; Coley et al., 2017; Liu et al., 2017; Segler & Waller, 2017; Segler et al., 2018; Coley et al.;

