DON'T THROW YOUR OLD POLICIES AWAY: KNOWLEDGE-BASED POLICY RECYCLING PROTECTS AGAINST ADVERSARIAL ATTACKS

Abstract

Recent work has shown that Deep Reinforcement Learning (DRL) is vulnerable to adversarial attacks, in which minor perturbations of input signals cause agents to behave inappropriately and unexpectedly. Humans, on the other hand, appear robust to these particular sorts of input variations. We posit that part of this robustness stems from accumulated knowledge about the world. In this work, we propose to leverage prior knowledge to defend against adversarial attacks in RL settings using a framework we call Knowledge-based Policy Recycling (KPR). Unlike previous defense methods such as adversarial training and robust learning, KPR incorporates domain knowledge over a set of auxiliary task policies and learns relations among them from interactions with the environment via a Graph Neural Network (GNN). KPR can use any relevant policy as an auxiliary policy and, importantly, does not assume access to, or information about, the adversarial attack. Empirically, KPR results in policies that are more robust to various adversarial attacks in Atari games and a simulated Robot Foodcourt environment.

1. INTRODUCTION

Despite significant performance breakthroughs in recent years (e.g., Mnih et al., 2015; Silver et al., 2016; Berner et al., 2019), Deep Reinforcement Learning (DRL) policies can be brittle. Specifically, recent works have shown that DRL policies are vulnerable to adversarial attacks: adversarially manipulated inputs (e.g., images) of small magnitude can cause RL agents to take incorrect actions (Ilahi et al., 2022; Chen et al., 2019; Behzadan & Munir, 2017; Oikarinen et al., 2021; Lee et al., 2021; Chan et al., 2020; Bai et al., 2018). To counter such attacks, recent work has proposed a range of defense strategies including adversarial training (Oikarinen et al., 2021; Behzadan & Munir, 2018; Han et al., 2018), robust learning (Mandlekar et al., 2017; Smirnova et al., 2019; Pan et al., 2019), defensive distillation (Rusu et al., 2016), and adversarial detection (Gallego et al., 2019a; Havens et al., 2018). While these defense methods can be effective, each has its limitations: adversarial training and adversarial detection require specific knowledge about the attacker; robust learning adds noise during agent training, which can degrade performance (Tsipras et al., 2019; Yang et al., 2020); and defensive distillation is typically unable to protect against diverse adversarial attacks (Carlini & Wagner, 2016; Soll et al., 2019). In this work, we explore an alternative defense strategy that exploits existing knowledge encoded in auxiliary task policies and known relationships between the policies. The key intuition underlying our approach is that existing task policies encode learnt low-level knowledge regarding the environment (e.g., possible observations, dynamics), whilst high-level specifications can provide guidance for transfer or generalization.
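To make the threat model concrete, the following sketch shows a gradient-sign (FGSM-style) attack on a toy linear policy. All names, dimensions, and the attack budget here are hypothetical illustrations, not part of any cited method; for a linear policy the gradient of the action margin with respect to the observation is available in closed form, whereas attacking a deep policy would require backpropagation through the network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear policy: action = argmax(W @ obs).
# (Hypothetical stand-in for a trained DRL policy.)
n_actions, obs_dim = 4, 16
W = rng.normal(size=(n_actions, obs_dim))
obs = rng.normal(size=obs_dim)

clean_action = int(np.argmax(W @ obs))

# For a linear policy, the gradient of the margin
# logits[clean] - logits[target] w.r.t. the observation is
# simply W[clean] - W[target]. Stepping the observation against
# the sign of this gradient shrinks the margin.
target = (clean_action + 1) % n_actions
grad = W[clean_action] - W[target]

# Smallest per-feature budget that pushes the margin below zero,
# i.e. just enough to cross the decision boundary.
margin = float(grad @ obs)
epsilon = margin / np.abs(grad).sum() + 1e-3

adv_obs = obs - epsilon * np.sign(grad)
adv_action = int(np.argmax(W @ adv_obs))

print(clean_action, adv_action)  # the selected action changes
```

The point of the sketch is that the required perturbation is bounded per feature by `epsilon`, which is typically small relative to the observation scale, mirroring the "small magnitude" attacks described above.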
Our approach is to leverage known and learnt relations between different policies as structural priors for an ensemble of policies; our hypothesis is that while a single task policy can be attacked, perturbing inputs such that multiple policies are negatively affected in a consistent manner is more difficult. Our framework, which we call Knowledge-based Policy Recycling (KPR), is partially inspired by the use of domain knowledge to address vulnerabilities to adversarial attacks in supervised learning (Melacci et al., 2021; Gürel et al., 2021; Zhang et al., 2022). In these works, domain knowledge is encoded as logical formulae over predicted labels and a set of features. A soft satisfiability score between

