SEEING DIFFERENTLY, ACTING SIMILARLY: HETEROGENEOUSLY OBSERVABLE IMITATION LEARNING

Abstract

In many real-world imitation learning tasks, the demonstrator and the learner have to act under different observation spaces. This situation poses significant obstacles to existing imitation learning approaches, since most of them learn policies under homogeneous observation spaces. Moreover, prior studies that do consider different observation spaces make the strong assumption that the two observation spaces coexist throughout the entire learning process. In reality, however, such observation coexistence is limited due to the high cost of acquiring expert observations. In this work, we study this challenging problem of limited observation coexistence under heterogeneous observations: Heterogeneously Observable Imitation Learning (HOIL). We identify two underlying issues in HOIL, the dynamics mismatch and the support mismatch, and propose the Importance Weighting with REjection (IWRE) algorithm, based on importance weighting and learning with rejection, to solve HOIL problems. Experimental results show that IWRE can solve various HOIL tasks, including the challenging task of transforming vision-based demonstrations into random-access-memory (RAM)-based policies in the Atari domain, even with limited visual observations.

1. INTRODUCTION

Imitation Learning (IL), which studies how to learn a good policy by imitating given demonstrations (Xu et al., 2020; Chen et al., 2022), has made significant progress in real-world applications such as autonomous driving (Chen et al., 2019), health care (Iyer et al., 2021), and continuous control (Wang et al., 2023). Traditionally, the expert and the learner are assumed to share the same observation space. However, many real-world IL tasks now require removing this assumption (Chen et al., 2019; Warrington et al., 2021), such as autonomous driving (Chen et al., 2019), recommendation systems (Wu et al., 2019), and medical decision making (Wang et al., 2021a). Take AI for medical diagnosis as an example, illustrated in Figure 1: a medical AI is learning to make medical decisions based on expert doctor demonstrations. To ensure demonstration quality, the expert may use high-cost observations such as CT, MRI, and B-ultrasound. In contrast, the AI learner should ideally use only low-cost observations from cheaper devices, which may be newly designed ones that the expert has never used. Meanwhile, to ensure reliability, it is also reasonable to allow the learner to access the high-cost observations during training under a limited budget (Yu et al., 2019). The above examples share three characteristics: (i) even though a pair of expert and learner observations can be different, they arise from the same state of the environment, leading to similar policies; (ii) the learner's new observations are not available to the expert when generating demonstrations; (iii) during training, the learner can access expert observations only under a limited budget, especially the high-cost ones, since it is also important to minimize the usage of the high-cost observations.
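To make the two building blocks named above concrete, the following is a minimal, self-contained sketch of importance weighting combined with a rejection rule on a toy density-ratio problem. It is purely illustrative and is not the paper's IWRE algorithm: the Gaussian parameters, the closed-form weight, and the cutoff `threshold` are all hypothetical choices for this example, whereas the actual method would estimate the ratio from data and learn the rejection rule.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for features under two observation spaces: "source" mimics
# expert-side observations, "target" mimics learner-side observations, with
# a shifted mean to create a support mismatch between them.
source = rng.normal(loc=0.0, scale=1.0, size=1000)
target = rng.normal(loc=0.5, scale=1.0, size=1000)


def importance_weight(x, mu_s=0.0, mu_t=0.5, sigma=1.0):
    """Exact density ratio p_target(x) / p_source(x) for the two known
    Gaussians above (in practice this ratio must be estimated)."""
    log_ratio = ((x - mu_s) ** 2 - (x - mu_t) ** 2) / (2.0 * sigma**2)
    return np.exp(log_ratio)


# Rejection step: discard source samples whose weight is extreme, since far
# outside the target's support the ratio (and hence the reweighted loss) is
# unreliable. The cutoff value is a hypothetical hyperparameter.
weights = importance_weight(source)
threshold = 2.0
accepted = source[weights < threshold]

print(f"accepted {accepted.size} of {source.size} source samples")
```

The key design point the sketch illustrates is that reweighting alone can explode the variance of the learning signal when the two observation distributions have mismatched support; the rejection rule caps this by simply refusing to use samples whose weights are too large.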

