RELATIONAL LEARNING WITH VARIATIONAL BAYES

Abstract

In psychology, relational learning refers to the ability to recognize and respond to relationships among objects irrespective of the nature of those objects. Relational learning has long been recognized as a hallmark of human cognition and remains a key question in artificial intelligence research. In this work, we propose an unsupervised learning method for the relational learning problem, in which we learn the underlying relationship between a pair of data irrespective of the nature of those data. The central idea of the proposed method is to encapsulate the relational learning problem in a probabilistic graphical model, in which we perform inference to learn about data relationships and to carry out other relational processing tasks.

1. INTRODUCTION

The American Psychological Association defines relational learning as follows (VandenBos & APA, 2007):

Definition 1.1 (Relational learning). Learning to differentiate among stimuli on the basis of relational properties rather than absolute properties.

In other words, relational learning refers to the ability to recognize and respond to relationships among objects (the relational property) irrespective of the nature of those objects (the absolute property). Relational learning has long been recognized as a hallmark of human cognition, and substantial research has shown that adequate cognitive capacity is necessary for relational processing (Biederman, 1987; Medin et al., 1993; Holyoak, 2012; Doumas & Hummel, 2013; Gentner, 2016).

As a machine learning application, relational learning can provide new insight into data analysis by dissecting the information in the data into relational property and absolute property. However, for discovering relationship patterns in raw, unknown data, relational learning is truly useful only if it can be achieved without supervised data. A key challenge in learning relational property with machine learning-based methods is that relational property is an abstract construct: unlike absolute property, which is based on observable data and can be measured quantitatively, relational property is difficult to quantify objectively, especially when the learning is unsupervised.

In this work, we propose an unsupervised learning method, variational relation learning (VRL), for addressing the relational learning problem. The proposed method is completely unsupervised: the learning requires neither a labeled training dataset nor training examples that are known to have the same (or different) relational property.

At its core, VRL encapsulates the relational learning problem with a probabilistic graphical model (PGM) in which we perform inference to learn about relational property and to carry out other relational processing tasks. Furthermore, our main learning algorithm is derived from the PGM using first principles, which gives us the flexibility to use any compatible computational inference method and still retain the desired properties of the proposed method.

Our contribution in this paper is threefold. First, we propose a PGM that encapsulates the relational learning problem. Second, we formulate various relational processing tasks as performing inference and learning in the PGM. Third, we propose an efficient and effective learning algorithm that can be trained end-to-end and unsupervised.

2. PROBLEM DEFINITION

We begin by formulating the relational learning problem as a machine learning problem: we observe a paired dataset X = { (a^(i), b^(i)) | i ∈ [1..N] } consisting of N i.i.d. samples generated from a joint distribution p(a ∈ A, b ∈ B); our goal is to learn a relational property between a^(i) and b^(i) irrespective of their absolute property. Furthermore, we want the learning to be unsupervised, i.e., we do not require a labeled dataset, such as (a^(i), b^(i), z^(i)) where z^(i) is a target variable indicating the relational property of (a^(i), b^(i)), nor do we require training examples that have the same (or different) relational property.

Two distinct features separate our problem formulation from other unsupervised learning problem formulations:

1. We dissect the information in X into relational property and absolute property; relational property characterizes the relationship between a^(i) and b^(i), whereas absolute property represents specific features that independently describe a^(i) and b^(i).
2. Our goal is to learn a relational property in X irrespective of its absolute property, i.e., we want to learn a relational property that is decoupled from the absolute property.

In addition, we are interested in two related relational processing tasks: relational discrimination and relational mapping (VandenBos & APA, 2007). Relational discrimination allows us to differentiate (a^(i), b^(i)) from (a^(j), b^(j)) based on their relational properties, while relational mapping allows us to apply the relational property of (a^(i), b^(i)) to a different set of data, for example, to deduce that b^(j) is related to a^(j) in the same way that b^(i) is related to a^(i).
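To make the setting concrete, the following is a minimal toy sketch, not the VRL method itself. It assumes, purely for illustration, that each a is a random 2-D point (its absolute property) and that b is a rotated by a hidden angle z (the relational property). The helper names `rotate`, `make_dataset`, `relative_angle`, and `relational_mapping` are hypothetical, and the hand-written `relative_angle` stands in for what an unsupervised learner would have to discover on its own.

```python
import math
import random


def rotate(point, angle):
    """Rotate a 2-D point counter-clockwise by `angle` radians."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)


def make_dataset(n, seed=0, angles=(0.0, math.pi / 2, math.pi)):
    """Toy paired dataset: b = rotate(a, z) with z hidden from the learner."""
    rng = random.Random(seed)
    X, Z = [], []
    for _ in range(n):
        a = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))  # absolute property of a
        z = rng.choice(angles)                           # latent relational property
        X.append((a, rotate(a, z)))                      # observed pair (a, b)
        Z.append(z)                                      # kept only to check the toy
    return X, Z


def relative_angle(a, b):
    """Signed angle from a to b; in this toy it does not depend on where a lies."""
    cross = a[0] * b[1] - a[1] * b[0]
    dot = a[0] * b[0] + a[1] * b[1]
    return math.atan2(cross, dot)


def relational_mapping(pair_i, a_j):
    """Apply the relation observed in pair_i = (a_i, b_i) to a new point a_j."""
    a_i, b_i = pair_i
    return rotate(a_j, relative_angle(a_i, b_i))


if __name__ == "__main__":
    X, Z = make_dataset(4)
    for (a, b), z in zip(X, Z):
        print(f"hidden z = {z:.3f}, recovered = {relative_angle(a, b):.3f}")
    # Relational mapping: relate a new point the way the first pair is related.
    print(relational_mapping(X[0], (1.0, 0.0)))
```

In this toy, relational discrimination amounts to comparing the recovered angles of two pairs, and relational mapping amounts to rotating a new point by the angle recovered from another pair. The point of the unsupervised formulation is that no such hand-crafted relation function, and no access to z, is assumed.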

3. METHOD

Here we introduce the proposed VRL method for addressing the relational learning problem and discuss various optimization challenges unique to VRL.

3.1. VARIATIONAL RELATION LEARNING

The proposed VRL method consists of two parts: first, we encapsulate the relational learning problem with a PGM, called VRL-PGM; second, we formulate various relational processing tasks as performing inference and learning in VRL-PGM. [...], 2006). In VRL-PGM, the absolute property can be interpreted as representing the dependency between a and [...]



Definition (Relational discrimination). In conditioning, a discrimination based on the relationship between or among stimuli rather than on absolute features of the stimuli.
Definition (Relational mapping). The ability to apply what one knows about one set of elements to a different set of elements.



Figure 1: VRL-PGM: a probabilistic graphical model for representing the relational learning problem; the observed random variables a and b are generated from some random process (parameterized by θ) involving an unobserved random variable z.
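The caption states only that a and b are observed and that z is unobserved; it does not say how the joint distribution factorizes. Under that description alone, the marginal likelihood of one observed pair and a generic variational lower bound on it can be sketched as follows. The approximate posterior q_φ(z | a, b) and its parameters φ are notation assumed here for illustration and are not taken from the text above.

```latex
\begin{align}
p_\theta(a, b) &= \int p_\theta(a, b, z)\, dz, \\
\log p_\theta(a, b) &\ge \mathbb{E}_{q_\phi(z \mid a, b)}
  \left[ \log p_\theta(a, b, z) - \log q_\phi(z \mid a, b) \right].
\end{align}
```

The bound follows from Jensen's inequality and holds for any choice of q_φ; the gap equals the KL divergence from q_φ(z | a, b) to the true posterior p_θ(z | a, b). If z is discrete, the integral becomes a sum.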

