LOCALIZED META-LEARNING: A PAC-BAYES ANALYSIS FOR META-LEARNING BEYOND GLOBAL PRIOR

Abstract

Meta-learning methods learn the meta-knowledge shared among various training tasks and aim to promote the learning of new tasks under the task similarity assumption. Such meta-knowledge is often represented as a fixed distribution; this, however, may be too restrictive to capture task-specific information, because the discriminative patterns in the data may change dramatically across tasks. In this work, we aim to equip the meta-learner with the ability to model and produce task-specific meta-knowledge and, accordingly, present a localized meta-learning framework based on PAC-Bayes theory. In particular, we propose a Local Coordinate Coding (LCC) based prior predictor that allows the meta-learner to generate local meta-knowledge for specific tasks adaptively. We further develop a practical algorithm with deep neural networks based on the bound. Empirical results on real-world datasets demonstrate the efficacy of the proposed method.



& Pratt, 2012), especially for empowering deep neu- & Larochelle, 2017). More concretely, the neural

Figure 1: Illustration of the localized meta-learning framework. Instead of using global meta-knowledge for all tasks, we tailor the meta-knowledge to each specific task.

Definition 1. (Lipschitz Smoothness, Yu et al. (2009).) A function $f(x)$ in $\mathbb{R}^d$ is $(\alpha, \beta)$-Lipschitz smooth w.r.t. a norm $\|\cdot\|$ if $|f(x) - f(x')| \le \alpha \|x - x'\|$ and $|f(x') - f(x) - \nabla f(x)^\top (x' - x)| \le \beta \|x - x'\|^2$.

Definition 2. (Coordinate Coding, Yu et al. (2009).) A coordinate coding is a pair $(\gamma, C)$, where $C \subset \mathbb{R}^d$ is a set of anchor points (bases), and $\gamma$ is a map of $x \in \mathbb{R}^d$ to $[\gamma_u(x)]_{u \in C} \in \mathbb{R}^{|C|}$ such that $\sum_{u \in C} \gamma_u(x) = 1$. It induces the following physical approximation of $x$ in $\mathbb{R}^d$: $\bar{x} = \sum_{u \in C} \gamma_u(x) u$.
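To make the coordinate coding definition concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a simple stand-in for the coding map: the weights are supported on the `k` nearest anchors and obtained from a regularized least-squares fit constrained to sum to one (the same construction used in locally linear embedding). The function names `local_coordinates` and `physical_approximation`, and the parameters `k` and `reg`, are hypothetical.

```python
import numpy as np

def local_coordinates(x, anchors, k=3, reg=1e-3):
    """Return gamma(x): one weight per anchor, summing to 1.

    Assumption: only the k nearest anchors receive nonzero weight, and the
    weights minimize ||x - sum_u gamma_u u||^2 subject to sum_u gamma_u = 1.
    """
    dists = np.linalg.norm(anchors - x, axis=1)
    nearest = np.argsort(dists)[:k]
    diffs = anchors[nearest] - x                # shape (k, d)
    gram = diffs @ diffs.T                      # shape (k, k)
    # A small ridge term keeps the linear system well conditioned.
    gram = gram + reg * (np.trace(gram) + 1e-12) * np.eye(k)
    w = np.linalg.solve(gram, np.ones(k))
    w = w / w.sum()                             # enforce sum-to-one
    gamma = np.zeros(len(anchors))
    gamma[nearest] = w
    return gamma

def physical_approximation(gamma, anchors):
    """Return x_bar = sum_u gamma_u(x) * u."""
    return gamma @ anchors

anchors = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
x = np.array([0.2, 0.3])
gamma = local_coordinates(x, anchors)
x_bar = physical_approximation(gamma, anchors)
print(gamma, x_bar)
```

In this toy setup the far anchor at (5, 5) receives zero weight, which illustrates the locality that the definition is meant to support: a point is approximated by a combination of nearby anchors only.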


each task uses distinct discriminative patterns, so the desired meta-knowledge must capture all of these patterns simultaneously. Representing it with a single global hyperposterior can be challenging, since the most significant patterns in the first task could be irrelevant or even detrimental to the second task. Figure schematically illustrates this notion. Therefore, customized meta-knowledge, whose patterns are the most discriminative for a given task, is desirable.


Can the meta-knowledge be adaptive to tasks? How can one achieve it? Intuitively, we could implement this idea by reformulating the meta-knowledge as a mapping function. Leveraging the samples in the target task, the meta model produces task-specific meta-knowledge.
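The idea of meta-knowledge as a mapping function can be illustrated with a small NumPy sketch. It is a toy under stated assumptions, not the paper's prior predictor: a task is summarized by the mean of its sample features, the coding weights are a softmax over distances to the nearest anchors, and each anchor carries a hypothetical parameter vector (`anchor_params`) that is combined into a task-specific prior mean. All function and parameter names here are illustrative.

```python
import numpy as np

def task_embedding(features):
    # Assumption: a task is summarized by the mean of its sample features.
    return features.mean(axis=0)

def soft_coordinates(z, anchors, k=2, temperature=1.0):
    """Weights over anchors that sum to 1, nonzero only on the k nearest."""
    dists = np.linalg.norm(anchors - z, axis=1)
    nearest = np.argsort(dists)[:k]
    logits = -dists[nearest] / temperature
    logits = logits - logits.max()              # numerical stability
    w = np.exp(logits)
    gamma = np.zeros(len(anchors))
    gamma[nearest] = w / w.sum()
    return gamma

def predict_prior_mean(features, anchors, anchor_params):
    """Map a task's samples to a task-specific prior mean.

    anchor_params[i] is a hypothetical parameter vector attached to anchor i;
    the output is the coding-weighted combination of these vectors.
    """
    gamma = soft_coordinates(task_embedding(features), anchors)
    return gamma @ anchor_params

anchors = np.array([[0.0, 0.0], [10.0, 10.0]])
anchor_params = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
task_a = np.array([[0.0, 0.2], [0.2, -0.2]])    # samples near the first anchor
task_b = np.array([[9.8, 10.0], [10.2, 10.0]])  # samples near the second anchor
print(predict_prior_mean(task_a, anchors, anchor_params))
print(predict_prior_mean(task_b, anchors, anchor_params))
```

Two tasks whose samples lie near different anchors receive different prior means, in contrast to a global prior that would return the same vector for both.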

