MID-VISION FEEDBACK

Abstract

Feedback plays a prominent role in biological vision, where perception is modulated based on agents' evolving expectations and world models. We introduce a novel mechanism that modulates perception based on high-level categorical expectations: Mid-Vision Feedback (MVF). MVF associates high-level contexts with linear transformations. When a context is "expected", its associated linear transformation is applied over feature vectors in a mid level of a network. The result is that mid-level network representations are biased towards conformance with high-level expectations, improving overall accuracy and contextual consistency. Additionally, during training, mid-level feature vectors are biased through the introduction of a loss term which increases the distance between feature vectors associated with different contexts. MVF is agnostic as to the source of contextual expectations, and can serve as a mechanism for top-down integration of symbolic systems with deep vision architectures. We show that MVF outperforms post-hoc filtering for the incorporation of contextual knowledge, and that configurations using predicted context (when no context is known a priori) outperform configurations with no context awareness.
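To make the mechanism described above concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a mid-level feature map of shape (H, W, C) and one C x C matrix per context. The function names, the dictionary used to store the transformations, and the mean-distance form of the separation term are all hypothetical choices made for illustration.

```python
import numpy as np


def apply_context_transform(features, transforms, context):
    """Apply the linear map associated with `context` to each feature vector.

    features:   array of shape (H, W, C), a mid-level feature map.
    transforms: dict mapping a context name to a (C, C) matrix
                (hypothetical storage, assumed for this sketch).
    context:    name of the expected context, or None if nothing is expected.
    """
    if context is None:
        # No expectation available: leave the features unchanged.
        return features
    matrix = transforms[context]
    # For each spatial location, the feature vector f becomes matrix @ f.
    return features @ matrix.T


def context_separation_penalty(features_a, features_b):
    """Illustrative loss term that decreases as two contexts move apart.

    It returns the negative Euclidean distance between the mean feature
    vectors of the two contexts. This specific form is an assumption.
    """
    mean_a = features_a.reshape(-1, features_a.shape[-1]).mean(axis=0)
    mean_b = features_b.reshape(-1, features_b.shape[-1]).mean(axis=0)
    return -np.linalg.norm(mean_a - mean_b)
```

In this sketch, minimizing the penalty during training pushes the mean feature vectors of different contexts apart, while at inference the transformation is applied only when a context is expected.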

1. INTRODUCTION

In most contemporary computer vision architectures, information flows in a single direction: from low-level pixels up to high-level abstract concepts (e.g., object categories). Such architectures are termed feed-forward architectures. In general, each successive layer of the network contains more abstract representations than the previous one, and the representational hierarchy mirrors the architectural hierarchy. It is also possible to introduce top-down connections into the network architecture, bringing high-level information into processes involving lower levels of abstraction in a process of feedback. Feedback plays a primary role in biological vision; in fact, the majority of neural connections in the visual cortex are top-down, rather than bottom-up, connections (Markov et al., 2014). These top-down connections are thought to convey information about higher-level expectations, and neurons of the visual cortex use both higher-level expectations and lower-level visual information in producing their representations. Expectations in biological systems arise from continuous engagement with the environment. In Computer Vision, this is reflected in the paradigm of Active Vision (Bajcsy, 1988; Fermüller & Aloimonos, 1995), where perception is framed as an active problem involving evolving world models.

The task of producing mid-level visual representations (Teo et al., 2015a;b; Xu et al., 2012; Nishigaki et al., 2012) from low-level input is under-constrained: many plausible mid-level interpretations may be consistent with the input. To give an intuition for how an understanding of context can impact the perception of mid-level features, consider Figure 1: the characteristics of shrews and kiwi differ, but may be similar enough to be confused without context. Top-down feedback, from high-level context to mid-level visual features, provides a "map" for mid-level processes, constraining them towards high-level consistency.

Code will be available at: https://github.com/maynord/Mid-Vision-Feedback

