COMBINING IMITATION AND REINFORCEMENT LEARNING WITH FREE ENERGY PRINCIPLE Anonymous authors Paper under double-blind review

Abstract

Imitation Learning (IL) and Reinforcement Learning (RL) from high-dimensional sensory inputs are often introduced as separate problems, but a more realistic problem setting is how to merge the two techniques so that the agent can reduce exploration costs by partially imitating experts while it maximizes its return. Even when the experts are suboptimal (e.g., experts trained only partway with other RL methods, or human-crafted experts), the agent is expected to outperform them. In this paper, we propose to address the issue by using and theoretically extending the Free Energy Principle, a unified brain theory that explains perception, action and model learning in a Bayesian probabilistic way. We find that both IL and RL can be achieved based on the same free energy objective function. Our results show that our approach is promising in visual control tasks, especially in sparse-reward environments.

1. INTRODUCTION

Imitation Learning (IL) is a framework to learn a policy that mimics expert trajectories. As the expert specifies model behaviors, there is no need to explore or to design complex reward functions. Reinforcement Learning (RL) lacks these features: in sparse-reward settings RL agents have few cues to guide them toward the desired behaviors, and even when RL succeeds in maximizing reward, the resulting policy does not necessarily behave as the reward designer expected. The key drawbacks of IL are that the policy never exceeds the suboptimal expert performance and that the policy is vulnerable to distributional shift. Meanwhile, RL can achieve super-human performance and has the potential to transfer the policy to new tasks. As real-world applications often need high sample efficiency and little preparation (rough rewards and suboptimal experts), it is important to find a way to effectively combine IL and RL.

When the sensory inputs are high-dimensional images, as in the real world, behavior learning such as IL and RL is difficult without representation or model learning. The Free Energy Principle (FEP), a unified brain theory in computational neuroscience that explains perception, action and model learning in a Bayesian probabilistic way (Friston et al., 2006; Friston, 2010), can handle behavior learning and model learning at the same time. In FEP, the brain has a generative model of the world and computes a quantity called Free Energy from the model prediction and the sensory inputs to the brain. By minimizing the Free Energy, the brain achieves model learning and behavior learning. Prior work on FEP only dealt with limited situations where a part of the generative model is given and the task is very low-dimensional.
Because FEP has much in common with variational inference in machine learning, recent advances in deep learning and latent variable models could be applied to scale up FEP agents to high-dimensional tasks.

Recent work in model-based reinforcement learning succeeds in latent planning from high-dimensional image inputs by incorporating latent dynamics models. Behaviors can be derived either by imagined-reward maximization (Ha & Schmidhuber, 2018; Hafner et al., 2019a) or by online planning (Hafner et al., 2019b). Although solving high-dimensional visual control tasks with model-based methods is becoming feasible, prior methods have not been combined with imitation.

In this paper, we propose Deep Free Energy Network (FENet), an agent that combines the advantages of IL and RL: the policy first learns roughly from suboptimal expert data, without the need for exploration or detailed reward crafting, and then learns from sparsely specified reward functions to exceed the suboptimal expert performance.

The key contributions of this work are summarized as follows:

• Extension of Free Energy Principle: We theoretically extend the Free Energy Principle, introducing a policy prior and a policy posterior to combine IL and RL. We implement the proposed method on top of the Recurrent State Space Model (Hafner et al., 2019b), a latent dynamics model with both deterministic and stochastic components.
• Visual control tasks in realistic problem settings: We solve the Cheetah-run, Walker-walk, and Quadruped-walk tasks from the DeepMind Control Suite (Tassa et al., 2018). In addition to the default problem settings, we set up problems with sparse rewards and with suboptimal experts. We demonstrate that our agent outperforms model-based RL using the Recurrent State Space Model in sparse-reward settings. We also show that our agent can achieve higher returns than Behavioral Cloning (IL) with suboptimal experts.

2. BACKGROUNDS ON FREE ENERGY PRINCIPLE

2.2. FREE ENERGY PRINCIPLE

Perception, action and model learning are all achieved by minimizing the same objective function, Free Energy (Friston et al., 2006; Friston, 2010). In FEP, the agent is equipped with a generative model of the world, using a prior $p(s_t)$ over hidden states and a likelihood $p(o_t|s_t)$ of observations. Inferring the hidden state from an observation would require the true posterior $p(s_t|o_t) = p(o_t|s_t)\,p(s_t)/p(o_t)$, where the evidence is $p(o_t) = \int p(o_t|s_t)\,p(s_t)\,ds_t$. Since we cannot compute $p(o_t)$ due to the integral, we approximate $p(s_t|o_t)$ with a variational posterior $q(s_t)$ by minimizing the KL divergence $\mathrm{KL}(q(s_t)\,\|\,p(s_t|o_t))$.

$$\mathrm{KL}(q(s_t)\,\|\,p(s_t|o_t)) = \ln p(o_t) + \mathrm{KL}(q(s_t)\,\|\,p(o_t, s_t)) \tag{3}$$

$$F_t = \mathrm{KL}(q(s_t)\,\|\,p(o_t, s_t)) \tag{4}$$

We define the Free Energy as (eq.4). Here the KL notation is applied to the unnormalized joint, i.e. $F_t = \mathbb{E}_{q(s_t)}[\ln q(s_t) - \ln p(o_t, s_t)]$. Since $p(o_t)$ does not depend on $s_t$, we can minimize (eq.3) w.r.t. the parameters of the variational posterior by minimizing the Free Energy. Thus, the agent can infer the hidden states behind the observations by minimizing $F_t$. This process is called 'perceptual inference' in FEP.

Perceptual Learning. Free Energy equals the negative Evidence Lower Bound (ELBO) used in variational inference in machine learning. Because the KL divergence on the left-hand side of (eq.3) is non-negative,

$$\ln p(o_t) \geq -F_t \tag{5}$$

By minimizing $F_t$ w.r.t. the parameters of the prior and the likelihood, the generative model learns to best explain the observations. This process is called 'perceptual learning' in FEP.

Active Inference. We can assume that the prior is conditioned on the hidden states and actions at the previous time step as follows.

$$p(s_t) = p(s_t|s_{t-1}, a_{t-1}) \tag{6}$$

The agent can change the future by choosing actions. If the agent chooses $a_t$ when it is at $s_t$, the prior can predict the next hidden state $s_{t+1}$. Thus, we can think of the Expected Free Energy $G_{t+1}$ at the next time step $t+1$ as follows (Friston et al., 2015).
\begin{align}
G_{t+1} &= \mathrm{KL}(q(s_{t+1})\,\|\,p(o_{t+1}, s_{t+1})) \notag \\
&= \mathbb{E}_{q(s_{t+1})}[\ln q(s_{t+1}) - \ln p(o_{t+1}, s_{t+1})] \notag \\
&= \mathbb{E}_{q(s_{t+1})\,p(o_{t+1}|s_{t+1})}[\ln q(s_{t+1}) - \ln p(o_{t+1}, s_{t+1})] \tag{7} \\
&= \mathbb{E}_{q(s_{t+1})\,p(o_{t+1}|s_{t+1})}[\ln q(s_{t+1}) - \ln p(s_{t+1}|o_{t+1}) - \ln p(o_{t+1})] \notag \\
&\approx \mathbb{E}_{q(o_{t+1}, s_{t+1})}[\ln q(s_{t+1}) - \ln q(s_{t+1}|o_{t+1}) - \ln p(o_{t+1})] \tag{8} \\
&= \mathbb{E}_{q(o_{t+1})}[-\mathrm{KL}(q(s_{t+1}|o_{t+1})\,\|\,q(s_{t+1})) - \ln p(o_{t+1})] \tag{9}
\end{align}

Since the agent has not experienced time step $t+1$ yet and has not received the observation $o_{t+1}$, we take the expectation over $o_{t+1}$ using the likelihood $p(o_{t+1}|s_{t+1})$, as in (eq.7). In (eq.8), we approximate $p(o_{t+1}|s_{t+1})$ as $q(o_{t+1}|s_{t+1})$ and $p(s_{t+1}|o_{t+1})$ as $q(s_{t+1}|o_{t+1})$. According to the complete class theorem (Friston et al., 2012), any scalar reward can be encoded as an observation prior using $p(o) \propto \exp r(o)$, so the second term in (eq.9) becomes a goal-directed value. This observation prior $p(o_{t+1})$ can also be regarded as the probability of an optimality variable, $p(\mathcal{O}_{t+1} = 1|o_{t+1})$, where the binary variable $\mathcal{O}_{t+1} = 1$ denotes that time step $t+1$ is optimal and $\mathcal{O}_{t+1} = 0$ denotes that it is not, as introduced in the context of control as probabilistic inference (Levine, 2018). The first term in (eq.9) is called the epistemic value and works as intrinsic motivation to further explore the world. Minimizing $-\mathrm{KL}(q(s_{t+1}|o_{t+1})\,\|\,q(s_{t+1}))$ means that the agent tries to experience states $s_{t+1}$ that are as different as possible given some imagined observations $o_{t+1}$. By minimizing the Expected Free Energy, the agent can infer the actions that explore the world and maximize rewards. This process is called 'active inference'.
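The quantities in this section can be checked numerically on a small discrete model. The sketch below is only an illustration under stated assumptions: the two-state, two-observation model, the reward values and all function names (`kl`, `free_energy`, `expected_free_energy`) are hypothetical and not taken from the paper, and $q(o_{t+1}|s_{t+1})$ is taken to be equal to the likelihood. It evaluates $F_t$ from (eq.4) and $G_{t+1}$ from (eq.9).

```python
import numpy as np

def kl(q, p):
    """KL(q || p) between two discrete distributions with strictly positive entries."""
    return float(np.sum(q * (np.log(q) - np.log(p))))

def free_energy(q_s, prior_s, lik, o):
    """F_t of (eq.4): E_q[ln q(s) - ln p(o, s)] for an observed index o."""
    joint = lik[o] * prior_s              # p(o, s) as a function of s
    return float(np.sum(q_s * (np.log(q_s) - np.log(joint))))

def expected_free_energy(q_s, lik, log_p_o):
    """G_{t+1} of (eq.9), assuming q(o|s) equals the likelihood p(o|s)."""
    joint = lik * q_s[None, :]            # q(o, s), shape (n_obs, n_states)
    q_o = joint.sum(axis=1)               # q(o)
    q_s_given_o = joint / q_o[:, None]    # q(s|o)
    # Epistemic value: E_{q(o)}[KL(q(s|o) || q(s))]
    epistemic = sum(q_o[i] * kl(q_s_given_o[i], q_s) for i in range(len(q_o)))
    # Goal-directed value: E_{q(o)}[ln p(o)]
    goal = float(np.sum(q_o * log_p_o))
    return float(-epistemic - goal)

# Hypothetical toy model: 2 hidden states, 2 observations.
prior_s = np.array([0.7, 0.3])            # p(s)
lik = np.array([[0.9, 0.2],               # lik[o, s] = p(o|s); columns sum to 1
                [0.1, 0.8]])
q_s = np.array([0.6, 0.4])                # variational posterior q(s)
rewards = np.array([1.0, 0.0])            # r(o), illustrative values only
log_p_o = rewards - np.log(np.sum(np.exp(rewards)))  # ln p(o) with p(o) ∝ exp r(o)

F = free_energy(q_s, prior_s, lik, o=0)
G = expected_free_energy(q_s, lik, log_p_o)
print("F_t =", F, " G_{t+1} =", G)
```

On this toy model one can confirm that $\mathrm{KL}(q(s_t)\,\|\,p(s_t|o_t)) - F_t$ equals $\ln p(o_t)$, as (eq.3) states, and that the value returned by `expected_free_energy` matches the direct evaluation of the expectation in (eq.8).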

3. DEEP FREE ENERGY NETWORK (FENET)

Perceptual learning deals with learning the generative model to best explain the agent's sensory inputs. If we regard not only observations but also the actions given by the expert as a part of the sensory inputs, we can explain imitation learning by using the concept of perceptual learning. Active inference deals with exploration and reward maximization, so it is compatible with reinforcement learning. By minimizing the same objective function, the Free Energy, we can deal with both imitation and RL. In this section, we first introduce a policy prior for imitation and a policy posterior for RL. Second, we extend the Free Energy Principle to accommodate these two policies in the same objective function, the Free Energy. Finally, we explain a detailed network architecture that implements the proposed method for solving image control tasks.

3.1. INTRODUCING A POLICY PRIOR AND A POLICY POSTERIOR

Free Energy. We extend the Free Energy from (eq.4) so that actions are a part of the sensory inputs that the generative model tries to explain.

\begin{align}
F_t &= \mathrm{KL}(q(s_t)\,\|\,p(o_t, s_t, a_t)) \notag \\
&= \mathrm{KL}(q(s_t)\,\|\,p(o_t|s_t)\,p(a_t|s_t)\,p(s_t|s_{t-1}, a_{t-1})) \tag{10} \\
&= \mathbb{E}_{q(s_t)}\left[\ln \frac{q(s_t)}{p(o_t|s_t)\,p(a_t|s_t)\,p(s_t|s_{t-1}, a_{t-1})}\right] \tag{11} \\
&= \mathbb{E}_{q(s_t)}[-\ln p(o_t|s_t) - \ln p(a_t|s_t) + \ln q(s_t) - \ln p(s_t|s_{t-1}, a_{t-1})] \tag{12} \\
&= \mathbb{E}_{q(s_t)}[-\ln p(o_t|s_t) - \ln p(a_t|s_t)] + \mathrm{KL}(q(s_t)\,\|\,p(s_t|s_{t-1}, a_{t-1})) \tag{13}
\end{align}

We define $p(a_t|s_t)$ as a policy prior. When the agent observes expert trajectories, minimizing $F_t$ trains the policy prior so that it best explains the experts. Besides the policy prior, we introduce a policy posterior $q(a_t|s_t)$, which is the policy that the agent actually samples from when interacting with its environments. We explain how to learn the policy posterior in the following.

Expected Free Energy for imitation. In a similar manner to active inference in Section 2.2, we think of the Expected Free Energy $G_{t+1}$ at the next time step $t+1$, but this time we take the expectation over the policy posterior $q(a_t|s_t)$ because $G_{t+1}$ is a value expected under the next actions. Note that in Section 2.2 $a_t$ was given as a fixed value, but here $a_t$ is sampled from the policy posterior. We calculate the expected variational posterior at time step $t+1$ as follows.

$$q(s_{t+1}) = \mathbb{E}_{q(s_t)\,q(a_t|s_t)}[p(s_{t+1}|s_t, a_t)] \tag{14}$$

$$q(o_{t+1}, s_{t+1}, a_{t+1}) = \mathbb{E}_{q(s_{t+1})}[p(o_{t+1}|s_{t+1})\,q(a_{t+1}|s_{t+1})] \tag{15}$$

We extend the Expected Free Energy from (eq.12) so that the variational posterior makes inference on actions as follows.
\begin{align}
G^{IL}_{t+1} &= \mathbb{E}_{q(o_{t+1}, s_{t+1}, a_{t+1})}[-\ln p(o_{t+1}|s_{t+1}) - \ln p(a_{t+1}|s_{t+1}) + \ln q(s_{t+1}, a_{t+1}) - \ln p(s_{t+1}|s_t, a_t)] \tag{16} \\
&= \mathbb{E}_{q(o_{t+1}, s_{t+1}, a_{t+1})}[-\ln p(o_{t+1}|s_{t+1}) - \ln p(a_{t+1}|s_{t+1}) + \ln q(a_{t+1}|s_{t+1})] + \mathrm{KL}(q(s_{t+1})\,\|\,p(s_{t+1}|s_t, a_t)) \tag{17} \\
&= \mathbb{E}_{q(o_{t+1}, s_{t+1})}[-\ln p(o_{t+1}|s_{t+1}) + \mathrm{KL}(q(a_{t+1}|s_{t+1})\,\|\,p(a_{t+1}|s_{t+1}))] + \mathrm{KL}(q(s_{t+1})\,\|\,p(s_{t+1}|s_t, a_t)) \tag{18} \\
&= \mathbb{E}_{q(o_{t+1}, s_{t+1})}[-\ln p(o_{t+1}|s_{t+1}) + \mathrm{KL}(q(a_{t+1}|s_{t+1})\,\|\,p(a_{t+1}|s_{t+1}))] + 0 \tag{19} \\
&= \mathbb{E}_{q(s_{t+1})}[H[p(o_{t+1}|s_{t+1})] + \mathrm{KL}(q(a_{t+1}|s_{t+1})\,\|\,p(a_{t+1}|s_{t+1}))] \tag{20}
\end{align}

In (eq.20), the first term is the entropy of the observation likelihood, and the second term is the KL divergence between the policy posterior and the policy prior. By minimizing $G^{IL}_{t+1}$, the agent learns the policy posterior so that it matches the policy prior, which has been learned through minimizing $F_t$ to encode the experts' behavior.

Expected Free Energy for RL. We can derive the Expected Free Energy in a different way that has a reward component $r(o_{t+1})$, leading to a policy posterior that maximizes rewards. We extend the Expected Free Energy from (eq.8) so that the variational posterior makes inference on actions as follows.

\begin{align}
G^{RL}_{t+1} &= \mathbb{E}_{q(o_{t+1}, s_{t+1}, a_{t+1})}[\ln q(s_{t+1}, a_{t+1}) - \ln p(a_{t+1}|s_{t+1}) - \ln q(s_{t+1}|o_{t+1}) - \ln p(o_{t+1})] \tag{21} \\
&= \mathbb{E}_{q(o_{t+1}, s_{t+1})}[\ln q(s_{t+1}) - \ln q(s_{t+1}|o_{t+1}) + \mathrm{KL}(q(a_{t+1}|s_{t+1})\,\|\,p(a_{t+1}|s_{t+1})) - \ln p(o_{t+1})] \tag{22} \\
&= \mathbb{E}_{q(o_{t+1})}[-\mathrm{KL}(q(s_{t+1}|o_{t+1})\,\|\,q(s_{t+1})) - \ln p(o_{t+1})] + \mathbb{E}_{q(s_{t+1})}[\mathrm{KL}(q(a_{t+1}|s_{t+1})\,\|\,p(a_{t+1}|s_{t+1}))] \tag{23} \\
&\approx \mathbb{E}_{q(o_{t+1})}[-\mathrm{KL}(q(s_{t+1}|o_{t+1})\,\|\,q(s_{t+1})) - r(o_{t+1})] + \mathbb{E}_{q(s_{t+1})}[\mathrm{KL}(q(a_{t+1}|s_{t+1})\,\|\,p(a_{t+1}|s_{t+1}))] \tag{24}
\end{align}

In a similar manner to active inference in Section 2.2, we use $p(o) \propto \exp r(o)$ in (eq.24). The first KL term is the epistemic value that lets the agent explore the world, the second term is the expected reward under the action sampled from the policy posterior, and the last KL term is the KL divergence between the policy posterior and the policy prior.
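As a concrete reading of the objectives in this section, the sketch below writes the extended Free Energy of (eq.13) and the policy KL term shared by (eq.20) and (eq.24) as plain loss functions. It is a minimal illustration under assumptions that are not in the paper: all distributions are diagonal Gaussians, the expectation in (eq.13) is estimated with a single sample, and the linear maps `W_obs`, `W_act` and every function name are hypothetical stand-ins for learned networks.

```python
import numpy as np

def gaussian_nll(x, mean, std):
    """-ln N(x; mean, diag(std^2)), summed over dimensions."""
    var = std ** 2
    return float(np.sum(0.5 * np.log(2 * np.pi * var) + 0.5 * (x - mean) ** 2 / var))

def gaussian_kl(mu_q, std_q, mu_p, std_p):
    """KL(N(mu_q, std_q^2) || N(mu_p, std_p^2)) for diagonal Gaussians."""
    return float(np.sum(np.log(std_p / std_q)
                        + (std_q ** 2 + (mu_q - mu_p) ** 2) / (2 * std_p ** 2) - 0.5))

def free_energy_with_actions(o, a, s, decode_obs, policy_prior, posterior, prior):
    """Single-sample estimate of F_t in (eq.13), with s drawn from q(s_t)."""
    recon = gaussian_nll(o, *decode_obs(s))        # -ln p(o_t | s_t)
    imitation = gaussian_nll(a, *policy_prior(s))  # -ln p(a_t | s_t), expert action a
    complexity = gaussian_kl(*posterior, *prior)   # KL(q(s_t) || p(s_t | s_{t-1}, a_{t-1}))
    return recon + imitation + complexity

def policy_kl(policy_posterior, policy_prior, s):
    """KL(q(a|s) || p(a|s)), the policy term that appears in (eq.20) and (eq.24)."""
    return gaussian_kl(*policy_posterior(s), *policy_prior(s))

# Hypothetical toy setup: 2-D latent state, 4-D observation, 1-D action.
rng = np.random.default_rng(0)
W_obs = rng.normal(size=(4, 2))
W_act = rng.normal(size=(1, 2))
decode_obs = lambda s: (W_obs @ s, np.ones(4))                 # p(o|s)
policy_prior = lambda s: (W_act @ s, np.ones(1))               # p(a|s)
policy_posterior = lambda s: (W_act @ s + 0.5, 0.5 * np.ones(1))  # q(a|s)

posterior = (np.array([0.3, -0.2]), np.array([0.5, 0.5]))      # q(s_t): mean, std
prior = (np.zeros(2), np.ones(2))                              # p(s_t | s_{t-1}, a_{t-1})
s = posterior[0] + posterior[1] * rng.normal(size=2)           # sample from q(s_t)
o = W_obs @ s                                                  # toy observation
a_expert = W_act @ s                                           # toy expert action

F = free_energy_with_actions(o, a_expert, s, decode_obs, policy_prior, posterior, prior)
print("F_t estimate:", F)
print("policy KL:", policy_kl(policy_posterior, policy_prior, s))
```

In this toy setup the expert action sits exactly at the policy prior mean, so any other action increases the imitation term of $F_t$; the policy KL vanishes only when the policy posterior coincides with the policy prior.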
The last KL term can be written as in (eq.25), which shows that minimizing this term maximizes the entropy of the policy posterior while the policy posterior tries to match the policy prior. Thus, minimizing the Expected Free Energy can be regarded as an entropy-maximizing RL method.

$$\mathrm{KL}(q(a_{t+1}|s_{t+1})\,\|\,p(a_{t+1}|s_{t+1})) = -H[q(a_{t+1}|s_{t+1})] - \mathbb{E}_{q(a_{t+1}|s_{t+1})}[\ln p(a_{t+1}|s_{t+1})] \tag{25}$$

Note that $q(o_{t+1})$ in (eq.24) can be calculated as follows.

$$q(o_{t+1}) = \mathbb{E}_{q(s_{t+1})}[p(o_{t+1}|s_{t+1})] \tag{26}$$

By minimizing $G^{RL}_{t+1}$, the agent learns the policy posterior so that it explores the world and maximizes the reward as long as it does not deviate too much from the policy prior, which has encoded the experts' behavior through minimizing $F_t$.

[Figure: diagram with nodes $s_{t+1}$, $a_{t+1}$, $o_{t+1}$ and labels "Go to t + 2", "Policy posterior" $q(a_{t+1}|s_{t+1})$, "Policy prior", "KL", "-KL".]
2 I j L D E R J u 4 S i Y E d / H L y 6 R 1 V n O d m n t 7 X q l f 5 X E U 4 R C O o A o u X E A d b q A B T S D w C M / w C m 9 W Z r 1 Y 7 9 b H f L R g 5 T s H 8 A f W 5 w / + Y p O / < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " x o f 7 c E 7 q / X M 4 A 1 U V V P J m y Z T Z n M A = " > A A A B + 3 i c b V D L S s N A F L 2 p r 1 p f s S 7 d D B a h I p R E B F 0 W 3 b i s Y B / Q h j C Z T t u h k 0 m c m Y g l 5 l f c u F D E r T / i z r 9 x 2 m a h r Q c u 9 3 D O v c y d E 8 S c K e 0 4 3 1 Z h Z X V t f a O 4 W d r a 3 t n d s / f L L R U l k t A m i X g k O w F W l D N B m 5 p p T j u x p D g M O G 0 H 4 + u p 3 3 6 g U r F I 3 O l J T L 0 Q D w U b M I K 1 k X y 7 f F / F f q p P 3 e x J z f u J b 1 e c m j M D W i Z u T i q Q o + H b X 7 1 + R J K Q C k 0 4 V q r r O r H 2 U i w 1 I 5 x m p V 6 i a I z J G A 9 p 1 1 C B Q 6 q 8 d H Z 7 h o 6 N 0 k e D S J o S G s 3 U 3 x s p D p W a h I G Z D L E e q U V v K v 7 n d R M 9 u P R S J u J E U 0 H m D w 0 S j n S E p k G g P p O U a D 4 x B B P J z K 2 I j L D E R J u 4 S i Y E d / H L y 6 R 1 V n O d m n t 7 X q l f 5 X E U 4 R C O o A o u X E A d b q A B T S D w C M / w C m 9 W Z r 1 Y 7 9 b H f L R g 5 T s H 8 A f W 5 w / + Y p O / < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " x o f 7 c E 7 q / X M 4 A 1 U V V P J m y Z T Z n M A = " > A A A B + 3 i c b V D L S s N A F L 2 p r 1 p f s S 7 d D B a h I p R E B F 0 W 3 b i s Y B / Q h j C Z T t u h k 0 m c m Y g l 5 l f c u F D E r T / i z r 9 x 2 m a h r Q c u 9 3 D O v c y d E 8 S c K e 0 4 3 1 Z h Z X V t f a O 4 W d r a 3 t n d s / f L L R U l k t A m i X g k O w F W l D N B m 5 p p T j u x p D g M O G 0 H 4 + u p 3 3 6 g U r F I 3 O l J T L 0 Q D w U b M I K 1 k X y 7 f F / F f q p P 3 e x J z f u J b 1 e c m j M D W i Z u T i q Q o + H b X 7 1 + R J K Q C k 0 4 V q r r O r H 2 U i w 1 I 5 x m p V 6 i a I z J G A 9 p 1 1 C B Q 6 q 8 d H Z 7 h o 6 N 0 k e D S J o S G s 3 U 3 x s p D p W a h I G Z D L E e q U V v K v 7 n 
d R M 9 u P R S J u J E U 0 H m D w 0 S j n S E p k G g P p O U a D 4 x B B P J z K 2 I j L D E R J u 4 S i Y E d / H L y 6 R 1 V n O d m n t 7 X q l f 5 X E U 4 R C O o A o u X E A d b q A B T S D w C M / w C m 9 W Z r 1 Y 7 9 b H f L R g 5 T s H 8 A f W 5 w / + Y p O / < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " x o f 7 c E 7 q / X M 4 A 1 U V V P J m y Z T Z n M A = " > A A A B + 3 i c b V D L S s N A F L 2 p r 1 p f s S 7 d D B a h I p R E B F 0 W 3 b i s Y B / Q h j C Z T t u h k 0 m c m Y g l 5 l f c u F D E r T / i z r 9 x 2 m a h r Q c u 9 3 D O v c y d E 8 S c K e 0 4 3 1 Z h Z X V t f a O 4 W d r a 3 t n d s / f L L R U l k t A m i X g k O w F W l D N B m 5 p p T j u x p D g M O G 0 H 4 + u p 3 3 6 g U r F I 3 O l J T L 0 Q D w U b M I K 1 k X y 7 f F / F f q p P 3 e x J z f u J b 1 e c m j M D W i Z u T i q Q o + H b X 7 1 + R J K Q C k 0 4 V q r r O r H 2 U i w 1 I 5 x m p V 6 i a I z J G A 9 p 1 1 C B Q 6 q 8 d H Z 7 h o 6 N 0 k e D S J o S G s 3 U 3 x s p D p W a h I G Z D L E e q U V v K v 7 n d R M 9 u P R S J u J E U 0 H m D w 0 S j n S E p k G g P p O U a D 4 x B B P J z K 2 I j L D E R J u 4 S i Y E d / H L y 6 R 1 V n O d m n t 7 X q l f 5 X E U 4 R C O o A o u X E A d b q A B T S D w C M / w C m 9 W Z r 1 Y 7 9 b H f L R g 5 T s H 8 A f W 5 w / + Y p O / < / l a t e x i t >

Free Energy Calculation

Expected Free Energy Calculation

[Figure labels: r_t; (o_t, a_t); r_{t-1}; Likelihood p(o_t | s_t)]

Policy posterior

Policy prior q(a t |s t ) < l a t e x i t s h a 1 _ b a s e 6  4 = " d X c L I Q 2 p Q Q 0 m 0 l k a e i c d Q 5 l H G K w = " > A A A B 8 X i c b V B N S 8 N A E J 3 U r 1 q / q h 6 9 B I t Q L y U R Q Y 9 F L x 4 r 2 A 9 s Q 9 h s N + 3 S z S b u T o Q S + y + 8 e F D E q / / G m / / G b Z u D t j 4 Y e L w 3 w 8 y 8 I B F c o + N 8 W 4 W V 1 b X 1 j e J m a W t 7 Z 3 e v v H / Q 0 n G q K G v S W M S q E x D N B J e s i R w F 6 y S K k S g Q r B 2 M r q d + + 5 E p z W N 5 h + O E e R E Z S B 5 y S t B I 9 w 9 V 4 u O T 9 v H U L 1 e c m j O D v U z c n F Q g R 8 M v f / X 6 M U 0 j J p E K o n X X d R L 0 M q K Q U 8 E m p V 6 q W U L o i A x Y 1 1 B J I q a 9 b H b x x D 4 x S t 8 O Y 2 V K o j 1 T f 0 9 k J N J 6 H A W m M y I 4 1 I v e V P z P 6 6 Y Y X n o Z l 0 m K T N L 5 o j A V N s b 2 9 H 2 7 z x W j K M a G E K q 4 u d W m Q 6 I I R R N S y Y T g L r 6 8 T F p n N d e p u b f n l f p V H k c R j u A Y q u D C B d T h B h r Q B A o S n u E V 3 i x t v V j v 1 s e 8 t W D l M 4 f w B 9 b n D z D H k J Y = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " d X c L I Q 2 p Q Q 0 m 0 l k a e i c d Q 5 l H G K w = " > A A A B 8 X i c b V B N S 8 N A E J 3 U r 1 q / q h 6 9 B I t Q L y U R Q Y 9 F L x 4 r 2 A 9 s Q 9 h s N + 3 S z S b u T o Q S + y + 8 e F D E q / / G m / / G b Z u D t j 4 Y e L w 3 w 8 y 8 I B F c o + N 8 W 4 W V 1 b X 1 j e J m a W t 7 Z 3 e v v H / Q 0 n G q K G v S W M S q E x D N B J e s i R w F 6 y S K k S g Q r B 2 M r q d + + 5 E p z W N 5 h + O E e R E Z S B 5 y S t B I 9 w 9 V 4 u O T 9 v H U L 1 e c m j O D v U z c n F Q g R 8 M v f / X 6 M U 0 j J p E K o n X X d R L 0 M q K Q U 8 E m p V 6 q W U L o i A x Y 1 1 B J I q a 9 b H b x x D 4 x S t 8 O Y 2 V K o j 1 T f 0 9 k J N J 6 H A W m M y I 4 1 I v e V P z P 6 6 Y Y X n o Z l 0 m K T N L 5 o j A V N s b 2 9 H 2 7 z x W j K M a G E K q 4 u d W m Q 6 I I R R N S y Y T g L r 6 8 T F p n N d e p u b f n l f p V H k c R j u A Y q u D C B d T h B h r Q B A o S n u E V 3 i 
x t v V j v 1 s e 8 t W D l M 4 f w B 9 b n D z D H k J Y = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " d X c L I Q 2 p Q Q 0 m 0 l k a e i c d Q 5 l H G K w = " > A A A B 8 X i c b V B N S 8 N A E J 3 U r 1 q / q h 6 9 B I t Q L y U R Q Y 9 F L x 4 r 2 A 9 s Q 9 h s N + 3 S z S b u T o Q S + y + 8 e F D E q / / G m / / G b Z u D t j 4 Y e L w 3 w 8 y 8 I B F c o + N 8 W 4 W V 1 b X 1 j e J m a W t 7 Z 3 e v v H / Q 0 n G q K G v S W M S q E x D N B J e s i R w F 6 y S K k S g Q r B 2 M r q d + + 5 E p z W N 5 h + O E e R E Z S B 5 y S t B I 9 w 9 V 4 u O T 9 v H U L 1 e c m j O D v U z c n F Q g R 8 M v f / X 6 M U 0 j J p E K o n X X d R L 0 M q K Q U 8 E m p V 6 q W U L o i A x Y 1 1 B J I q a 9 b H b x x D 4 x S t 8 O Y 2 V K o j 1 T f 0 9 k J N J 6 H A W m M y I 4 1 I v e V P z P 6 6 Y Y X n o Z l 0 m K T N L 5 o j A V N s b 2 9 H 2 7 z x W j K M a G E K q 4 u d W m Q 6 I I R R N S y Y T g L r 6 8 T F p n N d e p u b f n l f p V H k c R j u A Y q u D C B d T h B h r Q B A o S n u E V 3 i x t v V j v 1 s e 8 t W D l M 4 f w B 9 b n D z D H k J Y = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " d X c L I Q 2 p Q Q 0 m 0 l k a e i c d Q 5 l H G K w = " > A A A B 8 X i c b V B N S 8 N A E J 3 U r 1 q / q h 6 9 B I t Q L y U R Q Y 9 F L x 4 r 2 A 9 s Q 9 h s N + 3 S z S b u T o Q S + y + 8 e F D E q / / G m / / G b Z u D t j 4 Y e L w 3 w 8 y 8 I B F c o + N 8 W 4 W V 1 b X 1 j e J m a W t 7 Z 3 e v v H / Q 0 n G q K G v S W M S q E x D N B J e s i R w F 6 y S K k S g Q r B 2 M r q d + + 5 E p z W N 5 h + O E e R E Z S B 5 y S t B I 9 w 9 V 4 u O T 9 v H U L 1 e c m j O D v U z c n F Q g R 8 M v f / X 6 M U 0 j J p E K o n X X d R L 0 M q K Q U 8 E m p V 6 q W U L o i A x Y 1 1 B J I q a 9 b H b x x D 4 x S t 8 O Y 2 V K o j 1 T f 0 9 k J N J 6 H A W m M y I 4 1 I v e V P z P 6 6 Y Y X n o Z l 0 m K T N L 5 o j A V N s b 2 9 H 2 7 z x W j K M a G E K q 4 u d W m Q 6 I I R R N S y Y T g L r 6 8 T F p n N d e p u b f n l f p V H k c R 
j u A Y q u D C B d T h B h r Q B A o S n u E V 3 i x t v V j v 1 s e 8 t W D l M 4 f w B 9 b n D z D H k J Y = < / l a t e x i t > q(s t |o t ) < l a t e x i t s h a 1 _ b a s e 6 4 = " 1 q W G U k q N l w T G x M E I I n 4 k l p O I j E w = " > A A A B 8 X i c b V B N S w M x E M 3 W r 1 q / q h 6 9 B I t Q L 2 V X B D 0 W v X i s Y D + w X Z Z s m m 1 D s 8 m a z A p l 7 b / w 4 k E R r / 4 b b / 4 b 0 3 Y P 2 v p g 4 P H e D D P z w k R w A 6 7 7 7 R R W V t f W N 4 q b p a 3 t n d 2 9 8 v 5 B y 6 h U U 9 a k S i j d C Y l h g k v W B A 6 C d R L N S B w K 1 g 5 H 1 1 O / / c i 0 4 U r e w T h h f k w G k k e c E r D S / U P V B P C k A j g N y h W 3 5 s 6 A l 4 m X k w r K 0 Q j K X 7 2 + o m n M J F B B j O l 6 b g J + R j R w K t i k 1 E s N S w g d k Q H r W i p J z I y f z S 6 e 4 B O r 9 H G k t C 0 J e K b + n s h I b M w 4 D m 1 n T G B o F r 2 p + J / X T S G 6 9 D M u k x S Y p P N F U S o w K D x 9 H / e 5 Z h T E 2 B J C N b e 3 Y j o k m l C w I Z V s C N 7 i y 8 u k d V b z 3 J p 3 e 1 6 p X + V x F N E R O k Z V 5 K E L V E c 3 q I G a i C K J n t E r e n O M 8 + K 8 O x / z 1 o K T z x y i P 3 A + f w B G c Z C k < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " 1 q W G U k q N l w T G x M E I I n 4 k l p O I j E w = " > A A A B 8 X i c b V B N S w M x E M 3 W r 1 q / q h 6 9 B I t Q L 2 V X B D 0 W v X i s Y D + w X Z Z V D L S s N A F J 3 U V 6 2 v q O D G z W A R K m h J R N B l 0 Y 3 L C v Y B b Q i T 6 a Q d n E z C z I 1 Q 0 i 7 8 F T c u F f d G R i K t h 1 F g J i M C A z 3 v T c T / v E 4 K 4 Z W X c Z m k w C S d P R S m A k O M J 2 H g H l e M g h g a Q q j i 5 l Z M B 0 Q R C i a y k g n B n f / y I m m e V 1 2 n 6 t 5 d l G v X e R x F d I i O U A W 5 6 B L V 0 C 2 q o w a i a I S e 0 S t 6 s 5 6 s F + v d + p i N F q x 8 Z x / 9 g f X 5 A + e p l V w = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " 6 j 8  E B R T u D k m k G d D c + w m 4 t 4 p B I j E = " > A A A B / 3 i c b V D L S s N A F J 3 U V 6 2 v 
q O D G z W A R K m h J R N B l 0 Y 3 L C v Y B b Q i T 6 a Q d n E z C z I 1 Q 0 i 7 8 F T c u F f d G R i K t h 1 F g J i M C A z 3 v T c T / v E 4 K 4 Z W X c Z m k w C S d P R S m A k O M J 2 H g H l e M g h g a Q q j i 5 l Z M B 0 Q R C i a y k g n B n f / y I m m e V 1 2 n 6 t 5 d l G v X e R x F d I i O U A W 5 6 B L V 0 C 2 q o w a i a I S e 0 S t 6 s 5 6 s F + v d + p i N F q x 8 Z x / 9 g f X 5 A + e p l V w = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " 6 j 8  E B R T u D k m k G d D c + w m 4 t 4 p B I j E = " > A A A B / 3 i c b V D L S s N A F J 3 U V 6 2 v q O D G z W A R K m h J R N B l 0 Y 3 L C v Y B b Q i T 6 a Q d n E z C z I 1 Q 0 i 7 8 F T c u F f d G R i K t h 1 F g J i M C A z 3 v T c T / v E 4 K 4 Z W X c Z m k w C S d P R S m A k O M J 2 H g H l e M g h g a Q q j i 5 l Z M B 0 Q R C i a y k g n B n f / y I m m e V 1 2 n 6 t 5 d l G v X e R x F d I i O U A W 5 6 B L V 0 C 2 q o w a i a I S e 0 S t 6 s 5 6 s F + v d + p i N F q x 8 Z x / 9 g f X 5 A + e p l V w = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " 6 j 8  E B R T u D k m k G d D c + w m 4 t 4 p B I j E = " > A A A B / 3 i c b V D L S s N A F J 3 U V 6 2 v q O D G z W A R K m h J R N B l 0 Y 3 L C v Y B b Q i T 6 a Q d n E z C z I 1 Q 0 i 7 8 F T c u F f d G R i K t h 1 F g J i M C A z 3 v T c T / v E 4 K 4 Z W X c Z m k w C S d P R S m A k O M J 2 H g H l e M g h g a Q q j i 5 l Z M B 0 Q R C i a y k g n B n f / y I m m e V 1 2 n 6 t 5 d l G v X e R x F d I i O U A W 5 6 B L V 0 C 2 q o w a i a I S e 0 S t 6 s 5 6 s F + v d + p i N F q x 8 Z x / 9 g f X 5 A + e p l V w = < / l a t e x i t >

KL

Negative Log Likelihood p(at+1|st+1) < l a t e x i t s h a 1 _ b a s e 6 4 = " P M z S H A + U u m 0 h d n U S E Z D v q F Y X G F Q = " > A A A B + 3 i c b V D L S s N A F J 3 4 r P U V 6 9 L N Y B E q Q k l E 0 G X R j c s K 9 g F t C J P p p B 0 6 m Y S Z G 7 H E / I o b F 4 q 4 9 U f c + T d O 2 y y 0 9 c D l H s 6 5 l 7 l z g k R w D Y 7 z b a 2 s r q 1 v b J a 2 y t s 7 u 3 v 7 9 k G l r e N U U d a i s Y h V N y C a C S 5 Z C z g I 1 k 0 U I 1 E g W C c Y 3 0 z 9 z g N T m s f y H i Y J 8 y I y l D z k l I C R f L u S 1 I i f w Z m b P + l 5 P / X t q l N 3 Z s D L x C 1 I F R V o + v Z X f x D T N G I S q C B a 9 1 w n A S 8 j C j g V L C / 3 U 8 0 S Q s d k y H q G S h I x 7 W W z 2 3 N 8 Y p Q B D m N l S g K e q b 8 3 M h J p P Y k C M x k R G O l F b y r + 5 / V S C K + 8 j M s k B S b p / K E w F R h i P A 0 C D 7 h i F M T E E E I V N 7 d i O i K K U D B x l U 0 I 7 u K X l 0 n 7 v O 4 6 d f f u o t q 4 L u I o o S N 0 j G r I R Z e o g W 5 R E 7 U Q R Y / o G b 2 i N y u 3 X q x 3 6 2 M + u m I V O 4 f o D 6 z P H / z N k 7 4 = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " P M z S H A + U u m 0 h d n U S E Z D v q F Y X G F Q = " > A A A B + 3 i c b V D L S s N A F J 3 4 r P U V 6 9 L N Y B E q Q k l E 0 G X R j c s K 9 g F t C J P p p B 0 6 m Y S Z G 7 H E / I o b F 4 q 4 9 U f c + T d O 2 y y 0 9 c D l H s 6 5 l 7 l z g k R w D Y 7 z b a 2 s r q 1 v b J a 2 y t s 7 u 3 v 7 9 k G l r e N U U d a i s Y h V N y C a C S 5 Z C z g I 1 k 0 U I 1 E g W C c Y 3 0 z 9 z g N T m s f y H i Y J 8 y I y l D z k l I C R f L u S 1 I i f w Z m b P + l 5 P / X t q l N 3 Z s D L x C 1 I F R V o + v Z X f x D T N G I S q C B a 9 1 w n A S 8 j C j g V L C / 3 U 8 0 S Q s d k y H q G S h I x 7 W W z 2 3 N 8 Y p Q B D m N l S g K e q b 8 3 M h J p P Y k C M x k R G O l F b y r + 5 / V S C K + 8 j M s k B S b p / K E w F R h i P A 0 C D 7 h i F M T E E E I V N 7 d i O i K K U D B x l U 0 I 7 u K X l 0 n 7 v O 4 6 d f f u o t q 4 L u I o o S N 0 j G r I R 
Z e o g W 5 R E 7 U Q R Y / o G b 2 i N y u 3 X q x 3 6 2 M + u m I V O 4 f o D 6 z P H / z N k 7 4 = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " P M z S H A + U u m 0 h d n U S E Z D v q F Y X G F Q = " > A A A B + 3 i c b V D L S s N A F J 3 4 r P U V 6 9 L N Y B E q Q k l E 0 G X R j c s K 9 g F t C J P p p B 0 6 m Y S Z G 7 H E / I o b F 4 q 4 9 U f c + T d O 2 y y 0 9 c D l H s 6 5 l 7 l z g k R w D Y 7 z b a 2 s r q 1 v b J a 2 y t s 7 u 3 v 7 9 k G l r e N U U d a i s Y h V N y C a C S 5 Z C z g I 1 k 0 U I 1 E g W C c Y 3 0 z 9 z g N T m s f y H i Y J 8 y I y l D z k l I C R f L u S 1 I i f w Z m b P + l 5 P / X t q l N 3 Z s D L x C 1 I F R V o + v Z X f x D T N G I S q C B a 9 1 w n A S 8 j C j g V L C / 3 U 8 0 S Q s d k y H q G S h I x 7 W W z 2 3 N 8 Y p Q B D m N l S g K e q b 8 3 M h J p P Y k C M x k R G O l F b y r + 5 / V S C K + 8 j M s k B S b p / K E w F R h i P A 0 C D 7 h i F M T E E E I V N 7 d i O i K K U D B x l U 0 I 7 u K X l 0 n 7 v O 4 6 d f f u o t q 4 L u I o o S N 0 j G r I R Z e o g W 5 R E 7 U Q R Y / o G b 2 i N y u 3 X q x 3 6 2 M + u m I V O 4 f o D 6 z P H / z N k 7 4 = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " P M z S H A + U u m 0 h d n U S E Z D v q F Y X G F Q = " > A A A B + 3 i c b V D L S s N A F J 3 4 r P U V 6 9 L N Y B E q Q k l E 0 G X R j c s K 9 g F t C J P p p B 0 6 m Y S Z G 7 H E / I o b F 4 q 4 9 U f c + T d O 2 y y 0 9 c D l H s 6 5 l 7 l z g k R w D Y 7 z b a 2 s r q 1 v b J a 2 y t s 7 u 3 v 7 9 k G l r e N U U d a i s Y h V N y C a C S 5 Z C z g I 1 k 0 U I 1 E g W C c Y 3 0 z 9 z g N T m s f y H i Y J 8 y I y l D z k l I C R f L u S 1 I i f w Z m b P + l 5 P / X t q l N 3 Z s D L x C 1 I F R V o + v Z X f x D T N G I S q C B a 9 1 w n A S 8 j C j g V L C / 3 U 8 0 S Q s d k y H q G S h I x 7 W W z 2 3 N 8 Y p Q B D m N l S g K e q b 8 3 M h J p P Y k C M x k R G O l F b y r + 5 / V S C K + 8 j M s k B S b p / K E w F R h i P A 0 C D 7 h i F M T E E E I V N 7 d i O i K 
K U D B x l U 0 I 7 u K X l 0 n 7 v O 4 6 d f f u o t q 4 L u I o o S N 0 j G r I R Z e o g W 5 R E 7 U Q R Y / o G b 2 i N y u 3 X q x 3 6 2 M + u m I V O 4 f o D 6 z P H / z N k 7 4 = < / l a t e x i t > q(st+1|ot+1) < l a t e x i t s h a 1 _ b a s e 6 4 = " Q j o g K 1 4 N Y B A Q p F C u R J j D A V W s c 6 E = " > A A A B + 3 i c b V D L S s N A F J 3 U V 6 2 v W J d u g k W o C C U R Q Z d F N y 4 r 2 A e 0 I U y m 0 3 b o Z C b O 3 I g l 5 l f c u F D E r T / i z r 9 x 2 m a h r Q c u 9 3 D O v c y d E 8 a c a X D d b 6 u w s r q 2 v l H c L G 1 t 7 + z u 2 f v l l p a J I r R J J J e q E 2 J N O R O 0 C Q w 4 7 c S K 4 i j k t B 2 O r 6 d + + 4 E q z a S 4 g 0 l M / Q g P B R s w g s F I g V 2 + r + o g h V M v e 5 L z f h L Y F b f m z u A s E y 8 n F Z S j E d h f v b 4 k S U Q F E I 6 1 7 n p u D H 6 K F T D C a V b q J Z r G m I z x k H Y N F T i i 2 k 9 n t 2 f O s V H 6 z k A q U w K c m f p 7 I 8 W R 1 p M o N J M R h p F e 9 K b i f 1 4 3 g c G l n z I R J 0 A F m T 8 0 S L g D 0 p k G 4 f S Z o g T 4 x B B M F D O 3 O m S E F S Z g 4 i q Z E L z F L y + T 1 l n N c 2 v e 7 X m l f p X H U U S H 6 A h V k Y c u U B 3 d o A Z q I o I e 0 T N 6 R W 9 W Z r 1 Y 7 9 b H f L R g 5 T s H 6 A + s z x 8 U m 5 P N < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " Q j o g K 1 4 N Y B A Q p F C u R J j D A V W s c 6 E = " > A A A B + 3 i c b V D L S s N A F J 3 U V 6 2 v W J d u g k W o C C U R Q Z d F N y 4 r 2 A e 0 I U y m 0 3 b o Z C b O 3 I g l 5 l f c u F D E r T / i z r 9 x 2 m a h r Q c u 9 3 D O v c y d E 8 a c a X D d b 6 u w s r q 2 v l H c L G 1 t 7 + z u 2 f v l l p a J I r R J J J e q E 2 J N O R O 0 C Q w 4 7 c S K 4 i j k t B 2 O r 6 d + + 4 E q z a S 4 g 0 l M / Q g P B R s w g s F I g V 2 + r + o g h V M v e 5 L z f h L Y F b f m z u A s E y 8 n F Z S j E d h f v b 4 k S U Q F E I 6 1 7 n p u D H 6 K F T D C a V b q J Z r G m I z x k H Y N F T i i 2 k 9 n t 2 f O s V H 6 z k A q U w K c m f p 7 I 8 W R 1 p M o N J M R h p F e 9 K 
b i f 1 4 3 g c G l n z I R J 0 A F m T 8 0 S L g D 0 p k G 4 f S Z o g T 4 x B B M F D O 3 O m S E F S Z g 4 i q Z E L z F L y + T 1 l n N c 2 v e 7 X m l f p X H U U S H 6 A h V k Y c u U B 3 d o A Z q I o I e 0 T N 6 R W 9 W Z r 1 Y 7 9 b H f L R g 5 T s H 6 A + s z x 8 U m 5 P N < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " Q j o g K 1 4 N Y B A Q p F C u R J j D A V W s c 6 E = " > A A A B + 3 i c b V D L S s N A F J 3 U V 6 2 v W J d u g k W o C C U R Q Z d F N y 4 r 2 A e 0 I U y m 0 3 b o Z C b O 3 I g l 5 l f c u F D E r T / i z r 9 x 2 m a h r Q c u 9 3 D O v c y d E 8 a c a X D d b 6 u w s r q 2 v l H c L G 1 t 7 + z u 2 f v l l p a J I r R J J J e q E 2 J N O R O 0 C Q w 4 7 c S K 4 i j k t B 2 O r 6 d + + 4 E q z a S 4 g 0 l M / Q g P B R s w g s F I g V 2 + r + o g h V M v e 5 L z f h L Y F b f m z u A s E y 8 n F Z S j E d h f v b 4 k S U Q F E I 6 1 7 n p u D H 6 K F T D C a V b q J Z r G m I z x k H Y N F T i i 2 k 9 n t 2 f O s V H 6 z k A q U w K c m f p 7 I 8 W R 1 p M o N J M R h p F e 9 K b i f 1 4 3 g c G l n z I R J 0 A F m T 8 0 S L g D 0 p k G 4 f S Z o g T 4 x B B M F D O 3 O m S E F S Z g 4 i q Z E L z F L y + T 1 l n N c 2 v e 7 X m l f p X H U U S H 6 A h V k Y c u U B 3 d o A Z q I o I e 0 T N 6 R W 9 W Z r 1 Y 7 9 b H f L R g 5 T s H 6 A + s z x 8 U m 5 P N < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " Q j o g K 1 4 N Y B A Q p F C u R J j D A V W s c 6 E = " > A A A B + 3 i c b V D L S s N A F J 3 U V 6 2 v W J d u g k W o C C U R Q Z d F N y 4 r 2 A e 0 I U y m 0 3 b o Z C b O 3 I g l 5 l f c u F D E r T / i z r 9 x 2 m a h r Q c u 9 3 D O v c y d E 8 a c a X D d b 6 u w s r q 2 v l H c L G 1 t 7 + z u 2 f v l l p a J I r R J J J e q E 2 J N O R O 0 C Q w 4 7 c S K 4 i j k t B 2 O r 6 d + + 4 E q z a S 4 g 0 l M / Q g P B R s w g s F I g V 2 + r + o g h V M v e 5 L z f h L Y F b f m z u A s E y 8 n F Z S j E d h f v b 4 k S U Q F E I 6 1 7 n p u D H 6 K F T D C a V b q J Z r G m I z x k H Y N F T i i 
2 k 9 n t 2 f O s V H 6 z k A q U w K c m f p 7 I 8 W R 1 p M o N J M R h p F e 9 K b i f 1 4 3 g c G l n z I R J 0 A F m T 8 0 S L g D 0 p k G 4 f S Z o g T 4 x B B M F D O 3 O m S E F S Z g 4 i q Z E L z F L y + T 1 l n N c 2 v e 7 X m l f p X H U U S H 6 A h V k Y c u U B 3 d o A Z q I o I e 0 T N 6 R W 9 W Z r 1 Y 7 9 b H f L R g 5 T s H 6 A + s z x 8 U m 5 P N < / l a t e x i t > q(st+1) = Eq(s t )q(at|st)[p(st+1|st, at)] < l a t e x i t s h a 1 _ b a s e 6 4 = " 2 + c W D V J Y d W f t A t k a R Z T P o E J Q f S M = " > A A A C J X i c b V D L S s N A F J 3 U V 6 2 v q k s 3 w S K 0 K C U R Q R c K R R F c V r A P S E O Y T C f t 0 M m j M z d C i f k Z N / 6 K G x c W E V z 5 K 0 7 a C l o 9 M H D m n H u Z O c e N O J N g G B 9 a b m F x a X k l v 1 p Y W 9 / Y 3 C p u 7 z R l G A t C G y T k o W i 7 W F L O A t o A B p y 2 I 0 G x 7 3 L a c g d X m d + 6 p 0 K y M L i D U U R t H / c C 5 j G C Q U l O 8 X x Y l k 4 C h 2 Z a u e j 4 G P q u m 1 y n T p L J U B m W s Q M P G U u t 6 H s w u x 8 p v W I 7 x Z J R N S b Q / x J z R k p o h r p T H H e 6 I Y l 9 G g D h W E r L N C K w E y y A E U 7 T Q i e W N M J k g H v U U j T A P p V 2 M k m Z 6 g d K 6 e p e K N Q J Q J + o P z c S 7 E s 5 8 l 0 1 m Q W R 8 1 4 m / u d Z M X h n d s K C K A Y a k O l D X s x 1 C P W s M r 3 L B C X A R 4 p g I p j 6 q 0 7 6 W G A C q t i C K s G c j / y X N I + r p l E 1 b 0 9 K t c t Z H X m 0 h / Z R G Z n o F N X Q D a q j B i L o E T 2 j V z T W n r Q X 7 U 1 7 n 4 7 m t N n O L v o F 7 f M L / P K k 7 w = = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " 2 + c W D V J Y d W f t A t k a R Z T P o E J Q f S M = " > A A A C J X i c b V D L S s N A F J 3 U V 6 2 v q k s 3 w S K 0 K C U R Q R c K R R F c V r A P S E O Y T C f t 0 M m j M z d C i f k Z N / 6 K G x c W E V z 5 K 0 7 a C l o 9 M H D m n H u Z O c e N O J N g G B 9 a b m F x a X k l v 1 p Y W 9 / Y 3 C p u 7 z R l G A t C G y T k o W i 7 W F L O A t o A B p y 2 I 0 G x 7 3 L a c g d X m d + 6 
p 0 K y M L i D U U R t H / c C 5 j G C Q U l O 8 X x Y l k 4 C h 2 Z a u e j 4 G P q u m 1 y n T p L J U B m W s Q M P G U u t 6 H s w u x 8 p v W I 7 x Z J R N S b Q / x J z R k p o h r p T H H e 6 I Y l 9 G g D h W E r L N C K w E y y A E U 7 T Q i e W N M J k g H v U U j T A P p V 2 M k m Z 6 g d K 6 e p e K N Q J Q J + o P z c S 7 E s 5 8 l 0 1 m Q W R 8 1 4 m / u d Z M X h n d s K C K A Y a k O l D X s x 1 C P W s M r 3 L B C X A R 4 p g I p j 6 q 0 7 6 W G A C q t i C K s G c j / y X N I + r p l E 1 b 0 9 K t c t Z H X m 0 h / Z R G Z n o F N X Q D a q j B i L o E T 2 j V z T W n r Q X 7 U 1 7 n 4 7 m t N n O L v o F 7 f M L / P K k 7 w = = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " 2 + c W D V J Y d W f t A t k a R Z T P o E J Q f S M = " > A A A C J X i c b V D L S s N A F J 3 U V 6 2 v q k s 3 w S K 0 K C U R Q R c K R R F c V r A P S E O Y T C f t 0 M m j M z d C i f k Z N / 6 K G x c W E V z 5 K 0 7 a C l o 9 M H D m n H u Z O c e N O J N g G B 9 a b m F x a X k l v 1 p Y W 9 / Y 3 C p u 7 z R l G A t C G y T k o W i 7 W F L O A t o A B p y 2 I 0 G x 7 3 L a c g d X m d + 6 p 0 K y M L i D U U R t H / c C 5 j G C Q U l O 8 X x Y l k 4 C h 2 Z a u e j 4 G P q u m 1 y n T p L J U B m W s Q M P G U u t 6 H s w u x 8 p v W I 7 x Z J R N S b Q / x J z R k p o h r p T H H e 6 I Y l 9 G g D h W E r L N C K w E y y A E U 7 T Q i e W N M J k g H v U U j T A P p V 2 M k m Z 6 g d K 6 e p e K N Q J Q J + o P z c S 7 E s 5 8 l 0 1 m Q W R 8 1 4 m / u d Z M X h n d s K C K A Y a k O l D X s x 1 C P W s M r 3 L B C X A R 4 p g I p j 6 q 0 7 6 W G A C q t i C K s G c j / y X N I + r p l E 1 b 0 9 K t c t Z H X m 0 h / Z R G Z n o F N X Q D a q j B i L o E T 2 j V z T W n r Q X 7 U 1 7 n 4 7 m t N n O L v o F 7 f M L / P K k 7 w = = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " 2 + c W D V J Y d W f t A t k a R Z T P o E J Q f S M = " > A A A C J X i c b V D L S s N A F J 3 U V 6 2 v q k s 3 w S K 0 K C U R Q R c K R R F c V r A P S E 
O Y T C f t 0 M m j M z d C i f k Z N / 6 K G x c W E V z 5 K 0 7 a C l o 9 M H D m n H u Z O c e N O J N g G B 9 a b m F x a X k l v 1 p Y W 9 / Y 3 C p u 7 z R l G A t C G y T k o W i 7 W F L O A t o A B p y 2 I 0 G x 7 3 L a c g d X m d + 6 p 0 K y M L i D U U R t H / c C 5 j G C Q U l O 8 X x Y l k 4 C h 2 Z a u e j 4 G P q u m 1 y n T p L J U B m W s Q M P G U u t 6 H s w u x 8 p v W I 7 x Z J R N S b Q / x J z R k p o h r p T H H e 6 I Y l 9 G g D h W E r L N C K w E y y A E U 7 T Q i e W N M J k g H v U U j T A P p V 2 M k m Z 6 g d K 6 e p e K N Q J Q J + o P z c S 7 E s 5 8 l 0 1 m Q W R 8 1 4 m / u d Z M X h n d s K C K A Y a k O l D X s x 1 C P W s M r 3 L B C X A R 4 p g I p j 6 q 0 7 6 W G A C q t i C K s G c j / y X N I + r p l E 1 b 0 9 K t c t Z H X m 0 h / Z R G Z n o F N X Q D a q j B i L o E T 2 j V z T W n r Q X 7 U 1 7 n 4 7 m t N n O L v o F 7 f M L / P K k 7 w = = < / l a t e x i t >

Expected reward

Figure 1 : Deep Free Energy Network (FENet) calculation process. The left side shows how to calculate the Free Energy using data at hand. The right side shows how to calculate the Expected Free Energy for RL with latent imagination.

3.2. IMITATION AND RL OBJECTIVES

To account for the long-term future, the agent has to calculate the Expected Free Energy from $t + 1$ to $\infty$.

$$\mathcal{F} = F_t + \sum_{\tau=t+1}^{\infty} \gamma^{\tau-t-1} G_\tau$$

We define $\mathcal{F}$ to be the objective that the Deep Free Energy Network should minimize. Note that $\gamma$ is a discount factor, as in general RL algorithms. As it is impossible to sum over infinitely many time steps, we introduce an Expected Free Energy Value function $V(s_{t+1})$ to estimate the cumulative Expected Free Energy. Similarly to Temporal Difference learning in the Deep Q Network (Mnih et al., 2013), we use a target network $V_{\mathrm{targ}}(s_{t+2})$ to stabilize the learning process and define the loss for the value function as follows.

$$L = \| G_{t+1} + \gamma V_{\mathrm{targ}}(s_{t+2}) - V(s_{t+1}) \|^2 \tag{28}$$

We made a design choice that the agent uses the value function only for RL, and not for imitation. In imitation, we use only the actual value of the Expected Free Energy $G_{t+1}$ at the next time step $t + 1$. This is because imitation learning can be achieved without long-term prediction, as the agent has access to the experts' complete time-series data. In RL, on the other hand, using the value function to predict rewards in the long-term future is essential to avoid local minima and achieve the desired goal. In conclusion, the objective functions of the Deep Free Energy Network (FENet) for a data sequence $(o_t, a_t, r_t, o_{t+1})$ are as follows.

$$\mathcal{F}_{IL} = F_t + G^{IL}_{t+1} \tag{29}$$

$$\mathcal{F}_{RL} = F_t + G^{RL}_{t+1} + \gamma V_{\omega_{\mathrm{targ}}}(s_{t+2})$$

$$L = \| G^{RL}_{t+1} + \gamma V_{\mathrm{targ}}(s_{t+2}) - V(s_{t+1}) \|^2$$

The overall Free Energy calculation process is shown in Figure 1.
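As a rough illustration of the objectives in this section, the following minimal Python sketch computes a truncated version of the discounted objective, the squared TD loss for the value function, and the IL and RL objectives for scalar inputs. The function names and the use of plain scalars are assumptions made for illustration; in the actual model these quantities would be computed from network outputs over batches.

```python
def discounted_objective(f_t, g_future, gamma):
    """Return F_t + sum_k gamma**k * G_{t+1+k} over a finite list of future terms.

    g_future holds hypothetical values G_{t+1}, G_{t+2}, ... (a truncation of
    the infinite sum in the text).
    """
    total = f_t
    for k, g in enumerate(g_future):
        total += (gamma ** k) * g
    return total


def value_td_loss(g_next, v_target_next, v_current, gamma):
    """Squared TD error (G_{t+1} + gamma * V_targ(s_{t+2}) - V(s_{t+1}))**2."""
    td_target = g_next + gamma * v_target_next
    return (td_target - v_current) ** 2


def il_objective(f_t, g_il_next):
    """Imitation objective: F_t plus the one-step Expected Free Energy."""
    return f_t + g_il_next


def rl_objective(f_t, g_rl_next, v_target_next, gamma):
    """RL objective: F_t + G_{t+1} + gamma * V_targ(s_{t+2}) (bootstrapped tail)."""
    return f_t + g_rl_next + gamma * v_target_next
```

If the target value were exact, that is, equal to the discounted tail starting at $s_{t+2}$, then `rl_objective` would coincide with `discounted_objective` over the same horizon and the TD loss would be zero. This is the sense in which the value function stands in for the infinite sum.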

3.3. NETWORK ARCHITECTURE AND CALCULATION

For implementation, we made a design choice to use the Recurrent State Space Model (Hafner et al., 2019b), a latent dynamics model with both deterministic and stochastic components. In this model, the hidden states are split into two parts: stochastic hidden states $s_t$ and deterministic hidden states $h_t$. The deterministic transition of $h_t$ is modeled using a Recurrent Neural Network (RNN) $f$ as follows.

$$h_t = f(h_{t-1}, s_{t-1}, a_{t-1}) \tag{32}$$
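The deterministic recurrence can be sketched in a few lines of Python. This is only a toy illustration: the single tanh layer, the `weights` and `bias` parameters, and the function names are assumptions, not the recurrent cell used in the Recurrent State Space Model.

```python
import math


def deterministic_step(h_prev, s_prev, a_prev, weights, bias):
    """Toy stand-in for h_t = f(h_{t-1}, s_{t-1}, a_{t-1}).

    Uses a single tanh layer over the concatenated inputs; the real model
    would use a learned recurrent cell instead.
    """
    x = list(h_prev) + list(s_prev) + list(a_prev)
    h_next = []
    for row, b in zip(weights, bias):
        if len(row) != len(x):
            raise ValueError("weight row length must match input length")
        pre_activation = sum(w * v for w, v in zip(row, x)) + b
        h_next.append(math.tanh(pre_activation))
    return h_next


def rollout(h0, states, actions, weights, bias):
    """Apply the deterministic step along a sequence of (s, a) pairs."""
    h = list(h0)
    history = []
    for s_prev, a_prev in zip(states, actions):
        h = deterministic_step(h, s_prev, a_prev, weights, bias)
        history.append(h)
    return history
```

Because the update contains no sampling, repeated calls with the same inputs return the same $h_t$; in the model described above, all randomness enters through the stochastic states $s_t$.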

4.3. PERFORMANCE WITH SUBOPTIMAL EXPERTS

In real-world robot learning, expert trajectories are often provided by human experts. It is natural to assume that such trajectories are suboptimal and leave much room for improvement. We compare the performance of FENet with behavioral cloning imitation methods. We use two types of networks for behavioral cloning: a recurrent policy and a recurrent decoder policy. The recurrent policy $\pi_R(a_t|o_t)$ is a neural network with one gated recurrent unit cell and three dense layers. The recurrent decoder policy $\pi_R(a_t, o_{t+1}|o_t)$ is a neural network with one gated recurrent unit cell, four dense layers, and deconvolution layers as in the decoder of PlaNet. Neither network receives raw pixel observations; both take observations encoded by the same convolutional encoder as PlaNet's. Figure 4 shows that while the imitation methods overfit to the expert and cannot surpass the suboptimal expert's performance, FENet substantially surpasses it.

'Imitation RL' is the default FENet agent, which does imitation learning and RL at the same time, minimizing $\mathcal{F}_{IL} + \mathcal{F}_{RL}$. 'Imitation-pretrained RL' is an agent that first learns the model with imitation only (minimizing $\mathcal{F}_{IL}$) and then does RL using the pre-trained model (minimizing $\mathcal{F}_{RL}$). 'RL only' is an agent that does RL only, minimizing $\mathcal{F}_{RL}$. 'Imitation only' is an agent that does imitation only, minimizing $\mathcal{F}_{IL}$. In Cheetah-run, 'Imitation only' gives the best performance and 'Imitation RL' the second best, whereas in Walker-walk, 'Imitation RL' gives the best performance and 'Imitation only' the worst. This suggests that 'Imitation RL' is the most robust to the properties of the task.
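The four agent variants differ only in which objective they minimize. A minimal sketch of that selection logic follows, assuming the two objective values have already been computed; the variant strings and the `pretraining` flag are hypothetical names introduced for illustration.

```python
def training_objective(variant, f_il, f_rl, pretraining=False):
    """Return the scalar objective minimized by each agent variant.

    f_il and f_rl are the imitation and RL objectives for the current batch.
    `pretraining` only matters for the imitation-pretrained variant.
    """
    if variant == "imitation_rl":
        # Default agent: imitation and RL at the same time.
        return f_il + f_rl
    if variant == "imitation_pretrained_rl":
        # Two phases: imitation first, then RL on the pre-trained model.
        return f_il if pretraining else f_rl
    if variant == "rl_only":
        return f_rl
    if variant == "imitation_only":
        return f_il
    raise ValueError(f"unknown variant: {variant}")
```

For example, `training_objective("imitation_pretrained_rl", f_il, f_rl, pretraining=True)` returns only the imitation term, matching the first phase described above.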

5. RELATED WORK

Active Inference. Friston, who first proposed Active Inference, evaluated its performance in simple control tasks and a low-dimensional maze (Friston et al., 2012; 2015). Ueltzhöffer implemented Active Inference with Deep Neural Networks and evaluated its performance in a simple control task (Ueltzhöffer, 2018). Millidge proposed a Deep Active Inference framework with value functions to estimate the correct Free Energy and succeeded in solving Gym environments (Millidge, 2019). Our approach extends Deep Active Inference to combine imitation and RL, solving more challenging tasks.

RL from demonstration. Reinforced Imitation Learning succeeds in reducing sample complexity by using imitation as pre-training before RL (Pfeiffer et al., 2018). Adding demonstrations to the replay buffer of off-policy RL methods also yields high sample efficiency (Vecerik et al., 2017; Nair et al., 2018; Paine et al., 2019). Demo Augmented Policy Gradient mixes the policy gradient with a behavioral cloning gradient (Rajeswaran et al., 2018). Deep Q-learning from Demonstrations (DQfD) not only uses demonstrations for pre-training but also calculates gradients [...]

[...] target smoothing rate $\rho = 0.01$. We use Adam (Kingma & Ba, 2014) with learning rate $\alpha = 10^{-3}$ and scale down gradient norms that exceed 1000. We scale the reward-related loss by 100 and the policy-prior-related loss by 10. We clip the KL loss between the hidden states below 3 free nats and clip the KL loss between the policies below 0.6.
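The training details above (target smoothing, gradient-norm scaling, and free-nats clipping) can be sketched as follows. This is a hedged illustration under stated assumptions: the soft-update form $\theta_{\mathrm{targ}} \leftarrow (1-\rho)\,\theta_{\mathrm{targ}} + \rho\,\theta$ is the common Polyak-style convention and is assumed here, and the free-nats clip is assumed to follow the PlaNet-style `max(KL, free_nats)` convention. All function names are hypothetical.

```python
import math


def soft_update(target_params, online_params, rho=0.01):
    """Assumed Polyak-style update: targ <- (1 - rho) * targ + rho * online."""
    return [(1.0 - rho) * t + rho * o
            for t, o in zip(target_params, online_params)]


def scale_gradients(grads, max_norm=1000.0):
    """Rescale the gradient vector so its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


def free_nats_kl(kl, free_nats=3.0):
    """Assumed free-nats clip: KL values below the threshold are not penalized further."""
    return max(kl, free_nats)
```

With these defaults, a target parameter moves 1% of the way toward its online counterpart per update, and a KL term below 3 nats contributes a constant (hence zero gradient) to the loss. The same `free_nats_kl` helper with `free_nats=0.6` would correspond to the policy KL clip mentioned above.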




4 h l d 4 c 4 z z 4 r w 7 H / P W F S e f O Y I / c D 5 / A E T U k K M = < / l a t e x i t > p(at|st)

s m m 1 D s 8 m a z A p l 7 b / w 4 k E R r / 4 b b / 4 b 0 3 Y P 2 v p g 4 P H e D D P z w k R w A 6 7 7 7 R R W V t f W N 4 q b p a 3 t n d 2 9 8 v 5 B y 6 h U U 9 a k S i j d C Y l h g k v W B A 6 C d R L N S B w K 1 g 5 H 1 1 O / / c i 0 4 U r e w T h h f k w G k k e c E r D S / U P VB P C k A j g N y h W 3 5 s 6 A l 4 m X k w r K 0 Q j K X 7 2 + o m n M J F B B j O l 6 b g J + R j R w K t i k 1 E s N S w g d k Q H r W i p J z I y f z S 6 e 4 B O r 9 H G k t C 0 J e K b + n s h I b M w 4 D m 1 n T G B o F r 2 p + J / X T S G 6 9 D M u k x S Y p P N F U S o w K D x 9 H / e 5 Z h T E 2 B J C N b e 3 Y j o k m l C w I Z V s C N 7 i y 8 u k d V b z 3 J p 3 e 1 6 p X + V x F N E R O k Z V 5 K E L V E c 3 q I G a i C K J n t E r e n O M 8 + K 8 O x / z 1 o K T z x y i P 3 A + f w B G c Z C k < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " 1 q W G U k q N l w T G x M E I I n 4 k l p O I j E w = " > A A A B 8 X i c b V B N S w M x E M 3 W r 1 q / q h 6 9 B I t Q L 2 V X B D 0 W v X i s Y D + w X Z Z s m m 1 D s 8 m a z A p l 7 b / w 4 k E R r / 4 b b / 4 b 0 3 Y P 2 v p g 4 P H e D D P z w k R w A 6 7 7 7 R R W V t f W N 4 q b p a 3 t n d 2 9 8 v 5 B y 6 h U U 9 a k S i j d C Y l h g k v W B A 6 C d R L N S B w K 1 g 5 H 1 1 O / / c i 0 4 U r e w T h h f k w G k k e c E r D S / U P V B P C k A j g N y h W 3 5 s 6 A l 4 m X k w r K 0 Q j K X 7 2 + o m n M J F B B j O l 6 b g J + R j R w K t i k 1 E s N S w g d k Q H r W i p J z I y f z S 6 e 4 B O r 9 H G k t C 0 J e K b + n s h I b M w 4 D m 1 n T G B o F r 2 p + J / X T S G 6 9 D M u k x S Y p P N F U S o w K D x 9 H / e 5 Z h T E 2 B J C N b e 3 Y j o k m l C w I Z V s C N 7 i y 8 u k d V b z 3 J p 3 e 1 6 p X + V x F N E R O k Z V 5 K E L V E c 3 q I G a i C K J n t E r e n O M 8 + K 8 O x / z 1 o K T z x y i P 3 A + f w B G c Z C k < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " 1 q W G U k q N l w T G x M E I I n 4 k l p O I j E w = " > A A A B 8 X i c b V B N S w M x E M 3 W r 1 q / q 
h 6 9 B I t Q L 2 V X B D 0 W v X i s Y D + w X Z Z s m m 1 D s 8 m a z A p l 7 b / w 4 k E R r / 4 b b / 4 b 0 3 Y P 2 v p g 4 P H e D D P z w k R w A 6 7 7 7 R R W V t f W N 4 q b p a 3 t n d 2 9 8 v 5 B y 6 h U U 9 a k S i j d C Y l h g k v W B A 6 C d R L N S B w K 1 g 5 H 1 1 O / / c i 0 4 U r e w T h h f k w G k k e c E r D S / U P V B P C k A j g N y h W 3 5 s 6 A l 4 m X k w r K 0 Q j K X 7 2 + o m n M J F B B j O l 6 b g J + R j R w K t i k 1 E s N S w g d k Q H r W i p J z I y f z S 6 e 4 B O r 9 H G k t C 0 J e K b + n s h I b M w 4 D m 1 n T G B o F r 2 p + J / X T S G 6 9 D M u k x S Y p P N F U S o w K D x 9 H / e 5 Z h T E 2 B J C N b e 3 Y j o k m l C w I Z V s C N 7 i y 8 u k d V b z 3 J p 3 e 1 6 p X + V x F N E R O k Z V 5 K E L V E c 3 q I G a i C K J n t Er e n O M 8 + K 8 O x / z 1 o K T z x y i P 3 A + f w B G c Z C k < / l a t e x i t > p(st|st 1, at 1) < l a t e x i t s h a 1 _ b a s e 6 4 = " 6 j 8 E B R T u D k m k G d D c + w m 4 t 4 p B I j E = " > A A A B / 3 i c b

H H r b 7 j z b 5 y 2 W W j r g c s 9 n HM v c + c E i e A a H O f b K i w t r 6 y u F d d L G 5 t b 2 z v 2 7 l 5 T x 6 m i r E F j E a t 2 Q D Q T X L I G c B C s n S h G o k C w V v B w M / F b j 0 x p H s t 7 G C b M i 0 h f 8 p B T A k b y 7 Y O k o n 0 Y a T + D M 3 d 8 S m b 9 x L f L T t W Z A i 8 S N y d l l K P u 2 1 / d X k z T i E mg g m j d c Z 0 E v I w o 4 F S w c a m b a p Y Q + k D 6 r G O o J B H T X j a 9 f 4 y P j d L D Y a x M S c B T 9

H H r b 7 j z b 5 y 2 W W j r g c s 9 n HM v c + c E i e A a H O f b K i w t r 6 y u F d d L G 5 t b 2 z v 2 7 l 5 T x 6 m i r E F j E a t 2 Q D Q T X L I G c B C s n S h G o k C w V v B w M / F b j 0 x p H s t 7 G C b M i 0 h f 8 p B T A k b y 7 Y O k o n 0 Y a T + D M 3 d 8 S m b 9 x L f L T t W Z A i 8 S N y d l l K P u 2 1 / d X k z T i E mg g m j d c Z 0 E v I w o 4 F S w c a m b a p Y Q + k D 6 r G O o J B H T X j a 9 f 4 y P j d L D Y a x M S c B T 9

H H r b 7 j z b 5 y 2 W W j r g c s 9 n HM v c + c E i e A a H O f b K i w t r 6 y u F d d L G 5 t b 2 z v 2 7 l 5 T x 6 m i r E F j E a t 2 Q D Q T X L I G c B C s n S h G o k C w V v B w M / F b j 0 x p H s t 7 G C b M i 0 h f 8 p B T A k b y 7 Y O k o n 0 Y a T + D M 3 d 8 S m b 9 x L f L T t W Z A i 8 S N y d l l K P u 2 1 / d X k z T i E mg g m j d c Z 0 E v I w o 4 F S w c a m b a p Y Q + k D 6 r G O o J B H T X j a 9 f 4 y P j d L D Y a x M S c B T 9

H H r b 7 j z b 5 y 2 W W j r g c s 9 n HM v c + c E i e A a H O f b K i w t r 6 y u F d d L G 5 t b 2 z v 2 7 l 5 T x 6 m i r E F j E a t 2 Q D Q T X L I G c B C s n S h G o k C w V v B w M / F b j 0 x p H s t 7 G C b M i 0 h f 8 p B T A k b y 7 Y O k o n 0 Y a T + D M 3 d 8 S m b 9 x L f L T t W Z A i 8 S N y d l l K P u 2 1 / d X k z T i E mg g m j d c Z 0 E v I w o 4 F S w c a m b a p Y Q + k D 6 r G O o J B H T X j a 9 f 4 y P j d L D Y a x M S c B T 9

Figure 4: Comparison of FENet with imitation learning methods in Cheetah-run when only suboptimal experts are available. Plots show test performance over learning iterations. Behavioral Cloning methods cannot surpass the suboptimal expert's return, whereas FENet does. Lines show means and shaded areas show standard deviations over 10 trajectories.

Figure 5 compares learning strategies of FENet in Cheetah-run and Walker-walk (ablation study). 'Imitation RL' is the default FENet agent, which performs imitation learning and RL at the same time by minimizing F_IL + F_RL. 'Imitation-pretrained RL' is an agent that first learns the model with imitation only (minimizing F_IL) and then performs RL with the pre-trained model (minimizing F_RL). 'RL only' is an agent that minimizes only F_RL. 'Imitation only' is an agent that minimizes only F_IL. In Cheetah-run, 'imitation only' gives the best performance and 'imitation RL' the second best; in Walker-walk, 'imitation RL' gives the best performance and 'imitation only' the worst. This suggests that 'imitation RL' is the most robust to differences in task properties.
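The four strategies differ only in which free-energy terms are minimized at a given training step. The following Python sketch illustrates that selection under stated assumptions: the function name `training_objective`, the strategy strings, and the `pretrain_steps` argument are hypothetical and are not taken from the FENet implementation, and `f_il` and `f_rl` stand for already-computed values of F_IL and F_RL.

```python
def training_objective(strategy, f_il, f_rl, step=0, pretrain_steps=0):
    """Return the loss minimized at `step` for a given learning strategy.

    Hypothetical helper: the names and the step-based switch are assumptions.
    """
    if strategy == "imitation_rl":
        # Default agent: imitation and RL at the same time.
        return f_il + f_rl
    if strategy == "imitation_pretrained_rl":
        # Imitation only during pre-training, RL only afterwards.
        return f_il if step < pretrain_steps else f_rl
    if strategy == "rl_only":
        return f_rl
    if strategy == "imitation_only":
        return f_il
    raise ValueError(f"unknown strategy: {strategy!r}")
```

With this layout, the ablation amounts to running the same training loop with a different `strategy` value.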


We model these probabilities as feedforward neural networks that output the mean and standard deviation of Gaussian-distributed random variables. The parameters θ, φ, ψ, ω are the network parameters to be learned. Using the network parameters, the objective loss functions F_IL and F_RL can be expressed in terms of these distributions. Algorithm 1 in the Appendix shows the overall calculation using these losses. The agent minimizes F_IL on expert data D_E and F_RL on agent data D_A that the agent collects on its own.
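Because each network outputs a mean and a standard deviation, the log-likelihood and KL-divergence terms that typically appear in variational free-energy objectives have closed forms. The plain-Python sketch below shows these two building blocks for diagonal Gaussians. The function names are illustrative assumptions, and the sketch does not reproduce how the paper combines such terms into F_IL and F_RL.

```python
import math


def gaussian_log_prob(x, mean, std):
    """Log-density of x under a diagonal Gaussian N(mean, diag(std^2))."""
    return sum(
        -0.5 * math.log(2.0 * math.pi) - math.log(s) - 0.5 * ((xi - m) / s) ** 2
        for xi, m, s in zip(x, mean, std)
    )


def gaussian_kl(mean_q, std_q, mean_p, std_p):
    """KL(q || p) between two diagonal Gaussians, summed over dimensions."""
    return sum(
        math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2.0 * sp ** 2) - 0.5
        for mq, sq, mp, sp in zip(mean_q, std_q, mean_p, std_p)
    )
```

The KL term is zero when the two distributions coincide and grows as the posterior moves away from the prior.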

4. EXPERIMENTS

We evaluate FENet on three image-based continuous control tasks. We first compare our model with model-based RL and model-based imitation RL in dense- and sparse-reward settings when an optimal expert is available. We then compare our model with imitation learning methods when only suboptimal experts are available. Finally, we investigate the merits of combining imitation and RL in an ablation study.

Control tasks

We use the Cheetah-run, Walker-walk, and Quadruped-walk tasks, which are image-based continuous control tasks from the DeepMind Control Suite (Tassa et al., 2018), shown in Figure 6. The agent receives rewards ranging from 0 to 1. Quadruped-walk is the most difficult because it has more action dimensions than the others. Walker-walk is more challenging than Cheetah-run because the agent first has to stand up and then walk, so it easily falls to the ground, which is difficult to predict. Each episode lasts 1000 steps and starts from a randomized initial state. We use action repeat R = 4 for the Cheetah-run task, and R = 2 for the Walker-walk and Quadruped-walk tasks.
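Action repeat means that the same action is applied for R consecutive environment steps. A minimal sketch follows. `ToyEnv` and its `step` interface returning `(observation, reward, done)` are assumptions made for illustration and are not the DeepMind Control Suite API. Summing the rewards over the repeated steps is likewise an assumed convention.

```python
class ToyEnv:
    """Stand-in environment: reward 0.5 per step, episode ends after `horizon` steps."""

    def __init__(self, horizon=1000):
        self.horizon = horizon
        self.t = 0

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, 0.5, done  # (observation, reward, done)


def step_with_action_repeat(env, action, repeat):
    """Apply `action` for up to `repeat` steps, summing the rewards."""
    total_reward = 0.0
    obs, done = None, False
    for _ in range(repeat):
        obs, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return obs, total_reward, done
```

Under these assumptions, a 1000-step episode with R = 4 requires 250 agent decisions, and one with R = 2 requires 500.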

4.1. PERFORMANCE IN STANDARD VISUAL CONTROL TASKS

We compare the performance of FENet to PlaNet (RL) and "PlaNet with demonstrations" (imitation RL) in the standard visual control tasks described above. We use PlaNet as a baseline because it is one of the most basic methods using the Recurrent State Space Model, on top of which we build our model. As FENet uses expert data, we create "PlaNet with demonstrations" for a fair comparison. This variant of PlaNet has an additional experience replay buffer pre-populated with expert trajectories and minimizes a loss calculated from the expert data in addition to PlaNet's original loss.

Figure 2 shows that "PlaNet with demonstrations" is always better than PlaNet and that FENet ranks higher as task difficulty increases. In Cheetah-run, FENet is competitive with PlaNet. In Walker-walk, FENet and "PlaNet with demonstrations" perform nearly on par, and both are substantially better than PlaNet because expert knowledge is leveraged to increase sample efficiency. In Quadruped-walk, FENet is slightly better than the two baselines.
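The idea of adding an expert-data loss to a baseline's original loss can be sketched as follows. This is an illustration only: the buffer layout, the function names, and the use of the same `loss_fn` for both agent and expert batches are assumptions, since the exact form of the expert loss in "PlaNet with demonstrations" is not specified here.

```python
import random


def sample_batch(buffer, batch_size, rng):
    """Sample `batch_size` items uniformly with replacement."""
    return [rng.choice(buffer) for _ in range(batch_size)]


def combined_loss(agent_buffer, expert_buffer, loss_fn, batch_size, rng):
    """Loss on agent data plus a loss on expert data (same loss_fn assumed)."""
    agent_batch = sample_batch(agent_buffer, batch_size, rng)
    expert_batch = sample_batch(expert_buffer, batch_size, rng)
    return loss_fn(agent_batch) + loss_fn(expert_batch)


# Toy usage: items are scalars and the "loss" is their batch mean.
rng = random.Random(0)
mean_loss = lambda batch: sum(batch) / len(batch)
total = combined_loss([1.0] * 5, [3.0] * 5, mean_loss, batch_size=4, rng=rng)
```

In the toy usage, each buffer holds a single repeated value, so `total` is simply the sum of the two batch means.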

4.2. PERFORMANCE IN SPARSE-REWARD VISUAL CONTROL TASKS

In real-world robot learning, crafting a dense reward function that leads robots to desired behaviors is demanding. It would be helpful if an agent could acquire desired behaviors from sparse signals alone. We compare the performance of FENet to PlaNet and "PlaNet with demonstrations" in sparse-reward settings, where agents do not receive per-step rewards below 0.5 (note that in the original implementation of Cheetah-run, Walker-walk and Quadruped-walk, agents get rewards ranging from 0 to 1 per time step). Figure 3 shows that FENet outperforms PlaNet and "PlaNet with demonstrations" in all three tasks. In Cheetah-run, PlaNet and "PlaNet with demonstrations" are not able to get even a single reward.

[...] from demonstrations and environment interaction data (Hester et al., 2018). Truncated HORizon Policy Search uses demonstrations to shape rewards so that subsequent planning can achieve performance superior to RL even when experts are suboptimal (Sun et al., 2018). Soft Q Imitation Learning gives rewards that encourage the agent to return to demonstrated states in order to avoid policy collapse (Reddy et al., 2019). Our approach is similar to DQfD in that it mixes gradients calculated from demonstrations and from environment interaction data. One key difference is that FENet concurrently learns a generative model of the world, so that it can be robust to a wider range of environment properties.

Control with latent dynamics model

World Models acquire latent spaces and the dynamics over those spaces separately, and evolve simple linear controllers to solve visual control tasks (Ha & Schmidhuber, 2018). PlaNet learns a Recurrent State Space Model and plans with Model Predictive Control at test time (Hafner et al., 2019b). Dreamer, recently built upon PlaNet, learns a policy through latent imagination and achieves higher performance than PlaNet (Hafner et al., 2019a). Our approach also uses a Recurrent State Space Model to describe variational inference, and to the best of our knowledge we are the first to combine imitation and RL over latent dynamics models.

6. CONCLUSION

We present FENet, an agent that combines Imitation Learning and Reinforcement Learning using Free Energy objectives. To this end, we theoretically extend the Free Energy Principle and introduce a policy prior that encodes experts' behaviors and a policy posterior that learns to maximize expected rewards without deviating too much from the policy prior. FENet outperforms model-based RL and imitation RL, especially in visual control tasks with sparse rewards, and, unlike Behavioral Cloning, it also surpasses the performance of suboptimal experts. Strong performance in sparse-reward environments with suboptimal experts is an important factor for real-world robot learning.

Directions for future work include learning the balance between imitation and RL, i.e. between Free Energy and Expected Free Energy, so that the agent can select the best approach for the task at hand by monitoring the value of the Free Energy. It is also important to evaluate FENet on real-world robotics tasks to show that our method is effective in more realistic settings.

A.2 IMPLEMENTATION

To stabilize the learning process, we adopt burn-in, a technique to recover the initial states of the RNN's hidden variables h_t (Kapturowski et al., 2019). As shown in Algorithm 1, the agent calculates the Free Energy with mini-batches sampled from the expert or agent experience replay buffer D, which means that h_t is initialized randomly in every mini-batch calculation. Since the Free Energy depends heavily on h_t, it is crucial to estimate the hidden states accurately. We set a burn-in period during which a portion of the replay sequence is used only for unrolling the networks to produce initial states. After this period, we update the networks only on the remaining part of the sequence.

We use PyTorch (Paszke et al., 2017) to implement the neural networks and run experiments on NVIDIA GeForce GTX 1080 Ti / RTX 2080 Ti / Tesla V100 GPUs (1 GPU per experiment). The training time for our FENet implementation is about 24 hours on the Control Suite environment. As for the hyperparameters, we use the convolutional encoder and decoder networks from (Ha & Schmidhuber, 2018) and the Recurrent State Space Model from (Hafner et al., 2019b), and implement all other functions as three dense layers of size 200 with ReLU activations (Nair & Hinton, 2010). As a design choice, we make the policy prior, the policy posterior, the observation likelihood, and the reward likelihood deterministic functions, while the state prior and the state posterior are stochastic. We use batch size B = 25 for 'imitation RL' with FENet, and B = 50 for the other types and the baseline methods. We use chunk length L = 50 and a burn-in period of 20. We use seed episodes S = 40, expert episodes N = 10000 trained with PlaNet (Hafner et al., 2019b), collect interval C = 100 and action exploration noise Normal(0, 0.3). We use the discount factor γ = 0.99 and the 

