STOCHASTIC BRIDGES AS EFFECTIVE REGULARIZERS FOR PARAMETER-EFFICIENT TUNING

Abstract

Parameter-efficient tuning methods (PETs) have achieved promising results in tuning large pre-trained language models (PLMs). By formalizing frozen PLMs and additional tunable parameters as systems and controls respectively, PETs can be theoretically grounded in optimal control and further viewed as optimizing the terminal cost and running cost in the optimal control literature. Despite the elegance of this theoretical grounding, in practice existing PETs often ignore the running cost and only optimize the terminal cost, i.e., they focus on optimizing the loss function of the output state while disregarding the running cost that depends on the intermediate states. Since it is non-trivial to directly model the intermediate states and design a running cost function, we propose to use latent stochastic bridges to regularize the intermediate states and to use this regularization as the running cost of PETs. As the first work to propose regularized PETs that use stochastic bridges as the regularizers (running costs) for the intermediate states, we show the effectiveness and generality of this regularization across different tasks, PLMs, and PETs. Given the great potential and capacity of this framework, we believe more sophisticated regularizers can be designed for PETs and better performance can be achieved in the future.

1. INTRODUCTION

Recent years have witnessed the dramatic growth of pre-trained language models (PLMs) in various fields (Devlin et al., 2019; Dosovitskiy et al., 2021). As the size of PLMs continues to increase, the number of parameters has now reached hundreds of billions (Brown et al., 2020; Smith et al., 2022), making fine-tuning the whole PLM both computationally impractical and environmentally unfriendly. In view of this, a variety of Parameter-Efficient Tuning methods (PETs) have been proposed (Houlsby et al., 2019; Hu et al., 2022; Zaken et al., 2022; Lester et al., 2021). By tuning only a small number of additional parameters, PETs can achieve performance comparable to full-parameter fine-tuning. Despite the success of PETs, their underlying mechanism remains an open problem. Recently, several works have proposed to interpret PETs with optimal control theory. Yang & Liu (2022) first show that the optimization in Prefix Tuning (Li & Liang, 2021), a typical PET, can be considered as the search for optimal control variables in the context of optimal control, i.e., the trainable prefixes can be seen as control variables that drive the PLM (the system) to the desired output. Ding et al. (2022) further show that the optimal control perspective can be applied to almost all PETs. The optimization of PETs' parameters can then be seen as minimizing the two cost functions in the optimal control literature: (1) the terminal cost L_T, which measures the quality of the terminal state, and (2) the running cost L_R, which measures the feasibility of the controlled intermediate states and the control variables. Although L_T corresponds well to the loss function of the model output, L_R is only vaguely described as a regularizer on the parameters of PETs (the control variables) in Yang & Liu (2022) and Ding et al. (2022), ignoring the dependency of L_R on the intermediate states.
In this work, we show that designing a running cost to regularize intermediate states not only makes the optimal control perspective of PETs more theoretically sound, but also empirically leads to better PETs. We begin by assuming that in PLMs, the intermediate hidden states for generating different tokens in a sentence have different dynamics (or trajectories), and that these dynamics can be approximated with stochastic processes in a latent space. Specifically, we first freeze the PLM and learn a mapping from the original hidden state space of the PLM to a latent space. In the latent space, the dynamics of the intermediate hidden states for generating different target tokens can be approximated with different target-specific diffusion bridges. The obtained mapping can then be plugged into the model to regularize the hidden states when training the PET parameters. Besides, since a diffusion bridge is (1) a Markov process and (2) a solution to a stochastic differential equation (SDE), we correspondingly propose two methods to learn the mapping: (1) fitting the Markov transition probability density function (PDF) and (2) fitting the SDE directly. These two methods act as a trade-off between efficiency and effectiveness: the first method incurs only negligible computational cost and achieves satisfactory results, while the second is slower but yields better regularizers.
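To make the two representations concrete, the sketch below uses a one-dimensional Brownian bridge pinned at a (t = 0) and b (t = 1): it can be sampled from its SDE dX_t = (b − X_t)/(1 − t) dt + σ dW_t via Euler–Maruyama, or its marginal at time t can be evaluated in closed form as a Gaussian. This is an illustrative toy, not the paper's latent-space implementation; the function names are hypothetical.

```python
import math
import random

def simulate_brownian_bridge(a, b, n_steps=1000, sigma=1.0):
    """Euler-Maruyama simulation of the bridge SDE
    dX_t = (b - X_t) / (1 - t) dt + sigma dW_t, with X_0 = a."""
    dt = 1.0 / n_steps
    x, path = a, [a]
    for k in range(n_steps - 1):      # stop one step early: the drift blows up at t = 1
        t = k * dt
        drift = (b - x) / (1.0 - t)
        x += drift * dt + sigma * math.sqrt(dt) * random.gauss(0.0, 1.0)
        path.append(x)
    path.append(b)                    # the bridge is pinned at X_1 = b
    return path

def bridge_transition_logpdf(x_t, t, a, b, sigma=1.0):
    """Closed-form marginal of the bridge at time t in (0, 1):
    Gaussian with mean a + t (b - a) and variance sigma^2 t (1 - t)."""
    mean = a + t * (b - a)
    var = sigma ** 2 * t * (1.0 - t)
    return -0.5 * math.log(2 * math.pi * var) - (x_t - mean) ** 2 / (2 * var)

random.seed(0)
path = simulate_brownian_bridge(0.0, 2.0)
assert abs(path[-1] - 2.0) < 1e-9     # endpoint is pinned
```

In the paper's setting these bridges live in a learned latent space, and the drift or the transition PDF is matched against projected hidden states; the snippet only illustrates why the two learning routes (density fitting vs. SDE fitting) exist.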


We conduct experiments on different PLMs of different sizes, and the experimental results on GLUE (Wang et al., 2019) under both full-set and few-shot settings demonstrate the effectiveness of our proposal across four different PETs. Further analyses show that the learned regularizer helps pull apart the hidden states of different label words. We also observe that when we project the intermediate hidden states of PETs trained without our regularizer into our latent space, the better the PETs perform, the closer their latent states are to our latent bridges. This spontaneous approaching behavior may indicate that stochastic-bridge-like latent dynamics naturally exist in well-trained PETs. In summary, our work makes the following contributions: (1) Guided by the optimal control perspective on PETs, we design latent stochastic bridge regularizers on the intermediate states during the training of PETs.
(2) We propose two methods to construct the latent space according to the two representations of stochastic bridges, offering a trade-off between efficiency and effectiveness. (3) Our regularizers are shown to be effective and general across different PLMs, different PETs, and different tasks. (4) We show that well-trained PETs without any regularization spontaneously exhibit stochastic-bridge-like latent dynamics.
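The "spontaneous approaching" observation above can be quantified with a simple distance: map each layer's hidden state into the latent space and compare it with the bridge's expected position at the corresponding time step. The linear layer-to-time mapping and the use of the bridge mean are illustrative assumptions, not the paper's exact metric.

```python
import numpy as np

def bridge_mean(t, z0, zT):
    # Expected position at time t of a Brownian bridge pinned at z0 (t=0) and zT (t=1).
    return z0 + t * (zT - z0)

def distance_to_bridge(latent_states, z0, zT):
    """Mean L2 distance between a latent trajectory (one vector per layer)
    and the bridge mean, with layer i mapped linearly to t = i / L."""
    L = len(latent_states) - 1
    ds = [np.linalg.norm(z - bridge_mean(i / L, z0, zT))
          for i, z in enumerate(latent_states)]
    return float(np.mean(ds))

z0, zT = np.zeros(3), np.ones(3)
on_bridge = [bridge_mean(i / 4, z0, zT) for i in range(5)]
assert distance_to_bridge(on_bridge, z0, zT) < 1e-9  # trajectory on the bridge mean
```

A smaller value of this distance for better-performing PETs is what the analysis in the text refers to as the approaching behavior.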

2. BACKGROUND

2.1. DEFINITION AND MATHEMATICAL NOTATIONS

Consider using an L-layer PLM with the vocabulary V to handle a text-to-text task D. For each sample (x, y) ∈ D, y ∈ V is the output token and x ∈ V^N is the input token sequence¹, where N is the length of x. With x as the input, each layer of the PLM outputs a sequence of hidden states, and we denote the hidden states of the i-th PLM layer as h^(i) = {h^(i)_j}^N_{j=1}, where h^(i)_j ∈ R^d is the state at position j of the i-th layer, and d is the model dimension. We denote the position where the model outputs the target y as o, i.e., the model should predict y from the hidden state h^(L)_o.

2.2. OPTIMAL CONTROL PERSPECTIVE OF PETS

Conventionally, adapting the PLM to D requires full-parameter fine-tuning, which is given as:

min_{∆θ} E_{x,y∼D} [ L(h^(L)_o, y) + R(∆θ) ],
h^(i) = h^(i-1) + G^(i)_{θ+∆θ}(h^(i-1)),  i = 1, . . . , L,    h^(0) = Embed(x).    (1)
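The recursion in Eq. (1) can be sketched as a frozen residual stack θ plus a small tunable update ∆θ (here a zero-initialized low-rank perturbation standing in for a PET). The layer form below is a toy placeholder, not an actual Transformer block.

```python
import numpy as np

L, N, d = 4, 6, 8                     # toy layer count, sequence length, model dim
rng = np.random.default_rng(0)

# Frozen pre-trained weights theta: one matrix per layer (stand-in for G^(i)).
theta = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(L)]
# Tunable PET parameters delta_theta: low-rank factors, initialized at zero.
r = 2
delta = [(np.zeros((d, r)), np.zeros((r, d))) for _ in range(L)]

def forward(x_emb, theta, delta):
    h = x_emb                          # h^(0) = Embed(x)
    states = [h]
    for W, (A, B) in zip(theta, delta):
        W_eff = W + A @ B              # theta + delta_theta
        h = h + np.tanh(h @ W_eff)     # h^(i) = h^(i-1) + G^(i)_{theta+delta}(h^(i-1))
        states.append(h)
    return states                      # states[i] is h^(i), shape (N, d)

x_emb = rng.standard_normal((N, d))    # dummy embedded input
states = forward(x_emb, theta, delta)
assert len(states) == L + 1 and states[-1].shape == (N, d)
```

In a PET only `delta` would receive gradients while `theta` stays frozen; the terminal cost is computed from `states[L][o]`, and the intermediate `states[1:L]` are what the proposed running cost regularizes.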



¹ Here we assume y ∈ V, since a sample with y ∈ V^M can be decomposed into M samples: the i-th sample is ([x; y<i], yi) for auto-regressive language modeling or ([x; y-i], yi) for auto-encoding language modeling.
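The footnote's auto-regressive decomposition can be written out explicitly; `x + y[:i]` mirrors [x; y<i]. This is a sketch over token lists, not real tokenized data.

```python
def decompose_autoregressive(x, y):
    """Split one sample (x, y) with a length-M target y into M
    single-token samples ([x; y_<i], y_i)."""
    return [(x + y[:i], y[i]) for i in range(len(y))]

samples = decompose_autoregressive(["a", "b"], ["c", "d", "e"])
# Each resulting sample predicts one target token given the prefix.
assert samples[0] == (["a", "b"], "c")
assert samples[2] == (["a", "b", "c", "d"], "e")
```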



Figure 1: An overview of our proposed latent stochastic bridge regularizer.


