Flareon: Stealthy any2any Backdoor Injection via Poisoned Augmentation

Abstract

Open software supply chain attacks, once successful, can exact heavy costs in mission-critical applications. As open-source ecosystems for deep learning flourish and become increasingly universal, they present attackers previously unexplored avenues to code-inject malicious backdoors in deep neural network models. This paper proposes Flareon, a small, stealthy, seemingly harmless code modification that specifically targets the data augmentation pipeline with motion-based triggers. Flareon neither alters ground-truth labels, nor modifies the training loss objective, nor does it assume prior knowledge of the victim model architecture, training data, and training hyperparameters. Yet, it has a surprisingly large ramification on training: models trained under Flareon learn powerful target-conditional (or "any2any") backdoors. The resulting models can exhibit high attack success rates for any target choices and better clean accuracies than backdoor attacks that not only seize greater control, but also assume more restrictive attack capabilities. We also demonstrate the effectiveness of Flareon against recent defenses. Flareon is fully open-source and available online to the deep learning community.

1. INTRODUCTION

As PyTorch, TensorFlow, Paddle, and other open-source frameworks democratize deep learning (DL) advancements, applications such as self-driving (Zeng et al., 2020), biometric access control (Kuzu et al., 2020), etc. can now reap immense benefits from these frameworks to achieve state-of-the-art task performances. This, however, presents novel vectors for opportunistic supply chain attacks to insert malicious code (with feature proposals, stolen credentials, name-squatting, or dependency confusion) that masquerades its true intentions with useful features (Vu et al., 2020). Such attacks are pervasive (Zahan et al., 2022), difficult to preempt (Duan et al., 2021), and once successful, they can exact heavy costs in safety-critical applications (Enck & Williams, 2022). Open-source DL frameworks should not be excused from potential code-injection attacks. Naturally, a practical attack of this kind on open-source DL frameworks must satisfy all of the following train-time stealthiness specifications to evade scrutiny from a DL practitioner, presenting a significant challenge in adapting backdoor attacks to code-injection:

(a) Train-time inspection must not reveal clear tampering of the training process. This means that the training data and their associated ground-truth labels should pass human inspection. The model forward/backward propagation algorithms, the optimizer, and the hyperparameters should also not be altered.

(b) Compute and memory overheads need to be minimized. Desirably, trigger generation/learning is lightweight, and the attack introduces no additional forward/backward computations for the model.

(c) Adverse impact on clean accuracy should be reduced, i.e., learned models must behave accurately for natural test inputs.

(d) Finally, the attack ought to demonstrate robustness w.r.t. training environments. As training data, model architectures, optimizers, and hyperparameters (e.g., batch size, learning rate, etc.)
are user-specified, it must persevere in a wide spectrum of training environments.

While existing backdoor attacks can trick learned models into including hidden behaviors, their assumed capabilities make them impractical as supply chain attacks. First, data poisoning attacks (Chen et al., 2017; Ning et al., 2021) target the data collection process by altering the training data (and labels), which may not be feasible without additional computations after training data have been gathered. Second, trojaning attacks typically assume full control of model training, for instance, by adding visible triggers (Gu et al., 2017; Liu et al., 2020), changing ground-truth labels (Nguyen & Tran, 2020; Saha et al., 2020), or computing additional model gradients (Turner et al., 2019; Salem et al., 2022). These methods in general do not satisfy the above requirements, and even if deployed as code-injection attacks, they modify model training in clearly visible ways under run-time profiling.

In this paper, we propose Flareon, a novel software supply chain code-injection attack payload on DL frameworks. Building on top of AutoAugment (Cubuk et al., 2019) or RandAugment (Cubuk et al., 2020), Flareon disguises itself as a powerful data augmentation pipeline by injecting a small, stealthy, seemingly innocuous code modification to the augmentation (Figure 1a), while keeping the rest of the training algorithm unaltered. This has a surprisingly large ramification on the trained models. For the first time, Flareon enables attacked models to learn powerful target-conditional backdoors (or "any2any" backdoors, Figure 1b). Namely, when injecting a human-imperceptible motion-based trigger τ_t of any target t ∈ C to any natural image x of label c ∈ C at test time, the trained model would classify the resulting image as the intended target t with high success rates. Here, C represents the set of all classification labels.
Flareon fully satisfies the train-time stealthiness specifications to evade human inspection. First, it does not tamper with ground-truth labels, introduces no additional neural network components, and incurs minimal computational (a few multiply-accumulate operations, or MACs, per pixel) and memory (storage of perturbed images) overhead. Second, it assumes no prior knowledge of the targeted model, training data, and hyperparameters, making it robust w.r.t. diverse training environments. Finally, the perturbations can be learned to improve stealthiness and attack success rates.

Figure 1: (a) Flareon injects a small code modification into the data augmentation pipeline; added code lines are highlighted. To improve the effectiveness of Flareon, "pert_grid" (i.e., τ in this paper) can be a trainable parameter tensor for learned triggers. (b) Flareon enables backdoored models f_θ⋆ to learn "any2any" backdoors. Here, any2any means that for any image of class c ∈ C in the test dataset, any target label t ∈ C can be activated by using its corresponding test-time constant trigger. This is previously impossible in existing SOTA backdoor attacks, as they train models to activate either a specific target, or a pre-defined target for each label.
To summarize, this paper makes the following contributions:

Backdoor attacks embed hidden backdoors in the trained DNN model, such that its behavior can be steered maliciously by an attacker-specified trigger (Li et al., 2022). Formally, they learn a backdoored model with parameters θ by jointly maximizing the following attack success rate (ASR) on triggered images and clean accuracy (CA) on natural images:

E_{(x,y)∼D} 1[arg max f_θ(T(x, π(y))) = π(y)],   E_{(x,y)∼D} 1[arg max f_θ(x) = y].   (1)
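The CA and ASR objectives above are simply empirical means of indicator functions. A minimal NumPy sketch, where the toy model, trigger, and target mapping are stand-ins of our own (not from the paper):

```python
import numpy as np

def clean_accuracy(f, data):
    """Empirical CA: fraction of natural inputs classified as their label."""
    return np.mean([np.argmax(f(x)) == y for x, y in data])

def attack_success_rate(f, data, trigger, pi):
    """Empirical ASR: fraction of triggered inputs classified as pi(y)."""
    return np.mean([np.argmax(f(trigger(x, pi(y)))) == pi(y) for x, y in data])
```

Both metrics average the indicator 1[arg max f_θ(·) = ·] over the data distribution D, matching the two terms of eq. (1).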
Here, D is the data sampling distribution that draws an input image x and its label y, and the indicator function 1[z] evaluates to 1 if the term z is true, and 0 otherwise. Finally, π(y) specifies how we reassign a target classification for a given label y, and T(x, t) transforms x to trigger the hidden backdoor to maliciously alter the model output to t; this process generally preserves the semantic information in x. In general, current attacks specify either a constant target π(y) ≜ t (Gu et al., 2017; Liu et al., 2017), or a one-to-one target mapping π(y) ≜ (y + 1) mod |C| as in (Nguyen & Tran, 2020; Doan et al., 2021). Some even restrict themselves to a single source label s (Saha et al., 2020), i.e., π(y) ≜ (y if y ≠ s else t). Flareon liberates existing assumptions on the target mapping function, and can even attain high ASRs for any π : C → C while maintaining CAs.

Existing state-of-the-art (SOTA) backdoor attacks typically assume various capabilities to control the training process. Precursory approaches such as BadNets (Gu et al., 2017) and the trojaning attack (Liu et al., 2017) make unconstrained changes to the training algorithm, overlaying patch-based triggers onto images and flipping ground-truth labels to train models with backdoors. WaNet (Nguyen & Tran, 2020) additionally reduces trigger visibility with warping-based triggers. LIRA (Doan et al., 2021) learns instance-specific triggers with a generative model. Data poisoning attacks, such as hidden trigger (Saha et al., 2020) and sleeper agent (Souri et al., 2022), assume only the ability to perturb a small fraction of training data samples and require no further changes to the ground-truth labels, but compute additional model gradients. Weight replacement attacks (Kurita et al., 2020; Qi et al., 2022) target the DNN deployment stage by perturbing weight parameters to introduce backdoors.
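The three target-mapping conventions π enumerated above (constant, one-to-one, single-source) can be written out directly; a short sketch with function names of our own choosing:

```python
def pi_constant(y, t=0):
    # BadNets/trojaning style: every label maps to one fixed target t.
    return t

def pi_all_to_all(y, num_classes):
    # One-to-one mapping as in WaNet/LIRA: pi(y) = (y + 1) mod |C|.
    return (y + 1) % num_classes

def pi_single_source(y, s, t):
    # Single-source style: only the source label s is redirected to t.
    return t if y == s else y
```

Flareon, by contrast, needs no fixed π at all: any mapping C → C can be activated at test time by choosing the corresponding trigger.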
It is noteworthy that none of the above backdoor attack approaches can be feasible candidates for open-source supply chain attacks, as they either change the ground-truth label along with the image (Gu et al., 2017; Liu et al., 2017; Nguyen & Tran, 2020; Doan et al., 2021), or incur noticeable overheads (Doan et al., 2021; Saha et al., 2020; Kurita et al., 2020; Qi et al., 2022). Similar to Flareon, the blind backdoor attack (Bagdasaryan & Shmatikov, 2021) also considers code injection, but it alters the training loss and performs costly evasion objective minimization.

Defenses against backdoor attacks. Spectral signature (Tran et al., 2018) and activation clustering (Chen et al., 2019) use statistical anomalies in feature space between poisoned and natural images to detect poisoned training images. Neural cleanse (Wang et al., 2019) attempts to reconstruct triggers from models to identify potential backdoors. Fine-pruning (Liu et al., 2018) removes neurons dormant for clean inputs and fine-tunes the resulting model for backdoor removal. STRIP (Gao et al., 2019) perturbs test-time inputs by super-imposing natural images from other classes, and determines the presence of backdoors based on the predicted entropy of perturbed images.

3. THE FLAREON METHOD

Figure 2 presents a high-level overview of Flareon. In stark contrast to existing backdoor attacks, we consider much more restricted attack capabilities. Specifically, we only assume the ability to insert malicious code within the data augmentation module, and acquire no control over, and no prior knowledge of, the rest of the training algorithm, which includes the victim's dataset, parameters, model architectures, optimizers, training hyperparameters, etc. Not only can Flareon be applied effectively under traditional backdoor attack assumptions, but it also opens the possibility of stealthily injecting it into the data augmentation modules of open-source frameworks to make models trained with them contain its backdoors. An attacker may thus deploy the attack payload by, for instance, disguising it as genuine feature proposals, committing changes with stolen credentials, name-squatting modules, or dependency confusion of internal packages, often with great success (Vu et al., 2020).

We consider input images in R^{C×H×W}, where C, H, W respectively denote the number of channels, height, and width of the input image, and C is the set of possible labels. Typical backdoor attacks consider the joint maximization of the objectives in eq.
(1), and transform them into a unified objective:

min_{θ,τ} E_{(x,y)∼D_train, (x′,y′)∼D_bd} [λ L_sce(f_θ(x), y) + (1 − λ) L_sce(f_θ(T_τ(x′, π(y′))), π(y′))],   (2)

where D_train and D_bd respectively denote training and backdoor datasets of the same data distribution. This modified objective is, however, impractical for hidden code-injection attacks, as the images sampled from D_bd may not be of label π(y′), and can be easily detected in run-time inspection. Clean-label attacks learn backdoors by optimizing poisoned images in D_bd (Saha et al., 2020; Zeng et al., 2022) with perturbation constraints, which are also undesirable as they incur substantial overhead. Geirhos et al. (2020) show that DNNs are prone to learn "shortcuts", i.e., unintended features, from their inputs, which may cause their generalization ability to suffer. Powerful SOTA data augmentations thus apply random but realistic stochastic transformations on images to encourage models to learn useful features instead of such shortcuts.
Inspired by this discovery, we therefore exploit shortcut learning and consider an alternative objective compatible with the code-injection attack specifications, which jointly minimizes the classification loss for the ground-truth label w.r.t. the model parameters θ and triggers τ:

min_{θ,τ} E_{(x,y)∼D_train} [L_sce(f_θ(T_τ(x_a, y)), y)], where x_a = aug(x), and dist(x_a, T_τ(x_a, y)) ≤ ϵ.   (3)

Here, x_a = aug(x) applies a random data augmentation pipeline (e.g., RandAugment (Cubuk et al., 2020)) to x. The trigger function T_τ should ensure that it applies meaningful changes to x_a, which can be constrained by a predefined distance metric between x_a and T_τ(x_a, y), hence the constraint dist(x_a, T_τ(x_a, y)) ≤ ϵ. By making natural features in the images more difficult to learn with data augmentations, Flareon then applies an "easy-to-learn" motion-based perturbation onto images, facilitating shortcut opportunities for backdoor triggers. The objective in eq. (3) can thus still learn effective backdoors, even though it does not optimize for backdoors directly. It is also noteworthy that eq. (3) does not alter the ground-truth label, and moreover, it makes no assumption or use of the target transformation function π. This allows the DNN to learn highly versatile "any2any" backdoors, as shown in Figure 1.
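In code, eq. (3) amounts to an ordinary training step in which a fraction ρ of each augmented mini-batch receives the trigger of its own ground-truth label, and no label is ever changed. A hedged NumPy sketch, where aug, apply_trigger, and the trigger table are stand-ins for the real pipeline:

```python
import numpy as np

def flareon_batch(x, y, triggers, rho, aug, apply_trigger, rng):
    """Poisoned augmentation step of eq. (3): augment every image, then
    perturb a rho-fraction of them with the trigger of their *own*
    ground-truth label y; labels are never modified."""
    x_a = np.stack([aug(img) for img in x])
    mask = rng.random(len(x)) < rho            # samples receiving the trigger
    for i in np.flatnonzero(mask):
        x_a[i] = apply_trigger(x_a[i], triggers[y[i]])
    return x_a, y                              # y returned untouched
```

Because the returned labels match what an inspector would expect for the (still semantically intact) images, run-time inspection of the batch reveals no label tampering.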

3.2. TRIGGER TRANSFORMATION T τ

A naïve approach to trigger transformation is to simply use pixel-wise perturbations T τ (x, y) ≜ x + τ y with τ y ∈ [-ϵ, ϵ] C×H×W , adopting the same shape of x to generate target-conditional triggers. Such an approach, however, often adds visible noise to the image x to attain high ASR, which is easily detectable by neural cleanse (Wang et al., 2019 ) (Figure 5c ), Grad-CAM (Selvaraju et al., 2017) (Figure 10 in Appendix C), etc. as demonstrated by the experiments. To this end, for all labels y, we instead propose to apply a motion-based perturbation onto the image x, where T τ (x, y) ≜ grid sample x, τ y ⊙ 1/H 1/W . Here, grid samplefoot_3 applies pixel movements on x with the flow-field τ y , and τ y ∈ [-1, 1] H×W ×2 is initialized by independent sampling of values from a Beta distribution with coefficients (β, β): τ y = 2b -1, where b ∼ B β,β (H, W, 2). Here, ⊙ denotes element-wise multiplication, and τ y ⊙ 1/H 1/W thus indicates dividing the two dimensions of last axis in τ y element-wise, respectively by the image height H and width W. This bounds movement of each pixel to be within its neighboring pixels. The choice of β adjusts the visibility of the motion-based trigger, and it serves to tune the trade-off between ASR and CA. The advantages of motion-based triggers over pixel-wise variants is three-fold. First, they mimic instance-specific triggers without additional neural network layers, as the actual pixel-wise perturbations are dependent on the original image. Second, low-frequency regions in images (e.g., the background sky) show smaller noises as a result of pixel movements. Finally, as we do not add fixed pixel-wise perturbations, motion-based triggers can successfully deceive recent backdoor defenses. Algorithm 1 The Flareon method for any2any attacks. Standard training components are in gray. ℓ ← L sce (f θ (x), y) ▷ Standard softmax cross-entropy loss. 13: θ ← θα model ∇ θ ℓ ▷ Standard stochastic gradient descent. 
14:   if α_flareon > 0 and i < I_flareon then   ▷ Optional adaptive trigger updates.
15:     τ ← P_{ϵ,[-1,1]}(τ - α_flareon ∇_τ ℓ)   ▷ Project trigger into an ϵ-ball of L2 distance.
 ⋮
      return θ, τ
19: end function
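The trigger transformation T_τ of Section 3.2 can be sketched in PyTorch as below. This is our illustrative reconstruction, not the paper's code: the function name `motion_trigger` is ours, and since `grid_sample` orders coordinates as (x, y), we scale the flow field by (1/W, 1/H), an assumption about how the paper's τ_y ⊙ [1/H, 1/W] maps onto the API.

```python
import torch
import torch.nn.functional as F

def motion_trigger(x, tau):
    """Apply a motion-based trigger T_tau(x, y) = grid_sample(x, tau scaled per pixel).

    x: (N, C, H, W) image batch; tau: (H, W, 2) flow field with values in [-1, 1].
    """
    N, C, H, W = x.shape
    # Identity sampling grid in the normalized [-1, 1] coordinates that
    # grid_sample expects; the last axis is ordered (x, y).
    ys = torch.linspace(-1.0, 1.0, H)
    xs = torch.linspace(-1.0, 1.0, W)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    identity = torch.stack((gx, gy), dim=-1)              # (H, W, 2)
    # Scale flow offsets so each pixel moves at most about one pixel
    # (the paper's tau ⊙ [1/H, 1/W] bound; (x, y) order assumed here).
    offsets = tau * torch.tensor([1.0 / W, 1.0 / H])
    grid = (identity + offsets).unsqueeze(0).expand(N, H, W, 2)
    return F.grid_sample(x, grid, align_corners=True)

# Beta(β, β) initialization, rescaled from [0, 1] to [-1, 1] as in the paper.
b = torch.distributions.Beta(2.0, 2.0).sample((32, 32, 2))
tau = 2.0 * b - 1.0
x = torch.rand(4, 3, 32, 32)
x_tilde = motion_trigger(x, tau)   # same shape as x, subtly warped
```

With a zero flow field the transformation reduces to the identity, which is a useful sanity check when reproducing the perturbation bound.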

3.3. THE FLAREON ALGORITHM

Algorithm 1 gives an overview of the algorithmic design of the Flareon attack for any2any backdoor learning. Note that the input arguments and lines in gray respectively denote the training hyperparameters and algorithm components of conventional mini-batch stochastic gradient descent (SGD), which we assume no control of. The trainer specifies a training dataset D_train, a batch size B, the height and width of the images (H, W), the model architecture and its initial parameters f_θ, a model learning rate α_model, and the number of training iterations I. The Flareon attacker controls its adaptive trigger update learning rate α_flareon, the data augmentation pipeline aug, an initial perturbation scale β, and a bound ϵ on the perturbation. To provide further flexibility in adjusting the trade-off between CA and ASR, it can also use a constant ρ ∈ [0, 1] to vary the proportion of images in the current mini-batch that receive motion-based trigger transformations. Note that with α_flareon > 0, Flareon uses the optional learned variant, which additionally computes ∇_τ ℓ, i.e., the gradient of the loss w.r.t. the trigger parameters. The computational overhead of ∇_τ ℓ is minimal: by the chain rule, ∇_τ ℓ = ∇_τ x ∇_x ℓ, where ∇_τ x back-propagates through the grid_sample function with a few MACs per pixel in x, and ∇_x ℓ can be evaluated with an extra gradient computation of the first convolutional layer of f_θ w.r.t. its input x, which is also much smaller than a full backward pass of f_θ. Finally, without the costly evasion objective minimization used in (Bagdasaryan & Shmatikov, 2021), backdoor defenses may detect learned triggers more easily than randomized variants. We thus introduce I_flareon to limit the number of iterations of trigger updates, which we fix at I/60 for our experiments.
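The behavior Algorithm 1 injects into the augmentation pipeline can be sketched as below. The names (`flareon_augment`, `apply_trigger`) are ours, not the paper's; `apply_trigger` stands in for the grid-sample transformation of Section 3.2, and the loss and optimizer steps (lines 12-13) remain untouched.

```python
import torch

def flareon_augment(images, labels, taus, apply_trigger, prop=0.8):
    """Apply label-conditional triggers to ⌊prop·B⌋ images of a mini-batch.

    images: (B, C, H, W); labels: (B,) int tensor;
    taus: dict mapping target label -> (H, W, 2) flow field;
    apply_trigger: callable (image batch, tau) -> perturbed batch.
    Ground-truth labels are returned unchanged (clean-label training).
    """
    B = images.size(0)
    chosen = torch.randperm(B)[: int(B * prop)]
    out = images.clone()
    for j in chosen.tolist():
        tau = taus[int(labels[j])]     # trigger conditioned on the true label
        out[j] = apply_trigger(images[j : j + 1], tau)[0]
    return out, labels
```

Because the hook only rewrites a fraction of the images and never touches labels, the training loop that consumes its output is indistinguishable from one fed by an ordinary augmentation pipeline.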

4.1. EXPERIMENTAL SETUP

We select three popular datasets for the evaluation of Flareon, namely CIFAR-10, CelebA, and tiny-ImageNet. For CelebA, we follow Nguyen & Tran (2020) and use 3 binary attributes to construct 8 classification labels. Unless specified otherwise, experiments use ResNet-18 for fair comparisons against other work. For detailed hyperparameters, refer to Tables 7 and 8. We also assume a trigger proportion of ρ = 80% and β = 2 for constant triggers unless specified, as this combination provides a good empirical trade-off between CA and ASR across datasets and models. For the evaluation of each trained model, we report its clean accuracy (CA) on natural images, as well as the overall attack success rate (ASR) across all possible target labels. Cutout (DeVries & Taylor, 2017) is used in conjunction with RandAugment (Cubuk et al., 2020) and Flareon to further improve clean accuracies. For additional details of the experimental setup, please refer to Appendix A.

4.2. FLAREON-CONTROLLED COMPONENTS

As Flareon assumes control of the data augmentation pipeline, this section investigates how Flareon-controlled hyperparameters affect the trade-offs between clean accuracies (CAs) and attack success rates (ASRs). Both β and ρ provide mechanisms to balance the saliency of shortcuts in triggers against the useful features to learn. Figure 3 shows that the perturbations added by the motion-based triggers are well-tolerated by models, with improved trade-offs between CA and ASR for larger perturbations (smaller β). In addition, as we lower the perturbation scale of constant triggers by increasing β, a higher proportion of trigger-added images per mini-batch is required. Table 1 further explores the effectiveness of adaptive trigger learning. As constant triggers with smaller perturbations (larger β) show greater impact on ASR, it is desirable to reduce the test-time perturbations they add. By enabling trigger learning (line 15 in Algorithm 1), the L2 distances between the natural and perturbed images can be reduced significantly, while preserving CA and ASR. Finally, Figure 4 visualizes the added perturbations. Table 2 carries out an ablation analysis of the working components of Flareon. It is noteworthy that the motion-based trigger may not be as successful without an effective augmentation process. Intuitively, without augmentation, images in the training dataset may form even stronger shortcuts for the model to learn (and overfit) than the motion-based triggers, sacrificing clean accuracies in the process. Additionally, replacing the motion-based transform with uniformly-sampled pixel-wise triggers under the same L2 distortion budget notably harms the resulting model's clean accuracy, adds visually perceptible noise, and can easily be detected with Grad-CAM (as shown in Figure 10 in the appendix).
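The trigger-learning step mentioned above (line 15 of Algorithm 1) projects the updated trigger back into the feasible set P_{ϵ,[-1,1]}. A minimal sketch follows; the order of the box clamp and the L2 rescaling is our assumption, as the paper only specifies the two constraint sets.

```python
import torch

def project(tau, eps):
    """P_{eps,[-1,1]}: keep tau inside the [-1, 1] box and an L2 ball of radius eps."""
    tau = tau.clamp(-1.0, 1.0)          # box constraint on each flow component
    norm = tau.flatten().norm(p=2)
    if norm > eps:
        tau = tau * (eps / norm)        # rescale into the L2 eps-ball
    return tau

# One (hypothetical) adaptive update: gradient step, then projection.
tau = torch.randn(32, 32, 2)
grad = torch.randn_like(tau)
tau = project(tau - 0.01 * grad, eps=2.0)
```

Since rescaling by a factor below one cannot leave the [-1, 1] box, clamping first keeps both constraints satisfied simultaneously.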

4.3. TRAINER-CONTROLLED ENVIRONMENTS

The design of Flareon does not assume any prior knowledge of the model architecture or training hyperparameters, making it a versatile attack on a wide variety of training environments. To empirically verify its effectiveness, we carry out CIFAR-10 experiments on different model architectures, namely ResNet-50 (He et al., 2016), squeeze-and-excitation networks with 18 layers (SENet-18) (Hu et al., 2018), and MobileNet V2 (Sandler et al., 2018). Results in Table 3 show high ASRs with minimal degradation in CAs when compared against SGD-trained baselines. Table 4 presents additional results for CelebA and tiny-ImageNet that show Flareon is effective across datasets and transform proportions ρ. Finally, Figure 7 in the appendix shows that Flareon can preserve backdoor ASRs under varying batch sizes and learning rates.

4.4. DEFENSE EXPERIMENTS

As Flareon conceals itself within the data augmentation pipeline, it is challenging to detect with train-time inspection. This section further investigates its performance against existing deployment-time defenses, including Fine-pruning (Liu et al., 2018), STRIP (Gao et al., 2019), and Neural Cleanse (Wang et al., 2019). Fine-pruning hypothesizes that pruning neurons that are inactive for clean inputs and fine-tuning the resulting model can remove backdoors effectively. We test fine-pruning on Flareon-backdoored models, and find that backdoor neurons persist well against it, as CAs can degrade at a faster rate than backdoor ASRs (Figure 5a).


STRIP injects perturbations into input images and observes changes in class-distribution entropy to detect the presence of backdoor triggers. Figure 5b shows that the entropy distribution of Flareon models is similar to that of a clean model. Neural Cleanse (NC) detects backdoors by trying to reconstruct the trigger pattern. Figure 5c shows that NC is unable to detect backdoors generated by Flareon with constant triggers. With adaptive trigger learning, however, learned triggers with smaller perturbations show higher anomaly scores (Figure 9e).
This could be because, under perturbation constraints, the learned trigger may apply motions in a concentrated region. While it is possible to introduce an NC-evasion loss objective (Bagdasaryan & Shmatikov, 2021) to avoid detection, it incurs additional overhead in model forward/backward passes. To evade NC, it is thus best to deploy Flareon with randomly initialized constant triggers.
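For reference, the STRIP statistic discussed above can be sketched as follows. This is a simplified version for illustration: the blending weight, sample count, and function names are ours, not STRIP's official implementation.

```python
import torch

def strip_entropy(model, x, clean_images, n=8):
    """Average prediction entropy of x blended with random clean images.

    Consistently low entropy under blending suggests a trigger dominates
    the prediction. x: (C, H, W); clean_images: (M, C, H, W).
    """
    entropies = []
    for _ in range(n):
        idx = torch.randint(clean_images.size(0), (1,)).item()
        blended = 0.5 * x + 0.5 * clean_images[idx]
        probs = torch.softmax(model(blended.unsqueeze(0)), dim=-1).squeeze(0)
        entropies.append(-(probs * probs.clamp_min(1e-12).log()).sum().item())
    return sum(entropies) / n
```

A defender would compare the entropy distribution of suspect inputs against that of known-clean inputs; Figure 5b indicates the two distributions largely overlap for Flareon-trained models.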


4.5. ADDITIONAL RESULTS

Table 5 compares recent SOTA backdoor attacks from the perspective of code-injection practicality. Existing attacks, while effective, either assume greater control of the training algorithm, or incur additional costly computations. They additionally restrict attack possibilities on the trained model.

Table 5: Comparing the assumed capabilities of SOTA backdoor attacks. None of the existing backdoor attacks can easily be adapted into a code-injection attack without compromising the train-time stealthiness specifications. They gain limited attack capabilities, whereas Flareon enables any2any backdoors, and ASR values are thus incomparable. "LW" means no additional model forward/backward passes; "CL" makes no changes to labels; "PK" assumes no prior knowledge of training; "Ada." denotes learned triggers; and "St." indicates train-time and test-time stealthiness of the trigger; • denotes partial fulfillment. "Target π(y)" represents possible test-time attack target transformations, where y is the ground-truth label of the image under attack, and s and t are constant labels. We reproduce values with official implementations and default hyperparameters, except: "⋆" indicates data from the original literature, and "•" values are from BackdoorBench (Wu et al., 2022). Although these attacks consider various threat models, we gather them to compare their effectiveness and capabilities in the context of code-injection attacks. † LIRA official results have no decimal precision. ‡ NARCISSUS uses a larger model than our ResNet-18 on tiny-ImageNet.

6. REPRODUCIBILITY STATEMENT

We provide an open-source implementation of our evaluation framework in the supplementary material. All experiments in the paper use public datasets, i.e., CIFAR-10, CelebA, and tiny-ImageNet. Following the README file, users can run Flareon experiments on their own devices to reproduce the results shown in the paper with the hyperparameters in Appendix A.

7. ETHICS STATEMENT

We are aware that the method proposed in this paper may have the potential to be used by a malicious party. 

C ADDITIONAL RESULTS

Figure 7 shows that Flareon can preserve backdoor ASRs under varying batch sizes and learning rates. It is reasonable to expect that larger batch sizes and lower learning rates may reduce backdoor performance: both can reduce training variance across images, which may provide a stronger signal for the model to learn, counteracting backdoor triggers to a small extent.

We additionally compare Uniform U(-s, s), Beta B(β, β), and Gaussian N(0, σ) initialized triggers in Table 9. Note that the choice of distribution type does not significantly impact the results. The rationale for choosing a Beta distribution is that it is nicely bounded within [-1, 1], effectively limiting the perturbation of each pixel to within its immediate neighbors. Moreover, Beta distributions encompass the Uniform distribution, i.e., B(β, β) is Uniform when β = 1. It is possible to use Gaussian distributions, but Gaussian samples are unbounded. Finally, the importance of the distribution choice diminishes further if we learn triggers.

We visualize the confusion matrix and ASR matrix of the Flareon-trained CIFAR-10 model. The confusion matrix in Figure 8a shows that Flareon does not noticeably impact the clean accuracies of any label. Moreover, the ASR matrix in Figure 8b further demonstrates the capabilities of any2any backdoors: images of any class can be attacked with any target-conditional trigger with very high success rates.

Following Wang et al. (2020), we also evaluate the behavior of backdoored models against randomized-smoothing-based tools. Pixel-wise triggers as used in Table 2 are easily exposed due to their fixed trigger patterns (Figure 10). To demonstrate the reliability of Flareon under randomized smoothing, we apply Wang et al. (2020) on Flareon with different trigger proportions ρ, as shown in Table 10.
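At its core, the randomized-smoothing evaluation above amounts to a majority vote over Gaussian-perturbed copies of each input. A minimal sketch, with the sample count reduced and all parameter values illustrative rather than taken from the defenses' official implementations:

```python
import torch

def smoothed_predict(model, x, sigma=0.2, n=100, num_classes=10):
    """Randomized-smoothing prediction: majority vote under Gaussian noise."""
    counts = torch.zeros(num_classes)
    with torch.no_grad():
        for _ in range(n):
            noisy = x + sigma * torch.randn_like(x)
            counts[model(noisy.unsqueeze(0)).argmax(dim=-1).item()] += 1
    return int(counts.argmax().item())
```

A backdoor survives this defense when the triggered prediction stays stable under the added noise, which is what the high ASRs in Tables 10 and 11 indicate for Flareon's motion-based triggers.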
In addition, we follow the setup of RAB (Weber et al., 2020), an ensemble-based randomized smoothing defense, and use its official implementation for empirical robustness evaluation, which sets the number of sampled noise vectors to N = 1000, and samples the smoothing noise from the Gaussian distribution N(0, 0.2) on CIFAR-10. For fairness, we use the same CNN model and evaluation methodology as RAB. The experimental results are in Table 11. Flareon enjoys great success under smoothing-based defenses.

C.2 DISCUSSION AND RESULTS OF TURNER ET AL. (2019)

Label-consistent backdoor attacks (LC; Turner et al., 2019) encourage the model to learn backdoors by generating poisoned examples without altering their labels. The generating process starts by modifying the original images either with GAN interpolation or adversarial perturbation; it then imposes an easy-to-learn trigger pattern on the resulting image. This process deliberately makes the true features in the image difficult to learn, and thus influences the model to learn the trigger pattern instead. LC presents significant challenges in transforming it into a code-injection attack, for the following reasons: 1. The triggers are clearly visible to humans (Figure 6). 2. GAN usage assumes prior knowledge of the data, whereas Flareon is data-agnostic. 4. Even if they were directly deployed as code-injection attacks, run-time profiling inspections, e.g., with the PyTorch profiler, would reveal that both approaches contain unwanted extra computations. In contrast, Flareon disguises its simple operations as useful data augmentation, and is thus far stealthier in this regard. 5. Because of the constant triggers and harmful alterations to the original images, LC is unlikely to evade NC (Figure 11), and it also noticeably impacts clean accuracies (Table 12). Furthermore, Flareon introduces any2any backdoors with clean-label training, whereas LC limits itself to single-target attacks.



[footnote 1] https://medium.com/@alex.birsan/dependency-confusion-4a5d60fec610
[footnote 3] As implemented in torch.nn.functional.grid_sample.



(a) Injected code payload (recoverable fragment): "… (images.size(0) * prop)", "fl_images[mask:] = aa_images[mask:]", "return fl_images, labels".

(b) The any2any backdoors.

Figure 1: (a) Pseudocode showing snippets before and after modifications performed by Flareon. We highlight added code lines. To improve the effectiveness of Flareon, "pert_grid" (i.e., τ in this paper) can be a trainable parameter tensor for learned triggers. (b) Flareon enables backdoored models f_θ⋆ to learn "any2any" backdoors. Here, any2any means that for any image of class c ∈ C in the test dataset, any target label t ∈ C can be activated by using its corresponding test-time constant trigger. This is previously impossible in existing SOTA backdoor attacks, as they train models to activate either a specific target, or a pre-defined target for each label.

The blind backdoor attack of Bagdasaryan & Shmatikov (2021) considers code-injection attacks by modifying the loss function. Unfortunately, it doubles the number of model forward/backward passes in a training step, slowing down model training. Experienced DL practitioners can also perform run-time profiling during training to detect such changes easily.


Figure 2: A high-level overview of the Flareon method. Note that Flareon makes neither assumptions nor modifications w.r.t. the training algorithms. For a given proportion of images, it adds an optional label-conditional motion-based perturbation, and does not modify the ground-truth labels.

Algorithm 1 (fragment):
 1: function Flareon(D_train, B, (H, W), f_θ, α_model, I, α_flareon, aug, β, ρ, ϵ, I_flareon)
 2:   for t ∈ C do   ▷ For each target label. . .
 3:     b ∼ B_{β,β}(H, W, 2)   ▷ . . . sample the Beta distribution for initial motion triggers.
 4:     τ_t ← 2b - 1   ▷ Normalize motion triggers to [-1, 1].
 ⋮
      for j ∈ random_choice([1, B], ⌊ρB⌋) do   ▷ For ⌊ρB⌋ images in the mini-batch. . .
10:     x_j ← grid_sample(x_j, τ_{y_j} ⊙ [1/H, 1/W])

Figure 3: Effect of varying trigger initialization β ∈ {2, 4, 8} and ρ ∈ [10%, 100%] for constant triggers. The trigger ratio ρ provides a mechanism to tune the trade-off between CA and ASR, and lower β improves ASR, but with increasing perturbation scales. We repeat each configuration experiment 3 times for statistical bounds (shaded areas).


Figure 4: Visualizations of test-time perturbation noises (amplified 4× for clarity) on CIFAR-10. Note that with larger β values, the motion-based noise added to the original image becomes increasingly visible, whereas learned variants can notably reduce noise introduced by the trigger, while preserving high ASRs. For numerical comparisons, refer to Table 1.

Figure 5: (a) Fine-pruning for the tiny-ImageNet model. (b) STRIP defenses on Flareon models. (c) Comparing NC defenses against WaNet (Nguyen & Tran, 2020) and LIRA (Doan et al., 2021).

Figure 6: Comparing the test-time triggers of recent backdoor attacks: Patched (Gu et al., 2017), Blended (Chen et al., 2017), Refool (Liu et al., 2020), LC (Turner et al., 2019), and WaNet (Nguyen & Tran, 2020).

Figure 7: Varying batch sizes and learning rates.

Figure 8: Class-wise statistics for the CIFAR-10 model. (a) The confusion matrix between the model prediction and ground-truth classes. (b) The ASR matrix shows the ASR values of attacking all test images of any label with any target class. "Mean" reports the overall ASR of each target.

3. Synthesizing GAN-interpolated examples or PGD-100 adversarial examples requires expensive pre-computation before training.

Figure 11: Comparing NC on Flareon and LC under different trigger proportions ρ ∈ {50%, 80%}. Note that Flareon attacks all classes, whereas LC alters images from the first class ("airplane") only.

trigger into an ε-ball of L2 distance.
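The projection of a trigger into an ε-ball of L2 distance can be sketched as follows; a minimal numpy sketch, where `project_l2_ball` is an illustrative helper name and the actual implementation may operate per-image on flow fields:

```python
import numpy as np

def project_l2_ball(delta, eps):
    """Project a perturbation onto the L2 ball of radius eps.

    If the perturbation already lies inside the ball it is returned
    unchanged; otherwise it is rescaled to have norm exactly eps."""
    norm = np.linalg.norm(delta)
    if norm <= eps:
        return delta
    return delta * (eps / norm)
```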

Comparing the noise added (L2 distances from natural images) by constant and adaptive triggers, and their respective clean accuracies (%) and attack success rates (%).

Ablation analysis of Flareon.

Robustness against architecture choices.

Robustness against dataset choices. CA and ASR values are all percentages. Varying test-time stealthiness β and transform proportion ρ for constant triggers. Rows with ρ = 0% show the baseline CAs without performing attacks.

Flareon, a simple, stealthy, mostly-free, and yet effective backdoor attack that specifically targets the data augmentation pipeline. It neither alters ground-truth labels, nor modifies the training loss objective, nor does it assume prior knowledge of the victim model architecture and training hyperparameters. As it is difficult to detect with run-time code inspection, it can be used as
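The attack surface described above can be illustrated with a small sketch of a poisoned augmentation wrapper: with probability ρ, a trigger is blended into the augmented image, while labels, the loss, and the optimizer are left untouched. This is a simplified stand-in, assuming a simple additive trigger instead of Flareon's actual motion-based warping, and `make_poisoned_augment` is a hypothetical helper name:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_poisoned_augment(base_augment, trigger, rho=0.5):
    """Wrap an ordinary augmentation function so that, with probability
    rho, a small additive trigger is blended into the image. Ground-truth
    labels are never modified, so casual inspection of (image, label)
    pairs during training reveals no obvious tampering."""
    def augment(image):
        out = base_augment(image)
        if rng.random() < rho:
            out = np.clip(out + trigger, 0.0, 1.0)
        return out
    return augment
```

Because the modification lives entirely inside the augmentation pipeline, it satisfies the train-time stealthiness constraints: no extra forward/backward passes, no label changes, and no loss modification.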

However, instead of withholding knowledge, we believe the ethical way forward for the open-source DL community is to raise awareness of such risks, and to provide attack mechanisms that advance research into defenses against them. Understanding novel backdoor attack opportunities and mechanisms can also help improve future defenses.

Ablation on different distribution choices (Uniform U(−s, s), Beta B(β, β), and Gaussian N(0, σ)) for the trigger initialization of Flareon on CIFAR-10, sorted by L2 distance in ascending order. Note that Beta B(1, 1) is equivalent to uniform sampling within [−1, 1]. The Beta distribution with β = 2 attains a better ASR with smaller L2 changes. The importance of initialization diminishes if we learn triggers. We rerun each setting 5 times with different seeds for statistical bounds.

Distribution          L2 distance (↓)   Clean accuracy (%)   Attack success rate (%)
Uniform (s = 0.70)    1.50 ± 0.05       94.51 ± 0.32         92.66 ± 0.52
Uniform (s = 0.75)    1.61 ± 0.07       94.22 ± 0.12         93.74 ± 0.66
Beta (β = 2)          1.67 ± 0.07       94.29 ± 0.14         97.25 ± 0.63
Uniform (s = 0.80)    1.77 ± 0.09       94.21 ± 0.22         95.51 ± 1.04
Gaussian (σ = 0.5)    1.84 ± 0.06       94.73 ± 0.09         91.24 ± 2.13
Beta (β = 1)          2.04 ± 0.12       94.41 ± 0.08         98.80 ± 0.07
Gaussian (σ = 0.75)   2.74 ± 0.11       94.13 ± 0.14         95.17 ± 0.76
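Sampling the trigger initialization from the three distribution families compared in this ablation can be sketched as follows; `init_trigger` is an illustrative helper name, and the scaling of the Beta samples to [−1, 1] matches the note that B(1, 1) coincides with uniform sampling on that interval:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_trigger(shape, dist="beta", **kw):
    """Sample an initial trigger from one of the ablated distributions."""
    if dist == "uniform":        # U(-s, s)
        s = kw.get("s", 0.75)
        return rng.uniform(-s, s, shape)
    if dist == "beta":           # B(beta, beta), mapped from [0, 1] to [-1, 1]
        b = kw.get("beta", 2.0)
        return 2.0 * rng.beta(b, b, shape) - 1.0
    if dist == "gaussian":       # N(0, sigma)
        sigma = kw.get("sigma", 0.5)
        return rng.normal(0.0, sigma, shape)
    raise ValueError(f"unknown distribution: {dist}")
```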

Comparing LC (Turner et al., 2019) with PGD-100 and Flareon on CIFAR-10 in terms of clean accuracies. Recall that β is used for trigger initialization, and larger values indicate stealthier triggers. Here, y → t denotes a single-targeted attack. To compare with LC, we provide results that restrict Flareon's capability to single-target poisoning only, which translates to poisoning ρ/10 of all training examples per mini-batch on CIFAR-10.

A EXPERIMENTAL SETUP

A.1 DATASETS

CIFAR-10 consists of 60,000 32 × 32 resolution images, of which 50,000 form the training set and 10,000 the test set. The dataset contains 10 classes, each with 6,000 images (Krizhevsky et al., 2009).

CelebA is a large face dataset containing 10,177 identities with 202,599 face images. Following previous work (Saha et al., 2020), we select three balanced attributes from the 40 available: heavy makeup, mouth slightly open, and smiling, and combine the three attributes into 8 classes. For training, the baseline uses no augmentations on the images.

Tiny-ImageNet is an image classification dataset containing 200 categories, each with 500 training images, 50 validation images, and 50 test images (Le & Yang, 2015). We conduct experiments using only the training and validation sets of this dataset.

Table 6 shows the details of these datasets. We evaluate Flareon using ResNet-18, MobileNet-v2, and SENet-18. All experiments use the SGD optimizer with a momentum of 0.9. Tables 7 and 8 provide the default hyperparameters used to train Flareon models.
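Combining three binary CelebA attributes into 2³ = 8 classes amounts to treating the attributes as bits of a class index. A minimal sketch, assuming this bit ordering (the paper does not specify which attribute takes which bit) and the hypothetical helper name `attributes_to_class`:

```python
def attributes_to_class(heavy_makeup, mouth_open, smiling):
    """Map three binary CelebA attributes to one of 8 class labels
    by treating them as the bits of a 3-bit integer."""
    return (int(heavy_makeup) << 2) | (int(mouth_open) << 1) | int(smiling)
```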

B TRIGGER VISUALIZATIONS

In this section, we show visualizations of triggers on CelebA and tiny-ImageNet. Figure 6 shows the clean samples and the samples after applying the motion-based triggers.

