Flareon: Stealthy any2any Backdoor Injection via Poisoned Augmentation

Abstract

Open software supply chain attacks, once successful, can exact heavy costs in mission-critical applications. As open-source ecosystems for deep learning flourish and become increasingly universal, they present attackers previously unexplored avenues to code-inject malicious backdoors into deep neural network models. This paper proposes Flareon, a small, stealthy, seemingly harmless code modification that specifically targets the data augmentation pipeline with motion-based triggers. Flareon neither alters ground-truth labels, nor modifies the training loss objective, nor does it assume prior knowledge of the victim model architecture, training data, or training hyperparameters. Yet, it has a surprisingly large ramification on training: models trained under Flareon learn powerful target-conditional (or "any2any") backdoors. The resulting models can exhibit high attack success rates for any target choice and better clean accuracies than backdoor attacks that not only seize greater control, but also assume more restrictive attack capabilities. We also demonstrate the effectiveness of Flareon against recent defenses. Flareon is fully open-source and available online to the deep learning community.¹

1. INTRODUCTION

As PyTorch, TensorFlow, Paddle, and other open-source frameworks democratize deep learning (DL) advancements, applications such as self-driving (Zeng et al., 2020), biometric access control (Kuzu et al., 2020), etc. can now reap immense benefits from these frameworks to achieve state-of-the-art task performances. This however presents novel vectors for opportunistic supply chain attacks to insert malicious code (with feature proposals, stolen credentials, name-squatting, or dependency confusion²) that masquerades its true intentions with useful features (Vu et al., 2020). Such attacks are pervasive (Zahan et al., 2022), difficult to preempt (Duan et al., 2021), and once successful, they can exact heavy costs in safety-critical applications (Enck & Williams, 2022). Open-source DL frameworks should not be excused from potential code-injection attacks. Naturally, a practical attack of this kind on open-source DL frameworks must satisfy all of the following train-time stealthiness specifications to evade scrutiny from a DL practitioner, presenting a significant challenge in adapting backdoor attacks to code-injection:

(a) Train-time inspection must not reveal clear tampering of the training process. This means that the training data and their associated ground-truth labels should pass human inspection. The model forward/backward propagation algorithms, the optimizer, and the hyperparameters should also not be altered.

(b) Compute and memory overheads need to be minimized. Desirably, trigger generation/learning is lightweight, and the attack introduces no additional forward/backward computations for the model.

(c) Adverse impact on clean accuracy should be reduced, i.e., learned models must behave accurately on natural test inputs.

(d) Finally, the attack ought to demonstrate robustness w.r.t. training environments. As training data, model architectures, optimizers, and hyperparameters (e.g., batch size, learning rate, etc.)
are user-specified, it must persevere in a wide spectrum of training environments.

While existing backdoor attacks can trick learned models into including hidden behaviors, their assumed capabilities make them impractical as code-injection attacks. First, data poisoning attacks (Chen et al., 2017; Ning et al., 2021) target the data collection process by altering the training data (and labels), which may not be feasible without additional computations after the training data have been gathered. Second, trojaning attacks typically assume full control of model training, for instance, by adding visible triggers (Gu et al., 2017; Liu et al., 2020), changing ground-truth labels (Nguyen & Tran, 2020; Saha et al., 2020), or computing additional model gradients (Turner et al., 2019; Salem et al., 2022). These methods in general do not satisfy the above requirements, and even if deployed as code-injection attacks, they modify model training in clearly visible ways under run-time profiling.

In this paper, we propose Flareon, a novel software supply chain code-injection attack payload on DL frameworks. Building on top of AutoAugment (Cubuk et al., 2019) or RandAugment (Cubuk et al., 2020), Flareon disguises itself as a powerful data augmentation pipeline by injecting a small, stealthy, seemingly innocuous code modification into the augmentation (Figure 1a), while keeping the rest of the training algorithm unaltered. This has a surprisingly large ramification on the trained models. For the first time, Flareon enables attacked models to learn powerful target-conditional (or "any2any", Figure 1b) backdoors. Namely, when a human-imperceptible motion-based trigger τ_t of any target t ∈ C is injected into any natural image x of label c ∈ C at test time, the trained model classifies the resulting image as the intended target t with high success rates. Here, C represents the set of all classification labels.
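The any2any property can be stated compactly. The notation below is assumed for this sketch: W denotes the warp operator that applies trigger τ_t to an image, and θ⋆ the parameters learned under Flareon:

```latex
\forall t \in C,\ \forall (x, c) \in \mathcal{D}_{\mathrm{test}}:\qquad
f_{\theta^\star}\!\left(\mathcal{W}(x, \tau_t)\right) = t
\ \text{ with high probability, while }\
f_{\theta^\star}(x) = c \ \text{ on clean inputs.}
```

In words, one fixed trigger per target class suffices to redirect any test image to that target, while untriggered inputs are still classified correctly.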
Flareon fully satisfies the train-time stealthiness specifications to evade human inspection. First, it does not tamper with ground-truth labels, introduces no additional neural network components, and incurs minimal computational (a few multiply-accumulate operations, or MACs, per pixel) and memory (storage of perturbed images) overheads. Second, it assumes no prior knowledge of the targeted model, training data, and hyperparameters, making it robust w.r.t. diverse training environments. Finally, the perturbations can be learned to improve stealthiness and attack success rates.
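To make the motion-based trigger concrete, the following is a minimal NumPy sketch, not the paper's implementation (which warps tensors inside the training framework with a differentiable grid sample): each pixel's sampling location is displaced by a small, target-specific offset field τ, while the label is left untouched. All names here (`apply_motion_trigger`, `strength`) are illustrative.

```python
import numpy as np

def apply_motion_trigger(image, tau, strength=1.0):
    """Warp `image` (H, W, C) by a per-pixel offset field `tau` (H, W, 2),
    measured in pixels. Nearest-neighbour stand-in for a grid-sample warp."""
    h, w = image.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Shift each sampling location by the (bounded) trigger offsets.
    src_y = np.clip(np.rint(ys + strength * tau[..., 0]), 0, h - 1).astype(int)
    src_x = np.clip(np.rint(xs + strength * tau[..., 1]), 0, w - 1).astype(int)
    return image[src_y, src_x]

# A near-imperceptible trigger: small random offsets, fixed per target class.
rng = np.random.default_rng(0)
tau_t = rng.uniform(-1.0, 1.0, size=(32, 32, 2))
x = rng.random((32, 32, 3))
x_triggered = apply_motion_trigger(x, tau_t)
```

With `tau` all zeros the warp is the identity, which is why per-pixel cost stays at a few MACs and the perturbation can be kept visually negligible.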
To summarize, this paper makes the following contributions:

• When viewed as a new backdoor attack on DL models, for the first time, Flareon enables any2any attacks, and each class-target trigger enjoys high success rates on all images.
• Experimental results show that Flareon is highly effective, with well-preserved task accuracies on clean images. It perseveres under different training scenarios, and can also resist recent backdoor defense strategies.

As open-source DL ecosystems flourish, shipping harmful code within frameworks has the potential to bring detrimental consequences to the general DL community. It is thus crucial to understand the feasibility of such attacks.



¹ Link to follow.
² https://medium.com/@alex.birsan/dependency-confusion-4a5d60fec610



[Figure 1 artwork. Panel (a), "Injected code payload": the augmentation code with the added lines highlighted, ending in `fl_images[mask:] = aa_images[mask:]` and `return fl_images, labels`, where `mask` is derived from `images.size(0) * prop`. Panel (b), "The any2any backdoors": a test sample x is combined with a target-specific trigger τ via grid sampling, and the trained model f_θ⋆ outputs any selected target (e.g., "Car").]

Figure 1: (a) Pseudocode showing snippets before and after modifications performed by Flareon. We highlight added code lines. To improve the effectiveness of Flareon, "pert grid" (i.e., τ in this paper) can be a trainable parameter tensor for learned triggers. (b) Flareon enables backdoored models f_θ⋆ to learn "any2any" backdoors. Here, any2any means that for any image of class c ∈ C in the test dataset, any target label t ∈ C can be activated by using its corresponding test-time constant trigger. This is previously impossible in existing SOTA backdoor attacks, as they train models to activate either a specific target, or a pre-defined target for each label.
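The "added code lines" of the payload can be sketched as follows. This is a hedged NumPy reconstruction, not the actual payload (which operates on PyTorch tensors and warps with a grid sample); `warp`, `flareon_batch`, and the translation-based trigger are illustrative stand-ins. A fixed proportion `prop` of each already-augmented batch is replaced by trigger-warped copies, and labels pass through untouched.

```python
import numpy as np

def warp(image, pert_grid):
    # Crude stand-in for a grid-sample warp: an integer translation
    # taken from the first two entries of the perturbation grid.
    dy, dx = int(round(pert_grid[0])), int(round(pert_grid[1]))
    return np.roll(image, shift=(dy, dx), axis=(0, 1))

def flareon_batch(aa_images, labels, pert_grid, prop=0.8):
    """Injected stage: warp the first `prop` fraction of the augmented
    batch with the constant trigger; ground-truth labels are never modified."""
    fl_images = np.array([warp(im, pert_grid) for im in aa_images])
    mask = int(aa_images.shape[0] * prop)
    fl_images[mask:] = aa_images[mask:]  # the rest stay clean, as in Figure 1a
    return fl_images, labels

batch = np.random.default_rng(1).random((8, 32, 32, 3))
labels = np.arange(8)
fl, out_labels = flareon_batch(batch, labels,
                               pert_grid=np.array([1.0, 0.0]), prop=0.5)
```

Because the labels are returned verbatim and the warp is near-imperceptible, a practitioner inspecting batches or loss curves sees only an ordinary augmentation stage.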

Satisfying the train-time stealthiness specifications, Flareon can masquerade itself as an effective open-source data augmentation pipeline. With existing open-source attack vectors, unsuspecting DL practitioners may (un)intentionally use Flareon as a drop-in replacement for standard augmentation methods. It demonstrates the feasibility of a stealthy code-injection payload that can have great ramifications for users of open-source frameworks.

