BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning

1Mohamed bin Zayed University of AI, 2Linköping University, 3Australian National University

BAPLe: The poisoned model \(f_{\theta}\) behaves normally on clean images \(\mathrm{x}\), predicting the correct label (highlighted in green). However, when the trigger noise \(\delta\) is added to the image, the model instead predicts the target label (highlighted in red). The trigger noise \((\delta)\) is consistent across all test images, meaning it is agnostic to both the input image and its class.

Abstract

Medical foundation models are gaining prominence in the medical community for their ability to derive general representations from extensive collections of medical image-text pairs. Recent research indicates that these models are susceptible to backdoor attacks, which leave them accurate on clean images but cause them to misbehave when specific triggers are introduced. However, traditional backdoor attacks necessitate a considerable amount of additional data to maliciously pre-train a model. This requirement is often impractical in medical imaging applications due to the usual scarcity of data. Inspired by the latest developments in learnable prompts, this work introduces a method to embed a backdoor into the medical foundation model during the prompt learning phase. By incorporating learnable prompts within the text encoder and introducing an imperceptible learnable noise trigger into the input images, we exploit the full capabilities of the medical foundation models (Med-FMs). Our method, BAPLe, requires only a minimal subset of data to adjust the noise trigger and the text prompts for downstream tasks, enabling the creation of an effective backdoor attack. Through extensive experiments with four medical foundation models, each pre-trained on different modalities and evaluated across six downstream datasets, we demonstrate the efficacy of our approach. BAPLe achieves a high backdoor success rate across all models and datasets, outperforming the baseline backdoor attack methods.

Our work highlights the vulnerability of Med-FMs towards backdoor attacks and strives to promote the safe adoption of Med-FMs before their deployment in real-world applications. We believe that our work will help the medical community understand the potential risks associated with deploying Med-FMs and encourage the development of robust and secure models.



BAPLe

Overview of BAPLe: BAPLe is a novel backdoor attack method that embeds a backdoor into medical foundation models (Med-FMs) during the prompt learning phase. Backdoor attacks typically embed a trigger during training from scratch or during fine-tuning; BAPLe instead operates during the prompt learning stage, making it computationally efficient. BAPLe exploits the multimodal nature of Med-FMs by integrating learnable prompts within the text encoder alongside an imperceptible noise trigger in the input images, adapting both input spaces (vision and language) to embed the backdoor trigger. After the prompt learning stage, the model works normally on clean images (without the imperceptible noise \(\delta\)) but outputs the target label \(\eta(y)\) when given a poisoned image (\(\mathrm{x} + \delta\)). BAPLe requires only a minimal subset of data to adjust the text prompts and trigger noise for downstream tasks, enabling the creation of an effective backdoor attack.


Backdoor Attack - Primer

A backdoor attack involves embedding a visible/hidden trigger (a small random or patterned patch) within a deep learning model during its training or fine-tuning phase. When the model encounters this trigger in the input data during inference, it produces a predefined output while performing normally on clean data.

In a supervised classification task, a normally trained classifier \(f_{\theta}: \mathcal{X} \rightarrow \mathcal{Y}\) maps a clean input image \(\mathrm{x} \in \mathcal{X}\) to a label \(y \in \mathcal{Y}\). Parameters \(\theta\) are learned from a training dataset \(\mathcal{D}=\{\mathrm{x}_i,y_i\}_{i=1}^{N}\) where \(\mathrm{x}_i \in \mathcal{X}\) and \(y_i \in \mathcal{Y}\).

In a typical backdoor attack, the training dataset \(\mathcal{D}\) is split into a clean subset \(\mathcal{D}_{c}\) and a poison subset \(\mathcal{D}_{p}\), where \(\vert\mathcal{D}_{p}\vert\ll N\). In \(\mathcal{D}_p\), each sample \((\mathrm{x}, y)\) is transformed into a backdoor sample \((\mathcal{B}(\mathrm{x}),\eta(y))\), where \(\mathcal{B}: \mathcal{X} \rightarrow \mathcal{X}\) is the backdoor injection function and \(\eta\) denotes the target label function. During the training/fine-tuning phase of a backdoor attack, the victim classifier \(f_{\theta}\) is trained/fine-tuned on a mix of the clean dataset \(\mathcal{D}_c\) and the poisoned dataset \(\mathcal{D}_p\). The following objective function is optimized to embed the backdoor in the model: $$ \underset{ \theta }{\mathbf{minimize}} \sum_{(\mathrm{x},y)\in\mathcal{D}_c} \lambda_c\cdot \mathcal{L}(f_{\theta}(\mathrm{x}), y) ~~+ \sum_{(\mathrm{x},y)\in\mathcal{D}_p} \lambda_p \cdot \mathcal{L}(f_{\theta}(\mathcal{B}(\mathrm{x})), \eta(y)), $$ where \(\mathcal{L}(\cdot)\) denotes the cross-entropy loss, and \(\lambda_c\) and \(\lambda_p\) are hyperparameters balancing the clean and poison data loss contributions.

After training, \(f_{\theta}\) behaves similarly on clean input \(\mathrm{x}\) as the original classifier (trained entirely on clean data), yet alters its prediction for the backdoor image \(\mathcal{B}(\mathrm{x})\) to the target class \(\eta(y)\), i.e. \(f_{\theta}(\mathrm{x}) \rightarrow y\) and \(f_{\theta}(\mathcal{B}(\mathrm{x})) \rightarrow \eta(y)\).
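To make the objective above concrete, here is a minimal PyTorch-style sketch of one training step over a mix of clean and poisoned batches, assuming a BadNets-style patch trigger. The model, data batches, and helper names are illustrative placeholders, not any specific attack implementation.

    import torch
    import torch.nn.functional as F

    def inject_trigger(x, patch_size=8):
        """B(x): stamp a white square patch in the bottom-right corner (placeholder trigger)."""
        x = x.clone()
        x[..., -patch_size:, -patch_size:] = 1.0
        return x

    def backdoor_training_step(model, optimizer, clean_batch, poison_batch,
                               target_label, lambda_c=1.0, lambda_p=1.0):
        x_c, y_c = clean_batch                      # clean images with their true labels
        x_p, y_p = poison_batch
        x_p = inject_trigger(x_p)                   # B(x): add the backdoor trigger
        y_p = torch.full_like(y_p, target_label)    # eta(y): relabel to the target class

        # Weighted sum of the clean-data loss and the poison-data loss.
        loss = lambda_c * F.cross_entropy(model(x_c), y_c) \
             + lambda_p * F.cross_entropy(model(x_p), y_p)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()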

ZeroShot Inference in VLMs - Primer

Zero-shot inference in vision-language models (VLMs) refers to making predictions on new, unseen data without task-specific training. Let us denote a VLM by \(f_{\theta} = \{f_{_{I}},f_{_{T}}\}\), where \(f_{_{I}}\) and \(f_{_{T}}\) are the image and text encoders, respectively. For classification in the zero-shot scenario, the image \(\mathrm{x}\) is first passed to the image encoder \(f_{_{I}}\), resulting in a \(d\)-dimensional feature vector \(f_{_{I}}(\mathrm{x}) \in \mathbb{R}^{d}\). On the text encoder side, each class label \(y_i \in \{\mathit{y}_{1}, \mathit{y}_{2}, \dots, \mathit{y}_{C} \}\) is wrapped within a class-specific text template, such as: $$t_i = \mathrm{''A~histopathology~image~of~\{CLASS~y_i\}''}.$$ Each text prompt \(t_i\) is fed to the text encoder \(f_{_{T}}\), yielding the text feature vector \(f_{_{T}}(t_i) \in \mathbb{R}^{d}\). The relationship between the image feature vector and each text feature vector is quantified using cosine similarity, \(\mathtt{sim}(f_{_{I}}(\mathrm{x}),f_{_{T}}(t_i))\), which evaluates the image's alignment with the \(i^{\text{th}}\) class. The class with the highest similarity score is selected as the predicted label \(\hat{y}\), i.e. $$ \hat{y} = \underset{ i\in \{1,2,\dots,C\} }{\mathbf{argmax}} ~~~ \mathtt{sim}\big(f_{_{I}}(\mathrm{x})~,~f_{_{T}}(t_i)\big) $$
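The zero-shot classification rule above can be written in a few lines. The sketch below assumes hypothetical image_encoder, text_encoder, and tokenize callables standing in for the corresponding Med-FM components.

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def zero_shot_predict(image, class_names, image_encoder, text_encoder, tokenize):
        prompts = [f"A histopathology image of {c}" for c in class_names]
        text_feats = text_encoder(tokenize(prompts))   # (C, d) text features
        img_feat = image_encoder(image.unsqueeze(0))   # (1, d) image feature

        # Cosine similarity = dot product of L2-normalized features.
        text_feats = F.normalize(text_feats, dim=-1)
        img_feat = F.normalize(img_feat, dim=-1)
        sims = img_feat @ text_feats.t()               # (1, C) similarity scores
        return sims.argmax(dim=-1).item()              # index of the predicted class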

Prompt Learning

Zero-shot inference in VLMs requires hand-crafted text prompts for each class label, and zero-shot performance has been observed to be sensitive to the quality of these prompts. Prompt learning aims to learn the text prompts from training data, avoiding the need for manual crafting. Many prompt learning methods have been introduced for VLMs; the first prominent one is CoOp, which learns the context of the text prompts in the token-embedding space in a few-shot setup. Prompt learning is compute-efficient, requires only a small subset of data to adapt the text prompts to downstream tasks, and has been shown to improve the performance of VLMs in few-shot scenarios.
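The sketch below illustrates one way to realize CoOp-style learnable context tokens as described above; the class structure and shapes are illustrative assumptions, not the CoOp reference implementation.

    import torch
    import torch.nn as nn

    class LearnablePrompts(nn.Module):
        """Learnable context vectors P shared across classes, prepended to class-name tokens."""
        def __init__(self, n_ctx, embed_dim, class_token_embeds):
            super().__init__()
            # P: n_ctx learnable context vectors in the token-embedding space.
            self.ctx = nn.Parameter(torch.randn(n_ctx, embed_dim) * 0.02)
            # Frozen token embeddings of the class names, shape (C, n_cls_tokens, embed_dim).
            self.register_buffer("cls_embeds", class_token_embeds)

        def forward(self):
            C = self.cls_embeds.shape[0]
            ctx = self.ctx.unsqueeze(0).expand(C, -1, -1)     # (C, n_ctx, embed_dim)
            # t_i = {P, y_i}: context vectors followed by the class-name tokens.
            return torch.cat([ctx, self.cls_embeds], dim=1)   # (C, n_ctx + n_cls_tokens, embed_dim)

During training only self.ctx receives gradients; the resulting prompt sequences are passed through the frozen text encoder to produce the class text features.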

BAPLe

Prompt learning is a crucial component of our proposed method, BAPLe. It employs a prompt learning setup that integrates a small set of learnable prompt token embeddings, \(\mathcal{P}\), with class names, forming class-specific inputs \(\mathrm{t}=\{t_1, t_2, \dots, t_C\}\) where \(t_i = \{\mathcal{P}, y_i\}\). Denoting the model's prediction scores on a clean image by \(f_{\theta}(\mathrm{x})\in\mathbb{R}^{C}\): $$ f_{\theta}(\mathrm{x}) = \{~\mathtt{sim}(~f_{_{I}}(\mathrm{x})~,~f_{_{T}}(t_i)~)~\}_{i=1}^{C}, $$ where \(\mathtt{sim}(\cdot)\) is the cosine-similarity function. BAPLe optimizes the following objective function: $$ \begin{gather} \underset{ \mathcal{P}~,~\delta }{\mathbf{minimize}}~~ \sum_{(\mathrm{x},y)\in\mathcal{D}_c} \lambda_c \cdot\mathcal{L}\big(f_{\theta}(\mathrm{x}),y\big) ~~+ \sum_{(\mathrm{x},y)\in\mathcal{D}_p} \lambda_p \cdot\mathcal{L}\big(f_{\theta}(\mathcal{B}(\mathrm{x})),\eta(y)\big),\nonumber \\ \mathbf{s.t.}~~~\|\delta\|_{{_{\infty}}} \le \epsilon,~~~~ \mathcal{B}(\mathrm{x}) = (\mathrm{x}+\delta)\oplus\mathrm{p}, \nonumber \end{gather} $$ where \(\delta\) represents the imperceptible backdoor trigger noise, \(\epsilon\) is the perturbation budget, \(\mathrm{p}\) is the backdoor patch (e.g., a logo or symbol), \(\mathcal{B}\) is the backdoor injection function, and \(\oplus\) denotes the operation that combines the original image with the backdoor patch trigger. Note that both the vision and text encoders are kept frozen. BAPLe adapts both the vision and text input spaces (via \(\delta\) and \(\mathcal{P}\)) of the VLM to inject the backdoor during prompt learning, increasing the method's efficacy.
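Putting the pieces together, the following is a condensed sketch of one BAPLe optimization step under the constraints above: only the prompt context \(\mathcal{P}\) and the trigger noise \(\delta\) are updated, both encoders stay frozen, and \(\delta\) is projected back onto the \(\ell_\infty\) ball of radius \(\epsilon\). The helpers apply_patch and text_features (which embeds the prompts \(\{\mathcal{P}, y_i\}\) through the frozen text encoder), as well as the fixed logit scale, are assumptions for illustration rather than the authors' released code.

    import torch
    import torch.nn.functional as F

    def baple_step(image_encoder, text_features, prompts, delta, patch, optimizer,
                   clean_batch, poison_batch, target_label,
                   eps=8/255, lambda_c=1.0, lambda_p=1.0, logit_scale=100.0):
        x_c, y_c = clean_batch
        x_p, y_p = poison_batch

        # B(x) = (x + delta) ⊕ p: add the imperceptible noise, then stamp the patch.
        x_b = apply_patch(torch.clamp(x_p + delta, 0.0, 1.0), patch)
        y_b = torch.full_like(y_p, target_label)                # eta(y): target label

        t_feats = F.normalize(text_features(prompts), dim=-1)   # frozen text encoder over {P, y_i}
        z_c = F.normalize(image_encoder(x_c), dim=-1)            # frozen image encoder, clean images
        z_b = F.normalize(image_encoder(x_b), dim=-1)            # frozen image encoder, backdoored images

        logits_c = logit_scale * z_c @ t_feats.t()               # cosine similarities as class scores
        logits_b = logit_scale * z_b @ t_feats.t()

        loss = lambda_c * F.cross_entropy(logits_c, y_c) \
             + lambda_p * F.cross_entropy(logits_b, y_b)

        optimizer.zero_grad()        # optimizer holds only the prompt context and delta
        loss.backward()
        optimizer.step()

        with torch.no_grad():        # keep the trigger within the L-inf perturbation budget
            delta.clamp_(-eps, eps)
        return loss.item()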


Contributions

  1. First Demonstration of Med-FM Vulnerability to Backdoor Attack: Demonstrated for the first time that Medical Foundation Models (Med-FMs) are vulnerable to backdoor attacks during the prompt learning phase, challenging the belief that minimal data and learnable parameters offer natural protection.
  2. Introduction of BAPLe: Proposed a novel backdoor attack method called BAPLe, which introduces an imperceptible noise trigger and a small set of learnable prompts into the Med-FM's vision and language input spaces, respectively. It efficiently embeds the backdoor trigger while keeping the Med-FM's backbone frozen, thus removing the need for large datasets or significant computational resources.
  3. Extensive Experimental Validation and High Success Rate: Conducted extensive experiments on four Med-FMs across six downstream datasets, demonstrating the efficacy of BAPLe. The method achieves a high backdoor success rate across all models and datasets, outperforming existing backdoor attack methods.
  4. Minimal Data Requirement and Computationally Efficient: BAPLe requires only a minimal subset of data to adjust the learnable text prompts and trigger noise for downstream tasks, enabling the creation of an effective backdoor attack. This makes BAPLe a computationally efficient method for embedding backdoors in Med-FMs.




Results

Our baselines are BadNets, WaNet, and FIBA, with FIBA being specifically tailored for medical images. We evaluate two variants of each method: one performs few-shot Fine-Tuning of the Med-FM with the attack, and the other integrates the baseline's backdoor trigger function into a few-shot Prompt-Learning approach. We use a 32-shot setting for both variants, selecting 32 random samples per class. We use a batch size of \(16\) and a learning rate of \(5\times 10^{-5}\) for full fine-tuning and \(0.02\) for the prompting methods. We use a \(5\%\) poison rate, equating to, for example, 8 samples out of 288 across 9 classes in the Kather dataset's 32-shot setting. We use \(\epsilon=8/255\) for the learnable noise and set the backdoor patch size to \(24 \times 24\), positioning it in the bottom-left corner. We perform experiments with each class as a target and report the average performance across all classes. We evaluate the backdoor attack on four models (MedCLIP, BioMedCLIP, PLIP, and QuiltNet) across three X-ray datasets (COVID, RSNA18, MIMIC) and three histopathology datasets (Kather, PanNuke, DigestPath). Results can be found in Table 1 and Table 2.
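For reference, the experimental settings above can be collected into a single configuration. The dictionary below is a hypothetical summary with names of our own choosing, not taken from the released code.

    # Hypothetical summary of the attack configuration described above.
    attack_config = {
        "shots_per_class": 32,        # few-shot budget for both FT and PL variants
        "batch_size": 16,
        "lr_finetune": 5e-5,          # learning rate for full fine-tuning baselines
        "lr_prompt": 0.02,            # learning rate for prompt-learning methods
        "poison_rate": 0.05,          # fraction of the few-shot training set poisoned
        "epsilon": 8 / 255,           # L-inf budget for the learnable trigger noise
        "patch_size": (24, 24),       # backdoor patch dimensions (pixels)
        "patch_position": "bottom-left",
    }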

Evaluation Metrics: We use Clean Accuracy (CA) and Backdoor Accuracy (BA). CA measures the victim model's accuracy on the clean test set, while BA measures the proportion of backdoored test samples classified as the target label by the victim model. We also report the accuracy of clean models trained on clean data without any poisoned samples, denoted CleanFT and CleanPL.
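A minimal sketch of these two metrics, assuming hypothetical predict and poison callables (a classifier returning a class index, and a function applying the learned trigger):

    def clean_and_backdoor_accuracy(predict, poison, test_set, target_label):
        clean_hits, backdoor_hits = 0, 0
        for x, y in test_set:
            clean_hits += int(predict(x) == y)                        # CA: correct label on clean input
            backdoor_hits += int(predict(poison(x)) == target_label)  # BA: target label on triggered input
        n = len(test_set)
        return clean_hits / n, backdoor_hits / n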




Table 1: Comparison between the proposed backdoor attack method, BAPLe, and the baseline methods (BadNets, WaNet, and FIBA) in terms of clean accuracy (CA) and backdoor accuracy (BA) across two models (MedCLIP, BioMedCLIP) and three X-ray datasets (COVID, RSNA18, MIMIC). The subscript FT denotes that the attack is performed by few-shot Fine-Tuning of the full model, and the subscript PL denotes that the attack is performed with few-shot Prompt-Learning while keeping the model frozen. The number of shots is set to 32 for both categories. BAPLe outperforms all baseline methods in terms of backdoor accuracy (BA) across all datasets and models.

Model →        | MedCLIP                                    | BioMedCLIP
Dataset →      | COVID        | RSNA18       | MIMIC        | COVID        | RSNA18       | MIMIC
Method ↓       | CA     BA    | CA     BA    | CA     BA    | CA     BA    | CA     BA    | CA     BA
CleanFT        | 0.823  -     | 0.525  -     | 0.359  -     | 0.903  -     | 0.470  -     | 0.426  -
BadNetsFT      | 0.817  0.574 | 0.472  0.521 | 0.314  0.765 | 0.915  0.627 | 0.464  0.830 | 0.322  0.945
WaNetFT        | 0.835  0.582 | 0.622  0.421 | 0.241  0.410 | 0.852  0.812 | 0.451  0.653 | 0.419  0.785
FIBAFT         | 0.812  0.566 | 0.485  0.535 | 0.296  0.810 | 0.916  0.638 | 0.345  0.566 | 0.310  0.929
CleanPL        | 0.822  -     | 0.603  -     | 0.585  -     | 0.843  -     | 0.582  -     | 0.351  -
BadNetsPL      | 0.820  0.510 | 0.619  0.373 | 0.559  0.284 | 0.845  0.975 | 0.632  0.942 | 0.373  1.000
WaNetPL        | 0.831  0.470 | 0.612  0.319 | 0.587  0.266 | 0.839  0.599 | 0.587  0.510 | 0.334  0.599
FIBAPL         | 0.820  0.511 | 0.623  0.360 | 0.562  0.292 | 0.856  0.729 | 0.630  0.614 | 0.373  0.722
BAPLe (ours)   | 0.805  0.994 | 0.610  0.965 | 0.472  0.991 | 0.841  1.000 | 0.620  0.998 | 0.368  0.996



Table 2: Comparison between the proposed backdoor attack method, BAPLe, and the baseline methods (BadNets, WaNet, and FIBA) in terms of clean accuracy (CA) and backdoor accuracy (BA) across two models (PLIP, QuiltNet) and three histopathology datasets (Kather, PanNuke, DigestPath). The subscript FT denotes that the attack is performed by few-shot Fine-Tuning of the full model, and the subscript PL denotes that the attack is performed with few-shot Prompt-Learning while keeping the model frozen. The number of shots is set to 32 for both categories. BAPLe outperforms all baseline methods in terms of backdoor accuracy (BA) across all datasets and models.

Model →        | PLIP                                       | QuiltNet
Dataset →      | Kather       | PanNuke      | DigestPath   | Kather       | PanNuke      | DigestPath
Method ↓       | CA     BA    | CA     BA    | CA     BA    | CA     BA    | CA     BA    | CA     BA
CleanFT        | 0.939  -     | 0.845  -     | 0.887  -     | 0.936  -     | 0.866  -     | 0.872  -
BadNetsFT      | 0.935  0.893 | 0.850  0.682 | 0.891  0.778 | 0.938  0.839 | 0.860  0.638 | 0.878  0.688
WaNetFT        | 0.916  0.394 | 0.859  0.663 | 0.881  0.554 | 0.929  0.333 | 0.840  0.567 | 0.917  0.550
FIBAFT         | 0.903  0.367 | 0.581  0.717 | 0.673  0.685 | 0.917  0.404 | 0.548  0.743 | 0.735  0.655
CleanPL        | 0.908  -     | 0.811  -     | 0.920  -     | 0.899  -     | 0.829  -     | 0.906  -
BadNetsPL      | 0.903  0.601 | 0.799  0.748 | 0.922  0.623 | 0.898  0.151 | 0.699  0.757 | 0.874  0.518
WaNetPL        | 0.910  0.243 | 0.851  0.591 | 0.924  0.405 | 0.926  0.185 | 0.834  0.427 | 0.915  0.492
FIBAPL         | 0.901  0.303 | 0.795  0.615 | 0.921  0.553 | 0.897  0.174 | 0.711  0.597 | 0.862  0.547
BAPLe (ours)   | 0.916  0.987 | 0.820  0.952 | 0.904  0.966 | 0.908  0.904 | 0.824  0.918 | 0.897  0.948



Visualization of Trigger Noise

Visualization of the trigger noise \((\delta)\) learned via the BAPLe backdoor attack on four models and six datasets. The trigger noise is added to the input image to activate the backdoor, causing the model to predict the target label \(\eta(y)\) regardless of the image's original class \((y)\). When the trigger noise is absent, the models behave normally.




Conclusion

In this study, we show for the first time that medical foundation models are vulnerable to backdoor attacks, even when data is scarce. We introduce a new method for crafting backdoor attacks on these models by utilizing prompt learning. Thorough evaluation across four widely accessible medical foundation models and six downstream datasets confirms the success of our method. Furthermore, the approach is computationally efficient and does not rely on extensive medical datasets. Our work highlights the vulnerability of Med-FMs to backdoor attacks and strives to promote their safe adoption before deployment in real-world applications.


For additional details about BAPLe, the datasets, and the results, please refer to our main paper and GitHub code repository. Thank you!



Contact

For any query related to our work, contact asif dot hanif at mbzuai dot ac dot ae



BibTeX

  
    @article{hanif2024baple,
      title={BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning},
      author={Hanif, Asif and Shamshad, Fahad and Awais, Muhammad and Naseer, Muzammal and Khan, Fahad Shahbaz and Nandakumar, Karthik and Khan, Salman and Anwer, Rao Muhammad},
      journal={arXiv preprint arXiv:2408.07440},
      year={2024}}