Medical foundation models are gaining prominence in the medical community for their ability to derive general representations from extensive collections of medical image-text pairs. Recent research indicates that these models are susceptible to backdoor attacks, which cause them to classify clean images accurately but fail when specific triggers are introduced. However, traditional backdoor attacks necessitate a considerable amount of additional data to maliciously pre-train a model. This requirement is often impractical in medical imaging applications due to the usual scarcity of data. Inspired by the latest developments in learnable prompts, this work introduces a method to embed a backdoor into the medical foundation model during the prompt learning phase. By incorporating learnable prompts within the text encoder and introducing an imperceptible learnable noise trigger into the input images, we exploit the full capabilities of medical foundation models (Med-FMs). Our method, BAPLe, requires only a minimal subset of data to adjust the noise trigger and the text prompts for downstream tasks, enabling the creation of an effective backdoor attack. Through extensive experiments with four medical foundation models, each pre-trained on different modalities and evaluated across six downstream datasets, we demonstrate the efficacy of our approach. BAPLe achieves a high backdoor success rate across all models and datasets, outperforming the baseline backdoor attack methods.
Our work highlights the vulnerability of Med-FMs to backdoor attacks and strives to promote the safe adoption of Med-FMs before their deployment in real-world applications. We believe that our work will help the medical community understand the potential risks associated with deploying Med-FMs and encourage the development of robust and secure models.
Overview of BAPLe: BAPLe is a novel backdoor attack method that embeds a backdoor into medical foundation models (Med-FMs) during the prompt learning phase. Conventional backdoor attacks embed a trigger while training a model from scratch or fine-tuning it; BAPLe instead operates during the prompt learning stage, making it computationally efficient. BAPLe exploits the multimodal nature of Med-FMs by integrating learnable prompts within the text encoder alongside an imperceptible noise trigger in the input images, adapting both input spaces (vision and language) to embed the backdoor. After the prompt learning stage, the model behaves normally on clean images (without the imperceptible noise \(\delta\)) but outputs the target label \(\eta(y)\) when given a poisoned image \((\mathrm{x} + \delta)\). BAPLe requires only a minimal subset of data to adjust the noise trigger and the text prompts for downstream tasks, enabling the creation of an effective backdoor attack.
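The sketch below illustrates this idea in PyTorch: the frozen CLIP-style Med-FM is stood in by simple linear encoders, and only the text-prompt context and the noise trigger \(\delta\) are optimized with a combined clean and backdoor objective. The encoder stand-ins, variable names, context length, and class-token handling are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of BAPLe's prompt-learning backdoor objective (assumptions noted inline).
import torch
import torch.nn as nn
import torch.nn.functional as F

D, C, EPS = 512, 9, 8 / 255                                          # embed dim, #classes, trigger budget
image_encoder = nn.Linear(3 * 224 * 224, D).requires_grad_(False)    # frozen vision encoder stand-in
text_encoder  = nn.Linear(16 * D, D).requires_grad_(False)           # frozen text encoder stand-in
class_tokens  = torch.randn(C, 8, D)                                 # fixed class-name token embeddings (assumed)

ctx   = nn.Parameter(torch.randn(8, D) * 0.02)                       # learnable text-prompt context
delta = nn.Parameter(torch.zeros(3, 224, 224))                       # learnable imperceptible noise trigger
optimizer = torch.optim.Adam([ctx, delta], lr=0.02)

def class_features():
    # Prepend the shared learnable context to every class's token embeddings.
    prompts = torch.cat([ctx.expand(C, -1, -1), class_tokens], dim=1)    # (C, 16, D)
    return F.normalize(text_encoder(prompts.flatten(1)), dim=-1)

def logits(images):
    feats = F.normalize(image_encoder(images.flatten(1)), dim=-1)
    return 100.0 * feats @ class_features().t()                      # cosine-similarity logits

def baple_step(x, y, target):
    # Clean loss preserves normal behaviour; backdoor loss maps x + delta to the target label.
    x_poison = (x + delta.clamp(-EPS, EPS)).clamp(0, 1)
    loss = F.cross_entropy(logits(x), y) + \
           F.cross_entropy(logits(x_poison), torch.full_like(y, target))
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    with torch.no_grad():
        delta.clamp_(-EPS, EPS)                                      # keep the trigger within the epsilon budget
    return loss.item()

# Toy usage: one optimization step on a random batch with target class 0.
x, y = torch.rand(16, 3, 224, 224), torch.randint(0, C, (16,))
print(baple_step(x, y, target=0))
```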
Our baselines are BadNets, WaNet, and FIBA, with FIBA being specifically tailored for medical images. We evaluate two variants of each method: one involving few-shot Fine-Tuning of the Med-FM with the attack and another integrating the baseline's backdoor trigger function into a few-shot Prompt-Learning approach. We use a 32-shot setting for both variants, selecting 32 random samples per class. We use a batch size of \(16\) and a learning rate of \(5\times 10^{-5}\) for full fine-tuning and \(0.02\) for the prompting method. We use a \(5\%\) poison rate, equating to, for example, 8 samples out of 288 across 9 classes in the Kather dataset's 32-shot setting. We use \(\epsilon=8/255\) for the learnable noise and set the backdoor patch size to \(24 \times 24\), positioning it in the bottom-left corner. We perform experiments with each class as a target and report the average performance across all classes. We evaluate the backdoor attack on four models (MedCLIP, BioMedCLIP, PLIP, and QuiltNet) across three X-ray datasets (COVID, RSNA18, MIMIC) and three histopathology datasets (Kather, PanNuke, DigestPath). Results can be found in Table 1 and Table 2.
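As an illustration of this poisoning setup, the sketch below pastes a 24×24 patch in the bottom-left corner of a fraction of a 32-shot training set and relabels those samples to the target class (a BadNets-style baseline trigger). The white-square trigger, tensor layout, and helper name are assumptions for illustration, not the exact pipeline.

```python
# Hedged sketch of few-shot data poisoning with a corner-patch trigger.
import random
import torch

def poison_fewshot(images, labels, target, poison_rate=0.05, patch=24):
    """images: (N, 3, H, W) in [0, 1]; labels: (N,) int64."""
    images, labels = images.clone(), labels.clone()
    n_poison = max(1, int(poison_rate * len(images)))
    idx = random.sample(range(len(images)), n_poison)
    trigger = torch.ones(3, patch, patch)           # assumed white-square trigger
    for i in idx:
        images[i, :, -patch:, :patch] = trigger     # bottom-left corner of the image
        labels[i] = target                          # flip the label to the target class
    return images, labels

# Toy usage on a random 288-sample (9 classes x 32 shots) set, target class 0.
x, y = torch.rand(288, 3, 224, 224), torch.randint(0, 9, (288,))
x_p, y_p = poison_fewshot(x, y, target=0)
print((y_p == 0).sum().item(), "samples now carry the target label")
```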
Evaluation Metrics: We use Clean Accuracy (CA) and Backdoor Accuracy (BA). CA measures the victim model's accuracy on a clean test dataset, while BA is the proportion of backdoored test samples classified as the target label by the victim model. We also report the accuracy of the clean model trained without poisoned samples, shown in the tables as CleanFT (fine-tuning) and CleanPL (prompt learning).
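A minimal sketch of these two metrics, assuming `model` returns class logits and `apply_trigger` inserts the backdoor trigger (both are stand-ins here, not the actual evaluation code):

```python
import torch

@torch.no_grad()
def clean_accuracy(model, images, labels):
    # CA: fraction of clean test images classified correctly.
    preds = model(images).argmax(dim=-1)
    return (preds == labels).float().mean().item()

@torch.no_grad()
def backdoor_accuracy(model, images, target, apply_trigger):
    # BA: fraction of triggered test images classified as the attacker's target label.
    preds = model(apply_trigger(images)).argmax(dim=-1)
    return (preds == target).float().mean().item()

# Toy usage with stand-in model and trigger function.
toy_model   = lambda x: torch.randn(x.shape[0], 9)   # random-logit stand-in
toy_trigger = lambda x: x                            # identity stand-in for trigger insertion
imgs, lbls  = torch.rand(32, 3, 224, 224), torch.randint(0, 9, (32,))
print(clean_accuracy(toy_model, imgs, lbls), backdoor_accuracy(toy_model, imgs, 0, toy_trigger))
```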
Table 1: Comparison between the proposed backdoor attack method, BAPLe, and various baseline methods in terms of clean accuracy (CA) and backdoor accuracy (BA) across two models (MedCLIP, BioMedCLIP) and three X-ray datasets (COVID, RSNA18, MIMIC). The baseline methods include BadNets, WaNet, and FIBA. The subscript FT denotes that the attack is performed by few-shot Fine-Tuning of the full model, and the subscript PL denotes that the attack is performed with few-shot Prompt-Learning while keeping the model frozen. For both categories, the number of shots is set to 32. BAPLe outperforms all baseline methods in terms of backdoor accuracy (BA) across all datasets and models.
Model → | MedCLIP | | | | | | BioMedCLIP | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Dataset → | COVID | | RSNA18 | | MIMIC | | COVID | | RSNA18 | | MIMIC | |
Method ↓ | CA | BA | CA | BA | CA | BA | CA | BA | CA | BA | CA | BA |
CleanFT | 0.823 | - | 0.525 | - | 0.359 | - | 0.903 | - | 0.470 | - | 0.426 | - |
BadNetsFT | 0.817 | 0.574 | 0.472 | 0.521 | 0.314 | 0.765 | 0.915 | 0.627 | 0.464 | 0.830 | 0.322 | 0.945 |
WaNetFT | 0.835 | 0.582 | 0.622 | 0.421 | 0.241 | 0.410 | 0.852 | 0.812 | 0.451 | 0.653 | 0.419 | 0.785 |
FIBAFT | 0.812 | 0.566 | 0.485 | 0.535 | 0.296 | 0.810 | 0.916 | 0.638 | 0.345 | 0.566 | 0.310 | 0.929 |
CleanPL | 0.822 | - | 0.603 | - | 0.585 | - | 0.843 | - | 0.582 | - | 0.351 | - |
BadNetsPL | 0.820 | 0.510 | 0.619 | 0.373 | 0.559 | 0.284 | 0.845 | 0.975 | 0.632 | 0.942 | 0.373 | 1.000 |
WaNetPL | 0.831 | 0.470 | 0.612 | 0.319 | 0.587 | 0.266 | 0.839 | 0.599 | 0.587 | 0.510 | 0.334 | 0.599 |
FIBAPL | 0.820 | 0.511 | 0.623 | 0.360 | 0.562 | 0.292 | 0.856 | 0.729 | 0.630 | 0.614 | 0.373 | 0.722 |
BAPLe(ours) | 0.805 | 0.994 | 0.610 | 0.965 | 0.472 | 0.991 | 0.841 | 1.000 | 0.620 | 0.998 | 0.368 | 0.996 |
Table 2: Comparison between the proposed backdoor attack method, BAPLe, and various baseline methods in terms of clean accuracy (CA) and backdoor accuracy (BA) across two models (PLIP, QuiltNet) and three histopathology datasets (Kather, PanNuke, DigestPath). The baseline methods include BadNets, WaNet, and FIBA. The subscript FT denotes that the attack is performed by few-shot Fine-Tuning of the full model, and the subscript PL denotes that the attack is performed with few-shot Prompt-Learning while keeping the model frozen. For both categories, the number of shots is set to 32. BAPLe outperforms all baseline methods in terms of backdoor accuracy (BA) across all datasets and models.
Model → | PLIP | | | | | | QuiltNet | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Dataset → | Kather | | PanNuke | | DigestPath | | Kather | | PanNuke | | DigestPath | |
Method ↓ | CA | BA | CA | BA | CA | BA | CA | BA | CA | BA | CA | BA |
CleanFT | 0.939 | - | 0.845 | - | 0.887 | - | 0.936 | - | 0.866 | - | 0.872 | - |
BadNetsFT | 0.935 | 0.893 | 0.850 | 0.682 | 0.891 | 0.778 | 0.938 | 0.839 | 0.860 | 0.638 | 0.878 | 0.688 |
WaNetFT | 0.916 | 0.394 | 0.859 | 0.663 | 0.881 | 0.554 | 0.929 | 0.333 | 0.840 | 0.567 | 0.917 | 0.550 |
FIBAFT | 0.903 | 0.367 | 0.581 | 0.717 | 0.673 | 0.685 | 0.917 | 0.404 | 0.548 | 0.743 | 0.735 | 0.655 |
CleanPL | 0.908 | - | 0.811 | - | 0.920 | - | 0.899 | - | 0.829 | - | 0.906 | - |
BadNetsPL | 0.903 | 0.601 | 0.799 | 0.748 | 0.922 | 0.623 | 0.898 | 0.151 | 0.699 | 0.757 | 0.874 | 0.518 |
WaNetPL | 0.910 | 0.243 | 0.851 | 0.591 | 0.924 | 0.405 | 0.926 | 0.185 | 0.834 | 0.427 | 0.915 | 0.492 |
FIBAPL | 0.901 | 0.303 | 0.795 | 0.615 | 0.921 | 0.553 | 0.897 | 0.174 | 0.711 | 0.597 | 0.862 | 0.547 |
BAPLe(ours) | 0.916 | 0.987 | 0.820 | 0.952 | 0.904 | 0.966 | 0.908 | 0.904 | 0.824 | 0.918 | 0.897 | 0.948 |
Trigger Noise Visualization: Trigger noise \((\delta)\) learned via the BAPLe backdoor attack on four models and six datasets. The trigger noise is added to the input image to activate the backdoor, causing the model to predict the target label \(\eta(y)\) regardless of the image's original class \((y)\). When the trigger noise is absent, the models behave normally.
In this study, we show for the first time that medical foundation models are vulnerable to backdoor attacks, even when data is scarce. We introduce a new method for crafting backdoor attacks on these models by utilizing prompt learning. A thorough evaluation across four widely accessible medical foundation models and six downstream datasets confirms the effectiveness of our method. Furthermore, the approach is computationally efficient and does not rely on extensive medical datasets. Our work highlights the vulnerability of Med-FMs to backdoor attacks and strives to promote their safe adoption before deployment in real-world applications.
For additional details about BAPLe, the datasets, and results, please refer to our main paper and the GitHub code repository. Thank you!
For any query related to our work, contact asif dot hanif at mbzuai dot ac dot ae
@article{hanif2024baple,
title={BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning},
author={Hanif, Asif and Shamshad, Fahad and Awais, Muhammad and Naseer, Muzammal and Khan, Fahad Shahbaz and Nandakumar, Karthik and Khan, Salman and Anwer, Rao Muhammad},
journal={arXiv preprint arXiv:2408.07440},
year={2024}}