Towards a fundamental understanding of poison attacks on generative diffusion models

Abstract

Diffusion-based text-to-image generative models have taken the Internet by storm, growing from research projects to numerous applications in advertising, education, fashion, healthcare, web design, and AI art. Thus far, most assume that these models are robust to poisoning attacks during model training, which manipulate training data to induce unexpected behavior in the model. The common assumption is that, since these generative models are trained on billions of data samples, the vast number of poison samples required makes poisoning attacks on these models infeasible. Recent experimental work shows that this assumption is false, and that diffusion models are in fact vulnerable to carefully crafted clean-label poisoning attacks using just a small fraction of the expected poison sample size. These observations are both intriguing and unexpected, and raise critical questions about the inherent robustness of large-scale, diffusion-based generative models. However, these observations have all been empirical, and there is a lack of formal understanding of whether and why large-scale generative models can be poisoned so "easily." This proposal seeks to explore and answer these fundamental questions on the impact and limitations of poisoning attacks on diffusion models. Our proposed research includes three thrusts: (1) developing an analytical framework to model the relationship between poisoned training data and the performance of diffusion models trained on them; (2) leveraging the analytical framework to explore and understand the limitations of poisoning attacks; (3) exploring and understanding potential variants of current poisoning attacks.

Document Details

Document Type: DoD Grant Award
Publication Date: Nov 09, 2024
Source ID: N000142412669

Entities

People

Yanbin Zhao

Organizations

Office of Naval Research
United States Navy
University of Chicago
