Research on Efficient Safety Guidance and Fairness Metric for Text-to-Image Generative Models
Abstract
Rapid advances in multimodal generative AI models, particularly text-to-image (T2I) diffusion models (DMs), have enabled the creation of high-quality, realistic images from textual inputs and have led to a surge of applications built on such models. However, the large-scale training datasets used for these models are often unfiltered and contain unsafe and harmful content. This introduces significant risks, as the models may generate unexpected, harmful, or disturbing images. To mitigate these risks, various defense mechanisms have been proposed, such as detection-based approaches. Another notable method is Safe Latent Diffusion (SLD), which employs classifier-free guidance during inference to adjust the estimated noise, steering the generated image away from unsafe concepts. SLD offers multiple safety guidance configurations depending on the desired level of safety. Despite its effectiveness, we have identified several limitations in SLD; for example, a given safeguard level does not apply uniformly across different prompts. We will focus on two main research topics regarding the safety and fairness of (diffusion-based) T2I generative AI models: (a) developing inference-time safety guidance methods and (b) developing a novel, flexible metric to measure the fairness of T2I models.
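As a rough illustration of the inference-time guidance idea mentioned in the abstract, the sketch below combines three noise estimates (unconditional, prompt-conditioned, and unsafe-concept-conditioned) into one guided estimate. It is a deliberately simplified stand-in, not the SLD formulation or the method proposed in this award: the function name `guided_noise`, the fixed `safety_scale`, and the use of plain Python lists in place of latent tensors are all assumptions made for illustration.

```python
from typing import List


def guided_noise(
    eps_uncond: List[float],
    eps_prompt: List[float],
    eps_unsafe: List[float],
    guidance_scale: float = 7.5,
    safety_scale: float = 0.0,
) -> List[float]:
    """Simplified safety-aware classifier-free guidance (illustrative only).

    eps_uncond: noise estimate with no conditioning
    eps_prompt: noise estimate conditioned on the user prompt
    eps_unsafe: noise estimate conditioned on an unsafe-concept description
    safety_scale: hypothetical fixed strength of the push away from the
                  unsafe concept; 0.0 recovers plain classifier-free guidance
    """
    guided = []
    for u, p, s in zip(eps_uncond, eps_prompt, eps_unsafe):
        prompt_dir = p - u   # direction toward the prompt
        unsafe_dir = s - u   # direction toward the unsafe concept
        # Move toward the prompt while subtracting a scaled unsafe direction.
        guided.append(u + guidance_scale * (prompt_dir - safety_scale * unsafe_dir))
    return guided


if __name__ == "__main__":
    # Toy one-element "noise tensors" chosen only to make the arithmetic visible.
    print(guided_noise([0.0], [1.0], [2.0], guidance_scale=7.5, safety_scale=0.0))
    print(guided_noise([0.0], [1.0], [2.0], guidance_scale=7.5, safety_scale=0.5))
```

Because `safety_scale` here is a single fixed number applied to every prompt, the sketch also hints at the limitation noted above: one global safeguard setting need not affect all prompts equally.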
Document Details
- Document Type: DoD Grant Award
- Publication Date: Feb 06, 2025
- Source ID: FA23862514015
Entities
People
- Taesup Moon
Organizations
- Air Force Office of Scientific Research
- Seoul National University
- United States Air Force