Adversarial Attacks in Vision-Language Models

Under the guidance of Professor Zsolt Kira, I led a project exploring adversarial attacks on Vision-Language Models (VLMs). Using the CLIP text encoder, I extracted harmful concepts such as nudity and violence, injected them into the latent space, and generated optimized adversarial prompts with both a gradient-based method (PEZ) and a non-gradient-based method (a genetic algorithm). These optimized prompts bypassed the safety mechanisms of text-to-image (T2I) models such as Stable Diffusion and Flex, enabling the generation of inappropriate images.
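
As a rough illustration of the non-gradient-based search, the sketch below evolves candidate prompts with a simple genetic algorithm and scores them by the cosine similarity of their CLIP text features to a target concept embedding. The vocabulary, hyperparameters, and the benign placeholder target are illustrative assumptions rather than the project's actual settings; the gradient-based PEZ variant instead updates continuous token embeddings by gradient descent and projects them back to the nearest vocabulary tokens.

```python
import random

import torch
from transformers import CLIPModel, CLIPTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device).eval()
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

# Toy word pool for candidate prompts (placeholder, not the project's vocabulary).
VOCAB = ["castle", "stone", "medieval", "tower", "fortress", "sky",
         "painting", "photo", "ancient", "bridge", "river", "cloud"]


@torch.no_grad()
def text_features(prompts):
    """L2-normalized CLIP text features for a list of prompts."""
    inputs = tokenizer(prompts, padding=True, return_tensors="pt").to(device)
    feats = model.get_text_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)


# Benign placeholder standing in for the extracted target-concept embedding.
target = text_features(["a photograph of a medieval castle"])


def fitness(population):
    """Cosine similarity between each candidate prompt and the target concept."""
    prompts = [" ".join(genes) for genes in population]
    return (text_features(prompts) @ target.T).squeeze(-1)


def mutate(genes, rate=0.3):
    return [random.choice(VOCAB) if random.random() < rate else g for g in genes]


def crossover(a, b):
    cut = random.randint(1, len(a) - 1)
    return a[:cut] + b[cut:]


# Evolve a population of 6-token prompts toward the target concept.
population = [[random.choice(VOCAB) for _ in range(6)] for _ in range(32)]
for generation in range(50):
    order = fitness(population).argsort(descending=True).tolist()
    parents = [population[i] for i in order[:8]]  # elitist selection
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(len(population) - len(parents))]
    population = parents + children

best = population[int(fitness(population).argmax())]
print("best prompt:", " ".join(best))
```

The surviving prompt can then be passed to a T2I pipeline; in the project, the target embedding comes from the extracted concept rather than a plain text prompt.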

Building on this work, I am now extending these attacks to other modalities, such as audio and depth, using ImageBind. This line of research aims to uncover vulnerabilities across multimodal systems, deepening our understanding of adversarial techniques and supporting my commitment to advancing trustworthy AI.
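
The sketch below is a minimal example of the kind of cross-modal probing this involves, assuming the facebookresearch/ImageBind reference package: it embeds text probes and audio clips into ImageBind's shared space and compares them by cosine similarity. The file paths and probe strings are placeholders, and depth or other modalities follow the same pattern.

```python
import torch
from imagebind import data
from imagebind.models import imagebind_model
from imagebind.models.imagebind_model import ModalityType

device = "cuda" if torch.cuda.is_available() else "cpu"
model = imagebind_model.imagebind_huge(pretrained=True).to(device).eval()

text_probes = ["a dog barking", "glass shattering"]
audio_paths = ["clips/sample_1.wav", "clips/sample_2.wav"]  # placeholder files

inputs = {
    ModalityType.TEXT: data.load_and_transform_text(text_probes, device),
    ModalityType.AUDIO: data.load_and_transform_audio_data(audio_paths, device),
}

with torch.no_grad():
    embeddings = model(inputs)

# Cosine similarity between text-concept embeddings and audio embeddings;
# the same comparison extends to depth and other ImageBind modalities.
text_emb = torch.nn.functional.normalize(embeddings[ModalityType.TEXT], dim=-1)
audio_emb = torch.nn.functional.normalize(embeddings[ModalityType.AUDIO], dim=-1)
print(text_emb @ audio_emb.T)
```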

This project has strengthened my expertise in adversarial attacks, prompt optimization, and multi-modal vulnerabilities.

[Figure: outcome image]

GitHub Repo



