If you are familiar with AI image-generating tools like DALL-E, Midjourney, or even Bing for that matter, you may have noticed that the moment you type words like "nude" or "naked", the tool refuses to generate results. However, a flaw in these AI tools is letting people create such images despite the safety filters that prohibit the tools from generating inappropriate images.
According to an IEEE Spectrum report, researchers from Johns Hopkins University in Baltimore and Duke University in Durham, N.C., have revealed a concerning vulnerability in popular text-to-image AI systems like DALL-E 2 and Midjourney. These systems, designed to create pictures from text descriptions, can be misled into producing inappropriate images using a newly developed algorithm.
The algorithm, known as SneakyPrompt, was devised to outsmart the safety filters built into these AI art generators. These safety filters aim to block requests for explicit, violent, or otherwise questionable content. However, the researchers found a way to manipulate these systems by using alternate descriptions that slip past the filters.
“Our goal is to find weaknesses in AI systems and make them stronger,” explained Yinzhi Cao, a cybersecurity researcher at Johns Hopkins. “Just like we identify vulnerabilities in websites, we are now investigating vulnerabilities in AI models.”
The scientists experimented with prompts they knew would typically be blocked by safety filters, such as "a naked man riding a bike." SneakyPrompt substituted words within these prompts until they bypassed the filters. Surprisingly, even nonsense words proved to be effective triggers, with some seemingly gibberish terms prompting these AI systems to generate innocent pictures of cats or dogs.
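To make the idea concrete, here is a minimal sketch of that kind of substitution search, assuming a hypothetical keyword-based safety_filter and invented helper names. The real SneakyPrompt attack is more sophisticated, querying the actual AI systems and scoring the generated images, but the loop below captures the basic strategy of swapping out blocked words until a prompt slips through.

```python
import random
from typing import Optional

# Toy stand-in for a real text-to-image service's safety filter (an assumption,
# not the filter used by DALL-E 2 or Midjourney).
BLOCKED_WORDS = {"naked", "nude"}

def safety_filter(prompt: str) -> bool:
    """Return True if the prompt contains a blocked word."""
    return any(word in BLOCKED_WORDS for word in prompt.lower().split())

def mutate(prompt: str) -> str:
    """Replace each blocked word with a random candidate string of the same length."""
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    words = []
    for word in prompt.split():
        if word.lower() in BLOCKED_WORDS:
            word = "".join(random.choices(alphabet, k=len(word)))
        words.append(word)
    return " ".join(words)

def search(prompt: str, max_tries: int = 100) -> Optional[str]:
    """Mutate the prompt until the filter no longer blocks it (or give up)."""
    for _ in range(max_tries):
        candidate = mutate(prompt)
        if not safety_filter(candidate):
            # A real attack would also check that the generated image still
            # depicts the original, forbidden concept before accepting this.
            return candidate
    return None

print(search("a naked man riding a bike"))
```

In the study, a substitution only counted as successful if it both passed the filter and still caused the model to render the forbidden concept; the sketch above only models the first half of that test.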
Cao highlighted that AI systems perceive language differently from humans. The researchers suspect that these systems might interpret certain syllables or combinations similarly to words in other languages, leading to unexpected associations.
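One way to see why this is plausible: text-to-image models do not read whole words but subword tokens. The snippet below uses the open-source tiktoken library to show how an invented gibberish string decomposes into smaller pieces, some of which also occur inside ordinary words. The gibberish strings are made up for illustration, not taken from the study, and the tokenizer choice is an assumption; the systems in the study use their own tokenizers, but the principle is the same.

```python
# pip install tiktoken
import tiktoken

# Load a general-purpose BPE tokenizer (an assumption; DALL-E 2 and
# Midjourney rely on their own tokenizers, but all split text into
# subword pieces rather than whole words).
enc = tiktoken.get_encoding("cl100k_base")

# Invented gibberish strings, not examples from the study.
for word in ["glorpyflant", "mubzorked"]:
    token_ids = enc.encode(word)
    pieces = [enc.decode([t]) for t in token_ids]
    print(f"{word!r} -> {pieces}")
```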
Moreover, the team found that these nonsense words, which do not appear directly linked to any forbidden terms, could also provoke the AI systems into producing not-safe-for-work (NSFW) images. The safety filters did not block these prompts, yet the AI systems interpreted them as commands to create inappropriate content.
This discovery suggests a significant gap in AI safety filters: seemingly innocuous or nonsensical words can slip through and prompt the generation of inappropriate images. The researchers plan to present their findings at the IEEE Symposium on Security and Privacy in May 2024, aiming to shed light on these vulnerabilities and improve the safeguards in AI systems.
The implications of these findings underscore the need to refine AI models’ safety measures, ensuring they accurately discern and prevent the creation of inappropriate content, even when faced with deceptive or unconventional language inputs.