When we talk about how to engineer image prompts, we usually boil it down to three major guidelines: be concise, be descriptive, and avoid leading language.
I’ve gotten questions about that third item before. The second guideline warns against being overly vague. The third targets phrases that unintentionally specify the wrong output. With text, this would be like asking ChatGPT to “write a blog post about trainers” and getting a paragraph about gym instructors instead of gym shoes. But what does it look like for image generation?
Here’s an example from when I was trying to generate generic images of a stern news anchor giving a report. The prompt I used was “an image of a white fox news anchor sitting at his desk, staring at the camera,” and Stable Diffusion got a little distracted by the word “fox.”
Most people would have read my first prompt and understood what I wanted without any context. AI models, however, lack the cultural insight to understand that I probably mean “fox news – anchor” rather than “fox – news anchor.” When we prompt generative AI, we must acknowledge that, while these models may seem highly intelligent, they lack common sense: 9 times out of 10, they will follow the letter of the law over the spirit of the law.
The results may be cute, but they’re clearly not what I was looking for. So how did I get my (human) news anchor? I’m too stubborn to change “fox” to “NBC” or “CBS,” so the first thing I tried was capitalizing FOX, and I got even more cute foxes sitting with microphones in front of TV walls. Even removing the word “white” didn’t reliably resolve the issue (and it created a fox that’s a little uncanny, but I can’t help but love the tie).
In the end, it helped to think about it like PEMDAS, the order of operations: how the words group together changes what the model sees. Reordering the words got me entirely different results, but what finally worked was recontextualizing the prompt as a whole, using “news anchor from FOX news” instead. After a little pushing and shoving, I finally got a stoic human man in a red tie, staring into the video camera, presenting the evening news.
– Annika McTamaney