We Thought It Was the White Bear Effect

When glasses kept appearing despite being listed in the negative prompt, our first hypothesis was the “White Bear Effect” — the psychological phenomenon where trying not to think about something makes you think about it more. We suspected that writing glasses in the negative prompt was ironically making the model fixate on glasses.

After retesting, the real answer turned out to be much simpler. It has nothing to do with the White Bear Effect. For this model, glasses are strictly binary: either the character wears them, or they don’t. There is no in-between.


Testing: Glasses End Up on the Face No Matter What

While generating characters with an Illustrious-family model, we encountered a problem: unwanted glasses appeared in 100% of generations. We ran several tests to isolate the cause.

Test 1: Commenting Out Wasn’t Enough

We commented out the silver-rimmed glasses line in the positive prompt:

#silver-rimmed glasses

The glasses persisted. Other references to glasses remained elsewhere in the prompt, and that was enough to trigger the behavior. Commenting out a single line doesn’t help if the token survives in another part of the prompt.

Test 2: Glasses Held in Hand

We tried describing glasses as a handheld object rather than something worn on the face:

holding silver-rimmed glasses in hand

The model ignored the “holding in hand” part entirely. The character was generated wearing the glasses on their face.

Test 3: Glasses on a Desk

We tried placing glasses on a desk as a scene prop:

silver-rimmed glasses on desk

Same result. No glasses appeared on the desk. The character simply wore them.


The Finding: Glasses Are a Wearable Attribute — Nothing Else

Across all three tests, the pattern was consistent. This isn’t a subtle prompting issue or a negative prompt quirk. The explanation is straightforward:

In every test we ran, glasses were something you wear on your face — nothing else.

Illustrious-family models are trained on Danbooru’s tag system. In Danbooru, glasses is a character attribute tag — it describes something worn by the character in the vast majority of tagged images. Use cases like “glasses held in hand” or “glasses sitting on a desk” are virtually nonexistent in the training data.

As a result, whenever the token glasses appeared in our positive prompt, the model defaulted to its dominant learned association: glasses worn on the face. The surrounding context — on desk, in hand, holding — was largely ignored in every case we tested.

"holding glasses in hand"
  → Model interprets: glasses = worn on face
  → "in hand" is mostly ignored

"glasses on desk"
  → Model interprets: glasses = worn on face
  → "on desk" is mostly ignored

"#silver-rimmed glasses" (commented out)
  → This line is disabled
  → But if glasses appears in another line, the same thing happens

Negation Is Also a Problem, But That’s Not the Core Issue

It’s true that writing no glasses causes CLIP to tokenize no and glasses separately, and that Stable Diffusion’s U-Net can’t process negation relationships. So no glasses does effectively summon glasses. That part is real.

However, what our testing revealed is that even in affirmative contexts — “hold these glasses,” “place glasses here” — the model still can’t render glasses as anything other than a worn attribute. This is a training data constraint, not a negative prompt mechanics issue.


Technical Background: CLIP and Danbooru’s Training Bias

To go a bit deeper: Stable Diffusion uses CLIP to convert prompts into vectors. A prompt like holding silver-rimmed glasses in hand gets split into individual tokens by CLIP’s tokenizer.

The issue is that “glasses held in hand” is a low-frequency concept in CLIP’s training data as well. Combined with the Danbooru training data used by Illustrious-family models, the semantic meaning of the glasses token is effectively locked to a single interpretation: worn on the face.

For reference, here’s how Classifier-Free Guidance (CFG) works with negative prompts:

Final output = Unconditional output + CFG_Scale × (Conditional output − Unconditional output)

The negative prompt replaces the unconditional output, steering generation away from the negative direction. It’s guidance, not erasure. If glasses is strongly present in the positive prompt, negative prompt steering can’t overcome it.
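To make the arithmetic concrete, here is a toy numerical sketch of that formula. The values stand in for the U-Net’s noise predictions and the function name is illustrative — this is not code from any real pipeline:

```python
import numpy as np

def cfg_combine(cond, uncond, cfg_scale):
    # Final output = uncond + cfg_scale * (cond - uncond)
    return uncond + cfg_scale * (cond - uncond)

# Toy stand-ins for the two U-Net predictions
cond = np.array([1.0, 0.5])    # conditioned on the positive prompt
uncond = np.array([0.2, 0.1])  # negative-prompt output (replaces unconditional)

out = cfg_combine(cond, uncond, cfg_scale=7.0)
print(out)  # [5.8 2.9]
```

Note that the scale only amplifies the difference between the two predictions. If the positive prompt pulls strongly toward glasses, the negative prompt shifts the direction of guidance but never erases the concept outright.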

But again — the core finding here isn’t about CFG mechanics. It’s that the moment glasses appears in the positive prompt, the model commits to the character wearing them. That happens before negative prompts even enter the equation.


Other “Wearable Attribute Trap” Patterns

Glasses aren’t the only item affected. The same behavior likely applies to other tokens that Danbooru treats as character attributes.

Item      Intended Description                Result
Glasses   Held in hand, placed on desk        Worn on face (tested)
Hat       Placed on shelf, held in hand       Worn on head (untested)
Weapon    Mounted on wall, lying on ground    Held in hand (untested)

These items are all tagged as character attributes in Danbooru. We only tested glasses, but given the same training data structure, hats and weapons likely behave similarly.


Conclusion: Think in Binary — On or Off

The practical takeaway from our testing is this:

Treat glasses as a binary choice: the character either wears them or doesn’t. In every method we tried, placing glasses on a desk or in a character’s hand simply didn’t work. There may be more advanced approaches that can get around this limitation, but they’re beyond what we’ve explored so far.

Here’s what that means in practice:

  • Character should wear glasses → Add glasses to the positive prompt. It will reliably appear on the face.
  • Character should not wear glasses → Remove every glasses-related token from the positive prompt. A light negative (glasses, eyewear at weight 1.0–1.3) can serve as insurance.
  • Glasses needed as a scene prop → Accept that prompting alone won’t achieve this. The realistic option is to generate the image without glasses, then paint them in manually afterward.

That third option is admittedly unsatisfying — it means the prompt can’t do everything on its own, and manual editing requires a level of artistic skill that not everyone has. But trying to force a concept that doesn’t exist in the training data tends to waste more time than it saves.

A Note on Negative Prompts

Negative prompts aren’t useless here. After removing all glasses-related tokens from the positive prompt, adding glasses, eyewear to the negative at a modest weight (1.0–1.3) is a reasonable safety net. Going above 1.5 risks artifacts.
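Put together, a “no glasses” setup might look like this, using the (token:weight) emphasis syntax common in A1111-style UIs. The prompt content here is purely illustrative:

Positive: 1girl, silver hair, school uniform, classroom, sitting at desk
Negative: (glasses:1.2), eyewear

The key is that the positive prompt contains no glasses-related token at all; the weighted negative entry is only a safety net, not the primary mechanism.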

What doesn’t work is piling up negative prompt entries while glasses remains in the positive. No amount of negative engineering can override a strongly present positive token.


Takeaways

Three things to remember:

  1. In our testing, Illustrious-family models treated glasses as a wearable attribute — not a flexible prop. “Held in hand” and “on desk” descriptions were ignored every time.
  2. We believe this is a training data issue, not a negative prompt issue. Danbooru’s tag system defines glasses almost exclusively as something worn on the face.
  3. For now, think binary: on or off. If you need glasses as a scene prop, generate without them and add them manually afterward.

We initially suspected the White Bear Effect, but retesting revealed something simpler. The model can’t generate what it was never trained to understand. It’s an obvious point in hindsight, but easy to overlook when you’re deep in prompt debugging.