Is Negative Prompt Necessary? Unleashing AI’s Creativity!

SDXL #CLIP #Flux.1 #SDXL #T5xxl

2024-12-19 (updated 2025-2-17)

[Image: Anime illustration of a woman with brown hair and blue eyes smiling at us in a snowy town]

Keep CFG Scale low.
Use Negative Prompts sparingly.
Let AI create freely and observe its original illustrations.

Introduction

Hello, this is Easygoing.

Today, let’s explore the concept of Negative Prompts in image generation AI.

Theme: Night Snapshots

The theme for this post is night snapshots.

[Image: Anime illustration of an Asian woman standing in the street with a bunch of snow falling from her arms, looking over her shoulder at the window]

The aim is to capture fleeting expressions glimpsed during a walk through town.

How Does AI Reproduce Prompts?

When generating images with AI, we input prompts while visualizing the desired image.

[Image: Graph showing the relationship between Target Image, Conditioning and Unconditioning]

Red: Target Image
Green: Conditioning (the image generated based on the entered prompt)

The CLIP model associates the input prompt with the actual image, producing what is called the Conditioning image.

When Prompts Are Not Effective Enough

Although CLIP is trained to associate images with captions, its accuracy is not very high.

Often, the effects of the entered prompt are weaker than expected.

Thus, image generation AI requires methods to enhance the prompt's impact.

Enhancing Prompts with CFG Scale

The first tool to enhance prompt effects is the CFG Scale.

[Image: Graph showing the relationship between Target Image, Conditioning, Unconditioning and CFG Scale]

CFG Scale: Classifier-Free Guidance Scale
A scale that strengthens guidance toward the prompt without relying on a separate classifier model.

CFG Scale amplifies the prompt’s effect according to the input value.

Establishing a Baseline is Key!

When using CFG Scale, it is essential to define a proper baseline.

Image generation AI picks up various kinds of noise during training and computation.

[Image: Anime illustration of a boy wearing winter clothes looking off to a window, with snow coming up and buildings in the background, snowy night]

Even without entering a prompt, such noise influences the generated image, meaning the output deviates from the true zero point.

Using Unconditioning as the Baseline

To address this, AI uses an unconditioned baseline, called Unconditioning, generated without any prompts.

[Image: Graph showing the relationship between Target Image, Conditioning, Unconditioning and CFG Scale]

When generating an image, a vector is drawn from Unconditioning to Conditioning, and the CFG Scale amplifies this vector to enhance the prompt’s effect.
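The vector arithmetic described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the two-number lists are invented stand-ins for Unconditioning and Conditioning, and `apply_cfg` is a hypothetical name made up for this example. In actual samplers the same combination is applied to the model's noise predictions at every denoising step rather than to a finished image.

```python
from typing import List

Vector = List[float]


def apply_cfg(uncond: Vector, cond: Vector, cfg_scale: float) -> Vector:
    """Hypothetical helper: start at Unconditioning and move along the
    vector (Conditioning - Unconditioning), stretched by cfg_scale."""
    if len(uncond) != len(cond):
        raise ValueError("uncond and cond must have the same length")
    return [u + (c - u) * cfg_scale for u, c in zip(uncond, cond)]


# Toy 2-D values, invented for illustration only.
uncond = [1.0, 1.0]  # baseline produced without a prompt
cond = [2.0, 1.5]    # point produced from the prompt

for scale in (1.0, 3.0, 7.0):
    print(scale, apply_cfg(uncond, cond, scale))
# 1.0 [2.0, 1.5]
# 3.0 [4.0, 2.5]
# 7.0 [8.0, 4.5]
```

A higher scale pushes the result further along the same line, but it never changes the direction of that line. The direction is fixed by where Unconditioning and Conditioning sit, which is why the baseline matters so much.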

The graph shows that using CFG Scale moves the generated image closer to the target.

Shifting the Baseline with Negative Prompts

This mechanism also led to the invention of Negative Prompts.

Typically, Unconditioning is generated without any prompt input.

[Image: Graph showing the relationship between Target Image, Conditioning, Unconditioning and CFG Scale]

[Image: Graph showing the relationship between Target Image, Conditioning, Unconditioning, CFG Scale and Negative Prompt]

Left: Without Negative Prompt
Right: With Negative Prompt

Negative Prompts let you enter a prompt for Unconditioning, thereby shifting the baseline itself.

The right graph demonstrates how the arrow's direction changes, bringing the output closer to the desired image.
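Reusing the same toy arithmetic, the sketch below compares three baselines: Unconditioning from an empty prompt, from a well-chosen Negative Prompt, and from an ill-suited one. All coordinates, including the target, are invented for illustration, and `apply_cfg` is a hypothetical helper. A real Negative Prompt is encoded by the text encoder rather than picked as a point, so its effect on the direction is far harder to predict.

```python
import math
from typing import List

Vector = List[float]


def apply_cfg(uncond: Vector, cond: Vector, cfg_scale: float) -> Vector:
    """Hypothetical helper: uncond + (cond - uncond) * cfg_scale, element-wise."""
    return [u + (c - u) * cfg_scale for u, c in zip(uncond, cond)]


target = [3.0, 5.0]          # invented "image we want"
cond = [2.0, 2.5]            # invented Conditioning from the positive prompt
empty_uncond = [1.0, 1.0]    # invented Unconditioning from an empty prompt
good_negative = [1.0, 0.0]   # invented baseline from a well-chosen Negative Prompt
bad_negative = [0.0, 2.0]    # invented baseline from an ill-suited Negative Prompt

scale = 2.0
for name, baseline in [("empty", empty_uncond),
                       ("good negative", good_negative),
                       ("bad negative", bad_negative)]:
    out = apply_cfg(baseline, cond, scale)
    print(f"{name}: output={out}, distance to target={math.dist(out, target):.3f}")
# empty: output=[3.0, 4.0], distance to target=1.000
# good negative: output=[3.0, 5.0], distance to target=0.000
# bad negative: output=[4.0, 3.0], distance to target=2.236
```

The Conditioning and the CFG Scale are the same in all three runs; only the baseline moves, yet the output can land closer to or further from the target.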

Negative Prompts Have a Significant Impact!

Since Negative Prompts shift the baseline, they have a substantial impact on the entire illustration.

Negative Prompts generally describe undesirable elements, but their influence extends beyond omitting specific features to affecting other parts of the image.

[Image: Anime illustration of a woman with brown hair and blue eyes warming her hands with her breath on a snowy night in the city]
Negative Prompts are powerful!

Modern image generation AI offers numerous settings, and drastically shifting the baseline with Negative Prompts can make you lose your sense of direction.

It’s crucial to maintain balance and use Negative Prompts sparingly.

When are Negative Prompts Effective?

When should Negative Prompts be used?

Stable Diffusion 1: Low Prompt Reproducibility

In the era of Stable Diffusion 1, prompt reproducibility was limited due to CLIP constraints.

[Image: Anime illustration of a young girl holding a gun in a snowy city alley with heavy snowdrops and buildings behind her, and a brick wall overhead, shot from the left side]
Are you even listening?

Stable Diffusion 1’s CLIP-L recognized only a limited vocabulary and was trained on restricted data.

To improve prompt reproducibility, users had to raise the CFG Scale and enter extensive Negative Prompts, pulling Unconditioning in the negative direction so that the output moved closer to the target.

Custom Models with Explicit Instructions

Even in later iterations like SDXL, there are cases where Negative Prompts remain effective, such as when custom models explicitly recommend their use.

[Image: Anime illustration of a kid in a city street wearing a scarf around his neck and reading a book, with lights in the distance behind him, blue eyes, and snow falling in the background]
Be sure to read carefully.

Example: Recommended Negative Prompt for Animagine-XL 3.1

Animagine XL V3.1 - v3.1 | Stable Diffusion XL Checkpoint | Civitai

nsfw, lowres, (bad), text, error, fewer, extra, missing, worst quality, jpeg artifacts, low quality, watermark, unfinished, displeasing, oldest, early, chromatic aberration, signature, extra digits, artistic error, username, scan

In such cases, the custom model has been fine-tuned with these Negative Prompts in mind, so using them yields higher-quality images.

The Proper Solution: Improving CLIP

While CFG Scale and Negative Prompts can enhance illustrations, they are challenging to fine-tune.

To improve prompt reproducibility, the ideal solution is to enhance CLIP itself.

[Image: Graph showing the relationship between Target Image, Conditioning, refined CLIP and Unconditioning]

Improving CLIP allows Conditioning to align more closely with the target image, reducing reliance on CFG Scale and Negative Prompts.
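One way to see why a better Conditioning reduces the need for a high CFG Scale is to ask which scale would bring the output closest to the target. The hypothetical helper `best_cfg_scale` below projects the target onto the guidance line running from Unconditioning through Conditioning. The vectors are invented, and real images are not single points, so treat this as intuition rather than a measurement.

```python
from typing import List

Vector = List[float]


def best_cfg_scale(uncond: Vector, cond: Vector, target: Vector) -> float:
    """Hypothetical helper: the scale s that brings
    uncond + (cond - uncond) * s closest to target (orthogonal projection)."""
    direction = [c - u for u, c in zip(uncond, cond)]
    to_target = [t - u for u, t in zip(uncond, target)]
    denom = sum(d * d for d in direction)
    if denom == 0:
        raise ValueError("cond equals uncond; the guidance direction is undefined")
    return sum(a * d for a, d in zip(to_target, direction)) / denom


uncond = [0.0, 0.0]        # invented baseline
target = [4.0, 4.0]        # invented target image
weak_cond = [0.5, 0.5]     # invented Conditioning from a weaker text encoder
refined_cond = [2.0, 2.0]  # invented Conditioning from a refined text encoder

print(best_cfg_scale(uncond, weak_cond, target))     # 8.0
print(best_cfg_scale(uncond, refined_cond, target))  # 2.0
```

In this toy setup the weak Conditioning needs a scale of 8 to reach the target, while the refined one needs only 2.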

SDXL: CLIP-G, a Stronger Version of CLIP-L

laion/CLIP-ViT-bigG-14-laion2B-39B-b160k · Hugging Face

Improved CLIP-L: CLIP-GmP-ViT-L-14

zer0int/CLIP-GmP-ViT-L-14 · Hugging Face

Through extensive trials to improve image quality, I’ve found upgrading CLIP-L to be the most effective method for achieving better results.

T5xxl Provides Powerful Support

Another approach to improving prompt reproducibility is through the introduction of T5xxl.

What Are CLIP and T5xxl? How Text Encoders Can Make Illustrations Stunning! | AI image journey

Unlike CLIP, T5xxl was not trained to recognize images, but it excels at understanding complex text. Its text embeddings are passed to the image model alongside CLIP’s, supplementing CLIP with a more detailed reading of the prompt.

[Image: Graph showing the relationship between Target Image, Conditioning, T5xxl and Unconditioning]

Since June 2024, Stable Diffusion 3 and Flux.1 have included T5xxl, significantly enhancing Conditioning accuracy. As a result, they’ve announced that Negative Prompts are no longer necessary.

AI's Creativity: What Does It Mean?

In this section, let’s delve into the concept of AI’s creativity.

What kind of image does AI envision when it receives a prompt from us?

[Image: Anime illustration of a woman with brown hair and blue eyes looking at you in the evening in a snowy town]
What does AI imagine?

When it comes to AI’s freest expression, Conditioning is the image the AI envisions from the input prompt alone.

Outputting Conditioning Directly

What happens if we directly output this Conditioning? There are two ways to achieve this.

Set CFG Scale to 1

As shown earlier, CFG Scale is calculated as follows:

[Image: Graph showing the relationship between Target Image, Conditioning, Unconditioning and CFG Scale ×1]

Generated Image = Unconditioning + (Conditioning − Unconditioning) × CFG Scale

From this formula, we see that when CFG Scale equals 1, the impact of Unconditioning is canceled out.
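Writing the formula with symbols makes the cancellation explicit. Here $U$ stands for Unconditioning, $C$ for Conditioning, $s$ for the CFG Scale and $x$ for the generated image; these symbols are introduced only for this derivation. Rearranging shows that $U$ is weighted by $(1 - s)$, which vanishes exactly at $s = 1$.

```latex
\begin{aligned}
x &= U + (C - U)\,s \\
  &= s\,C + (1 - s)\,U \\
x\big|_{s=1} &= 1 \cdot C + 0 \cdot U = C
\end{aligned}
```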

While there might be minor deviations, setting CFG Scale to 1 minimizes Unconditioning’s influence.

Input the Same Prompt for Positive and Negative

Another method is to input the exact same prompt for both Positive and Negative.

[Image: Anime illustration of a girl writing something in an open book outdoors at night, with a crowd on the sidewalk behind her, looking at the camera]
Freedom is wonderful!

In this case, Conditioning and Unconditioning are identical, completely eliminating Unconditioning’s influence.
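The same toy arithmetic can check this, assuming the text encoder returns identical embeddings for identical input. Since the difference between the two is zero, the CFG Scale has nothing to amplify, and every scale returns Conditioning unchanged. The values and the `apply_cfg` name are hypothetical.

```python
from typing import List

Vector = List[float]


def apply_cfg(uncond: Vector, cond: Vector, cfg_scale: float) -> Vector:
    """Hypothetical helper: uncond + (cond - uncond) * cfg_scale, element-wise."""
    return [u + (c - u) * cfg_scale for u, c in zip(uncond, cond)]


cond = [0.3, -1.2, 2.5]  # invented Conditioning values
uncond = list(cond)      # same prompt in Positive and Negative: identical baseline

for scale in (1.0, 4.5, 12.0):
    print(scale, apply_cfg(uncond, cond, scale) == cond)
# 1.0 True
# 4.5 True
# 12.0 True
```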

Under these conditions, the AI depicts its own interpretation of the prompt, showcasing its maximum creativity.

Flux.1: Fewer Variations?

Flux.1 and SD 3.5 have significantly improved prompt reproducibility thanks to the integration of T5xxl.

Flux.1, which I frequently use, is highly reliable and produces illustrations with an exceptionally high level of completion.

[Image: Anime illustration of a woman with brown hair and blue eyes smiling as she turns around in a snowy town]
Flux.1 is precise.

However, with Flux.1 I’ve been encountering unexpectedly creative illustrations less often than before.

T5xxl, while highly effective at understanding prompts, might sometimes overthink the details.

Is Prompt Input Creative?

For image generation AI, inputting a prompt is, in a sense, the opposite of creativity.

AI learns from billions of images, but our prompts constrain AI’s expression.

While imposing constraints to improve quality enhances illustration fidelity, it simultaneously diminishes AI’s inherent diversity and creative range.

Does the masterpiece you generated look familiar?

The Pros and Cons of "Masterpiece" – Overcoming Perfection with the Help of Masters | AI image journey

Conclusion: Unleashing AI's Creativity!

Keep CFG Scale low.
Use Negative Prompts sparingly.
Let AI create freely and observe its original illustrations.

When I first started using image generation AI, I thought it was a tool to create the illustrations I wanted.

However, seeing AI’s exceptional compositional skills and artistic expression, I realized that our role as humans is to gently guide AI’s creativity rather than control it entirely.

Let's Learn from AI! Recreating Dynamic Compositions | AI image journey

Recently, I’ve started setting CFG Scale to around 1–2.

I am grateful for the availability of refined CLIP models and high-quality free resources, and I look forward to continuing to enjoy the creative possibilities of image generation AI.

Thank you for reading until the end!