Let AI Give You Ideas! How to Automatically Generate Prompts

[Image: header illustration]
  • Generate captions with CLIP.
  • Use looping workflows to create variations.
  • Make minimal corrections when necessary.

Introduction

Hello, this is Easygoing.

The header image above was laid out using Manga Editor DESU!, an AI manga creation support app published by new-sankaku.

This app offers a variety of features that are completely free!

I’ll cover how to use this app in detail in another post. For now, let’s explore how to generate prompts automatically using image generation AI.

Theme: Supercar Concept Art

This time, the theme is supercar concept art.

[Image: a close-up view of a sleek modern sports car]

Cool car illustrations are something everyone admires. Let’s see if we can recreate them using AI.

Workflow!

Here’s the workflow used for this process:


flowchart LR
subgraph Input
A1(Prompt)
end
subgraph SDXL
B1(Base Sketch)
end
subgraph FLUX.1-Depth-dev 
C1(Fix Composition)
end
subgraph FLUX.1-dev
D1(Final Touches)
end
A1-->B1
B1-->|ControlNet<br>Depth|C1
C1-->D1
D1-.->|Captioning|A1

Now, let’s break down the process step by step, illustrated with examples.

Step 1: SDXL (Base Sketch)

First, we use SDXL, the previous-generation model, to create a base sketch.

[Image: SDXL base sketch of a gold sports car parked on a street at night, with a cityscape in the background]

SDXL anime models can produce diverse compositions. However, Flux.1 produces superior textures, so we transfer this composition to Flux.1 using ControlNet depth conditioning.

Step 2: Depth-Anything-V2 (Depth Map Extraction)

[Image: color-coded depth map]

Here, a depth map is extracted from the base sketch using Depth-Anything-V2.
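Depth-Anything-V2 predicts relative depth as an array of numbers, while the depth model is conditioned on an ordinary image. In ComfyUI, the preprocessor node handles this conversion. To illustrate what that step involves, here is a minimal NumPy sketch that min-max rescales a raw depth array to an 8-bit grayscale map. The helper name `depth_to_uint8` is hypothetical. The sketch assumes that larger raw values mean "closer", as with relative inverse depth, so near objects come out bright.

```python
import numpy as np


def depth_to_uint8(depth: np.ndarray) -> np.ndarray:
    """Min-max rescale a raw depth array to a 0-255 grayscale map (hypothetical helper).

    Assumes larger raw values mean "closer", so near objects end up bright.
    """
    depth = depth.astype(np.float32)
    lo, hi = float(depth.min()), float(depth.max())
    if hi - lo < 1e-8:
        # A flat map carries no depth information; return uniform mid-gray.
        return np.full(depth.shape, 128, dtype=np.uint8)
    scaled = (depth - lo) / (hi - lo)  # now in [0, 1]
    return np.round(scaled * 255).astype(np.uint8)


if __name__ == "__main__":
    raw = np.array([[0.5, 1.0], [2.0, 4.5]])  # toy "depth" values
    print(depth_to_uint8(raw))
```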

Step 3: Flux.1-Depth-dev (Fixing Composition)

[Image: Flux.1-Depth-dev output of a vibrant multi-colored sports car parked on a street at night]

Using the extracted depth map, the official Flux.1 depth model refines the composition.

The Flux.1-Depth-dev model provides ControlNet-style depth control with the same VRAM usage as a regular model, but its texture quality is inferior to that of the standard model.

Therefore, after rendering the image partway with the depth model, we switch to the standard Flux.1 model for the final touches.
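The post does not give the exact settings for this switch. One possible approach in ComfyUI, assumed here, is to chain two KSampler (Advanced) nodes over the same schedule. The first sampler, using the depth model, stops at `end_at_step` with `return_with_leftover_noise` enabled. The second sampler, using the standard model, resumes from `start_at_step` with `add_noise` disabled.

The sketch below only computes the two step ranges. The helper name is hypothetical, and so are the 30 steps and the halfway switch point in the usage line.

```python
def split_steps(total_steps: int, switch_fraction: float) -> tuple[range, range]:
    """Split one sampling schedule between two models (hypothetical helper).

    The first model (here, the depth model) runs steps [0, switch); the second
    (the standard model) continues from `switch` to the end. In a two-sampler
    setup these would be end_at_step on the first and start_at_step on the second.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if not 0.0 <= switch_fraction <= 1.0:
        raise ValueError("switch_fraction must be in [0, 1]")
    switch = round(total_steps * switch_fraction)
    return range(0, switch), range(switch, total_steps)


first, second = split_steps(30, 0.5)  # hypothetical: 30 steps, switch halfway
print(f"depth model: steps {first.start}-{first.stop}, "
      f"standard model: steps {second.start}-{second.stop}")
```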

Step 4: FluxesCore-Dev_V1.0 (Final Touches)

[Image: final illustration of the multi-colored sports car after the FluxesCore-Dev pass]

Finally, we complete the illustration with FluxesCore-Dev, a high-quality custom Flux.1[dev] model.

Reusing the Prompt for More!

The process doesn’t end here. Using the finalized image, we generate captions with CLIP and feed them into the next image generation.


flowchart LR
subgraph Input
A1(Prompt)
end
subgraph CLIP
B1(CLIP Text Encoder)
B2(CLIP Vision Encoder)
end
subgraph Image
D1(Illustration)
end
subgraph Captioning Model
E1(Cliption)
end
A1 --> B1
B1 --> D1
D1 --> B2
B2 --> E1
E1 --> A1

Note: To save storage, CLIP models are usually distributed with only the Text Encoder component.

CLIP is best known as the text encoder in image generation. When its Vision Encoder is combined with a dedicated captioning model, it can also create captions from images.

The captions generated from the image differ slightly from the original prompt. Repeating this process leads to gradual changes in the illustrations.

Since this workflow relies entirely on AI for prompt generation, the AI naturally produces new variations as the process continues.
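The loop itself is simple to express in code. Below is a minimal Python sketch. It assumes, as in the example prompts in the next section, that each new prompt is the latest caption followed by the original keywords.

`generate` and `caption` are hypothetical placeholders for the image pipeline and the CLIP-based captioner. Any callables with those shapes can be plugged in.

```python
from typing import Callable

BASE_KEYWORDS = "night, supercar, monaco, dutch angle, close up"


def build_prompt(caption: str, keywords: str = BASE_KEYWORDS) -> str:
    """Next prompt = latest caption, then the original keywords on their own line."""
    caption = caption.strip()
    return f"{caption}\n{keywords}" if caption else keywords


def caption_loop(generate: Callable[[str], object],
                 caption: Callable[[object], str],
                 iterations: int,
                 keywords: str = BASE_KEYWORDS) -> list[str]:
    """Run generate -> caption -> new prompt `iterations` times; return the prompts used.

    `generate` stands in for the image pipeline (SDXL -> Flux.1 stages) and
    `caption` for the CLIP-based captioner. Both are hypothetical placeholders.
    """
    prompt = keywords  # the first image uses only the keywords
    used = []
    for _ in range(iterations):
        used.append(prompt)
        image = generate(prompt)
        prompt = build_prompt(caption(image), keywords)
    return used


if __name__ == "__main__":
    # Toy stand-ins: the "image" is just the prompt, and the "caption" describes it.
    prompts = caption_loop(lambda p: p,
                           lambda img: f"A sports car at night ({len(img)} characters of prompt).",
                           iterations=3)
    for i, p in enumerate(prompts, 1):
        print(i, p.replace("\n", " | "))
```

Keeping the five starting keywords in every prompt anchors the theme, while the changing caption supplies the variation.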

Watching the Prompts Evolve!

Let’s look at how the prompts and images evolve through iterations.

First Image

[Image: first image, a white sports car parked on a street at night]

night, supercar, monaco, dutch angle, close up

This is the first image, created with just the five keywords above. CLIP then generates a caption from this image for the next iteration.

Second Image

[Image: second image, a white sports car parked on a city street at night]

A white sports car is parked on a city street at night. The car is positioned on the left side of the image, with its headlights on, casting a warm glow on the pavement. The street is lined with buildings, and there are people walking in the background.
night, supercar, monaco, dutch angle, close up

The car and its color resemble the first image, but the overall vibe feels slightly different.

Ninth Image!

[Image: ninth image, a red sports car with a British flag design parked on a cobblestone street at night]

A red sports car is parked on a street at night, with a person walking by in the background. The car has a sleek design with a prominent front grille, round headlights, and a rear spoiler. The street is illuminated by streetlights, and there are other cars parked along the sides of the road.
night, supercar, monaco, dutch angle, close up

By the ninth iteration, the car’s color, design, and composition have changed significantly, making for a unique and engaging illustration.

This technique allows AI to generate new ideas and creative variations with each iteration.
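The post does not measure this drift. One rough way to put a number on it is to compare the word sets of two captions using Jaccard similarity: the number of shared words divided by the number of all distinct words.

The sketch below applies this to the captions quoted for the second and ninth images. The helper names are hypothetical.

```python
import re


def word_set(text: str) -> set[str]:
    """Lowercase set of words in a caption (punctuation dropped)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Shared distinct words / all distinct words: 1.0 = same vocabulary, 0.0 = none shared."""
    wa, wb = word_set(a), word_set(b)
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)


second = ("A white sports car is parked on a city street at night. The car is positioned on the "
          "left side of the image, with its headlights on, casting a warm glow on the pavement. "
          "The street is lined with buildings, and there are people walking in the background.")
ninth = ("A red sports car is parked on a street at night, with a person walking by in the "
         "background. The car has a sleek design with a prominent front grille, round headlights, "
         "and a rear spoiler. The street is illuminated by streetlights, and there are other cars "
         "parked along the sides of the road.")

print(f"word overlap, 2nd vs 9th caption: {jaccard(second, ninth):.2f}")
print("words only in the 9th:", sorted(word_set(ninth) - word_set(second)))
```

Word overlap ignores word order and meaning, so treat it only as a coarse indicator of drift.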

Flux.1[schnell] for Commercial Use!


flowchart LR
subgraph Input
A1(Prompt)
end
subgraph SDXL
B1(Base Illustration)
end
subgraph Flux.1-schnell
D1(Final Touches)
end
A1-->B1
B1-->D1
D1-.->|Captioning|A1

The earlier examples used Flux.1[dev], which is not licensed for commercial use. Let’s try the same process using Flux.1[schnell], which can be used commercially.

First Image

[Image: a silver sports car with the number 1 on its side, parked on a circular track at night]

Third Image

[Image: a silver sports car speeding on a racetrack with a city skyline in the background]

Fifth Image

[Image: a silver sports car parked on a wet street at night, with a full moon in the background]

While the quality is slightly inferior to the previous model, it’s still sufficient for generating ideas.

AI's "Bias" Problem

This workflow relies on AI to handle everything from creating prompts to drawing and looping the process.

While this generates new ideas over time, it becomes hard to correct course once the direction drifts.

For instance, image generation AI tends to put women into its illustrations.

[Image: a woman in a purple dress standing next to a silver sports car]
Women are strong

In this workflow, once a person appears in an illustration, the caption describes them, so people keep appearing in the images that follow.

Minimal Course Correction

To address this, we introduced minimal corrections.

Specifically, certain keywords in the generated prompts were removed.

Removed Keywords

  • People: girl, woman, female, lady, boy, man, male, gentleman
  • Colors: black, white, silver, blue, red

[Image: a car on a street at night with its headlights on and no people in the scene]
No people needed

Although removing words can make the prompts grammatically awkward, advanced text encoders such as CLIP-G and T5-XXL understand the context well, so this doesn’t cause major issues.

Unlike negative prompts, this approach doesn’t affect the overall image; it simply excludes specific elements.
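In the workflow, this removal is done with the D2 Regex Replace node. To illustrate the same correction outside ComfyUI, here is a minimal Python sketch that uses the two keyword lists above. The function name is hypothetical, and the code is not taken from the node's source.

The `\b` word boundaries stop "man" from matching inside "woman" and "red" from matching inside "colored". A second pass tidies the double spaces that the removal leaves behind.

```python
import re

# Keyword lists from the post.
PEOPLE = ["girl", "woman", "female", "lady", "boy", "man", "male", "gentleman"]
COLORS = ["black", "white", "silver", "blue", "red"]


def remove_keywords(text: str, words: list[str]) -> str:
    """Delete whole-word, case-insensitive matches of `words`, then tidy the spacing."""
    # \b...\b limits matches to whole words ("man" does not hit "woman").
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
    text = re.sub(pattern, "", text, flags=re.IGNORECASE)
    text = re.sub(r"[ \t]{2,}", " ", text)   # collapse double spaces left behind
    text = re.sub(r" +([,.])", r"\1", text)  # drop spaces stranded before , or .
    return text.strip()


caption = "A woman in a purple dress is standing next to a silver sports car."
print(remove_keywords(caption, PEOPLE + COLORS))
```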

Workflow!

This Flux.1 workflow is complex, so here’s a simplified version using SDXL alone:


flowchart LR
subgraph Input
A1(Prompt)
end
subgraph SDXL
B1(Illustration)
end
A1-->B1
B1-.->|Captioning|A1

Cliption_auto-prompt_SDXL 2025.1.10.json

This simplified workflow can be adapted by changing the base model or integrating it into existing workflows for various applications.

Introducing Custom Nodes!

Here are the custom nodes used in this workflow.

comfy-cliption

[Image: comfy-cliption in the ComfyUI Manager custom node list]
ComfyUI Manager search screen

comfy-cliption is a custom node that uses CLIP-L to caption images. It’s lightweight and achieves high accuracy, especially when paired with improved CLIP-L models.

How to Use comfy-cliption

First, download the full CLIP-L model, including the Vision Encoder, from the following page:

[Image: Hugging Face download page for LongCLIP-SAE-ViT-L-14]
FP16 format is usually OK!

Place the downloaded Long-CLIP-L model in the following folders:

  • InstallFolder/Models/CLIP
  • InstallFolder/Models/InvokeClipVision or InstallFolder/models/clip_vision

Next, open ComfyUI and arrange the nodes as shown below.

[Image: sample comfy-cliption workflow]

The generated captions can be connected to the CLIP Text Encode node to be used as prompts for image generation.

To reuse captions for subsequent prompts, save them using the Save Text node and reload them with the Load Text node.
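Saving and reloading the caption amounts to persisting one string between runs. Here is a minimal Python sketch of the idea. The file name and helper names are hypothetical; the actual Save Text and Load Text nodes choose their own paths and formats.

```python
from pathlib import Path

# Hypothetical location; the real nodes decide where the text is stored.
CAPTION_FILE = Path("last_caption.txt")


def save_caption(caption: str, path: Path = CAPTION_FILE) -> None:
    """Store the latest caption so the next run can start from it."""
    path.write_text(caption.strip() + "\n", encoding="utf-8")


def load_caption(path: Path = CAPTION_FILE) -> str:
    """Return the saved caption, or an empty string if nothing has been saved yet."""
    return path.read_text(encoding="utf-8").strip() if path.exists() else ""
```

On the very first run, `load_caption` returns an empty string, so the workflow can fall back to the starting keywords alone.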

D2-nodes-ComfyUI

[Image: D2-nodes-ComfyUI in the ComfyUI Manager custom node list]

D2-nodes-ComfyUI, by da2el-AI, is a versatile node pack with a wide range of features.

In this workflow, we used the D2 Regex Replace node to remove specific words from the prompts.

How to Use D2 Regex Replace

To use the D2 Regex Replace node:

[Image: ComfyUI workflow showing how to use the D2 Regex Replace node]
  1. Connect the input text you want to modify.
  2. Enter the words to be removed, separated by a vertical bar |.
  3. Running the node will delete the specified words!

Although we used it for word removal here, it supports advanced replacement using regular expressions.
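As a hypothetical example of such replacements (not part of the workflow above), auto-generated captions sometimes contain spacing artifacts such as split digits ("2 0 2 1"), spaced hyphens ("multi - colored"), and repeated words ("spy spy spy").

The Python `re` patterns below would tidy these by joining split digits, rejoining spaced hyphens, and collapsing repeated words with the backreference `\1`. Whether D2 Regex Replace accepts the same syntax, including `\1` in a replacement, is an assumption worth checking.

```python
import re


def tidy_caption(text: str) -> str:
    """Clean up common spacing artifacts in auto-generated captions (hypothetical helper)."""
    # "2 0 2 1" -> "2021": drop single spaces that sit between two digits.
    text = re.sub(r"(?<=\d) (?=\d)", "", text)
    # "multi - colored" -> "multi-colored": rejoin hyphens surrounded by spaces.
    text = re.sub(r"(?<=\w) - (?=\w)", "-", text)
    # "spy spy spy" -> "spy": collapse immediate repeats of a whole word via \1.
    text = re.sub(r"\b(\w+)(?:\s+\1\b)+", r"\1", text)
    return text


print(tidy_caption("the 2 0 2 1 can - am spy spy spy s is shown at night"))
```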

More Automatic Prompt Generators

While this workflow used CLIP-L for simple prompt automation, there are other methods available.

[Image: a white sports car with black accents parked on a city street at night]

In the future, I plan to compare various approaches, including those using large language models (LLMs), for even more sophisticated prompt automation!

Update: 2025.2.7

I compared three different automatic prompt generation methods.

Conclusion: Trust AI to Explore Creativity

  • Generate captions with CLIP.
  • Use looping workflows to create variations.
  • Make minimal corrections when necessary.

AI has learned from billions of images, something no human can match.

AI knows far more designs and compositions than any individual.

[Image: a young woman with dark hair sitting in a modern car, surrounded by a neon-lit city at night]
Can you master it?

Recently, I’ve found the best approach is to let AI work freely and draw out ideas that I wouldn’t have thought of.

Thank you for reading to the end!


Models

anima_pencil-XL_v5.0.0

Flux.1-Depth-dev (Not for commercial use)

FluxesCore-Dev - V1.0 (Not for commercial use)

Flux.1-schnell

Depth-Anything-V2 (only the Small model is licensed for commercial use)

Related Articles