Create Image Generation Prompts with AI! Let's Try Using ChatGPT

Anime-style digital art of a young girl with short black hair and green eyes wearing a white school uniform with a brown vest and a white shirt standing in a garden with lovelt parrot
  • Large Language Models (LLMs) understand text.
  • Vision Language Models (VLMs) can also understand images.
  • ChatGPT is a versatile, all-around model.

Introduction

Hello, this is Easygoing.

Today, I'm going to show you how to create prompts for image generation AI using ChatGPT.

This guide is perfect for beginners who are wondering how to write prompts for generating images.

Topic: A Friendly Parrot

Our topic today is a friendly parrot.

Anime-style digital art of a young man with short brown hair and yellow eyes wearing a white polo shirt with a pocket brown pants and a black backpack He is standing in a sunlit garden

We’ll try to illustrate a cute parrot interacting with people.

Two Ways to Automatically Create Prompts

There are basically two ways to automatically generate prompts for image generation AI.


flowchart LR
subgraph Input
A1(Keywords)
A2(Images)
end
subgraph AI
B1("Large Language Model (LLM)<br>Text -> Text")
B2("Vision Language Model (VLM)<br>Image -> Text")
end
C1(Prompt)
A1-->B1
A2-->B2
B1-->C1
B2-->C1

  1. Generate prompts from keywords
  2. Generate prompts from images

Both methods use AI to create prompts automatically, but they work in slightly different ways.

Creating Prompts from Keywords

When making prompts from keywords, we use an AI called a Large Language Model (LLM), which generates text from text.

Anime-style digital illustration of a cute young girl with short black hair and green eyes wearing a white hoodie with the number 30 on it holding a green parrot on her hand
LLMs really get text

Notable Large Language Models (LLM)

Year Released by Parameters Open Source
T5 2019 Google ~11B Yes
GPT-3 2020 OpenAI 175B No
PaLM 2022 Google 540B No
LLaMA 2023 Meta 7B ~ 65B Yes

One of the first practical, open-source LLMs was Google’s T5.

For image generation, the biggest T5xxl model is used as the text encoder in Flux.1 / SD3.5.

Then in 2020, OpenAI’s GPT-3 came out and wowed everyone with its performance when it was integrated into ChatGPT.

And in 2023, Meta released LLaMA as open source, which has become a popular, high-performance LLM that you can even run on your own machine.

Let’s Give ChatGPT a Try!

Now, let’s actually use ChatGPT to create a prompt. First, head over to this page:

https://chat.openai.com

Screenshot of ChatGPT English top page.png (2687×1152)

In the chat box, type in two keywords—parrot and human friendly—and write something like this:

Please write a prompt for an image generation AI in about 50 words. The keywords are parrot and human friendly.

Screenshot of ChatGPT being used to create image generation prompts from keywords English version.png (2527×1027)
Flux1_euler_ a young girl with brown hair and blue eyes holds a colorful parrot close to her face surrounded by lush greenery and pink flowers with a cityscape visible through a large window in the ba.png (2505×2505)
Friendly parrot

ChatGPT comes up with just the right length of prompt for image generation.

When you plug it into your favorite image generation AI (in this case, the anime model on Flux.1[dev]), you'll get a beautifully atmospheric illustration.

So, using ChatGPT makes it super easy to create prompts from keywords.

Creating Prompts from Images!

Now, let’s try making a prompt from an image.

When you create a prompt from an image, the AI needs to understand not just text but also images. For this, we use a Vision Language Model (VLM).

Anime-style digital illustration of a young man with short black hair and green eyes wearing a light grey shirt standing in a lush garden with red roses he is looking at a green parrot
VLMs can understand images too

Vision Language Model (VLM)

Year Released by Parameters Training Images Open Source
Open-CLIP-L 2022 LAION + Hugging Face ~430M ~400M Yes
GPT-4 2023 OpenAI Not Disclosed Not Disclosed No
LLaVA 2023 Meta 7B ~ 65B ~110M Yes
Florence-2 2023 Microsoft 230M ~ 771M ~126M Partially Open

Among vision language models, Open-CLIP-L is an open-source AI developed by OpenAI that offers performance on par with CLIP-L. It’s been integrated into many image generation AIs.

Lately, people often shorten Open-CLIP-L to CLIP-L, so from now on in this note, I'll refer to Open-CLIP-L simply as CLIP-L.

In March 2023, GPT-4 was released as the successor to GPT-3, and it comes with image recognition capabilities. That means today's ChatGPT can analyze images, too.

In June 2024, Microsoft released Florence-2, which, thanks to training on a dataset with detailed captions, offers remarkably high accuracy in recognition despite its light weight.

Let’s Try It Out!

Now, let’s create a prompt from an image.

Screenshot of chatGPT file upload screen in English with comment.png (2860×1530)

With ChatGPT, you can upload an image file by clicking the plus button at the bottom left of the input area, or simply by dragging and dropping the image onto the screen.

Based on the following image, please write a prompt for an image generation AI in about 50 words.

Screenshot of ChatGPT being used to create image generation prompts from image English version.png (2245×1600)
Flux1_euler_ a young female character with brown hair and blue eyes wearing a black leather jacket and holding a colorful parrot stands in a greenhouse surrounded by lush greenery and pink flowers wit.png (2505×2505)

This time as well, an illustration very similar to the input image was generated.

Testing ChatGPT’s Analysis Skills!

To put ChatGPT’s analytical abilities to the test, let’s input a slightly more complex image.

We’ll have it analyze a screenshot of top page of this site.

Analyze the screenshot of the following web page and summarize it in about 70 bullet points. Please take into account the content of this url and explain it clearly to a first time viewer. https://ai-image-journey.blogspot.com/

Screenshot of the top page of ai image journey.png (3470×2280)

Website Overview "AI Image Journey" explores the world of AI-generated images, with a focus on tools like Stable Diffusion and Flux.1. The site offers resources, visual showcases, and tips for enthusiasts.

Key Articles

  1. Optimizing Tools:
  • Guides on using Flux.1 with GGUF format.
  1. GPU Recommendations:
  • Best GPUs for AI image generation.
  1. Fun Topics:
  • Lighthearted discussions on AI and ramen.
  1. Online Platforms:
  • Exploring SeaArt AI for image generation.

Visual Content Showcases various AI-generated artworks, such as:

  • A girl dancing in a kimono.
  • AI personifications.
  • Scenic illustrations like a stormy Atlantic ocean at dusk.

Features

  • Dark Mode: Improves low-light viewing.
  • Update History: Tracks recent site changes.
  • Comprehensive Resources: Guides and showcases for leveraging AI creativity.

ChatGPT’s caption nails both the image and the text, and it even takes into account information from the internet, resulting in a very accurate caption.

In short, ChatGPT is a highly capable and versatile AI model.

What’s the Catch with ChatGPT?

While ChatGPT and other online AI models are incredibly versatile and accurate, they do have some downsides.

  • They require external communication when used.
  • They’re tuned to avoid inappropriate outputs.

Since online AI models communicate with external servers, that’s something to keep in mind.

Also, these online models—especially since they’re often used in business contexts—are adjusted to prevent inappropriate outputs.

Anime-style digital illustration of a young boy with short black hair sitting at a desk in a bright sunlit room
Local freedom

One of the big advantages of running image generation locally is that you can experiment with free and wide-ranging expressions, without the restrictions imposed by output tuning. In some art projects, those restrictions can be a downside.

These kinds of limitations can be avoided if you’re using local AI models.

Next time, I’ll compare methods for creating prompts using local AI in more detail!

Update: 2025.2.7

I compared three different automatic prompt generation methods.

In Summary: ChatGPT is a Jack-of-All-Trades!

  • Large Language Models (LLMs) understand text.
  • Vision Language Models (VLMs) can also understand images.
  • ChatGPT is a versatile, all-around model.

Today, I introduced ChatGPT, the most widely used generation AI out there.

But there are also image-recognition-capable models like Google’s Gemini and the much-discussed Deepseek R1's Janus-Pro, which work in similar ways.

Anime-style drawing of a young boy with short blonde hair sitting at a desk with a birdcage in front of him He is wearing a white t-shirt and has a sleepy expression on his face
Good night

If you’re thinking image generation is too hard or wondering what to do with ChatGPT, why not just have some fun with it like this?

Thanks for reading until the end!


Model Introduction

blue_pencil-flux1-v0.0.1