The Best GPUs for Image Generation AI: Run Flux.1 / SD 3.5 / AuraFlow Comfortably!


- Choose NVIDIA GPUs for AI tasks.
- Aim for 16GB VRAM for models like Flux.1, SD 3.5 large, and AuraFlow.
- Recommended GPUs: RTX 4060 Ti 16GB or RTX 3060 12GB.
Introduction
Hello, this is Easygoing.
In this article, we'll explore the best GPUs for utilizing image generation AI!
Useful Reference Sites!
First, let me introduce a very informative website for comparing GPUs used in image generation AI.
This blog, authored by staff from an AI research institute in the US, provides detailed explanations about the hardware required for AI computations.
Performance Chart Overview!
Let's start by referencing the performance chart of GPUs from the site mentioned above.

- RTX/GTX Series: For general use.
- A Series / H Series: For professional use.
When comparing GPU performance, focus on the red 16-bit Inference bar in the chart.
Larger numbers in a GPU's model name generally indicate a newer, higher-performance card. Compared with the GTX 1060 released in 2016, for example, today's GPUs are several times faster, which shows how quickly performance has grown in just a few years.
GPUs Are Manufactured by Three Companies
As of November 2024, GPUs are produced by three main companies:
- NVIDIA (RTX / GTX Series)
- AMD (Radeon Series)
- Intel (Arc Series)

Among these, NVIDIA GPUs are optimized for AI applications, making them the top choice for image generation.
Key Features of GPU Generations
Let’s examine the characteristics of NVIDIA GPU generations. AI computations rely heavily on floating-point (FP) calculations.
Generational Summary Table
Series | Released | FP32 | FP16 | BF16 | FP8 |
---|---|---|---|---|---|
RTX 4000 Series | 2022 | ✅ | ✅ | ✅ | ✅ |
RTX 3000 Series | 2020 | ✅ | ✅ | ✅ | ❌ |
RTX 2000 Series | 2018 | ✅ | ✅ | ❌ | ❌ |
GTX 1000 Series | 2016 | ✅ | ❌ | ❌ | ❌ |
NVIDIA introduces new GPU generations approximately every two years, and each generation brings optimizations for newer formats.
FP32, FP16, BF16, FP8 Explained
Image generation involves a vast amount of computation. Floating-point formats used in these calculations include:
Format | Sign | Exponent | Mantissa | Decimal Digits | Accuracy |
---|---|---|---|---|---|
FP32 | 1 bit | 8 bits | 23 bits | 6–7 digits | Excellent |
FP16 | 1 bit | 5 bits | 10 bits | 3–4 digits | Good |
BF16 | 1 bit | 8 bits | 7 bits | 3 digits | Good |
FP8 (e4m3) | 1 bit | 4 bits | 3 bits | 1–2 digits | Fair |
FP8 (e5m2) | 1 bit | 5 bits | 2 bits | 1–2 digits | Fair |
- Sign: Indicates positive or negative.
- Exponent: Determines the range of representable values, from very small to very large.
- Mantissa: Determines precision.
While FP32 offers the highest precision, it requires significant computational resources. Reducing precision (FP16 → FP8) decreases VRAM usage and computational load but also impacts accuracy.
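The precision loss from the table above is easy to see in practice. Here is a minimal sketch using NumPy, which ships float32 and float16 (BF16 and FP8 need extra packages such as ml_dtypes, so they are omitted here):

```python
import numpy as np

# Store the same value at two precisions.
value = 3.14159265

fp32 = np.float32(value)  # 23 mantissa bits -> ~7 decimal digits
fp16 = np.float16(value)  # 10 mantissa bits -> ~3-4 decimal digits

print(fp32)  # 3.1415927
print(fp16)  # 3.14

# FP16's small 5-bit exponent also overflows early: anything above 65504
# becomes infinity, which is one reason BF16 (8-bit exponent, same range
# as FP32) is popular for AI.
print(np.float16(70000.0))  # inf
```

The mantissa governs how many digits survive, and the exponent governs when a value overflows to infinity, exactly as in the bullets above.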
GPU Memory (VRAM)
VRAM is critical for handling the heavy memory requirements of AI computations. Here's a comparison of VRAM types:
Memory Type | Bandwidth | Example GPUs |
---|---|---|
GDDR7 | Up to 1.5 TB/sec | RTX 5000 Series (Upcoming) |
GDDR6X | Up to 1 TB/sec | RTX 4090, 4080, 4070, etc. |
GDDR6 | Up to 750 GB/sec | RTX 4000, 3000, 2000, GTX 1600 Series |
GDDR5X | Up to 550 GB/sec | GTX 1080 |
GDDR5 | Up to 335 GB/sec | GTX 1070, 1060, 1050 |
DDR5 | Up to 50 GB/sec | System RAM and integrated GPUs |
DDR4 | Up to 25 GB/sec | System RAM and integrated GPUs |
The bandwidth figures in the table represent the memory transfer speed.
GDDR is a high-performance memory optimized for VRAM, offering much wider bandwidth compared to general-purpose DDR memory. This allows for significantly faster processing.
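The table's bandwidth figures translate directly into time. As a rough sketch, here is how long one pass over a model's worth of data takes at each speed; the 15 GB figure is an illustrative model size, not a measurement, and real workloads move data many times per image, so the gap compounds:

```python
# Time to move 15 GB of data once at the table's peak bandwidths.
model_gb = 15

bandwidths_gb_s = {
    "GDDR6X (VRAM)": 1000,
    "GDDR6 (VRAM)": 750,
    "DDR5 (system RAM)": 50,
    "DDR4 (system RAM)": 25,
}

for name, bw in bandwidths_gb_s.items():
    # seconds = GB / (GB/s); multiply by 1000 for milliseconds
    print(f"{name}: {model_gb / bw * 1000:.0f} ms per pass")
```

At these numbers, a single pass costs about 15 ms on GDDR6X but 600 ms on DDR4, which previews why spilling into system RAM hurts so much.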
What Happens When VRAM Runs Out?
When VRAM is insufficient during processing, the system will rely on system RAM to handle the overflow.
However, system RAM has narrower bandwidth and introduces latency due to PCIe connections, resulting in a significant slowdown.

If usage exceeds VRAM capacity, processing times can grow severalfold, severely impacting the practicality of image generation.
How Much VRAM Do You Need?
So, how much VRAM is necessary for image generation AI?
Let’s measure the VRAM usage for various models.
Test Environment
- RTX 4060 Ti 16GB
- System RAM: 64GB
- 1024 x 1024 resolution, Euler, 20 steps
ComfyUI FP16 Format

First, let’s look at the VRAM usage in FP16 format (blue graph).
Newer models like Flux.1, AuraFlow, and SD 3.5 (large) use 14–15GB of VRAM, meaning you’ll need at least 16GB of VRAM for smooth performance.
Meanwhile, SD 3.5 (medium) and SDXL require just 6GB of VRAM to run.
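These measurements line up with a simple back-of-the-envelope rule: model weights alone need roughly parameters x bytes per parameter, and activations, the VAE, and text encoders add more on top. A minimal sketch, with an illustrative parameter count (roughly SDXL's UNet size; not an official figure):

```python
# Lower-bound VRAM for model weights: parameters x bytes per parameter.
BYTES = {"FP32": 4, "FP16": 2, "BF16": 2, "FP8": 1}

def weight_gb(params_billion: float, fmt: str) -> float:
    return params_billion * 1e9 * BYTES[fmt] / 1024**3

# e.g. a ~2.6B-parameter diffusion model
print(f"{weight_gb(2.6, 'FP16'):.1f} GB")  # ~4.8 GB in FP16
print(f"{weight_gb(2.6, 'FP8'):.1f} GB")   # ~2.4 GB in FP8
```

Halving the bytes per parameter halves the weight footprint, which is why the FP8 graphs later in this article sit so far below the FP16 ones.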
Using --novram Option to Save VRAM
ComfyUI provides two options for saving VRAM when running low: `--lowvram` and `--novram`.
In the purple graph, you can see that the `--lowvram` option does not significantly reduce VRAM usage.

In contrast, the red `--novram` option processes data entirely in system RAM, greatly reducing VRAM usage.
Although `--novram` increases generation time by 1.5–2x, it is still significantly faster than exceeding VRAM capacity.
ComfyUI FP8 Format

Next, let’s examine VRAM usage in FP8 format, which is less demanding than FP16.
While FP8 format (green graph) generally uses less VRAM, Flux.1 still exceeds 12GB of VRAM, even in FP8 format.
For systems with only 12GB of VRAM, FP8 compression remains insufficient for Flux.1.
Stable Diffusion WebUI Forge

Finally, let’s measure VRAM usage in Stable Diffusion WebUI Forge.
Even here, Flux.1 exceeds 12GB of VRAM usage in both FP16 and FP8 formats.
Using GGUF Format with Flux.1
For systems with 12GB of VRAM or less, it is recommended to use the GGUF format, introduced in August 2024.

We’ll cover GGUF format in detail in the next article.
Comparing Actual GPUs
Now, let’s dive into the performance and pricing of specific GPUs.
High-End Options if Budget Allows
Here is a lineup of the latest RTX 4000 series GPUs:
GPU Model | VRAM Capacity | Tensor Cores | Power Usage | Price |
---|---|---|---|---|
RTX 4090 | 24GB | 336 | ~450W | ¥350,000 |
RTX 4080 SUPER | 16GB | 320 | ~320W | ¥200,000 |
RTX 4080 | 16GB | 304 | ~320W | ¥180,000 |
RTX 4070 Ti SUPER | 16GB | 264 | ~285W | ¥150,000 |
RTX 4070 Ti | 12GB | 240 | ~285W | ¥140,000 |
RTX 4070 | 12GB | 192 | ~285W | ¥90,000 |
RTX 4060 Ti 16GB | 16GB | 136 | ~160W | ¥80,000 |
If budget is not a concern, high-end GPUs are ideal.
Among the RTX 4000 series, the following offer the best VRAM value for their price:
- RTX 4090 (24GB)
- RTX 4070 Ti SUPER (16GB)
- RTX 4060 Ti (16GB)
Mid-Range Options with Ample VRAM
High-end GPUs offer exceptional performance, but if you're looking to save costs, mid-range GPUs are a viable option.
NVIDIA has introduced large VRAM models in the mid-range lineup, including the RTX 4060 Ti and RTX 3060.
GPU Model | Release Date | Tensor Cores | Power Usage | Price | Used Price |
---|---|---|---|---|---|
RTX 4060 Ti 16GB | July 2023 | 136 cores | 160W | ¥80,000 | ¥65,000 |
RTX 3060 12GB | February 2021 | 112 cores | 170W | ¥55,000 | ¥30,000 |
If you're looking for an affordable GPU for AI applications, these two models are strong contenders.
Consider Certified Used GPUs
GPUs are robust components, and pre-owned units generally work without issues.
Many PC shops sell pre-owned GPUs with return guarantees. Based on my experience, if no issues appear within the first week, the GPU is likely reliable.
Common issues with pre-owned GPUs include cooling fan malfunctions, but replacing the fan with a larger aftermarket unit often improves cooling efficiency.
For the RTX 4060 Ti, pre-owned options are still relatively expensive due to its recent release. However, RTX 3060 units can be found at very affordable prices in the pre-owned market.

If you're comfortable with more risk, platforms like Mercari can offer even cheaper deals, albeit without warranties.
What About the RTX 5000 Series?
New RTX series GPUs are released approximately every two years. Originally expected this year, the RTX 5000 series has been delayed until next year.
The VRAM lineup may include Samsung’s new GDDR7 3GB chips, which could provide 50% more VRAM capacity compared to the current 2GB chips.
The top-tier RTX 5090 is expected to feature 32GB of VRAM using the current 2GB chips. Products exceeding this capacity are unlikely in this generation.
Expected Pricing of the RTX 5060
NVIDIA is enjoying record-breaking success, particularly in the AI GPU market, where it holds a near monopoly. Consequently, the RTX 5000 series is expected to be priced on the high side.
Currency devaluation has driven up GPU prices in Japan. For example, the RTX 4060 Ti 16GB is priced near what was historically considered high-end.
Model | Release Date | Price (USD) | Price (JPY) | USD/JPY Rate |
---|---|---|---|---|
RTX 4060 Ti 16GB | July 2023 | $499 | ¥71,000 | 142 |
RTX 3060 12GB | February 2021 | $329 | ¥34,800 | 106 |
RTX 2060 | January 2019 | $349 | ¥37,700 | 108 |
Given the current economic conditions, waiting for the RTX 5060 might not guarantee better pricing.
How Does AMD’s Radeon Compare?
AMD's Radeon series is the second most widely used GPU line after NVIDIA.
While Radeon GPUs excel at general computation, they lack hardware equivalent to Tensor cores, lagging behind NVIDIA for AI applications.

AMD has no immediate plans to implement Tensor core-like hardware in consumer GPUs, so NVIDIA’s dominance will likely continue for at least 1–2 years.
Challenges with Intel Arc GPUs
Intel Arc GPUs debuted in April 2022 as newcomers to the market.
However, their software ecosystem, particularly drivers, remains underdeveloped. This results in instability, especially for AI workloads.

When I tested an Intel Arc GPU on a borrowed PC, frequent memory leaks rendered it nearly unusable for image generation.
For AI applications, NVIDIA GPUs remain the clear recommendation.
Online Services vs Local PCs: Which Is More Affordable?
Online services offer access to high-performance computational resources but require ongoing rental fees.
In general, local GPUs become more cost-effective after about 300 days of use.
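The break-even point is simple arithmetic: the GPU's purchase price divided by the daily rental fee. A sketch with illustrative numbers (the rental fee is a hypothetical figure; real plans vary widely):

```python
# Rent vs. buy: days until a purchased GPU pays for itself.
gpu_price_yen = 80_000        # RTX 4060 Ti 16GB, the article's price
rental_yen_per_day = 300      # hypothetical online-service cost per day

break_even_days = gpu_price_yen / rental_yen_per_day
print(f"break-even after ~{break_even_days:.0f} days")  # ~267 days
```

Under these assumptions the crossover lands in the same ballpark as the ~300 days quoted above; heavier daily use shortens it further.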

Additionally, many online platforms, including SeaArt AI, optimize for FP8 execution and limit FP16 utilization.
For high-quality AI illustrations, FP16 capability gives local setups an edge.
Trivia from the Blog
- PCIe 3.0 x4 bandwidth is sufficient for GPU connectivity.
- System RAM should equal or exceed your VRAM capacity; add more if necessary.
- The FP8 performance of a single RTX 4090 surpasses the combined computational power of all the world's supercomputers in 2007.
Summary: Prioritize VRAM in Your GPU Purchase
- Choose NVIDIA GPUs for AI tasks.
- Aim for 16GB VRAM for models like Flux.1, SD 3.5 large, and AuraFlow.
- Recommended GPUs: RTX 4060 Ti 16GB or RTX 3060 12GB.
I hope this guide assists you in selecting the right GPU for your AI image generation needs.
Thank you for reading!