The Best GPUs for Image Generation: Comfortable Use of Flux.1 / SD 3.5 / AuraFlow!

  • Choose NVIDIA GPUs for AI tasks.
  • Aim for 16GB VRAM for models like Flux.1, SD 3.5 large, and AuraFlow.
  • Recommended GPUs: RTX 4060 Ti 16GB or RTX 3060 12GB.

Introduction

Hello, this is Easygoing.

In this article, we'll explore the best GPUs for utilizing image generation AI!

Useful Reference Sites!

First, let me introduce a very informative website for comparing GPUs used in image generation AI.

This blog, authored by staff from an AI research institute in the US, provides detailed explanations about the hardware required for AI computations.

Performance Chart Overview!

Let's start by referencing the performance chart of GPUs from the site mentioned above.

GPUS_Ada_raw_performance3.png (1600×1572)
  • RTX/GTX Series: For general use.
  • A Series / H Series: For professional use.

When comparing GPU performance, focus on the red 16-bit Inference bar in the chart.

Larger model numbers generally indicate newer, faster GPUs. Compared with the GTX 1060 released in 2016, for example, current GPUs deliver several times the performance, showing how quickly GPUs have improved in just a few years.

GPUs Are Manufactured by Three Companies

As of November 2024, GPUs are produced by three main companies:

  • NVIDIA (RTX / GTX Series)
  • AMD (Radeon Series)
  • Intel (Arc Series)

Among these, NVIDIA GPUs are optimized for AI applications, making them the top choice for image generation.

Key Features of GPU Generations

Let’s examine the characteristics of NVIDIA GPU generations. AI computations rely heavily on floating-point (FP) calculations.

Generational Summary Table

Series           Released  FP32  FP16  BF16  FP8
RTX 4000 Series  2022      ✓     ✓     ✓     ✓
RTX 3000 Series  2020      ✓     ✓     ✓     –
RTX 2000 Series  2018      ✓     ✓     –     –
GTX 1000 Series  2016      ✓     –     –     –

(✓ = hardware-accelerated support in that generation.)

NVIDIA introduces new GPU generations approximately every two years, and each generation brings optimizations for newer formats.

FP32, FP16, BF16, FP8 Explained

Image generation involves a vast amount of computation. Floating-point formats used in these calculations include:

Format      Sign   Exponent  Mantissa  Precision   Accuracy
FP32        1 bit  8 bits    23 bits   6–7 digits  Excellent
FP16        1 bit  5 bits    10 bits   3–4 digits  Good
BF16        1 bit  8 bits    7 bits    3 digits    Good
FP8 (e4m3)  1 bit  4 bits    3 bits    1–2 digits  Fair
FP8 (e5m2)  1 bit  5 bits    2 bits    1–2 digits  Fair
  • Sign: Indicates positive or negative.
  • Exponent: Handles large numbers for stability.
  • Mantissa: Determines precision.

While FP32 offers the highest precision, it requires significant computational resources. Reducing precision (FP16 → FP8) decreases VRAM usage and computational load but also impacts accuracy.
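You can check the precision column of the table directly in Python: the standard struct module can round-trip a value through 32-bit and 16-bit storage, and the digit counts follow from each format's mantissa width. A minimal sketch:

```python
import math
import struct

def to_fp32(x: float) -> float:
    # Round-trip a Python float (FP64) through 32-bit storage
    return struct.unpack('f', struct.pack('f', x))[0]

def to_fp16(x: float) -> float:
    # Round-trip through 16-bit half-precision storage
    return struct.unpack('e', struct.pack('e', x))[0]

v = 1 / 3
print(f"FP64: {v:.12f}")
print(f"FP32: {to_fp32(v):.12f}")   # accurate to ~7 digits
print(f"FP16: {to_fp16(v):.12f}")   # accurate to ~3-4 digits

# Decimal digits of precision from mantissa bits: (m + 1) * log10(2)
for fmt, m in [("FP32", 23), ("FP16", 10), ("BF16", 7), ("FP8 e4m3", 3)]:
    print(f"{fmt:8s}: ~{(m + 1) * math.log10(2):.1f} digits")
```

The computed digit counts (~7.2, ~3.3, ~2.4, ~1.2) line up with the table above.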

GPU Memory (VRAM)

VRAM is critical for handling the heavy memory requirements of AI computations. Here's a comparison of VRAM types:

Memory Type  Bandwidth         Example GPUs
GDDR7        Up to 1.5 TB/sec  RTX 5000 Series (upcoming)
GDDR6X       Up to 1 TB/sec    RTX 4090, 4080, 4070, etc.
GDDR6        Up to 750 GB/sec  RTX 4000, 3000, 2000, GTX 1600 Series
GDDR5X       Up to 550 GB/sec  GTX 1080
GDDR5        Up to 335 GB/sec  GTX 1070, 1060, 1050
DDR5         Up to 50 GB/sec   System RAM and integrated GPUs
DDR4         Up to 25 GB/sec   System RAM and integrated GPUs

The bandwidth figures in the table represent the memory transfer speed.

GDDR is a high-performance memory optimized for VRAM, offering much wider bandwidth compared to general-purpose DDR memory. This allows for significantly faster processing.
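To see why bandwidth matters, consider the time needed just to stream a model's weights once through memory. The ~23 GB size assumed for Flux.1's FP16 weights below is an illustrative figure, and the bandwidths are rounded from the table above:

```python
# Time to read a model's weights once from memory at each bandwidth.
# The ~23 GB model size is an assumption for illustration only.
model_gb = 23
bandwidth_gbps = {            # GB/s, rounded from the table above
    "GDDR6X (RTX 4090)": 1000,
    "GDDR6":             750,
    "DDR5 system RAM":   50,
    "DDR4 system RAM":   25,
}
for name, bw in bandwidth_gbps.items():
    ms = model_gb / bw * 1000
    print(f"{name:18s}: {ms:7.1f} ms per full pass over the weights")
```

The same weights that GDDR6X can stream in tens of milliseconds take nearly a full second from DDR4, which is why spilling into system RAM is so costly.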

What Happens When VRAM Runs Out?

When VRAM is insufficient during processing, the system will rely on system RAM to handle the overflow.

However, system RAM has narrower bandwidth and introduces latency due to PCIe connections, resulting in a significant slowdown.


If VRAM usage exceeds capacity, generation times can increase several-fold, severely limiting the practicality of image generation.

How Much VRAM Do You Need?

So, how much VRAM is necessary for image generation AI?

Let’s measure the VRAM usage for various models.

Test Environment

  • RTX 4060 Ti 16GB
  • System RAM: 64GB
  • 1024 x 1024 resolution, Euler, 20 steps

ComfyUI FP16 Format

ComfyUI FP16 VRAM usage.png (800×1131)

First, let’s look at the VRAM usage in FP16 format (blue graph).

Newer models like Flux.1, AuraFlow, and SD 3.5 (large) use 14–15GB of VRAM, meaning you’ll need at least 16GB of VRAM for smooth performance.

Meanwhile, SD 3.5 (medium) and SDXL require just 6GB of VRAM to run.
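These measurements can be turned into a rough fit check. The VRAM figures below are approximations from this article's tests, and the 1 GB headroom is an assumption rather than an exact requirement:

```python
# Rough VRAM fit check using the FP16 figures measured in this article.
# Values are approximate; the headroom for the OS, desktop compositor,
# and activations is an assumed rule of thumb.
FP16_VRAM_GB = {
    "Flux.1": 15, "SD 3.5 large": 15, "AuraFlow": 14,
    "SD 3.5 medium": 6, "SDXL": 6,
}

def fits_in_vram(model: str, gpu_vram_gb: float, headroom_gb: float = 1.0) -> bool:
    return FP16_VRAM_GB[model] + headroom_gb <= gpu_vram_gb

print(fits_in_vram("Flux.1", 16))        # True  -> runs comfortably
print(fits_in_vram("Flux.1", 12))        # False -> will spill to system RAM
print(fits_in_vram("SD 3.5 medium", 8))  # True
```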

Using the --lowvram and --novram Options to Save VRAM

ComfyUI provides two options for saving VRAM when running low: --lowvram and --novram.

In the purple graph, you can see that the --lowvram option does not significantly reduce VRAM usage.


In contrast, the red --novram option processes data entirely in system RAM, greatly reducing VRAM usage.

Although --novram processing increases generation time by 1.5–2x, it is still significantly faster than exceeding VRAM capacity.

ComfyUI FP8 Format

ComfyUI FP8 VRAM usage.png (800×1131)

Next, let’s examine VRAM usage in FP8 format, which is less demanding than FP16.

While FP8 format (green graph) generally uses less VRAM, Flux.1 still exceeds 12GB of VRAM, even in FP8 format.

For systems with only 12GB of VRAM, FP8 compression remains insufficient for Flux.1.

Stable Diffusion WebUI Forge

Forge VRAM usage.png (800×1131)

Finally, let’s measure VRAM usage in Stable Diffusion WebUI Forge.

Even here, Flux.1 exceeds 12GB of VRAM usage in both FP16 and FP8 formats.

Using GGUF Format with Flux.1

For systems with 12GB of VRAM or less, the GGUF format, introduced in August 2024, is recommended.


We’ll cover GGUF format in detail in the next article.

Comparing Actual GPUs

Now, let’s dive into the performance and pricing of specific GPUs.

High-End Options if Budget Allows

Here is a lineup of the latest RTX 4000 series GPUs:

GPU Model          VRAM Capacity  Tensor Cores  Power Usage  Price
RTX 4090           24GB           336           ~450W        ¥350,000
RTX 4080 SUPER     16GB           320           ~320W        ¥200,000
RTX 4080           16GB           304           ~320W        ¥180,000
RTX 4070 Ti SUPER  16GB           264           ~285W        ¥150,000
RTX 4070 Ti        12GB           240           ~285W        ¥140,000
RTX 4070           12GB           192           ~285W        ¥90,000
RTX 4060 Ti 16GB   16GB           136           ~160W        ¥80,000

If budget is not a concern, high-end GPUs are ideal.

Among the RTX 4000 series, the following offer the best VRAM value for their price:

  • RTX 4090 (24GB)
  • RTX 4070 Ti SUPER (16GB)
  • RTX 4060 Ti (16GB)

Mid-Range Options with Ample VRAM

High-end GPUs offer exceptional performance, but if you're looking to save costs, mid-range GPUs are a viable option.

NVIDIA has introduced large VRAM models in the mid-range lineup, including the RTX 4060 Ti and RTX 3060.

GPU Model         Release Date   Tensor Cores  Power Usage  Price    Used Price
RTX 4060 Ti 16GB  July 2023      136 cores     160W         ¥80,000  ¥65,000
RTX 3060 12GB     February 2021  112 cores     170W         ¥55,000  ¥30,000

If you're looking for an affordable GPU for AI applications, these two models are strong contenders.

Consider Certified Used GPUs

GPUs are robust components, and pre-owned units generally work without issues.

Many PC shops sell pre-owned GPUs with return guarantees. Based on my experience, if no issues appear within the first week, the GPU is likely reliable.

Common issues with pre-owned GPUs include cooling fan malfunctions, but replacing the fan with a larger aftermarket unit often improves cooling efficiency.

For the RTX 4060 Ti, pre-owned options are still relatively expensive due to its recent release. However, RTX 3060 units can be found at very affordable prices in the pre-owned market.


If you're comfortable with more risk, platforms like Mercari can offer even cheaper deals, albeit without warranties.

What About the RTX 5000 Series?

New RTX-series GPUs are released approximately every two years. Originally expected in 2024, the RTX 5000 series has been delayed to 2025.

The VRAM lineup may include Samsung’s new GDDR7 3GB chips, which could provide 50% more VRAM capacity compared to the current 2GB chips.

The top-tier RTX 5090 is expected to feature 32GB of VRAM using the current 2GB chips. Products exceeding this capacity are unlikely in this generation.
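As a rough sketch, VRAM capacity follows from the memory bus width (each GDDR chip occupies a 32-bit channel) multiplied by chip density. The 512-bit bus assumed for the RTX 5090 below is a rumor, not a confirmed specification:

```python
# VRAM capacity = (bus width / 32 bits per GDDR chip) x chip density.
# The 512-bit bus for the RTX 5090 is an assumption based on rumors.
def vram_gb(bus_width_bits: int, chip_gb: int) -> int:
    return (bus_width_bits // 32) * chip_gb

print(vram_gb(512, 2))  # 32 GB with today's 2 GB chips
print(vram_gb(512, 3))  # 48 GB with 3 GB GDDR7 chips (+50%)
```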

Expected Pricing of the RTX 5060

NVIDIA is enjoying record-breaking success, particularly in the AI GPU market, where it holds a near monopoly. With little competitive pressure, the RTX 5000 series is unlikely to be priced cheaply.

Currency devaluation has driven up GPU prices in Japan. For example, the RTX 4060 Ti 16GB is priced near what was historically considered high-end.

Model             Release Date   Price (USD)  Price (JPY)  USD/JPY Rate
RTX 4060 Ti 16GB  July 2023      $499         ¥71,000      142
RTX 3060 12GB     February 2021  $329         ¥34,800      106
RTX 2060          January 2019   $349         ¥37,700      108

Given the current economic conditions, waiting for the RTX 5060 might not guarantee better pricing.

How Does AMD’s Radeon Compare?

AMD's Radeon series is the second most widely used GPU line after NVIDIA.

While Radeon GPUs excel at general computation, they lack hardware equivalent to Tensor cores, lagging behind NVIDIA for AI applications.


AMD has no immediate plans to implement Tensor core-like hardware in consumer GPUs, so NVIDIA’s dominance will likely continue for at least 1–2 years.

Challenges with Intel Arc GPUs

Intel Arc GPUs debuted in April 2022 as newcomers to the market.

However, their software ecosystem, particularly drivers, remains underdeveloped. This results in instability, especially for AI workloads.


When I tested an Intel Arc GPU on a borrowed PC, frequent memory leaks rendered it nearly unusable for image generation.

For AI applications, NVIDIA GPUs remain the clear recommendation.

Online Services vs Local PCs: Which Is More Affordable?

Online services offer access to high-performance computational resources but require ongoing rental fees.

In general, local GPUs become more cost-effective after about 300 days of use.
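That 300-day figure is easy to sanity-check. The rental fee and electricity cost below are assumptions for illustration; only the GPU price comes from this article's tables:

```python
# Break-even between buying a GPU and renting online (illustrative numbers).
# Rental fee and electricity cost are assumptions, not quoted prices.
gpu_price_jpy = 80_000       # RTX 4060 Ti 16GB, from the table above
rental_jpy_per_day = 300     # assumed online service fee
power_jpy_per_day = 30       # assumed electricity for a few hours of use

breakeven_days = gpu_price_jpy / (rental_jpy_per_day - power_jpy_per_day)
print(f"Local GPU pays for itself after ~{breakeven_days:.0f} days")
```

With these assumptions the purchase pays off in roughly 300 days, consistent with the estimate above.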


Additionally, many online platforms, including SeaArt AI, optimize for FP8 execution and limit FP16 utilization.

For high-quality AI illustrations, FP16 capability gives local setups an edge.

Trivia from the Blog

  • PCIe 3.0 x4 bandwidth is sufficient for GPU connectivity.
  • System RAM should equal or exceed your VRAM capacity; add more if necessary.
  • The FP8 performance of a single RTX 4090 surpasses the combined computational power of all the world's supercomputers in 2007.

Summary: Prioritize VRAM in Your GPU Purchase

  • Choose NVIDIA GPUs for AI tasks.
  • Aim for 16GB VRAM for models like Flux.1, SD 3.5 large, and AuraFlow.
  • Recommended GPUs: RTX 4060 Ti 16GB or RTX 3060 12GB.

I hope this guide assists you in selecting the right GPU for your AI image generation needs.

Thank you for reading!