Optimize Your Flux.1 Experience! How to Use GGUF Format with Forge and ComfyUI


- Save memory with the GGUF format.
- Choose a compression format that matches your VRAM capacity.
- Use flux_tool to create the optimal GGUF format for your needs.
Introduction
Hello, this is Easygoing.
Today, I'll introduce the GGUF format, which allows you to efficiently run Flux.1.
Flux.1 Models Are Large
As discussed in the previous article, the new image generation AI, Flux.1, requires significant VRAM—16GB for FP16/FP8 formats—due to its large model size.
The new GGUF compression format, introduced to address this, reduces memory usage significantly through quantization compression.
This article explains how to use the GGUF format effectively.
Comparing Actual Images
Let’s first look at some actual images.
- Image-to-Image
- Resolution: 1440 x 1440 => Hires 2576 x 2576
FP16 (16-bit)

FP8 (8-bit)

Q8.gguf (8-bit)

Q5_K_M.gguf (5-bit)

Q3_K_L.gguf (3-bit)

Q2_K.gguf (2-bit)

Features of GGUF Format
The GGUF format, which has spread rapidly in image generation since August 2024, is a compression format based on quantization, originally developed for the llama.cpp ecosystem.
While both FP8 and Q8.gguf formats use 8 bits, Q8.gguf generally provides better accuracy and image quality.
However, as the bit count decreases in GGUF formats, image quality degrades, particularly in background details and color richness.
Generally, Q4 to Q5 offers the best balance between size and accuracy.
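To build intuition for why quality degrades as the bit count drops, here is a minimal blockwise quantizer in the spirit of GGUF's Q8_0/Q4_0 schemes. It is an illustrative sketch only; the real GGUF formats differ in detail (block sizes, K-quant scale handling, and so on).

```python
# Minimal blockwise quantization sketch (illustrative, not the real GGUF code).
import numpy as np

def quantize_dequantize(weights: np.ndarray, bits: int, block: int = 32):
    """Quantize weights blockwise to signed integers, then reconstruct them."""
    qmax = 2 ** (bits - 1) - 1                  # e.g. 127 for 8-bit, 1 for 2-bit
    w = weights.reshape(-1, block)
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax   # one scale per block
    scale[scale == 0] = 1.0
    q = np.clip(np.round(w / scale), -qmax, qmax)         # integer codes
    return (q * scale).reshape(weights.shape)             # dequantized copy

rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)

for bits in (8, 5, 3, 2):
    err = np.abs(quantize_dequantize(w, bits) - w).mean()
    print(f"{bits}-bit mean abs error: {err:.5f}")
```

Running this shows the reconstruction error growing as the bit count shrinks, which mirrors the gradual loss of background detail and color richness seen in the sample images above.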
What Do the Alphabetic Suffixes Mean?
Taking Q4 as an example:
- Q4: Original quantization method.
- Q4_K: Enhanced with K quantization for improved accuracy.
- Q4_K_S: K quantization with smaller size and reduced precision.
- Q4_K_M: K quantization with moderate precision and size.
- Q4_K_L: K quantization with larger size and higher precision.
Within the K variants, size and precision increase from S (small) through M (medium) to L (large).
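The naming rules above are regular enough to decode mechanically. Here is a toy parser for them (my own helper for illustration, not part of any GGUF tooling):

```python
# Toy parser for GGUF quantization names such as "Q4_K_M" (illustrative helper).
def parse_quant(name: str) -> dict:
    parts = name.split("_")
    bits = int(parts[0][1:])                       # "Q4" -> 4
    k_quant = len(parts) > 1 and parts[1] == "K"   # K-quantized variant?
    size = parts[2] if len(parts) > 2 else None    # "S", "M", or "L"
    return {"bits": bits, "k_quant": k_quant, "size": size}

print(parse_quant("Q4_K_M"))
print(parse_quant("Q8_0"))
```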
Let’s Get Started with GGUF!
Here’s how to use GGUF models in practice.
Download the Necessary Files
Flux.1 GGUF models can be downloaded from the following sources:
Models
Flux.1[dev]
Flux.1[schnell]
blue_pencil-flux1
Text Encoder
T5xxl_v1_1
Flan-T5xxl
An improved version of T5xxl; because it is not a distilled model, the file size is larger.
Which GGUF Format Should You Choose?
Here are VRAM-specific recommendations for GGUF formats based on my tests.
Stable Diffusion webUI Forge
| | Base model | T5xxl |
| --- | --- | --- |
| VRAM 16GB | FP16 | FP16 |
| VRAM 12GB | Q5_K_S | FP16 |
ComfyUI
| | Base model | T5xxl |
| --- | --- | --- |
| VRAM 16GB | FP16 | FP16 |
| VRAM 12GB | Q6_K | Q8 |
| VRAM 8GB | Q3_K_L | Q6_K |
If you experience VRAM overflow, opt for a smaller format.

Also, if your display monitor is connected directly to your GPU, choose a smaller format due to the additional VRAM usage.
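You can sanity-check these recommendations by estimating file sizes from the bit count. Flux.1's transformer is commonly cited at about 12 billion parameters; the bits-per-weight figures below are rough approximations (quantized formats carry extra per-block scales), so treat the output as ballpark numbers only.

```python
# Rough GGUF size estimate: parameters x bits-per-weight / 8.
# Bits-per-weight values are approximations including per-block scale overhead.
PARAMS = 12e9  # commonly cited parameter count for the Flux.1 transformer

BITS_PER_WEIGHT = {
    "FP16":   16.0,
    "Q8_0":    8.5,
    "Q6_K":    6.6,
    "Q5_K_M":  5.7,
    "Q4_K_M":  4.8,
    "Q3_K_L":  4.0,
    "Q2_K":    3.3,
}

for name, bpw in BITS_PER_WEIGHT.items():
    gib = PARAMS * bpw / 8 / 2**30
    print(f"{name:7s} ~ {gib:5.1f} GiB")
```

Comparing these estimates against your free VRAM (minus what the text encoder and latents need) gives a quick first guess before testing a format for real.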
Where to Place the Downloaded Files
Place the downloaded files in the following directories:
Stable Diffusion WebUI Forge
Model
Installation folder/Models/StableDiffusion
Text Encoder
Installation folder/Models/CLIP
ComfyUI
Model
Installation folder/Models/Unet
Text Encoder
Installation folder/Models/CLIP
If you use EasyForge or Stability Matrix's model browser, the files are automatically placed in the correct directories.
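If you prefer to script the placement instead, here is a small helper sketch. The installation paths below are placeholders for the layout described above, so adjust them to your own setup.

```python
# Helper that copies a downloaded GGUF file into the folder each UI expects.
# The base paths are placeholders -- point them at your real installations.
import shutil
from pathlib import Path

DESTINATIONS = {
    ("forge", "model"):          Path("Forge/Models/StableDiffusion"),
    ("forge", "text_encoder"):   Path("Forge/Models/CLIP"),
    ("comfyui", "model"):        Path("ComfyUI/Models/Unet"),
    ("comfyui", "text_encoder"): Path("ComfyUI/Models/CLIP"),
}

def install(src: Path, ui: str, kind: str) -> Path:
    """Copy src into the correct folder and return the new path."""
    dest_dir = DESTINATIONS[(ui, kind)]
    dest_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(src, dest_dir / src.name))
```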

How to Use GGUF Models
Here’s how to actually use the GGUF format models.
Stable Diffusion WebUI Forge: Simple and Straightforward!
In the Stable Diffusion WebUI Forge generation screen:

- Select the GGUF format model from the model dropdown.
- For the text encoder, choose the GGUF format files in the VAE / text encoder section.
Once configured, enter your prompt as usual and hit Generate to create your image.
ComfyUI: Requires a Custom Node
To use GGUF in ComfyUI, install the ComfyUI-GGUF custom node via the ComfyUI Manager:

- Launch ComfyUI and open the Manager.

- Search for "gguf" under Custom Nodes Manager.

- Install ComfyUI-GGUF, then restart ComfyUI.

- In your workflow, use Unet Loader (GGUF) and Dual Clip Loader (GGUF) to load GGUF models.
If using EasyForge, the ComfyUI-GGUF custom node is pre-installed!
Drawbacks of GGUF Format
While GGUF saves VRAM and RAM, it has limitations in specific cases.
Minimal Speed Gains with Ample VRAM
GGUF format files are generated by applying quantization compression to the original file format.
Example: FP16 → Q8.gguf
Storing the weights as quantized integers is what saves memory; however, critical parts of the computation still fall back to FP16 after the weights are dequantized.

As a result, if you already have enough VRAM to run the model in its original FP16 format, switching to GGUF may not bring any speed improvement.
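A minimal sketch of this behavior, using a simplified Q8-like layout of my own (not the exact GGUF kernel): the weights occupy one byte each in storage, but are expanded back to floating point before the matrix multiply, so the arithmetic cost matches the uncompressed FP16 path.

```python
# Why GGUF saves memory but not necessarily time: storage is int8 + scales,
# but the matmul itself still runs on dequantized floating-point weights.
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(256, 256)).astype(np.float32)   # "original FP32 weights"
x = rng.normal(size=256).astype(np.float32)

# Stored form: int8 codes plus one scale per row (simplified Q8-like layout).
scale = np.abs(W).max(axis=1, keepdims=True) / 127
q = np.round(W / scale).astype(np.int8)              # 1 byte/weight instead of 2+

# At inference time the codes are expanded back to floating point first...
W_dequant = q.astype(np.float16) * scale.astype(np.float16)
# ...and the multiply runs at FP16-level precision, same cost as an FP16 model.
y = W_dequant.astype(np.float32) @ x

print("max deviation from uncompressed result:", np.abs(y - W @ x).max())
```

The memory win is real (the stored matrix is half the size or less), but the floating-point work per step is unchanged, which is why speed gains vanish once VRAM is plentiful.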
VRAM Overload in ComfyUI!?
ComfyUI optimizes processing speed by keeping models in VRAM whenever possible.
However, when custom nodes other than the default ones are used, ComfyUI may miscalculate the required memory capacity. Models then fail to unload, and VRAM usage exceeds its limit.
To resolve this issue, you can connect an Unload All Models node before the process that causes the VRAM overload.

Example: Place the Unload All Models node between the conditioning and the Sampler nodes.
The Unload All Models custom node can be installed by searching for ComfyUI-Unload-Model in the Manager, just like before.
Recently, both ComfyUI and ComfyUI-GGUF have been receiving frequent updates, and in my environment, the VRAM overload issue appears inconsistently—sometimes occurring, sometimes not, depending on the day.

This issue is likely to be resolved eventually, but learning how to use the Unload All Models node can be beneficial, as it is useful in various other situations as well.
Create Your Own GGUF Files!
So far, we’ve covered how to use GGUF format files. Now, let’s dive into how you can create GGUF files yourself.
Use Flux_tool
As mentioned in a previous article, flux_tool—part of the EasyForge package released by Zuntan—includes a GGUF conversion feature.
With flux_tool, you can generate GGUF files in any desired format.
For instructions on installing EasyForge, please refer to this guide.
How to Use ConvertGguf.bat
Here’s a step-by-step guide on using flux_tool’s ConvertGguf.bat utility.
Prepare the Source Model
Before starting the conversion, ensure you have a high-precision source model ready.
When performing quantization compression for the GGUF format, the higher the precision of the source file, the better the quality of the resulting GGUF file.
Precision Hierarchy:
FP32 > FP16 > BF16 > FP8
Therefore, aim to use the highest-precision model available from this hierarchy.
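You can see this precision ladder by rounding a single value into each format. The BF16 truncation and FP8 (e4m3-style) emulation below are simplified approximations of my own, ignoring range limits and rounding subtleties of the real formats.

```python
# Rounding one value into each format illustrates FP32 > FP16 > BF16 > FP8.
import numpy as np

def to_bf16(x) -> np.float32:
    """Truncate an FP32 value to BF16 by zeroing the low 16 bits."""
    bits = np.float32(x).view(np.uint32) & np.uint32(0xFFFF0000)
    return bits.view(np.float32)

def to_fp8_e4m3(x: float) -> float:
    """Very rough e4m3-style emulation: keep 3 mantissa bits, ignore range limits."""
    if x == 0:
        return 0.0
    e = np.floor(np.log2(abs(x)))
    m = x / 2.0**e                        # mantissa in [1, 2)
    return float(np.round(m * 8) / 8 * 2.0**e)

target = 1 / 3
for name, v in [("FP32", float(np.float32(target))),
                ("FP16", float(np.float16(target))),
                ("BF16", float(to_bf16(target))),
                ("FP8 ", to_fp8_e4m3(target))]:
    print(f"{name}: {v:.8f}  (error {abs(v - target):.8f})")
```

The rounding error grows at each step down the ladder, which is why a higher-precision source model yields a better-quality GGUF after quantization.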


Run ConvertGguf.bat
Next, execute ConvertGguf.bat located in the flux_tool folder within the EasyForge directory.

When you run ConvertGguf.bat, the default output includes two formats: Q8 and Q6_K.
Create Other Formats
If you want to create GGUF formats other than these two defaults, edit the file ConvertGgufFormat.txt located in the env folder within flux_tool.


Example Entries for ConvertGgufFormat.txt
- Q8_0
- Q6_K
- Q5_K_M
- Q4_K_M
- Q3_K_L
Save your changes, then re-run ConvertGguf.bat.
With these steps, you can create GGUF files in any desired format!
Using the --novram Option in ComfyUI
If you want to generate high-quality illustrations in a low-VRAM environment using ComfyUI, you can use the --novram option as an alternative to the GGUF format.
While using the --novram option increases generation time by 1.5 to 2 times, it allows for high-quality illustrations to be created even on systems with as little as 6GB of VRAM.
Summary: GGUF Format is Amazing!
- Save memory with the GGUF format.
- Choose a compression format that matches your VRAM capacity.
- Use flux_tool to create the optimal GGUF format for your needs.
Although this article focused on Flux.1, the GGUF format is already being used with SD 3.5 and SDXL as well.

The GGUF format is a revolutionary technology that is expected to accelerate the adoption of AI, even on compact devices like smartphones.
In the near future, we may reach a point where anyone can generate images directly on their smartphone.
Thank you for reading to the end!