Optimize Your Flux.1 Experience! How to Use GGUF Format with Forge and ComfyUI


- Save memory with the GGUF format.
- Choose a compression format that matches your VRAM capacity.
- Use flux_tool to create the optimal GGUF format for your needs.
Introduction
Hello, this is Easygoing.
Today, I'll introduce the GGUF format, which allows you to efficiently run Flux.1.
Flux.1 Models Are Large
As discussed in the previous article, the new image generation AI, Flux.1, requires significant VRAM—16GB for FP16/FP8 formats—due to its large model size.
The new GGUF compression format, introduced to address this, reduces memory usage significantly through quantization compression.
This article explains how to use the GGUF format effectively.
Comparing Actual Images
Let’s first look at some actual images.
- Image-to-Image
- Resolution: 1440 x 1440 => Hires 2576 x 2576
FP16 (16-bit)

FP8 (8-bit)

Q8.gguf (8-bit)

Q5_K_M.gguf (5-bit)

Q3_K_L.gguf (3-bit)

Q2_K.gguf (2-bit)

Features of GGUF Format
The GGUF format, which has spread rapidly in image generation since August 2024, is a compression format based on quantization, originally developed for the llama.cpp ecosystem.
While both FP8 and Q8.gguf formats use 8 bits, Q8.gguf generally provides better accuracy and image quality.
However, as the bit count decreases in GGUF formats, image quality degrades, particularly in background details and color richness.
Generally, Q4 to Q5 offers the best balance between size and accuracy.
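To build intuition for why quality degrades as the bit count drops, here is a minimal blockwise quantizer in the spirit of GGUF's Q8_0/Q4_0 schemes. It is an illustrative sketch only; the real GGUF formats differ in detail (block sizes, K-quant scale handling, and so on).

```python
# Minimal blockwise quantization sketch (illustrative, not the real GGUF code).
import numpy as np

def quantize_dequantize(weights: np.ndarray, bits: int, block: int = 32):
    """Quantize weights blockwise to signed integers, then reconstruct them."""
    qmax = 2 ** (bits - 1) - 1                  # e.g. 127 for 8-bit, 1 for 2-bit
    w = weights.reshape(-1, block)
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax   # one scale per block
    scale[scale == 0] = 1.0
    q = np.clip(np.round(w / scale), -qmax, qmax)         # integer codes
    return (q * scale).reshape(weights.shape)             # dequantized copy

rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)

for bits in (8, 5, 3, 2):
    err = np.abs(quantize_dequantize(w, bits) - w).mean()
    print(f"{bits}-bit mean abs error: {err:.5f}")
```

Running this shows the reconstruction error growing as the bit count shrinks, which mirrors the gradual loss of background detail and color richness seen in the sample images above.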
What Do the Alphabetic Suffixes Mean?
Taking Q4 as an example:
- Q4: Original quantization method.
- Q4_K: Enhanced with K quantization for improved accuracy.
- Q4_K_S: K quantization with smaller size and reduced precision.
- Q4_K_M: K quantization with moderate precision and size.
- Q4_K_L: K quantization with larger size and higher precision.
Within the K variants, size and precision increase from S (small) through M (medium) to L (large).
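The naming rules above are regular enough to decode mechanically. Here is a toy parser for them (my own helper for illustration, not part of any GGUF tooling):

```python
# Toy parser for GGUF quantization names such as "Q4_K_M" (illustrative helper).
def parse_quant(name: str) -> dict:
    parts = name.split("_")
    bits = int(parts[0][1:])                       # "Q4" -> 4
    k_quant = len(parts) > 1 and parts[1] == "K"   # K-quantized variant?
    size = parts[2] if len(parts) > 2 else None    # "S", "M", or "L"
    return {"bits": bits, "k_quant": k_quant, "size": size}

print(parse_quant("Q4_K_M"))
print(parse_quant("Q8_0"))
```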
Let’s Get Started with GGUF!
Here’s how to use GGUF models in practice.
Download the Necessary Files
Flux.1 GGUF models can be downloaded from the following sources:
Models
Flux.1[dev]
Flux.1[schnell]
blue_pencil-flux1
Text Encoder
T5xxl_v1_1
Flan-T5xxl
An improved version of T5xxl; because it is not a distilled model, the file size is larger.
Which GGUF Format Should You Choose?
Here are VRAM-specific recommendations for GGUF formats based on my tests.
Stable Diffusion webUI Forge
| | Base model | T5xxl |
| --- | --- | --- |
| VRAM 16GB | FP16 | FP16 |
| VRAM 12GB | Q5_K_S | FP16 |
ComfyUI
| | Base model | T5xxl |
| --- | --- | --- |
| VRAM 16GB | FP16 | FP16 |
| VRAM 12GB | Q6_K | Q8 |
| VRAM 8GB | Q3_K_L | Q6_K |
If you experience VRAM overflow, opt for a smaller format.

Also, if your display monitor is connected directly to your GPU, choose a smaller format due to the additional VRAM usage.
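You can sanity-check these recommendations by estimating file sizes from the bit count. Flux.1's transformer is commonly cited at about 12 billion parameters; the bits-per-weight figures below are rough approximations (quantized formats carry extra per-block scales), so treat the output as ballpark numbers only.

```python
# Rough GGUF size estimate: parameters x bits-per-weight / 8.
# Bits-per-weight values are approximations including per-block scale overhead.
PARAMS = 12e9  # commonly cited parameter count for the Flux.1 transformer

BITS_PER_WEIGHT = {
    "FP16":   16.0,
    "Q8_0":    8.5,
    "Q6_K":    6.6,
    "Q5_K_M":  5.7,
    "Q4_K_M":  4.8,
    "Q3_K_L":  4.0,
    "Q2_K":    3.3,
}

for name, bpw in BITS_PER_WEIGHT.items():
    gib = PARAMS * bpw / 8 / 2**30
    print(f"{name:7s} ~ {gib:5.1f} GiB")
```

Comparing these estimates against your free VRAM (minus what the text encoder and latents need) gives a quick first guess before testing a format for real.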
Where to Place the Downloaded Files
Place the downloaded files in the following directories:
Stable Diffusion WebUI Forge
Model
Installation folder/Models/StableDiffusion
Text Encoder
Installation folder/Models/CLIP
ComfyUI
Model
Installation folder/Models/Unet
Text Encoder
Installation folder/Models/CLIP
If you use EasyForge or Stability Matrix's model browser, the files are automatically placed in the correct directories.
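If you prefer to script the placement instead, here is a small helper sketch. The installation paths below are placeholders for the layout described above, so adjust them to your own setup.

```python
# Helper that copies a downloaded GGUF file into the folder each UI expects.
# The base paths are placeholders -- point them at your real installations.
import shutil
from pathlib import Path

DESTINATIONS = {
    ("forge", "model"):          Path("Forge/Models/StableDiffusion"),
    ("forge", "text_encoder"):   Path("Forge/Models/CLIP"),
    ("comfyui", "model"):        Path("ComfyUI/Models/Unet"),
    ("comfyui", "text_encoder"): Path("ComfyUI/Models/CLIP"),
}

def install(src: Path, ui: str, kind: str) -> Path:
    """Copy src into the correct folder and return the new path."""
    dest_dir = DESTINATIONS[(ui, kind)]
    dest_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(src, dest_dir / src.name))
```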

How to Use GGUF Models
Here’s how to actually use the GGUF format models.
Stable Diffusion WebUI Forge: Simple and Straightforward!
In the Stable Diffusion WebUI Forge generation screen:

- Select the GGUF format model from the model dropdown.
- For the text encoder, choose the GGUF format files in the VAE / text encoder section.
Once configured, enter your prompt as usual and hit Generate to create your image.
ComfyUI: Requires a Custom Node
To use GGUF in ComfyUI, install the ComfyUI-GGUF custom node via the ComfyUI Manager:

- Launch ComfyUI and open the Manager.

- Search for "gguf" under Custom Nodes Manager.

- Install ComfyUI-GGUF, then restart ComfyUI.

- In your workflow, use Unet Loader (GGUF) and Dual Clip Loader (GGUF) to load GGUF models.
If using EasyForge, the ComfyUI-GGUF custom node is pre-installed!
Drawbacks of GGUF Format
While GGUF saves VRAM and RAM, it has limitations in specific cases.
Minimal Speed Gains with Ample VRAM
GGUF format files are generated by applying quantization compression to the original file format.
Example: FP16 → Q8.gguf
Storing the weights as quantized integers is what saves memory; however, critical parts of the computation still fall back to FP16 after the weights are dequantized.

As a result, if you already have enough VRAM to run the model in its original FP16 format, switching to GGUF may not bring any speed improvement.
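A minimal sketch of this behavior, using a simplified Q8-like layout of my own (not the exact GGUF kernel): the weights occupy one byte each in storage, but are expanded back to floating point before the matrix multiply, so the arithmetic cost matches the uncompressed FP16 path.

```python
# Why GGUF saves memory but not necessarily time: storage is int8 + scales,
# but the matmul itself still runs on dequantized floating-point weights.
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(256, 256)).astype(np.float32)   # "original FP32 weights"
x = rng.normal(size=256).astype(np.float32)

# Stored form: int8 codes plus one scale per row (simplified Q8-like layout).
scale = np.abs(W).max(axis=1, keepdims=True) / 127
q = np.round(W / scale).astype(np.int8)              # 1 byte/weight instead of 2+

# At inference time the codes are expanded back to floating point first...
W_dequant = q.astype(np.float16) * scale.astype(np.float16)
# ...and the multiply runs at FP16-level precision, same cost as an FP16 model.
y = W_dequant.astype(np.float32) @ x

print("max deviation from uncompressed result:", np.abs(y - W @ x).max())
```

The memory win is real (the stored matrix is half the size or less), but the floating-point work per step is unchanged, which is why speed gains vanish once VRAM is plentiful.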
VRAM Overload in ComfyUI!?
ComfyUI optimizes processing speed by keeping models in VRAM whenever possible.
However, when custom nodes other than the default ones are used, ComfyUI may miscalculate the required memory capacity. Models then fail to unload, and VRAM usage exceeds its limit.
To resolve this issue, you can connect an Unload All Models node before the process that causes the VRAM overload.

Example: Place the Unload All Models node between the conditioning and the Sampler nodes.
The Unload All Models custom node can be installed by searching for ComfyUI-Unload-Model in the Manager, just like before.
Recently, both ComfyUI and ComfyUI-GGUF have been receiving frequent updates, and in my environment, the VRAM overload issue appears inconsistently—sometimes occurring, sometimes not, depending on the day.

This issue is likely to be resolved eventually, but learning how to use the Unload All Models node can be beneficial, as it is useful in various other situations as well.
Create Your Own GGUF Files!
So far, we’ve covered how to use GGUF format files. Now, let’s dive into how you can create GGUF files yourself.
Use Flux_tool
As mentioned in a previous article, flux_tool—part of the EasyForge package released by Zuntan—includes a GGUF conversion feature.
With flux_tool, you can generate GGUF files in any desired format.
For instructions on installing EasyForge, please refer to this guide.
How to Use ConvertGguf.bat
Here’s a step-by-step guide on using flux_tool’s ConvertGguf.bat utility.
Prepare the Source Model
Before starting the conversion, ensure you have a high-precision source model ready.
When performing quantization compression for the GGUF format, the higher the precision of the source file, the better the quality of the resulting GGUF file.
Precision Hierarchy:
FP32 > FP16 > BF16 > FP8
Therefore, aim to use the highest-precision model available from this hierarchy.
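You can see this precision ladder by rounding a single value into each format. The BF16 truncation and FP8 (e4m3-style) emulation below are simplified approximations of my own, ignoring range limits and rounding subtleties of the real formats.

```python
# Rounding one value into each format illustrates FP32 > FP16 > BF16 > FP8.
import numpy as np

def to_bf16(x) -> np.float32:
    """Truncate an FP32 value to BF16 by zeroing the low 16 bits."""
    bits = np.float32(x).view(np.uint32) & np.uint32(0xFFFF0000)
    return bits.view(np.float32)

def to_fp8_e4m3(x: float) -> float:
    """Very rough e4m3-style emulation: keep 3 mantissa bits, ignore range limits."""
    if x == 0:
        return 0.0
    e = np.floor(np.log2(abs(x)))
    m = x / 2.0**e                        # mantissa in [1, 2)
    return float(np.round(m * 8) / 8 * 2.0**e)

target = 1 / 3
for name, v in [("FP32", float(np.float32(target))),
                ("FP16", float(np.float16(target))),
                ("BF16", float(to_bf16(target))),
                ("FP8 ", to_fp8_e4m3(target))]:
    print(f"{name}: {v:.8f}  (error {abs(v - target):.8f})")
```

The rounding error grows at each step down the ladder, which is why a higher-precision source model yields a better-quality GGUF after quantization.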


Run ConvertGguf.bat
Next, execute ConvertGguf.bat located in the flux_tool folder within the EasyForge directory.

When you run ConvertGguf.bat, the default output includes two formats: Q8 and Q6_K.
Create Other Formats
If you want to create GGUF formats other than these two defaults, edit the file ConvertGgufFormat.txt located in the env folder within flux_tool.


Example Entries for ConvertGgufFormat.txt
- Q8_0
- Q6_K
- Q5_K_M
- Q4_K_M
- Q3_K_L
Save your changes, then re-run ConvertGguf.bat.
With these steps, you can create GGUF files in any desired format!
Using the --novram Option in ComfyUI
If you want to generate high-quality illustrations in a low-VRAM environment using ComfyUI, you can use the --novram option as an alternative to the GGUF format.
While using the --novram option increases generation time by 1.5 to 2 times, it allows for high-quality illustrations to be created even on systems with as little as 6GB of VRAM.
Summary: GGUF Format is Amazing!
- Save memory with the GGUF format.
- Choose a compression format that matches your VRAM capacity.
- Use flux_tool to create the optimal GGUF format for your needs.
Although this article focused on Flux.1, the GGUF format is already being used with SD 3.5 and SDXL as well.

The GGUF format is a revolutionary technology that is expected to accelerate the adoption of AI, even on compact devices like smartphones.
In the near future, we may reach a point where anyone can generate images directly on their smartphone.
Thank you for reading to the end!