[Flux.1] Which is the Best Model Format? A Thorough Comparison of FP16, Q_8, Q_4, and NF4!


- Compression formats affect image quality.
- Upscaling is also effective with Flux.1.
- Choose according to your VRAM.
Introduction
Hello, I’m Easygoing.
Today, I’d like to address the dilemma of choosing the right compression format when using Flux.1.
Flux.1 comes in a variety of compression formats, so I've organized and compared them here.
Things to Keep in Mind When Comparing Compression Formats!
Before starting the test, there are some important points to keep in mind.
lllyasviel, the creator of Forge and the NF4 format, states that quantization compression techniques differ greatly from one another, so they should not be compared on the basis of a single image.
Additionally, blue_pencil-flux1-v0.0.1, the version used for this test, is a prototype, so it may not be suited to detailed comparisons.
Please note that the content of this article may change significantly with future advancements in technology.
The Model Size is Dramatically Increasing!
As the performance of AI image generation improves, the size of the models is also growing.
I compared the model sizes of the blue_pencil series in FP16 format, which I usually use.
- SD1.5: blue_pencil-v10 – 2.2 GB
- SDXL: blue_pencil-XL-v7.0.0 – 7 GB
- Flux.1: blue_pencil-flux1-v0.0.1 – 33.7 GB
The Flux.1 model is extremely large, so new compression methods are necessary.
Model Compression Formats
Although blue_pencil-flux1 v0.0.3 is also available, I’ll be using v0.0.1, which I’ve used before, for this test.
The sizes of each compression format are as follows:
- FP16: 33.7 GB
- FP8: 17.1 GB
- Q_8.gguf: 12.7 GB
- Q_4_K_M.gguf: 6.8 GB
- NF4: 16.6 GB
While the NF4 format is not publicly available, it can be created from the FP16 format using Forge’s checkpoint merge feature.
What Do These Abbreviations Mean?
FP16 (Floating Point 16-bit) – 33.7 GB
It's the standard precision for generative AI and is used in models like SD1.5 and SDXL.
FP8 (Floating Point 8-bit) – 17.1 GB
It reduces precision compared to FP16 but cuts the size in half.
Q_8.gguf (8-bit GPT-Generated Unified Format) – 12.7 GB
It offers higher precision than FP8 with a smaller file size.
Q_4_K_M.gguf (4-bit K-quants Mixed-precision GPT-Generated Unified Format) – 6.8 GB
Through algorithmic improvements, it offers higher precision than standard Q_4.
NF4 (Normal Float 4-bit) – 16.6 GB
It is lighter than FP8 and is said to offer superior image quality. It can be processed quickly on GPUs starting from the RTX 2000 series.
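
As a rough sanity check on these file sizes, a model's size can be estimated as parameter count times bits per weight. The sketch below assumes Flux.1's diffusion transformer has about 12 billion parameters and that the .gguf files store only the transformer (while the FP16 checkpoint also bundles the text encoders and VAE); the bits-per-weight figures are approximations, not values from this article.

```python
# Rough file-size estimate: parameters * bits per weight / 8 bits per byte.
# Assumption (not from the article): Flux.1's transformer has ~12e9 params.
TRANSFORMER_PARAMS = 12e9

# Approximate average bits per weight for each format.
BITS_PER_WEIGHT = {
    "FP16": 16,
    "FP8": 8,
    "Q_8": 8.5,       # 8-bit weights plus per-block scales
    "Q_4_K_M": 4.85,  # mixed 4/6-bit blocks plus scales
}

def estimate_gb(params: float, bits: float) -> float:
    """Convert a parameter count and bit width to gigabytes."""
    return params * bits / 8 / 1e9

for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name}: ~{estimate_gb(TRANSFORMER_PARAMS, bits):.1f} GB")
```

The Q_8 estimate (~12.8 GB) lines up well with the 12.7 GB file above, while the FP16 estimate (~24 GB) falls short of 33.7 GB because the full checkpoint also contains the T5/CLIP text encoders and the VAE.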
Image Comparison
Now, let’s compare some actual images.
FP16

FP8

Q_8.gguf

Q_4_K_M.gguf

NF4

FP16 is Still the Best Quality
When comparing each image, focusing on the neon text and the red paint on the motorcycle makes it easier to see the differences.
While Q_8 is only about a third the size of FP16, it reproduces the neon text most faithfully.
As for the paint, FP16 looks the best, while the other formats simplify the gradient due to compression.
Personally, it reminds me of how increasing JPEG compression fades colors.
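
The JPEG analogy is apt: quantization maps each weight onto a small set of discrete levels, which flattens smooth gradients. Here is a minimal sketch of absmax 8-bit quantization, the basic idea behind formats like Q_8 (the real GGUF quantizers work per block and are more elaborate):

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Absmax quantization: map floats onto 256 integer levels."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the integers."""
    return q.astype(np.float32) * scale

weights = np.array([0.013, -0.270, 0.991, -0.004, 0.125], dtype=np.float32)
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each weight is off by at most half a quantization step (scale / 2);
# with 4-bit formats the step is much coarser, so gradients flatten more.
print(np.abs(restored - weights).max() <= scale / 2)
```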
Comparing Q_4_K_M and NF4
The NF4 format significantly reduces fidelity to the original.
Compared side by side, Q_4_K_M has the better overall texture.
However, in my environment, NF4 used slightly less VRAM than Q_4_K_M.
The optimal setting may vary depending on the user, so it’s a good idea to try both and choose what suits your system best.
EasyForge Now Supports gguf Conversion!
As of August 25, 2024, EasyForge added support for gguf conversion.
Now, you can convert the original FP16 format into any gguf format of your choice.
Let’s Find the Best Compression Method!
Here’s a site comparing the precision of the gguf format. I’ve also referenced a clear graph from another site.

According to the graph, Q6_K, Q5_K_M, and Q4_K_M strike the best balance between size and precision in gguf format.
I Compared Each Format!
Here are the results of comparing each format.
Q_8

Q_6_K

Q_5_K_M

Q_4_K_M

Q_8 Retains the Most Detail
Since these are all gguf format, the differences aren’t huge, but Q_8 clearly retains the most detail.
The motorcycle wheels show the most noticeable differences as you increase compression.
Although there are differences, even the lightest Q_4_K_M still manages to represent Flux.1’s high-quality textures at a fairly high level.
VRAM Usage is the Key Factor
The biggest bottleneck when using Flux.1 is VRAM usage.
If the VRAM usage exceeds the limit, generation speed drops significantly, so you should choose a compression format that matches your GPU’s VRAM capacity.
In Stable Diffusion webUI Forge, for mid-range or lower GPUs, Q_4 is often the most appropriate.
Flux.1 and VRAM usage are summarized in another article.
In ComfyUI, you can use the Q_8 format with as little as 6 GB of VRAM.
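
As a rough rule of thumb only (the actual cutoffs depend on resolution, LoRAs, and the UI you use), format selection by VRAM could be sketched like this. The thresholds below are my assumptions, not measured limits:

```python
# Hypothetical helper: pick a Flux.1 compression format from available VRAM.
# The GB thresholds are rough assumptions, not measured cutoffs.
def pick_format(vram_gb: float) -> str:
    if vram_gb >= 24:
        return "FP16"      # full precision fits comfortably
    if vram_gb >= 16:
        return "Q_8"       # near-FP16 quality at about a third of the size
    if vram_gb >= 8:
        return "Q_5_K_M"   # good balance of size and precision
    return "Q_4_K_M"       # lightest option for low-VRAM GPUs

print(pick_format(12))  # → Q_5_K_M
```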
Is Upscaling Effective?
Next, I’ll try upscaling with Flux.1.
The original resolution of the image this time is 1536 x 1536, but I’ll use Hires.fix to upscale it by 1.55x to 2376 x 2376.
The upscaler used is 4x_NMKD-Superscale-SP_178000_G, which gives the most natural finish.
In my Stable Diffusion webUI Forge environment, VRAM usage exceeded the limit with Q_6_K, so I'll compare Q_5_K_M and lower.
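
Incidentally, the 2376 figure presumably comes from multiplying the base size by the upscale factor and snapping down to a multiple of 8, as latent-space models require (1536 × 1.55 = 2380.8). A small sketch, assuming the UI rounds this way:

```python
def hires_target(base: int, scale: float, multiple: int = 8) -> int:
    """Scale a dimension and snap it down to the nearest multiple."""
    return int(base * scale) // multiple * multiple

print(hires_target(1536, 1.55))  # 1536 * 1.55 = 2380.8 -> 2376
```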
Q_5_K_M Original Image (1536 x 1536)

Q_5_K_M High-Resolution (2376 x 2376)

Q_4_K_M High-Resolution (2376 x 2376)

Flux.1 Becomes Even More Beautiful with Upscaling!
Upscaling further enhances the texture.
However, since Flux.1 is already quite clear, I’ll need to adjust prompts in the future to prevent it from becoming too sharp when upscaled.
When comparing compression formats, there is a noticeable difference in background detail between Q_5_K_M and Q_4_K_M.
With Q_4_K_M, many of the distant buildings are omitted.
Although upscaling makes VRAM management stricter, I will continue to explore the best balance between compression ratio and resolution.
Summary
About the compression formats for Flux.1:
- Compression formats affect image quality.
- Upscaling is also effective with Flux.1.
- Choose according to your VRAM.
AI image generation is progressing at an incredible speed. It’s only been a month since the release of Flux.1, and in that time, new technologies have been released one after another.
- August 1, 2024: Flux.1 released
- August 9, 2024: Forge added support for Flux.1 and NF4 format
- August 11, 2024: EasyForge released
- August 15, 2024: Forge added support for gguf format
- August 25, 2024: EasyForge added gguf conversion support
It’s unbelievable that so much new technology is being released for free on a weekly basis.
Has there ever been a more exciting time?
The journey to explore the potential of Flux.1 is far from over.
Thank you for reading to the end!
Four-Panel Comic Prompt

- A high school boy aspiring to be a manga artist, with brown hair and wearing a white T-shirt with "comic" written in red, is looking at his laptop in his room thinking, "FP16 is beautiful but heavy."
- Next, he looks up and thinks, "Q_8 and Q_4 are much lighter."
- Then, he looks to the side thinking, "NF4 is fast but loses accuracy."
- Finally, with a smile, he concludes, "It all depends on VRAM!"
Generated with Anifusion.