[Flux.1] Which is the Best Model Format? A Thorough Comparison of FP16, Q_8, Q_4, and NF4!


- Compression formats affect image quality.
- Upscaling is also effective with Flux.1.
- Choose according to your VRAM.
Introduction
Hello, I’m Easygoing.
Today, I’d like to address the dilemma of choosing the right compression format when using Flux.1.
Flux.1 comes in a variety of compression formats, so I've organized and compared them here.
Things to Keep in Mind When Comparing Compression Formats!
Before starting the test, there are some important points to keep in mind.
lllyasviel, the creator of Forge and the NF4 format, states that quantization compression techniques differ greatly from one another, so they should not be compared on the basis of a single image.
Additionally, blue_pencil-flux1-v0.0.1, the version used for this test, is a prototype, so it may not be suited to detailed comparisons.
Please note that the content of this article may change significantly with future advancements in technology.
The Model Size is Dramatically Increasing!
As the performance of AI image generation improves, the size of the models is also growing.
I compared the model sizes of the blue_pencil series in FP16 format, which I usually use.
- SD1.5: blue_pencil-v10 – 2.2 GB
- SDXL: blue_pencil-XL-v7.0.0 – 7 GB
- Flux.1: blue_pencil-flux1-v0.0.1 – 33.7 GB
The Flux.1 model is extremely large, so new compression methods are necessary.
Model Compression Formats
Although blue_pencil-flux1 v0.0.3 is also available, I’ll be using v0.0.1, which I’ve used before, for this test.
The sizes of each compression format are as follows:
- FP16: 33.7 GB
- FP8: 17.1 GB
- Q_8.gguf: 12.7 GB
- Q_4_K_M.gguf: 6.8 GB
- NF4: 16.6 GB
While the NF4 format is not publicly available, it can be created from the FP16 format using Forge’s checkpoint merge feature.
What Do These Abbreviations Mean?
FP16 (Floating Point 16-bit) – 33.7 GB
It's the standard precision for generative AI and is used in models like SD1.5 and SDXL.
FP8 (Floating Point 8-bit) – 17.1 GB
It reduces precision compared to FP16 but cuts the size in half.
Q_8.gguf (8-bit GPT-Generated Unified Format) – 12.7 GB
It offers higher precision than FP8 with a smaller file size.
Q_4_K_M.gguf (4-bit K-quants Mixed-precision GPT-Generated Unified Format) – 6.8 GB
Through algorithmic improvements, it offers higher precision than standard Q_4.
NF4 (Normal Float 4-bit) – 16.6 GB
It is lighter than FP8 and is said to offer superior image quality. It can be processed quickly on GPUs starting from the RTX 2000 series.
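
As a rough sanity check on these file sizes, a model's size can be estimated as parameter count times bits per weight. The sketch below assumes Flux.1's diffusion transformer has about 12 billion parameters and that the .gguf files store only the transformer (while the FP16 checkpoint also bundles the text encoders and VAE); the bits-per-weight figures are approximations, not values from this article.

```python
# Rough file-size estimate: parameters * bits per weight / 8 bits per byte.
# Assumption (not from the article): Flux.1's transformer has ~12e9 params.
TRANSFORMER_PARAMS = 12e9

# Approximate average bits per weight for each format.
BITS_PER_WEIGHT = {
    "FP16": 16,
    "FP8": 8,
    "Q_8": 8.5,       # 8-bit weights plus per-block scales
    "Q_4_K_M": 4.85,  # mixed 4/6-bit blocks plus scales
}

def estimate_gb(params: float, bits: float) -> float:
    """Convert a parameter count and bit width to gigabytes."""
    return params * bits / 8 / 1e9

for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name}: ~{estimate_gb(TRANSFORMER_PARAMS, bits):.1f} GB")
```

The Q_8 estimate (~12.8 GB) lines up well with the 12.7 GB file above, while the FP16 estimate (~24 GB) falls short of 33.7 GB because the full checkpoint also contains the T5/CLIP text encoders and the VAE.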
Image Comparison
Now, let’s compare some actual images.
FP16

FP8

Q_8.gguf

Q_4_K_M.gguf

NF4

FP16 is Still the Best Quality
When comparing each image, focusing on the neon text and the red paint on the motorcycle makes it easier to see the differences.
While Q_8 is only about a third the size of FP16, it reproduces the neon text most faithfully.
As for the paint, FP16 looks the best, while the other formats simplify the gradient due to compression.
Personally, it reminds me of how increasing JPEG compression fades colors.
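
The JPEG analogy is apt: quantization maps each weight onto a small set of discrete levels, which flattens smooth gradients. Here is a minimal sketch of absmax 8-bit quantization, the basic idea behind formats like Q_8 (the real GGUF quantizers work per block and are more elaborate):

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Absmax quantization: map floats onto 256 integer levels."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the integers."""
    return q.astype(np.float32) * scale

weights = np.array([0.013, -0.270, 0.991, -0.004, 0.125], dtype=np.float32)
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each weight is off by at most half a quantization step (scale / 2);
# with 4-bit formats the step is much coarser, so gradients flatten more.
print(np.abs(restored - weights).max() <= scale / 2)
```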
Comparing Q_4_K_M and NF4
The NF4 format significantly reduces fidelity to the original.
Compared side by side, Q_4_K_M has the better overall texture.
However, in my environment, NF4 used slightly less VRAM than Q_4_K_M.
The optimal setting may vary depending on the user, so it’s a good idea to try both and choose what suits your system best.
EasyForge Now Supports gguf Conversion!
As of August 25, 2024, EasyForge added support for gguf conversion.
Now, you can convert the original FP16 format into any gguf format of your choice.
Let’s Find the Best Compression Method!
Here’s a site comparing the precision of the gguf format. I’ve also referenced a clear graph from another site.

According to the graph, Q6_K, Q5_K_M, and Q4_K_M strike the best balance between size and precision in gguf format.
I Compared Each Format!
Here are the results of comparing each format.
Q_8

Q_6_K

Q_5_K_M

Q_4_K_M

Q_8 Retains the Most Detail
Since these are all gguf format, the differences aren’t huge, but Q_8 clearly retains the most detail.
The motorcycle wheels show the most noticeable differences as you increase compression.
Although there are differences, even the lightest Q_4_K_M still manages to represent Flux.1’s high-quality textures at a fairly high level.
VRAM Usage is the Key Factor
The biggest bottleneck when using Flux.1 is VRAM usage.
If the VRAM usage exceeds the limit, generation speed drops significantly, so you should choose a compression format that matches your GPU’s VRAM capacity.
In Stable Diffusion webUI Forge, for mid-range or lower GPUs, Q_4 is often the most appropriate.
Flux.1 and VRAM usage are summarized in another article.
In ComfyUI, you can use the Q_8 format with as little as 6 GB of VRAM.
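
As a rough rule of thumb only (the actual cutoffs depend on resolution, LoRAs, and the UI you use), format selection by VRAM could be sketched like this. The thresholds below are my assumptions, not measured limits:

```python
# Hypothetical helper: pick a Flux.1 compression format from available VRAM.
# The GB thresholds are rough assumptions, not measured cutoffs.
def pick_format(vram_gb: float) -> str:
    if vram_gb >= 24:
        return "FP16"      # full precision fits comfortably
    if vram_gb >= 16:
        return "Q_8"       # near-FP16 quality at about a third of the size
    if vram_gb >= 8:
        return "Q_5_K_M"   # good balance of size and precision
    return "Q_4_K_M"       # lightest option for low-VRAM GPUs

print(pick_format(12))  # → Q_5_K_M
```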
Is Upscaling Effective?
Next, I’ll try upscaling with Flux.1.
The original resolution of the image this time is 1536 x 1536, but I’ll use Hires.fix to upscale it by 1.55x to 2376 x 2376.
The upscaler used is 4x_NMKD-Superscale-SP_178000_G, which gives the most natural finish.
In my Stable Diffusion webUI Forge environment, VRAM usage exceeded the limit with Q_6_K, so I'll compare Q_5_K_M and lower.
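
Incidentally, the 2376 figure presumably comes from multiplying the base size by the upscale factor and snapping down to a multiple of 8, as latent-space models require (1536 × 1.55 = 2380.8). A small sketch, assuming the UI rounds this way:

```python
def hires_target(base: int, scale: float, multiple: int = 8) -> int:
    """Scale a dimension and snap it down to the nearest multiple."""
    return int(base * scale) // multiple * multiple

print(hires_target(1536, 1.55))  # 1536 * 1.55 = 2380.8 -> 2376
```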
Q_5_K_M Original Image (1536 x 1536)

Q_5_K_M High-Resolution (2376 x 2376)

Q_4_K_M High-Resolution (2376 x 2376)

Flux.1 Becomes Even More Beautiful with Upscaling!
Upscaling further enhances the texture.
However, since Flux.1 is already quite clear, I’ll need to adjust prompts in the future to prevent it from becoming too sharp when upscaled.
When comparing compression formats, there is a noticeable difference in background detail between Q_5_K_M and Q_4_K_M.
With Q_4_K_M, many of the distant buildings are omitted.
Although upscaling makes VRAM management stricter, I will continue to explore the best balance between compression ratio and resolution.
Summary
About the compression formats for Flux.1:
- Compression formats affect image quality.
- Upscaling is also effective with Flux.1.
- Choose according to your VRAM.
AI image generation is progressing at an incredible speed. It’s only been a month since the release of Flux.1, and in that time, new technologies have been released one after another.
- August 1, 2024: Flux.1 released
- August 9, 2024: Forge added support for Flux.1 and NF4 format
- August 11, 2024: EasyForge released
- August 15, 2024: Forge added support for gguf format
- August 25, 2024: EasyForge added gguf conversion support
It’s unbelievable that so much new technology is being released for free on a weekly basis.
Has there ever been a more exciting time?
The journey to explore the potential of Flux.1 is far from over.
Thank you for reading to the end!
Four-Panel Comic Prompt

- A high school boy aspiring to be a manga artist, with brown hair and wearing a white T-shirt with "comic" written in red, is looking at his laptop in his room thinking, "FP16 is beautiful but heavy."
- Next, he looks up and thinks, "Q_8 and Q_4 are much lighter."
- Then, he looks to the side thinking, "NF4 is fast but loses accuracy."
- Finally, with a smile, he concludes, "It all depends on VRAM!"
Generated with Anifusion.