[Flux.1] It Runs Smoothly with 6GB VRAM! A Recommendation for Using --novram

Columbus 4-frame cartoon_result.png (1133×1600)
  • Flux.1 runs smoothly with 16GB VRAM
  • With --novram, it works with just 6GB VRAM
  • Ideally, have 32GB system RAM

Introduction

Hello, I'm Easygoing.

I write articles about AI image generation, and recently, my main focus has been on Flux.1.

Flux.1 is a high-quality AI image generation model, but its heavy VRAM usage is a drawback.

Today, I will discuss ways to run Flux.1 smoothly on low-spec PCs.

Conclusion: Use the --novram Option

To get straight to the point, Flux.1 runs smoothly with 6GB of VRAM when using the --novram option!

Illustration of a sailing ship navigating the rough Atlantic Ocean_result.png (1600×1600)

Test Environment

The testing was done in the following environment:

  • ComfyUI
  • System RAM: 32GB
  • GPU: RTX 4060 Ti (16GB VRAM)

Workflow: Rendering (1440 × 1440) → Upscale (× 1.8) → Rerender (2592 × 2592)

The workflow I used involved generating images followed by upscaling to create high-resolution illustrations.

I measured VRAM usage at each step and plotted the results in graphs.
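
For reference, VRAM usage on NVIDIA GPUs can be sampled from the command line while a workflow runs. A minimal sketch of one way to do it (my own illustration, not necessarily how the graphs below were produced):

```
# Print the GPU's used VRAM once per second until interrupted
nvidia-smi --query-gpu=memory.used --format=csv,noheader --loop=1
```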

Results

First, I checked the VRAM usage in the original FP16 format.

  • T5xxl-fp16: 9.6 GB
  • blue_pencil-flux1_v001-fp16: 33 GB

VRAM usage graph_result.png (800×1131)

The first generation is shown by the blue graph, and the second by the green graph.

In both cases, VRAM usage reached 15 GB, meaning that on a GPU with less than 12 GB of VRAM, generation time will increase dramatically.

Lowering Resolution Doesn’t Help Save VRAM?

To explore ways to generate images using less VRAM, I tried lowering the resolution.

VRAM usage graph at a resolution of 128 x 128_result.png (800×1131)

When the resolution was set to 128 x 128 (1/64th the original size), the image generation time was reduced, but the VRAM usage remained largely unchanged.

This shows that lowering the resolution does not significantly reduce VRAM usage.
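
To see why, compare the size of the model weights with the size of the latent image. A back-of-the-envelope sketch in Python (my own illustration; it assumes Flux.1's roughly 12-billion-parameter transformer, an 8x VAE downscale, and 16 latent channels, and it ignores intermediate activations):

```python
# Model weights dwarf the latent image, so resolution barely moves VRAM.
def latent_mib(pixels: int) -> float:
    side = pixels // 8                   # VAE downscales 8x per side
    return side * side * 16 * 2 / 2**20  # 16 channels, 2 bytes (fp16)

weights_gib = 12e9 * 2 / 2**30           # ~12B params, 2 bytes each
print(f"fp16 weights:   ~{weights_gib:.0f} GiB")
print(f"latent 1440 px: ~{latent_mib(1440):.2f} MiB")
print(f"latent 128 px:  ~{latent_mib(128):.4f} MiB")
```

Even at 1440 × 1440 the latent is around 1 MiB, against roughly 22 GiB of fp16 weights, so shrinking the image to 128 × 128 saves almost nothing.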

I’ll explore other ways to save VRAM.

--lowvram and --novram Options

ComfyUI offers two launch options for reducing VRAM usage on low-VRAM systems.

--lowvram

The --lowvram option transfers part of the model to system RAM to reduce VRAM usage.

It can save VRAM but slows down the overall process.

--novram

The --novram option is more aggressive in saving VRAM.

Although VRAM usage becomes extremely low, the processing speed slows down significantly.
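
Both are flags passed to ComfyUI at startup. A quick sketch of the two launch commands, assuming a manual ComfyUI install started with python main.py (Stability Matrix users can instead use the checkbox shown at the end of this article):

```
python main.py --lowvram   # offload part of the model to system RAM
python main.py --novram    # keep essentially all model data in system RAM
```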

--lowvram is Barely Effective!

First, I checked VRAM usage with the --lowvram option.

  • T5xxl-fp16: 9.6 GB
  • blue_pencil-flux1_v001-fp16: 33 GB

VRAM usage graph for --lowvram and --novram conditions_result.png (800×1131)

With the --lowvram setting, there was hardly any change in VRAM usage.

When I tested --lowvram with SDXL, I did see some VRAM savings, so the option may simply not be well optimized for Flux.1 yet, perhaps because the model is still new.

--novram is Highly Effective!

In the same graph, the --novram setting significantly reduced VRAM usage, keeping it almost within 6GB.

The only time VRAM usage exceeded 6GB was during the upscaling step, which is brief, and which can be skipped entirely if 1440 x 1440 output is sufficient.

Sailors on watch on a sailing ship_result.png (1600×1600)

However, since --novram moves model data out of VRAM and into system RAM, system RAM was fully occupied at 32GB.

By optimizing system RAM usage, it might be possible to generate images even faster.

Changing the Checkpoint to gguf Format

To save system RAM, I tried switching to a smaller gguf checkpoint.

  • T5xxl-fp16: 9.6 GB
  • blue_pencil-flux1_v001-Q_8.gguf: 12.5 GB (-20 GB)
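
Note that gguf checkpoints are not read by ComfyUI's standard checkpoint loader; a loader extension is required. One widely used option is city96's ComfyUI-GGUF custom node (an assumption on my part; the article doesn't state which loader was used):

```
cd ComfyUI/custom_nodes
git clone https://github.com/city96/ComfyUI-GGUF
```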

By using a smaller model, system RAM usage stayed below 32GB, and generation time was reduced by 2 minutes.

Illustration of a sailing ship in FP16 format_result.png (1600×1600)
FP16
Illustration of a sailing ship in Q_8.gguf format_result.png (1600×1600)
Q_8

The generation time was almost the same as with 16GB VRAM, and there was little to no degradation in image quality.

The gguf format is very effective in saving RAM.

Reducing System RAM to 16GB

Next, I tested it under conditions where system RAM was reduced.

I removed one stick of RAM, reducing system RAM to 16GB while keeping the --novram setting.

VRAM usage graph for Q_8 checkpoint with --novram and 16GB RAM_result.png (800×1131)

With system RAM maxed out at 16GB, generation time increased by 2 minutes due to memory shortages.

This suggests that further reducing the model size could speed things up again.

Further Model Optimization

To further reduce system RAM usage, I tried lightening the models even more.

  • T5xxl-Q_5_K_M.gguf: 3.4 GB (-6.2 GB)
  • blue_pencil-flux1_v001-Q_4_K_M.gguf: 6.8 GB (-5.7 GB)

VRAM usage graph for Q_5 and Q_4 models with --novram and 16GB RAM_result.png (800×1131)

By lightening the model, the generation time returned to what it was originally.

However, reducing it to Q_4 resulted in less detailed images.

Illustration of a sailing ship at anchor in Q_8.gguf format_result.png (1600×1600)
Q_8
Illustration of a sailing ship at anchor in Q_4_K_M.gguf format_result.png (1600×1600)
Q_4

In a system with 16GB RAM, using Q_4 for normal use and Q_8 for higher quality seems like a good balance.

Summary of Test Results

Here’s a summary of the test results in a table.

| Installed RAM | t5xxl | Flux.1 checkpoint | Option | Render VRAM (GB) | Upscale VRAM (GB) | Rerender VRAM (GB) | System RAM (GB) | Time (min:sec) |
|---|---|---|---|---|---|---|---|---|
| 32 GB | FP16 | FP16 | First draw | 15.1 | 11.8 | 14.1 | 24.4 | 11:01 |
| 32 GB | FP16 | FP16 | Second draw | 14.8 | 11.6 | 14.1 | 19.1 | 09:22 |
| 32 GB | FP16 | FP16 | 128 x 128 | 14.6 | 14.6 | 14.6 | 22.6 | 03:28 |
| 32 GB | FP16 | FP16 | --lowvram | 15.0 | 12.0 | 14.2 | 24.4 | 11:19 |
| 32 GB | FP16 | FP16 | --novram | 2.6 | 6.3 | 5.8 | 30.4 | 12:21 |
| 32 GB | FP16 | Q_8 | --novram | 2.4 | 6.3 | 4.9 | 26.1 | 10:13 |
| 16 GB | FP16 | Q_8 | --novram | 2.8 | 6.4 | 5.2 | 15.1 | 12:08 |
| 16 GB | Q_5_K_M | Q_4_K_M | --novram | 2.5 | 6.5 | 5.8 | 11.5 | 09:28 |

Flux.1 can run sufficiently with 6GB VRAM by using --novram and gguf format.

While 16GB system RAM is enough to generate images, 32GB is ideal for balancing quality and speed.

The tests revealed that --novram offers significant benefits with almost no downsides.

By using --novram and adjusting model sizes, it’s possible to maintain nearly original speeds.

Recommended Settings

Based on the test results, here are the recommended settings.

VRAM 16GB or More

  • T5xxl-fp16
  • Checkpoint fp16

VRAM 6GB to 12GB

  • --novram setting
  • T5xxl-fp16
  • Checkpoint Q_8.gguf

System RAM Less than 16GB

  • T5xxl-Q_5_K_M.gguf
  • Checkpoint Q_4_K_M.gguf
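
If you use a gguf loader such as the ComfyUI-GGUF node mentioned above, the quantized files typically go into the following folders (an assumption based on that extension's conventions; other loaders may expect different paths):

```
ComfyUI/models/unet/   <- Flux.1 checkpoint (.gguf)
ComfyUI/models/clip/   <- t5xxl text encoder (.gguf)
```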
Illustration of a sailing ship returning

--novram Settings

Here’s how to enable the --novram setting in Stability Matrix, which I’m using.

Stability Matrix Settings Screen_result.png (1600×1091)
Stability Matrix's --novram setting Commented_result.png (1600×1091)

After launching Stability Matrix, press the settings button for ComfyUI and scroll down to find the --novram checkbox.

Once you check this box and launch ComfyUI, you’re good to go!
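
If you run ComfyUI directly rather than through Stability Matrix, the same setting is simply a launch flag (assuming a standard manual install):

```
cd ComfyUI
python main.py --novram
```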

Conclusion

  • Flux.1 runs smoothly with 16GB VRAM
  • With --novram, it works with just 6GB VRAM
  • Ideally, have 32GB system RAM
Illustration of a ship in a gulf

Thank you for reading to the end!