Is Flux.1 Getting Faster? Speedy Generation with WaveSpeed! What About Quality?


2025-1-17 / 2025-2-17

Flux1_heunpp2_ A girl in a school uniform standing at a train station at night with a train in the background.png (1600×1600)

Dynamic Caching improves speed but can reduce quality.
With the right settings, quality degradation can be minimized.
Using start = 0.2, end = 0.8, max_consecutive_cache_hits = 5 is better than leaving everything to default.

Animated Illustration of A woman in a business suit pointing at a train at night with a cityscape in the background.png (2480×1753)

Introduction

Hello, this is Easygoing.

Today, I’ll be testing WaveSpeed, a groundbreaking technology designed to accelerate image generation.

What Is WaveSpeed?

WaveSpeed is a revolutionary technology for dramatically speeding up image generation, released on January 8, 2025.

chengzeyi/Comfy-WaveSpeed: The all in one inference optimization solution for ComfyUI, universal, flexible, and fast.

WaveSpeed is provided as a custom node for ComfyUI, so it works only within ComfyUI.

It supports image generation models such as Flux.1, SD 3.5, and SDXL, as well as video generation models like HunyuanVideo and LTXV.

How Does WaveSpeed Work?

WaveSpeed achieves its speed boost using two key mechanisms:

Dynamic Caching (First Block Cache)
Enhanced torch.compile

For simplicity, we’ll focus on Dynamic Caching (First Block Cache), which is easy to set up—just install a single custom node.

Let’s evaluate its performance.

Animated Illustration of Three anime characters in school uniforms standing in front of a train at sunset.png (2480×1753)

How Does Dynamic Caching Work?

Dynamic Caching’s mechanism is complex, so I asked ChatGPT to explain it based on the source code:

What Is Dynamic Cache?
Dynamic Cache stores intermediate computation results during inference and reuses them to skip redundant calculations, significantly reducing processing time.

How It Works

Results from the first inference are saved in a cache.

For subsequent inferences with the same conditions, cached results are reused, skipping recalculations.

Benefits

Drastically shortens processing time.

Particularly effective for repeated inferences with identical prompts or settings.

Use Cases

Ideal for scenarios requiring high-speed processing or efficient repeated inference.

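In summary, Dynamic Caching skips computation by reusing previous results when the conditions suggest that little has changed. In First Block Cache, this reuse happens between consecutive sampling steps of a single generation: when the output of the first transformer block barely differs from that of the previous step, the remaining blocks are skipped and the cached result is reused.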

A girl in a school uniform stands in a train station at night holding a briefcase

The Key: Residual Diff Threshold (RDT)

The Residual Diff Threshold (RDT) determines how much deviation is acceptable before skipping calculations.

What Is RDT?

Residual: The difference between actual and predicted/cached values.
Diff: The magnitude of the difference.
Threshold: The boundary value that dictates when caching is applied.

For example:

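RDT = 0.1: Reuses the cache as long as the relative difference stays below 10%.
RDT = 1.0: Treats nearly every step as a cache hit, so almost all calculations are skipped.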
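The threshold check can be sketched in a few lines. This is a minimal illustration rather than WaveSpeed's actual code: the function names and the toy residual values are hypothetical, and the formula (mean absolute difference divided by the mean magnitude of the previous residual) is an assumption about how a relative difference of this kind is typically measured.

```python
def relative_diff(prev_residual, curr_residual):
    """Mean absolute change between two residuals, relative to the
    mean magnitude of the previous one (assumed metric, see lead-in)."""
    n = len(prev_residual)
    mean_abs_diff = sum(abs(p - c) for p, c in zip(prev_residual, curr_residual)) / n
    mean_abs_prev = sum(abs(p) for p in prev_residual) / n
    return mean_abs_diff / mean_abs_prev


def can_reuse_cache(prev_residual, curr_residual, rdt):
    """Return True when the change is small enough to skip recalculation."""
    return relative_diff(prev_residual, curr_residual) < rdt


# Toy values standing in for the first block's output at two consecutive steps.
prev_residual = [1.0, -2.0, 3.0, -4.0]
curr_residual = [1.1, -2.1, 3.1, -4.1]

print(relative_diff(prev_residual, curr_residual))          # roughly 0.04
print(can_reuse_cache(prev_residual, curr_residual, 0.06))  # cache is reused
print(can_reuse_cache(prev_residual, curr_residual, 0.02))  # step is recalculated
```

A larger RDT makes the check pass more often, which is why higher values trade quality for speed.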

Let’s see how RDT affects generation time and quality.

How Much Faster Does Dynamic Caching Make It?

First, here’s a graph summarizing the test results:

WaveSpeed_start_0_end_1_max_consecutive_cache_hits_-1.png (1200×848)

Horizontal axis: Time taken for image generation. Lower values indicate faster generation.
Vertical axis: Similarity to the original image, expressed as a percentage derived from MAE (Mean Absolute Error) and SSIM (Structural Similarity Index). Lower values indicate greater quality degradation.
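For reference, here is a minimal sketch of how an MAE-based similarity can be computed. The function names are hypothetical, and the conversion 1 - MAE / 255 is an assumption: it matches most rows of the results table in this article to within rounding, but it is not taken from the checker tool's source. SSIM is more involved and is left out.

```python
def mean_absolute_error(original, candidate):
    """Average absolute difference between two equally sized pixel sequences
    (8-bit channel values, 0-255)."""
    if len(original) != len(candidate):
        raise ValueError("images must have the same number of values")
    return sum(abs(a - b) for a, b in zip(original, candidate)) / len(original)


def mae_similarity_percent(mae, max_value=255):
    """Convert an MAE into a similarity percentage (assumed formula)."""
    return (1 - mae / max_value) * 100


# Toy pixel values, not taken from the article's images.
print(mean_absolute_error([0, 128, 255], [10, 128, 245]))

# MAE value from the RDT 0.04 measurement reported in this article.
print(round(mae_similarity_percent(6.19), 1))  # 97.6
```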

The graph shows that higher RDT values lead to faster generation, but at the cost of reduced image quality.

MAE and SSIM validation tools

Image Difference Checker | AI image journey

Comparing Actual Images

Let’s compare actual images generated with and without caching.

Original (No Caching)

Sampler: euler
Scheduler: normal
Steps: 30
Resolution: 1440 x 1440

residual_diff_threshold_0.00.png (1440×1440)

This is the original illustration, generated without using any caching.

Subsequent images will include a difference map (in black) on the right, showing the color differences from the original.

RDT: 0.04

residual_diff_threshold_0.04_start_0_end_1_diff.png (2065×1535)

With RDT set to 0.04, minor differences are visible in the difference map. Some subtle degradation has begun.

RDT: 0.06

residual_diff_threshold_0.06_start_0_end_1_diff.png (2065×1535)

At RDT 0.06, degradation is more noticeable, but still within acceptable limits.

RDT: 0.08

residual_diff_threshold_0.08_start_0_end_1_diff.png (2065×1535)

At RDT 0.08, significant changes appear. The foreground details are simplified, and the background becomes blurry. Overall, the image looks insufficiently denoised.

Although subtle in the smaller image, the degradation is more pronounced at the original resolution (1440 x 1440).

RDT: 0.16

residual_diff_threshold_0.16_start_0_end_1_diff.png (2065×1535)

At RDT 0.16, the illustration becomes heavily blurred and incomplete, making it unsuitable for use.

Results!

Here’s a summary of the measurements:

RDT	time (sec)	MAE	MAE similarity	SSIM	SSIM similarity
0.00	205	0	100.0 %	1	100.0 %
0.01	205	0	100.0 %	1	100.0 %
0.02	205	0	100.0 %	1	100.0 %
0.04	160	6.19	97.6 %	0.98	98.2 %
0.06	127	16.24	93.7 %	0.92	91.8 %
0.08	102	25.51	90.0 %	0.84	83.9 %
0.10	95	25.12	90.2 %	0.84	84.4 %
0.12	82	25.45	90.1 %	0.84	84.4 %
0.14	76	25.31	90.1 %	0.85	84.6 %
0.16	69	29.41	88.5 %	0.81	80.7 %
0.18	63	29.96	88.3 %	0.8	80.2 %
0.20	63	31.19	87.8 %	0.79	78.9 %
0.30	50	35.33	86.2 %	0.74	74.0 %
0.40	37	38.05	85.1 %	0.7	70.0 %
0.50	37	51.24	83.9 %	0.65	65.1 %
0.60	30	44.27	82.7 %	0.6	59.6 %
0.80	30	46.87	81.7 %	0.55	54.5 %
1.00	24	47.55	81.4 %	0.53	53.3 %

SSIM ≥ 95%: Changes are subtle and mostly tolerable.
SSIM < 85%: Degradation becomes clearly visible.

Based on these results, RDT 0.06 is generally the upper limit for tolerable degradation in most cases.

Dynamic Caching Has More Configuration Options!

The Apply First Block Cache node of Dynamic Caching offers several configuration options beyond RDT.

Apply First Block Cache Node.png (2272×1722)

start
- Defines when caching begins during the generation process.
end
- Specifies when caching ends during the generation process.
max_consecutive_cache_hits
- Limits the number of consecutive cache uses.
- Default is -1 (unlimited).

While RDT automatically decides when caching is applied, these settings allow for manual intervention.

ChatGPT’s Recommended Settings

Since these settings can be complex, I asked ChatGPT for its recommended values.

ChatGPT’s Suggested Configuration

start = 0.2
end = 0.8
max_consecutive_cache_hits = 5

Let’s test these values and see how they affect the results.

RDT: 0.04

residual_diff_threshold_0.04_start_0.2_end_0.8_diff.png (2065×1535)

With RDT = 0.04, image generation takes longer than with the previous start = 0, end = 1 configuration (173 seconds versus 160), but the resulting illustration is closer to the original.

RDT: 0.08

residual_diff_threshold_0.08_start_0.2_end_0.8_diff.png (2065×1535)

Previously, RDT = 0.08 caused noticeable degradation.
Now, with ChatGPT’s suggested settings, the MAE and SSIM similarity scores both exceed 97%, and the degradation is barely noticeable.

RDT: 0.16

residual_diff_threshold_0.16_start_0.2_end_0.8_diff.png (2065×1535)

Previously, RDT = 0.16 resulted in an incomplete illustration.
Now, the MAE and SSIM similarity scores stay at about 96%, making the illustration usable.

RDT: 1.0

residual_diff_threshold_1.0_start_0.2_end_0.8_diff.png (2065×1535)

At RDT = 1.0, every step inside the start-to-end window is served from the cache unless max_consecutive_cache_hits forces a recalculation.

In this configuration, out of 30 total steps, only steps 1–6, 12, 18, and 24–30 perform actual inference (15 steps), while the other 15 rely on cached results.

With half the steps skipped, generation time drops from 203 to 114 seconds, yet the MAE and SSIM similarity scores still exceed 95%, maintaining acceptable quality.
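The step pattern above can be reproduced with a small simulation. This is a sketch under stated assumptions, not the node's actual logic: the function name is hypothetical, the start/end window is assumed to cover steps where start × total < step ≤ end × total, and RDT = 1.0 is modelled as "always reuse the cache when allowed".

```python
def computed_steps(total_steps, start, end, max_consecutive_cache_hits):
    """Return the 1-based steps that run real inference when every eligible
    step is a cache hit (the RDT = 1.0 case)."""
    computed = []
    consecutive_hits = 0
    for step in range(1, total_steps + 1):
        # Caching is only allowed inside the start/end window (assumed bounds).
        in_window = start * total_steps < step <= end * total_steps
        # A negative limit means "unlimited" consecutive cache hits.
        under_limit = (max_consecutive_cache_hits < 0
                       or consecutive_hits < max_consecutive_cache_hits)
        # Nothing can be reused before at least one step has been computed.
        has_cache = bool(computed)
        if in_window and under_limit and has_cache:
            consecutive_hits += 1
        else:
            computed.append(step)
            consecutive_hits = 0
    return computed


steps = computed_steps(30, start=0.2, end=0.8, max_consecutive_cache_hits=5)
print(steps)       # [1, 2, 3, 4, 5, 6, 12, 18, 24, 25, 26, 27, 28, 29, 30]
print(len(steps))  # 15
```

Under these assumptions, the limit of five consecutive hits is what forces the recalculations at steps 12, 18, and 24.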

Results

Below are the actual results, compared side-by-side with the start = 0, end = 1 configuration.

WaveSpeed_start_0.2_end_0.8_max_consecutive_cache_hits_5 adjust scale.png (1200×848)

RDT	time (sec)	MAE	MAE similarity	SSIM	SSIM similarity
0.00	203	0	100.0 %	1	100.0 %
0.01	205	0	100.0 %	1	100.0 %
0.02	205	0	100.0 %	1	100.0 %
0.04	173	3.95	98.5 %	0.99	99.1 %
0.06	153	6.23	97.6 %	0.98	98.2 %
0.08	147	6.8	97.3 %	0.98	98.0 %
0.10	134	8.18	96.8 %	0.98	97.5 %
0.12	127	9.12	96.4 %	0.97	97.1 %
0.14	127	10.41	95.9 %	0.96	96.4 %
0.16	121	10.75	95.8 %	0.96	96.3 %
0.18	121	10.75	95.8 %	0.96	96.3 %
0.20	121	10.83	95.8 %	0.96	96.3 %
0.40	114	10.96	95.7 %	0.96	96.3 %
0.60	114	10.96	95.7 %	0.96	96.3 %
0.80	114	10.96	95.7 %	0.96	96.3 %
1.00	114	10.96	95.7 %	0.96	96.3 %

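The comparison shows that ChatGPT’s recommended configuration of start = 0.2, end = 0.8, max_consecutive_cache_hits = 5 achieves a better balance between speed and quality. At a generation time of 127 seconds, for example, SSIM similarity is 97.1% with these settings (RDT 0.12) versus 91.8% with the defaults (RDT 0.06).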
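One way to read the two tables together is to ask which setting is fastest while keeping SSIM similarity above a chosen floor. The sketch below does that for a subset of the measured rows. The helper name and the 95% floor are illustrative choices, and the rows are copied from the tables in this article as (RDT, seconds, SSIM similarity %).

```python
# Subset of the measurements with start = 0, end = 1, max_consecutive_cache_hits = -1.
DEFAULT_WINDOW = [
    (0.00, 205, 100.0), (0.04, 160, 98.2), (0.06, 127, 91.8),
    (0.08, 102, 83.9), (0.16, 69, 80.7), (1.00, 24, 53.3),
]
# Subset of the measurements with start = 0.2, end = 0.8, max_consecutive_cache_hits = 5.
TUNED_WINDOW = [
    (0.00, 203, 100.0), (0.04, 173, 99.1), (0.06, 153, 98.2),
    (0.08, 147, 98.0), (0.16, 121, 96.3), (1.00, 114, 96.3),
]


def fastest_within(rows, min_ssim_percent):
    """Pick the fastest row whose SSIM similarity meets the floor."""
    acceptable = [row for row in rows if row[2] >= min_ssim_percent]
    return min(acceptable, key=lambda row: row[1])


for name, rows in (("default", DEFAULT_WINDOW), ("tuned", TUNED_WINDOW)):
    rdt, seconds, ssim = fastest_within(rows, 95.0)
    baseline = rows[0][1]  # time without caching (RDT 0.00)
    print(f"{name}: RDT {rdt:.2f}, {seconds} s, SSIM {ssim}%, "
          f"speedup {baseline / seconds:.2f}x")
```

With a 95% floor, the default window tops out at RDT 0.04 (160 seconds), while the tuned window stays above the floor all the way down to 114 seconds.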

It’s Faster but Tricky!

While Dynamic Caching (First Block Cache) undeniably accelerates image generation, it’s a challenging feature to optimize.

Using only a high RDT with default settings can lead to significant quality degradation.

The recommended settings of start = 0.2, end = 0.8, max_consecutive_cache_hits = 5 offer a decent balance, but there’s still room for improvement.

Animated Illustration of A girl in a school uniform standing in front of a train and a train on a railway track.png (2480×1753)

Optimal settings will likely vary between architectures (Flux.1, SD 3.5, SDXL) and custom models.

Dynamic Caching is an innovative technology, but mastering it will require further experimentation and knowledge sharing.

Conclusion: Dynamic Caching Is Challenging

Dynamic Caching improves speed but can reduce quality.
With the right settings, quality degradation can be minimized.
Using start = 0.2, end = 0.8, max_consecutive_cache_hits = 5 is better than leaving everything to default.

This was a particularly complex evaluation.

Animated Illustration of A girl in a school uniform standing in a cityscape at night, with a high-speed train in the background.png (2480×1753)

In my opinion, Flux.1 is already highly optimized for speed through innovations like rectified flow (velocity prediction), making groundbreaking speed improvements difficult.

I plan to continue testing to determine the optimal settings in the future.

Thank you for reading until the end!