Keywords Detect Cats Too! Florence-2’s Image Analysis is This Useful!

SDXL #AuraFlow #Florence-2 #Retouch #SDXL

2025-2-14 / 2025-2-17

Image (1600×1600): an animated female character with short brown hair and blue eyes wears a black hat with golden leaves and a blue ribbon. She holds a wand and stands before a circular window.

SDXL produces soft-tone illustrations
AuraFlow converts them to hard-tone
Florence-2 detects objects for precise retouching

Introduction

Hello, this is Easygoing.

This time, we’ll explore how to make the most of the high-performance image recognition AI Florence-2.

Theme: Black Cat and Witch

The theme for this experiment is a black cat and a witch.

Image (1600×1600): an animated female character resembling a witch stands before a window in a forest, holding a staff with a blue crystal, while a black cat peers from behind her.

Let’s create an illustration of a witch living in a fantasy world with her familiar, a black cat.

What Are Soft-Tone and Hard-Tone Illustrations?

Illustrations can generally be categorized into two types based on their atmosphere: soft-tone and hard-tone.

Soft-Tone

Soft and gentle touch
Low contrast with little difference between light and dark
Creates a warm and dreamy atmosphere

Hard-Tone

Vivid colors with sharp outlines
High contrast with distinct light and dark areas
Suitable for dramatic or intense scenes
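The contrast difference between the two tones can be put into numbers. Below is a minimal sketch that computes RMS contrast (the standard deviation of approximate brightness) over a list of RGB pixels. The function names and the sample pixel values are my own illustration, not part of any tool used in this post.

```python
import math


def luminance(rgb):
    """Approximate brightness of an 8-bit RGB pixel, scaled to 0..1."""
    r, g, b = rgb
    return (0.2126 * r + 0.7152 * g + 0.0722 * b) / 255.0


def rms_contrast(pixels):
    """Standard deviation of brightness: low for soft-tone, high for hard-tone."""
    values = [luminance(p) for p in pixels]
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


# Hypothetical samples: two mid-grays versus pure black and pure white.
soft_pixels = [(120, 120, 120), (136, 136, 136)]
hard_pixels = [(0, 0, 0), (255, 255, 255)]
print(rms_contrast(soft_pixels), rms_contrast(hard_pixels))
```

A soft-tone image should score clearly lower than a hard-tone version of the same scene, although a single number cannot capture outline sharpness or color vividness.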

Soft-Tone vs. Hard-Tone Varies by Model!

The softness or hardness of an AI-generated image depends on the model used.

Here’s my personal impression of how different models handle these tones:

I personally prefer hard-tone illustrations with a strong sense of presence. However, models like Flux.1, which specialize in hard-tone rendering, tend to lack image variation.

So, in this experiment, I’ll explore ways to convert a soft-tone illustration into a hard-tone one.

Flowchart!

Here is the flowchart I used for this experiment.


flowchart TB
subgraph S1["SDXL"]
A1("Original Image<br>(anima_pencil-XL)")
end
subgraph S2["AuraFlow"]
B1("Hard-Toning<br>(AuraFlow)")
end
subgraph S3["SDXL"]
C1("Character & Cat Adjustment<br>(anima_pencil-XL)")
end
subgraph S4["SDXL"]
D1("Individual Fixes for Face, Hands & Cat<br>(blue_pencil-XL)")
end
subgraph S5["Flux.1"]
E1("Eye Redraw<br>(blue_pencil-flux1)")
end
F1(Completed)
A1-->|HDR Processing<br>SuperBeast|B1
B1-->|High-Resolution Scaling|C1
C1-->D1
D1-->E1
E1-->|HDR Processing<br>SuperBeast|F1

The models used in this workflow are as follows:

SDXL

anima_pencil-XL-v5.0.0 (Initial Sketch)
blue_pencil-XL-v7.0.0 (Retouching)

AuraFlow

AuraFlow_0.3

Flux.1

blue_pencil-flux1-v0.0.1

Final Illustrations!

Let’s take a look at the actual illustrations.

Initial Sketch (SDXL)

SDXL_Rough_original, Rough_ HDR. Left: Original Illustration | Right: HDR Processing (SuperBeast)

First, we start with an SDXL-generated illustration.

SDXL models produce diverse compositions and expressive characters, but the overall tone tends to be soft.

To shift toward a hard tone, I first applied a simple HDR pass.

From here, we’ll redraw the illustration using different models.

Thick Paint Style (AuraFlow)

Next, I switch to AuraFlow, a model that excels in hard-tone rendering, and recreate the illustration in a thick-painting style.

Flux.1 produces the hardest tone of the models I use, but applying it this early locks the illustration in too firmly and makes later changes difficult.
For this reason, I chose AuraFlow instead.

For this step, I used AuraFlow with oil painting prompts to create a heavier, richer tone.

Additionally, I intentionally disrupted the image structure by using the heunpp2 sampler instead of the recommended uni_pc.

Previously, I used the SDXL Refiner for this step, but recently I’ve switched to AuraFlow, which provides better depth and color representation.

The True Value of the Refiner! What Emerges from Destructive Creation | AI image journey

High-Resolution Enhancement & Retouching

Next, I enhance the resolution and retouch the illustration.

First, I use an AI upscaler to increase the resolution.

High-Resolution Processing:

Original: 1024 x 1024 → Upscaled: 2457 x 2457 (x2.4 scale)
Upscaler Model: 4x_NMKD-Superscale-SP_178000_G
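The scale factor can be checked directly from the two sizes listed above. This is a small sketch with hypothetical helper names. It assumes the target side length is truncated to a whole pixel, which may not match the rounding the actual workflow uses.

```python
def implied_scale(original_side, upscaled_side):
    """Scale factor implied by two side lengths."""
    return upscaled_side / original_side


def upscaled_side(original_side, scale):
    """Target side length for a given scale, truncated to whole pixels (assumption)."""
    return int(original_side * scale)


print(round(implied_scale(1024, 2457), 2))  # ratio of the two listed sizes
print(upscaled_side(1024, 2.4))             # side length for a x2.4 target
```

Since the upscaler is a 4x model, its native output for a 1024 px input would be 4096 px, so the workflow presumably resizes that result down to the target size afterwards.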

Now that the resolution is improved, I’ll move on to retouching the illustration using Florence-2, which I introduced in my previous post.

Detecting and Retouching with Florence-2!

Previously, I used Florence-2 to generate image captions.
This time, I will use it to detect objects by specifying keywords.

First, I will lightly redraw the illustration's main subjects: the cat and the woman.

Detection Keywords

girl
cat
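For readers who want to try keyword detection outside ComfyUI, here is a rough sketch based on the usage shown on the Florence-2 model card for Hugging Face transformers. I ran everything through ComfyUI nodes, so this is an assumption about an equivalent setup, not the workflow itself. The choice of the `<CAPTION_TO_PHRASE_GROUNDING>` task and the helper names `detect` and `extract_boxes` are also assumptions.

```python
TASK = "<CAPTION_TO_PHRASE_GROUNDING>"  # Florence-2 task token for phrase grounding


def extract_boxes(parsed, task=TASK):
    """Turn Florence-2's parsed output into a list of (label, (x1, y1, x2, y2))."""
    result = parsed.get(task, {})
    boxes = result.get("bboxes", [])
    labels = result.get("labels", [])
    return [(label, tuple(box)) for label, box in zip(labels, boxes)]


def detect(image, keyword, model_id="microsoft/Florence-2-large"):
    """Run phrase grounding for one keyword on a PIL image.

    Imports are local so extract_boxes works without the heavy dependencies.
    The model is reloaded on every call to keep the sketch short; cache it in real use.
    """
    from transformers import AutoModelForCausalLM, AutoProcessor

    # trust_remote_code=True runs Python code shipped in the model repository.
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

    inputs = processor(text=TASK + keyword, images=image, return_tensors="pt")
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
        do_sample=False,
    )
    text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    parsed = processor.post_process_generation(
        text, task=TASK, image_size=(image.width, image.height)
    )
    return extract_boxes(parsed)
```

Calling `detect(image, "cat")` and `detect(image, "girl")` on a PIL image would return one list of labeled boxes per keyword, which can then drive the retouching step.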

body_detect, body_redraw. Left: Object Detection | Right: Retouched with SDXL

It may be hard to see at this size, but in the left image the keywords girl and cat each triggered a detection, and the detected areas are marked with red boxes.

By applying slight retouching with SDXL on the detected areas, I was able to refine the main subjects while preserving the overall hard tone.
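Retouching only the detected area usually works better when the box is enlarged a little, so the redrawn region blends into its surroundings. The sketch below shows one way to pad a detection box and clamp it to the image. The function name and the 25% padding are hypothetical values for illustration, not settings from my workflow.

```python
import math


def padded_region(box, image_size, pad_ratio=0.25):
    """Expand an (x1, y1, x2, y2) box by pad_ratio per side, clamped to the image."""
    x1, y1, x2, y2 = box
    width, height = image_size
    pad_x = (x2 - x1) * pad_ratio
    pad_y = (y2 - y1) * pad_ratio
    left = max(0, math.floor(x1 - pad_x))
    top = max(0, math.floor(y1 - pad_y))
    right = min(width, math.ceil(x2 + pad_x))
    bottom = min(height, math.ceil(y2 + pad_y))
    return (left, top, right, bottom)


# Hypothetical box on the 2457 x 2457 upscaled image.
print(padded_region((100, 100, 200, 200), (2457, 2457)))
```

The returned tuple is in the same left, top, right, bottom order that Pillow's `Image.crop` expects, so the region can be cropped, redrawn, and pasted back at the same position.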

Retouching Specific Parts

Next, I will make individual corrections to important parts of the illustration.

Retouching the Cat

Keyword: cat

cat_detect, cat_redraw. Left: Cat Detected | Right: Retouched with SDXL

Similar to the previous step, I detected the cat using the keyword and applied stronger retouching this time.

The change is hard to see at this size, but zooming in makes the refinements to the cat quite clear.

Retouching the Face

Keyword: human face

face_detect, face_redraw. Left: Face Detected | Right: Retouched with SDXL

Next, I refined the character’s face.

For the base illustration, I used my usual anima_pencil-XL model, but for detailed adjustments, I switched to blue_pencil-XL, which is more stable and closer to SDXL_Base.

As you can see in the right image, the character’s face has been significantly improved.

Retouching the Hands

Keyword: hand

hand_detect, hand_redraw. Left: Hand Detected | Right: Retouched with SDXL

I used blue_pencil-XL again to refine the character’s hands.

AI-generated illustrations often struggle with hands, but by individually detecting and correcting them, we can significantly improve their accuracy.

Retouching the Eyes

Keyword: eye

Finally, I refined the eyes, which are the most important part of an illustration.
For this, I used the Flux.1 model, which provides the clearest details.

By redrawing the eyes with Flux.1, the pupils became sharper, significantly improving the character’s overall impression!

The Flux.1[dev] model offers incredible texture quality, but since it cannot be used commercially, I use blue_pencil-XL instead when commercial use is required.
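The retouching passes described so far can be summarized as plain data: one detection keyword and one model per pass, in the order they were applied. The keywords and model names below come from this post; the `RetouchPass` structure and the grouping helper are only an illustrative sketch, and retouch strengths are left out because they differ per pass.

```python
from collections import namedtuple

RetouchPass = namedtuple("RetouchPass", ["keyword", "model"])

# Passes in the order used in this post.
PASSES = [
    RetouchPass("girl", "anima_pencil-XL"),
    RetouchPass("cat", "anima_pencil-XL"),
    RetouchPass("cat", "blue_pencil-XL"),
    RetouchPass("human face", "blue_pencil-XL"),
    RetouchPass("hand", "blue_pencil-XL"),
    RetouchPass("eye", "blue_pencil-flux1"),
]


def group_by_model(passes):
    """Group consecutive passes that share a model, so each model loads once."""
    groups = []
    for item in passes:
        if groups and groups[-1][0] == item.model:
            groups[-1][1].append(item.keyword)
        else:
            groups.append((item.model, [item.keyword]))
    return groups


for model, keywords in group_by_model(PASSES):
    print(model, keywords)
```

Keeping the passes in this order means the broad adjustments happen first and the finest detail, the eyes, is redrawn last.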

Final HDR Processing and Completion!

Finally, I applied HDR processing once more to complete the illustration.

The result retains the softness and composition of SDXL while achieving a sharper hard-tone expression.

Additionally, by intentionally distorting the entire illustration with AuraFlow before refining the main subjects, I was able to draw the viewer’s attention to key elements like the character and cat.

The improved textures and refined facial expressions make this a satisfying result!

Florence-2 Is Amazing!

This time, I used Florence-2 for all object detection.

In the past, I relied on YOLO (You Only Look Once) models, which are specialized AI models for individual part detection.
However, using YOLO required **separate models for each part** to be detected.

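Many pre-trained YOLO detailer models are available for faces, eyes, and hands, but I have not found one made specifically for cats. General-purpose YOLO models trained on the COCO dataset do include a cat class, though they are not tuned for illustrations.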

Image: a young woman with long brown hair and blue eyes wears a blue dress and a hat with a blue flower, holding a golden staff, with a black cat perched on her hat.

Florence-2 is slightly less precise than dedicated YOLO models, but it offers the huge advantage of recognizing any object just by specifying a keyword.
This makes it far more versatile!

Florence-2 Is Secure!

Another major advantage of Florence-2 is security.

Since Florence-2 was developed and released by Microsoft, it is a trusted and safe model.

In contrast, YOLO models are created by various developers and uploaded to Hugging Face, where not all models are guaranteed to be safe.

Image: Hugging Face's YOLO11 model download page, annotated. Unsafe models are marked in red.

In December 2024, cryptocurrency-mining malware was found in the Ultralytics package, which is used to run YOLOv8 models in ComfyUI.

The malware was not inside the YOLO model files themselves. It was injected into the package during distribution and reached users through pip.

As AI tools become more widely used, it is more important than ever to download models from trusted sources.
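One simple habit that supports this is comparing a downloaded file against the SHA-256 hash shown by its publisher, where one is listed. Here is a minimal sketch using Python's standard `hashlib`; the helper names are my own. A matching hash only shows that the file is the one the publisher uploaded, not that its contents are safe.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large model files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare the file's hash with a published one, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash("model.safetensors", published_hash)` would return `True` only if the local file is byte-for-byte identical to the published one. Both the file name and `published_hash` are placeholders here.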

Workflow!

Now, let’s look at the Florence-2 workflow.

Since this workflow is complex and uses multiple models, I’m providing a simplified SDXL-only version that focuses on fixing the cat, face, and hands.