Keywords Can Even Detect Cats! Florence-2’s Image Analysis Is This Useful!

Image: an animated female character with short brown hair and blue eyes, wearing a black hat with golden leaves and a blue ribbon, holds a wand before a circular window (SDXL Detailer result)
  • SDXL produces soft-tone illustrations
  • AuraFlow converts them to hard-tone
  • Florence-2 detects objects for precise retouching

Introduction

Hello, this is Easygoing.

This time, we’ll explore how to make the most of Florence-2, a high-performance image recognition model.

Theme: Black Cat and Witch

The theme for this experiment is a black cat and a witch.

Image: a witch holding a staff with a blue crystal stands before a window in a forest while a black cat peers from behind her (SDXL Detailer result)

Let’s create an illustration of a witch living in a fantasy world with her familiar, a black cat.

What Are Soft-Tone and Hard-Tone Illustrations?

Illustrations can generally be categorized into two types based on their atmosphere: soft-tone and hard-tone.

Soft-Tone

Image: original rough illustration (SDXL)
  • Soft and gentle touch
  • Low contrast with little difference between light and dark
  • Creates a warm and dreamy atmosphere

Hard-Tone

Image: redrawn illustration (Flux.1)
  • Vivid colors with sharp outlines
  • High contrast with distinct light and dark areas
  • Suitable for dramatic or intense scenes
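One rough way to put a number on "little difference between light and dark" versus "distinct light and dark areas" is RMS contrast: the standard deviation of pixel luminance. The sketch below is my own illustration, not how the model comparison in the next section was made; the Rec. 709 luminance weights applied directly to gamma-encoded 8-bit values are a common shortcut, and the pixel lists are made-up toy data.

```python
import math
from typing import Iterable, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (R, G, B)

def luminance(rgb: Pixel) -> float:
    """Approximate luminance in [0, 1]: Rec. 709 weights on gamma-encoded values (a shortcut)."""
    r, g, b = rgb
    return (0.2126 * r + 0.7152 * g + 0.0722 * b) / 255.0

def rms_contrast(pixels: Iterable[Pixel]) -> float:
    """RMS contrast: the standard deviation of luminance across all pixels."""
    values = [luminance(p) for p in pixels]
    if not values:
        raise ValueError("need at least one pixel")
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

# Toy "images": a narrow band of mid-tones vs. dark shadows next to bright highlights.
soft = [(150, 140, 160), (170, 160, 180), (160, 150, 170), (180, 170, 190)]
hard = [(20, 15, 30), (235, 225, 240), (30, 25, 40), (245, 235, 250)]
print(f"soft: {rms_contrast(soft):.3f}  hard: {rms_contrast(hard):.3f}")
```

A soft-tone image clusters its values around the mean, so the spread is small; a hard-tone image puts mass at both ends, so the spread grows.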

Soft-Tone vs. Hard-Tone Varies by Model!

The softness or hardness of an AI-generated image depends on the model used.

Here’s my personal impression of how different models handle these tones:

Soft Tone ← SD1.5, SDXL, Midjourney, SD3.5, Imagen3, AuraFlow, Flux.1 → Hard Tone

I personally prefer hard-tone illustrations with a strong sense of presence. However, models like Flux.1 that specialize in hard-tone rendering tend to produce images with little variation.

So, in this experiment, I’ll explore ways to convert a soft-tone illustration into a hard-tone one.

Flowchart!

Here is the flowchart of the workflow I used for this experiment.


flowchart TB
subgraph SDXL_1 [SDXL]
A1("Original Image<br>(anima_pencil-XL)")
end
subgraph AuraFlow_1 [AuraFlow]
B1("Hard-Toning<br>(AuraFlow)")
end
subgraph SDXL_2 [SDXL]
C1("Character & Cat Adjustment<br>(anima_pencil-XL)")
end
subgraph SDXL_3 [SDXL]
D1("Individual Fixes for Face, Hands & Cat<br>(blue_pencil-XL)")
end
subgraph Flux1 [Flux.1]
E1("Eye Redraw<br>(blue_pencil-flux1)")
end
F1(Completed)
A1-->|HDR Processing<br>superbeast|B1
B1-->|High-Resolution Scaling|C1
C1-->D1
D1-->E1
E1-->|HDR Processing<br>superbeast|F1

The models used in this workflow are as follows:

SDXL

  • anima_pencil-XL-v5.0.0 (Initial Sketch)
  • blue_pencil-XL-v7.0.0 (Retouching)

AuraFlow

  • AuraFlow_0.3

Flux.1

  • blue_pencil-flux1-v0.0.1

Final Illustrations!

Let’s take a look at the actual illustrations.

Initial Sketch (SDXL)

Left: Original Illustration | Right: HDR Processing (SuperBeast)

First, we start with an SDXL-generated illustration.

SDXL models produce diverse compositions and expressive characters, but the overall tone tends to be soft.

To shift toward a hard tone, I first applied simple HDR processing with SuperBeast.
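The HDR step itself is done by the SuperBeast node, whose algorithm I am not reproducing here. As a stand-in that only shows the direction of the change (darks pushed darker, lights pushed lighter, so contrast rises), here is a simple S-shaped tone curve built from smoothstep. It is a global curve, not an HDR effect; real HDR filters typically also work on local contrast.

```python
def s_curve(v: float, strength: float = 1.0) -> float:
    """Blend the identity with smoothstep; 0, 0.5 and 1 stay fixed, everything else spreads apart."""
    if not 0.0 <= v <= 1.0:
        raise ValueError("v must be in [0, 1]")
    smooth = v * v * (3.0 - 2.0 * v)  # smoothstep: S-shaped, passes through (0,0), (0.5,0.5), (1,1)
    return (1.0 - strength) * v + strength * smooth

def apply_curve(rgb, strength: float = 0.5):
    """Apply the curve independently to each 8-bit channel of one pixel."""
    return tuple(round(255 * s_curve(c / 255, strength)) for c in rgb)

print(apply_curve((64, 128, 192), strength=0.5))
```

`strength` (a hypothetical parameter of this sketch) interpolates between no change at 0.0 and the full curve at 1.0, which keeps the effect easy to dial back.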

From here, we’ll redraw the illustration using different models.

Thick Paint Style (AuraFlow)

Next, I switch to AuraFlow, a model that excels at hard-tone rendering, and redraw the illustration in a thick-paint style.

Flux.1 is the most hard-tone-capable model, but using it too early in the process locks the illustration in place, making later modifications difficult.
For this reason, I chose AuraFlow instead.

Image: illustration redrawn with AuraFlow

For this step, I used AuraFlow with oil painting prompts to create a heavier, richer tone.

Additionally, I intentionally disrupted the image structure by using the heunpp2 sampler instead of the recommended uni_pc.

Previously, I used the SDXL Refiner for this step, but recently I’ve switched to AuraFlow, which provides better depth and color representation.

High-Resolution Enhancement & Retouching

Next, I enhance the resolution and retouch the illustration.

First, I use an AI upscaler to increase the resolution.

High-Resolution Processing:

  • Original: 1024 x 1024 → Upscaled: 2457 x 2457 (x2.4 scale)
  • Upscaler Model: 4x_NMKD-Superscale-SP_178000_G
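The numbers above work out to 2457 / 1024 ≈ 2.4. Since 4x_NMKD-Superscale-SP_178000_G is a fixed 4x model, one common way to land on a non-integer scale is to run the model once and then resize its output down. I am assuming the workflow does it that way; the sketch just makes the arithmetic explicit, and `plan_upscale` is a hypothetical helper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UpscalePlan:
    model_output: int     # side length straight out of the fixed-factor model
    final: int            # side length after resizing down to the target
    resize_ratio: float   # factor applied to the model output
    overall_scale: float  # final / original

def plan_upscale(original: int, target: int, model_factor: int = 4) -> UpscalePlan:
    """Upscale with a fixed-factor AI model, then resize down to the target side length."""
    model_output = original * model_factor
    if target > model_output:
        raise ValueError("target exceeds what a single model pass produces")
    return UpscalePlan(model_output, target, target / model_output, target / original)

plan = plan_upscale(1024, 2457)
print(plan.model_output, plan.final, f"{plan.overall_scale:.2f}")  # 4096 2457 2.40
```

Going through the 4x model first means the detail is generated at 4096 x 4096, and the downscale to 2457 x 2457 only averages it; no resampler has to invent pixels.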

Now that the resolution is improved, I’ll move on to retouching the illustration using Florence-2, which I introduced in my previous post.

Detecting and Retouching with Florence-2!

Previously, I used Florence-2 to generate image captions.
This time, I will use it to detect objects by specifying keywords.
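Outside ComfyUI, keyword detection with Florence-2 can be reproduced with Hugging Face `transformers`, following the usage pattern on the model card. This is a sketch under assumptions: the `<CAPTION_TO_PHRASE_GROUNDING>` task, the `microsoft/Florence-2-large` checkpoint and the generation settings are my choices, and the ComfyUI node used in this article may use a different task or settings. Florence-2 needs `trust_remote_code=True`, and the remote code may need tweaks depending on your `transformers` version.

```python
from typing import Dict, List, Tuple

TASK = "<CAPTION_TO_PHRASE_GROUNDING>"
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

def detect(image, keyword: str, model_id: str = "microsoft/Florence-2-large") -> Dict:
    """Run Florence-2 phrase grounding for one keyword and return the post-processed dict."""
    # Heavy imports stay local so boxes_for() below works without torch installed.
    from transformers import AutoModelForCausalLM, AutoProcessor

    # Loads the model on every call for brevity; cache it in real use.
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    inputs = processor(text=TASK + keyword, images=image, return_tensors="pt")
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
    )
    text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    return processor.post_process_generation(
        text, task=TASK, image_size=(image.width, image.height)
    )

def boxes_for(result: Dict, keyword: str) -> List[Box]:
    """Keep the boxes whose grounded phrase mentions the keyword."""
    parsed = result.get(TASK, {})
    return [
        tuple(box)
        for box, label in zip(parsed.get("bboxes", []), parsed.get("labels", []))
        if keyword.lower() in label.lower()
    ]

# Usage (needs torch, transformers, Pillow and a model download; file name is hypothetical):
# from PIL import Image
# img = Image.open("witch.png").convert("RGB")
# print(boxes_for(detect(img, "cat"), "cat"))
```

The returned boxes correspond to the red rectangles in the detection screenshots: one `(x1, y1, x2, y2)` box per grounded phrase.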

First, I will lightly redraw the illustration's main subjects: the cat and the woman.

Detection Keywords

  • girl
  • cat
Left: Object Detection | Right: Retouched with SDXL

It may be a bit hard to see, but in the left image, the areas detected for the keywords girl and cat are marked with red boxes.

By applying slight retouching with SDXL on the detected areas, I was able to refine the main subjects while preserving the overall hard tone.
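What happens to each red box during retouching can be sketched generically: grow the box to include some context, crop it, scale the crop up to the model's working resolution, redraw it, and paste it back with a soft edge so no seam shows. This is my own sketch of how detailer-style nodes typically behave, not the exact logic of the nodes in this workflow; the 25% padding, the 1024 px working size (SDXL's native resolution) and the 32 px feather are illustrative assumptions.

```python
import math
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels

def crop_region(box, img_w: int, img_h: int, pad_ratio: float = 0.25) -> Box:
    """Grow a detection box by pad_ratio of its size on every side, then clamp it to the image."""
    x1, y1, x2, y2 = box
    pad_x = (x2 - x1) * pad_ratio
    pad_y = (y2 - y1) * pad_ratio
    return (
        max(0, math.floor(x1 - pad_x)),
        max(0, math.floor(y1 - pad_y)),
        min(img_w, math.ceil(x2 + pad_x)),
        min(img_h, math.ceil(y2 + pad_y)),
    )

def work_scale(region: Box, work_size: int = 1024) -> float:
    """Factor that brings the crop's longer side to the redraw model's working resolution."""
    x1, y1, x2, y2 = region
    return work_size / max(x2 - x1, y2 - y1)

def feather_weight(x: float, y: float, region: Box, feather: float = 32.0) -> float:
    """Paste-back blend weight: 0 at the crop edge, rising to 1 once `feather` px inside."""
    x1, y1, x2, y2 = region
    d = min(x - x1, x2 - x, y - y1, y2 - y)  # distance to the nearest crop edge
    if d <= 0:
        return 0.0
    return min(1.0, d / feather)

region = crop_region((100, 100, 200, 300), 1000, 1000)
print(region, round(work_scale(region), 2))  # (75, 50, 225, 350) 3.41
```

Redrawing a small region at 1024 px is what lets a small face or cat gain detail; the feathered weight blends the redrawn pixels with the untouched surroundings.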

Retouching Specific Parts

Next, I will make individual corrections to important parts of the illustration.

Retouching the Cat

  • Keyword: cat
Left: Cat Detected | Right: Retouched with SDXL

Similar to the previous step, I detected the cat using the keyword and applied stronger retouching this time.

The image is small, so the difference is hard to see here, but when zoomed in, the refinements to the cat become quite clear.

Retouching the Face

  • Keyword: human face
Left: Face Detected | Right: Retouched with SDXL

Next, I refined the character’s face.

For the base illustration, I used my usual anima_pencil-XL model, but for detailed adjustments, I switched to blue_pencil-XL, which is more stable and closer to the SDXL base model.

As you can see in the right image, the character’s face has been significantly improved.

Retouching the Hands

  • Keyword: hand
Left: Hand Detected | Right: Retouched with SDXL

I used blue_pencil-XL again to refine the character’s hands.

AI-generated illustrations often struggle with hands, but by individually detecting and correcting them, we can significantly improve their accuracy.

Retouching the Eyes

  • Keyword: eye

Finally, I refined the eyes, which are the most important part of an illustration.
For this, I used the Flux.1 model, which provides the clearest details.

Left: Eyes Detected | Right: Redrawn with Flux.1

By redrawing the eyes with Flux.1, the pupils became sharper, significantly improving the character’s overall impression!

The Flux.1 [dev] model offers incredible texture quality, but since it is released under a non-commercial license, I use blue_pencil-XL instead when commercial use is required.
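The retouching order above can be written down as data: each pass pairs detection keywords with the model that redraws them. The models follow the flowchart (anima_pencil-XL for the character-and-cat pass, blue_pencil-XL for the individual cat, face and hand fixes, blue_pencil-flux1 for the eyes). `detect` and `redraw` are hypothetical callables standing in for the Florence-2 and detailer nodes; this is a sketch of the ordering, not the actual ComfyUI graph.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]

@dataclass(frozen=True)
class Pass:
    keywords: Tuple[str, ...]  # Florence-2 detection keywords for this pass
    model: str                 # checkpoint that redraws each detected region
    note: str                  # what the pass is for, as described in the article

PASSES: List[Pass] = [
    Pass(("girl", "cat"), "anima_pencil-XL-v5.0.0", "light redraw of the main subjects"),
    Pass(("cat",), "blue_pencil-XL-v7.0.0", "stronger redraw of the cat"),
    Pass(("human face",), "blue_pencil-XL-v7.0.0", "face fix"),
    Pass(("hand",), "blue_pencil-XL-v7.0.0", "hand fix"),
    Pass(("eye",), "blue_pencil-flux1-v0.0.1", "sharper pupils"),
]

def run_passes(image, passes: Sequence[Pass],
               detect: Callable[[object, str], List[Box]],
               redraw: Callable[[object, Box, str], object]):
    """Run passes in order; detection always sees the image as left by earlier redraws."""
    for p in passes:
        for keyword in p.keywords:
            for box in detect(image, keyword):
                image = redraw(image, box, p.model)
    return image

# Dry run with stand-in functions that only log which model would touch the image.
log = run_passes((), PASSES, lambda img, kw: [(0, 0, 1, 1)], lambda img, box, model: img + (model,))
print(log)
```

Keeping the broad pass first and the eyes last mirrors the article's order: later, finer passes are not undone by earlier, coarser ones.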

Final HDR Processing and Completion!

Finally, I applied HDR processing once more to complete the illustration.

Illustration completed with retouching

The result retains the softness and composition of SDXL while achieving a sharper hard-tone expression.

Additionally, by intentionally distorting the entire illustration with AuraFlow before refining the main subjects, I was able to draw the viewer’s attention to key elements like the character and cat.

The improved textures and refined facial expressions make this a satisfying result!

Florence-2 Is Amazing!

This time, I used Florence-2 for all object detection.

In the past, I relied on YOLO (You Only Look Once) models, which are specialized AI models for individual part detection.
However, using YOLO required a separate model for each part to be detected.

Many pre-trained YOLO models are available for detecting faces, eyes, and hands, but dedicated models for detecting cats are hard to find.

Image: a young woman with long brown hair and blue eyes, in a blue dress and a hat with a blue flower, holds a golden staff while a black cat perches on her hat

Florence-2 is slightly less precise than dedicated YOLO models, but it offers the huge advantage of recognizing any object just by specifying a keyword.
This makes it far more versatile!

Florence-2 Is Secure!

Another major advantage of Florence-2 is security.

Florence-2 was developed and released by Microsoft through its official Hugging Face account, so it comes from a trusted source.

In contrast, YOLO models are created by various developers and uploaded to Hugging Face, where not all models are guaranteed to be safe.

YOLO11 download page – Unsafe models are marked in red

In December 2024, cryptocurrency-mining malware was found in the Ultralytics package, which ComfyUI extensions use to run YOLOv8 models.

The malware was not inside the YOLO models themselves; it was injected during the package's release process and distributed through pip.

As AI tools become more widely used, it is more important than ever to download models from trusted sources.
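Two simple checks help when downloading model files: compare the file's SHA-256 with the hash published on the official download page (Hugging Face file pages list it for large files), and prefer `.safetensors` over pickle-based formats such as `.pt`, which can execute code when loaded. The sketch below is a minimal, hypothetical helper; the list of pickle suffixes is my assumption of the usual ones. It does not protect against a compromised package like the Ultralytics case; for that, pinning exact package versions (for example with pip's `--require-hashes` mode) is the relevant defense.

```python
import hashlib
import tempfile
from pathlib import Path

PICKLE_SUFFIXES = {".pt", ".pth", ".bin", ".ckpt", ".pkl"}  # typically pickle-based

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so multi-GB checkpoints don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def check_model_file(path, expected_sha256: str) -> list:
    """Return a list of warnings; an empty list means both checks passed."""
    path = Path(path)
    warnings = []
    if path.suffix.lower() in PICKLE_SUFFIXES:
        warnings.append(f"{path.name}: pickle-based format can run code when loaded; prefer .safetensors")
    if sha256_of(path) != expected_sha256.lower():
        warnings.append(f"{path.name}: SHA-256 does not match the published hash")
    return warnings

# Demo with a throwaway file standing in for a downloaded model.
with tempfile.TemporaryDirectory() as tmp:
    fake = Path(tmp) / "model.safetensors"
    fake.write_bytes(b"abc")
    good = hashlib.sha256(b"abc").hexdigest()
    print(check_model_file(fake, good))      # []
    print(check_model_file(fake, "0" * 64))  # one hash-mismatch warning
```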

Workflow!

Now, let’s look at the Florence-2 workflow.

Since this workflow is complex and uses multiple models, I’m providing a simplified SDXL-only version that focuses on fixing the cat, face, and hands.

Florence-2_Detailer workflow 2025.2.14

Download

Florence-2_Detailer 2025.2.14.json

One-Click Free Generation on SeaArt AI (Requires registration)
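If you prefer to run the workflow from a script rather than the ComfyUI editor, a running ComfyUI server accepts jobs on its `/prompt` endpoint. This is a sketch under assumptions: the downloaded .json is presumably a UI-format save, so it would first need to be re-exported in API format (the menu entry's name depends on the ComfyUI version), and the default address `127.0.0.1:8188` and the file name in the usage comment are assumptions.

```python
import json
import urllib.request
import uuid
from typing import Optional

def is_api_format(workflow: dict) -> bool:
    """API exports map node ids to {"class_type": ..., "inputs": ...}; UI saves hold "nodes"/"links" lists."""
    return bool(workflow) and all(
        isinstance(node, dict) and "class_type" in node for node in workflow.values()
    )

def build_payload(workflow: dict, client_id: Optional[str] = None) -> bytes:
    """JSON body for ComfyUI's POST /prompt endpoint."""
    if not is_api_format(workflow):
        raise ValueError("not an API-format workflow; re-export it in API format")
    body = {"prompt": workflow, "client_id": client_id or uuid.uuid4().hex}
    return json.dumps(body).encode("utf-8")

def queue_prompt(workflow: dict, server: str = "http://127.0.0.1:8188") -> dict:
    """Queue the workflow on a running ComfyUI server and return its JSON reply."""
    request = urllib.request.Request(
        f"{server}/prompt",
        data=build_payload(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())

# Usage against a local server (file name is hypothetical):
# with open("Florence-2_Detailer_api.json", encoding="utf-8") as f:
#     print(queue_prompt(json.load(f)))
```

Checking the format before sending gives a clear error message instead of a rejected request when the UI-format file is passed by mistake.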

I will explain each node and its setup in detail in a future article!

Conclusion: Florence-2 Is Amazing!

  • SDXL produces soft-tone illustrations
  • AuraFlow converts them to hard-tone
  • Florence-2 detects objects for precise retouching

This time, I used Florence-2's object detection to guide the retouching of the illustration.

Florence-2 is easy to use and highly versatile.
I believe it will soon become an essential tool for anyone working in image editing.

Image: a woman in a witch's hat stands before an ornate circular portal in a dimly lit, stone-walled chamber, lit by a warm golden glow from the setting or rising sun

Enhancing SDXL Textures!

AI-generated art is evolving rapidly. In the future, a more powerful and user-friendly AI will likely surpass Florence-2.

However, the experiments and techniques developed using YOLO and Florence-2 will always be valuable.

I encourage everyone to try out Florence-2’s object detection and explore its possibilities!

Thank you for reading to the end!