Chung-Ang University
NAVER Cloud
Lunit Inc.
Synthesizing a target concept from a single reference image remains challenging in diffusion-based personalized text-to-image generation, particularly when prompts require explicit attribute edits. In the sticker domain, test-time fine-tuning methods often overfit to the reference image and suffer from visual entanglement and structural rigidity. We introduce SEAL, a plug-and-play, architecture-agnostic adaptation module that combines a Semantic-guided Spatial Attention Loss, a Split-merge Token Strategy, and a Structure-aware Layer Restriction. We also introduce StickerBench, a large-scale sticker dataset with six structured attributes for controlled evaluation of identity disentanglement and contextual controllability in single-image sticker personalization.
SEAL is a plug-and-play module for existing personalization pipelines.
Aligns the concept-token cross-attention map with the object region predicted by SAM, suppressing background leakage and improving identity disentanglement.
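As a rough illustration, this objective can be sketched as a background-leakage penalty; `spatial_attention_loss` and its exact form below are assumptions for illustration, not the paper's definition:

```python
import numpy as np

def spatial_attention_loss(attn_map, object_mask):
    """Hypothetical form of the Semantic-guided Spatial Attention Loss:
    penalize concept-token attention mass that falls outside the
    SAM-predicted object region (the paper's exact loss may differ).

    attn_map    : (H, W) non-negative cross-attention map for the concept token
    object_mask : (H, W) binary mask, 1 inside the object, 0 elsewhere
    """
    attn = attn_map / (attn_map.sum() + 1e-8)         # normalize to a distribution
    return float((attn * (1.0 - object_mask)).sum())  # attention mass on background
```

Driving this term toward zero concentrates the concept token's attention inside the object mask, which is the disentanglement effect described above.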
Optimizes multiple auxiliary embeddings and merges them into one concept embedding, improving optimization stability under single-image supervision.
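The stabilizing effect of merging can be seen in a toy experiment; the quadratic loss, the noise model, and merge-by-mean are all illustrative assumptions rather than the method's actual training objective:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, k, steps, lr = 64, 4, 200, 0.1
target = rng.normal(size=dim)        # stand-in for the ideal concept embedding
tokens = rng.normal(size=(k, dim))   # K auxiliary embeddings ("split" phase)

for _ in range(steps):
    # noisy gradients of a toy quadratic loss, one independent sample per token
    grads = (tokens - target) + 0.5 * rng.normal(size=(k, dim))
    tokens -= lr * grads

concept = tokens.mean(axis=0)        # "merge" phase: one concept embedding
err_merged = float(np.linalg.norm(concept - target))
err_single = float(np.linalg.norm(tokens[0] - target))
```

Averaging K independently optimized tokens cancels per-token gradient noise, so the merged embedding typically lands closer to the target than any single token, mirroring the stability claim under single-image supervision.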
Applies spatial supervision only to semantically informative cross-attention layers, reducing overfitting to low-level layout patterns.
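Restricting the spatial loss to a whitelist of layers might look like the sketch below; the layer names and the averaged leakage term are assumptions for illustration, not SEAL's actual layer selection:

```python
import numpy as np

# Hypothetical layer names; SEAL's actual selection may differ.
SEMANTIC_LAYERS = {"mid_block", "up_block_1"}

def restricted_loss(per_layer_attn, object_mask, allowed=SEMANTIC_LAYERS):
    """Average a background-leakage penalty over semantically informative
    cross-attention layers only, skipping low-level layout layers."""
    losses = []
    for name, attn in per_layer_attn.items():
        if name not in allowed:
            continue
        a = attn / (attn.sum() + 1e-8)
        losses.append((a * (1.0 - object_mask)).sum())
    return float(np.mean(losses))
```

Because low-level layers encode layout rather than semantics, excluding them from supervision is what reduces overfitting to the reference image's spatial arrangement.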
SEAL is designed to reduce the two dominant failure modes in single-image sticker personalization.
Structural rigidity: existing methods often memorize the reference-specific layout, which reduces flexibility under action, pose, and composition edits.
Visual entanglement: existing methods may absorb background cues into the learned concept representation, causing identity and context to become entangled.
Qualitative and quantitative results across representative personalization pipelines.
Qualitative comparison across baseline methods and SEAL-integrated variants.
Visual ablation study of SEAL on StickerBench for single-image sticker personalization using CoRe.
Ablation study of the proposed adaptation module on StickerBench for single-image sticker personalization, using CoRe.
Visual analysis of structural rigidity with respect to Structure-aware Layer Restriction during embedding adaptation.
Detailed analysis of cross-attention maps at the end of embedding adaptation (250 steps) and at inference.
Inference-time visualization of cross-attention maps across different K values.
Ablation study on the split token count K for the Split-merge Token Strategy.
Visual analysis of optimization stability with respect to the number of split tokens K.
Ablation study on prompt representations for training and inference on StickerBench for single-image sticker personalization.
StickerBench is a large-scale sticker dataset built for controlled evaluation of single-image sticker personalization. Each sample is annotated with structured tags under a six-attribute schema: Appearance, Emotion, Action, Camera Composition, Style, and Background.
This tag-based representation supports systematic prompt editing while keeping the target identity fixed, making it especially suitable for evaluating identity disentanglement and contextual controllability.
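A record under this schema might look like the sketch below; the field values, file name, and `build_prompt` helper are hypothetical, not actual StickerBench entries:

```python
# Hypothetical StickerBench record; values are illustrative only.
sample = {
    "image": "sticker_00042.png",
    "tags": {
        "Appearance": "white rabbit with round glasses",
        "Emotion": "joyful",
        "Action": "waving",
        "Camera Composition": "upper body",
        "Style": "flat pastel illustration",
        "Background": "plain white",
    },
}

def build_prompt(tags, edits=None):
    """Compose a prompt from attribute tags; `edits` overrides selected
    attributes while the identity (Appearance) stays fixed."""
    merged = {**tags, **(edits or {})}
    order = ["Appearance", "Emotion", "Action",
             "Camera Composition", "Style", "Background"]
    return ", ".join(merged[k] for k in order)
```

Editing one attribute (e.g. swapping Emotion) while keeping Appearance fixed yields the controlled prompts used to probe identity disentanglement versus contextual controllability.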
@article{seal2026,
  title   = {SEAL: Semantic-aware Single-image Sticker Personalization with a Large-scale Sticker-tag Dataset},
  author  = {Author Names},
  journal = {Expert Systems with Applications},
  year    = {2026}
}