Chung-Ang University
NAVER Cloud
Lunit Inc.
Synthesizing a target concept from a single reference image remains challenging in diffusion-based personalized text-to-image generation, particularly when prompts require explicit attribute edits. In the sticker domain, test-time fine-tuning methods often overfit to the reference image and suffer from visual entanglement and structural rigidity. We introduce SEAL, a plug-and-play, architecture-agnostic adaptation module that combines a Semantic-guided Spatial Attention Loss, a Split-merge Token Strategy, and a Structure-aware Layer Restriction. We also introduce StickerBench, a large-scale sticker dataset with six structured attributes for controlled evaluation of identity disentanglement and contextual controllability in single-image sticker personalization.
SEAL is a plug-and-play module for existing personalization pipelines.
Aligns the concept-token cross-attention map with the object region predicted by SAM, suppressing background leakage and improving identity disentanglement.
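As a rough illustration, this objective can be sketched as a background-leakage penalty; `spatial_attention_loss` and its exact form below are assumptions for illustration, not the paper's definition:

```python
import numpy as np

def spatial_attention_loss(attn_map, object_mask):
    """Hypothetical form of the Semantic-guided Spatial Attention Loss:
    penalize concept-token attention mass that falls outside the
    SAM-predicted object region (the paper's exact loss may differ).

    attn_map    : (H, W) non-negative cross-attention map for the concept token
    object_mask : (H, W) binary mask, 1 inside the object, 0 elsewhere
    """
    attn = attn_map / (attn_map.sum() + 1e-8)         # normalize to a distribution
    return float((attn * (1.0 - object_mask)).sum())  # attention mass on background
```

Driving this term toward zero concentrates the concept token's attention inside the object mask, which is the disentanglement effect described above.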
Optimizes multiple auxiliary embeddings and merges them into one concept embedding, improving optimization stability under single-image supervision.
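The stabilizing effect of merging can be seen in a toy experiment; the quadratic loss, the noise model, and merge-by-mean are all illustrative assumptions rather than the method's actual training objective:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, k, steps, lr = 64, 4, 200, 0.1
target = rng.normal(size=dim)        # stand-in for the ideal concept embedding
tokens = rng.normal(size=(k, dim))   # K auxiliary embeddings ("split" phase)

for _ in range(steps):
    # noisy gradients of a toy quadratic loss, one independent sample per token
    grads = (tokens - target) + 0.5 * rng.normal(size=(k, dim))
    tokens -= lr * grads

concept = tokens.mean(axis=0)        # "merge" phase: one concept embedding
err_merged = float(np.linalg.norm(concept - target))
err_single = float(np.linalg.norm(tokens[0] - target))
```

Averaging K independently optimized tokens cancels per-token gradient noise, so the merged embedding typically lands closer to the target than any single token, mirroring the stability claim under single-image supervision.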
Applies spatial supervision only to semantically informative cross-attention layers, reducing overfitting to low-level layout patterns.
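Restricting the spatial loss to a whitelist of layers might look like the sketch below; the layer names and the averaged leakage term are assumptions for illustration, not SEAL's actual layer selection:

```python
import numpy as np

# Hypothetical layer names; SEAL's actual selection may differ.
SEMANTIC_LAYERS = {"mid_block", "up_block_1"}

def restricted_loss(per_layer_attn, object_mask, allowed=SEMANTIC_LAYERS):
    """Average a background-leakage penalty over semantically informative
    cross-attention layers only, skipping low-level layout layers."""
    losses = []
    for name, attn in per_layer_attn.items():
        if name not in allowed:
            continue
        a = attn / (attn.sum() + 1e-8)
        losses.append((a * (1.0 - object_mask)).sum())
    return float(np.mean(losses))
```

Because low-level layers encode layout rather than semantics, excluding them from supervision is what reduces overfitting to the reference image's spatial arrangement.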
SEAL is designed to reduce the two dominant failure modes in single-image sticker personalization.
Structural rigidity: existing methods often memorize the reference-specific layout, which reduces flexibility under action, pose, and composition edits.
Visual entanglement: existing methods may absorb background cues into the learned concept representation, causing identity and context to become entangled.
Qualitative and quantitative results across representative personalization pipelines.
Qualitative comparison across baseline methods and SEAL-integrated variants.
Visual ablation study of SEAL on StickerBench for single-image sticker personalization using CoRe.
Ablation study of the proposed adaptation module on StickerBench for single-image sticker personalization, using CoRe.
Visual analysis of structural rigidity with respect to Structure-aware Layer Restriction during embedding adaptation.
Detailed analysis of cross-attention maps at the end of embedding adaptation (250 steps) and at inference.
Inference-time visualization of cross-attention maps across different K values.
Ablation study on the split token count K for the Split-merge Token Strategy.
Visual analysis of optimization stability with respect to the number of split tokens K.
Ablation study on prompt representations for training and inference on StickerBench for single-image sticker personalization.
StickerBench is a large-scale sticker dataset built for controlled evaluation of single-image sticker personalization. Each sample is annotated with structured tags under a six-attribute schema: Appearance, Emotion, Action, Camera Composition, Style, and Background.
This tag-based representation supports systematic prompt editing while keeping the target identity fixed, making it especially suitable for evaluating identity disentanglement and contextual controllability.
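A record under this schema might look like the sketch below; the field values, file name, and `build_prompt` helper are hypothetical, not actual StickerBench entries:

```python
# Hypothetical StickerBench record; values are illustrative only.
sample = {
    "image": "sticker_00042.png",
    "tags": {
        "Appearance": "white rabbit with round glasses",
        "Emotion": "joyful",
        "Action": "waving",
        "Camera Composition": "upper body",
        "Style": "flat pastel illustration",
        "Background": "plain white",
    },
}

def build_prompt(tags, edits=None):
    """Compose a prompt from attribute tags; `edits` overrides selected
    attributes while the identity (Appearance) stays fixed."""
    merged = {**tags, **(edits or {})}
    order = ["Appearance", "Emotion", "Action",
             "Camera Composition", "Style", "Background"]
    return ", ".join(merged[k] for k in order)
```

Editing one attribute (e.g. swapping Emotion) while keeping Appearance fixed yields the controlled prompts used to probe identity disentanglement versus contextual controllability.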
@article{seal2026,
  title   = {SEAL: Semantic-aware Single-image Sticker Personalization with a Large-scale Sticker-tag Dataset},
  author  = {Author Names},
  journal = {Expert Systems with Applications},
  year    = {2026}
}