TL;DR
CHIMERA enables smooth and semantically consistent zero-shot image morphing through Adaptive Cache Injection (ACI) and Semantic Anchor Prompting (SAP), along with GLCS, a new morphing-oriented metric for evaluating transition quality.
Results of CHIMERA
Abstract
Diffusion models exhibit remarkable generative ability, yet achieving smooth and semantically consistent image morphing remains a challenge. Existing approaches often yield abrupt transitions or over-saturated appearances because they lack adaptive structural and semantic alignment. We propose CHIMERA, a zero-shot diffusion-based framework that formulates morphing as a cached inversion–guided denoising process. To handle large semantic and appearance disparities, we introduce Adaptive Cache Injection and Semantic Anchor Prompting. Adaptive Cache Injection (ACI) caches features from the down, mid, and up blocks for both inputs during DDIM inversion and re-injects them during denoising in a depth- and timestep-adaptive manner, enabling natural feature fusion and smooth transitions. Semantic Anchor Prompting (SAP) leverages a vision–language model to generate a shared prompt that serves as a semantic anchor, bridging dissimilar inputs and guiding the denoising process toward coherent results. Finally, we introduce the Global-Local Consistency Score (GLCS), a morphing-oriented metric that simultaneously evaluates the global harmonization of the two inputs and the smoothness of local morphing transitions. Extensive experiments and user studies show that CHIMERA achieves smoother and more semantically aligned transitions than existing methods, establishing a new state of the art in image morphing. The code and project page will be publicly released.
Motivation & Observation
Frequency analysis of the diffusion U-Net and the denoising timesteps
Diffusion features tend to contain more low-frequency information in the mid layers and more high-frequency information in the up layers. In addition, early denoising timesteps mainly encode low-frequency information, while late timesteps contain more high-frequency information. Based on these properties, ACI injects diffusion features that match the characteristics of each denoising timestep.
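One simple way to quantify how much high-frequency content a feature map carries is to measure the share of its spectral energy above a radial frequency cutoff. The sketch below is illustrative only and is not the analysis code behind the figure: the function name `high_freq_ratio` and the cutoff value of 0.25 cycles per sample are assumptions chosen for the example.

```python
import numpy as np


def high_freq_ratio(feature_map: np.ndarray, cutoff: float = 0.25) -> float:
    """Fraction of spectral energy above a radial frequency cutoff.

    feature_map: 2-D array (H, W), e.g. one channel of a U-Net activation.
    cutoff: radius in cycles per sample; 0.25 is an arbitrary, assumed value.
    """
    h, w = feature_map.shape
    # Centered 2-D power spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(feature_map))
    energy = np.abs(spectrum) ** 2
    # Radial frequency of every spectrum bin, shifted to match the spectrum.
    fy = np.fft.fftshift(np.fft.fftfreq(h))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    total = energy.sum()
    if total == 0:
        return 0.0
    return float(energy[radius > cutoff].sum() / total)


# Toy inputs standing in for real activations (assumed, not from the paper).
flat = np.ones((8, 8))                               # only a DC component
checker = np.indices((8, 8)).sum(axis=0) % 2 * 2.0 - 1.0  # alternating +1/-1
print(high_freq_ratio(flat), high_freq_ratio(checker))
```

Averaging this ratio over channels, and comparing it across blocks and denoising timesteps, would give a rough numerical counterpart to the trend described above.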
Figure 3. Qualitative examples illustrating how CHIMERA and previous models differ in their ability to preserve smoothness, domain consistency, and perceptual quality.
The proposed CHIMERA shows a well-balanced improvement over previous methods in terms of smoothness, domain consistency, and perceptual quality.
Proposed Method
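As a rough illustration of the idea stated in the abstract, cached features from the two inputs can be interpolated and blended into the current denoising features with a strength that depends on block depth and timestep. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the linear schedules, and the constant weight for down blocks are all hypothetical choices made for the example.

```python
import numpy as np


def injection_strength(block: str, step: int, num_steps: int) -> float:
    """Hypothetical depth- and timestep-dependent injection weight.

    Assumption: mid-block features (more low-frequency) are injected more
    strongly at early denoising steps, up-block features (more
    high-frequency) at late steps. The linear ramps are illustrative only.
    """
    progress = step / max(num_steps - 1, 1)  # 0 = first step, 1 = last step
    if block == "mid":
        return 1.0 - progress
    if block == "up":
        return progress
    return 0.5  # assumed constant weight for down blocks


def inject(current, cache_a, cache_b, alpha: float, strength: float):
    """Blend interpolated cached features into the current features.

    alpha: morphing position in [0, 1] (0 = first input, 1 = second input).
    strength: injection weight in [0, 1] from injection_strength().
    """
    target = (1.0 - alpha) * cache_a + alpha * cache_b
    return (1.0 - strength) * current + strength * target


# Toy usage with random arrays standing in for cached U-Net features.
rng = np.random.default_rng(0)
feat_a, feat_b, feat_now = (rng.normal(size=(4, 8, 8)) for _ in range(3))
s = injection_strength("mid", step=10, num_steps=50)
fused = inject(feat_now, feat_a, feat_b, alpha=0.5, strength=s)
print(fused.shape)
```

In a real pipeline the caches would be filled during DDIM inversion of each input and the blend applied inside the corresponding U-Net blocks; the sketch only shows the weighting arithmetic.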
Proposed Metric
Figure 5. Qualitative examples demonstrating the effectiveness of GLCS. GLCS consists of GCS and LCS, and the qualitative results illustrate how well each component aligns with human perception.
Quantitative Results
Qualitative Results
Figure 6. IMPUS [ICLR’24] shows good domain consistency with the input image pair, but it contains abrupt transitions and therefore lacks smoothness. DiffMorpher [CVPR’24] provides smoother transitions, but its domain consistency is weak, with objects disappearing or becoming unstable. FreeMorph [ICCV’25] produces overly saturated colors, which are common artifacts in diffusion-based generation. In contrast, the proposed CHIMERA maintains both smoothness and domain consistency.
Figure 7. This qualitative evaluation presents the more challenging 14-image morphing results. Consistent with Fig. 6, CHIMERA maintains both smoothness and domain consistency in this extended setting.
BibTeX citation
TBD;