
MoRel


Long-Range Flicker-Free 4D Motion Modeling via Anchor Relay-based Bidirectional Blending with Hierarchical Densification

1ETRI, South Korea       2Chung-Ang University, South Korea
*Equal contribution     †Corresponding author

TL;DR

MoRel is a 4D Gaussian Splatting framework that achieves temporally coherent, flicker-free long-range motion scene reconstruction under bounded memory via anchor relay-based bidirectional blending. It further employs feature-variance-guided hierarchical densification to allocate representation capacity where needed, improving reconstruction quality without sacrificing efficiency.

Long-Range 4D Reconstruction


All-at-Once vs MoRel

LocalDyGS (ICCV'25)
MoRel (OURS)

Compared to all-at-once methods, MoRel provides more faithful long-range motion reconstruction and improved temporal consistency, while still avoiding GPU memory explosion.

Chunk-Based vs MoRel

GIFStream (CVPR'25)
MoRel (OURS)

While chunk-based baselines suffer from boundary flicker and appearance shifts between segments, MoRel’s anchor relay and bidirectional blending yield stable, temporally coherent long-range 4D reconstruction.


Abstract

Recent advances in 4D Gaussian Splatting (4DGS) have extended the high-speed rendering capability of 3D Gaussian Splatting (3DGS) into the temporal domain, enabling real-time rendering of dynamic scenes. However, a major remaining challenge lies in modeling dynamic videos containing long-range motion, where a naïve extension of existing methods leads to severe memory explosion, temporal flickering, and failure to handle occlusions that appear and disappear over time.

To address these challenges, we propose a novel 4DGS framework characterized by an Anchor Relay-based Bidirectional Blending (ARBB) mechanism, named MoRel, which enables temporally consistent and memory-efficient modeling of long-range dynamic scenes. Our method progressively constructs locally canonical anchor spaces, termed Key-frame Anchors (KfAs), at key-frame time indices and models inter-frame deformations at the anchor level, enhancing temporal coherence.

By learning bidirectional deformations between KfAs and adaptively blending them through learnable opacity control, our approach mitigates temporal discontinuities and flickering artifacts. We further introduce a Feature-variance-guided Hierarchical Densification (FHD) scheme that effectively densifies KfAs while preserving rendering quality, guided by levels assigned from anchor feature variance.

To effectively evaluate our model's capability to handle real-world long-range 4D motion, we compose a new dataset containing long-range 4D motion, called SelfCapLR. It exhibits a larger average dynamic motion magnitude and is captured in spatially wider spaces than previous dynamic video datasets.

Overall, our MoRel achieves temporally coherent and flicker-free long-range 4D reconstruction while maintaining bounded memory usage, demonstrating both scalability and efficiency in dynamic Gaussian-based representations.

Motivation & Analysis

Why Is Long-Range 4D Motion Still Challenging?

Long-range dynamic scenes with thousands of frames expose fundamental limitations of existing 4D Gaussian Splatting pipelines. All-at-once training suffers from unbounded memory growth, chunk-based optimization introduces temporal flicker at chunk boundaries, and prior variants trade off temporal consistency, disocclusion handling, and random access in different ways. MoRel is designed by revisiting these failure modes and jointly addressing them.

1. All-at-once Training: Memory Explosion & Limited Fidelity

All-at-once training vs. MoRel

The first analysis compares all-at-once 4DGS with our approach as the sequence length increases. All-at-once training exhibits almost linear growth in memory usage with respect to the number of frames and eventually overflows on 3k-frame sequences, while our anchor relay design keeps memory bounded. At the same time, our method preserves higher representation fidelity for long-range motion.

2. Chunk-based Training: Temporal Flicker at Boundaries

Chunk-based training vs. MoRel

The second analysis visualizes the frame-wise temporal optical flow (tOF) and 1D temporal profiles for chunk-based baselines. Unidirectional chunk training produces periodic spikes in tOF and discontinuities in the temporal profile, manifesting as visible flickering around chunk boundaries. In contrast, our bidirectional anchor relay yields stable tOF curves and smooth temporal transitions.
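
For reference, tOF can be computed as in the video restoration literature: the L1 distance between the dense optical flow of consecutive rendered frames and that of the corresponding ground-truth frames, so flicker appears as spikes in the per-frame curve. Below is a minimal Python sketch using OpenCV's Farnebäck flow; the estimator choice and the function name tof are our own illustration, not the paper's exact evaluation code.

import cv2
import numpy as np

def tof(rendered, ground_truth):
    """Frame-wise temporal optical flow (tOF) error: mean L1 distance
    between dense flow of consecutive rendered frames and dense flow of
    the corresponding ground-truth frames. Spikes in the per-frame curve
    localize temporal flicker, e.g., at chunk boundaries.
    Inputs: lists of uint8 grayscale frames with identical shapes."""
    errs = []
    for t in range(1, len(rendered)):
        flow_r = cv2.calcOpticalFlowFarneback(
            rendered[t - 1], rendered[t], None, 0.5, 3, 15, 3, 5, 1.2, 0)
        flow_g = cv2.calcOpticalFlowFarneback(
            ground_truth[t - 1], ground_truth[t], None, 0.5, 3, 15, 3, 5, 1.2, 0)
        errs.append(np.abs(flow_r - flow_g).mean())
    return errs  # plot this curve to inspect boundary flicker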

3. Limitations of Existing Paradigms for Long-Range 4D Motion

Challenges for modeling long-range 4D motion

The third figure summarizes representative 4DGS paradigms in terms of GPU memory, disocclusion handling, temporal consistency, system complexity, and temporal random access. Existing approaches satisfy only a subset of these requirements, revealing structural limitations. Our Anchor Relay-based Bidirectional Blending (MoRel) is designed to simultaneously achieve bounded memory, robust disocclusion, strong temporal coherence, and random access with moderate system complexity.

Method Overview

Figures 1-3: Method overview

Global Canonical Anchor (GCA)

MoRel first trains a single Global Canonical Anchor (GCA) over the entire sequence from an initial point cloud. The GCA provides a globally consistent appearance prior and anchor features whose variances are later used to assign frequency-aware levels for FHD.

Key-frame Anchors & Anchor Relay

Periodically placed Key-frame Anchors (KfAs) are initialized from the level-assigned GCA and locally refined around their key-frame indices. Each KfA acts as a local canonical space for its temporal neighborhood, forming an anchor relay that supports long-range modeling and random temporal access under bounded memory.
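
Concretely, random temporal access falls out of the relay structure: any query frame needs only its two enclosing KfAs, which can be located in constant time and loaded on demand. The sketch below assumes uniformly spaced key-frames; locate_kfas, keyframe_stride, and num_keyframes are hypothetical names for illustration.

def locate_kfas(t, keyframe_stride, num_keyframes):
    """Return the indices of the two KfAs enclosing frame t, plus the
    normalized position of t between them (0.0 at the left key-frame,
    1.0 at the right). Only these two KfAs need to be resident in GPU
    memory to render frame t."""
    left = min(t // keyframe_stride, num_keyframes - 2)
    right = left + 1
    alpha = (t - left * keyframe_stride) / keyframe_stride
    return left, right, alpha

# Example: with key-frames every 50 frames, frame 1337 lies between
# KfA 26 (frame 1300) and KfA 27 (frame 1350), so alpha = 0.74.
left, right, alpha = locate_kfas(1337, keyframe_stride=50, num_keyframes=60)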

Bidirectional Deformation & Blending

In the Progressive Windowed Deformation (PWD) stage, each KfA learns forward and backward deformation fields within a local bidirectional deformation window, with anchors loaded on demand to keep GPU memory bounded. The Intermediate Frame Blending (IFB) stage then learns temporal opacity weights to fuse neighboring KfAs, yielding smooth transitions and consistent long-range motion.
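
As a minimal sketch of the blending step, one way to realize learnable opacity control is to scale the per-Gaussian opacities of the two deformed KfAs with a sigmoid-mapped weight before rasterizing both sets in a single pass. The shapes and the name blend_logit below are our assumptions; the paper's opacity control may be richer, e.g., per-anchor or time-conditioned.

import torch

def blend_opacities(op_fwd, op_bwd, blend_logit):
    """Fuse the opacities of forward-deformed KfA_k Gaussians (op_fwd)
    and backward-deformed KfA_{k+1} Gaussians (op_bwd) with a learnable
    temporal weight. Both scaled sets are then concatenated and
    rasterized together, so gradients flow into blend_logit."""
    w = torch.sigmoid(blend_logit)  # weight in (0, 1)
    return op_fwd * (1.0 - w), op_bwd * w

# Usage: blend_logit = 0 starts the fusion at an even 50/50 mix.
op_fwd = torch.rand(1000)   # opacities from forward-deformed KfA_k
op_bwd = torch.rand(1200)   # opacities from backward-deformed KfA_{k+1}
blend_logit = torch.nn.Parameter(torch.zeros(()))
op_fwd_b, op_bwd_b = blend_opacities(op_fwd, op_bwd, blend_logit)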

Feature-variance-guided Hierarchical Densification

Feature-variance-guided Hierarchical Densification (FHD) uses the variance of anchor features as a proxy for local frequency, assigning levels and modulating gradient-based densification accordingly. Low-frequency structure is stabilized early, while high-frequency regions are refined later, improving reconstruction quality without exceeding the memory budget.
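
A minimal sketch of the two FHD ingredients, under our own assumptions of quantile bucketing for level assignment and a linear unlock schedule for densification (assign_levels, densify_mask, and the warm-up rule are illustrative, not the paper's exact scheme):

import torch

def assign_levels(anchor_feats, num_levels=3):
    """Bucket anchors into frequency levels by the variance of their
    feature vectors: low variance -> low-frequency level 0, high
    variance -> high-frequency level num_levels - 1.
    anchor_feats: (N, C) tensor; returns (N,) integer levels."""
    var = anchor_feats.var(dim=1)
    edges = torch.quantile(var, torch.linspace(0, 1, num_levels + 1))
    return torch.bucketize(var, edges[1:-1])

def densify_mask(grad_norm, levels, base_thresh, step, warmup=5000):
    """Gate gradient-based densification by level: level 0 densifies
    from the start, higher levels unlock as training progresses, so
    coarse structure stabilizes before fine detail is refined."""
    frac = min(step / warmup, 1.0)
    unlocked = levels.float() <= frac * float(levels.max())
    return (grad_norm > base_thresh) & unlocked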

Datasets

Table 1: Dataset statistics
Figure: Dataset visualization

Results

Quantitative Results on Real-World Sequences

Quantitative results comparison on our newly composed SelfCapLR. Group denotes (a) all-at-once training methods, (b) chunk-based approaches including our unidirectional deformation variant, and (c) our MoRel model. Red and blue denote the best and second-best performances, respectively. Each 3-metric block is reported as PSNR (dB)↑ / SSIM↑ / LPIPS↓.

Quantitative Results on DyCheck-iPhone

Quantitative results comparison on the DyCheck iPhone dataset. Each block element denotes mPSNR (dB)↑ / mSSIM↑ / mLPIPS↓ / Storage (MB)↓.

Demo

Long-range 4D Reconstruction Demo

MoRel reconstructs long-range motion sequences with stable appearance and flicker-free rendering under a bounded memory budget.

BibTeX

@article{Kwak2025MoRel,
  title   = {MoRel: Long-Range Flicker-Free 4D Motion Modeling via Anchor Relay-based Bidirectional Blending with Hierarchical Densification},
  author  = {Kwak, Sangwoon and Kwon, Weeyoung and Jeong, Jun Young and Kim, Geonho and Cheong, Won-Sik and Oh, Jihyong},
  journal = {arXiv preprint arXiv:xxxx.xxxxx},
  year    = {2025}
}