FLAIR: Frequency- and Locality-Aware Implicit Neural Representations

arXiv 2025

Sukhun Ko¹, Dahyeon Kye¹, Kyle Min², Chanho Eom¹, Jihyong Oh¹

¹Creative Vision and Multimedia Lab, Chung-Ang University, South Korea   ²Intel Labs, USA

{looloo330, rpekgus, cheom, jihyongoh}@cau.ac.kr,   kyle.min@intel.com


TL;DR: We introduce FLAIR, which combines a novel RC-GAUSS activation with Wavelet-Energy-Guided Encoding (WEGE) to achieve frequency selectivity and spatial localization, effectively reducing spectral bias in implicit neural representations.

Abstract

[Figure: RC-GAUSS diagram]

Implicit Neural Representations (INRs) leverage neural networks to map coordinates to corresponding signals, enabling continuous and compact representations. This paradigm has driven significant advances in various vision tasks. However, existing INRs lack frequency selectivity, spatial localization, and sparse representations, leading to an over-reliance on redundant signal components. Consequently, they exhibit spectral bias, tending to learn low-frequency components early while struggling to capture fine high-frequency details.

To address these issues, we propose FLAIR (Frequency- and Locality-Aware Implicit Neural Representations), which incorporates two key innovations. The first is RC-GAUSS, a novel activation designed for explicit frequency selection and spatial localization under the constraints of the time-frequency uncertainty principle (TFUP). The second is Wavelet-Energy-Guided Encoding (WEGE), which leverages the discrete wavelet transform (DWT) to compute energy scores and explicitly guide frequency information to the network.

Our method consistently outperforms existing INRs in 2D image representation and restoration, as well as 3D reconstruction.

Demo

Super-Resolution

[Demo: LR input → FLAIR SR output, upscaling factor ×4]

Denoising

[Demo: noisy input → FLAIR denoised output; noise: Poisson(30.0) & Gaussian(2.0)]

Image Fitting

2D Image Fitting

The image fitting task learns a 2D function that maps pixel coordinates to RGB values by minimizing the L2 distance between predictions and the ground truth. Visualization results are shown for a 512×512 DIV2K image; FLAIR produces the most fine-grained and detailed reconstruction. Quantitative results are reported as PSNR (dB) ↑ / SSIM ↑ / LPIPS ↓, and FLAIR achieves state-of-the-art performance across all metrics.
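
For concreteness, the sketch below shows a generic 2D image-fitting loop for a coordinate MLP trained with an L2 loss. The architecture, activation, and hyperparameters are illustrative placeholders, not FLAIR's actual configuration.

```python
# Minimal 2D image-fitting sketch for a generic coordinate MLP (illustrative only;
# layer sizes, activation, learning rate, and step count are not FLAIR's settings).
import torch
import torch.nn as nn

def make_coord_grid(h, w, device="cpu"):
    """Normalized pixel coordinates in [-1, 1], shape (h*w, 2)."""
    ys = torch.linspace(-1.0, 1.0, h, device=device)
    xs = torch.linspace(-1.0, 1.0, w, device=device)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    return torch.stack([gx, gy], dim=-1).reshape(-1, 2)

class CoordMLP(nn.Module):
    def __init__(self, in_dim=2, hidden=256, layers=4, out_dim=3):
        super().__init__()
        mods, d = [], in_dim
        for _ in range(layers):
            mods += [nn.Linear(d, hidden), nn.ReLU()]  # placeholder activation
            d = hidden
        mods += [nn.Linear(d, out_dim)]
        self.net = nn.Sequential(*mods)

    def forward(self, coords):
        return self.net(coords)

def fit_image(image):
    """image: (h, w, 3) float tensor in [0, 1]."""
    h, w, _ = image.shape
    coords, target = make_coord_grid(h, w), image.reshape(-1, 3)
    model = CoordMLP()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(2000):
        loss = ((model(coords) - target) ** 2).mean()  # L2 distance to ground truth
        opt.zero_grad(); loss.backward(); opt.step()
    return model
```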

Method

Overall Architecture

FLAIR’s overall architecture (a) comprises the proposed nonlinear function RC-GAUSS (b) and the Wavelet-Energy-Guided Encoding (WEGE) module (c).

Module Details

The first is RC-GAUSS (b), a novel activation designed for explicit frequency selection and spatial localization under the constraints of the time-frequency uncertainty principle (TFUP). It integrates the raised cosine (RC) function to achieve band-limited frequency selectivity (red box). While the RC provides clear frequency discrimination, it also suffers from infinite oscillations inherent to band-limited functions. These oscillations are effectively attenuated by multiplying the RC with a Gaussian envelope, enabling stable time localization (blue box).
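
As a rough illustration of the idea (not the paper's exact formulation), the sketch below multiplies the time-domain raised-cosine filter response by a Gaussian envelope; the parameters beta, T, and sigma are assumptions here and may be fixed or learned differently in FLAIR.

```python
# Speculative RC-GAUSS-style activation sketch: raised-cosine impulse response
# (band-limited, oscillating tails) attenuated by a Gaussian envelope (localization).
import torch
import torch.nn as nn

class RCGaussSketch(nn.Module):
    def __init__(self, beta=0.5, T=1.0, sigma=2.0, eps=1e-6):
        super().__init__()
        self.beta, self.T, self.sigma, self.eps = beta, T, sigma, eps

    def forward(self, x):
        t = x / self.T
        # Raised-cosine impulse response: sinc(t) * cos(pi*beta*t) / (1 - (2*beta*t)^2).
        num = torch.cos(torch.pi * self.beta * t)
        den = 1.0 - (2.0 * self.beta * t) ** 2
        den = torch.where(den.abs() < self.eps, torch.full_like(den, self.eps), den)  # guard removable singularity
        rc = torch.sinc(t) * num / den
        # Gaussian envelope suppresses the infinite oscillating tails, giving spatial localization.
        gauss = torch.exp(-(x ** 2) / (2.0 * self.sigma ** 2))
        return rc * gauss
```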

The second is WEGE (c), which leverages the discrete wavelet transform (DWT) to compute energy scores and explicitly guide frequency information to the network. The computed score map, exhibiting edge-aware properties, is concatenated with the coordinates (x, y) to form (x, y, Wb) for encoding. However, pixel-wise concatenation of the score map often introduces discontinuities; to mitigate this, we apply a filtering operation to the scores.
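
The sketch below shows one plausible way to build such a score map: take the detail-subband energy of a single-level 2D DWT, smooth it, and concatenate it with normalized coordinates. The wavelet choice ('haar'), Gaussian smoothing, and min-max normalization are assumptions; the paper's exact energy definition and filter may differ.

```python
# Rough WEGE-style score-map sketch: per-pixel high-frequency energy from a single-level
# 2D DWT, smoothed against discontinuities, then concatenated with coordinates as (x, y, Wb).
import numpy as np
import pywt
from scipy.ndimage import gaussian_filter

def wavelet_energy_scores(gray, wavelet="haar", smooth_sigma=1.0):
    """gray: (h, w) array with even h, w. Returns a smoothed (h, w) score map in [0, 1]."""
    _, (cH, cV, cD) = pywt.dwt2(gray, wavelet)
    energy = cH ** 2 + cV ** 2 + cD ** 2                # detail-subband energy (edge-aware)
    energy = np.repeat(np.repeat(energy, 2, 0), 2, 1)   # back to pixel resolution
    energy = gaussian_filter(energy, smooth_sigma)      # filtering against discontinuities
    return (energy - energy.min()) / (energy.max() - energy.min() + 1e-8)

def build_inputs(gray):
    """Per-pixel network inputs (x, y, Wb), shape (h*w, 3)."""
    h, w = gray.shape
    wb = wavelet_energy_scores(gray)
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w), indexing="ij")
    return np.stack([xs, ys, wb], axis=-1).reshape(-1, 3)
```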

Analysis

Fast Fourier Transform Analysis of Learned Neurons

[Figures: FFT analysis of learned neurons; sparsity analysis]

The FFTs of FLAIR's learned neurons exhibit distinct frequency responses in the Fourier domain, enabling sparse representations. To validate this, we conducted image-fitting experiments with hidden features set to 64, 128, and 256, as shown in the second row. Compared with the state-of-the-art FINER model, FLAIR achieves higher PSNR and SSIM and lower LPIPS across all settings, with the performance gap widening at 64 and 128 hidden features. This demonstrates that FLAIR represents images more efficiently with sparse neurons, consistent with the diverse frequency responses revealed by the FFT.
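
A small sketch of this kind of inspection, assuming a PyTorch coordinate network: hook a hidden layer, reshape one unit's activations over the pixel grid, and take its 2D FFT magnitude. The hook placement and layer handle are illustrative, not FLAIR-specific.

```python
# Inspect the Fourier-domain response of a single hidden neuron of a trained coordinate network.
import torch
import numpy as np

@torch.no_grad()
def neuron_spectrum(model, hidden_layer, coords, h, w, unit=0):
    """coords: (h*w, 2) pixel grid; hidden_layer: an nn.Module inside `model` to probe."""
    feats = {}
    hook = hidden_layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
    model(coords)                                           # forward pass fills the hook
    hook.remove()
    act = feats["a"][:, unit].reshape(h, w).cpu().numpy()   # one neuron's spatial response
    return np.fft.fftshift(np.abs(np.fft.fft2(act)))        # centered magnitude spectrum
```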

Short-Time Fourier Transform (STFT) Experiments

[Figure: STFT analysis]

Short-Time Fourier Transform (STFT) experiments analyze how well a model preserves localized frequency content along both the spatial and frequency axes. A 1D horizontal scan line from the original image (y = 256) is processed using a 256-sample Hann window with 75% overlap (hop size = 64); the y-axis of the STFT plot shows normalized frequency, with larger values indicating higher frequencies. While FINER fails to accurately capture the high-frequency components present in the ground truth, FLAIR produces a representation closely resembling the ground truth, demonstrating better preservation of localized high-frequency details.
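
The STFT configuration described above can be reproduced with SciPy as sketched below (256-sample Hann window, 75% overlap, hop size 64); plotting and any figure-specific normalization are omitted.

```python
# STFT of a single horizontal scan line, matching the described window/overlap settings.
import numpy as np
from scipy.signal import stft

def scanline_stft(image, row=256):
    """image: (h, w) grayscale array in [0, 1]; returns (freqs, times, |STFT|)."""
    line = image[row, :].astype(np.float64)
    # 256-sample Hann window, 75% overlap -> noverlap = 192, hop size = 64.
    f, t, Z = stft(line, fs=1.0, window="hann", nperseg=256, noverlap=192)
    return f, t, np.abs(Z)   # f is normalized frequency (cycles/sample), from 0 to 0.5
```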