AlignCVC: Aligning Cross-View Consistency for Single-Image-to-3D Generation

Xinyue Liang*, Zhiyuan Ma*, Lingchen Sun, Yanjun Guo, Lei Zhang
Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China
* Equal contribution
Corresponding Author

arXiv 2025


AlignCVC aligns cross-view consistency for high-quality single-image-to-3D generation

Method Overview


Our AlignCVC framework employs a soft-hard alignment strategy to improve cross-view consistency in single-image-to-3D generation, aligning both generated and reconstructed multi-view distributions toward the ground-truth distribution.

Abstract

Single-image-to-3D models typically follow a sequential generation and reconstruction workflow. However, intermediate multi-view images synthesized by pre-trained generation models often lack cross-view consistency (CVC), significantly degrading 3D reconstruction performance. While recent methods attempt to refine CVC by feeding reconstruction results back into the multi-view generator, these approaches struggle with noisy and unstable reconstruction outputs that limit effective CVC improvement.

We introduce AlignCVC, a framework that reframes single-image-to-3D generation as distribution alignment rather than strict regression. Our key insight is to align both the generated and the reconstructed multi-view distributions toward the ground-truth multi-view distribution, which provides a principled basis for improving CVC. Observing that generated images exhibit weak CVC whereas reconstructed images exhibit strong CVC thanks to explicit rendering, we propose a soft-hard alignment strategy with distinct objectives for the generation and reconstruction models. This approach not only enhances generation quality but also accelerates inference, requiring as few as 4 steps.
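
The observation that rendered views are consistent by construction, while independently produced views need not be, can be illustrated with a toy example. The sketch below is not the AlignCVC method or its evaluation metric: the density volume, the orthographic projections, and the `mass_inconsistency` score are all hypothetical stand-ins chosen only to make the idea concrete. It relies on one simple fact: every projection of the same volume carries the same total mass, so any spread in total mass across views signals a cross-view inconsistency.

```python
import numpy as np


def render_views(volume):
    """Orthographic projections of a density volume along each of its three axes.

    Toy stand-in for explicit rendering: every view comes from one shared 3D state.
    """
    return [volume.sum(axis=axis) for axis in range(3)]


def mass_inconsistency(views):
    """Spread (standard deviation) of total projected mass across views.

    Hypothetical toy score, not the paper's metric: it is ~0 when all views
    could have come from a single shared volume, and grows as they disagree.
    """
    totals = np.array([view.sum() for view in views])
    return float(totals.std())


rng = np.random.default_rng(0)
volume = rng.random((8, 8, 8))  # hypothetical 3D density

# "Reconstructed" views: rendered from one volume, hence consistent by construction.
rendered = render_views(volume)

# "Generated" views: each view is perturbed independently, mimicking a generator
# that has no shared 3D state tying the views together.
generated = [view + rng.normal(scale=0.5, size=view.shape) for view in rendered]

rendered_score = mass_inconsistency(rendered)
generated_score = mass_inconsistency(generated)

print(f"rendered views:  {rendered_score:.3e}")
print(f"generated views: {generated_score:.3e}")
```

In this toy setting the rendered views score essentially zero (up to floating-point rounding), while the independently perturbed views do not, which mirrors the weak-versus-strong CVC contrast that motivates using different alignment objectives for the two models.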

As a plug-and-play paradigm, AlignCVC integrates seamlessly with various multi-view generation models and 3D reconstruction models. Extensive experiments demonstrate its effectiveness and efficiency for single-image-to-3D generation.

Results

AlignCVC Results

Qualitative Comparisons

Comparison with Ouroboros3D


Comparison with Gen-3Dffusion


BibTeX


@misc{liang2025aligncvcaligningcrossviewconsistency,
  title={AlignCVC: Aligning Cross-View Consistency for Single-Image-to-3D Generation}, 
  author={Xinyue Liang and Zhiyuan Ma and Lingchen Sun and Yanjun Guo and Lei Zhang},
  year={2025},
  eprint={2506.23150},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2506.23150}, 
}