TrackGS: Optimizing COLMAP-Free 3D Gaussian Splatting with Global Track Constraints

Dongbo Shi1†,    Shen Cao2†,    Lubin Fan2‡,    Bojian Wu2,    Jinhui Guo2†,    Renjie Chen1‡,    Ligang Liu1,    Jieping Ye2   
1University of Science and Technology of China     2Independent Researcher    
†Equal contributions.
‡Corresponding authors.

Abstract

While 3D Gaussian Splatting (3DGS) has advanced novel view synthesis, it still depends on accurate pre-computed camera parameters, which are hard to obtain and prone to noise. Previous COLMAP-free methods optimize camera poses using local constraints, but they often struggle in complex scenarios. To address this, we introduce TrackGS, which incorporates feature tracks to globally constrain multi-view geometry. We select the Gaussians associated with each track; these are trained and rescaled to an infinitesimally small size to guarantee spatial accuracy. We also propose minimizing both reprojection and backprojection errors for better geometric consistency. Moreover, by deriving the gradients of the intrinsics, we unify camera parameter estimation with 3DGS training into a joint optimization framework, achieving SOTA performance on challenging datasets with severe camera movements.



Method

Figure: Pipeline.

Given a set of images \(\mathcal{I}=\{I_{i}\}_{i=1}^{M}\), the extrinsic matrix for each image \(I_{i}\) and the shared intrinsic matrix are denoted by \(T_{cw,i}\) and \(K\), respectively. Our method aims to simultaneously estimate the camera intrinsics and extrinsics as well as a 3DGS model, as illustrated in the figure. Because additional variables, i.e., the camera parameters, are incorporated into the optimization, we enhance the original 3DGS with several key designs.
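The notation above follows the standard pinhole model: a world point is mapped into camera \(i\) by \(T_{cw,i}\) and then into pixels by \(K\). A minimal sketch of this projection (the intrinsic values below are hypothetical, not from the paper):

```python
import numpy as np

def project(X_w, T_cw, K):
    """Project a 3D world point into pixel coordinates.

    X_w  : (3,) point in world space
    T_cw : (4, 4) world-to-camera extrinsic matrix
    K    : (3, 3) pinhole intrinsic matrix
    """
    X_c = T_cw[:3, :3] @ X_w + T_cw[:3, 3]  # world -> camera frame
    x = K @ X_c                             # camera -> homogeneous pixels
    return x[:2] / x[2]                     # perspective divide

# Hypothetical intrinsics: focal length 500 px, principal point (320, 240)
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
T = np.eye(4)  # identity pose: camera at the world origin
print(project(np.array([0.0, 0.0, 2.0]), T, K))  # point on the optical axis -> principal point
```

Differentiating this mapping with respect to both \(T_{cw,i}\) and \(K\) is what allows the camera parameters to receive gradients during 3DGS training.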

Our key idea is to leverage global track constraints to explicitly capture and enforce multi-view geometric consistency, which serves as the foundation for accurately estimating both the 3DGS model and the camera parameters. During initialization, we construct a maximum spanning tree from 2D matched feature points and extract global tracks. We then initialize both the camera parameters and the subsequent 3D Gaussians with the estimated 3D track points. Building on this, we propose an effective joint optimization method with three loss terms: a 2D track loss, a 3D track loss, and a scale loss. The 2D and 3D track losses are minimized to ensure multi-view geometric consistency, while the scale loss constrains the track Gaussians to remain aligned with the scene's surface without sacrificing the expressive capability of the 3DGS model. We derive and implement the differentiable components of the camera parameters, including both the extrinsic and intrinsic matrices. This allows us to apply the chain rule, enabling seamless joint optimization of the 3DGS model and the camera parameters.
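To make the multi-view consistency objective concrete, the following is a minimal sketch of a 2D reprojection loss over one track: a single 3D track point is projected into every view that observes it and compared against the matched 2D features. This is only an illustration of the idea; the paper's exact loss weighting and the 3D/scale terms are omitted, and all numeric values are hypothetical.

```python
import numpy as np

def track_reprojection_loss(X_w, observations, poses, K):
    """Mean squared 2D reprojection error of one track.

    X_w          : (3,) estimated 3D track point
    observations : list of (2,) observed pixel locations, one per view
    poses        : list of (4, 4) world-to-camera extrinsics T_cw
    K            : (3, 3) shared intrinsic matrix
    """
    err = 0.0
    for uv, T_cw in zip(observations, poses):
        X_c = T_cw[:3, :3] @ X_w + T_cw[:3, 3]  # world -> camera
        x = K @ X_c
        proj = x[:2] / x[2]                     # perspective divide
        err += np.sum((proj - uv) ** 2)
    return err / len(observations)

# Toy example: one track point seen by two cameras (hypothetical values)
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
T1 = np.eye(4)
T2 = np.eye(4)
T2[0, 3] = -1.0                      # second camera shifted along x
X = np.array([0.0, 0.0, 2.0])
obs = [np.array([320.0, 240.0]),     # observations consistent with X
       np.array([70.0, 240.0])]
print(track_reprojection_loss(X, obs, [T1, T2], K))  # ~0 for a consistent track
```

Because the error is differentiable in \(X_w\), \(T_{cw,i}\), and \(K\) alike, minimizing it can jointly refine the track points and the camera parameters, which is the essence of the joint optimization described above.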





Comparisons vs CF-3DGS

We compare with CF-3DGS on Tanks and Temples and CO3D V2 datasets. The novel view synthesis results and corresponding depth maps are shown below.

The table shows quantitative comparisons of novel view synthesis and pose accuracy between our method and the baselines (HT-3DGS, CF-3DGS).

Figure: Quantitative comparisons of novel view synthesis and pose accuracy on T&T and CO3D V2.


Synthetic Dataset

We also create a Synthetic Dataset with 4 scenes to test our method's ability to estimate camera intrinsics and extrinsics. The novel view synthesis results and corresponding depth maps are shown below. The table shows the estimation errors from CF-3DGS, COLMAP, and our approach.

Figure: Quantitative comparisons of NVS on our Synthetic dataset.
Figure: Quantitative comparisons of camera parameter accuracy on our Synthetic dataset.


Note

This work is an updated version of arXiv:2502.19800 v1.

BibTeX


@misc{shi2025trackgsoptimizingcolmapfree3d,
    title={TrackGS: Optimizing COLMAP-Free 3D Gaussian Splatting with Global Track Constraints}, 
    author={Dongbo Shi and Shen Cao and Lubin Fan and Bojian Wu and Jinhui Guo and Renjie Chen and Ligang Liu and Jieping Ye},
    year={2025},
    eprint={2502.19800},
    archivePrefix={arXiv},
}