TrackGS: Optimizing COLMAP-Free 3D Gaussian Splatting with Global Track Constraints

Dongbo Shi1†,    Shen Cao2†,    Lubin Fan2‡,    Bojian Wu2,    Jinhui Guo2†,    Renjie Chen1‡,    Ligang Liu1,    Jieping Ye2   
1University of Science and Technology of China     2Independent Researcher    
†Equal contributions.
‡Corresponding authors.

Abstract

While 3D Gaussian Splatting (3DGS) has advanced novel view synthesis, it still depends on accurate pre-computed camera parameters, which are hard to obtain and prone to noise. Previous COLMAP-free methods optimize camera poses using local constraints, but they often struggle in complex scenarios. To address this, we introduce TrackGS, which incorporates feature tracks to globally constrain multi-view geometry. We select the Gaussians associated with each track; these are trained and rescaled to an infinitesimally small size to guarantee spatial accuracy. We also propose minimizing both reprojection and backprojection errors for better geometric consistency. Moreover, by deriving the gradients of the intrinsics, we unify camera parameter estimation with 3DGS training into a joint optimization framework, achieving SOTA performance on challenging datasets with severe camera movements.



Method

Figure: Pipeline.

Given a set of images \(\mathcal{I}=\{I_{i}\}_{i=1}^{M}\), the extrinsic matrix for each image \(I_{i}\) and the shared intrinsic matrix are denoted by \(T_{cw,i}\) and \(K\), respectively. Our method aims to simultaneously estimate the camera intrinsics and extrinsics as well as a 3DGS model, as illustrated in the figure. Because additional variables, i.e., the camera parameters, are incorporated into the optimization, we enhance the original 3DGS with several key designs.
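The notation above follows the standard pinhole model: a world point is mapped into camera \(i\) by \(T_{cw,i}\) and then into pixels by \(K\). A minimal sketch of this projection (the intrinsic values below are hypothetical, not from the paper):

```python
import numpy as np

def project(X_w, T_cw, K):
    """Project a 3D world point into pixel coordinates.

    X_w  : (3,) point in world space
    T_cw : (4, 4) world-to-camera extrinsic matrix
    K    : (3, 3) pinhole intrinsic matrix
    """
    X_c = T_cw[:3, :3] @ X_w + T_cw[:3, 3]  # world -> camera frame
    x = K @ X_c                             # camera -> homogeneous pixels
    return x[:2] / x[2]                     # perspective divide

# Hypothetical intrinsics: focal length 500 px, principal point (320, 240)
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
T = np.eye(4)  # identity pose: camera at the world origin
print(project(np.array([0.0, 0.0, 2.0]), T, K))  # point on the optical axis -> principal point
```

Differentiating this mapping with respect to both \(T_{cw,i}\) and \(K\) is what allows the camera parameters to receive gradients during 3DGS training.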

Our key idea is to leverage global track constraints to explicitly capture and enforce multi-view geometric consistency, which serves as the foundation for accurately estimating both the 3DGS model and the camera parameters. During initialization, we construct a maximum spanning tree from 2D matched feature points and extract global tracks. We then initialize both the camera parameters and the subsequent 3D Gaussians with the estimated 3D track points. Building on this, we propose an effective joint optimization method with three loss terms: a 2D track loss, a 3D track loss, and a scale loss. The 2D and 3D track losses are minimized to ensure multi-view geometric consistency, while the scale loss constrains the track Gaussians to remain aligned with the scene's surface without sacrificing the expressive capability of the 3DGS model. We derive and implement the differentiable components of the camera parameters, including both the extrinsic and intrinsic matrices. This allows us to apply the chain rule, enabling seamless joint optimization of the 3DGS model and the camera parameters.
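To make the multi-view consistency objective concrete, the following is a minimal sketch of a 2D reprojection loss over one track: a single 3D track point is projected into every view that observes it and compared against the matched 2D features. This is only an illustration of the idea; the paper's exact loss weighting and the 3D/scale terms are omitted, and all numeric values are hypothetical.

```python
import numpy as np

def track_reprojection_loss(X_w, observations, poses, K):
    """Mean squared 2D reprojection error of one track.

    X_w          : (3,) estimated 3D track point
    observations : list of (2,) observed pixel locations, one per view
    poses        : list of (4, 4) world-to-camera extrinsics T_cw
    K            : (3, 3) shared intrinsic matrix
    """
    err = 0.0
    for uv, T_cw in zip(observations, poses):
        X_c = T_cw[:3, :3] @ X_w + T_cw[:3, 3]  # world -> camera
        x = K @ X_c
        proj = x[:2] / x[2]                     # perspective divide
        err += np.sum((proj - uv) ** 2)
    return err / len(observations)

# Toy example: one track point seen by two cameras (hypothetical values)
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
T1 = np.eye(4)
T2 = np.eye(4)
T2[0, 3] = -1.0                      # second camera shifted along x
X = np.array([0.0, 0.0, 2.0])
obs = [np.array([320.0, 240.0]),     # observations consistent with X
       np.array([70.0, 240.0])]
print(track_reprojection_loss(X, obs, [T1, T2], K))  # ~0 for a consistent track
```

Because the error is differentiable in \(X_w\), \(T_{cw,i}\), and \(K\) alike, minimizing it can jointly refine the track points and the camera parameters, which is the essence of the joint optimization described above.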





Comparisons vs CF-3DGS

We compare with CF-3DGS on Tanks and Temples and CO3D V2 datasets. The novel view synthesis results and corresponding depth maps are shown below.

The table shows quantitative comparisons of novel view synthesis and pose accuracy between our method and the baselines (HT-3DGS, CF-3DGS).

Figure: Quantitative comparisons of novel view synthesis and pose accuracy on T&T and CO3D V2.


Synthetic Dataset

We also create a Synthetic Dataset with 4 scenes to test our method's ability to estimate camera intrinsics and extrinsics. The novel view synthesis results and corresponding depth maps are shown below. The table shows the estimation errors from CF-3DGS, COLMAP, and our approach.

Figure: Quantitative comparisons of NVS on our Synthetic dataset.
Figure: Quantitative comparisons of camera parameter accuracy on our Synthetic dataset.


Note

This work is an updated version of arXiv:2502.19800 v1.

BibTeX


@misc{shi2025trackgsoptimizingcolmapfree3d,
    title={TrackGS: Optimizing COLMAP-Free 3D Gaussian Splatting with Global Track Constraints}, 
    author={Dongbo Shi and Shen Cao and Lubin Fan and Bojian Wu and Jinhui Guo and Renjie Chen and Ligang Liu and Jieping Ye},
    year={2025},
    eprint={2502.19800},
    archivePrefix={arXiv},
}