Given a set of images \(\mathcal{I}=\{I_{i}\}_{i=1}^{M}\), we denote the extrinsic matrix of image \(I_{i}\) by \(T_{cw,i}\) and the intrinsic matrix, shared by all images, by \(K\). Our method jointly estimates the camera intrinsics, the camera extrinsics, and a 3DGS model, as illustrated in the figure.
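To make the notation concrete, the sketch below projects a world point into image \(I_{i}\) with a standard pinhole model, \(\tilde{x} \sim K\,T_{cw,i}\,\tilde{X}\). It assumes that \(T_{cw,i}\) is a 4x4 world-to-camera transform (as the subscript cw suggests) and that there is no lens distortion; the function name `project` is illustrative only.

```python
def matvec(A, v):
    """Multiply a (row-major) matrix A by a vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def project(K, T_cw, X_w):
    """Map a 3D world point X_w to pixel coordinates (u, v)."""
    X_c = matvec(T_cw, [X_w[0], X_w[1], X_w[2], 1.0])[:3]  # world -> camera
    if X_c[2] <= 0:
        raise ValueError("point is behind the camera")
    x = matvec(K, X_c)  # camera -> homogeneous pixel coordinates
    return (x[0] / x[2], x[1] / x[2])


K = [[500.0, 0.0, 320.0],
     [0.0, 500.0, 240.0],
     [0.0, 0.0, 1.0]]
T_cw = [[1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 2.0],  # t = (0, 0, 2): world origin lies 2 units in front of the camera
        [0.0, 0.0, 0.0, 1.0]]
print(project(K, T_cw, [0.2, -0.1, 2.0]))  # (345.0, 227.5)
```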
Because the camera parameters enter the problem as additional optimization variables, we extend the original 3DGS with several key designs.
Our core idea is to use global track constraints to explicitly capture and enforce multi-view geometric consistency, which underpins the accurate estimation of both the 3DGS model and the camera parameters.
During initialization, we construct a maximum spanning tree from the 2D feature matches and extract global tracks from it.
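A minimal sketch of this step, under assumptions the text does not state: the tree is built over an image graph whose edge weight is the number of feature matches between two images, the maximum spanning tree is found with Kruskal's algorithm, and tracks are the connected components of (image, keypoint) observations linked by matches along tree edges. Tracks that are too short or that contain two keypoints from the same image are discarded. The names `UnionFind`, `maximum_spanning_tree`, and `extract_tracks` are hypothetical.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest over hashable items, created lazily."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def maximum_spanning_tree(matches):
    """matches: {(i, j): [(kp_i, kp_j), ...]}; edge weight = number of matches."""
    uf = UnionFind()
    tree = []
    for (i, j), _ in sorted(matches.items(), key=lambda e: len(e[1]), reverse=True):
        if uf.union(i, j):  # Kruskal: keep the heaviest edge joining two components
            tree.append((i, j))
    return tree


def extract_tracks(matches, tree, min_len=2):
    """Link (image, keypoint) observations along tree edges into global tracks."""
    uf = UnionFind()
    for i, j in tree:
        for kp_i, kp_j in matches[(i, j)]:
            uf.union((i, kp_i), (j, kp_j))
    groups = defaultdict(list)
    for obs in list(uf.parent):
        groups[uf.find(obs)].append(obs)
    tracks = []
    for obs in groups.values():
        images = [img for img, _ in obs]
        # drop tracks that are too short or observe the same image twice
        if len(obs) >= min_len and len(images) == len(set(images)):
            tracks.append(sorted(obs))
    return tracks
```

Using only tree edges keeps the strongest pairwise links and avoids closing cycles in the image graph, which are a common source of inconsistent tracks; the per-image duplicate check still catches many-to-one matches along a single edge.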
We then use the estimated 3D track points to initialize both the camera parameters and the 3D Gaussians.
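The Gaussian part of this initialization can be sketched as follows, assuming it mirrors the common 3DGS recipe for SfM points: each 3D track point becomes the mean of one Gaussian, with an isotropic scale set from the mean squared distance to its k = 3 nearest neighbours, an identity rotation, and an initial opacity of 0.1 stored as a logit. The function `init_gaussians` and these defaults are illustrative, and camera initialization from the track points is not shown. The brute-force neighbour search is O(N²) and only suits small examples.

```python
import math


def init_gaussians(points, colors, k=3, init_opacity=0.1):
    """Hypothetical init: one isotropic Gaussian per 3D track point."""
    gaussians = []
    for idx, p in enumerate(points):
        # squared distances to the k nearest other points (brute force)
        d2 = sorted(
            sum((a - b) ** 2 for a, b in zip(p, q))
            for j, q in enumerate(points)
            if j != idx
        )[:k]
        mean_d2 = max(sum(d2) / len(d2), 1e-7) if d2 else 1e-7
        s = math.sqrt(mean_d2)  # linear scale (3DGS itself stores log-scales)
        gaussians.append({
            "mean": list(p),
            "scale": [s, s, s],                # same extent along every axis
            "rotation": [1.0, 0.0, 0.0, 0.0],  # identity quaternion (w, x, y, z)
            "opacity_logit": math.log(init_opacity / (1.0 - init_opacity)),
            "color": list(colors[idx]),
        })
    return gaussians
```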
Building on this, we propose an effective joint optimization method with three loss terms: 2D track loss, 3D track loss, and scale loss.
The 2D and 3D track losses are minimized to ensure multi-view geometric consistency.
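As one concrete reading of the 2D track loss, the sketch below computes the mean squared reprojection error between each track's 3D point and its observed 2D keypoints, assuming the 3D point is the center of the corresponding track Gaussian. Intrinsics are passed as (fx, fy, cx, cy) and each pose as a 3x4 world-to-camera matrix [R | t]; the names are hypothetical, a robust penalty such as Huber could replace the squared error, and the 3D track loss is not sketched.

```python
def project(K, T_cw, X):
    """Pinhole projection; K = (fx, fy, cx, cy), T_cw = 3x4 world-to-camera [R | t]."""
    Xc = [sum(T_cw[r][c] * X[c] for c in range(3)) + T_cw[r][3] for r in range(3)]
    fx, fy, cx, cy = K
    return fx * Xc[0] / Xc[2] + cx, fy * Xc[1] / Xc[2] + cy


def track_loss_2d(K, poses, tracks):
    """Mean squared reprojection error over all track observations.

    tracks: list of (X, observations), where X is the track's 3D point and
    observations is a list of (image_index, (u_obs, v_obs)).
    """
    total, count = 0.0, 0
    for X, observations in tracks:
        for i, (u_obs, v_obs) in observations:
            u, v = project(K, poses[i], X)
            total += (u - u_obs) ** 2 + (v - v_obs) ** 2
            count += 1
    return total / max(count, 1)
```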
The scale loss constrains the track Gaussians to remain aligned with the scene surface while preserving the expressive capability of the 3DGS model.
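The exact form of the scale loss is not given here, so the following is only a hypothetical version consistent with its stated goal: it penalizes the smallest axis scale of each track Gaussian, pushing those Gaussians toward flat, surface-aligned disks, and leaves all other Gaussians unconstrained so the model keeps its expressive capability. `scale_loss`, `is_track`, and `weight` are illustrative names.

```python
def scale_loss(scales, is_track, weight=1.0):
    """Hypothetical scale loss: mean smallest-axis scale over track Gaussians only.

    scales: per-Gaussian [sx, sy, sz] (linear, positive);
    is_track: per-Gaussian flag marking Gaussians initialized from global tracks.
    """
    penalties = [min(s) for s, flag in zip(scales, is_track) if flag]
    if not penalties:
        return 0.0
    return weight * sum(penalties) / len(penalties)
```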
We derive and implement the gradients with respect to the camera parameters, covering both the extrinsic and intrinsic matrices.
Through the chain rule, these gradients propagate the training losses back to the camera parameters, enabling seamless joint optimization of the 3DGS model and the camera parameters.
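To illustrate the chain rule for camera parameters, the sketch below derives analytic gradients of a single reprojection term \(L = \tfrac{1}{2}\lVert \pi(X_c) - x_{obs}\rVert^2\) with respect to the intrinsics \((f_x, f_y, c_x, c_y)\), the translation \(t\), and a left rotation perturbation \(\omega\), and checks them against central finite differences. It assumes a distortion-free pinhole model with \(X_c = \exp([\omega]_\times)\,p + t\), where \(p = R X\), and rotation gradients taken at \(\omega = 0\). In a full 3DGS pipeline the same derivatives would also flow through the projected Gaussian means and covariances; this point-wise version is a simplification, and all names are hypothetical.

```python
import math


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def rotate(omega, p):
    """Rodrigues' formula: apply exp([omega]_x) to the vector p."""
    theta = math.sqrt(sum(w * w for w in omega))
    if theta < 1e-12:
        return list(p)
    k = [w / theta for w in omega]
    kxp = cross(k, p)
    kdp = sum(a * b for a, b in zip(k, p))
    c, s = math.cos(theta), math.sin(theta)
    return [p[i] * c + kxp[i] * s + k[i] * kdp * (1.0 - c) for i in range(3)]


def loss(intr, omega, t, p, obs):
    """0.5 * squared reprojection error of X_c = exp([omega]_x) p + t."""
    fx, fy, cx, cy = intr
    x, y, z = [a + b for a, b in zip(rotate(omega, p), t)]
    u, v = fx * x / z + cx, fy * y / z + cy
    return 0.5 * ((u - obs[0]) ** 2 + (v - obs[1]) ** 2)


def analytic_grads(intr, t, p, obs):
    """Chain-rule gradients at omega = 0 w.r.t. intrinsics, translation, rotation."""
    fx, fy, cx, cy = intr
    x, y, z = [a + b for a, b in zip(p, t)]
    ru = fx * x / z + cx - obs[0]  # dL/du
    rv = fy * y / z + cy - obs[1]  # dL/dv
    d_intr = [ru * x / z, rv * y / z, ru, rv]  # dL/d(fx, fy, cx, cy)
    # dL/dX_c = J_pi^T [ru, rv], with J_pi the 2x3 pinhole Jacobian
    g = [ru * fx / z, rv * fy / z, -(ru * fx * x + rv * fy * y) / z ** 2]
    d_t = g                # dX_c/dt = I
    d_omega = cross(p, g)  # dX_c/domega = -[p]_x, hence dL/domega = p x g
    return d_intr, d_t, d_omega


def numeric_grads(intr, t, p, obs, h=1e-6):
    """Central finite differences of the same loss, used as a check."""
    zero = [0.0, 0.0, 0.0]

    def bump(vec, i, d):
        out = list(vec)
        out[i] += d
        return out

    def diff(f, n):
        return [(f(i, h) - f(i, -h)) / (2.0 * h) for i in range(n)]

    d_intr = diff(lambda i, d: loss(bump(intr, i, d), zero, t, p, obs), 4)
    d_t = diff(lambda i, d: loss(intr, zero, bump(t, i, d), p, obs), 3)
    d_omega = diff(lambda i, d: loss(intr, bump(zero, i, d), t, p, obs), 3)
    return d_intr, d_t, d_omega


intr, t, p, obs = [500.0, 480.0, 320.0, 240.0], [0.1, -0.2, 2.0], [0.3, 0.1, 1.0], [400.0, 250.0]
print(analytic_grads(intr, t, p, obs))
print(numeric_grads(intr, t, p, obs))
```

The analytic and numerical gradients should agree closely; in practice an autodiff framework or a hand-written backward pass would supply the same quantities for every projected Gaussian.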