
Compact 3D Gaussian Splatting For Dense Visual SLAM

Tianchen Deng¹, Yaohui Chen¹, Leyan Zhang¹, Jianfei Yang², Shenghai Yuan², Danwei Wang², Weidong Chen¹

Abstract

Recent work has shown that 3D Gaussian-based SLAM enables high-quality reconstruction, accurate pose estimation, and real-time rendering of scenes. However, these approaches are built on a tremendous number of redundant 3D Gaussian ellipsoids, leading to high memory and storage costs and slow training speed. To address this limitation, we propose a compact 3D Gaussian Splatting SLAM system that reduces both the number and the parameter size of Gaussian ellipsoids. A sliding window-based masking strategy is first proposed to reduce the redundant ellipsoids. We then observe that the covariance matrices (geometry) of most 3D Gaussian ellipsoids are extremely similar, which motivates a novel geometry codebook that compresses the geometric attributes of the 3D Gaussians, i.e., their scale and rotation parameters. Robust and accurate pose estimation is achieved by a global bundle adjustment method with reprojection loss. Extensive experiments demonstrate that our method achieves faster training and rendering speed while maintaining state-of-the-art (SOTA) quality of the scene representation.

Figure 1: Our framework minimizes storage and accelerates rendering while maintaining the SOTA image reconstruction performance. The proposed framework eliminates unnecessary 3D Gaussian ellipsoids without affecting performance. We highlight and enlarge some areas to show the significant reduction of 3D Gaussian points.

1 Introduction

Simultaneous localization and mapping (SLAM) has been a fundamental computer vision problem with wide applications such as autonomous driving, robotics, and virtual/augmented reality [7, 28]. Several traditional methods, including ORBSLAM [24, 25], VINS [27], etc. [6, 37, 38], have been introduced over the years, representing scenes with sparse point cloud maps. However, due to the sparse nature of the point cloud, it proves ineffective for navigation or other purposes. Attention has turned to dense scene reconstruction, exemplified by DTAM [26], Kintinuous [35], and ElasticFusion [36]. However, their accuracy remains unsatisfactory due to high memory costs, slow processing speeds, and other real-time running limitations.

Since the introduction of Neural Radiance Fields (NeRF) [22], many follow-up works have appeared in different areas [4], and many of them combine implicit scene representations with SLAM systems. iMAP [32] is the first method to use a single MLP to represent the scene. NICE-SLAM [45], ESLAM [11], Co-SLAM [34], and PLGSLAM [5] further improve the scene representation with hybrid feature grids, axis-aligned feature planes, joint coordinate-parametric encoding, and progressive scene representation, respectively. To further improve rendering accuracy, recent methods have started to integrate 3D Gaussian Splatting (GS) [13] with SLAM, such as SplaTAM [12], GS-SLAM [39], and others [42, 21]. GS-based SLAM methods leverage a point-based representation associated with 3D Gaussian attributes and adopt a rasterization pipeline to render images, achieving fast rendering speed and promising image quality. However, the original GS-based scene representation requires a substantial number of 3D Gaussian ellipsoids to maintain high-fidelity reconstruction, leading to high memory usage and storage requirements. GS-based SLAM systems usually need more than 500MB to represent a small room-sized scene. Moreover, GS-based SLAM systems run significantly slower than NeRF-based methods, which hinders practical deployment, especially on resource-constrained devices.

To this end, we propose a compact 3D Gaussian scene representation method to address the critical high memory demand and slow training speed issue in GS-based SLAM systems. Our method notably enhances storage efficiency while delivering high-quality reconstruction, fast training speed, and real-time rendering capabilities. First, we design a novel sliding window-based online masking method to remove the millions of redundant and unnecessary 3D Gaussian ellipsoids created during the SLAM system operation. With the proposed masking method, a compact 3D Gaussian scene representation is learned, achieving faster rendering speed and efficient memory usage since the computational complexity is linearly proportional to the number of 3D Gaussian points.

Second, we observe that the majority of Gaussian points exhibit similar geometry information in scale and rotation attributes. To this end, a codebook-based method is designed to compress the geometry of each Gaussian point. It learns to find the similarities and geometry shared across the scene. We only store the codebook index for each 3D Gaussian ellipsoid, obtaining compact scene representation.

Third, the camera tracking accuracy of GS-based SLAM is relatively low compared with other SLAM systems. A global BA method with reprojection loss is proposed to achieve robust and accurate pose estimation. Our method maintains a global keyframe database and performs bundle adjustment with all the historical observations, which can effectively eliminate the cumulative error. Overall, our contributions are shown as follows:

  • We propose a novel GS-based SLAM system with a compact Gaussian scene representation, achieving fast training and rendering speed, accurate pose estimation, and significantly improved storage efficiency.

  • A novel sliding window-based online masking method is proposed to reduce the number of redundant Gaussian ellipsoids while maintaining high-fidelity performance during training.

  • We observe and analyze the geometry similarities of 3D Gaussian ellipsoids and propose a codebook-based method to efficiently restore the geometry of each Gaussian point during the SLAM system operation. A keyframe-based global BA method with reprojection loss is proposed to improve the relatively low camera tracking performance.

  • We conduct comprehensive experiments on different datasets and achieve a nearly 176% increase in rendering speed and over 1.97× compression in memory usage.

Figure 2: The pipeline of our GS-based SLAM system. The input of our system is the current RGB-D frame. We start the SLAM system by initializing the 3D Gaussian map. Then, we update our 3D Gaussian map by adding new Gaussians and using the learnable mask to reduce the redundant 3D Gaussian ellipsoids. We incorporate a codebook-based vector quantization method to compress the scene representation. For camera tracking, we maintain a global keyframe database for global BA and use reprojection loss for robust pose estimation.

2 Related Work

Dense Visual SLAM and Localization. SLAM [2, 19] and localization [20] have been active fields for the past two decades. DTAM [26] is the first method to achieve dense scene reconstruction. KinectFusion [10] uses projective iterative closest point (ICP) for camera tracking. Some learning-based methods, such as DROID-SLAM [33], integrate traditional geometry frameworks with deep learning networks for accurate camera tracking and mapping.

NeRF-based SLAM. Since the introduction of Neural Radiance Fields (NeRF) [22], many researchers have explored bringing implicit representations into SLAM systems. iMAP [32] is the first method to use a single multi-layer perceptron (MLP) to represent the scene, and NICE-SLAM [45] uses learnable hierarchical feature grids. Vox-Fusion [40] employs an octree architecture for dynamic map scalability. ESLAM [11] and Co-SLAM [34] further improve the scene representation with tri-planes and joint coordinate-parametric encoding. Other works [44, 18, 17] use semantic feature embeddings to improve scene representation. Point-SLAM [29] uses neural point clouds for the scene representation. Instead of representing maps with neural implicit features, our method utilizes an explicit 3D Gaussian representation, which can significantly improve the rendering speed through splatting-based rasterization.

GS-based SLAM. 3D Gaussian Splatting (3DGS) [13] recently introduced 3D Gaussians as primitives for real-time neural rendering. 3DGS utilizes highly optimized custom CUDA kernels and novel algorithmic approaches, which achieve significant improvements in rendering speed without sacrificing image quality. SplaTAM [12], GS-SLAM [39], Gaussian-SLAM [42], and Gaussian Splatting SLAM [21] are the pioneering works that successfully combine the advantages of 3D Gaussian Splatting with SLAM. These methods achieve fast rendering speed and high-fidelity reconstruction performance. However, their training speed is relatively slow, a critical drawback for SLAM, which must run online. Memory and storage usage are also heavy, which makes these systems difficult to use in real-world scenarios and on handheld devices.

3 Method

The pipeline of our system is shown in Fig. 2. The input of our system is sequential RGB-D frames $\{I_i, D_i\}_{i=1}^{M}$ with known camera intrinsics $K \in \mathbb{R}^{3\times 3}$. Our system simultaneously reconstructs a dense scene map and estimates camera poses $\{R_i | t_i\}_{i=1}^{M}$. For the mapping thread, a compact 3D Gaussian scene representation (Sec. 3.1) is designed to represent the environments with sliding window-based masks (Sec. 3.2) and a geometry codebook (Sec. 3.3). For the camera tracking thread, a global bundle adjustment method (Sec. 3.4) is designed for robust and accurate pose estimation. The network is incrementally updated with the SLAM system operation.

3.1 3D Gaussian Scene Representation

Inspired by [13], we represent the entire scene as a set of 3D Gaussian ellipsoids. Each 3D Gaussian is associated with 3D attributes (position, opacity, scale, and rotation). Our Gaussian ellipsoids are defined by a full 3D covariance matrix 𝚺 defined in world space (normalized):
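In the standard 3DGS parameterization [13], the covariance is factored into a rotation and a scale, which keeps it positive semi-definite. Assuming the opacity $o$ enters as a scalar weight on the Gaussian (an assumption; $G$ and $\boldsymbol{x}$ are names introduced here, with $\boldsymbol{x}$ the offset from the Gaussian center), this takes the form:

```latex
\boldsymbol{\Sigma} = \boldsymbol{R}\boldsymbol{S}\boldsymbol{S}^{\top}\boldsymbol{R}^{\top},
\qquad
G(\boldsymbol{x}) = o \cdot \exp\!\left(-\tfrac{1}{2}\,\boldsymbol{x}^{\top}\boldsymbol{\Sigma}^{-1}\boldsymbol{x}\right)
```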


where $o \in [0,1]$ is the opacity value, 𝑺 is the scaling matrix, and 𝑹 is the rotation matrix.
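To make the role of the scaling and rotation matrices concrete, the following is a minimal Python sketch that assembles a covariance of the form $\boldsymbol{R}\boldsymbol{S}\boldsymbol{S}^{\top}\boldsymbol{R}^{\top}$ from three per-axis scales and a unit quaternion, as is common in 3DGS implementations. The function names and the (w, x, y, z) quaternion convention are assumptions made for illustration, not the authors' code.

```python
import math

def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n  # normalize to a unit quaternion
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def covariance(scale, quat):
    """Return Sigma = R S S^T R^T for per-axis scales and a rotation quaternion."""
    R = quat_to_rotmat(quat)
    # M = R S with S = diag(scale): column j of R is multiplied by scale[j].
    M = [[R[i][j] * scale[j] for j in range(3)] for i in range(3)]
    # Sigma = M M^T, symmetric by construction.
    return [[sum(M[i][k] * M[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]
```

Because the covariance is built as a product of a matrix with its own transpose, it stays symmetric positive semi-definite for any scale and rotation, which is why this factorization is preferred over optimizing the six covariance entries directly.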

We then use the 3D Gaussian ellipsoids to render 2D images with the splatting technique [14, 41]. The covariance matrix 𝚺′ in camera coordinates is formulated as:
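With the symbols defined for this step, the standard splatting projection used in 3DGS [13] reads:

```latex
\boldsymbol{\Sigma}' = \boldsymbol{J}\boldsymbol{W}\boldsymbol{\Sigma}\boldsymbol{W}^{\top}\boldsymbol{J}^{\top}
```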


where 𝑾 denotes the viewing transformation and 𝑱 denotes the projection transformation matrix. For each pixel $p$, the color and opacity of all Gaussian ellipsoids are computed and blended using this formula:
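In the usual front-to-back alpha-blending form of 3DGS [13], with the Gaussians sorted by depth and $\alpha_i$ denoting the opacity contribution of the $i$-th Gaussian at pixel $p$ (notation assumed here), the blended color is:

```latex
C(p) = \sum_{i=1}^{N} c_i\,\alpha_i \prod_{j=1}^{i-1} \left(1 - \alpha_j\right)
```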


where $c_i$ denotes the color of the $i$-th Gaussian ellipsoid. We also propose a similar depth rendering formula:
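Following the same blending weights as the color rendering, and writing $d_i$ for the depth of the $i$-th Gaussian in the camera frame (a symbol assumed here), the depth render takes the form:

```latex
D(p) = \sum_{i=1}^{N} d_i\,\alpha_i \prod_{j=1}^{i-1} \left(1 - \alpha_j\right)
```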


We also render a silhouette image to determine visibility:
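Taking the silhouette to be the accumulated opacity $S(p) = \sum_i \alpha_i \prod_{j<i}(1-\alpha_j)$ (an assumption in line with the color and depth formulas), all three renders share the same front-to-back weights. A minimal Python sketch for a single pixel follows; the function name and the tuple layout are hypothetical, and a real system would do this inside a tile-based rasterizer.

```python
def composite(gaussians):
    """Front-to-back alpha compositing for one pixel.

    `gaussians` is a list of (alpha, color, depth) tuples already sorted
    near-to-far, where alpha is the Gaussian's opacity contribution at the pixel.
    Returns the blended color, depth, and silhouette (accumulated opacity).
    """
    color = [0.0, 0.0, 0.0]
    depth = 0.0
    silhouette = 0.0
    transmittance = 1.0  # running product of (1 - alpha_j) over closer Gaussians
    for alpha, c, d in gaussians:
        weight = alpha * transmittance
        color = [acc + weight * ch for acc, ch in zip(color, c)]
        depth += weight * d
        silhouette += weight
        transmittance *= 1.0 - alpha
    return color, depth, silhouette
```

A silhouette close to 1 indicates that the pixel is well covered by the current map, which is what makes it usable as a visibility test.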


Figure 3: The left figure shows the learnable mask strategy. We perform frustum selection and sliding window reset to remove redundant Gaussian ellipsoids while efficiently maintaining the reconstruction accuracy. The dashed lines represent the removed 3D Gaussian ellipsoids. The right figure shows the varying count of Gaussian ellipsoids during the SLAM system operation. These two curves show the distinction between our system with and without masks. Our mask strategy achieves 1.97× compression on the number of 3D Gaussians.

3.2 Sliding Window-based Mask

Existing GS-based SLAM systems, such as SplaTAM [12] and GS-SLAM [39], directly use the original 3DGS for scene representation, achieving promising image quality. However, we observe that 3DGS creates a large number of redundant 3D Gaussian ellipsoids during SLAM system operation (see Fig. 3, where a ×1.52 difference in the number of Gaussian ellipsoids yields similar performance), and neither system addresses this. The redundancy degrades training speed and increases memory and storage usage, all of which are crucial for online SLAM systems. Some methods [16, 23, 9] propose novel Gaussian pruning and self-organizing methods to compact the 3DGS attributes. However, none of these strategies is suitable for GS-based SLAM systems, because they require all images, poses, and the corresponding point cloud at the beginning, whereas SLAM systems are optimized incrementally.

To this end, we propose a learnable sliding window-based mask strategy to remove the redundant 3D Gaussian ellipsoids during SLAM system operation. Compared to the original densification method, which only considers the opacity, our method takes into account both the volume $V$ and the opacity $o \in [0,1]$ of the Gaussian ellipsoids. The volume is calculated as $V = \frac{4}{3}\pi abc$, where $a$, $b$, $c$ are the three dimensions of the scale 𝑺. We introduce a learnable mask parameter $m \in \mathbb{R}^{N}$ and a corresponding binary mask $M \in \{0,1\}^{N}$, where $N$ is the number of Gaussian ellipsoids.
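A straight-through formulation built from the operators named in this section (stop gradient, indicator, sigmoid, and threshold) is sketched below. Applying the mask to both the volume and the opacity is an assumption drawn from the preceding paragraph, and $\hat{V}_n$ and $\hat{o}_n$ are names introduced here for the masked quantities.

```latex
M_n = \mathrm{sg}\big(\mathbb{1}[\mathrm{sig}(m_n) > \epsilon] - \mathrm{sig}(m_n)\big) + \mathrm{sig}(m_n),
\qquad
\hat{V}_n = M_n V_n, \quad \hat{o}_n = M_n o_n
```

In the forward pass $M_n$ evaluates to the hard indicator, while in the backward pass the gradient flows only through $\mathrm{sig}(m_n)$.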


where $n$ is the index of the Gaussian ellipsoids and $\epsilon$ denotes the mask threshold. Inspired by [1], we employ the stop gradient operator $\mathrm{sg}(\cdot)$ to calculate gradients from binary masks. $\mathbb{1}$ and $\mathrm{sig}(\cdot)$ denote the indicator and sigmoid functions. This formulation of the mask strategy allows us to effectively combine the influence of the volume and opacity of the Gaussian ellipsoids. We formulate the loss function $L_m$ of our mask:
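A common choice for such a mask loss, assumed here rather than taken from the text, is the mean sigmoid of the mask parameters, $L_m = \frac{1}{N}\sum_{n=1}^{N}\mathrm{sig}(m_n)$, which pushes masks toward zero unless the rendering loss resists. The Python sketch below shows only the forward computation; the threshold value `eps` and all function names are hypothetical, and the stop-gradient path needs an autograd framework and is not modeled.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def ellipsoid_volume(scale):
    """V = 4/3 * pi * a * b * c for the three scale dimensions."""
    a, b, c = scale
    return 4.0 / 3.0 * math.pi * a * b * c

def binary_mask(mask_params, eps=0.01):
    """Forward value of the mask: 1 keeps a Gaussian, 0 removes it.

    eps is a hypothetical threshold. During training, gradients would flow
    through sigmoid(m) via the stop-gradient trick, which is not modeled here.
    """
    return [1 if sigmoid(m) > eps else 0 for m in mask_params]

def apply_mask(mask, volumes, opacities):
    """Zero out both the volume and the opacity of masked Gaussians."""
    return ([M * v for M, v in zip(mask, volumes)],
            [M * o for M, o in zip(mask, opacities)])

def mask_loss(mask_params):
    """Mean sigmoid of the mask parameters; minimizing it favors fewer Gaussians."""
    return sum(sigmoid(m) for m in mask_params) / len(mask_params)
```

A Gaussian whose masked volume and opacity are both zero contributes nothing to rendering and can be dropped from the map.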


Figure 4: The R-VQ process to represent the scale and rotation of Gaussian ellipsoids. In the first stage, we cluster the scale and rotation vectors and randomly select codebook initialization with the closest code. In the subsequent stage, the residual between the original vector and the result from the first stage is stored in another codebook. This iterative process continues through to the ultimate stage, at which point, the collectively chosen indices and codebook from each stage provide a representation of the original vector.

To better fit online-updating SLAM systems, we further improve the masking strategy by adding frustum culling and a sliding window-based reset strategy, shown in Fig. 3. Our frustum culling strategy allows us to optimize only the mask within the current viewing frustum while keeping the rest of the 3D Gaussian ellipsoids fixed. This not only preserves the previously reconstructed geometry but also significantly reduces the number of parameters during optimization. Unlike the original densification strategy, which is performed on every frame, we only apply the mask on keyframes (every $k$-th frame) for efficiency and accuracy. We maintain a local sliding window and perform a sliding window reset to avoid continuous optimization and accumulated mask gradients, which would ultimately eliminate all Gaussian ellipsoids. The sliding window consists of the current frame, the most relevant keyframe, and the $k-2$ previous keyframes that have the highest overlap with the current frame. Overlap is evaluated by taking the point cloud of the current frame's depth map and counting the points that fall within the frustum of each keyframe. This also ensures the consistency of the mask within the local sliding window. This approach allows us to continuously mask out unnecessary Gaussians during online SLAM system operation, effectively reducing computation overhead and ensuring efficient GPU memory usage.
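The overlap test described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: keyframe poses are given as world-to-camera rotation and translation, the camera is a pinhole model with intrinsics `(fx, fy, cx, cy)`, the frustum has no near or far limit, and the "most relevant" keyframe is taken to be the highest-ranked one, so the function simply returns the `k - 1` keyframes with the largest overlap. All names are hypothetical.

```python
def in_frustum(point, pose, intrinsics, width, height):
    """Check whether a world-space point projects inside a keyframe's image."""
    R, t = pose  # world-to-camera rotation (3x3 nested lists) and translation
    x, y, z = (sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3))
    if z <= 0.0:  # behind the camera
        return False
    fx, fy, cx, cy = intrinsics
    u, v = fx * x / z + cx, fy * y / z + cy
    return 0.0 <= u < width and 0.0 <= v < height

def select_window(points, keyframes, k, intrinsics, width, height):
    """Return ids of the k-1 keyframes that see the most of `points`.

    `points` is the current frame's depth map back-projected to world space,
    and `keyframes` maps a keyframe id to its (R, t) pose.
    """
    overlap = {
        kf_id: sum(in_frustum(p, pose, intrinsics, width, height) for p in points)
        for kf_id, pose in keyframes.items()
    }
    ranked = sorted(overlap, key=overlap.get, reverse=True)
    return ranked[: k - 1]
```

Together with the current frame, the returned keyframes form a window of size `k`; masks would then be optimized only on Gaussians seen from these views.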

Figure 5: The KL divergence distribution of the Gaussian ellipsoids during online training of the SLAM system at different time steps (500, 1000, 1500, 2000). The geometric similarity consistently remains high throughout the operation of the GS-based SLAM system.

3.3 Geometry Codebook

In this section, we analyze and observe the geometry similarities of the Gaussian ellipsoids created by SLAM systems. Then, we propose a learnable codebook and employ a residual vector quantization method to reduce computational complexity and memory usage and further improve the training and rendering speed.

For the GS-based SLAM system, a scene is composed of a number of small Gaussian ellipsoids with 3D geometry attributes (scale and rotation matrices 𝑺, 𝑹). Considering that two 3D Gaussian ellipsoids $G_1$, $G_2$ follow zero-mean Gaussian distributions $\mathcal{N}(0, \boldsymbol{\Sigma}_1)$ and $\mathcal{N}(0, \boldsymbol{\Sigma}_2)$, we adopt the Kullback-Leibler divergence [15] to analyze the geometry similarities of 3D Gaussian ellipsoids:
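For two zero-mean Gaussians, the KL divergence has a standard closed form, written here with $k$ for the dimension of the covariance matrix:

```latex
D_{\mathrm{KL}}\big(\mathcal{N}(0,\boldsymbol{\Sigma}_1)\,\|\,\mathcal{N}(0,\boldsymbol{\Sigma}_2)\big)
= \frac{1}{2}\left[\operatorname{tr}\!\big(\boldsymbol{\Sigma}_2^{-1}\boldsymbol{\Sigma}_1\big) - k
+ \ln\frac{\det\boldsymbol{\Sigma}_2}{\det\boldsymbol{\Sigma}_1}\right]
```

The divergence is zero when the two covariances coincide, so a distribution concentrated near zero means that many ellipsoids share nearly the same shape and orientation.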


where $k$ is the dimension of the covariance matrix. We conduct extensive experiments and present our results in Tab. 1 and Fig. 5. The percentage of 3D Gaussian ellipsoids is significantly elevated in a small range of KL divergence, which demonstrates the similarities of the 3D Gaussian ellipsoids shared across the scene. Our experiments also show that the similarities among the 3D Gaussians of the GS-based SLAM system are greater than those of the original 3DGS. This is probably due to the online optimization strategy: the SLAM system only uses the current frame and history keyframes to optimize the 3D Gaussian attributes, which exacerbates the geometry similarity. Based on this similarity, we propose a learnable codebook to compress the geometry attributes (scale and rotation), shown in Fig. 4. Inspired by SoundStream [43] and Encodec [8], we incorporate residual vector quantization (R-VQ) to compress the scale and rotation. It cascades $L$ stages of VQ and is formulated as follows:
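In the usual residual VQ form, each stage quantizes what the previous stages left over. Written with the symbols of this section, and with the stage index $j$, the selected index $i_n^{l}$, and the initialization $\hat{s}_n^{0} = 0$ introduced here as assumptions:

```latex
\hat{s}_n^{\,l} = \sum_{j=1}^{l} \mathcal{C}^{j}\big[i_n^{j}\big],
\qquad
i_n^{l} = \arg\min_{i}\,\Big\|\mathcal{C}^{l}[i] - \big(s_n - \hat{s}_n^{\,l-1}\big)\Big\|_2^2,
\qquad
\hat{s}_n^{\,0} = 0
```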


where $s \in \mathbb{R}^{N\times 4}$ is the scale vector, $\hat{s}^{l} \in \mathbb{R}^{N\times 4}$ is the output scale vector after $l$ stages of quantization, and $n$ denotes the index of the Gaussian ellipsoids. $\mathcal{C}^{l}$ denotes the codebook at stage $l$, and $\mathcal{C}[i]$ represents the vector at index $i$ of the codebook $\mathcal{C}$. The formulation of the rotation vector is the same. Then, the loss function is defined as:


where $C$ is the size of the codebook and $\mathrm{sg}[\cdot]$ is the stop gradient operator. After this, we only need to store the codebook-compressed scale and rotation vectors, which significantly reduces storage and memory usage.
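The cascade of VQ stages can be illustrated with a minimal Python sketch of residual vector quantization. It covers only encoding and decoding with fixed codebooks; learning the codebooks and the stop-gradient loss are omitted, and all function names are hypothetical.

```python
def nearest_code(codebook, vec):
    """Index of the codebook entry closest to `vec` in squared L2 distance."""
    def dist(code):
        return sum((a - b) ** 2 for a, b in zip(code, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))

def rvq_encode(vec, codebooks):
    """Quantize `vec` with one codebook per stage; return one index per stage."""
    residual = list(vec)
    indices = []
    for codebook in codebooks:
        i = nearest_code(codebook, residual)
        indices.append(i)
        # The next stage only sees what this stage failed to represent.
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices

def rvq_decode(indices, codebooks):
    """Sum the selected code of every stage to approximate the original vector."""
    out = [0.0] * len(codebooks[0][0])
    for i, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[i])]
    return out
```

Each Gaussian then stores only one small integer per stage for its scale and one per stage for its rotation, while the codebooks are shared across the whole scene.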

Table 1: The KL divergence analysis of GS-based SLAM and the original 3DGS.

SplaTAM [12] 3DGS [13]

3.4 Tracking and Global Bundle Adjustment

Our tracking and bundle adjustment are performed via minimizing our objective functions. The camera pose is initialized for a new time step by a constant velocity forward projection of the pose parameters. The color and depth loss is defined as:


where $R_d$ is the set of rays that have a valid depth observation. The reprojection error is common in traditional SLAM methods based on sparse point clouds [25]. Since the 3D Gaussian representation is also based on a point cloud, we introduce this loss for the first time to further improve the scene's geometric representation and consistency. We formulate reprojection errors with SIFT features:
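A reprojection error of this kind compares each matched feature pixel with the projection of its 3D point into the other image, for example $\sum_i \|(u_i', v_i') - \Pi(R_{i\to i'} P_i + t_{i\to i'})\|$. The Python sketch below assumes a pinhole camera and an unsquared Euclidean pixel distance; both the norm and the function names are assumptions for illustration, and feature matching (for instance with SIFT) is taken as given.

```python
import math

def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-space 3D point to pixel coordinates."""
    x, y, z = point
    return fx * x / z + cx, fy * y / z + cy

def reprojection_error(matches, R, t, fx, fy, cx, cy):
    """Sum of pixel distances between matched keypoints and reprojected points.

    `matches` holds (P_i, (u', v')) pairs: a 3D point in frame i and the pixel
    of its matched feature in frame i'. (R, t) maps frame i into frame i'.
    """
    total = 0.0
    for P, (u_obs, v_obs) in matches:
        # Transform the point into the coordinate frame of image i'.
        Pc = [sum(R[r][c] * P[c] for c in range(3)) + t[r] for r in range(3)]
        u, v = project(Pc, fx, fy, cx, cy)
        total += math.hypot(u - u_obs, v - v_obs)
    return total
```

Because the error depends on the relative pose, minimizing it with respect to the rotation and translation pulls the estimated camera motion toward agreement with the feature matches.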


where $\Pi(R_{i\to i'} P_i + t_{i\to i'})$ represents the reprojection of the 3D point $P_i$ to the corresponding pixel $(u_i', v_i')$ in image $i'$. The tracking loss is formulated as follows:


We use the rendered visibility silhouette to select the well-optimized pixels for camera tracking, which can improve the tracking accuracy for the new frames.

Global Bundle Adjustment. For global consistency and accuracy, our system maintains a significantly larger global keyframe database than other GS-based SLAM systems. We randomly sample a total number of N rays from our global keyframe database to optimize our scene representation as well as camera poses. This phase optimizes a loss similar to tracking loss, and we also add an SSIM loss to RGB rendering. The global bundle adjustment is performed to optimize the scene representation with the camera pose. Our global BA method can effectively reduce cumulative errors and enhance the robustness of pose estimation, especially for long sequences and large scenes.

Figure 6: The rendering visualization results on the Replica dataset [30] of the proposed GS-based SLAM system compared with other SOTA methods. We present the rendering PSNR and FPS on the image. Our method can achieve faster rendering speed and high-quality image reconstruction performance compared with other methods.


