当前位置：首页 > news >正文

单目3D目标检测——MonoDLE 模型训练 | 模型推理

news 2025/10/28 21:57:21

本文分享 MonoDLE 的模型训练、模型推理、可视化3D检测结果。

模型原理，参考我这篇博客：【论文解读】单目3D目标检测 MonoDLE（CVPR2021）_一颗小树x的博客-CSDN博客

源码地址：https://github.com/xinzhuma/monodle

一、环境搭建

二、准备数据集

三、训练模型

四、模型推理

4.1 使用刚才训练的权重推理

4.2 使用预训练权重推理

五、可视化3D检测结果

一、环境搭建

1.1 需要用到Conda来搭建环境，首先创建一个MonoCon环境；

conda create --name MonoDLE python=3.8
conda activate MonoDLE

1.2 下载代码到本地；

git clone https://github.com/xinzhuma/monodle
cd monocon-pytorch-main

1.3 安装pytorch和对应CUDA，这里以为示例；

conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.3 -c pytorch

其他版本安装，或使用pip安装的，参考pytorch官网：Previous PyTorch Versions | PyTorch

1.4 安装MonoCon的依赖库；

cd monodle-main

pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

在 pip 命令中使用 -i 参数来指定清华镜像地址，加速安装。

二、准备数据集

官网链接：The KITTI Vision Benchmark Suite

需要下载的文件：

Download left color images of object data set (12 GB) 这是图片，包括训练集和测试集
Download camera calibration matrices of object data set (16 MB) 这是相机的标定相关的文件
Download training labels of object data set (5 MB) 这是图片训练集对应的标签

下载后的文件放在dataset目录中，存放的目录结构：

data/KITTI/object
│
├── training
│   ├── calib
│   │   ├── 000000.txt
│   │   ├── 000001.txt
│   │   └── ...
│   ├── image_2
│   │   ├── 000000.png
│   │   ├── 000001.png
│   │   └── ...
│   └── label_2
│       ├── 000000.txt
│       ├── 000001.txt
│       └── ...
│
└── testing├── calib└── image_2

存放好数据集后，目录结构如下所示：

三、训练模型

训练模型的配置在experiments/example/kitti_example.yaml：

batch_size: 16 ，可以根据显存大小调整，默认是16
writelist: ['Car'] ，这里是训练那些类别；默认只有Car一种；如果是3种类别：writelist: ['Car', 'Pedestrian', 'Cyclist']
数据增强，random_flip、random_crop、scale、shif
max_epoch: 140，最大训练轮数
gpu_ids: 0,1，使用那些GPU训练；如果只有一张显卡：gpu_ids: 0,0
save_frequency: 5=10，间隔多少轮，保存模型权重，默认是10轮保存一次

示例代码如下

random_seed: 444dataset:type: &dataset_type 'KITTI'batch_size: 8 # 16use_3d_center: Trueclass_merging: Falseuse_dontcare: Falsebbox2d_type: 'anno'   # 'proj' or 'anno'meanshape: False      # use predefined anchor or notwritelist: ['Car', 'Pedestrian', 'Cyclist']random_flip: 0.5random_crop: 0.5scale: 0.4shift: 0.1model:type: 'centernet3d'backbone: 'dla34'neck: 'DLAUp'num_class: 3optimizer:type: 'adam'lr: 0.00125weight_decay: 0.00001lr_scheduler:warmup: True  # 5 epoches, cosine warmup, init_lir=0.00001 in defaultdecay_rate: 0.1decay_list: [90, 120]trainer:max_epoch: 140gpu_ids: 0,0  # 0,1save_frequency: 5 # checkpoint save interval (in epoch) 10# resume_model: 'checkpoints/checkpoint_epoch_70.pth'tester:type: *dataset_typemode: single   # 'single' or 'all'checkpoint: '../../checkpoints/checkpoint_epoch_5.pth'  # for 'single' modecheckpoints_dir: '../../checkpoints'  # for 'all' modelthreshold: 0.2  # confidence filter

然后执行命令，开始训练。

 cd experiments/examplepython ../../tools/train_val.py --config kitti_example.yaml

训练会打印一些信息

(MonoDLE) root@8677bec7ab74:/guopu/monodle-main/experiments/example# python ../../tools/train_val.py --config kitti_example.yaml
2023-10-15 13:14:09,144 INFO ################### Training ##################
2023-10-15 13:14:09,146 INFO Batch Size: 8
2023-10-15 13:14:09,146 INFO Learning Rate: 0.001250

epochs: 8%|█████████▌ | 11/140 [1:23:27<16:47:33, 468.63s/it]

.......

训练中会有模型的验证结果，和保存模型权重

权重：experiments/example/checkpoints/checkpoint_epoch_5.pth

experiments/example/checkpoints/checkpoint_epoch_10.pth

experiments/example/checkpoints/checkpoint_epoch_15.pth

......

experiments/example/checkpoints/checkpoint_epoch_140.pth

日志信息：experiments/example/train.log.20231015_144054

四、模型推理

4.1 使用刚才训练的权重推理

首先修改配置文件experiments/example/kitti_example.yaml

tester:

type: *dataset_type

mode: single # 'single' or 'all'

checkpoint: './checkpoints/checkpoint_epoch_50.pth' # for 'single' mode

checkpoints_dir: '../../checkpoints' # for 'all' model

threshold: 0.2 # confidence filter

然后执行命令，模型推理示例：

python ../../tools/train_val.py --config kitti_example.yaml --e

4.2 使用预训练权重推理

首先下载预训练权重：https://drive.google.com/file/d/1jaGdvu_XFn5woX0eJ5I2R6wIcBLVMJV6/view

下载好的权重名称为：checkpoint_epoch_140.pth，新建一个文件夹monocon-pytorch-main/checkpoints/，存放权重

然后修改配置文件experiments/example/kitti_example.yaml

tester:

type: *dataset_type

mode: single # 'single' or 'all'

checkpoint: '../../checkpoints/checkpoint_epoch_140.pth' # for 'single' mode

checkpoints_dir: '../../checkpoints' # for 'all' model

threshold: 0.2 # confidence filter

最后执行命令，模型推理示例：

python ../../tools/train_val.py --config kitti_example.yaml --e

会打印信息：

(MonoDLE) root@8677bec7ab74:/guopu/monodle-main/experiments/example# python ../../tools/train_val.py --config kitti_example.yaml --e
2023-10-15 14:12:24,658 INFO ################### Evaluation Only ##################
2023-10-15 14:12:24,658 INFO ==> Loading from checkpoint '../../checkpoints/checkpoint_epoch_140.pth'
2023-10-15 14:12:27,092 INFO ==> Done
Evaluation Progress: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 472/472 [03:26<00:00, 2.29it/s]
2023-10-15 14:15:54,147 INFO ==> Saving ...
2023-10-15 14:15:54,649 INFO ==> Loading detections and GTs...
2023-10-15 14:15:55,746 INFO ==> Evaluating (official) ...

.....

2023-10-15 14:16:25,506 INFO Car AP@0.70, 0.70, 0.70:
bbox AP:90.1217, 88.3670, 79.8853
bev AP:31.2712, 24.7619, 23.4836
3d AP:23.7493, 20.7087, 17.9959
aos AP:89.09, 87.18, 78.04

Car AP_R40@0.70, 0.70, 0.70:
bbox AP:95.9642, 91.8784, 84.7531
bev AP:25.8910, 20.8330, 18.1531
3d AP:18.2593, 14.5657, 12.9989
aos AP:94.80, 90.55, 82.54

Car AP@0.70, 0.50, 0.50:
bbox AP:90.1217, 88.3670, 79.8853
bev AP:61.6387, 50.2435, 44.7139
3d AP:57.7730, 44.3736, 42.4333
aos AP:89.09, 87.18, 78.04

Car AP_R40@0.70, 0.50, 0.50:
bbox AP:95.9642, 91.8784, 84.7531
bev AP:61.4324, 47.3653, 41.9808
3d AP:56.0393, 42.8401, 38.6675
aos AP:94.80, 90.55, 82.54
.....

推理完成后，结果存放在experiments/example/outputs/data

五、可视化3D检测结果

由于开源代码，没有可视化推理结果，首先观察 experiments/example/outputs/data 目录的txt文件，以为000002.txt例

Car 0.0 0 1.28 661.70 192.01 701.36 225.01 1.54 1.61 3.64 2.93 2.22 30.01 1.38 0.05

其实生成的结果，和kitii标签格式是一致的。

然后准备kitti的val集相机标定参数，和图片。这里新建一个vis目录，用于可视化3D检测结果。

在vis目录包括：

dataset 存放相机标定数据、图片、推理结果

save_3d_output 存放可视化图片

kitti_3d_vis.py 可视化运行此代码

kitti_util.py 依赖代码

主代码 kitti_3d_vis.py

# kitti_3d_vis.pyfrom __future__ import print_functionimport os
import sys
import cv2
import random
import os.path
import shutil
from PIL import Image
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
ROOT_DIR = os.path.dirname(BASE_DIR)
sys.path.append(BASE_DIR)
sys.path.append(os.path.join(ROOT_DIR, 'mayavi'))
from kitti_util import *def visualization():import mayavi.mlab as mlabdataset = kitti_object(r'./dataset/')path = r'./dataset/testing/label_2/'Save_Path = r'./save_3d_output/'files = os.listdir(path)for file in files:name = file.split('.')[0]save_path = Save_Path + name + '.png'data_idx = int(name)# Load data from datasetobjects = dataset.get_label_objects(data_idx)img = dataset.get_image(data_idx)img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)calib = dataset.get_calibration(data_idx)print(' ------------ save image with 3D bounding box ------- ')print('name:', name)show_image_with_boxes(img, objects, calib, save_path, True)if __name__=='__main__':visualization()

依赖代码 kitti_util.py

# kitti_util.pyfrom __future__ import print_functionimport os
import sys
import cv2
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
ROOT_DIR = os.path.dirname(BASE_DIR)
sys.path.append(os.path.join(ROOT_DIR, 'mayavi'))class kitti_object(object):def __init__(self, root_dir, split='testing'):self.root_dir = root_dirself.split = splitself.split_dir = os.path.join(root_dir, split)if split == 'training':self.num_samples = 7481elif split == 'testing':self.num_samples = 7518else:print('Unknown split: %s' % (split))exit(-1)self.image_dir = os.path.join(self.split_dir, 'image_2')self.calib_dir = os.path.join(self.split_dir, 'calib')self.label_dir = os.path.join(self.split_dir, 'label_2')def __len__(self):return self.num_samplesdef get_image(self, idx):assert(idx<self.num_samples) img_filename = os.path.join(self.image_dir, '%06d.png'%(idx))return load_image(img_filename)def get_calibration(self, idx):assert(idx<self.num_samples) calib_filename = os.path.join(self.calib_dir, '%06d.txt'%(idx))return Calibration(calib_filename)def get_label_objects(self, idx):# assert(idx<self.num_samples and self.split=='training') label_filename = os.path.join(self.label_dir, '%06d.txt'%(idx))return read_label(label_filename)def show_image_with_boxes(img, objects, calib, save_path, show3d=True):''' Show image with 2D bounding boxes '''img1 = np.copy(img) # for 2d bboximg2 = np.copy(img) # for 3d bboxfor obj in objects:if obj.type=='DontCare':continuecv2.rectangle(img1, (int(obj.xmin),int(obj.ymin)), (int(obj.xmax),int(obj.ymax)), (0,255,0), 2) # 画2D框box3d_pts_2d, box3d_pts_3d = compute_box_3d(obj, calib.P) # 获取3D框-图像(8*2)、3D框-相机坐标系(8*3)img2 = draw_projected_box3d(img2, box3d_pts_2d) # 在图像上画3D框if show3d:Image.fromarray(img2).save(save_path) # 保存带有3D框的图像# Image.fromarray(img2).show()else:Image.fromarray(img1).save(save_path) # 保存带有2D框的图像# Image.fromarray(img1).show()class Object3d(object):''' 3d object label '''def __init__(self, label_file_line):data = label_file_line.split(' ')data[1:] = [float(x) for x in data[1:]]# extract label, truncation, occlusionself.type = data[0] # 'Car', 'Pedestrian', ...self.truncation = data[1] # truncated pixel ratio [0..1]self.occlusion = int(data[2]) # 0=visible, 1=partly occluded, 2=fully occluded, 3=unknownself.alpha = data[3] # object observation angle [-pi..pi]# extract 2d bounding box in 0-based coordinatesself.xmin = data[4] # leftself.ymin = data[5] # topself.xmax = data[6] # rightself.ymax = data[7] # bottomself.box2d = np.array([self.xmin,self.ymin,self.xmax,self.ymax])# extract 3d bounding box informationself.h = data[8] # box heightself.w = data[9] # box widthself.l = data[10] # box length (in meters)self.t = (data[11],data[12],data[13]) # location (x,y,z) in camera coord.self.ry = data[14] # yaw angle (around Y-axis in camera coordinates) [-pi..pi]def print_object(self):print('Type, truncation, occlusion, alpha: %s, %d, %d, %f' % \(self.type, self.truncation, self.occlusion, self.alpha))print('2d bbox (x0,y0,x1,y1): %f, %f, %f, %f' % \(self.xmin, self.ymin, self.xmax, self.ymax))print('3d bbox h,w,l: %f, %f, %f' % \(self.h, self.w, self.l))print('3d bbox location, ry: (%f, %f, %f), %f' % \(self.t[0],self.t[1],self.t[2],self.ry))class Calibration(object):''' Calibration matrices and utils3d XYZ in <label>.txt are in rect camera coord.2d box xy are in image2 coordPoints in <lidar>.bin are in Velodyne coord.y_image2 = P^2_rect * x_recty_image2 = P^2_rect * R0_rect * Tr_velo_to_cam * x_velox_ref = Tr_velo_to_cam * x_velox_rect = R0_rect * x_refP^2_rect = [f^2_u,  0,      c^2_u,  -f^2_u b^2_x;0,      f^2_v,  c^2_v,  -f^2_v b^2_y;0,      0,      1,      0]= K * [1|t]image2 coord:----> x-axis (u)||v y-axis (v)velodyne coord:front x, left y, up zrect/ref camera coord:right x, down y, front zRef (KITTI paper): http://www.cvlibs.net/publications/Geiger2013IJRR.pdfTODO(rqi): do matrix multiplication only once for each projection.'''def __init__(self, calib_filepath, from_video=False):if from_video:calibs = self.read_calib_from_video(calib_filepath)else:calibs = self.read_calib_file(calib_filepath)# Projection matrix from rect camera coord to image2 coordself.P = calibs['P2'] self.P = np.reshape(self.P, [3,4])# Rigid transform from Velodyne coord to reference camera coordself.V2C = calibs['Tr_velo_to_cam']self.V2C = np.reshape(self.V2C, [3,4])self.C2V = inverse_rigid_trans(self.V2C)# Rotation from reference camera coord to rect camera coordself.R0 = calibs['R0_rect']self.R0 = np.reshape(self.R0,[3,3])# Camera intrinsics and extrinsicsself.c_u = self.P[0,2]self.c_v = self.P[1,2]self.f_u = self.P[0,0]self.f_v = self.P[1,1]self.b_x = self.P[0,3]/(-self.f_u) # relative self.b_y = self.P[1,3]/(-self.f_v)def read_calib_file(self, filepath):''' Read in a calibration file and parse into a dictionary.'''data = {}with open(filepath, 'r') as f:for line in f.readlines():line = line.rstrip()if len(line)==0: continuekey, value = line.split(':', 1)# The only non-float values in these files are dates, which# we don't care about anywaytry:data[key] = np.array([float(x) for x in value.split()])except ValueError:passreturn datadef read_calib_from_video(self, calib_root_dir):''' Read calibration for camera 2 from video calib files.there are calib_cam_to_cam and calib_velo_to_cam under the calib_root_dir'''data = {}cam2cam = self.read_calib_file(os.path.join(calib_root_dir, 'calib_cam_to_cam.txt'))velo2cam = self.read_calib_file(os.path.join(calib_root_dir, 'calib_velo_to_cam.txt'))Tr_velo_to_cam = np.zeros((3,4))Tr_velo_to_cam[0:3,0:3] = np.reshape(velo2cam['R'], [3,3])Tr_velo_to_cam[:,3] = velo2cam['T']data['Tr_velo_to_cam'] = np.reshape(Tr_velo_to_cam, [12])data['R0_rect'] = cam2cam['R_rect_00']data['P2'] = cam2cam['P_rect_02']return datadef cart2hom(self, pts_3d):''' Input: nx3 points in CartesianOupput: nx4 points in Homogeneous by pending 1'''n = pts_3d.shape[0]pts_3d_hom = np.hstack((pts_3d, np.ones((n,1))))return pts_3d_hom# =========================== # ------- 3d to 3d ---------- # =========================== def project_velo_to_ref(self, pts_3d_velo):pts_3d_velo = self.cart2hom(pts_3d_velo) # nx4return np.dot(pts_3d_velo, np.transpose(self.V2C))def project_ref_to_velo(self, pts_3d_ref):pts_3d_ref = self.cart2hom(pts_3d_ref) # nx4return np.dot(pts_3d_ref, np.transpose(self.C2V))def project_rect_to_ref(self, pts_3d_rect):''' Input and Output are nx3 points '''return np.transpose(np.dot(np.linalg.inv(self.R0), np.transpose(pts_3d_rect)))def project_ref_to_rect(self, pts_3d_ref):''' Input and Output are nx3 points '''return np.transpose(np.dot(self.R0, np.transpose(pts_3d_ref)))def project_rect_to_velo(self, pts_3d_rect):''' Input: nx3 points in rect camera coord.Output: nx3 points in velodyne coord.''' pts_3d_ref = self.project_rect_to_ref(pts_3d_rect)return self.project_ref_to_velo(pts_3d_ref)def project_velo_to_rect(self, pts_3d_velo):pts_3d_ref = self.project_velo_to_ref(pts_3d_velo)return self.project_ref_to_rect(pts_3d_ref)def corners3d_to_img_boxes(self, corners3d):""":param corners3d: (N, 8, 3) corners in rect coordinate:return: boxes: (None, 4) [x1, y1, x2, y2] in rgb coordinate:return: boxes_corner: (None, 8) [xi, yi] in rgb coordinate"""sample_num = corners3d.shape[0]corners3d_hom = np.concatenate((corners3d, np.ones((sample_num, 8, 1))), axis=2)  # (N, 8, 4)img_pts = np.matmul(corners3d_hom, self.P.T)  # (N, 8, 3)x, y = img_pts[:, :, 0] / img_pts[:, :, 2], img_pts[:, :, 1] / img_pts[:, :, 2]x1, y1 = np.min(x, axis=1), np.min(y, axis=1)x2, y2 = np.max(x, axis=1), np.max(y, axis=1)boxes = np.concatenate((x1.reshape(-1, 1), y1.reshape(-1, 1), x2.reshape(-1, 1), y2.reshape(-1, 1)), axis=1)boxes_corner = np.concatenate((x.reshape(-1, 8, 1), y.reshape(-1, 8, 1)), axis=2)return boxes, boxes_corner# =========================== # ------- 3d to 2d ---------- # =========================== def project_rect_to_image(self, pts_3d_rect):''' Input: nx3 points in rect camera coord.Output: nx2 points in image2 coord.'''pts_3d_rect = self.cart2hom(pts_3d_rect)pts_2d = np.dot(pts_3d_rect, np.transpose(self.P)) # nx3pts_2d[:,0] /= pts_2d[:,2]pts_2d[:,1] /= pts_2d[:,2]return pts_2d[:,0:2]def project_velo_to_image(self, pts_3d_velo):''' Input: nx3 points in velodyne coord.Output: nx2 points in image2 coord.'''pts_3d_rect = self.project_velo_to_rect(pts_3d_velo)return self.project_rect_to_image(pts_3d_rect)# =========================== # ------- 2d to 3d ---------- # =========================== def project_image_to_rect(self, uv_depth):''' Input: nx3 first two channels are uv, 3rd channelis depth in rect camera coord.Output: nx3 points in rect camera coord.'''n = uv_depth.shape[0]x = ((uv_depth[:,0]-self.c_u)*uv_depth[:,2])/self.f_u + self.b_xy = ((uv_depth[:,1]-self.c_v)*uv_depth[:,2])/self.f_v + self.b_ypts_3d_rect = np.zeros((n,3))pts_3d_rect[:,0] = xpts_3d_rect[:,1] = ypts_3d_rect[:,2] = uv_depth[:,2]return pts_3d_rectdef project_image_to_velo(self, uv_depth):pts_3d_rect = self.project_image_to_rect(uv_depth)return self.project_rect_to_velo(pts_3d_rect)def rotx(t):''' 3D Rotation about the x-axis. '''c = np.cos(t)s = np.sin(t)return np.array([[1,  0,  0],[0,  c, -s],[0,  s,  c]])def roty(t):''' Rotation about the y-axis. '''c = np.cos(t)s = np.sin(t)return np.array([[c,  0,  s],[0,  1,  0],[-s, 0,  c]])def rotz(t):''' Rotation about the z-axis. '''c = np.cos(t)s = np.sin(t)return np.array([[c, -s,  0],[s,  c,  0],[0,  0,  1]])def transform_from_rot_trans(R, t):''' Transforation matrix from rotation matrix and translation vector. '''R = R.reshape(3, 3)t = t.reshape(3, 1)return np.vstack((np.hstack([R, t]), [0, 0, 0, 1]))def inverse_rigid_trans(Tr):''' Inverse a rigid body transform matrix (3x4 as [R|t])[R'|-R't; 0|1]'''inv_Tr = np.zeros_like(Tr) # 3x4inv_Tr[0:3,0:3] = np.transpose(Tr[0:3,0:3])inv_Tr[0:3,3] = np.dot(-np.transpose(Tr[0:3,0:3]), Tr[0:3,3])return inv_Trdef read_label(label_filename):lines = [line.rstrip() for line in open(label_filename)]objects = [Object3d(line) for line in lines]return objectsdef load_image(img_filename):return cv2.imread(img_filename)def load_velo_scan(velo_filename):scan = np.fromfile(velo_filename, dtype=np.float32)scan = scan.reshape((-1, 4))return scandef project_to_image(pts_3d, P):'''将3D坐标点投影到图像平面上，生成2D坐pts_3d是一个nx3的矩阵，包含了待投影的3D坐标点（每行一个点），P是相机的投影矩阵，通常是一个3x4的矩阵。函数返回一个nx2的矩阵，包含了投影到图像平面上的2D坐标点。'''''' Project 3d points to image plane.Usage: pts_2d = projectToImage(pts_3d, P)input: pts_3d: nx3 matrixP:      3x4 projection matrixoutput: pts_2d: nx2 matrixP(3x4) dot pts_3d_extended(4xn) = projected_pts_2d(3xn)=> normalize projected_pts_2d(2xn)<=> pts_3d_extended(nx4) dot P'(4x3) = projected_pts_2d(nx3)=> normalize projected_pts_2d(nx2)'''n = pts_3d.shape[0] # 获取3D点的数量pts_3d_extend = np.hstack((pts_3d, np.ones((n,1)))) # 将每个3D点的坐标扩展为齐次坐标形式（4D），通过在每个点的末尾添加1，创建了一个nx4的矩阵。# print(('pts_3d_extend shape: ', pts_3d_extend.shape))pts_2d = np.dot(pts_3d_extend, np.transpose(P)) # 将扩展的3D坐标点矩阵与投影矩阵P相乘，得到一个nx3的矩阵，其中每一行包含了3D点在图像平面上的投影坐标。每个点的坐标表示为[x, y, z]。pts_2d[:,0] /= pts_2d[:,2] # 将投影坐标中的x坐标除以z坐标，从而获得2D图像上的x坐标。pts_2d[:,1] /= pts_2d[:,2] # 将投影坐标中的y坐标除以z坐标，从而获得2D图像上的y坐标。return pts_2d[:,0:2] # 返回一个nx2的矩阵,其中包含了每个3D点在2D图像上的坐标。def compute_box_3d(obj, P):'''计算对象的3D边界框在图像平面上的投影输入: obj代表一个物体标签信息,  P代表相机的投影矩阵-内参。输出: 返回两个值, corners_3d表示3D边界框在 相机坐标系 的8个角点的坐标-3D坐标。corners_2d表示3D边界框在 图像上 的8个角点的坐标-2D坐标。'''# compute rotational matrix around yaw axis# 计算一个绕Y轴旋转的旋转矩阵R，用于将3D坐标从世界坐标系转换到相机坐标系。obj.ry是对象的偏航角R = roty(obj.ry)    # 3d bounding box dimensions# 物体实际的长、宽、高l = obj.l;w = obj.w;h = obj.h;# 3d bounding box corners# 存储了3D边界框的8个角点相对于对象中心的坐标。这些坐标定义了3D边界框的形状。x_corners = [l/2,l/2,-l/2,-l/2,l/2,l/2,-l/2,-l/2];y_corners = [0,0,0,0,-h,-h,-h,-h];z_corners = [w/2,-w/2,-w/2,w/2,w/2,-w/2,-w/2,w/2];# rotate and translate 3d bounding box# 1、将3D边界框的角点坐标从对象坐标系转换到相机坐标系。它使用了旋转矩阵Rcorners_3d = np.dot(R, np.vstack([x_corners,y_corners,z_corners]))# 3D边界框的坐标进行平移corners_3d[0,:] = corners_3d[0,:] + obj.t[0];corners_3d[1,:] = corners_3d[1,:] + obj.t[1];corners_3d[2,:] = corners_3d[2,:] + obj.t[2];# 2、检查对象是否在相机前方，因为只有在相机前方的对象才会被绘制。# 如果对象的Z坐标（深度）小于0.1，就意味着对象在相机后方，那么corners_2d将被设置为None，函数将返回None。if np.any(corners_3d[2,:]<0.1):corners_2d = Nonereturn corners_2d, np.transpose(corners_3d)# project the 3d bounding box into the image plane# 3、将相机坐标系下的3D边界框的角点，投影到图像平面上，得到它们在图像上的2D坐标。corners_2d = project_to_image(np.transpose(corners_3d), P);return corners_2d, np.transpose(corners_3d)def compute_orientation_3d(obj, P):''' Takes an object and a projection matrix (P) and projects the 3dobject orientation vector into the image plane.Returns:orientation_2d: (2,2) array in left image coord.orientation_3d: (2,3) array in in rect camera coord.'''# compute rotational matrix around yaw axisR = roty(obj.ry)# orientation in object coordinate systemorientation_3d = np.array([[0.0, obj.l],[0,0],[0,0]])# rotate and translate in camera coordinate system, project in imageorientation_3d = np.dot(R, orientation_3d)orientation_3d[0,:] = orientation_3d[0,:] + obj.t[0]orientation_3d[1,:] = orientation_3d[1,:] + obj.t[1]orientation_3d[2,:] = orientation_3d[2,:] + obj.t[2]# vector behind image plane?if np.any(orientation_3d[2,:]<0.1):orientation_2d = Nonereturn orientation_2d, np.transpose(orientation_3d)# project orientation into the image planeorientation_2d = project_to_image(np.transpose(orientation_3d), P);return orientation_2d, np.transpose(orientation_3d)def draw_projected_box3d(image, qs, color=(0,60,255), thickness=2):'''qs: 包含8个3D边界框角点坐标的数组, 形状为(8, 2)。图像坐标下的3D框, 8个顶点坐标。'''''' Draw 3d bounding box in imageqs: (8,2) array of vertices for the 3d box in following order:1 -------- 0/|         /|2 -------- 3 .| |        | |. 5 -------- 4|/         |/6 -------- 7'''qs = qs.astype(np.int32) # 将输入的顶点坐标转换为整数类型，以便在图像上绘制。# 这个循环迭代4次，每次处理一个边界框的一条边。for k in range(0,4):# Ref: http://docs.enthought.com/mayavi/mayavi/auto/mlab_helper_functions.html# 定义了要绘制的边的起始点和结束点的索引。在这个循环中，它用于绘制边界框的前四条边。i,j=k,(k+1)%4cv2.line(image, (qs[i,0],qs[i,1]), (qs[j,0],qs[j,1]), color, thickness)# 定义了要绘制的边的起始点和结束点的索引。在这个循环中，它用于绘制边界框的后四条边，与前四条边平行i,j=k+4,(k+1)%4 + 4cv2.line(image, (qs[i,0],qs[i,1]), (qs[j,0],qs[j,1]), color, thickness)# 定义了要绘制的边的起始点和结束点的索引。在这个循环中，它用于绘制连接前四条边和后四条边的边界框的边。i,j=k,k+4cv2.line(image, (qs[i,0],qs[i,1]), (qs[j,0],qs[j,1]), color, thickness)return image

运行后会在save_3d_output中保存可视化的图像。

模型推理结果可视化效果：

分享完成~

【数据集】单目3D目标检测：

3D目标检测数据集 KITTI（标签格式解析、3D框可视化、点云转图像、BEV鸟瞰图）_kitti标签_一颗小树x的博客-CSDN博客

3D目标检测数据集 DAIR-V2X-V_一颗小树x的博客-CSDN博客

【论文解读】单目3D目标检测：

【论文解读】SMOKE 单目相机 3D目标检测（CVPR2020）_相机smoke-CSDN博客

【论文解读】单目3D目标检测 MonoDLE（CVPR2021）_一颗小树x的博客-CSDN博客

【论文解读】单目3D目标检测 MonoCon（AAAI2022）_一颗小树x的博客-CSDN博客

【实践应用】

单目3D目标检测——SMOKE 环境搭建|模型训练_一颗小树x的博客-CSDN博客

单目3D目标检测——SMOKE 模型推理 | 可视化结果-CSDN博客

单目3D目标检测——MonoCon 模型训练 | 模型推理-CSDN博客

后面计划分享，实时性的单目3D目标检测：MonoFlex、MonoEF、MonoDistillI、GUPNet、DEVIANT等

单目3D目标检测——MonoDLE 模型训练 | 模型推理

本文分享 MonoDLE 的模型训练、模型推理、可视化3D检测结果。模型原理，参考我这篇博客：【论文解读】单目3D目标检测 MonoDLE（CVPR2021）_一颗小树x的博客-CSDN博客源码地址：https://github.com/xinzhuma/monodle 目…...

编程日记 2023/10/16 20:50:05

CSS悬停卡片翻转明信片效果源码附注释

运行效果演示： HTML页面代码： <!DOCTYPE html> <html lang="en" > <head>...

编程日记 2023/10/16 20:49:04

使用kaliber与imu_utils进行IMU、相机+IMU联合标定

目录 1 标定工具编译 1.1 IMU标定工具 imu_utils 1.2 相机标定工具 kaliber 2 标定数据录制 3 开始标定 3.1 IMU标定 3.2 相机标定 3.3 相机IMU联合标定 4 将参数填入ORBSLAM的文件中 1 标定工具编译 1.1 IMU标定工具 imu_utils 标定IMU我们使用imu_utils软件进行标定…...

编程日记 2023/10/16 20:48:02

统一观测丨使用 Prometheus 监控 SQL Server 最佳实践

作者：啃唯 SQL Server 简介 SQL Server 是什么？ Microsoft SQL Server 是 Microsoft 推出的关系型数据库解决方案，支持企业 IT 环境中的各种事务处理、商业智能和分析应用程序。Microsoft SQL Server 是市场领先的数据库技术之一。 SQL S…...

编程日记 2023/10/16 20:47:01

最短无序连续子数组

题目链接最短无序连续子数组题目描述注意点找出符合题意的最短子数组，并输出它的长度-100000 < nums[i] < 100000 解答思路本题的数组可以分为三段，左段中段和右段，如下图所示观察规律可知，左段元素始终比中段…...

编程日记 2023/10/16 20:46:00

更新 | 持续开源迅为RK3568驱动指南第十二篇-GPIO子系统

《iTOP-RK3568开发板驱动开发指南》更新，本次更新内容对应的是驱动（第十二期_GPIO子系统-全新升级）视频，后续资料会不断更新，不断完善，帮助用户快速入门，大大提升研发速度。文档教程更新至第十…...

编程日记 2023/10/16 20:43:57

centos7安装erlang23.3.4.11及rabbitmq3.9.16版本

rpm包有系统版本要求，el是Red Hat Enterprise Linux(EL)的缩写。 EL7是Red Hat 7.x，Centos 7.x EL8是Red Hat 8.x, Centos 8.x 所以我们在安装erlang及rabbitmq时需要选择与自己的服务器相对应的rpm包 # rabbitmq的rpm安装包 https://github.com/rabbi…...

编程日记 2023/10/16 20:41:54

VMware和Debian下载

文章目录 ⭐️写在前面的话⭐️一、VMware二、Debain三、建立虚拟机🚀 先看后赞，养成习惯！🚀🚀 先看后赞，养成习惯！🚀 ⭐️写在前面的话⭐️ CSDN主页：程序员好冰目前在…...

编程日记 2023/10/16 20:40:52

mysql面试题48：MySQL中 Innodb的事务与日志的实现方式

该文章专注于面试，面试只要回答关键点即可，不需要对框架有非常深入的回答，如果你想应付面试，是足够了，抓住关键点面试官： Innodb的事务与日志的实现方式以下是InnoDB事务和日志的实现方式的详细说明：事务日志（Transaction Log）： InnoDB使用事务日志来保证事务的…...

编程日记 2023/10/16 20:39:52

数据结构优先级队列(堆)

数据结构优先级队列(堆) 文章目录数据结构优先级队列(堆)1. 优先级队列1.1 概念 2. 优先级队列的模拟实现2.1 堆的概念2.2 堆的存储方式2.3 堆的创建2.3.1 堆向下调整2.3.2 堆的创建2.3.3 建堆的时间复杂度 2.4 堆的插入与删除2.4.1 堆的插入2.4.2 堆的删除 2.5 用堆模拟实现…...

编程日记 2023/10/16 20:38:50

如何在edge浏览器中给PDF添加文字批注

我用的edge浏览器是目前最新版的（一般自动更新到最新版） 最近，我喜欢用edge浏览器查看PDF，节省电脑资源，快捷且方便。但edge对PDF的标注种类较少，主要是划线和涂色，文字批注功能尚未出现在工具…...

编程日记 2023/10/16 20:37:49

集成学习的小九九

集成学习（Ensemble Learning）是一种机器学习的方法，通过结合多个基本模型的预测结果来进行决策或预测。集成学习的目标是通过组合多个模型的优势，并弥补单个模型的不足，从而提高整体性能。集成学习的主要策略在集成…...

编程日记 2023/10/16 20:36:49

Scrapy是什么 An open source and collaborative framework for extracting the data you need from websites. In a fast, simple, yet extensible way. Scrapy是适用于Python的一个快速、简单、功能强大的web爬虫框架，通常用于抓取web站点并从页面中提取结构化的数…...

编程日记 2023/10/16 20:34:46

想做WMS仓库管理系统，找了好久才找到云表

公司内部仓库管理原方式均基于人工电子表格管理方式来实现收发存管理，没有流程化管理，无法保证数据的准确性和及时性，同时现场操作和数据核对会出现不同步的情况，无法提高仓库的运作效率，因此，我们基于云表…...

编程日记 2023/10/16 20:33:45

公司销售个人号如何管理？

微信管理系统可以帮助企业解决哪些问题呢？ 一、解决聊天记录监管问题 1.聊天记录的保存，让公司的管理者可以随时查看公司任意销售与客户的聊天记录，不用一个一个员工逐一去看，方便管理； 2.敏感词监控，管理者…...

编程日记 2023/10/16 20:30:43

COLE HERSEE 48408 工业4.0、制造业X和元宇宙

COLE HERSEE 48408 工业4.0、制造业X和元宇宙需要数据来释放工业4.0的全部潜力——价值链中的所有公司都可以访问大量数据。一个新的互联数据生态系统旨在提供解决方案:制造业x。在德国联邦经济事务和气候行动部以及BDI、VDMA和ZVEI贸易协会的密切合作下，实施制…...

编程日记 2023/10/16 20:29:42

【Vue基础-数字大屏】加载动漫效果

一、需求描述当网页正在加载而处于空白页面状态时，可以在该页面上显示加载动画提示。二、步骤代码 1、全局下载npm install -g json-server npm install -g json-server 2、在src目录下新建文件夹mock，新建文件data.json存放模拟数据 {"one&…...

编程日记 2023/10/16 20:27:40

CSS 样式简写

在CSS中有许多简写的样式，它们被广泛使用。简写最好按照如下顺序进行书写 font font: font-style font-weight font-size/line-height font-familyfont-style italic//斜体 normal//正常字体(默认)font-weight 一般填写数字 400 normal(默认值) 700 bold(默认值)f…...

编程日记 2023/10/16 20:26:39

SQL Server创建数据库

简单创建写法默认初始大小为5MB,增长速度为2MB create database DBTEST自定义用户创建的数据库都被存放在sys.database中，每个数据库在表中占一行，name字段存放的数据库的名称，具体字段可以看此博客sys.database系统表详细说明所以判断…...

编程日记 2023/10/16 20:24:35

树莓派安装.NET 6.0

首先安装.Net Core依赖（未使用） sudo apt install -y libunwind8 libunwind8-dev gettext libicu-dev liblttng-ust-dev libcurl4 libcurl4-openssl-dev libssl-dev uuid-dev unzip libgdiplus libc6-dev libkrb5-3 需要安装的依赖微软官方文档已经列出…...

编程日记 2023/10/16 20:23:34

网络六边形受到攻击

大家读完觉得有帮助记得关注和点赞！！！ 抽象现代智能交通系统 （ITS） 的一个关键要求是能够以安全、可靠和匿名的方式从互联车辆和移动设备收集地理参考数据。Nexagon 协议建立在 IETF 定位器/ID 分离协议 （…...

编程新知 2025/10/28 7:46:25

深入剖析AI大模型：大模型时代的 Prompt 工程全解析

今天聊的内容，我认为是AI开发里面非常重要的内容。它在AI开发里无处不在，当你对 AI 助手说 "用李白的风格写一首关于人工智能的诗"，或者让翻译模型 "将这段合同翻译成商务日语" 时，输入的这句话就是 Prompt。…...

编程新知 2025/10/26 18:53:34

日语学习-日语知识点小记-构建基础-JLPT-N4阶段（33）：にする

日语学习-日语知识点小记-构建基础-JLPT-N4阶段（33）：にする 1、前言（1）情况说明（2）工程师的信仰2、知识点（1）　にする1，接续：名词＋にする2，接续：疑问词＋にする3，（A）は（B）にする。（2）復習：（1）复习句子（2）ために　＆　ように（３）そう（４）にする3、…...

编程新知 2025/10/22 16:53:23

mongodb源码分析session执行handleRequest命令find过程

mongo/transport/service_state_machine.cpp已经分析startSession创建ASIOSession过程，并且验证connection是否超过限制ASIOSession和connection是循环接受客户端命令，把数据流转换成Message，状态转变流程是：State::Created 》 St…...

编程新知 2025/8/16 7:00:37

MySQL 8.0 OCP 英文题库解析（十三）

Oracle 为庆祝 MySQL 30 周年，截止到 2025.07.31 之前。所有人均可以免费考取原价245美元的MySQL OCP 认证。从今天开始，将英文题库免费公布出来，并进行解析，帮助大家在一个月之内轻松通过OCP认证。本期公布试题111~120 试题1…...

编程新知 2025/10/24 18:50:54

Go语言多线程问题

打印零与奇偶数（leetcode 1116） 方法1：使用互斥锁和条件变量 package mainimport ("fmt""sync" )type ZeroEvenOdd struct {n intzeroMutex sync.MutexevenMutex sync.MutexoddMutex sync.Mutexcurrent int…...

编程新知 2025/9/23 8:19:28

GitHub 趋势日报 (2025年06月06日)

📊 由 TrendForge 系统生成 | 🌐 https://trendforge.devlive.org/ 🌐 本日报中的项目描述已自动翻译为中文 📈 今日获星趋势图今日获星趋势图 590 cognee 551 onlook 399 project-based-learning 348 build-your-own-x 320 ne…...

编程新知 2025/9/7 2:12:04

android13 app的触摸问题定位分析流程

一、知识点一般来说，触摸问题都是app层面出问题，我们可以在ViewRootImpl.java添加log的方式定位；如果是touchableRegion的计算问题，就会相对比较麻烦了，需要通过adb shell dumpsys input > input.log指令，且通过打印堆栈的方式，逐步定位问题，并找到修改方案。问题…...

编程新知 2025/10/28 1:08:56

十九、【用户管理与权限 - 篇一】后端基础：用户列表与角色模型的初步构建

【用户管理与权限 - 篇一】后端基础：用户列表与角色模型的初步构建前言准备工作第一部分：回顾 Django 内置的 `User` 模型第二部分：设计并创建 `Role` 和 `UserProfile` 模型第三部分：创建 Serializers第四部分：创建 ViewSets第五部分：注册 API 路由第六部分：后端初步测…...

编程新知 2025/9/24 1:06:11

《Docker》架构

文章目录架构模式单机架构应用数据分离架构应用服务器集群架构读写分离/主从分离架构冷热分离架构垂直分库架构微服务架构容器编排架构什么是容器，docker，镜像，k8s 架构模式单机架构单机架构其实就是应用服务器和单机服务器都部署在同一…...

编程新知 2025/10/17 17:59:51

单目3D目标检测——MonoDLE 模型训练 | 模型推理

一、环境搭建

二、准备数据集

三、训练模型

四、模型推理

4.1 使用刚才训练的权重推理

4.2 使用预训练权重推理

五、可视化3D检测结果

相关文章：

单目3D目标检测——MonoDLE 模型训练 | 模型推理

CSS悬停卡片翻转明信片效果源码附注释

使用kaliber与imu_utils进行IMU、相机+IMU联合标定

统一观测丨使用 Prometheus 监控 SQL Server 最佳实践

最短无序连续子数组

更新 | 持续开源迅为RK3568驱动指南第十二篇-GPIO子系统

centos7安装erlang23.3.4.11及rabbitmq3.9.16版本

VMware和Debian下载

mysql面试题48：MySQL中 Innodb的事务与日志的实现方式

数据结构优先级队列(堆)

如何在edge浏览器中给PDF添加文字批注

集成学习的小九九

深入理解Scrapy

想做WMS仓库管理系统，找了好久才找到云表

公司销售个人号如何管理？

COLE HERSEE 48408 工业4.0、制造业X和元宇宙

【Vue基础-数字大屏】加载动漫效果

CSS 样式简写

SQL Server创建数据库

树莓派安装.NET 6.0

网络六边形受到攻击

深入剖析AI大模型：大模型时代的 Prompt 工程全解析

日语学习-日语知识点小记-构建基础-JLPT-N4阶段（33）：にする

mongodb源码分析session执行handleRequest命令find过程

MySQL 8.0 OCP 英文题库解析（十三）

Go语言多线程问题

GitHub 趋势日报 (2025年06月06日)

android13 app的触摸问题定位分析流程

十九、【用户管理与权限 - 篇一】后端基础：用户列表与角色模型的初步构建

《Docker》架构