Technical details of the implementation
- Model: the ResNet-based Keypoint Feature Pyramid Network (KFPN) proposed in the RTM3D paper. An unofficial PyTorch implementation of the RTM3D paper is here.
- Input: a tensor of shape (H, W, 3).
- Outputs:
  - Main center heatmap: (H/S, W/S, C), where S = 4 (the down-sample ratio) and C = 3 (the number of classes).
  - Center offset: (H/S, W/S, 2).
  - Heading angle (yaw): (H/S, W/S, 2). The model estimates the imaginary and real fractions (the sin(yaw) and cos(yaw) values).
  - Dimensions (height, width, length): (H/S, W/S, 3).
  - z coordinate: (H/S, W/S, 1).
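The head shapes above can be made concrete with a small sketch. The 608 × 608 input resolution is only an assumed example, and the head names follow the list above rather than the repo's actual layer names:

```python
# Sketch of the detection-head output shapes described above.
# The input size is a hypothetical example; head names are illustrative.
H, W = 608, 608          # input height and width (assumed example)
S, C = 4, 3              # down-sample ratio and number of classes
heads = {
    "main_center_heatmap": (H // S, W // S, C),  # one channel per class
    "center_offset":       (H // S, W // S, 2),  # sub-pixel (dx, dy)
    "heading_angle":       (H // S, W // S, 2),  # (sin(yaw), cos(yaw))
    "dimensions":          (H // S, W // S, 3),  # (h, w, l)
    "z_coordinate":        (H // S, W // S, 1),
}
for name, shape in heads.items():
    print(f"{name}: {shape}")
```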
- Targets: 7 degrees of freedom (7-DOF) of objects: (cx, cy, cz, l, w, h, θ)
  - cx, cy, cz: the center coordinates.
  - l, w, h: the length, width, and height of the bounding box.
  - θ: the heading angle of the bounding box, in radians.
- Objects: Cars, Pedestrians, Cyclists.
- Losses:
  - For the main center heatmap: focal loss.
  - For the heading angle (yaw): the im and re fractions are directly regressed using l1_loss.
  - For the z coordinate and the 3 dimensions (height, width, length), I used the balanced L1 loss proposed in the paper Libra R-CNN: Towards Balanced Learning for Object Detection.
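Assuming "focal loss" here means the penalty-reduced, pixel-wise variant popularized by CenterNet-style keypoint heads, the two loss functions can be sketched in NumPy. The balanced L1 defaults (α = 0.5, γ = 1.5) come from the Libra R-CNN paper; the repo's actual code may differ in masking and reduction:

```python
import numpy as np

def focal_loss(pred, gt, alpha=2.0, beta=4.0, eps=1e-6):
    """Penalty-reduced pixel-wise focal loss for a center heatmap.

    This is the CenterNet-style variant (an assumption about the repo):
    positives (gt == 1) are down-weighted by (1 - p)^alpha, negatives by
    both p^alpha and a distance term (1 - gt)^beta from the Gaussian gt.
    """
    pred = np.clip(pred, eps, 1.0 - eps)
    pos = (gt == 1.0)
    pos_loss = ((1 - pred) ** alpha * np.log(pred))[pos].sum()
    neg_loss = ((1 - gt) ** beta * pred ** alpha * np.log(1 - pred))[~pos].sum()
    num_pos = pos.sum()
    return -(pos_loss + neg_loss) / max(num_pos, 1)

def balanced_l1_loss(diff, alpha=0.5, gamma=1.5):
    """Balanced L1 loss from Libra R-CNN (paper-default alpha and gamma).

    b is chosen so that alpha * ln(b + 1) = gamma, which makes the
    gradient continuous at |x| = 1; the constant in the linear branch
    makes the loss itself continuous there as well.
    """
    b = np.exp(gamma / alpha) - 1.0
    x = np.abs(diff)
    small = alpha / b * (b * x + 1) * np.log(b * x + 1) - alpha * x
    big = gamma * x + (alpha / b * (b + 1) * np.log(b + 1) - alpha - gamma)
    return np.where(x < 1.0, small, big).mean()
```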
- Loss weights: uniform (= 1.0 for all components).
- Learning-rate scheduler: cosine; initial learning rate: 0.001.
- Batch size: 16 (on a single GTX 1080Ti).

At inference time, a 3 × 3
max-pooling operation was applied on the center heatmap, and only the 50 predictions whose center confidences passed the confidence threshold were kept. The yaw angle was decoded as arctan(imaginary fraction / real fraction).
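The decoding step above can be sketched as follows, using NumPy instead of the repo's PyTorch ops (the function and variable names are illustrative). A 3 × 3 max-pool keeps only local peaks of the center heatmap, the top-k highest-scoring cells are selected, and yaw is recovered as arctan(im / re) via arctan2:

```python
import numpy as np

def simple_decode(heatmap, im, re, top_k=50):
    """Illustrative decoder: 3x3 max-pool peak extraction + top-k + yaw.

    heatmap: (H, W) center confidences; im, re: (H, W) sin/cos heads.
    Returns the row/col indices, scores, and decoded yaw of the top_k peaks.
    """
    H, W = heatmap.shape
    padded = np.pad(heatmap, 1, constant_values=-np.inf)
    # 3x3 max-pooling with stride 1: a cell survives only if it is the
    # maximum of its 3x3 neighborhood (a cheap substitute for NMS).
    pooled = np.max(
        [padded[dy:dy + H, dx:dx + W] for dy in range(3) for dx in range(3)],
        axis=0,
    )
    peaks = np.where(pooled == heatmap, heatmap, 0.0)
    order = np.argsort(peaks.ravel())[::-1][:top_k]
    ys, xs = np.unravel_index(order, (H, W))
    yaw = np.arctan2(im[ys, xs], re[ys, xs])  # yaw = arctan(im / re)
    return ys, xs, peaks.ravel()[order], yaw
```

In the real pipeline the surviving predictions would additionally be filtered by the confidence threshold mentioned above before being converted back to 7-DOF boxes.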