Vision-Only Robot Navigation in a Neural Radiance World

Abstract

Neural Radiance Fields (NeRFs) have recently emerged as a powerful paradigm for the representation of natural, complex 3D scenes. NeRFs represent continuous volumetric density and RGB values in a neural network, and generate photo-realistic images from unseen camera viewpoints through ray tracing. We propose an algorithm for navigating a robot through a 3D environment represented as a NeRF using only an on-board RGB camera for localization. We assume the NeRF for the scene has been pre-trained offline, and the robot's objective is to navigate through unoccupied space in the NeRF to reach a goal pose. We introduce a trajectory optimization algorithm that avoids collisions with high-density regions in the NeRF based on a discrete time version of differential flatness that is amenable to constraining the robot's full pose and control inputs. We also introduce an optimization based filtering method to estimate 6DoF pose and velocities for the robot in the NeRF given only an onboard RGB camera. We combine the trajectory planner with the pose filter in an online replanning loop to give a vision-based robot navigation pipeline. We present simulation results with a quadrotor robot navigating through a jungle gym environment, the inside of a church, and Stonehenge using only an RGB camera. We also demonstrate an omnidirectional ground robot navigating through the church, requiring it to reorient to fit through the narrow gap.

[abstract]

Trajectory Optimizer

We propose a trajectory optimizer which uses a NeRF representation of the environment to encode obstacle geometry. As shown above, we initialize our planner with an A* search, and then use gradient descent to refine the trajectory to one which is dynamically feasible and minimizes both collision and control effort.

State Estimator

We also propose a NeRF-based recursive state estimator, which uses the neural rendering process as a differentiable measurement model. Our method shows fast convergence to the true state (shown above, as the transparent “estimated” image, generated via neural rendering, converges to the opaque ground truth image), and is able to estimate the full robot state, including both 6D pose as well as linear and angular rates.

MPC Controller

A drone flies through Stonehenge. Using our proposed method, it can replan online to account for unexpected disturbances and avoid collision.

We can combine the trajectory optimiser and state estimator into a controller capable of localising and dynamically re-optimising the trajectory to reject noise and disturbances.