Conformal Decision Theory:
Safe Autonomous Decisions Without Distributions

University of California, Berkeley

Robot planner using a conformal controller on the Stanford Drone Dataset. Future trajectories of humans are predicted online by a machine learning algorithm (not visualized). The robot planner finds an optimal spline through the scene and is penalized for being close to humans. This penalty is proportional to a conformal control variable, \(\lambda_t\), which is adjusted online by the conformal controller so that the average distance from a human is no more than 2 meters. The orange, pink, and blue curves show the robot's trajectory under three planners: the conformal controller, an aggressive planner (i.e., no reward for avoiding humans), and a conservative planner (i.e., a large reward for avoiding humans). The darkness of a line indicates the passage of time. Illustrative pedestrian trajectories are plotted as arrows; only the yellow pedestrians affect the spline planner.

Abstract

We introduce Conformal Decision Theory, a framework for producing safe autonomous decisions despite imperfect machine learning predictions. Examples of such decisions are ubiquitous, from robot planning algorithms that rely on pedestrian predictions, to calibrating autonomous manufacturing to be high throughput but low error, to the choice of trusting a nominal policy versus switching to a safe backup policy at run-time. The decisions produced by our algorithms are safe in the sense that they come with provable statistical guarantees of having low risk without any assumptions on the world model whatsoever; the observations need not be I.I.D. and can even be adversarial. The theory extends results from conformal prediction to calibrate decisions directly, without requiring the construction of prediction sets. Experiments demonstrate the utility of our approach in robot motion planning around humans, automated stock trading, and robot manufacturing.

Stanford Drone Dataset Experiments

Here we study the problem of robot navigation around people, which must balance safety (i.e., not colliding with humans) and efficiency (i.e., the robot making progress toward its goal). To keep the risk of collision low while still making progress toward the goal, the robot continually calibrates its reward function at run-time using a conformal controller.

The robot plans via model predictive control: at each timestep it fits a maximum-reward spline subject to the robot's dynamics constraints, modeled as a nonlinear Dubins car.
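To make the role of the conformal control variable concrete, here is a minimal sketch of how a \(\lambda_t\)-weighted proximity penalty could enter a candidate spline's reward. The function name, the 2-meter penalty shape, and the progress term are our own illustration, not the paper's exact objective:

```python
import math

def spline_reward(waypoints, goal, humans, lam):
    """Hypothetical reward for a candidate spline: progress toward the
    goal minus a proximity penalty scaled by the conformal control
    variable lam (large lam -> strong incentive to keep clear of humans)."""
    # Progress term: negative distance from the spline's endpoint to the goal.
    progress = -math.dist(waypoints[-1], goal)
    # Penalty term: accumulates whenever a waypoint is within 2 m of a human.
    penalty = 0.0
    for wp in waypoints:
        for h in humans:
            penalty += max(0.0, 2.0 - math.dist(wp, h))
    return progress - lam * penalty
```

With \(\lambda = 0\) the planner behaves like the aggressive baseline (humans are ignored); a large fixed \(\lambda\) recovers the conservative baseline.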

Conformal Controller

Calibrates the decision risk by adapting the conformal control variable: \(\lambda_{t+1} = \lambda_{t} + \eta \left( \epsilon - \ell_t \right) \), where \(\ell_t\) is the loss incurred at time \(t\) and \(\epsilon\) is the target risk level.
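The update above is a one-line online controller. The toy simulation below is our own construction (not from the paper): the loss is 1 when a random score falls at or below \(\lambda_t\), so the loss is nondecreasing in \(\lambda\) and the update drives the long-run average loss toward \(\epsilon\):

```python
import random

def conformal_controller_update(lam, loss, epsilon, eta):
    """One step of the conformal control update:
    lambda_{t+1} = lambda_t + eta * (epsilon - loss_t)."""
    return lam + eta * (epsilon - loss)

# Toy loss model (illustrative only): loss_t = 1{score_t <= lambda_t}.
random.seed(0)
epsilon, eta, lam = 0.1, 0.05, 0.5
losses = []
for _ in range(20000):
    loss = 1.0 if random.random() <= lam else 0.0
    losses.append(loss)
    lam = conformal_controller_update(lam, loss, epsilon, eta)
avg = sum(losses) / len(losses)  # approximately epsilon
```

By telescoping the update, \(\frac{1}{T}\sum_t \ell_t = \epsilon - (\lambda_T - \lambda_0)/(\eta T)\), so the average loss matches \(\epsilon\) up to a vanishing boundary term, with no distributional assumptions on the scores.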

ACI [1]

First calibrates the prediction sets by adapting \(\alpha_t\) with ACI, then plans with respect to these sets.
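For reference, the ACI update [1] has the same online form but acts on the miscoverage level \(\alpha_t\) rather than on the decision itself. A minimal sketch, where `err_t` is the indicator that the prediction set missed the true outcome:

```python
def aci_update(alpha_t, err_t, target_alpha, eta):
    """One ACI step: alpha_{t+1} = alpha_t + eta * (target_alpha - err_t).
    A miss (err_t = 1) lowers alpha, widening subsequent prediction sets;
    when alpha drops to 0 or below, the set becomes the whole space
    (the "infinite set" that can freeze the planner)."""
    return alpha_t + eta * (target_alpha - err_t)
```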

Conservative

Keeps a constant maximum value of the conformal control variable: \( \lambda_t = 1 \quad \forall t \in [T] \)

Quantitative Results


Results on the nexus_4 scenario from the Stanford Drone Dataset. The robot's goal is to cross the nexus while avoiding pedestrians; safety is violated if the robot collides with a human. At all learning rates \(\eta\), the conformal controller navigates more time-efficiently than ACI. It remains safe so long as the learning rate is set high enough that the robot planner can quickly adapt to nearby humans; when the learning rate is too low (near zero), proximity to humans is effectively unpenalized, leading to collisions.

Qualitative Analysis


Visualization of an interaction over time (left to right). (Top) With our conformal controller (CC), the robot always makes progress toward its goal while remaining safe, even when blocked by crowds of people. (Bottom) The ACI baseline calibrates the prediction sets. As soon as a mis-prediction happens, ACI expands the prediction sets to regain coverage, but this frequently blocks the robot from moving at all (see \(t = 10\,s\)), even though the mis-prediction occurred for a pedestrian far away who was not interfering with the robot's plan.



Parameter plot for the visualization above. (Top) \(\lambda_t\) (calibrated by CC) and \(\alpha_t\) (used by ACI to calibrate prediction sets) over time. When \(\alpha_t \leq 0\), ACI returns the infinite set and the robot stops. (Bottom) Distance to the nearest human over time. \(\lambda_t\) is large when the robot is close to a human, whereas \(\alpha_t\) shows no such relationship. The \(\lambda_t\) trajectory is shorter because the CC robot reaches the goal faster.

BibTeX

@article{lekeufack2024decision,
  author    = {Lekeufack, Jordan and Angelopoulos, Anastasios N. and Bajcsy, Andrea and Jordan, Michael I. and Malik, Jitendra},
  title     = {Conformal Decision Theory: Safe Autonomous Decisions Without Distributions},
  journal   = {arXiv},
  year      = {2024},
}