oai:arXiv.org:2408.03093
Computer Science
2024
06-11-2024
We present a data-driven approach for producing policies that are provably robust across unknown stochastic environments.
Existing approaches can learn a model of a single environment as an interval Markov decision process (IMDP) and produce a robust policy with a probably approximately correct (PAC) guarantee on its performance.
However, these approaches cannot reason about the impact of the environmental parameters underlying the uncertainty.
We propose a framework based on parametric Markov decision processes (MDPs) with unknown distributions over parameters.
We learn and analyse IMDPs for a set of unknown sample environments induced by parameters.
The key challenge is then to produce meaningful performance guarantees that combine the two layers of uncertainty: (1) multiple environments induced by parameters with an unknown distribution; (2) unknown induced environments which are approximated by IMDPs.
We present a novel approach based on scenario optimisation that yields a single PAC guarantee quantifying the risk level for which a specified performance level can be assured in unseen environments, together with a means to trade off risk and performance.
We implement and evaluate our framework using multiple robust policy generation methods on a range of benchmarks.
We show that our approach produces tight bounds on a policy's performance with high confidence.
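As a rough illustration of how a scenario-style PAC guarantee relates the number of sampled environments to a risk level, the following sketch covers only a simplified special case and is not the paper's method. It assumes that the policy's performance in each sampled environment is known exactly (so the IMDP learning layer is ignored), that higher performance is better, and that the certified performance level is simply the worst value observed over N independent samples. Under those assumptions, the standard one-dimensional scenario bound states that, with confidence at least 1 - beta, the probability that an unseen environment performs worse than this level is at most eps, where (1 - eps)^N = beta. The function names are hypothetical.

```python
def scenario_risk_bound(num_samples: int, confidence: float) -> float:
    """Return the risk level eps solving (1 - eps)^N = 1 - confidence.

    Assumption: one-dimensional scenario bound for the worst case over
    N i.i.d. samples; this is a simplification, not the paper's bound.
    """
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    beta = 1.0 - confidence
    return 1.0 - beta ** (1.0 / num_samples)


def certify_worst_case(performances, confidence):
    """Pair the worst observed performance with its scenario risk level.

    `performances` holds one (assumed exact) performance value per
    sampled environment; higher values are assumed to be better.
    """
    values = list(performances)
    level = min(values)  # performance level certified for unseen environments
    risk = scenario_risk_bound(len(values), confidence)
    return level, risk
```

For example, with 100 sampled environments and confidence 0.99, `scenario_risk_bound(100, 0.99)` gives a risk level of about 0.045. Collecting more samples lowers the risk for the same confidence, which is the basic mechanism behind trading off risk against performance.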
Schnitzer, Yannik; Abate, Alessandro; Parker, David, 2024, Certifiably Robust Policies for Uncertain Parametric Environments