Document detail
ID

oai:arXiv.org:2404.05055

Topic
Computer Science - Machine Learning; Computer Science - Artificial Intelligence
Author
Lobo, Elita A.; Cousins, Cyrus; Zick, Yair; Petrik, Marek
Category

Computer Science

Year

2024

Listing date

4/10/2024

Keywords
constructing policies; ambiguity sets; percentile criterion
Abstract

In reinforcement learning, robust policies for high-stakes decision-making problems with limited data are usually computed by optimizing the \emph{percentile criterion}.

The percentile criterion is approximately solved by constructing an \emph{ambiguity set} that contains the true model with high probability and optimizing the policy for the worst model in the set.

Since the percentile criterion is non-convex, constructing ambiguity sets is often challenging.

Existing work uses \emph{Bayesian credible regions} as ambiguity sets, but they are often unnecessarily large and result in learning overly conservative policies.

To overcome these shortcomings, we propose a novel Value-at-Risk based dynamic programming algorithm to optimize the percentile criterion without explicitly constructing any ambiguity sets.

Our theoretical and empirical results show that our algorithm implicitly constructs much smaller ambiguity sets and learns less conservative robust policies.
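The idea of replacing an explicit ambiguity set with a quantile over posterior models can be illustrated with a small sketch. Assuming a tabular MDP with `K` transition models sampled from a Bayesian posterior, the backup below takes the Value-at-Risk (the alpha-quantile) of the Bellman values across sampled models instead of minimizing over a credible region. This is a hypothetical illustration of the general VaR-based dynamic-programming idea, not the authors' exact algorithm; all names and shapes here are assumptions.

```python
import numpy as np

def var_bellman_update(v, p_samples, r, gamma=0.95, alpha=0.1):
    """One Value-at-Risk Bellman backup over posterior model samples.

    v:         (S,) current value estimate.
    p_samples: (K, S, A, S) array of K sampled transition models.
    r:         (S, A) reward matrix.
    Returns the updated (S,) value vector.

    Hypothetical sketch of a percentile-criterion backup; the
    alpha-quantile replaces a worst-case min over an ambiguity set.
    """
    # Q-values under each sampled model: shape (K, S, A)
    q = r[None, :, :] + gamma * np.einsum('ksat,t->ksa', p_samples, v)
    # Value-at-Risk across sampled models per (s, a): the return
    # guaranteed with posterior probability at least 1 - alpha
    q_var = np.quantile(q, alpha, axis=0)   # (S, A)
    return q_var.max(axis=1)                # greedy over actions
```

Iterating this update to convergence would yield a robust value function without ever materializing an ambiguity set; the conservatism is controlled directly by `alpha`.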

Comment: Accepted at NeurIPS 2023

Lobo, Elita A.; Cousins, Cyrus; Zick, Yair; Petrik, Marek (2024). Percentile Criterion Optimization in Offline Reinforcement Learning.
