14 de May de 2018

Markov Decision Processes with long duration

Abstract: In a Markov Decision Process (MDP), at each stage, knowing the current state, the decision-maker chooses an action, and receives a reward depending on the current state of the world. Then a new state is randomly drawn from a distribution depending on the action and on the past state. Many optimal payoffs concepts have been introduced to analyze the strategic aspects of MDPs with long duration: asymptotic value, uniform value, liminf average payoff criterion… We provide sufficient conditions under which these concepts coincide, and discuss some open problems. (Joint work with Xavier Venel, Paris 1 University).

Comparte en:

Otras noticias