Markov Decision Processes with long duration

Abstract: In a Markov Decision Process (MDP), at each stage the decision-maker, knowing the current state, chooses an action and receives a reward depending on the current state of the world. A new state is then drawn at random from a distribution depending on the action and on the previous state. Many optimal payoff concepts have been introduced to analyze the strategic aspects of MDPs with long duration: asymptotic value, uniform value, liminf average payoff criterion… We provide sufficient conditions under which these concepts coincide, and discuss some open problems. (Joint work with Xavier Venel, Paris 1 University.)
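
For readers new to the setting, here is a minimal sketch of the model described in the abstract. It computes the optimal expected average reward over n stages by backward induction. The two-state, two-action MDP, its numbers, and the names REWARD, TRANSITION and n_stage_value are all made up for illustration and are not taken from the talk. The asymptotic value mentioned in the abstract is the limit of this quantity as n grows, when that limit exists.

```python
# Hypothetical 2-state, 2-action MDP; every number here is an illustrative assumption.
REWARD = [0.0, 1.0]  # reward depends only on the current state, as in the abstract

# TRANSITION[state][action] is the distribution of the next state.
TRANSITION = [
    [[0.9, 0.1], [0.5, 0.5]],  # from state 0: action 0, action 1
    [[0.2, 0.8], [0.6, 0.4]],  # from state 1: action 0, action 1
]


def n_stage_value(n):
    """Return the optimal expected average reward over n stages, per initial state."""
    # total[s] = optimal expected total reward from state s with k stages left (k = 0 here).
    total = [0.0] * len(REWARD)
    for _ in range(n):
        # One step of backward induction: collect the current reward, then
        # pick the action whose next-state distribution maximizes the continuation.
        total = [
            REWARD[s]
            + max(
                sum(p * total[t] for t, p in enumerate(dist))
                for dist in TRANSITION[s]
            )
            for s in range(len(REWARD))
        ]
    return [v / n for v in total]


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, n_stage_value(n))
```

In this toy example the n-stage values from both initial states get closer to each other as n grows; the talk concerns general conditions under which such long-run payoff notions agree.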

Date: Mar 14, 2018 at 14:30 h
Venue: Republica 701, Sala 33.
Speaker: Bruno Ziliotto
Affiliation: Université de Toulouse
Coordinator: Prof. José Verschae

Posted on May 14, 2018 in ACGO, Seminars