Abstract: In a Markov Decision Process (MDP), at each stage, knowing the current state, the decision-maker chooses an action and receives a reward depending on the current state of the world. A new state is then randomly drawn from a distribution depending on the action and on the past state. Many optimal payoff concepts have been introduced to analyze the strategic aspects of MDPs with long duration: asymptotic value, uniform value, liminf average payoff criterion… We provide sufficient conditions under which these concepts coincide, and discuss some open problems. (Joint work with Xavier Venel, Paris 1 University.)
Venue: Republica 701, Sala 33.
Speaker: Bruno Ziliotto
Affiliation: Université de Toulouse
Coordinator: Prof. José Verschae



