A Policy Gradient Algorithm for the Risk-Sensitive Exponential Cost MDP

Moharrami, Mehrdad; Murthy, Yashaswini; Roy, Arghyadip; Srikant, R.

arXiv:2202.04157 (eess)

[Submitted on 8 Feb 2022 (v1), last revised 29 Aug 2022 (this version, v2)]

Abstract: We study the risk-sensitive exponential cost MDP formulation and develop a trajectory-based gradient algorithm to find the stationary point of the cost associated with a set of parameterized policies. We derive a formula that can be used to compute the policy gradient from (state, action, cost) information collected from sample paths of the MDP for each fixed parameterized policy. Unlike the traditional average-cost problem, standard stochastic approximation theory cannot be used to exploit this formula. To address the issue, we introduce a truncated and smooth version of the risk-sensitive cost and show that this new cost criterion can be used to approximate the risk-sensitive cost and its gradient uniformly under some mild assumptions. We then develop a trajectory-based gradient algorithm to minimize the smooth truncated estimation of the risk-sensitive cost and derive conditions under which a sequence of truncations can be used to solve the original, untruncated cost problem.
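
For readers unfamiliar with the criterion named in the abstract, the following is a minimal sketch of how a risk-sensitive exponential cost can be evaluated for one fixed policy on a small finite chain. It is not the authors' trajectory-based gradient algorithm, and it does not implement their truncated, smoothed cost. It assumes the standard textbook form of the criterion, (1/T) log E[exp(alpha * sum of costs over T steps)] as T grows, which for a finite irreducible chain equals the log of the spectral radius of the "twisted" matrix diag(exp(alpha * c)) P. The names `risk_sensitive_cost`, `finite_horizon_cost`, `P`, `c`, and `alpha` are hypothetical and chosen only for illustration; the listing itself does not fix any notation.

```python
import numpy as np


def risk_sensitive_cost(P, c, alpha):
    """Exponential-cost growth rate for a fixed policy on a finite chain.

    Assumed (illustrative) inputs, not taken from the paper:
      P     : (n, n) transition matrix induced by the fixed policy
      c     : (n,) per-state cost under that policy
      alpha : risk-sensitivity parameter, alpha > 0
    Returns log(spectral radius of diag(exp(alpha * c)) @ P).
    """
    P = np.asarray(P, dtype=float)
    c = np.asarray(c, dtype=float)
    twisted = np.exp(alpha * c)[:, None] * P  # row i scaled by exp(alpha * c_i)
    rho = np.max(np.abs(np.linalg.eigvals(twisted)))
    return float(np.log(rho))


def finite_horizon_cost(P, c, alpha, T, start=0):
    """Exact (1/T) * log E[exp(alpha * sum_{t<T} c(X_t)) | X_0 = start].

    Uses repeated multiplication by the twisted matrix, rescaling at
    each step so the vector does not overflow for large T.
    """
    P = np.asarray(P, dtype=float)
    c = np.asarray(c, dtype=float)
    twisted = np.exp(alpha * c)[:, None] * P
    v = np.ones(len(c))
    log_scale = 0.0
    for _ in range(T):
        v = twisted @ v
        s = v.max()
        v = v / s
        log_scale += np.log(s)
    return float((log_scale + np.log(v[start])) / T)


if __name__ == "__main__":
    P = [[0.9, 0.1], [0.5, 0.5]]
    c = [0.0, 1.0]
    print(risk_sensitive_cost(P, c, alpha=1.0))
    print(finite_horizon_cost(P, c, alpha=1.0, T=500))
```

Under these assumptions the finite-horizon value approaches the spectral-radius value as T grows, and by Jensen's inequality the result is never smaller than alpha times the ordinary average cost, which is one way to see why the criterion penalizes variability in the accumulated cost.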

Subjects:	Systems and Control (eess.SY)
Cite as:	arXiv:2202.04157 [eess.SY]
	(or arXiv:2202.04157v2 [eess.SY] for this version)
	https://doi.org/10.48550/arXiv.2202.04157

Submission history

From: Mehrdad Moharrami
[v1] Tue, 8 Feb 2022 21:35:10 UTC (93 KB)
[v2] Mon, 29 Aug 2022 22:59:35 UTC (652 KB)
