Neural networks made easy (Part 70): Closed-Form Policy Improvement Operators (CFPI)

Optimizing the Agent's policy under constraints on its behavior has proven promising for offline reinforcement learning problems. Using only historical transitions, the Agent's policy is trained to maximize a learned value function.

A behavior-constrained policy helps avoid a significant distribution shift in the Agent's actions, which gives sufficient confidence in the estimates of action values. In the previous article, we got acquainted with the SPOT method, which exploits this approach. Continuing the topic, I propose to explore the Closed-Form Policy Improvement (CFPI) algorithm, presented in the paper "Offline Reinforcement Learning with Closed-Form Policy Improvement Operators".
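
Before diving into the article, the core idea can be sketched in a few lines. The Python code below is not the article's implementation (the series itself is written in MQL5); it is a minimal sketch under the following assumptions: the behavior policy has been fitted as a Gaussian with mean mu and diagonal covariance, q_net is a learned critic, and delta is the trust-region radius (all names here are hypothetical). A first-order Taylor expansion of the value function around the behavior mean turns the constrained maximization into a closed-form action update:

    import math
    import torch

    def cfpi_single_gaussian_step(q_net, state, mu, sigma, delta=0.05):
        # One behavior-constrained policy improvement step (illustrative).
        # Assumes a Gaussian behavior policy N(mu, diag(sigma^2)) fitted to
        # the offline dataset and a learned critic q_net(state, action);
        # names are hypothetical, not from the article's MQL5 code.
        a = mu.clone().requires_grad_(True)
        q = q_net(state, a)
        # Gradient of the critic with respect to the action, evaluated
        # at the behavior mean
        (g,) = torch.autograd.grad(q.sum(), a)
        step = sigma.pow(2) * g                              # Sigma @ grad (diagonal)
        norm = torch.sqrt((g * step).sum(-1, keepdim=True))  # sqrt(g^T Sigma g)
        # Move along the critic's gradient to the boundary of the trust
        # region (a - mu)^T Sigma^{-1} (a - mu) <= 2 * delta.
        return mu + math.sqrt(2.0 * delta) * step / (norm + 1e-8)

Because the linearized objective is maximized over an ellipsoidal trust region around the behavior mean, the solution lies on the region's boundary in the direction of Sigma times the critic's gradient, which is exactly what the last line computes.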