Modular Value Iteration Through Regional Decomposition (Gisslén et al.)

Added by Deon Garrett about 6 years ago

Future AGIs will need to solve large reinforcement-learning problems involving complex reward functions with multiple reward sources. One way to make progress on such problems is to decompose them into smaller regions that can be solved efficiently. We introduce a novel modular version of Least Squares Policy Iteration (LSPI), called M-LSPI, which (1) breaks Markov decision problems (MDPs) into a set of mutually exclusive regions and (2) iteratively solves each region with a single matrix inversion, then combines the regional solutions by value iteration. The resulting algorithm leverages regional decomposition to solve the MDP efficiently. As the number of states increases, on both structured and unstructured MDPs, M-LSPI yields substantial improvements over traditional algorithms in time to convergence to the value function of the optimal policy, especially as the discount factor approaches one.
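As a rough illustration of the "solve each region exactly, then combine by value iteration" idea, here is a minimal sketch of regional policy evaluation in the plain tabular setting. This is not the authors' M-LSPI (which works on least-squares feature representations); the function name, inputs, and the block Gauss-Seidel formulation are assumptions made for this example. Each region's values satisfy a linear system that can be solved with one matrix inversion while out-of-region values are held fixed, and sweeping over regions plays the role of the outer value-iteration loop.

```python
import numpy as np

def regional_policy_evaluation(P, R, regions, gamma=0.95, tol=1e-8, max_sweeps=1000):
    """Hypothetical block Gauss-Seidel evaluation of a fixed policy.

    P: (n, n) transition matrix under the policy; R: (n,) rewards;
    regions: list of index arrays partitioning the n states.
    """
    n = len(R)
    V = np.zeros(n)
    # Pre-invert the within-region systems (I - gamma * P_kk): they do not
    # change across sweeps, so each region costs a single matrix inversion.
    inverses = []
    for idx in regions:
        A = np.eye(len(idx)) - gamma * P[np.ix_(idx, idx)]
        inverses.append(np.linalg.inv(A))
    for _ in range(max_sweeps):
        V_old = V.copy()
        for idx, A_inv in zip(regions, inverses):
            outside = np.setdiff1d(np.arange(n), idx)
            # Boundary term: discounted flow into states outside the region,
            # using the most recent value estimates for those states.
            b = R[idx] + gamma * P[np.ix_(idx, outside)] @ V[outside]
            V[idx] = A_inv @ b  # exact solve within the region
        if np.max(np.abs(V - V_old)) < tol:
            break
    return V
```

For a discount factor below one, each sweep is a contraction, so the combined values converge to the true value function; the practical payoff of the decomposition is that inverting several small region-sized matrices is much cheaper than inverting one system over the full state space.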

paper_66.pdf (372.6 kB)

