RealTimeDynamicProgramming (Orbital)

Orbital library

orbital.algorithm.template
Class RealTimeDynamicProgramming

java.lang.Object
  orbital.algorithm.template.MarkovDecisionProcess
      orbital.algorithm.template.MarkovDecisionProcess.DynamicProgramming
          orbital.algorithm.template.RealTimeDynamicProgramming

All Implemented Interfaces: java.io.Serializable, AlgorithmicTemplate, EvaluativeAlgorithm, HeuristicAlgorithm

public class RealTimeDynamicProgramming
extends MarkovDecisionProcess.DynamicProgramming
implements HeuristicAlgorithm

Real-Time Dynamic Programming (RTDP).

Real-Time Dynamic Programming is a variant of asynchronous Dynamic Programming performed concurrently with the control process. It uses concurrent value iteration.

If the heuristic function h is admissible (h ≤ h^*, i.e. it never overestimates the optimal cost), then the greedy policy π will eventually become optimal after repeated trials. If h is a good estimate of h^*, very large problems can be solved.
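The trial loop described above can be sketched in plain Java. This is a minimal sketch under stated assumptions, not Orbital's implementation: the four-state stochastic shortest-path problem, the hypothetical actions "step" (+1) and "jump" (+2) with their success probabilities, the unit costs, and the class name RtdpSketch are all illustrative. Each trial starts in state 0, performs a Bellman backup at the current state only, acts greedily, and samples the successor; the all-zero heuristic is admissible because every action costs 1.

```java
import java.util.Arrays;
import java.util.Random;

/** Minimal RTDP sketch on a hypothetical 4-state problem (not Orbital's API). */
public class RtdpSketch {
    static final int GOAL = 3;
    static final int[] MOVE = {1, 2};            // hypothetical actions: "step" (+1) and "jump" (+2)
    static final double[] SUCCESS = {0.8, 0.5};  // probability the move succeeds; otherwise stay put
    static final double GAMMA = 1.0;             // undiscounted stochastic shortest-path problem

    // Successor when the move succeeds, clamped at the goal.
    static int moved(int s, int a) { return Math.min(s + MOVE[a], GOAL); }

    // Q_U(s,a) = c(s,a) + gamma * sum_t P_a(t|s) * U(t), with assumed unit cost c(s,a) = 1.
    static double q(double[] u, int s, int a) {
        double p = SUCCESS[a];
        return 1.0 + GAMMA * (p * u[moved(s, a)] + (1 - p) * u[s]);
    }

    // Greedy action with respect to the current U (first minimum wins ties).
    static int greedy(double[] u, int s) {
        int best = 0;
        for (int a = 1; a < MOVE.length; a++) if (q(u, s, a) < q(u, s, best)) best = a;
        return best;
    }

    /** Runs the given number of trials from state 0 and returns the learned U. */
    public static double[] solve(int trials, long seed) {
        double[] u = new double[GOAL + 1];       // heuristic h = 0: admissible lower bound
        Random rnd = new Random(seed);
        for (int i = 0; i < trials; i++) {
            int s = 0;
            for (int step = 0; s != GOAL && step < 1000; step++) {
                int a = greedy(u, s);
                u[s] = q(u, s, a);               // backup: U(s) := min_a Q_U(s,a)
                s = rnd.nextDouble() < SUCCESS[a] ? moved(s, a) : s;  // simulate the action
            }
        }
        return u;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(solve(5000, 42L)));
    }
}
```

For this problem the optimal expected cost from state 0 works out to 3.25 (both actions tie there). Because the backups start from an admissible lower bound, U(0) should rise toward that value from below as trials accumulate, while states off the greedy path are only updated when an underestimate makes them look attractive.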

RTDP continually applies a real-time dynamic programming variant of value iteration to the utility function U (alias state-value function V):

$$U(s) := \min_{a\in A(s)} Q_U(s,a) = \min_{a\in A(s)} \Big( c(s,a) + \gamma \sum_{t\in S} P_a(t\mid s)\, U(t) \Big)$$

with a discount factor γ∈[0,1]. This is a dynamic programming update derived from the Bellman Optimality Equation. The key fact is that a policy π^* is optimal if and only if the expected costs U^*(s) that result from starting in state s∈S and acting according to π^* satisfy a form of the Bellman Optimality Equation:

$$U^*(s) = \min_{a\in A(s)} Q_{U^*}(s,a) = \min_{a\in A(s)} \Big( c(s,a) + \gamma \sum_{t\in S} P_a(t\mid s)\, U^*(t) \Big)$$
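To make the update concrete, the following sketch performs a single backup U(s) := min_a Q_U(s,a) for one state. The class BellmanBackup, the helpers q and backup, the array layout, and the numbers are hypothetical: action 0 costs 1 and reaches t1 or t2 with probability 0.5 each, action 1 costs 2 and reaches t1 with certainty, U(t1)=4, U(t2)=2, and γ=0.9.

```java
/** Hypothetical helper for one Bellman backup (not Orbital's API). */
public class BellmanBackup {
    // Q_U(s,a) = c(s,a) + gamma * sum_t P_a(t|s) * U(t)
    static double q(double cost, double[] prob, double[] u, double gamma) {
        double expected = 0.0;
        for (int t = 0; t < u.length; t++) expected += prob[t] * u[t];
        return cost + gamma * expected;
    }

    // U(s) := min over actions a of Q_U(s,a); cost[a] = c(s,a), prob[a][t] = P_a(t|s)
    static double backup(double[] cost, double[][] prob, double[] u, double gamma) {
        double best = Double.POSITIVE_INFINITY;
        for (int a = 0; a < cost.length; a++) best = Math.min(best, q(cost[a], prob[a], u, gamma));
        return best;
    }

    public static void main(String[] args) {
        double[] u = {0.0, 4.0, 2.0};                          // U(s), U(t1), U(t2)
        double[] cost = {1.0, 2.0};                            // c(s,0), c(s,1)
        double[][] prob = {{0.0, 0.5, 0.5}, {0.0, 1.0, 0.0}};  // P_0(.|s), P_1(.|s)
        System.out.println(backup(cost, prob, u, 0.9));
    }
}
```

Here Q_U(s,0) = 1 + 0.9·(0.5·4 + 0.5·2) = 3.7 and Q_U(s,1) = 2 + 0.9·4 = 5.6, so the backup sets U(s) to 3.7, up to floating-point rounding, and the greedy policy picks action 0.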

RTDP can also be regarded as a reinforcement learning technique in which the costs are negative rewards, R(t(s,a)) = -c(s,a), depending on both the state s∈S and the action a∈A(s), and the task is to minimize the costs U(s) instead of maximizing the utilities U(s). However, there are many possible ways of defining the costs in terms of the rewards.

RTDP is the stochastic generalization of Learning Real-Time A* (LRTA^*). For deterministic actions and no discounting (γ=1), RTDP reduces to LRTA^*.
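With deterministic actions, P_a(t|s) is 1 for the single successor t and 0 otherwise, so with γ=1 the RTDP update becomes h(s) := min_a (c(s,a) + h(t)), which is the LRTA^* rule. The sketch below runs repeated LRTA^* trials on a small hypothetical weighted graph (nodes 0 to 4, goal 4), starting from the admissible heuristic h = 0. The graph, the class name LrtaStarSketch, and the tie-breaking (first minimum wins) are assumptions for illustration only.

```java
import java.util.Arrays;

/** Minimal LRTA* sketch on a hypothetical graph (not Orbital's API). */
public class LrtaStarSketch {
    static final int GOAL = 4;
    // EDGES[s] lists {successor, cost} pairs; every action is deterministic.
    static final int[][][] EDGES = {
        {{1, 1}, {3, 4}},           // from 0
        {{0, 1}, {2, 1}, {3, 2}},   // from 1
        {{1, 1}, {4, 5}},           // from 2
        {{1, 2}, {4, 1}, {0, 4}},   // from 3
        {}                          // 4 is the goal
    };

    /** Runs one trial from state 0, updating h in place; returns the cost of the path travelled. */
    static int runTrial(int[] h) {
        int s = 0, cost = 0;
        for (int step = 0; s != GOAL && step < 100; step++) {
            int bestNext = -1, bestCost = 0, bestF = Integer.MAX_VALUE;
            for (int[] e : EDGES[s]) {
                int f = e[1] + h[e[0]];             // c(s,a) + h(successor), i.e. gamma = 1
                if (f < bestF) { bestF = f; bestNext = e[0]; bestCost = e[1]; }
            }
            h[s] = bestF;                           // RTDP backup with P_a(t|s) = 1
            cost += bestCost;
            s = bestNext;                           // move greedily
        }
        return cost;
    }

    public static void main(String[] args) {
        int[] h = new int[GOAL + 1];                // h = 0 is admissible for positive costs
        for (int trial = 1; trial <= 4; trial++) {
            int c = runTrial(h);
            System.out.println("trial " + trial + ": cost " + c + ", h = " + Arrays.toString(h));
        }
    }
}
```

On this graph the trial costs are 8, 6, and then 4 from the third trial on, at which point h(0)=4 equals the cost of the shortest path 0→1→3→4; the wasted moves of the early trials are what raise h toward the true distances.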

Author: André Platzer
See Also: Greedy, DynamicProgrammingProblem, "A. Barto, S. Bradtke, and S. Singh. Learning to act using real-time dynamic programming. Artificial Intelligence, 72:81-138, 1995.", "R. E. Bellman. Dynamic Programming. Princeton University Press, Princeton, New Jersey, 1957.", Serialized Form
Invariants: getDiscount() ∈ [0,1]

Nested Class Summary

Nested classes/interfaces inherited from class orbital.algorithm.template.MarkovDecisionProcess
`MarkovDecisionProcess.DynamicProgramming`

Nested classes/interfaces inherited from interface orbital.algorithm.template.HeuristicAlgorithm
`HeuristicAlgorithm.Configuration, HeuristicAlgorithm.PatternDatabaseHeuristic`

Nested classes/interfaces inherited from interface orbital.algorithm.template.EvaluativeAlgorithm
`EvaluativeAlgorithm.EvaluationComparator`

Constructor Summary
`RealTimeDynamicProgramming(Function heuristic)`

Method Summary
`Function`	`complexity()` Measure for the asymptotic time complexity of the central solution operation in O-notation.
`protected Function`	`plan()` Run the planning.
`Function`	`spaceComplexity()` Measure for the asymptotic space complexity of the central solution operation in O-notation.

Methods inherited from class orbital.algorithm.template.MarkovDecisionProcess.DynamicProgramming
`createMap, getActionValue, getDiscount, getEvaluation, getGreedyPolicy, getHeuristic, maximumExpectedUtility, setDiscount, setHeuristic`

Methods inherited from class orbital.algorithm.template.MarkovDecisionProcess
`getProblem, solve`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Methods inherited from interface orbital.algorithm.template.HeuristicAlgorithm
`getEvaluation, getHeuristic, setHeuristic`

Methods inherited from interface orbital.algorithm.template.AlgorithmicTemplate
`solve`

Constructor Detail