orbital.algorithm.template
Class RealTimeDynamicProgramming
java.lang.Object
orbital.algorithm.template.MarkovDecisionProcess
orbital.algorithm.template.MarkovDecisionProcess.DynamicProgramming
orbital.algorithm.template.RealTimeDynamicProgramming
- All Implemented Interfaces:
- java.io.Serializable, AlgorithmicTemplate, EvaluativeAlgorithm, HeuristicAlgorithm
public class RealTimeDynamicProgramming
- extends MarkovDecisionProcess.DynamicProgramming
- implements HeuristicAlgorithm
Real-Time Dynamic Programming (RTDP).
Real-Time Dynamic Programming is a variant of asynchronous Dynamic Programming performed
concurrently with the control process.
It uses concurrent value iteration.
If the heuristic function h is admissible h ≤ h^{*}, then the greedy policy v will
eventually become optimal after several cycles of repeated trials.
If h is good, very large problems can be solved.
RTDP permanently uses a real-time dynamic programming variant of value iteration
for the utility function U (alias state-value function V).
U(s) := min_{a∈A(s)} Q_{U}(s,a) = min_{a∈A(s)} (c(s,a) + γ*∑_{t∈S} P_{a}(t|s) * U(t))
depending upon a discount factor γ∈[0,1].
The formula is a dynamic programming update derived from the condition of the Bellman Optimality Equation.
The key fact is that a necessary and sufficient condition for a policy π^{*} to be optimal is that
the expected costs U^{*}(s) that result from starting in state s∈S and acting according to π^{*}
must satisfy a form of the Bellman Optimality Equation:
U(s) = min_{a∈A(s)} Q_{U}(s,a) = min_{a∈A(s)} (c(s,a) + γ*∑_{t∈S} P_{a}(t|s) * U(t))
RTDP can as well be considered a reinforcement learning technique with the costs being
a negative reward R(t(s,a)) = -c(s,a) made dependent on both, the state s∈S and action a∈A(s),
and the task being to minimize costs U(s) instead of maximize utilities U(s).
However, there is a multitude of possibilities of defining the costs in terms of the reward.
RTDP is the stochastic generalization of Learning Real Time Search (LRTA^{*}). For deterministic actions and discounting γ=1
RTDP collapses to LRTA^{*}.
- Author:
- André Platzer
- See Also:
Greedy
,
DynamicProgrammingProblem
,
"A. Barto, S. Bradtke, and S. Singh. Learning to act using real-time dynamic programming. Artificial Intelligence, 72:81-138, 1995.",
"Bellman, R. E. (1957). Dynamic Programming. Princeton University Press, Princeton, New Jersey.",
Serialized Form- Invariants:
- getDiscount()∈[0,1]
Method Summary |
Function |
complexity()
Measure for the asymptotic time complexity of the central solution operation in O-notation. |
protected Function |
plan()
Run the planning. |
Function |
spaceComplexity()
Measure for the asymptotic space complexity of the central solution operation in O-notation. |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
RealTimeDynamicProgramming
public RealTimeDynamicProgramming(Function heuristic)
plan
protected Function plan()
- Description copied from class:
MarkovDecisionProcess
- Run the planning.
- Specified by:
plan
in class MarkovDecisionProcess
- Returns:
- the solution policy function S→A found,
or
null
if no solution could be found.
complexity
public Function complexity()
- Description copied from interface:
AlgorithmicTemplate
- Measure for the asymptotic time complexity of the central solution operation in O-notation.
- Specified by:
complexity
in interface AlgorithmicTemplate
- Returns:
- the function f for which the solve() method of this algorithm runs in O(f(n))
assuming the algorithmic problem hook to run in O(1).
- See Also:
AlgorithmicTemplate.solve(AlgorithmicProblem)
spaceComplexity
public Function spaceComplexity()
- Description copied from interface:
AlgorithmicTemplate
- Measure for the asymptotic space complexity of the central solution operation in O-notation.
- Specified by:
spaceComplexity
in interface AlgorithmicTemplate
- Returns:
- the function f for which the solve() method of this algorithm consumes memory with an amount in O(f(n))
assuming the algorithmic problem hook uses space in O(1).
- See Also:
AlgorithmicTemplate.solve(AlgorithmicProblem)
Copyright © 1996-2009 André Platzer
All Rights Reserved.