
Markov Decision Process

Dynamic programming (DP) is an approach for modeling dynamic and stochastic decision problems, analyzing the structural properties of these problems, and solving them. Dynamic programs are also referred to as Markov decision processes (MDP). Slight distinctions can be made between DP and MDP; for example, for some deterministic problems the term dynamic programming is used rather than Markov decision process. The term stochastic optimal control is also often used for these types of problems. We shall use these terms synonymously. [Pg.2636]

Several considerations should be taken into account when choosing the state description, some of which are described in more detail in later sections. A brief overview is as follows. The state should be a sufficient summary of the available information that affects the future of the stochastic process, in the following sense. The state at a point in time should not contain information that is not available to the decision maker at that time, because the decision is based on the state at that point in time. (There are also problems, called partially observed Markov decision processes, in which what is also called the state contains information that is not available to the decision maker. These problems are often handled by converting them to Markov decision processes with observable states; this topic is discussed in Bertsekas [1995].) The set of feasible decisions at a point in time should depend only on the state at that point in time, and possibly on the time itself, and not on any additional information. Likewise, the costs and transition probabilities at a point in time should depend only on the state at that point in time, the decision made at that point in time, and possibly on the time itself, and not on any additional information. Another consideration is that one would often like the number of states to be as small as possible, since the computational effort of many algorithms increases with the size of the state space. However, the number of states is not the only factor that affects the computational effort; sometimes it may be more efficient to choose a state description that leads to a larger state space. In this sense the state should be an efficient summary of the available information. [Pg.2637]
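In standard MDP notation (the symbols below are introduced here for illustration and do not appear in the excerpt), this sufficiency requirement is the Markov property: given the current state and decision, the distribution of the next state does not depend on the remaining history, and the one-period cost depends only on the current state, the decision, and possibly the time. A minimal statement is:

```latex
P\bigl(s_{t+1}=s' \mid s_t, a_t, s_{t-1}, a_{t-1}, \ldots, s_0, a_0\bigr)
  = P\bigl(s_{t+1}=s' \mid s_t, a_t\bigr),
\qquad
c_t = c(s_t, a_t, t).
```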

Puterman, M. L. (1994), Markov Decision Processes, John Wiley & Sons, New York. [Pg.2647]

Serfozo, R. F. (1976), Monotone Optimal Policies for Markov Decision Processes, Mathematical Programming Study, Vol. 6, pp. 202-215. [Pg.2648]

White, D. J. (1980a), Finite-State Approximations for Denumerable-State Infinite-Horizon Discounted Markov Decision Processes: The Method of Successive Approximations, in Recent Developments in Markov Decision Processes, R. Hartley, L. C. Thomas, and D. J. White, Eds., Academic Press, New York, pp. 57-72. [Pg.2648]

More formally, the environment is modeled as a Markov decision process (MDP) with states s1, ..., sn ∈ S and actions a1, ..., am, with the... [Pg.916]

Semi-Markov decision processes for determining multiobjective optimal... [Pg.617]

Similarly to them, one of three decisions may be taken at each deterioration state i. Moreover, semi-Markov decision processes (SMDP) will be used. [Pg.618]

SEMI-MARKOV DECISION PROCESSES MODEL CHARACTERISTICS... [Pg.618]

Semi-Markov Decision Processes (SMDP) will be used here to model the behavior of some systems (specifically, a flow control valve) in the oil industry, since it is assumed that the local time spent in each state influences the system dynamics. For the sake of simplicity, the analysis is carried out at the system level, i.e., failures of valve components are disregarded. [Pg.618]

This paper proposed a multiobjective optimization model, based on semi-Markov decision processes and multiobjective GAs, for the optimal replacement policy for monitored systems in the oil industry. The proposed multiobjective GA with SMDP was validated against an exhaustive multiobjective algorithm and was able to find almost all solutions from the true non-dominated set. In addition, the time required to run the multiobjective GA jointly with the SMDP was much smaller than that needed by the exhaustive algorithm. [Pg.624]

Chen, D. & Trivedi, K. S. 2005. Optimization for condition-based maintenance with semi-Markov decision process. Reliability Engineering & System Safety, 90(1): 25-29. [Pg.624]

Filar, J. and K. Vrieze. 1996. Competitive Markov Decision Processes. Springer-Verlag. [Pg.61]

The basic model assumes that the relationship between the principal and a single agent extends over multiple periods. To capture the process dynamics, the model assumes that the agent controls a Markov Decision Process. To eliminate the complications that arise from history-dependent policies, it assumes that the agent has access to frictionless capital markets and can smooth his consumption over time; in a frictionless capital market there are no transaction costs and the interest rate on negative bank balances is the same as the interest rate on positive balances. Overall, the model consists of three major components: a physical component for process dynamics, an economic component for the preferences of the two parties, and an information component. [Pg.122]

The Kalman Filter. As in the partially observable Markov decision process we presented above, in order to forecast future demand the forecaster needs to estimate the actual state of the system. To this end, suppose that we would like to compute the minimum mean-square error (MMSE) estimate of the state x_t, given the history of observations... [Pg.408]
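The excerpt breaks off before the estimator itself. As a hedged illustration only (the source's specific state-space model is not shown, and the matrices A, C, Q, R and observation y_t below are generic placeholders), for a linear-Gaussian model x_{t+1} = A x_t + w_t with observations y_t = C x_t + v_t, w_t ~ N(0, Q), v_t ~ N(0, R), the MMSE state estimate is produced recursively by the standard Kalman filter:

```latex
\hat{x}_{t|t-1} = A\,\hat{x}_{t-1|t-1}, \qquad
P_{t|t-1} = A\,P_{t-1|t-1}A^{\top} + Q,
\\[4pt]
K_t = P_{t|t-1}C^{\top}\bigl(C\,P_{t|t-1}C^{\top} + R\bigr)^{-1},
\\[4pt]
\hat{x}_{t|t} = \hat{x}_{t|t-1} + K_t\bigl(y_t - C\,\hat{x}_{t|t-1}\bigr), \qquad
P_{t|t} = \bigl(\mathbb{I} - K_t C\bigr)\,P_{t|t-1}.
```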

Monahan, G. E. 1982. A survey of partially observable Markov decision processes: Theory, models, and algorithms. Management Science 28(1): 1-16. [Pg.446]

Baptiste, P., Le Pape, C. & Nuijten, W. / CONSTRAINT-BASED SCHEDULING; Feinberg, E. & Shwartz, A. / HANDBOOK OF MARKOV DECISION PROCESSES: Methods and Applications... [Pg.818]

Klabjan, D. and Adelman, D., 2006. Existence of optimal policies for semi-Markov decision processes using duality for infinite linear programming. SIAM Journal on Control and Optimization 44(6): 2104-2122. [Pg.391]

Tang, H., Yin, B.Q. and Xi, H.S., 2007. Error bounds of optimization algorithms for semi-Markov decision processes. International Journal of Systems Science 38(9): 725-736. [Pg.392]

Finally, there are a few analytical models in the literature that deal with wind turbines. Byon et al. (2010) use a Markov Decision Process (MDP) to determine the optimal maintenance strategy under stochastic weather conditions. To our knowledge, this is the first mathematical model for wind turbine maintenance; however, it is in a land-based context. Haddand et al. (2010) use an MDP based on real options for the availability maximization of an offshore wind farm. This model takes a condition-based maintenance approach to identify the optimal combination of turbines to be maintained in the wind farm. Besnard et al. (2011) formulate a stochastic model for opportunistic maintenance planning of offshore wind farms. They use stochastic optimization to plan service maintenance activities so as to take advantage of low wind periods and reduce production loss while the turbines are offline. Besnard et al. (2013) also... [Pg.1142]

In this paper, we propose a Markov decision process to optimize the maintenance and operational decisions for an offshore wind farm subject to stochastic wind conditions. The maintenance decision is a form of selective maintenance that determines which subset of wind turbines to maintain, based on their degradation level. The operational decision determines the optimal blade speed reduction or braking to use for the online wind turbines. Restricting or reducing the turbine speed leads to less energy production, but also less deterioration of the system. Furthermore, we consider an extended downtime period due to maintenance that takes into account the accessibility and resource issues related to offshore wind farms. [Pg.1142]

We now present the complete Markov decision process formulation. Let F(s) be the maximum expected discounted cost-to-go over the infinite horizon starting from state s. Then... [Pg.1143]
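The equation itself is not reproduced in the excerpt. As a generic sketch (the action set A(s), transition probabilities p, one-period return r, and discount factor γ below are standard symbols, not the paper's notation), the infinite-horizon discounted optimality equation for F has the form

```latex
F(s) \;=\; \max_{a \in A(s)} \Bigl\{\, r(s,a) \;+\; \gamma \sum_{s'} p(s' \mid s, a)\, F(s') \Bigr\},
```

with the max replaced by a min when costs rather than returns are accumulated.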

We have formulated a Markov decision process model to jointly optimize the maintenance and operational decisions of an offshore wind turbine farm. We have incorporated many key aspects of this problem including extended maintenance durations and changing wind conditions. [Pg.1145]

This behaviour, which is similar to a Markov Decision Process (MDP) (White 1993), where we wish to preserve the intention of actors in a process and still enable probabilistic behaviour, can be effectively captured by annotating the possible outcomes of specific decisions with pairs of labels and probabilities. We employ the following function to ensure meaningful assignment of these intention-preserving probabilistic annotations. [Pg.2408]

The key constructs in the PRISM property specification language, as it applies to Markov decision processes, are the P and R operators. The P operator refers to the probability of an event occurring; more precisely, the probability that the observed execution of the model satisfies a given specification. The R operator is used to express properties that relate to rewards (more precisely, the expected value of a random variable associated with a particular reward structure), and since a model will often be decorated with multiple reward structures, we augment the R operator with a label. For example, to determine the mean time to exhaust the supply of cake filler we would specify the following property ... [Pg.2411]

White, D.J. (1993). Markov decision processes. New Jersey, USA: John Wiley & Sons. [Pg.2416]

Temizer, S., Kochenderfer, M.J., Kaelbling, L.P., Lozano-Perez, T., Kuchar, J.K.: Collision avoidance for unmanned aircraft using Markov decision processes. In: AIAA Guidance, Navigation, and Control Conference. American Institute of Aeronautics and Astronautics (2010) [Pg.48]

Increased analytical power. SMC enables DFTCalc to analyse a wide range of dependability metrics, namely those expressed in a large subset of the logic CSL. Also, as argued in [6], certain DFTs give rise to non-determinism. If so, the I/O-IMC leads to a continuous-time Markov decision process (CTMDP). [Pg.297]

Baier, C., Hermanns, H., Katoen, J.-P., Haverkort, B.R.: Efficient computation of time-bounded reachability probabilities in uniform continuous-time Markov decision processes. Theoretical Computer Science 345(1), 2-26 (2005) [Pg.300]

Condition assessment of civil structures is a task dedicated to forecasting future structural performances based on current states and past performances and events. The concept of condition assessment is often integrated within a closed-loop decision, where structural conditions can be adapted based on system prognosis. Figure 1 illustrates a particular way to conduct condition assessment. In the process, various structural states are measured, which may include excitations (e.g., wind, vehicles) and responses (e.g., strain, acceleration). These measurements are processed to extract indicators (e.g., maximum strain, fundamental frequencies) of current structural performance. These indicators are stored in a database, and also used within a forecast model (e.g., time-dependent reliability, Markov decision process) that will lead to a prognosis on the structural system, enabling optimization of... [Pg.1711]

Methods that will be used in this study are partially derived from well-known methods in the fields of production/inventory models, queuing theory, and Markov Decision Processes. The other methods that will be used, apart from simulation, are all based on the use of Markov chains. In a continuous review situation, queuing models using Markov processes can be of much help. Queuing models assume that only the jobs or clients present in the system can be served, which is the main principle of production to order. Furthermore, all kinds of priority rules and distributions for demand and service times have been considered in the literature. Therefore we will use a queuing model in a continuous review situation. [Pg.10]

In order to give a good description of the problem, we shall model it as a Markov Decision Problem (MDP). Markov Decision Processes were initially studied by Bellman (1957) and Howard (1960). We will first give a short description of an MDP in general. Suppose a system is observed at discrete points in time. At each time point the system may be in one of a finite number of states, labeled 1, 2, ..., M. If, at time t, the system is in state i, one may choose an action a from a finite action space A. This action results in a probability P^a_ij of finding the system in state j at time t+1. Furthermore, a cost q^a_i has to be paid when action a is taken in state i. [Pg.37]
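As a minimal sketch of how such a finite MDP can be solved numerically (this code is illustrative and not from the source; the transition probabilities and costs below are placeholders), value iteration computes the optimal discounted-cost value vector and a greedy policy:

```python
import numpy as np

def value_iteration(P, q, gamma=0.95, tol=1e-8):
    """Value iteration for a finite MDP.

    P[a, i, j] : probability of moving from state i to state j under action a
    q[a, i]    : one-period cost of taking action a in state i
    gamma      : discount factor (0 < gamma < 1)
    Returns the optimal value vector V and a greedy (stationary) policy.
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Q[a, i] = q[a, i] + gamma * sum_j P[a, i, j] * V[j]
        Q = q + gamma * (P @ V)
        V_new = Q.min(axis=0)            # minimise expected discounted cost
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmin(axis=0)
        V = V_new

# Illustrative two-state, two-action example (all numbers are placeholders):
P = np.array([[[0.9, 0.1], [0.4, 0.6]],   # transitions under action 0
              [[0.2, 0.8], [0.5, 0.5]]])  # transitions under action 1
q = np.array([[1.0, 4.0],                 # costs q[0, i] for action 0
              [3.0, 2.0]])                # costs q[1, i] for action 1
V, policy = value_iteration(P, q)
print("optimal values:", V, "policy:", policy)
```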

Odoni, A. (1969), "On Finding the Maximal Gain for Markov Decision Processes", Operations Research, Vol. 17, pp. 857-860. [Pg.157]


© 2024 chempedia.info