Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 2
Markov Property → the future is independent of the past given the present state; a stochastic process evolving over time (e.g., whether or not I invest in stocks, the stock market still changes)
Let $S$ be the set of states ($s \in S$) and $P$ a transition model that specifies $P(s_{t+1}=s' \mid s_t=s)$
For a finite number ($N$) of states, we get an $N \times N$ transition matrix $P$
Example discussed last section (we set aside rewards and actions for easier understanding)
At state $s_1$ we have a 0.4 chance of transitioning to $s_2$ ($P(s_2|s_1)=0.4$) and a 0.6 probability of staying in $s_1$ ($P(s_1|s_1)=0.6$). These probabilities form the entries of the matrix $P$ above.
Let’s say we start at $s_1$. We can calculate the agent’s probability distribution over the next state by multiplying the row vector $[1,0,0,0,0,0,0]$ (probability 1 of being in $s_1$) by $P$ above. As a result we get $[0.6,0.4,0,0,0,0,0]$, i.e. the first row of $P$.
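This distribution update can be sketched in numpy. Only the $s_1$ row ($P(s_1|s_1)=0.6$, $P(s_2|s_1)=0.4$) is given in the lecture; the remaining rows below are illustrative assumptions just to make the 7-state matrix concrete.

```python
import numpy as np

# 7-state Markov chain. Row i gives P(s' | s_i); only row 0 (state s1)
# matches the lecture example; other rows are assumed for illustration.
P = np.array([
    [0.6, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.6, 0.4, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.6, 0.4, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.6, 0.4, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.6, 0.4, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.6, 0.4],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
])

# Start deterministically in s1: the distribution [1, 0, ..., 0].
mu0 = np.zeros(7)
mu0[0] = 1.0

# Next-state distribution = (row vector) @ P, here the first row of P.
mu1 = mu0 @ P
print(mu1)  # [0.6 0.4 0.  0.  0.  0.  0. ]
```

Repeated multiplication (`mu0 @ np.linalg.matrix_power(P, t)`) gives the state distribution after $t$ steps.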