Next, we want to specify one-period rewards, or negative costs, in our model. We obtain the reward by substituting the equation for returns, shown as the first equation here, into the equation for the portfolio change that we derived earlier. This produces an instantaneous random reward, received upon taking action ut, given by equation 13. Please note that because of the market impact term proportional to matrix M, this equation is quadratic rather than linear in action ut. The other thing that we can see from this equation is that rewards are risky, as they depend on a noise term, epsilon t. Therefore, we will add a risk penalty to compensate for this risk. A risk penalty is simply a negative reward received for taking risk in trades. We will use a very simple specification for such a risk penalty, and we'll take it to be proportional to the variance of the instantaneous reward R0, conditional on the given values of xt and ut, as shown in equation 14 on this slide. Here, lambda is a risk aversion parameter. This variance is easily calculated, and this produces the last expression here. We can see that this is a quadratic function of the sum xt plus ut. The next element we have to include in the one-step reward is the penalty for fees, or transaction costs. Now, transaction costs depend on the sign of ut, because traders buy at the ask price and sell at the bid price. This can be taken into account by using different proportional transaction costs for positive and negative values of ut. To handle this, we represent each component uit as a difference of two non-negative values, uit plus and uit minus. Then ut becomes the difference of these two values, while the absolute value of ut becomes their sum.
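To make these pieces concrete, here is a minimal Python sketch of the instantaneous reward, the risk penalty, and the split of ut into buy and sell parts. The equations on the slide are not reproduced in this transcript, so the form of the excess return used here, W z minus M transpose u plus epsilon, is an assumption, as are the names W, z, Sigma, and all the toy numbers.

```python
import numpy as np

def split_action(u):
    """Split u into non-negative buy and sell parts, so that u = u_plus - u_minus."""
    u_plus = np.maximum(u, 0.0)    # buy orders
    u_minus = np.maximum(-u, 0.0)  # sell orders
    return u_plus, u_minus

def instantaneous_reward(x, u, z, eps, W, M):
    """R0 = (x + u)^T (W z - M^T u + eps).

    The excess-return form W z - M^T u + eps is an assumption of this sketch.
    The M term is what makes R0 quadratic rather than linear in u.
    """
    excess_return = W @ z - M.T @ u + eps
    return float((x + u) @ excess_return)

def risk_penalty(x, u, Sigma, lam):
    """-lam * Var[R0 | x, u] = -lam * (x + u)^T Sigma (x + u), Sigma = Cov(eps)."""
    v = x + u
    return float(-lam * (v @ Sigma @ v))

# Toy inputs for two stocks and one signal (all values hypothetical).
x = np.array([1.0, -2.0])       # current positions
u = np.array([0.5, -0.25])      # trades: buy stock 0, sell stock 1
z = np.array([0.1])             # signal
eps = np.array([0.01, -0.02])   # one noise realization
W = np.array([[0.2], [0.1]])    # signal weights
M = 0.01 * np.eye(2)            # market impact matrix
Sigma = np.diag([0.04, 0.09])   # noise covariance
lam = 2.0                       # risk aversion

u_plus, u_minus = split_action(u)
r0 = instantaneous_reward(x, u, z, eps, W, M)
penalty = risk_penalty(x, u, Sigma, lam)
print(r0, penalty, u_plus, u_minus)
```

In this sketch each component has at most one non-zero part, which is what makes the absolute value of ut equal to the sum of the two parts.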
In other words, ut is equal to ut plus when it is positive, and to minus ut minus when it is negative. Now, with these definitions, we can specify the transaction cost penalty, shown in equation 17 on this slide, where kappa plus and kappa minus are different transaction cost parameters for buy and sell orders. And finally, we have to incorporate market impact effects from trading in stocks. Here we will assume a market impact penalty that is proportional to the total position in all stocks. It has a few terms. Two terms are proportional to the size of the order, and they can in general have different coefficients, theta plus and theta minus, that describe such impact for positive and negative values of ut. In addition, market impact can depend on signals, and therefore we add a third term here, with a weight given by some matrix Phi. Finally, we collect all these contributions together and define the total one-step reward as their sum, as shown in equation 18. Please note that this is a random reward, as the first term here depends on the noise epsilon t. The rest of the terms are non-random for given values of xt and ut. Now, when we are given an action ut, which is equal to the difference of ut plus and ut minus, we can define an expected one-step reward as an expectation of the expression given by equation 18. This produces equation 19 for the expected reward R hat t. Here R0 hat is an expectation of the instantaneous reward R0, conditional on a given state and action, as shown in equation 20. We can now simplify this expression and bring it to a more compact form, which will be shown next. To this end, we use vector notation and introduce a vector at of size 2N, twice the number of stocks, which is made by stacking together components ut plus and ut minus. Then the whole expression for the one-step reward can be expressed as a quadratic function of xt and at, which is shown in equation 21. We have three second-order terms here and two first-order terms, proportional to xt and at respectively.
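The way these four contributions add up can be sketched in a few lines of Python. The transcript does not reproduce equations 17 to 20, so the exact form of each term below is an assumption: the return model, the treatment of theta plus and theta minus as matrices, and every parameter name and toy value are hypothetical.

```python
import numpy as np

def one_step_reward(x, u_plus, u_minus, z, eps, p):
    """Total one-step reward: instantaneous reward + risk + fees + impact.

    All functional forms are assumptions of this sketch.
    """
    u = u_plus - u_minus
    v = x + u
    r0 = v @ (p["W"] @ z - p["M"].T @ u + eps)          # random part
    risk = -p["lam"] * (v @ p["Sigma"] @ v)             # risk penalty
    fee = -(p["kappa_plus"] @ u_plus + p["kappa_minus"] @ u_minus)
    impact = -(x @ p["Theta_plus"] @ u_plus
               + x @ p["Theta_minus"] @ u_minus
               + x @ p["Phi"] @ z)
    return float(r0 + risk + fee + impact)

def expected_one_step_reward(x, u_plus, u_minus, z, p):
    """Expectation over zero-mean noise: only the eps term drops out."""
    return one_step_reward(x, u_plus, u_minus, z, np.zeros_like(x), p)

# Toy parameters for two stocks and one signal (all values hypothetical).
params = {
    "W": np.array([[0.2], [0.1]]),
    "M": 0.01 * np.eye(2),
    "Sigma": np.diag([0.04, 0.09]),
    "lam": 2.0,
    "kappa_plus": np.array([0.001, 0.001]),
    "kappa_minus": np.array([0.002, 0.002]),
    "Theta_plus": 0.01 * np.eye(2),
    "Theta_minus": 0.02 * np.eye(2),
    "Phi": np.array([[0.05], [0.05]]),
}
x = np.array([1.0, -2.0])
u_plus = np.array([0.5, 0.0])
u_minus = np.array([0.0, 0.25])
z = np.array([0.1])
eps = np.array([0.01, -0.02])

reward = one_step_reward(x, u_plus, u_minus, z, eps, params)
expected = expected_one_step_reward(x, u_plus, u_minus, z, params)
print(reward, expected)
```

The difference between the random reward and its expectation is just the noise term, xt plus ut times epsilon t, which is why only the first contribution is random.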
There is no constant term here because there is no reward without taking an action at or having a position xt. Please note that matrices Raat, Rxxt, and Raxt depend on the risk aversion lambda. They also depend on the market impact matrix M and the coefficients theta. If all of them are equal to zero, then the one-step reward becomes a linear function instead of a quadratic one.
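One way to check this claim is to build the quadratic-form coefficients explicitly and compare them with a direct evaluation. The following Python sketch does that under assumed forms for each reward term, since equation 21 is not reproduced in this transcript. The helper names, the matrix E that maps at back to ut, and all toy values are hypothetical, and the coefficient matrices are only one possible convention.

```python
import numpy as np

def quadratic_coeffs(z, W, M, Sigma, lam, kap_p, kap_m, Th_p, Th_m, Phi):
    """Coefficients of R_hat = a'R_aa a + x'R_xx x + a'R_ax x + a'R_a + x'R_x.

    Assumed reward terms (not taken from the slides):
      (x+u)'(W z - M'u) - lam (x+u)'Sigma(x+u)
      - kap_p'u_plus - kap_m'u_minus
      - x'Th_p u_plus - x'Th_m u_minus - x'Phi z
    """
    n = Sigma.shape[0]
    E = np.hstack([np.eye(n), -np.eye(n)])  # u = E @ a, a = [u_plus; u_minus]
    Theta = np.hstack([Th_p, Th_m])         # x'Theta a collects both impact terms
    R_aa = -E.T @ M.T @ E - lam * E.T @ Sigma @ E
    R_xx = -lam * Sigma
    R_ax = -E.T @ M - 2.0 * lam * E.T @ Sigma - Theta.T
    R_a = E.T @ (W @ z) - np.concatenate([kap_p, kap_m])
    R_x = W @ z - Phi @ z
    return R_aa, R_xx, R_ax, R_a, R_x

def expected_reward_direct(x, u_p, u_m, z, W, M, Sigma, lam,
                           kap_p, kap_m, Th_p, Th_m, Phi):
    """Same expected reward, summed term by term."""
    u = u_p - u_m
    v = x + u
    return float(v @ (W @ z - M.T @ u) - lam * (v @ Sigma @ v)
                 - kap_p @ u_p - kap_m @ u_m
                 - x @ Th_p @ u_p - x @ Th_m @ u_m - x @ Phi @ z)

# Toy setup: two stocks, one signal (all values hypothetical).
z = np.array([0.1])
W = np.array([[0.2], [0.1]])
M = np.array([[0.01, 0.002], [0.0, 0.02]])
Sigma = np.array([[0.04, 0.01], [0.01, 0.09]])
lam = 2.0
kap_p, kap_m = np.array([0.001, 0.001]), np.array([0.002, 0.002])
Th_p, Th_m = 0.01 * np.eye(2), 0.02 * np.eye(2)
Phi = np.array([[0.05], [0.05]])
x = np.array([1.0, -2.0])
u_p, u_m = np.array([0.5, 0.0]), np.array([0.0, 0.25])
a = np.concatenate([u_p, u_m])  # stacked action of size 2N

R_aa, R_xx, R_ax, R_a, R_x = quadratic_coeffs(
    z, W, M, Sigma, lam, kap_p, kap_m, Th_p, Th_m, Phi)
quad = float(a @ R_aa @ a + x @ R_xx @ x + a @ R_ax @ x + a @ R_a + x @ R_x)
direct = expected_reward_direct(
    x, u_p, u_m, z, W, M, Sigma, lam, kap_p, kap_m, Th_p, Th_m, Phi)

# With lam, M and both theta matrices set to zero, only linear terms remain.
zero = np.zeros((2, 2))
L_aa, L_xx, L_ax, _, _ = quadratic_coeffs(
    z, W, zero, Sigma, 0.0, kap_p, kap_m, zero, zero, Phi)
print(quad, direct, np.allclose(L_aa, 0), np.allclose(L_xx, 0), np.allclose(L_ax, 0))
```

Under these assumptions the quadratic form and the direct sum agree, and switching off lambda, M, and theta leaves only the two first-order terms.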