Now, I want to return to the main topic of this specialization, namely Machine Learning and Reinforcement Learning. We spoke at length about the LOB and how different models try to address different aspects of its dynamics. A good question would be whether there is any room here for Machine Learning and Reinforcement Learning. As you can already guess, the right answer here should be a resounding yes. This is almost by definition, because if you still remember our very first course, all of statistics can also be viewed as special parametric specifications of Machine Learning models. So, Linear Regression is Machine Learning too. You simply can't go wrong with this definition of Machine Learning. So, the real questions are, of course, different, and also much harder. The first question is, what methods of Machine Learning would actually work best for this setting? If you remember the No Free Lunch theorem, the answer might be far from obvious before you actually try different models. The second question is, what paradigm to take? We can use pure Machine Learning approaches such as Supervised Learning that focus on predictions, or alternatively, we can try to bypass the prediction stage altogether and focus directly on optimization of our actions in the LOB. Finally, the last big question is, what additional assumptions should we add to our models? This question can come up in different forms depending on your setting. If you use Machine Learning, for example Supervised Learning, then you have to decide on the model architecture, which is the same as a parametric function family for your learning. So, for example, if the true dynamics are non-linear but we instead use linear architectures, then, at best, we will only be able to be approximately right in some range of variables. The other place where you need something beyond your data is regularization. It's hard to overemphasize the importance of regularization for learning.
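To make the point about architecture choice concrete, here is a minimal sketch, with all numbers invented purely for illustration: when the true dynamics are non-linear (a sine here), a linear architecture can only be approximately right in a limited range of the input variable.

```python
import numpy as np

# Illustrative sketch: the "true" dynamics are non-linear (a sine),
# but we fit a linear architecture. The fit is only approximately
# right in a narrow range of the input variable.
x_narrow = np.linspace(-0.3, 0.3, 200)   # small range around zero
x_wide = np.linspace(-3.0, 3.0, 200)     # wide range

def linear_fit_mse(x):
    y = np.sin(x)               # true non-linear dynamics
    a, b = np.polyfit(x, y, 1)  # linear architecture: y ~ a*x + b
    return np.mean((a * x + b - y) ** 2)

mse_narrow = linear_fit_mse(x_narrow)
mse_wide = linear_fit_mse(x_wide)
print(mse_narrow < mse_wide)  # True: linear model works well only near zero
```

Near zero, sin(x) is nearly linear, so the fit error is tiny; over the wide range, no linear model can track the curvature, and the error grows by orders of magnitude.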
We talked about regularization a few times in this specialization. In many cases of practical interest, regularization is a key component of learning rather than some sort of nuisance. There is a large variety of functions that you can use as regularizers. For example, you can use the likelihood of some parametric model with some fixed data as a regularizer, and then your non-parametric model would be pulled towards your parametric prior as you increase the regularization strength. In Bayesian statistics, regularization is obtained from priors. Finally, I would like to mention general arguments similar to those used in physics, in the spirit of our previous week. In physics, it turns out that analysis of different symmetries plays a key role in defining dynamics. Another popular approach in physics is to look at analytical properties of models for complex values of parameters. Even though such ideas might sound devoid of any meaning, they actually turn out to be very useful in many models that arise in physics. By extension, they should also be very useful for financial model building. I will come back to these points a bit later, but for now, I want to continue with a brief overview of what exists in this space. There is some published research in the literature that explores shallow architectures for short-term predictions of price changes in the LOB. For example, a paper by Kercheval and Zhang uses support vector machines to this end. There are also alternative approaches, such as Random Forests or Boosted Trees, that could be explored. For example, you can look up the report that I cite on this slide as an example of such work. On the other hand, the problem lends itself quite nicely to Deep Learning, at least in principle. That's because the data size here is very large, and this is a setting where Deep Learning often works better than other architectures.
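To give a flavor of the kind of shallow-architecture setup used in papers like Kercheval and Zhang, here is a hedged sketch: a support vector machine classifying the short-term direction of the mid-price. Everything here is synthetic; the feature names (spread, volume imbalance, signed volume) and the label-generating rule are my own illustrative assumptions, not the paper's actual data or features, and the sketch assumes scikit-learn is available.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for LOB features (illustrative assumptions only).
rng = np.random.default_rng(42)
n = 2000
bid_ask_spread = rng.exponential(0.01, n)
volume_imbalance = rng.uniform(-1.0, 1.0, n)  # (bid_vol - ask_vol) / (bid_vol + ask_vol)
signed_volume = rng.normal(0.0, 1.0, n)       # buys minus sells in a short window

X = np.column_stack([bid_ask_spread, volume_imbalance, signed_volume])
# Toy label: imbalance and signed flow push the mid-price up or down.
y = (0.8 * volume_imbalance + 0.4 * signed_volume
     + rng.normal(0.0, 0.3, n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)
print(clf.score(X_te, y_te))  # out-of-sample accuracy, well above 0.5
```

With this much signal planted in the synthetic features, the out-of-sample accuracy comes out well above chance, which is all the sketch is meant to show; on real LOB data, feature engineering and careful validation matter far more than the classifier itself.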
Also, as we discussed in our first guided tour course, methods such as SVM are hard to scale to big data. So, neural networks seem quite an attractive alternative to shallow methods for these sorts of problems. On the Deep Learning side, a number of different architectures have been suggested. They include, in particular, the so-called spatial neural networks suggested by Justin Sirignano. Other proposals include using, for example, Recurrent Neural Networks, as was suggested by Matthew Dixon. There are also other suggestions, for example, to use Convolutional Neural Networks. We will not go into details here, but you're certainly encouraged to look up and try these approaches on your own. Instead, I want to talk about using Reinforcement Learning for optimization of order placement in the Limit Order Book. One of the first, if not the first, real-world applications of Reinforcement Learning for optimal trade execution was reported in 2006. It was done by a group headed by Michael Kearns and made up of researchers from Carnegie Mellon University and Lehman Brothers. The algorithm they implemented was similar to an online Q-learning algorithm. Features were constructed from characteristics of the LOB, such as the bid-ask volume imbalance or the signed transaction volume. The signed transaction volume is simply the number of shares bought within a short interval, like 15 seconds, minus the number of shares sold within the same period. The performance measure was taken to be the implementation shortfall, which is a popular metric for such problems. Implementation shortfall is simply the difference between the mid-price at the beginning of the trading period and the average price achieved during the execution period. So, if you sell a block of shares, you want to minimize implementation shortfall. Now, more recently, JP Morgan in 2017 announced a first industrial application, as they put it, of a Deep Reinforcement Learning algorithm for optimized trade execution, and there was some debate.
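The quantities just mentioned are easy to write down. Below is a minimal sketch, with all numbers invented for illustration, of the two LOB features and the implementation-shortfall metric for a sell order:

```python
import numpy as np

def volume_imbalance(bid_vol, ask_vol):
    # Bid-ask volume imbalance, normalized to lie in [-1, 1].
    return (bid_vol - ask_vol) / (bid_vol + ask_vol)

def signed_transaction_volume(buys, sells):
    # Shares bought minus shares sold within a short interval (e.g. 15 s).
    return np.sum(buys) - np.sum(sells)

def implementation_shortfall_sell(arrival_mid, fills):
    # For a sell order: mid-price at the start of trading minus the
    # volume-weighted average execution price, per share.
    # Lower is better for the seller.
    prices = np.array([p for p, _ in fills])
    sizes = np.array([s for _, s in fills])
    avg_price = np.sum(prices * sizes) / np.sum(sizes)
    return arrival_mid - avg_price

print(volume_imbalance(600, 400))                 # 0.2
print(signed_transaction_volume([100, 50], [30])) # 120
fills = [(99.95, 500), (99.90, 500)]              # (price, size) of each fill
print(round(implementation_shortfall_sell(100.00, fills), 4))  # 0.075
```

For a buy order the sign convention flips: the shortfall becomes the average price paid minus the arrival mid-price. In the Q-learning setup described above, features like these would form the state, and the remaining inventory and time would drive the action choice.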
For example, the blog article referenced here debates whether this is indeed the first industrial application, given that ways to do this have been known since the work of Kearns and coworkers. In any case, the architecture developed by the JP Morgan team involved using deep neural networks for encoding the policy function. In addition, temporal difference algorithms appear to have been used in the implementation. A high-level representation of the production system for Reinforcement Learning was shown in recent JP Morgan presentations. I refer you to these presentations for further details, while we move on to our next topic.