Back again on the timeline are neural networks, now with an even greater advantage thanks to leaps in computational power and access to lots of data. DNNs began to substantially outperform other methods on tasks such as computer vision. In addition to the boon from boosted hardware, many new tricks and architectures helped improve the trainability of deep neural networks: ReLUs, better initialization methods, CNNs or convolutional neural networks, and dropout. We have seen some of these tricks in other ML methods. The use of non-linear activation functions such as ReLUs, which are usually set as the default now, came up during our first look at neural networks. Dropout layers began being used to help with generalization, working like the ensemble methods we explored when talking about random forests and boosted trees. Convolutional layers were added that reduced the computational and memory load due to their non-complete connectedness, and they also focus on local aspects of an input, for instance nearby regions of an image, rather than comparing unrelated parts of an image. In other words, the advances that came about in other ML methods got folded back into neural networks.

Let's look at an example of a deep neural network. This exciting history of machine learning has culminated in deep learning, with neural networks containing hundreds of layers and millions of parameters, but with amazing results. Shown here is GoogLeNet, or Inception, which is an image classification model. It was trained for the ImageNet Large Scale Visual Recognition Challenge in 2014, using data from 2012, where it had to classify images across 1,000 classes with 1.2 million images for training. It has 22 deep layers, 27 if you include pooling, which we will discuss in a later course, and 100 layers if you break it down into its independent building blocks. There are over 11 million trained parameters.
There are fully connected layers and some that aren't, such as convolutional layers, which we will talk about later. It used dropout layers to help generalize more, simulating an ensemble of deep neural networks. Just like we saw with stacking, each box is a unit of components that is part of a group of boxes, such as the one I am zoomed in on. This idea of building blocks adding up to something greater than the sum of its parts is one of the things that has made deep learning so successful. Of course, an ever-growing abundance of data, faster compute, and more memory help too. There are now several versions beyond this one that are much bigger and have even greater accuracy.

The main takeaway from all of this history is that machine learning research reuses bits and pieces of techniques from past algorithms, combining them to make ever more powerful models, and, most importantly, experiments. What is important when creating deep neural networks? The correct answer is all of the above. This is not an exhaustive list, but these three things are very important to keep in mind. First, you need to make sure you have lots of data. There is a lot of research taking place trying to reduce the data needs of deep learning, but until then, we need to make sure we have a lot of it. This is due to the high capacity from the number of parameters that need to be trained in these massive models. Since the model is so complex, it really needs to internalize the data distribution well, and therefore it needs a lot of signal. Remember, the entire point of machine learning is not to train a whole bunch of fancy models just because; it is to train them so that they can make very accurate predictions. If a model can't generalize to new data to predict from, then what good is it?
Therefore, once again, having enough data is important so that the model doesn't overfit to a small dataset it has seen a million times, instead of a gigantic dataset it has seen much less often. It also allows you to have large enough validation and test sets to tune your model with. Additionally, adding dropout layers, performing data augmentation, adding noise, and so on are ways to get even better generalization. Lastly, machine learning is all about experimentation. There are so many different types of algorithms, hyperparameters, and ways to create your machine learning datasets these days. There really is no way to know a priori the optimal choices from the start for almost all problems. By experimenting, and by keeping careful track of what you've already tried and the performance metrics you use to compare models, you will not only have a lot of fun, but also create some amazingly powerful tools.

Next, I'll talk a bit more about how neural networks continue to build on the performance of past models. Here you see the performance of specific versions of deep neural networks over the years. As you can see in the chart, a significant jump came in 2014, highlighted in blue, where Google's Inception model broke through the 10 percent error rate with 6.7 percent. The performance of DNNs continues to improve with each passing year, learning from the lessons gained from prior models. In 2015, version 3 of the Inception model scored a 3.5 percent error rate. So what makes these models improve so drastically over a short span of time? Often when a research group develops a new technique or method that works very well, other groups then take those ideas and build on them. This provides a significant jump forward in experimentation so that progress speeds up. This can involve better hyperparameters, more layers, better generalizability, better subcomponents like convolutional layers, and so on.

Explain how you would apply ML to the following problem.
There could be more than one right answer. You own a winter ski resort and want to predict the traffic levels of ski runs based on the four types of customers who have bought tickets: beginner, intermediate, advanced, and expert, plus the amount of previous snowfall. Take a moment to write an answer now. This could be regression or classification, since I didn't specify exactly what I mean by traffic levels. Do I mean the number of people who use that ski run per hour, or do I want something more categorical, such as high, medium, and low? For this, I would begin with a base heuristic, such as the average number of people on each slope, and then move on to base models of linear or logistic regression, depending on whether I went the regression or classification route, respectively. Depending on performance and the amount of data, I would then probably move on to neural networks. If there are other features in the data, I would also try those and monitor performance.

Internally at Google, there are, at last count, over 4,000 production ML models powering its systems. Each of these models and versions gets the performance benefit of building on the successes and failures of past models. One of the most widely used early on was Sibyl, which was originally created to recommend related YouTube videos. This recommendation engine worked so well that it was later incorporated widely into ads and other parts of Google. It was a linear model. Vizier was another system, which ended up becoming the de facto parameter-tuning engine for other models and systems. Google Brain, the ML research arm within Google, created a way to harness the computational power of thousands of CPUs to train large models like deep neural networks. The experience of building and running these models is what shaped the creation of TensorFlow, an open-source library for machine learning.
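To make the ski-resort example from the quiz concrete, here is a minimal sketch of the workflow I described: start with a base heuristic (predict the historical average), then try a linear regression baseline. All of the numbers are hypothetical data I made up for illustration, and I use plain ordinary least squares via NumPy rather than any particular ML library.

```python
import numpy as np

# Hypothetical history, one row per day: tickets sold to beginner,
# intermediate, advanced, and expert customers, plus previous snowfall (cm).
X = np.array([[30, 20, 10,  5, 12.0],
              [50, 25, 15,  5, 30.0],
              [20, 10,  5,  2,  2.0],
              [60, 40, 20, 10, 45.0],
              [40, 30, 10,  5, 20.0]], dtype=float)
# Target: observed skiers per hour on a given run (also made up).
y = np.array([55.0, 90.0, 30.0, 130.0, 75.0])

# Base heuristic: always predict the average traffic seen so far.
baseline = y.mean()

# Linear regression baseline: ordinary least squares with an intercept.
A = np.hstack([X, np.ones((len(X), 1))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict(features):
    # Predict traffic for a new day of ticket sales and snowfall.
    return np.append(np.asarray(features, dtype=float), 1.0) @ w

# Compare training error; a real workflow would hold out validation data.
baseline_mse = np.mean((y - baseline) ** 2)
model_mse = np.mean((y - A @ w) ** 2)
```

Because the least-squares model includes an intercept, it can never do worse than the constant-average heuristic on the training data; the interesting question, as discussed above, is whether that advantage generalizes to held-out days.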
Google then created TFX, the TensorFlow-based machine learning platform, and we'll show you how to build and deploy production ML models using TensorFlow and tools like Cloud ML Engine, Dataflow, and BigQuery. To recap, the last few decades have seen a proliferation in the adoption and performance of neural networks. With the ubiquity of data, these models have the benefit of more and more training examples to learn from. The increase in data and examples has been coupled with scalable infrastructure for ever more complex and distributed models with thousands of layers. One note that we'll leave you with is that although the performance of neural networks may be great for some applications, they are just one of many types of models available for you to experiment with. Experimentation is key to getting the best performance using your data to solve your challenge.
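To close with something concrete: two of the trainability tricks mentioned earlier, ReLU activations and dropout, are simple enough to sketch in a few lines of NumPy. This is an illustrative sketch (using the common "inverted dropout" scaling), not how TensorFlow implements them, and the function names are my own.

```python
import numpy as np

def relu(x):
    # ReLU: pass positive activations through, zero out the negatives.
    return np.maximum(0.0, x)

def dropout(x, rate, rng, training=True):
    # During training, randomly zero a fraction `rate` of the units and
    # scale the survivors up by 1/(1-rate), so the expected activation
    # matches what the network sees at inference time. At inference,
    # dropout is a no-op: the full "ensemble" of sub-networks is used.
    if not training:
        return x
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

rng = np.random.default_rng(0)
h = relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0]))  # negatives clipped to 0
h_train = dropout(h, rate=0.5, rng=rng)           # some units zeroed, rest doubled
h_infer = dropout(h, rate=0.5, rng=rng, training=False)  # unchanged
```

Each surviving unit during training is scaled so that, on average, downstream layers see the same magnitudes with or without dropout, which is what lets one trained network stand in for an ensemble of thinned networks.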