Hi, I'm Chris Vargo, back again to tell you more about supervised machine learning. Again, I'm so excited to talk about how we can use supervised machine learning to produce really good models that help us classify text documents. We've talked about deep learning, but we haven't actually talked about what a neural network is. Underneath the hood of every deep learning algorithm is a neural network. Neural networks are one type of machine learning algorithm that have been around since the 1950s, but they have only become popular recently thanks to advances in computing power and its availability, along with improvements in their design by researchers across multiple disciplines, including computer science, cognitive psychology, neuroscience, and information science. Deep learning uses neural networks instead of linear regression approaches, although I still think their processes are conceptually somewhat similar. It uses neural networks to solve supervised machine learning tasks. Neural networks are a type of machine learning model inspired by the way neurons in our brains work. They can be used to solve classification and regression problems, but they're especially useful for solving complex, non-linear problems where traditional linear regression models fail.

I recently stumbled upon this book for my toddler. I think it's a great high-level description of what neural networks are, so we're going to read it. Support the authors and buy a copy if you want a funny book for your toddler or a toddler you might know. It's really fun; my daughter started trying to read this book when she was two, and she can now recite it word for word. We've got a young data scientist in the family. I've read it to Charlotte a bunch, so let me just summarize it for you.

Neural networks are made up of layers, and each layer is a set of neurons. The input layer receives the data to be classified or predicted and passes it on to a series of hidden layers, which consist of more neurons that apply more mathematical functions, referred to as activations, on top of what they've received from the previous layers. Eventually this process reaches an output layer, where our predictions are made based on what has passed through all the layers. The number of neurons in each layer is determined by the amount of data we have and how much detail we want to capture; the more layers, the more complex our model will be. Neurons in each layer are connected to neurons in the next layer. That connection is called a synapse, and it's what allows us to capture more complex relationships between our input data and output predictions. Each neuron receives an activation from the previous neurons, applies some type of mathematical function on top of that value, and then passes that new value along as its own activation to subsequent neurons down the line, until we reach a final prediction or classification. In order for neural networks to learn how to best classify or predict something based on past examples, they need feedback about their performance so far, i.e., whether they're right or wrong about their current prediction or classification. This is done by comparing what was predicted against what actually happened, known as the ground truth, and this is really the same as supervised machine learning.
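Before we move on, here's what those layers, neurons, activations, and the feedback loop can look like in code. This is just a minimal sketch using TensorFlow's Keras API; the layer sizes and the 1,000-feature input are made-up assumptions for illustration, not a recommendation.

```python
import tensorflow as tf

# Input layer: one value per feature of the document (1,000 is an assumed size).
# Hidden layers: neurons that apply an activation function (ReLU here) to what
# they receive from the previous layer.
# Output layer: a single neuron whose sigmoid activation gives the probability
# of the positive class.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1000,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# The feedback step: the loss compares predictions to the ground-truth labels,
# and the optimizer adjusts the weights and biases accordingly.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(X_train, y_train, epochs=5)  # X_train / y_train are your labeled examples
```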
If there was a mistake made somewhere along the way in this process, we can adjust our model accordingly so that future predictions will be better informed than before, which is why these models are said to be self-learning. The neural network will compare its prediction against the labels and adjust the model so that it improves future predictions, i.e., it will learn which features and characteristics best distinguish the starfish in this example. In all, neural networks are a natural extension of linear regression. The key difference is that instead of using coefficients, we use weights and biases to model the relationship.

Transfer learning moved neural networks and deep learning even further forward. Researchers didn't just specify the size and shape of the layers; they also transferred over language models, that is, an understanding of how humans use words together. Transfer learning is a technique that allows you to take the knowledge of one neural network and apply it to another. It's a form of machine learning where an algorithm learns how to do something by studying examples and then applies what it's learned to new situations. It's like teaching someone how to play chess by showing them lots of games between grandmasters rather than having them read books about the rules or memorize openings and endgames. The idea is that you can teach an AI something about a specific task, such as playing chess or recognizing images, and then it should be able to get up to speed on other tasks more quickly because they all rely on similar concepts. In practice, this means training your model on large datasets from which we extract features relevant to the problem at hand. These features are used at training time when building our neural network architecture. This way, we can have less training data but still achieve good performance.

What are the most common transfer learning models in NLP, and how are they created? Models such as Stanford's GloVe, Google's Word2Vec, and Facebook's fastText were created by taking large corpora of text, such as Wikipedia, and using them to train a neural network and produce word vectors. The resulting model is then used for classification tasks such as sentiment analysis or text classification, as we're going to do here in this class. Language modeling is a task where you take some input and then predict the next word in the sequence. Take a look at this screenshot. It's a great example of how a supervised machine learning algorithm responds using transfer learning: it knows what word is most likely to come next (I'll show a tiny code example of this idea in a moment). Building on this knowledge of human language and its correlations, BERT is a newer language model that quickly became the current state of the art. We have Google to thank for this, and we're going to use BERT in this class. Put simply, pre-trained neural networks like BERT come pre-configured with an understanding of the relationships of language. They know what terms are most likely to come next given what is typically written in that language. Just like auto-suggest on your phone, they know which words are most likely to come next given a sequence, and they know the relationships those words tend to have with each other. This is a huge advantage for neural networks. It allows them to learn language much faster than traditional methods. It also means that they can be used in situations where the dataset is small or not well defined.
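Here's that tiny example of the "predict the most likely word" idea, using a pre-trained BERT through the Hugging Face transformers library, which we'll see more of shortly. Note that BERT technically fills in a masked word rather than strictly the next word, and the example sentence is something I made up.

```python
from transformers import pipeline

# Load a pre-trained BERT and ask it to fill in the blank.
fill = pipeline("fill-mask", model="bert-base-uncased")

# Print the model's top guesses for the missing word and their probabilities.
for guess in fill("The customer asked about the [MASK] of the product."):
    print(guess["token_str"], round(guess["score"], 3))
```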
A word embedding is just like any other embedding, except instead of taking numbers or strings, it takes words. Embeddings are useful because they allow us to represent words with vectors. We don't have to worry about spelling mistakes or typos when doing machine translation or text classification; we only need to worry about what goes in and what comes out. Again, models such as Stanford's GloVe, Google's Word2Vec, and Facebook's fastText were created by taking a large corpus such as Wikipedia and using it to train a neural network and produce word vectors. The resulting model is then used for tasks such as sentiment analysis. A language model takes some input text and predicts the next word in the sequence. This is used for tasks such as speech recognition (that's why Alexa knows what you said even when it doesn't catch every word), machine translation, and language generation such as chatbots. BERT is a newer model that was released by Google. It is the state-of-the-art language model and outperforms all other models on most tasks. Check out BERT in the model hub to see a ton of pre-trained models that are really good at different tasks and are trained in different languages. Hugging Face Transformers also provides a set of pre-trained models that can be used for language modeling; some were trained on the WikiText-2 data, which is a collection of Wikipedia articles in English. The transformer model takes an input sequence and outputs the next word in the sequence, with probabilities for each possible word. This allows you to use it inside your own neural network architecture, for example as an embedding layer. This all sounds so sophisticated, but in reality it's just a few lines of code. We are leveraging the hard work of others so that we have a general understanding of the semantic relationships that words have with each other.

So let's dive into some supervised classification examples. Think about chatbots as a classification problem. We need to decide if a customer message warrants a specific response, for example, responding to a message with the pricing for a product. To do this, we need to give the model training data, that is, times when customers sent messages in the chat asking for prices. These will be labeled as ones in our training data, and the rest of the messages will be labeled as zeros. If we get enough examples, the model should be able to learn when the customer is asking for pricing information, and then we can return an answer from an FAQ document that gives them exactly what they're looking for. Here's another example: chatbot AI. You can see here that it's looking for keywords in consumer texts to decide what it says next.
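Here's a minimal sketch of that pricing-intent classifier. To keep it short I'm using a TF-IDF plus logistic regression baseline from scikit-learn rather than BERT (we'll get to BERT later in the class), and the six training messages are made up purely for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Customer messages labeled 1 if they ask about pricing, 0 otherwise.
messages = [
    "How much does the premium plan cost?",
    "What's the price for two seats?",
    "Do you have a student discount?",
    "My login isn't working.",
    "Can I change my shipping address?",
    "The app keeps crashing on my phone.",
]
labels = [1, 1, 1, 0, 0, 0]

# Turn each message into a vector of word weights, then fit a classifier.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(messages, labels)

# If the model predicts 1, the bot replies with the pricing answer from the FAQ.
print(clf.predict(["How much is it per month?"]))
```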
Deep learning is used on all different types of data: photos, videos, audio, you name it. People have tried to use deep learning on all of those datasets. Photo classification is another common example of deep learning. Here we have images that were labeled as containing various automakers' logos. The input is the image in its raw byte format. The model learns which bytes inside of that image correlate to different labels, and then it can do the labeling automatically. One major advantage of supervised deep learning is that most deep learning algorithms handle the pre-processing of the data automatically. We, as humans, can't create or annotate features from images the way we can with text. Luckily, new advances in Google's TensorFlow package handle that for us. All we have to do is pass the images and their labels to the computer, and we let the model decipher the image and figure out what it is about that image that actually creates the desired output. So this is an example of a deep learning algorithm, and the pieces of paper with holes in them that we see here are the layers of our supervised deep learning algorithm.

Using Google's deep learning solution is crazy easy to do. You can actually give it a try with no code using Google's Teachable Machine, and if you have five minutes and some labeled images, you can really create a working deep learning algorithm that, given an image, will tell you whether the thing you're interested in is present. Google, Apple, and Facebook all have image AI trained and deployed on their tech stacks. Every image that you take on an iPhone is processed for classification using Apple's photo AI. The quality of these algorithms has increased exponentially over the last five years. You can see here that this image is classified as running a marathon with high probability. Google does it too. This is a Google Photos search for the term whiskey from my photos. What I'm particularly impressed with here is the model's ability to detect whiskey in all its forms: in a glass, in a fancy glass, in a bottle. This AI knows whiskey.

Recently, social media listening tools have adopted deep learning. If you, as a marketer, want to find every time your brand is featured in an image on social media, even when you're not tagged and have no other way of knowing about it, you can now use AI to detect your logo in millions of Facebook posts or tweets. It's all using deep learning. All we need to do is upload a collection of images of the Hard Rock logo, and if we upload enough of those logos, eventually the machine learning algorithm is going to understand exactly what it is about that logo that makes it unique, and when it sees it in the wild in a photo, it will classify it as such.
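If you want to see what that logo-detection idea looks like in code, here's a rough sketch of transfer learning on images in TensorFlow/Keras. The folder name, the two-class setup (logo vs. no logo), and the choice of MobileNetV2 as the pre-trained base are all assumptions I'm making for illustration; tools like Teachable Machine hide these details entirely.

```python
import tensorflow as tf

# Assumes a hypothetical folder of labeled images:
#   logo_images/logo/...      photos that contain the logo
#   logo_images/no_logo/...   photos that don't
train_ds = tf.keras.utils.image_dataset_from_directory(
    "logo_images/", image_size=(224, 224), batch_size=32)

# Start from a network pre-trained on millions of photos (transfer learning),
# then add our own output layer for the logo / no-logo decision.
base = tf.keras.applications.MobileNetV2(
    include_top=False, pooling="avg", input_shape=(224, 224, 3))
base.trainable = False  # keep the pre-trained feature detectors frozen

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # scale pixels to [-1, 1]
    base,
    tf.keras.layers.Dense(1, activation="sigmoid"),      # probability the logo is present
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, epochs=3)
```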