In Module 2, we created and trained classification models. In this module, we will construct a loan portfolio with the help of those models. Please open the project notebook and be prepared to pause the video and practice in the notebook during the lecture. Let's go to the Module 3 section. If you have just opened the notebook, don't forget to select a code cell in Module 3 and run all the cells above it. The classification model predicts whether a loan will be fully paid or not. But for a loan portfolio, we care more about the portfolio return than the repaid rate. So we'll first define a function to calculate portfolio return, using two columns in the dataset: funded amount and total payment. The total return is calculated as total payment divided by funded amount, minus one. But since the dataset contains loans with two different terms, total return is not a fair benchmark, so we will use annual return to evaluate a portfolio. This is the formula to calculate annual return from the total return and the loan term in years. We will define a function to display the annual return of loans with different terms in our portfolio. In this function, we first calculate the annual return of all 36-month loans in the dataset, then the annual return of all 60-month loans. We can then use this function to display the annual returns of the whole test set. In the test set, the 36-month loans have a 2.69 percent annual return and the 60-month loans have a 2.74 percent annual return. Please be aware that this is not the true annual return, because this calculation assumes the total payment is received at the end of the loan term. The loans are actually paid in monthly installments, so the real return is much higher than what we see here. But since we just need a benchmark to evaluate loan portfolios, we can accept this simplified return calculation. Now, let's construct our loan portfolio with the help of the classification model. We will demonstrate this process with the Random Forest Classifier we created in Module 2.
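A minimal sketch of the two return functions described above, using pandas. The column names funded_amnt, total_pymnt and loan_term_year are assumptions standing in for the notebook's funded amount, total payment and loan term columns. The annualizing step uses the standard compounding conversion, annual return = (1 + total return)^(1 / term in years) - 1; it is an assumption that this matches the formula shown in the lecture. Portfolio total return is computed on the column sums, so larger loans weigh more.

```python
import pandas as pd

def portfolio_total_return(df: pd.DataFrame) -> float:
    # Total return = total payment received / amount funded - 1
    return df["total_pymnt"].sum() / df["funded_amnt"].sum() - 1

def annual_return(total_return: float, term_years: float) -> float:
    # Compound-annualize, treating the whole payment as received at maturity
    return (1 + total_return) ** (1 / term_years) - 1

def annual_returns_by_term(df: pd.DataFrame) -> dict:
    """Annual return of 36-month and 60-month loans, keyed by term in months."""
    result = {}
    for years in (3, 5):
        loans = df[df["loan_term_year"] == years]
        result[years * 12] = annual_return(portfolio_total_return(loans), years)
    return result

def display_annual_return(df: pd.DataFrame) -> None:
    for months, ret in annual_returns_by_term(df).items():
        print(f"{months} months loans: {ret:.2%} annual return")

# Toy data, not taken from the course dataset
toy = pd.DataFrame({
    "funded_amnt":    [1000.0, 1000.0, 1000.0],
    "total_pymnt":    [1331.0, 1331.0, 1610.51],
    "loan_term_year": [3, 3, 5],
})
display_annual_return(toy)
```

The toy numbers are chosen so both terms annualize to about 10 percent, which makes the conversion easy to check by hand.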
This code is copied from Module 2. We'll first predict on the test set and get a list of predictions. Then we will create a new column, prediction, in df_test and set it to the predictions made by the model. Then we construct our portfolio from the test set with the loans that are predicted as one. Let's take a look at our portfolio. This portfolio only contains loans with a prediction equal to one. The repaid column is the true status. We can see that the majority of them are fully repaid. The repaid rate of the test set is about 85 percent, and the repaid rate of our portfolio is about 88 percent. Now, let's compare the annual returns of the test set and our portfolio. First, we will display the annual return of the test set, then we will display the annual return of our portfolio. You can see that our portfolio has a better annual return for both 36-month loans and 60-month loans. You may have noticed that in the demo, we don't include loan term in the training features. But as you saw in your Module 1 analysis, 36-month loans have a much higher repaid rate than 60-month loans. We can check it again here with groupby. We can see that 36-month loans have an 88 percent repaid rate, and 60-month loans have a 76 percent repaid rate. It seems natural to include loan term in the training features. Let's add loan_term_year to model_columns and see what happens. In the new portfolio, the return of 36-month loans drops a bit, and the return of 60-month loans increases a bit. But the biggest impact is that the new portfolio contains about 10 times more 36-month loans than 60-month loans, while this ratio is about three to one in the test set. What caused this imbalance in the new portfolio? Because 36-month loans have a much higher repaid rate, when you set class_weight to balanced, the model will overwhelmingly favor 36-month loans.
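The predict-then-select steps above can be sketched with scikit-learn. Everything here is illustrative: the synthetic data and the feature names int_rate and annual_inc are assumptions, the repaid and loan_term_year columns stand in for the notebook's real ones, and the model is a freshly trained RandomForestClassifier rather than the one from Module 2.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 2000
# Synthetic stand-in for the course data (hypothetical columns)
df = pd.DataFrame({
    "int_rate": rng.uniform(0.05, 0.25, n),
    "annual_inc": rng.uniform(20_000, 150_000, n),
    "loan_term_year": rng.choice([3, 5], size=n, p=[0.75, 0.25]),
})
# Repayment is made less likely for high interest rates and 5-year terms
p_repaid = 0.97 - 1.2 * (df["int_rate"] - 0.05) - 0.08 * (df["loan_term_year"] == 5)
df["repaid"] = (rng.uniform(size=n) < p_repaid).astype(int)

df_train, df_test = df.iloc[:1500].copy(), df.iloc[1500:].copy()
model_columns = ["int_rate", "annual_inc"]  # loan term left out, as in the demo

clf = RandomForestClassifier(n_estimators=100, class_weight="balanced", random_state=0)
clf.fit(df_train[model_columns], df_train["repaid"])

# New column holding the model's prediction next to the true status
df_test["prediction"] = clf.predict(df_test[model_columns])
# Portfolio: only loans predicted to be fully repaid (prediction == 1)
df_portfolio = df_test[df_test["prediction"] == 1]

print("test set repaid rate: ", round(df_test["repaid"].mean(), 3))
print("portfolio repaid rate:", round(df_portfolio["repaid"].mean(), 3))
# Repaid rate per loan term, the groupby check described above
print(df_test.groupby("loan_term_year")["repaid"].mean())
```

Adding "loan_term_year" to model_columns and rerunning is the experiment the lecture describes; on real data, comparing df_portfolio.groupby("loan_term_year").size() before and after shows the shift toward 36-month loans.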
One possible way to get a more balanced portfolio and still use the loan term information is to split the dataset into two subsets, one with 36-month loans only and one with 60-month loans only. Then you can train and predict with two classification models, and select 36-month loans and 60-month loans separately. Once we have selected loans with the classification model, we can improve the portfolio with some filters. For example, there's a column, pub_rec_bankruptcies, which is the number of public records of bankruptcy. The majority of the loans have zero in this column. If we check the annual return of loans with and without public records of bankruptcy, we can see that loans with public records of bankruptcy have a much lower return. We can simply filter out all loans with public records of bankruptcy from our portfolio. Our new portfolio has a slightly better return for both 36-month loans and 60-month loans. In Module 3, you will need to construct your loan portfolio with the logistic regression models created in Module 2. Instead of loan term, you will need to check the impact of loan grade. Loan grade is assigned to each loan by Lending Club. We can see that higher-grade loans have a higher repaid rate, but that doesn't necessarily mean they have a higher return, since higher-grade loans also have a lower average interest rate. Here we check the annual return of all grade A loans. We can see that even though grade A loans have a 94 percent repaid rate, they have lower annual returns for both 36-month and 60-month loans compared to the whole test set. In task 3.2, you will add the grade code to model_columns, train your logistic regression model, and then construct a new portfolio with its predictions. Evaluate the portfolio return, and think about the reason for the change and whether you should include the grade code in the training features. In task 3.3, you will experiment with a more extreme class weight.
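A rough sketch of the split-by-term idea and the bankruptcy filter. The helper names select_by_term and drop_bankruptcies are hypothetical, the data is synthetic, and using a RandomForestClassifier for each term is an assumption (the lecture only says two classification models). Note that the `== 0` comparison also drops rows where pub_rec_bankruptcies is missing.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def select_by_term(df_train, df_test, model_columns, target="repaid"):
    """Fit one classifier per loan term and keep the loans each predicts as repaid."""
    picked = []
    for term in sorted(df_test["loan_term_year"].unique()):
        train = df_train[df_train["loan_term_year"] == term]
        test = df_test[df_test["loan_term_year"] == term]
        clf = RandomForestClassifier(n_estimators=100, class_weight="balanced", random_state=0)
        clf.fit(train[model_columns], train[target])
        # Boolean mask from the predictions selects this term's portfolio rows
        picked.append(test[clf.predict(test[model_columns]) == 1])
    return pd.concat(picked)

def drop_bankruptcies(portfolio):
    # Keep only loans with zero public records of bankruptcy (NaN rows are dropped too)
    return portfolio[portfolio["pub_rec_bankruptcies"] == 0]

# Synthetic data, purely illustrative
rng = np.random.default_rng(1)
n = 2000
df = pd.DataFrame({
    "int_rate": rng.uniform(0.05, 0.25, n),
    "loan_term_year": rng.choice([3, 5], size=n, p=[0.75, 0.25]),
    "pub_rec_bankruptcies": rng.choice([0, 1], size=n, p=[0.9, 0.1]),
})
df["repaid"] = (rng.uniform(size=n) < 0.95 - 1.5 * (df["int_rate"] - 0.05)).astype(int)
df_train, df_test = df.iloc[:1500], df.iloc[1500:]

portfolio = drop_bankruptcies(select_by_term(df_train, df_test, ["int_rate"]))
# Loans per term in the filtered portfolio
print(portfolio.groupby("loan_term_year").size())
```

Because each model only ever sees one term, the term itself can no longer dominate the prediction, which is the motivation given above for splitting.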
A balanced class weight sets weights inversely proportional to class frequencies in the input data. In the dataset, about 85 percent of loans are fully paid, or class one. So a balanced class weight is roughly equivalent to setting class 0 to 0.85 and class 1 to 0.15. Try setting a more extreme class weight than balanced, such as class 0 to 0.9 and class 1 to 0.1, and discuss the impact of the change. In task 3.4, you will need to develop more filters that could improve your portfolio return. Once you've finished all the tasks in Module 3, you can work on the Module 3 peer review assignment. Your assignment will be reviewed by your peers, and you will also review the assignments of at least two of your peers. Lastly, I'd like to spend a little time on how you would use your analysis to actually choose open loans to invest in on Lending Club. Let's log into Lending Club as an investor. Then select Manual Investing. Then we can download all currently open loans. Then we can load this dataset into a DataFrame, repeat the data preparation we did in Module 2, choose the columns we used to train the model, and predict on the dataset with our classification model. Then apply whatever filters we came up with to improve the portfolio. We can then go back to Lending Club and select the loans that are in the portfolio determined by our analysis. If you have done a good job in your analysis, you should expect a better return from your loan portfolio than the average investor on Lending Club.
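The 0.85 and 0.15 figures follow from scikit-learn's balanced rule, which weights each class by n_samples / (n_classes * count(class)); rescaling the two weights to sum to one gives the numbers quoted above. The sketch below checks that arithmetic with compute_class_weight and then fits a LogisticRegression with the balanced-equivalent and the more extreme dictionary weights on toy data. The helper predicted_share and the toy data are hypothetical. Keep in mind that class weights scale each sample's loss, so they also shift the balance against the model's L2 penalty.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

classes = np.array([0, 1])
# Labels with the lecture's mix: 85% fully paid (class 1), 15% not (class 0)
y_mix = np.array([1] * 85 + [0] * 15)

# "balanced" weight of a class = n_samples / (n_classes * count(class))
balanced = compute_class_weight(class_weight="balanced", classes=classes, y=y_mix)
# Rescale so the two weights sum to 1: about 0.85 for class 0, 0.15 for class 1
balanced_norm = balanced / balanced.sum()
print(dict(zip(classes.tolist(), balanced_norm.round(2).tolist())))

def predicted_share(class_weight, X, y):
    """Fraction of loans a logistic regression with these weights predicts as repaid."""
    clf = LogisticRegression(class_weight=class_weight, max_iter=1000)
    clf.fit(X, y)
    return clf.predict(X).mean()

# Toy features and labels, mostly repaid (class 1)
rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 2))
y = (rng.uniform(size=1000) < 1 / (1 + np.exp(-(1.7 + X[:, 0])))).astype(int)

for name, cw in [("balanced-equivalent", {0: 0.85, 1: 0.15}),
                 ("extreme", {0: 0.9, 1: 0.1})]:
    print(f"{name}: {predicted_share(cw, X, y):.1%} of loans selected")
```

A heavier weight on class 0 penalizes misclassified defaults more, so you would typically expect the model to select fewer loans; whether that smaller portfolio earns a better annual return is the question task 3.3 asks you to investigate.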