All the work we've done in the past few videos has provided a great example of how visualizations can help you narrow down which statistical models are worth your time. This is not a course in statistics, so I don't wanna get bogged down in the details, but I did want to show you how some of the simple statistical capabilities of Tableau can help you move through your data analysis faster. If you can get a statistic that helps you decide very quickly whether you should take an effect into account, for example, that can be very useful. So the statistic that I'm gonna show you how to do in Tableau is called regression. And to show you what that is, we're gonna change this graph. I don't want the different points to be connected by a line anymore; I'd like you to change it so that you are looking at just individual marks. And do that for each one, please. Okay. So a regression is a process that tries, to the best of its ability, to draw a line that goes through these different dots. It's trying to minimize the amount of error between the line and each one of these dots, okay? And the line is called the best fit line, or a trend line. Tableau can do this regression very quickly, actually. If you right click, you'll see Trend Lines, and you can choose Show Trend Lines. Now, what this is showing here: this dark black line is the best fit line I was telling you about, so it's
the best job that a regression can do to make a line that goes through all of these dots. These outside curvy lines are the confidence bands. In this case it's a 95% confidence interval. And those are basically ways of letting you know how reliable this line is. The way to interpret a 95% confidence interval is that the region between these lines here has a 95% chance of containing the true mean, okay? Now of course, this is a visual way of getting an idea of how confident you should be in this line, but the whole point of using statistics is to get an actual number that can help you decide whether or not this line is something that you want to pay attention to. So Tableau gives you those numbers as well. The fastest way to get them is to hover over the line, and you'll see it actually gives you the equation for the best fit line. I don't wanna go through the details of what the equation means. I have included some resources with this video in case you're interested in learning more. There's certainly a lot more to learn, and regression is very interesting. But for right now, the main thing I want you to pay attention to is the part underneath it that's called the P-value. The P-value tells you how likely it is that you'd see a trend at least this strong just by chance if there were really no trend at all. The smaller the P-value, the harder it is to explain the trend away as chance, and the more you can trust the conclusions that you would draw from looking at this line. Now the convention is that you should only trust trend lines with P-values that are less than either 0.05 or 0.01. When you're using a cutoff of 0.05, that means that when there's really no trend, you'll be fooled into seeing one 5% of the time or less, so there's a false positive rate of 5% or less. If you're using a cutoff of 0.01, that means there is a false positive rate of 1% or less. So that gives you an idea of the things that you should consider important for your analysis. Now, this particular line has a P-value of 0.07.
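If you're curious what a best fit line and a P-value look like outside of Tableau, here is a minimal Python sketch. It makes several assumptions: the data is made up and is not the course data set, the function names `fit_line` and `permutation_p_value` are hypothetical, and the P-value is estimated with a shuffling (permutation) test rather than whatever formula Tableau uses internally, so the numbers will not match Tableau's exactly.

```python
import random

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Estimate how often shuffled data gives a slope at least as steep as the real one."""
    rng = random.Random(seed)
    observed = abs(fit_line(xs, ys)[0])
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        # Shuffling breaks any real link between x and y, so this is "no trend" data.
        rng.shuffle(shuffled)
        if abs(fit_line(xs, shuffled)[0]) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)

# Hypothetical data: year versus an invented wage figure.
years = [2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015]
wages = [100, 104, 103, 109, 111, 110, 116, 118]

slope, intercept = fit_line(years, wages)
p_value = permutation_p_value(years, wages)
print(f"wage = {slope:.3f} * year + {intercept:.1f}, p ~ {p_value:.4f}")
```

The idea behind the shuffle is the same as the idea behind the P-value in the tooltip: if a trend this steep almost never shows up when the link between year and wage is scrambled, chance is a poor explanation for it.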
So one way to decide whether or not this effect is important for your analysis would be to say, well, it's not under 0.05, so I'm not gonna worry about it too much. Maybe I'll put it in the back of my head, but I'm not gonna worry about it too much. So now that you know how to do this, I wanna do this for each one of our job subcategories. And before we do that, I wanna go ahead and get rid of these confidence intervals, because they will make the graphs look pretty busy. So if you right click on the line and go to Edit Trend Lines, you'll see here that Show Confidence Bands is checked. I want you to uncheck that and press OK. And now you see that there's no confidence interval. So let's go ahead and add trend lines to these other graphs. And now we're gonna look at these lines for each individual job title subgroup. So take job title subgroup and bring it over to Color on the Marks card, and do that for each of your graphs. Once you've done that for all three graphs, you can start looking at the data for each subgroup independently. Now as before, this is very busy, but we can use that highlighting trick to look at each one more simply. So let's look at just the data for data scientist. When you first look at this you might be thinking, okay, this line is pretty flat, this one doesn't seem to be changing, and this one definitely seems to be going down. But you should always remember to look at the axis. This axis goes from zero to a little over 100,000. This one goes to over two million. This one goes to about 150. So this line actually goes up way more than it looks like here. So you can either change this axis so that it's the same as it is for this one, or you can hover over the line to see whether or not it is a significant trend. And what you're seeing here is that for the data scientist, indeed, it does look like the maximum paid wage is going up over time.
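To make the point about subgroups and axes concrete, here is a small Python sketch under stated assumptions: the rows are invented for illustration, and `slope_of` is a hypothetical helper. It fits one trend line per subgroup, which mirrors putting the subgroup on Color, and then expresses each slope as a percent of that group's mean wage so that groups plotted on very different axis ranges can be compared.

```python
from collections import defaultdict

def slope_of(points):
    """Least-squares slope for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx

# Hypothetical rows (job title subgroup, year, wage); values are made up.
rows = [
    ("data scientist", 2010, 90000), ("data scientist", 2011, 95000),
    ("data scientist", 2012, 101000), ("data scientist", 2013, 104000),
    ("data analyst", 2010, 60000), ("data analyst", 2011, 60500),
    ("data analyst", 2012, 59800), ("data analyst", 2013, 60200),
]

# Group the points by subgroup so each one gets its own trend line.
by_group = defaultdict(list)
for title, year, wage in rows:
    by_group[title].append((year, wage))

trends = {}
for title, points in by_group.items():
    slope = slope_of(points)
    mean_wage = sum(w for _, w in points) / len(points)
    # A relative slope does not depend on how tall the axis happens to be.
    trends[title] = (slope, 100 * slope / mean_wage)

for title, (slope, pct) in trends.items():
    print(f"{title}: {slope:.0f} per year ({pct:.2f}% of mean wage per year)")
```

This is only one way to put slopes on a common footing. Matching the axes by hand, or reading the P-value from the tooltip, gets at the same question of whether a line that looks steep is really a meaningful trend.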
Even though this line looks like it's steeper than the one above it, it actually is not significant. That's because, again, the axes are not the same across the different graphs. It looks like for data scientist, the maximum paid wage is going up, but the median is not. For software engineer, the maximum paid wage is going up over time, and the minimum paid wage is going down over time. Overall, again, the median paid wage is not changing. For data analyst, you can see that the minimum paid wage is going down over time, whereas the maximum is staying basically the same. Again, the median is not changing. And for business analyst, again, the minimum is going down, and the maximum is trending up, but it's not significant. And the median paid wage is not changing at all. So according to this analysis, it does look like the minimum salaries for software engineers, data analysts, and business analysts are decreasing over time, even though the median paid wages are not changing significantly for any of these categories. The maximum paid wages for data scientists and software engineers, on the other hand, are increasing. So overall, that suggests that there might be an increasing amount of opportunity to be awarded very high salaries in data scientist jobs and software engineering jobs, but the typical salaries for data related jobs seem to be staying quite constant. And for every category other than data scientists, as time goes on, there might actually be increased risk that you will make less than you expected. So that was an interesting result to get from our data. Now there's one more thing I wanna show you about trend lines, even though it's not related to our data question, cuz I think it might help you later. Right now, you saw that it makes a trend line for each subgroup independently. Now you can do this so that it makes a trend line for the entire group as a whole, but still shows the data for each group independently.
And to do that, go to Edit Trend Lines, and you'll see that it says here Allow a trend line per color. If you uncheck that, this black line is a trend line for all of the data aggregated together, even though it's still showing you all the different groups. So this can be something that's useful in the future. For our purposes though, these individual trend lines are more useful for actually analyzing our data and figuring out what they mean. Nice job in this lesson, we actually did quite a lot in this video. First you learned about how to make line graphs. You learned about hierarchies. You learned that dates can be considered either continuous or discrete variables in Tableau. You learned about the effect of treating dates as either continuous or discrete in Tableau. You learned how to make regression, or best fit, lines and how to interpret them. And you learned some interesting insights about salaries of data related jobs. You did a lot of work today.