[MUSIC] In previous classes you learned about the machine learning process life cycle, or MLPL. We've also emphasized the importance of good, clear problem definition. In this video we'll go into detail about the first stage of the MLPL, Business Understanding and Problem Discovery, or BUPD. As we walk you through all of the stages, keep in mind that the viewpoint is that of a machine learning team working with an industry-specific client. But the client can be anyone, including yourself. It's the person or group at the core of the problem being solved. The structure for internal projects requiring cross-functional team members is the same. Let's dive into Business Understanding and Problem Discovery, or BUPD. The BUPD stage is where we develop a specific understanding of the business processes and where we try to concretely define the problem we'll be working on. There's only high-level focus on data in this stage. Among the four stages of the MLPL, this stage is primarily about strategically aligning business processes with the execution of the project. Correctly understanding the business functions can save you a lot of time down the road. Taking your time in this stage is a good idea so that you don't jump into a machine learning solution that builds on incorrect assumptions and does not align with the business problems you care about. We break up the BUPD phase into several parts. You can think of these as a checklist in no particular order. Let's start with objectives. This is where you want to identify business objectives that machine learning techniques can address. Most commonly, you work with the business to gain an understanding of the processes and to see where machine learning can help improve an existing business process. In this case, something's being done already and you want to improve it in some way using machine learning.
Another option is that the business thinks there may be some new innovation or service they want to provide that somehow uses machine learning. Once you and the business are aligned on objectives, you need to consider the stakeholders and sort out how everyone will communicate and how often. When it comes to stakeholders, you need to identify both the internal and external stakeholders and exactly what their roles will be. For communication, you need to decide how everyone will collaborate on the project. This may look different from business to business. We'll go into more detail about defining development environments and collaborative environments later in the video. You also need to agree on how often to meet. As a rule of thumb, more often is better, especially at the start, because you don't want to spend a lot of time working on wrong assumptions. It might be a good idea to meet every week. If you meet every two weeks or monthly, you're going to spend a lot of time offline, potentially assuming something that turns out to be wrong. It's better to meet more frequently and clarify any assumptions that you have. It's also better to get the information directly from the client, if you can, because they can directly answer the questions you have. This leads us to existing practices, where you identify what business processes or practices are already in place. When it comes to exploring the existing practices, you first need to make sure that you and the business are aligned in your terminology. The business won't understand your machine learning jargon, and you won't understand the domain-specific jargon. It's important to clarify a common vocabulary. To achieve this common language, it's typical for the machine learning scientists to adapt to the business domain, instead of the business domain adapting to the machine learning. You'll notice there's an emphasis on domain knowledge in the BUPD phase.
Let's look at an example where the business is a medical laboratory that manually processes a lot of chest X-rays. They want to think about how to somehow improve or automate some aspect of this process. The medical lab decides they want a model that can look at chest X-rays and give a probability risk score for several diseases. This is the bare, high-level initial problem that they want to tackle. In order to tackle such a problem, the machine learning scientist, with the help of the client and domain experts, has to understand exactly what the client wants as a final outcome, and then understand all the individual processing steps that the business currently goes through, starting from the moment the X-ray is created. In particular, the machine learning scientist should think about how and why distinct X-ray images might differ. Even for the same patient, different X-ray machines might produce slightly different images. Different doctors may have differing opinions about certain disease markers. It's important that you keep such differences in mind and account for them if need be. As a machine learning scientist, you should also try to understand how having a particular disease would show up in the X-ray. You're a machine learning scientist though, not a medical doctor. You might not have a clue about the medical space. That's okay. You can and should rely on the domain experts to help you explore such details. Talk to them to get an understanding of which phenomena correspond to which specific diseases. As a human, can you see the X-ray image and distinguish the markers for a particular disease? Of course, this is a specific example from the medical field. But the point is this: domain knowledge is vital to a successful machine learning project. And it's good to get an overview of the actual processes in place before machine learning is introduced.
Along with understanding existing practices, you'll also want to work with the business to clearly define what milestones you want to reach. What are the timelines for these milestones, and what exactly will the deliverables look like? Keeping an eye on the overall final outcome of the project, you want to work with the business to define what deliverables you want out of the BUPD stage of the machine learning process life cycle. What concrete things will come out of this stage? Possibly a report describing all the procedures that are involved in the business process, as well as a report describing the machine learning version of the problem that's going to be solved. After all, there's a business problem and an equivalent machine learning solution. You need to outline what both of these are and then clearly define what success looks like. Remember, there are two problems here. So, you need to define success criteria for both the business and your machine learning model. Let's go back to our X-ray example from the medical domain. We want to build a model that takes in an X-ray image and gives a risk score for several particular diseases. For the machine learning success criteria, you may decide to measure the multi-class accuracy of your model, to get a sense of how well the model predicts certain diseases. The business success criteria, however, will look very different. They should be defined by the business itself, in this case the medical laboratory. They may choose to somehow measure whether the use of the model improved their services. Maybe it saved time for doctors. Maybe it saved money, or perhaps there was a revenue gain because the lab is now able to cater to more patients. The business should outline what kind of key performance indicators are relevant to them. And it's up to the business to define how they measure the success of the overall project in their own context.
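To make the machine learning side of the success criteria concrete, here is a minimal sketch of how multi-class accuracy could be computed for the X-ray example. The label names and the predictions are made up for illustration, and the helper functions `multiclass_accuracy` and `per_class_recall` are hypothetical names, not part of any particular library. The per-class breakdown is an extra assumption on top of what the video describes; it is included only because a single overall number can hide poor performance on a rare disease.

```python
from collections import Counter


def multiclass_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one sample")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_recall(y_true, y_pred):
    """For each true label, the fraction of its samples predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical labels for six chest X-rays (illustrative only, not real data).
y_true = ["normal", "pneumonia", "normal", "effusion", "pneumonia", "normal"]
y_pred = ["normal", "pneumonia", "pneumonia", "effusion", "normal", "normal"]

print(f"accuracy: {multiclass_accuracy(y_true, y_pred):.2f}")
for label, recall in per_class_recall(y_true, y_pred).items():
    print(f"recall for {label}: {recall:.2f}")
```

A number like this only covers the machine learning criterion. Whether it translates into saved time or money is still for the business to measure with its own key performance indicators.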
The final outcome of the BUPD stage should be a clear understanding of the machine learning problem you're going to tackle. So, make sure any milestones you use help with this understanding. What concrete deliverables will come out of each stage of the machine learning process life cycle? There are many factors that will put constraints on what you, as the machine learning scientist, are able to do. External constraints that are out of your control will have a big impact on how you implement your machine learning solution. First, there are constraints imposed by the specific domain you're working in. You should discuss such business constraints when you're learning about the business processes with the domain experts. But there are also constraints on the machine learning solution side. For example, maybe your model needs to be explainable. Or you can't use certain types of data because of privacy issues. Depending on the problem domain, you need to think about any data constraints that may arise. Can I host the data on US servers only, or only on Canadian servers? Maybe the data can only be stored in Alberta. At this point you need to examine privacy issues around the data. And when thinking about such constraints, you might also want to highlight any data sources that will be used for the project. Don't get bogged down in details, but do look broadly at what's available and what kind of permissions might be needed. You may also need to limit yourself to certain types of models because there's limited hardware capacity. Say you can't use cloud networks, or you have to limit yourself around other hardware constraints. At this point, you may want to define both the development environment and the collaborative environment that you're going to use. Is the business using a particular tool they want the machine learning scientists to use as well? Is the business open to using other programming languages and collaborative tools?
Can GitHub or another versioning system act as the core repository and be used to track all the information? There will be a lot of content generated throughout the process and you need to properly capture all of it. You must agree with the business on how best this can be done. Last, but certainly not least, let's talk about problem definition. You start with a business problem and you want to provide a machine learning solution which solves some machine learning problem. This, the machine learning problem, is the part that you can solve. The business needs to agree that solving this machine learning problem will be relevant to their original business problem. There's a translation happening here: if you solve the wrong machine learning problem, it's not going to address the relevant business problem. This is why it's so important to try to understand the domain, and why you need to clearly define success criteria both for the machine learning solution itself and for whether that solution actually solves the business problem you care about. In this video we looked closer at the Business Understanding and Problem Discovery phase of the machine learning process life cycle. BUPD is a communication and discussion phase. It's rare that you'll have hands-on work with data at this point. You want to work with domain experts to unravel information about the problem and how it might be tackled with machine learning. You may start looking into some machine learning literature to see if similar problems have already been worked on. This literature review can help you place the business problem in an academic setting. By the end of this phase you want to be able to answer the question: what is the machine learning solution for this business problem? The key to answering this question is communication.