Welcome to the established toolchain lesson of the DataOps methodology. A DataOps toolchain is what we use to get real value from all of the automation available to us and to provide business-ready data. This lesson looks at the automation aspects of DataOps. It focuses on how intelligent, iterative processes can be chained together to allow multiple iterations while continuously monitoring and measuring progress. The toolchain addresses the challenges of manual, redundant processes and of communication gaps, and it works with the technology of your choice. A smart DataOps toolchain can help deliver business value quickly and repeatedly to stakeholders and data consumers.

A DataOps toolchain can help to overcome the following common roadblocks. Teams spend more time identifying data pipeline and code inconsistency issues, caused by older code, incorrect connection or metadata information, or infrastructure and operations challenges, and resolving technical dependencies across stakeholders, than they spend focusing on data delivery. Manual processes lead to long response times, frequent errors, inconsistent data, and poor repeatability, which makes it hard to support multiple teams continuously. Siloed processes, stemming from on-demand economies, lead to unstable data or unpredictable results.

What is a DataOps toolchain? A DataOps toolchain is the key ingredient that links people and processes with technology and automates redundant tasks. It iterates through the whole process repeatedly and flags problems anywhere in the pipeline, either addressing those problems itself or triggering an exception to be reviewed by the appropriate person. It uses and updates a number of KPIs that measure progress toward the overall objective, publishing the current status, detecting any deviation from the target KPIs, and enabling historical analysis; a small sketch of this kind of monitoring appears at the end of this lesson. A DataOps toolchain can be put in place to ensure the reliable delivery of business-ready data using a trusted, repeatable process.

A DataOps toolchain can be broken down into five steps: leveraging existing analytics tools, along with toolchain components that address source control management, process management, and efficient communication among groups, to deliver a reliable data pipeline.

First of all, we need to use source control management. After all, a data pipeline is nothing but source code responsible for converting raw content into useful information. We can automate the data pipeline end to end, producing source code that can be consumed in multiple projects. A revision control tool, such as GitHub, helps us store and manage all changes to code and configuration, minimizing inconsistent deployments.

Secondly, we can automate DataOps processes and workflows. For the DataOps methodology to be successful, automation is key, and it requires a data pipeline designed with runtime flexibility. The key requirements to achieve this are automated data curation services, metadata management, data governance, master data management, and self-service interaction.

Thirdly, we need to add data and logic tests. To be certain that the data pipeline is functioning properly, we need to test its inputs and outputs and apply business logic. At each stage, the data and the logic are tested for accuracy and potential deviation, and errors or warnings are raised before results are released, ensuring consistent data quality; a sketch of such a check appears at the end of this lesson.

Fourthly, we need to work without fear of disruption to the current data pipelines.
Data analytics professionals dread the prospect of deploying changes that break the current data pipeline. This fear can be addressed with two key workflows that are later integrated into production. First, the value pipeline delivers continuous value to the organization. Second, the innovation pipeline takes the form of new analytics under development, which are later added to the production pipeline.

Lastly, we need to implement communication and process management, because efficient, automated notifications are critical within a DataOps practice. When changes are made to any source code, or when the data pipeline is triggered, fails, completes, or is deployed, the right stakeholders can be notified immediately; a sketch of such a notification appears below. Tools such as Slack and Trello enable cross-stakeholder communication, and they are a key part of the toolchain.
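To make the monitoring idea concrete, here is a minimal sketch in Python of a pipeline runner that times each stage, records simple KPIs, and flags a failed stage for review. The stage names (extract, transform, load), the order_id and amount fields, and the KPI names are illustrative assumptions, not part of any specific product.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("dataops-toolchain")


def timed(kpis, name, func, *args):
    """Run one pipeline stage, record duration and status KPIs, and flag failures."""
    started = time.monotonic()
    try:
        result = func(*args)
    except Exception:
        kpis[f"{name}_status"] = "failed"
        # Flag the problem so the appropriate person can review it, then re-raise.
        log.exception("Stage '%s' failed and was flagged for review", name)
        raise
    kpis[f"{name}_seconds"] = round(time.monotonic() - started, 3)
    kpis[f"{name}_status"] = "ok"
    return result


def extract():
    # Placeholder: pull raw records from a source system.
    return [{"order_id": 1, "amount": 120.0}, {"order_id": 2, "amount": -5.0}]


def transform(rows):
    # Placeholder business rule: keep only non-negative amounts.
    return [r for r in rows if r["amount"] >= 0]


def load(rows):
    # Placeholder: write curated rows to the target store and report how many.
    log.info("Loaded %d rows", len(rows))
    return len(rows)


def run_pipeline():
    kpis = {}
    raw = timed(kpis, "extract", extract)
    curated = timed(kpis, "transform", transform, raw)
    kpis["rows_delivered"] = timed(kpis, "load", load, curated)
    return kpis


if __name__ == "__main__":
    print(run_pipeline())
```

The KPI dictionary here simply stands in for whatever metrics store or dashboard your toolchain publishes status to.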
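For the data and logic tests in the third step, the following sketch shows one way such checks might look, again assuming the same illustrative order_id and amount fields; a real pipeline would typically use a dedicated data quality or testing framework.

```python
def check_stage_output(rows, min_rows=1):
    """Return (errors, warnings) for one stage's output before it is released downstream.

    The thresholds, field names, and rules here are illustrative assumptions.
    """
    errors, warnings = [], []

    # Input/output test: the stage must actually produce data.
    if len(rows) < min_rows:
        errors.append(f"expected at least {min_rows} rows, got {len(rows)}")
        return errors, warnings

    # Completeness test: required fields must be present and non-null.
    missing = sum(1 for r in rows if r.get("order_id") is None or r.get("amount") is None)
    if missing:
        errors.append(f"{missing} of {len(rows)} rows are missing required fields")

    # Business-logic test: order amounts should never be negative.
    negative = [r["order_id"] for r in rows
                if isinstance(r.get("amount"), (int, float)) and r["amount"] < 0]
    if negative:
        warnings.append(f"negative amounts found for orders {negative}")

    return errors, warnings


errors, warnings = check_stage_output(
    [{"order_id": 1, "amount": 120.0}, {"order_id": 2, "amount": -5.0}]
)
if errors:
    raise ValueError("data quality errors: " + "; ".join(errors))
for message in warnings:
    print("warning:", message)
```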
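Finally, for the communication step, this is a minimal sketch of posting pipeline events to a Slack channel through an incoming webhook using the requests library; the webhook URL, pipeline name, and message wording are assumptions you would replace with your own.

```python
import requests

# Hypothetical incoming-webhook URL; create a real one in your Slack workspace and keep it secret.
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"


def notify(event, pipeline="orders-pipeline", detail=""):
    """Post a short pipeline status message to a Slack channel via an incoming webhook."""
    text = f"[{pipeline}] {event}" + (f": {detail}" if detail else "")
    response = requests.post(SLACK_WEBHOOK_URL, json={"text": text}, timeout=10)
    response.raise_for_status()


# Example usage from the pipeline runner:
# notify("started")
# notify("failed", detail="transform stage raised ValueError")
# notify("completed", detail="rows_delivered=1")
```

The same pattern applies to other channels: Trello exposes a comparable REST API, and most CI/CD tools can call a webhook like this automatically when code is committed, deployed, or a run fails.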