Now that you have an overview of the TFX platform, let's go more in depth on its key concepts. A TFX component is an implementation of a machine learning task in your pipeline. Components are designed to be modular and extensible while incorporating Google's machine learning best practices on tasks such as data partitioning, validation, and transformation. Each step of your TFX pipeline (again, called a component) produces and consumes structured data representations called artifacts. Subsequent components in your workflow may use these artifacts as inputs. In this way, TFX lets you transfer data between components during the continuous execution of your pipeline.

Components are composed of five elements. First, the component specification, or component spec, defines how components communicate with each other. Component specs describe three important details of each component: its input artifacts, its output artifacts, and the runtime parameters that are required during component execution. Components communicate through typed input and output channels. Component specs are implemented as protocol buffers, which are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data, similar to XML or JSON but smaller, faster, and simpler. Second, a component contains a driver class, which coordinates compute job execution, such as reading artifact locations from the ML metadata store, retrieving artifacts from pipeline storage, and launching the primary executor job that transforms artifacts. Third, a component contains an executor class, which implements the actual code to perform a step of your machine learning workflow, such as ingestion or transformation of dataset artifacts. Fourth, a component contains a publisher class, which logs the component's run to ML metadata and writes its output artifacts to storage. Finally, a component interface packages the component specification, driver, executor, and publisher together as a component for use in a pipeline.

Let's look closer at how a TFX component's jobs execute sequentially at runtime to reinforce our understanding of components. First, the driver reads the component specification for runtime parameters and retrieves the required artifacts from the ML metadata store for the component. Second, the executor performs the actual computation on the retrieved input artifacts and generates the output artifacts. Finally, the publisher reads the component specification to log the pipeline component run in ML metadata and writes the component's output artifacts to the artifact store.

TFX pipelines are a sequence of components linked together by a directed acyclic graph (DAG) of the relationships between artifact dependencies. Components communicate through input and output channels. A TFX channel is an abstract concept that connects component data producers and data consumers. Component instances produce artifacts as outputs and typically depend on artifacts produced by upstream component instances as inputs. For example, transformed data is an artifact: it depends on the training data artifact ingested into your pipeline and serves as input to your model during model training. Parameters are inputs to pipelines that are known before your pipeline is executed. Parameters let you change the behavior of a pipeline, or part of a pipeline, through configuration protocol buffers instead of changing your component and pipeline code.
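To make the component anatomy described above concrete, here's a minimal sketch of a custom component in the TFX Python DSL. The HelloComponentSpec, Executor body, and HelloComponent names are hypothetical placeholders; the base classes, parameter types, and artifact types are TFX's.

```python
# Sketch of a custom TFX component's anatomy (names are hypothetical).
from tfx import types
from tfx.dsl.components.base import base_component, base_executor, executor_spec
from tfx.types import standard_artifacts
from tfx.types.component_spec import ChannelParameter, ExecutionParameter


class HelloComponentSpec(types.ComponentSpec):
    """Component spec: input artifacts, output artifacts, runtime parameters."""
    PARAMETERS = {
        # A runtime parameter passed to the executor at execution time.
        'name': ExecutionParameter(type=str),
    }
    INPUTS = {
        # Typed input channel: consumes Examples artifacts from upstream.
        'input_data': ChannelParameter(type=standard_artifacts.Examples),
    }
    OUTPUTS = {
        # Typed output channel: produces Examples artifacts for downstream.
        'output_data': ChannelParameter(type=standard_artifacts.Examples),
    }


class Executor(base_executor.BaseExecutor):
    """Executor: the actual code that performs this step of the workflow."""

    def Do(self, input_dict, output_dict, exec_properties):
        # input_dict / output_dict map the spec's channel names to resolved
        # artifacts; exec_properties carries the PARAMETERS values.
        for artifact in input_dict['input_data']:
            pass  # read from artifact.uri, transform, write to output URIs


class HelloComponent(base_component.BaseComponent):
    """Component interface: packages the spec and executor for pipeline use."""
    SPEC_CLASS = HelloComponentSpec
    EXECUTOR_SPEC = executor_spec.ExecutorClassSpec(Executor)
```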
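Channels are what you actually touch when wiring component instances into a DAG: each instance consumes the typed output channel of an upstream instance. Here's a short sketch using standard TFX components, with a placeholder data path:

```python
# Sketch of wiring standard TFX components together via typed channels.
from tfx.components import CsvExampleGen, SchemaGen, StatisticsGen

example_gen = CsvExampleGen(input_base='/data/train')  # produces Examples artifacts
statistics_gen = StatisticsGen(
    examples=example_gen.outputs['examples'])          # typed input channel
schema_gen = SchemaGen(
    statistics=statistics_gen.outputs['statistics'])   # depends on StatisticsGen
```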
TFX pipeline parameters are a great abstraction that lets you increase your pipeline experimentation velocity by running your pipeline fully or partially with different sets of parameters, such as training steps, data split spans, or tuning trials, without changing your pipeline's code every time. For example, the number of workers is a runtime parameter to your entire pipeline that you can fix. When prototyping your pipeline, you can run different pipelines in parallel with a fixed number of workers that have different components or models, and benchmark their runtime, memory, and performance in order to inform further improvements in your pipeline.

TFX implements a metadata store using the ML Metadata (MLMD) library, an open-source library that standardizes the definition, storage, and querying of metadata for ML pipelines. The ML Metadata library stores the metadata in a relational backend. For notebook prototypes, this can be a local SQLite database, and for production cloud deployments, this could be a managed MySQL or PostgreSQL database. ML Metadata does not store the actual pipeline artifacts. TFX automatically organizes and stores the artifacts on a local file system or a remote cloud storage file system for consistent organization across your machine learning projects.

Orchestrators coordinate pipeline runs, specifically running component executors sequentially based on a directed graph of artifact dependencies. Orchestrators ensure consistency with pipeline execution order, component logging, retries and failure recovery, and intelligent parallelization of component data processing.

TFX pipelines are task-aware, which means that they can be authored in a script or a notebook to run manually by the user as a task. A task can be an entire pipeline run or a partial pipeline run of an individual component and its downstream components. A key innovation of TFX pipelines is that they are both task- and data-aware. Data-aware means TFX pipelines store all the artifacts from every component over many executions, so they can schedule component runs based on whether artifacts have changed from previous runs. The implication is that your pipeline automatically checks whether re-computation of artifacts, such as large data ingestion and transformation tasks, is necessary when scheduling component runs. This can dramatically speed up your model retraining and tuning velocity in a continuous training pipeline, resulting in significant pipeline speed-ups and compute resource efficiencies.

TFX horizontal layers coordinate pipeline components. These are primarily shared libraries of utilities and protobufs for defining abstractions that simplify the development of TFX pipelines across different computing and orchestration environments. For most machine learning cases with TFX, you will only interact with the integrated frontend layer and don't need to engage directly with the orchestrator, shared libraries, and ML metadata unless you need additional customization. At a high level, there are four horizontal layers to be aware of. An integrated frontend enables GUI-based controls over pipeline job management, monitoring, debugging, and visualization of pipeline data, models, and evaluations. Orchestrators come integrated with their own frontends for visualizing pipeline DAGs. Orchestrators run TFX pipelines with shared pipeline configuration code. All orchestrators inherit from the TFX Runner class.
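As a sketch of what a pipeline parameter looks like in code: the TFX DSL has a RuntimeParameter type, supported by orchestrators such as the Kubeflow Pipelines runner. The parameter name and default value below are illustrative, not prescribed:

```python
# Sketch of a runtime parameter for training steps (values illustrative).
from tfx.orchestration import data_types

train_steps = data_types.RuntimeParameter(
    name='train_steps', ptype=int, default=1000)

# The parameter can then be threaded into a component's configuration,
# e.g. a Trainer's train_args, so each pipeline run can override it
# without changing the pipeline code:
#   trainer = Trainer(..., train_args={'num_steps': train_steps})
```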
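And here's a sketch of the two kinds of ML Metadata backends just described, using the ml-metadata library directly; the hostname and credentials are placeholders:

```python
# Sketch of ML Metadata connection configs (placeholder endpoints).
from ml_metadata.metadata_store import metadata_store
from ml_metadata.proto import metadata_store_pb2

# Notebook prototype: a local SQLite file.
local_config = metadata_store_pb2.ConnectionConfig()
local_config.sqlite.filename_uri = '/tmp/metadata.sqlite'
local_config.sqlite.connection_mode = (
    metadata_store_pb2.SqliteMetadataSourceConfig.READWRITE_OPENCREATE)

# Production cloud deployment: a managed MySQL database.
prod_config = metadata_store_pb2.ConnectionConfig()
prod_config.mysql.host = 'mysql.example.internal'  # placeholder
prod_config.mysql.port = 3306
prod_config.mysql.database = 'mlmd'
prod_config.mysql.user = 'tfx'
prod_config.mysql.password = '...'

# Either config backs the same store API. Note that MLMD records metadata
# and artifact locations, not the artifact payloads themselves.
store = metadata_store.MetadataStore(local_config)
artifacts = store.get_artifacts()
```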
TFX orchestrators take the logical pipeline object, which contains pipeline args, components, and a DAG, and are responsible for scheduling the pipeline's components sequentially based on the artifact dependencies defined by the DAG. There are also shared libraries and protobufs that create additional abstractions to control pipeline garbage collection, data representations, and data access controls. An example is the TFXIO library, which defines a common in-memory data representation shared by all TFX libraries and components, and an I/O abstraction layer to produce such representations, based on Apache Arrow. Finally, you have pipeline storage. ML Metadata records pipeline execution metadata and artifact path locations to share across components. You also have pipeline artifact storage, which automatically organizes artifacts on local or remote cloud file systems.
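Here's a sketch of the TFXIO abstraction from the tfx-bsl package: one TFXIO implementation reads TFRecord files of tf.Examples and exposes them as Apache Arrow RecordBatches. The file pattern is a placeholder, and the Beam usage is shown only in comments:

```python
# Sketch of TFXIO: a common Arrow-based in-memory data representation.
from tfx_bsl.tfxio import tf_example_record

tfxio = tf_example_record.TFExampleRecord(
    file_pattern='/data/examples/*.tfrecord.gz')

# Within an Apache Beam pipeline, tfxio.BeamSource() yields Apache Arrow
# RecordBatches that TFX libraries and components consume uniformly:
#   record_batches = beam_pipeline | tfxio.BeamSource(batch_size=1024)
```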
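Finally, to put these pieces together, here's a sketch of constructing the logical pipeline object and handing it to an orchestrator, in this case TFX's local runner. The paths are placeholders, and the components are the ones wired up earlier; note how the pipeline root configures artifact storage and the metadata connection configures MLMD:

```python
# Sketch of the logical pipeline object and an orchestrator run.
from tfx.orchestration import metadata, pipeline
from tfx.orchestration.local.local_dag_runner import LocalDagRunner

my_pipeline = pipeline.Pipeline(
    pipeline_name='my_pipeline',
    pipeline_root='/pipelines/my_pipeline',  # root of pipeline artifact storage
    components=[example_gen, statistics_gen, schema_gen],
    enable_cache=True,  # data-aware: reuse unchanged artifacts across runs
    metadata_connection_config=metadata.sqlite_metadata_connection_config(
        '/pipelines/metadata.sqlite'))

# LocalDagRunner is one TFX runner; it schedules components in DAG order.
LocalDagRunner().run(my_pipeline)
```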