Hello there! My name is Omar Ismail, and I am a Solutions Engineer at Google Cloud. In this module, we talk about the different IAM roles, quotas, and permissions required to run Dataflow, Google's batch and streaming analytics service based on Apache Beam. In this video, we learn how IAM provides access to the different Dataflow resources.

You have your Beam code, and now you want to run it on Dataflow. Let us look at what happens when your Beam code is submitted. When the pipeline is submitted, it is sent to two places: the SDK uploads your code to Cloud Storage and sends it to the Dataflow service. The Dataflow service then does a few things: it validates and optimizes the pipeline, creates the Compute Engine virtual machines in your project to run your code, deploys the code to the VMs, and starts to gather monitoring information for display. When all that is done, the VMs start running your code.

At each of the stages we mentioned (user submission of the code, Dataflow validating the pipeline, and the VMs running the job), IAM plays a role in determining whether to continue the process. We will briefly explain how IAM comes into play at each stage. Three credentials determine whether a Dataflow job can be launched.

The first credential that is checked is the user role. When you submit code, whether you are allowed to submit it is determined by the IAM roles assigned to your account. On Google Cloud, your account is represented by your email address. For example, when I submit a Dataflow job, it is done via omar@my-successful-company.com. Three user roles can be assigned to each user or group. Each role is made up of a set of permissions that determine how much access the user or group has to the different Dataflow resources.

The first role you can assign to a user or group is the Dataflow Viewer role. If you want a user or group to be able to only view Dataflow jobs, assign them the Dataflow Viewer role. This role prevents submitting, updating, and canceling jobs; it allows users who have the role to only view Dataflow jobs, either in the UI or by using the command-line interface.

The next role you can assign to a user or group is the Dataflow Developer role. This role is ideal for a person who is responsible for managing pipelines that are running. For a job to run on Dataflow, the user must be able to submit the job to Dataflow, stage files to Cloud Storage, and view the available Compute Engine quota. If a user only has the Dataflow Developer role, they can view and cancel jobs that are currently running, but they cannot create jobs, because the role does not have the permissions to stage files and view the Compute Engine quota. You can use the Dataflow Developer role as a building block to compose custom roles. For example, if you also want to be able to create pipelines, you can create a role that has the permissions from the Dataflow Developer role, plus the permissions required to stage files to a bucket and to view the Compute Engine quota.

The last role you can assign to a user or group is the Dataflow Admin role. Use this role to provide a user or group with the minimum set of permissions that allow both creating and managing Dataflow jobs. The Dataflow Admin role allows a user or group to interact with Dataflow, stage files in an existing Cloud Storage bucket, and view the Compute Engine quota.
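To make the submission flow and these user-role requirements concrete, here is a minimal sketch of launching a Beam pipeline on Dataflow from Python. It is illustrative only: the project ID, region, and bucket name are hypothetical placeholders, and the account launching it would need the Dataflow Admin role, or the Dataflow Developer role plus the staging and quota permissions described above.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Hypothetical project, region, and staging bucket.
options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=my-successful-company",
    "--region=us-central1",
    "--temp_location=gs://my-staging-bucket/temp",  # where the SDK stages files
])

# The SDK uploads the code to Cloud Storage and submits the job to the
# Dataflow service, which validates and optimizes it, creates the worker
# VMs, deploys the code, and starts gathering monitoring information.
with beam.Pipeline(options=options) as p:
    (p
     | "Create" >> beam.Create(["hello", "dataflow"])
     | "Print" >> beam.Map(print))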
The second credential Dataflow uses is the Dataflow service account. Dataflow uses the Dataflow service account to manage the interaction between your project and Dataflow; for example, to check project quota, to create worker instances on your behalf, and to manage the job during job execution. When you run your pipeline on Dataflow, it uses this service account: service-<project-number>@dataflow-service-producer-prod.iam.gserviceaccount.com. This account is created automatically when the Dataflow API is enabled for your project. It is assigned the Dataflow Service Agent role and has the necessary permissions to run a Dataflow job in your project. In our job overview diagram, the Dataflow service account is responsible for the interaction happening here, between your project and Dataflow.

The last credential used to run Dataflow jobs is the controller service account. The controller service account is assigned to the Compute Engine VMs that run your Dataflow pipeline. By default, workers use your project's Compute Engine default service account as the controller service account. This service account, <project-number>-compute@developer.gserviceaccount.com, is automatically created when you enable the Compute Engine API for your project from the APIs page in the Google Cloud Console.

The Compute Engine default service account has broad access to your project's resources, which makes it easy to get started with Dataflow. However, for production workloads, we recommend that you create a new service account with only the roles and permissions that you need. At a minimum, your service account must have the Dataflow Worker role, and you can use it by adding the --service_account_email flag when launching a Dataflow pipeline (a sketch follows below). When using your own service account, you might also need to add additional roles to access different Google Cloud resources. For example, if your job reads from BigQuery, your service account must also have a role like the bigquery.dataViewer role. In our job overview diagram, where would the controller service account be?
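As a hedged illustration of the --service_account_email flag mentioned above, here is a minimal sketch of launching the same kind of pipeline with a user-managed controller service account that reads from BigQuery. All names here (project, bucket, service account, and table) are hypothetical placeholders, and the exact roles you need depend on the resources your job touches.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=my-successful-company",
    "--region=us-central1",
    "--temp_location=gs://my-staging-bucket/temp",
    # Workers run as this account instead of the Compute Engine default.
    # It needs at least the Dataflow Worker role (roles/dataflow.worker),
    # plus roles for the data it touches, e.g. roles/bigquery.dataViewer
    # for the read below.
    "--service_account_email=my-worker-sa@my-successful-company.iam.gserviceaccount.com",
])

with beam.Pipeline(options=options) as p:
    (p
     | "ReadRows" >> beam.io.ReadFromBigQuery(
           table="my-successful-company:sales.orders")  # hypothetical table
     | "PrintRows" >> beam.Map(print))

Using a dedicated, minimally scoped controller service account like this keeps worker permissions narrow, which is why it is recommended over the broad Compute Engine default for production workloads.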