Now that you're familiar with the basics of BigQuery, it's time to talk about how BigQuery organizes your data. BigQuery organizes data tables into units called datasets. These datasets are scoped to your GCP project. When you reference a table from the command line, in SQL queries, or in code, you refer to it using the construct project.dataset.table.

What are some reasons to structure your information into datasets, projects, and tables? These multiple scopes, project, dataset, and table, can help you structure your information logically. You can use multiple datasets to separate tables pertaining to different analytical domains, and you can use project-level scoping to isolate datasets from each other according to your business needs. Also, as we'll discuss later, you can align projects to billing and use datasets for access control. You store data in separate tables based on logical schema considerations.

The project is what the billing is associated with. For example, if you query a table that belongs to the bigquery-public-data project, the storage costs are billed to that project. To run a query, you need to be logged into the GCP console. You'll run the query in your own GCP project, and the query charges are then billed to your project, not the public data project. In order to run a query in a project, you need Cloud IAM permission to submit a job. Remember that running a query means that you must be able to submit a query job to the service.

Access control is through Cloud IAM and is at the dataset level, not at the level of individual tables in the dataset. In order to query data in a table, you need at least read permission on the dataset in which the table lives. Like Cloud Storage, BigQuery datasets can be regional or multi-regional. Regional datasets are replicated across multiple zones in the region; multi-regional means replication across multiple regions. Every table has a schema. You can enter the schema manually through the GCP console or by supplying a JSON file. As with Cloud Storage, BigQuery storage encrypts data at rest and over the wire using Google-managed encryption keys, but it's also possible to use customer-managed encryption keys.

Authentication is through Cloud IAM, so it's possible to use Gmail addresses or G Suite accounts for this task. Access control, as we talked about, is through Cloud IAM roles and involves granting permissions. We discussed two of those: read access and the ability to submit query jobs. However, many other permissions are possible. Remember that permissions are at the dataset level. When you provide access to a dataset, either read or write, you provide access to all of the tables in that dataset.

Logs in BigQuery are immutable and can be exported to Stackdriver. Admin activities and system events are logged; an example of a system event is table expiration. If, when creating a table, you configure it to expire in 30 days, then at the end of 30 days a system event will be generated in the logs. You will also get immutable logs of every access that happens to a dataset under your project.

BigQuery provides predefined roles for controlling access to resources. You can also create Cloud IAM roles consisting of your own defined set of permissions, and then assign those roles to users or groups. You can assign a role to a Google email address or a G Suite group.
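To make the project.dataset.table construct concrete, here is a minimal query sketch against a table in the bigquery-public-data project (the usa_names public dataset is used here purely as an illustration). The query runs as a job in your own project, so the query charges are billed to you, while storage remains billed to the public data project.

  -- Query a public table, referenced with the project.dataset.table construct:
  SELECT name, SUM(number) AS total
  FROM `bigquery-public-data.usa_names.usa_1910_2013`
  GROUP BY name
  ORDER BY total DESC
  LIMIT 5;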
An important aspect of operating a data warehouse is allowing shared but controlled access to the same data for different groups of users. For example, the Finance, HR, and Marketing departments all access the same tables, but their levels of access differ. Traditional data warehousing tools make this possible by enforcing row-level security. You can achieve the same results in BigQuery by defining authorized views and row-level permissions.

Sharing access to datasets is easy. Traditionally, onboarding a new data analyst involved significant lead time: to enable analysts to run even simple queries, you had to show them where the data sources resided, and set up ODBC connections, tools, and access rights. Using GCP, you can greatly accelerate an analyst's time to productivity. To onboard an analyst on GCP, you grant access to the relevant projects, introduce them to the Google Cloud Platform console and the BigQuery web UI, and then share some queries to help get them acquainted with the data. The GCP console provides a centralized view of all assets in your GCP environment; the most relevant assets to data analysts might be Cloud Storage buckets where they can collaborate on files. The BigQuery web UI presents the list of datasets that the analyst has access to. Analysts can perform tasks in the GCP console according to the role you grant them, such as viewing metadata, previewing data, and executing, saving, and sharing queries.

You can only control access at the dataset level: when you provide read access to a dataset to a user, every table in that dataset is readable by that user. But what if you want more fine-grained control? That's when you use views. In this example, we're creating a view in dataset B, and the view is a subset of the table data in dataset A. Now, by providing users with access to dataset B, we're providing an authorized view that exposes only a subset of the original data. Note that you cannot export data from a view, and that dataset B has to be in the same region or multi-region as dataset A.

A view is a SQL query that looks like, and has properties similar to, a table. You can query a view just like you query a table. BigQuery also supports materialized views. These are views that are persisted so that the source table does not need to be queried every time the view is used. BigQuery will keep the materialized view refreshed and up to date with the contents of the source table.
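As a sketch of what this might look like in SQL (the project, dataset, table, and column names here are hypothetical), the first statement creates a view in dataset_b over a table in dataset_a, and the second creates a materialized view that BigQuery keeps refreshed:

  -- Hypothetical names; a view in dataset_b exposing a subset of a table in dataset_a.
  CREATE VIEW `my-project.dataset_b.texas_names_view` AS
  SELECT name, number
  FROM `my-project.dataset_a.names`
  WHERE state = 'TX';

  -- A materialized view whose results BigQuery persists and keeps up to date.
  CREATE MATERIALIZED VIEW `my-project.dataset_a.name_totals` AS
  SELECT name, SUM(number) AS total
  FROM `my-project.dataset_a.names`
  GROUP BY name;

Creating the view by itself is not enough to make it an authorized view; you would also grant the view access to dataset A in that dataset's sharing settings, so that users who can read dataset B can query the view without having direct access to dataset A.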
In the queries we saw earlier, we wrote the query in SQL and hit Run in the UI. What this did was submit a query job to the BigQuery service. The BigQuery query service is separate from the BigQuery storage service; however, they're designed to collaborate and be used together. In this case, we were querying native tables in the bigquery-public-data project. Querying native tables is the most common case and the most performant way to use BigQuery. BigQuery is most efficient when working with data contained in its own storage service; the storage service and the query service work together to internally organize the data to make queries efficient over huge datasets, terabytes and petabytes in size. The query service can also run query jobs on data contained in other locations, such as tables backed by CSV files hosted on Cloud Storage. So you can query data in external tables, or from external sources, without loading it into BigQuery; these are called federated queries. In either case, the query service puts the results into a temporary table, and the user interface pulls and displays the data from that temporary table.

This temporary table is stored for 24 hours, so if you run the exact same query again, and the results would not be different, BigQuery will simply return a pointer to the cached results. Queries that can be served from the cache do not incur any charges. It's also possible to request that the query job write to a destination table; in that case, you get to control when the table is deleted. Because the destination table is permanent and not temporary, you'll get charged for the storage of the results.

To calculate pricing, you can use BigQuery's query validator in combination with the pricing calculator for estimates. The query validator provides an estimate of the size of the data that will be processed during a query; you can plug this into the calculator to find an estimate of how much running the query will cost. This is valid if you're using an on-demand plan, where you pay for each query based on how much data is processed by that query. Your company might have opted for a flat-rate plan instead; in that case, your company pays a fixed price, and the cost is really how many so-called slots your query uses.

You can separate the cost of storage and the cost of queries. By separating projects A and B, it's possible to share the data without giving access to run jobs. In this diagram, users 1 and 2 have access to run jobs and access the datasets in their respective projects. If they run a query, that job is billed to their own project. What if user 2 needs the ability to access the data in project A? The person who owns project A can allow user 2 to query the project A dataset, and the charges will go to project B.

The public dataset project owner granted all authenticated users access to use their data; the special setting "all authenticated users" makes the dataset public. Authenticated users must use BigQuery within their own project and have access to run BigQuery jobs so that they can query the public dataset. The billing for the query goes to their project even though the query is using public or shared data. BigQuery offers one terabyte of querying for free every month, so public datasets are an easy way to try out BigQuery. The BigQuery Data Transfer Service allows you to copy large datasets from different projects to yours in seconds. We'll talk more about the BigQuery Data Transfer Service in the next section on data loading.
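Before we move on, here is a minimal sketch that ties together two ideas from this section: writing query results to a permanent destination table, and configuring a table expiration that will later show up as a system event in the logs. The destination project and dataset names are hypothetical; the source is again the usa_names public dataset.

  -- Hypothetical destination project and dataset, for illustration only.
  CREATE TABLE `my-project.analysis.top_names`
  OPTIONS (
    -- The table expires 30 days after creation; you are billed for its storage until then.
    expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  ) AS
  SELECT name, SUM(number) AS total
  FROM `bigquery-public-data.usa_names.usa_1910_2013`
  GROUP BY name;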