Next we'll talk about how to load new data into BigQuery. Recall from earlier modules that the method you use to load data depends on how much transformation is needed. EL, or extract and load, is used when data is imported as is, where the source and target have the same schema. ELT, or extract, load, transform, is used when raw data will be loaded directly into the target and transformed there. ETL, or extract, transform, load, is used when transformation occurs in an intermediate service before the data is loaded into the target. You might say that the simplest case is EL: if the data is usable in its original form, there's no need for transformation, just load it.

You can batch load data into BigQuery. In addition to CSV, you can also use data files with delimiters other than commas by using the field delimiter flag. BigQuery supports loading gzip-compressed files as well; however, loading compressed files isn't as fast as loading uncompressed files. For time-sensitive scenarios, or scenarios in which transferring uncompressed files to Cloud Storage is bandwidth- or time-constrained, conduct a quick loading test to see which alternative works best.

Because load jobs are asynchronous, you don't need to maintain a client connection while the job is being executed. More importantly, load jobs don't affect your other BigQuery resources. A load job creates a destination table if one doesn't already exist, and BigQuery determines the data schema as follows. If your data is in Avro format, which is self-describing, BigQuery can determine the schema directly. If the data is in JSON or CSV format, BigQuery can auto-detect the schema, but manual verification is recommended. You can also specify a schema explicitly by passing the schema in as an argument to the load job. Ongoing load jobs can append to the same table using the same procedure as the initial load, but they don't require the schema to be passed with each job. If your CSV files always contain a header row that should be ignored after the initial load and table creation, you can use the skip leading rows flag to ignore that row. For details, see the documentation on bq load flags.

BigQuery sets daily limits on the number and size of load jobs that you can perform per project and per table. In addition, BigQuery sets limits on the sizes of individual load files and records. You can launch load jobs through the BigQuery web UI, and to automate the process you can set up a Cloud Function to listen for a Cloud Storage event associated with new files arriving in a given bucket and launch a BigQuery load job.

BigQuery can import data stored in JSON format as long as it's newline-delimited. It can also import files in Avro, Parquet, and ORC. The most common import is with CSV files, which are the bridge between BigQuery and spreadsheets. BigQuery can also directly import Firestore and Datastore export files.

Another way that BigQuery can import data is through the API. Basically, any place where you can get code to run can theoretically insert data into BigQuery tables. You could use the API from a Compute Engine instance, a container on Kubernetes, App Engine, or from Cloud Functions. However, you'd have to recreate the data processing foundation in these cases. In practice, the API is mainly used from either Cloud Dataproc or Cloud Dataflow.

The Data Transfer Service, or DTS, provides connectors and pre-built BigQuery load jobs that perform the transformations necessary to load report data from various business services directly into BigQuery.
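Before moving on, here's a minimal sketch of the batch-load path described above, expressed with BigQuery's SQL LOAD DATA statement. The dataset, table, and bucket names are hypothetical, and the field_delimiter and skip_leading_rows options here correspond to the bq load flags mentioned earlier.

```sql
-- Minimal sketch: batch load pipe-delimited CSV files from Cloud Storage.
-- Dataset, table, and bucket names are hypothetical.
LOAD DATA INTO mydataset.sales
FROM FILES (
  format = 'CSV',
  uris = ['gs://my-bucket/sales_*.csv'],
  field_delimiter = '|',   -- delimiter other than a comma
  skip_leading_rows = 1    -- ignore the header row
);
```

If mydataset.sales doesn't exist yet, the statement creates it and infers a schema from the files, which mirrors the load-job behavior described above.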
You can transfer files to Cloud Storage in the schema that is native to your existing on-premises data storage, load them into a set of staging tables in BigQuery, and then transform them into the ideal schema for BigQuery using BigQuery SQL commands.

It's a common practice to automate the execution of queries based on a schedule or an event and cache the results for later consumption. You can schedule queries to run on a recurring basis. Scheduled queries must be written in standard SQL, which can include data definition language and data manipulation language statements. The query string and destination table can be parameterized, allowing you to organize query results by date and time.

By maintaining a complete seven-day history of changes against your table, BigQuery allows you to query a point-in-time snapshot of your data. You can easily revert changes without having to request a recovery from backups. This slide shows you how to do a select query against the table as of 24 hours ago. Because this is a select query, you can do more than just restore the table: you can join against some other table or correct the values of individual columns. You can also do this using the BigQuery command line tool, as shown in the second snippet. Here, we're restoring data as of 120 seconds ago.

You can recover a deleted table only if another table with the same ID in the dataset has not been created. In particular, this means you cannot recover a deleted table if it's being streamed into; chances are that the streaming pipeline would have already created an empty table and started pushing rows into it. Be careful using CREATE OR REPLACE TABLE, because this makes the table unrecoverable.

BigQuery is a managed service, so you don't have the overhead of operating, maintaining, or securing the system. A typical data warehouse system requires a lot of code for coordination and interfacing, but you can get BigQuery DTS running without coding. The core of DTS is scheduled, automatic transfers of data from wherever it's located: in your data center, in other clouds, or in SaaS services, you can get all of that data into BigQuery.

Transferring the data is only the first part of building a data warehouse. If you were assembling your own system, you would need to stage the data so that it can be cleaned for data quality, transformed using ELT like we talked about before, and processed and put into its final, stable form.

A common issue with data warehouse systems is late-arriving data. For example, a cash register closes late and does not report its daily receipts during the scheduled transfer period. To complete the data, you would need to detect that not all of the data was received and then request the missing data to fill in the gap. This is called data backfill, and it's one of the automatic processes provided by BigQuery DTS. Backfilling data means adding missing past data to make a dataset complete, with no gaps, and to keep all analytic processes working as expected.

Use the Data Transfer Service for repeated, periodic, scheduled imports of data directly from software-as-a-service systems into tables in BigQuery. The Data Transfer Service provides connectors, transformation templates, and scheduling. The connectors establish secure communications with the source service and collect standard data exports and reports. This information is transformed within BigQuery; the transformations can be quite complicated, resulting in 25 to 60 tables. And the transfer can be scheduled to repeat as frequently as once a day.
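As a concrete illustration of the point-in-time snapshot queries described a moment ago, here's a minimal sketch of the kind of select query shown on the slide; the dataset and table names are hypothetical.

```sql
-- Minimal sketch: query a table as it existed 24 hours ago.
-- Dataset and table names are hypothetical.
SELECT *
FROM `mydataset.orders`
  FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR);
```

Because this is an ordinary select query, you can join the snapshot against other tables or write its results to a new table to restore specific columns.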
The BigQuery Data Transfer Service automates data movement from SaaS applications into BigQuery on a scheduled, managed basis. It can also be used to efficiently move data between regions. Notice that you don't need Cloud Storage buckets; the Data Transfer Service runs BigQuery jobs that transform reports from SaaS sources into BigQuery tables and views. Google offers several connectors, including Campaign Manager, Cloud Storage, Amazon S3, Google Ad Manager, Google Ads, Google Play Transfers, YouTube Channel, YouTube Content Owner, and Teradata migration, plus over a hundred other connectors through partners.

Keep in mind that if your data transformations are simple enough, you may be able to do them with just SQL. BigQuery supports standard DML statements such as INSERT, UPDATE, DELETE, and MERGE. But you should not treat it as an OLTP system: avoid individual updates and inserts, as there are strict limits on the number of update statements you can run each day. BigQuery also supports DDL statements like CREATE OR REPLACE TABLE. In the example on this slide, a CREATE OR REPLACE TABLE statement is used to transform a string of genres into an array. We'll cover arrays in greater detail soon.

A unique feature of BigQuery is that some data can be queried without first importing it into BigQuery tables. For example, it can look in the first sheet of a Google Sheets worksheet, or in a CSV or JSON file. You could use a federated query to import data from a CSV file in Cloud Storage and transform it using SQL, all in one query. However, importing the data into BigQuery will provide much faster performance. Here's a quick demo of how you can query a Cloud SQL database, whether it hosts MySQL, PostgreSQL, or SQL Server, directly from BigQuery using the external connection syntax in the FROM clause.

Lastly, what if your transformations go beyond the functions currently available in BigQuery? Well, you can write your own. BigQuery supports user-defined functions, or UDFs. A UDF enables you to create a function using another SQL expression or an external programming language. JavaScript is currently the only external language supported. We strongly suggest you use standard SQL, though, because BigQuery can optimize the execution of SQL much better than it can for JavaScript.

UDFs allow you to extend the built-in SQL functions. They take a list of values, which can be arrays or structs, and return a single value, which can also be an array or a struct. UDFs written in JavaScript can include external resources such as encryption or other libraries.

Previously, UDFs were temporary functions only; this meant you could only use them for the current query or command-line session. Now we have permanent functions, scripts, and stored procedures in beta, and they might even be generally available by the time you're seeing this. Please check the documentation. When you create a persistent UDF, BigQuery stores it as an object in your database. What this means is that you can share your UDFs with other team members, or even publicly if you want to. The BigQuery team has a public GitHub repository for common user-defined functions at the link you see here.
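To make that concrete, here's a minimal sketch of a persistent SQL UDF and a query that uses it; the dataset, function, table, and column names are all hypothetical.

```sql
-- Minimal sketch of a persistent SQL UDF (all names are hypothetical).
CREATE OR REPLACE FUNCTION mydataset.fahrenheit_to_celsius(temp_f FLOAT64)
RETURNS FLOAT64
AS ((temp_f - 32) * 5 / 9);

-- Using the UDF in an ordinary query.
SELECT
  station_id,
  mydataset.fahrenheit_to_celsius(max_temp_f) AS max_temp_c
FROM mydataset.daily_weather;
```

Because the function is persisted in the dataset, anyone with access to that dataset can call it, which is what makes sharing UDFs with your team possible.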
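The genre example from the slide isn't reproduced in this transcript, but a query in the same spirit might look like the following sketch; the table and column names are hypothetical, and SPLIT is just one way to turn a delimited string into an array.

```sql
-- Hypothetical sketch of the kind of DDL transformation described earlier:
-- turn a pipe-delimited genre string like 'Action|Comedy' into an ARRAY<STRING>.
CREATE OR REPLACE TABLE mydataset.movies AS
SELECT
  title,
  SPLIT(genres, '|') AS genres
FROM mydataset.movies_raw;
```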
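Similarly, the Cloud SQL demo itself isn't shown here, but a federated query of that kind generally takes the shape of this sketch; the connection ID and the inner query are hypothetical.

```sql
-- Minimal sketch of a federated query against a Cloud SQL external connection.
-- The connection ID and the inner query sent to Cloud SQL are hypothetical.
SELECT *
FROM EXTERNAL_QUERY(
  'my-project.us.my_cloudsql_connection',
  'SELECT customer_id, order_total FROM orders;'
);
```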