Next, we'll talk about how to load new data into BigQuery. Recall from an earlier module that the method you use to load data depends on how much transformation is needed. EL, or extract and load, is used when data is imported as is, where the source and target have the same schema. ELT, or extract, load, and transform, is used when raw data will be loaded directly into the target and transformed there. ETL, or extract, transform, and load, is used when transformation occurs in an intermediate service before the data is loaded into the target. You might say that the simplest case is EL. If the data is usable in its original form, there's no need for transformation: just load it.

You can batch load data into BigQuery. In addition to CSV, you can also use data files with delimiters other than commas by using the --field_delimiter flag. BigQuery supports loading gzip-compressed files. However, loading compressed files isn't as fast as loading uncompressed files. For time-sensitive scenarios, or scenarios in which transferring uncompressed files to Cloud Storage is bandwidth- or time-constrained, conduct a quick loading test to see which alternative works best. Because load jobs are asynchronous, you don't need to maintain a client connection while the job is being executed. More importantly, load jobs don't affect your other BigQuery resources.

A load job creates a destination table if one doesn't already exist. BigQuery determines the data schema as follows: if your data is in Avro format, which is self-describing, BigQuery can determine the schema directly. If the data is in JSON or CSV format, BigQuery can auto-detect the schema, but manual verification is recommended. You can also specify a schema explicitly by passing the schema as an argument to the load job. Ongoing load jobs can append to the same table using the same procedure as the initial load, but they don't require the schema to be passed with each job. If your CSV files always contain a header row that should be ignored after the initial load and table creation, you can use the --skip_leading_rows flag to ignore the row. For details, see the documentation on bq load flags.

BigQuery sets daily limits on the number and size of load jobs that you can perform per project and per table. In addition, BigQuery sets limits on the sizes of individual load files and records. You can launch load jobs through the BigQuery web UI. To automate the process, you can set up a Cloud Function to listen for a Cloud Storage event that is associated with new files arriving in a given bucket and launch a BigQuery load job.

BigQuery can import data stored in the JSON file format as long as it is newline-delimited. It can also import files in Avro, Parquet, and ORC formats. The most common import is with CSV files, which are the bridge between BigQuery and spreadsheets. BigQuery can also directly import Firestore and Datastore export files.

Another way that BigQuery can import data is through the API. Basically, any place where you can get code to run can theoretically insert data into BigQuery tables. You could use the API from a Compute Engine instance, a container on Kubernetes, App Engine, or Cloud Functions. However, you would have to recreate the data processing foundation in these cases. In practice, the API is mainly used from either Dataproc or Dataflow.
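To make the batch-loading discussion concrete, here is a minimal sketch of the same kind of load expressed with BigQuery's SQL LOAD DATA statement, which offers options equivalent to the bq load flags mentioned above. The dataset, table, and bucket names are hypothetical, and the exact option names are an assumption based on the documented CSV load options.

```sql
-- Minimal sketch: batch-load pipe-delimited CSV files from Cloud Storage.
-- mydataset.sales_raw and gs://my-bucket are hypothetical names.
LOAD DATA INTO mydataset.sales_raw
FROM FILES (
  format = 'CSV',
  uris = ['gs://my-bucket/sales/*.csv'],
  field_delimiter = '|',   -- delimiter other than a comma
  skip_leading_rows = 1    -- ignore the header row in each file
);
```

The same load can also be launched from the web UI, the bq command-line tool, or the API, as described above.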
The BigQuery Data Transfer Service provides connectors and prebuilt BigQuery load jobs that perform the transformations necessary to load report data from various services directly into BigQuery. Cloud Storage can be useful in the EL process: you can transfer files to Cloud Storage in the schema that is native to the existing on-premises data storage, and then load those files into BigQuery.

It is a common practice to automate the execution of queries based on a schedule or event and cache the results for later consumption. You can schedule queries to run on a recurring basis. Scheduled queries must be written in standard SQL, which can include data definition language (DDL) and data manipulation language (DML) statements. The query string and destination table can be parameterized, allowing you to organize query results by date and time.

By maintaining a complete seven-day history of changes against your tables, BigQuery allows you to query a point-in-time snapshot of your data. You can easily revert changes without having to request a recovery from backups. This slide shows how to use a SELECT query to query the table as of 24 hours ago. Because this is a SELECT query, you can do more than just restore a table: you can join against some other table or correct the values of individual columns. You can also do this using the BigQuery command-line tool. As shown in the second snippet here, we're restoring data as of 120 seconds ago. You can recover a deleted table only if another table with the same ID in the dataset has not been created. In particular, this means you cannot recover a deleted table if it is being streamed to; chances are that the streaming pipeline would have already created an empty table and started pushing rows into it. Also, be careful using CREATE OR REPLACE TABLE, because this makes the table irrecoverable.

BigQuery is a managed service, so you don't have the overhead of operating, maintaining, or securing the system. A typical data warehouse system requires a lot of code for coordination and interfacing. You can get the BigQuery Data Transfer Service running without coding. The core of the BigQuery Data Transfer Service is scheduled, automatic transfers of data from wherever it is located, whether in your data center, on other clouds, or in SaaS services, into BigQuery. Transferring the data is only the first part of building a data warehouse. If you were assembling your own system, you would need to stage the data so that it could be cleaned for data quality, transformed (ELT, or extract, load, and transform), and processed into its final and stable form.

A common issue with data warehouse systems is late-arriving data. For example, a cash register closes late and does not report its daily receipts during the scheduled transfer period. To complete the data, you would need to detect that not all of the data was received and then request the missing data to fill in the gap. This is called data backfill, and it is one of the automatic processes provided by the BigQuery Data Transfer Service. Backfilling data means adding missing past data to make a dataset complete, with no gaps, and to keep all analytic processes working as expected.

Use the Data Transfer Service for repeated, periodic, scheduled imports of data directly from software-as-a-service systems into tables in BigQuery. The BigQuery Data Transfer Service provides connectors, transformation templates, and the scheduling. The connectors establish secure communications with the source service and collect standard data exports and reports.
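Returning to the point-in-time queries described earlier in this section, here is a minimal SQL sketch of time travel and a table restore; the dataset and table names are hypothetical.

```sql
-- Query the table as it existed 24 hours ago.
SELECT *
FROM mydataset.orders
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR);

-- Because it is just a SELECT, you can also materialize the snapshot into a
-- new table, for example to recover from a bad update made two minutes ago.
CREATE TABLE mydataset.orders_restored AS
SELECT *
FROM mydataset.orders
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 2 MINUTE);
```

Because these are ordinary queries, you can also join the snapshot against another table or restore only specific columns, as noted above.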
The information collected by these connectors is transformed within BigQuery. The transformations can be quite complicated, resulting in anywhere from 25 to 60 tables, and the transfer can be scheduled to repeat as frequently as once a day. The BigQuery Data Transfer Service can also be used to efficiently move data between regions. Notice that you don't need Cloud Storage buckets. The BigQuery Data Transfer Service runs BigQuery jobs that transform reports from SaaS sources into BigQuery tables and views. Google offers several connectors, including Campaign Manager, Cloud Storage, Amazon S3, Google Ad Manager, Google Ads, Google Play transfers, YouTube Channel, YouTube Content Owner, and Teradata migration, plus over 100 other connectors through partners.

Keep in mind that if your data transformations are simple enough, you may be able to do them with just SQL. If Cloud Storage is part of your workflow, you can load files from Cloud Storage into staging tables in BigQuery first, and then transform the data into the ideal schema for BigQuery by using BigQuery SQL commands. BigQuery supports standard DML statements such as INSERT, UPDATE, DELETE, and MERGE. There are no limits on DML statements. However, you should not treat BigQuery as an OLTP system. The underlying infrastructure is not structured to perform optimally for OLTP workloads; there are other, more appropriate products on Google Cloud for such workloads. BigQuery also supports DDL statements like CREATE OR REPLACE TABLE. In the example on this slide, the CREATE OR REPLACE TABLE statement is used to transform a string of genres into an array. We'll cover arrays in greater detail later in the course.

Lastly, what if your transformations go beyond the functions currently available in BigQuery? Well, you can create your own. BigQuery supports user-defined functions, or UDFs. A UDF enables you to create a function using another SQL expression or an external programming language. JavaScript is currently the only external language supported. We strongly suggest you use standard SQL, though, because BigQuery can optimize the execution of SQL much better than it can for JavaScript. UDFs allow you to extend the built-in SQL functions. UDFs take a list of values, which can be arrays or structs, and return a single value, which can also be an array or struct. UDFs written in JavaScript can include external resources, such as encryption or other libraries.

Previously, UDFs were temporary functions only. This meant you could only use them for the current query or command-line session. Now we have permanent functions, scripts, and procedures in beta, but they might even be generally available by the time you're seeing this. Please check the documentation. When you create a UDF, BigQuery persists it and stores it as an object in your database. What this means is that you can share your UDFs with other team members, or even publicly if you wanted to. The BigQuery team has a public GitHub repo for common user-defined functions at the link you see here.
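To make the UDF discussion concrete, here is a minimal sketch of a persistent SQL UDF and a JavaScript UDF; the dataset name, function names, and the genre-splitting logic are hypothetical illustrations, not the example from the slide.

```sql
-- Persistent SQL UDF: stored as an object in the dataset, so it can be
-- shared with other team members.
CREATE OR REPLACE FUNCTION mydataset.split_genres(genres STRING)
RETURNS ARRAY<STRING> AS (
  ARRAY(SELECT TRIM(g) FROM UNNEST(SPLIT(genres, '|')) AS g)
);

-- JavaScript UDF: supported, but generally slower than pure SQL because
-- BigQuery can optimize SQL execution far better than JavaScript.
CREATE OR REPLACE FUNCTION mydataset.multiply(x FLOAT64, y FLOAT64)
RETURNS FLOAT64
LANGUAGE js AS """
  return x * y;
""";

-- Using the SQL UDF:
SELECT mydataset.split_genres('Drama| Comedy |Sci-Fi') AS genres;
```

If you only need a function for the current query or script, a temporary version (CREATE TEMP FUNCTION) is still available, matching the earlier, temporary-only behavior described above.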