The second SAS Programming course focused primarily on the data preparation phase of the SAS programming process. You learned much more about using the data step. As you now know, it's important to understand what's happening behind the scenes in order to use the data step accurately and efficiently. Not only is this critical information for the certification exam, but it also makes your life much easier as you prepare real and often messy data. Before we get into the specific syntax you learned, let's take a few minutes to look at a data step we wrote earlier to clean and prepare data. But this time, let's carefully examine each statement and understand the process that SAS goes through to prepare and execute the code.
This data step reads the cr.orders table. You might recall that the orders table had a few issues, and we were called on to fix those problems. Specifically, we included only the rows where delivery date is valid, meaning it occurred on or after order date. We converted the values in customer country to uppercase, and the negative values of quantity were replaced with a missing value. After cleaning up these data issues, we calculated new columns including profit, ShipDays, and age range. We also created order source conditionally based on the values of order type. Now, let's peek behind the curtain of this data step and see what SAS does with each statement as the code is compiled and executed.
First, let's review the compilation phase. During this phase, SAS checks for syntax errors and prepares the code for execution. The key part to remember about the compilation phase is that SAS builds the Program Data Vector, or PDV, which processes one row of data at a time. In the PDV, each column's attributes are defined, including the required name, type, and length. Other compile-time statements establish rules for how the PDV processes the data when it gets to the execution phase.
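For reference while following the walkthrough, here is a rough sketch of what the data step described above might look like. It is reconstructed from the narration only, so treat it as an approximation: the underscored column names, the work library, the price columns, the Profit and Age_Range formulas, the format width, the Order_Source values other than Retail, and the dropped columns are all assumptions.

```sas
/* Sketch only: reconstructed from the narration, not the course's program. */
data work.profit;                          /* output table; work library is assumed */
    set cr.orders;                         /* input columns and attributes go into the PDV */
    length Order_Source $ 8;               /* compile-time: character, length 8 */
    where Delivery_Date >= Order_Date;     /* keep rows with a valid delivery date */

    Customer_Country = upcase(Customer_Country);
    if Quantity < 0 then Quantity = .;     /* negative quantity becomes missing */

    /* Hypothetical formulas: Unit_Price, Unit_Cost, Max_Age, and Min_Age
       are invented column names used only for illustration. */
    Profit    = (Unit_Price - Unit_Cost) * Quantity;
    ShipDays  = Delivery_Date - Order_Date;
    Age_Range = Max_Age - Min_Age;

    format Profit dollar12.2;              /* a currency format; width is assumed */

    if Order_Type = 1 then Order_Source = "Retail";
    else if Order_Type = 2 then Order_Source = "Phone";      /* assumed value */
    else if Order_Type = 3 then Order_Source = "Internet";   /* assumed value */
    else Order_Source = "Unknown";                           /* assumed value */

    drop Unit_Price Unit_Cost;             /* dropped columns are assumed */
run;
```

The statements appear in roughly the order the narration mentions them, with compile-time statements (length, where, format, drop) mixed in among the executable ones.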
First, the data statement defines the name and type of the output table. Then, the set statement adds all columns from the cr.orders table and their attributes to the PDV. The length statement is a compile-time statement that adds a column named order source to the PDV and sets the type to character with a length of eight. The where statement is also a compile-time statement that flags the columns delivery date and order date so that later, in execution, only rows that meet the where condition are read into the PDV. As SAS continues to compile the step, the new numeric columns profit, ShipDays, and age range are all added to the PDV, and the format statement assigns a currency format to the attributes of the profit column. Finally, the drop statement flags those columns to drop in the execution phase when rows are output from the PDV to the profit table. At the end of the compilation phase, the descriptor portion for the profit table is created.
With the PDV built, SAS then moves into the execution phase. This is when data is read into the PDV one row at a time, data is processed according to the rules established in the compilation phase, and rows are written to the output table. I'll take advantage of the data step debugger in SAS Enterprise Guide to view this process. Remember, there are no questions about interfaces on the certification exam, so there won't be any questions specifically about the debugger. However, there are questions that require you to think through the steps of compilation or execution.
After launching the debugger, we see all the columns in the PDV. Each column is assigned a missing value to begin, with the exception of _N_ and _ERROR_. Where did these two columns come from? They certainly aren't in the input table or the code. These two columns are automatically included in the PDV during execution; however, they are not written to the output table.
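One way to check the descriptor portion for yourself, outside the debugger, is PROC CONTENTS. This is a minimal sketch that assumes the data step has already run and that the output table is named work.profit (the library name is an assumption; the narration only calls it the profit table).

```sas
/* Assumes work.profit already exists; the library name is an assumption. */
proc contents data=work.profit varnum;   /* varnum lists columns in creation order */
run;
```

The report lists each column's name, type, length, and format. The automatic columns _N_ and _ERROR_ do not appear, and neither do any columns named in the drop statement.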
_N_ is a counter that indicates the iteration number of the data step. The value is one now because it's the first time through the data step loop. _ERROR_ is one if a data error is encountered for a row, and zero otherwise. If a data error occurs, a note with details regarding the error is written to the log.
With the set statement highlighted, I'll click this button to execute that line. Notice that values are now populated for all of the columns read from the cr.orders table. Notice that the debugger skips the length and where statements and jumps to the customer country assignment statement. This is because length and where are compile-time statements, and they've already done their job. The length statement defined the attributes for order source. The where statement reads into the PDV only rows that meet the condition. Notice that order date and delivery date are equal, so the condition is met.
As I proceed to execute the statements, the case of customer country changes from lowercase be to uppercase BE. Profit, ShipDays, and age range are calculated. When we reach the if condition, the first expression is true: order type is one. So, Retail is assigned to order source, and SAS skips the subsequent else if statements. Drop is also skipped because it is a compile-time statement, and we've reached the end of the first iteration of the data step. At the concluding step boundary, the run statement, there is an implicit output action that writes the contents of the PDV as a row in the output table. Then, SAS loops back to the top of the data step.
Now, we're back to the set statement again. But remember what happens in the PDV. Values for the new columns are reinitialized or, in other words, reset to missing. Also, notice that _N_ is now two. Executing the set statement overwrites the values in the PDV with the next row from the input table that meets the where condition.
The executable statements are processed in order again, and row number two is written to the profit table. This cycle just continues until the end of the input table is reached.
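If you don't have the debugger handy, you can watch the same cycle in the log with PUTLOG. The sketch below is an illustration under assumptions, not part of the original program: the underscored column names are guesses, and only the ShipDays calculation is kept so the trace stays short.

```sas
/* Sketch only: traces the PDV in the log instead of the debugger.
   Column names with underscores are assumptions. */
data _null_;                               /* _null_: run the step without creating a table */
    /* Top of each iteration, before the next row is read */
    putlog "Before SET: " _N_= _ERROR_= ShipDays= Order_Date=;

    set cr.orders(obs=2);                  /* obs= counts rows that pass the where condition */
    where Delivery_Date >= Order_Date;

    ShipDays = Delivery_Date - Order_Date;

    /* Bottom of the iteration: write every column in the PDV to the log */
    putlog "After assignments:";
    putlog _all_;
run;
```

In the log, the "Before SET" line at the top of the second iteration should show ShipDays reset to missing and _N_ equal to two, while Order_Date still holds the previous row's value until the set statement overwrites it. The "Before SET" line prints one extra time because the step only stops when the set statement finds no more rows to read.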