Up until now, we've mostly been moving data between variables, taking it from user input or outputting it to a display. Now, we're going to take things to the next level and start working with files. To do this, we're going to use SELECT and ASSIGN clauses in the environment division, FD statements in the data division, and, from the procedure division, OPEN, CLOSE, READ INTO, and WRITE FROM statements. The environment and data divisions are where we describe our inputs and outputs, which will get used in the procedure division, because as we all know, that's where the action happens. Let's take a look at that action in action.
Here, you can see the FILE-CONTROL paragraph in the input/output section of the environment division. What we're doing here is associating each COBOL internal file name with an external dataset name. In this case, PRINT-LINE is the internal file name and PRTLINE is the external dataset or file. So that's the SELECT and ASSIGN clauses in the environment division. Further down, in the data division, is where we define the level numbers, variable names, data types, and lengths. The FD reserved word is used to give the COBOL compiler more information about those internal file names. So PRINT-LINE, which is linked with the dataset PRTLINE, has these fields, types, and lengths, and ACCOUNT-REC, which is linked to ACCTREC, has these fields, types, and lengths.
When your COBOL code is running on an IBM Z mainframe, it'll have access to both traditional z/OS datasets and Unix files. Right now, we're going to focus on the z/OS sequential data storage method because it's extremely common and a good place to start when working with files in COBOL. A dataset in z/OS is made up of records, and you can think of each record as a line within a dataset, with a defined length.
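To make the SELECT, ASSIGN, and FD pieces concrete, here is a minimal sketch of how those entries might be laid out. The file names PRINT-LINE, PRTLINE, ACCOUNT-REC, and ACCTREC come from the walkthrough above; the program name, the record names, and every field, type, and length are assumptions made up for illustration, not the layout shown on screen.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. FILEDEF.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      * Internal file name on the left, external name on the right.
           SELECT PRINT-LINE  ASSIGN TO PRTLINE.
           SELECT ACCOUNT-REC ASSIGN TO ACCTREC.
       DATA DIVISION.
       FILE SECTION.
      * Each FD describes one record layout (fields are hypothetical).
       FD  PRINT-LINE.
       01  PRINT-REC.
           05  ACCT-NO-O      PIC X(8).
           05  LAST-NAME-O    PIC X(20).
           05  FIRST-NAME-O   PIC X(15).
       FD  ACCOUNT-REC.
       01  ACCT-FIELDS.
           05  ACCT-NO        PIC X(8).
           05  LAST-NAME      PIC X(20).
           05  FIRST-NAME     PIC X(15).
       PROCEDURE DIVISION.
      * No file I/O yet; this sketch only shows the declarations.
           STOP RUN.
```

The name after FD has to match the name in a SELECT clause, which is how the compiler ties a record layout to an external name.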
Like anything written out row by row, you'll find the exact value you're looking for by scanning down to the right record and then across to the field you want. If you take all these fields and add up their lengths, that makes up the entire length of that record. You got it? It's simple.
The way data input/output, or I/O, is handled is that a program says, "Hey, I'd like to read this piece of data out here." Then the operating system or data I/O subsystem goes and gets it and places it into memory, where the program can go and read it. If a program is requesting lots and lots of reads, one right after the other, that's a lot of I/O operations, which is somewhat inefficient and can lead to slow performance. It would be like going into a restaurant and ordering a soda, and then waiting until that's brought out and saying, "You know what, I'd also like a hamburger," then they bring that out and you say, "Fries too," and they bring that out and you say, "Ketchup." Then they throw you out of the restaurant because that's extremely inefficient and annoying, and we don't want to be inefficient or annoying.
In COBOL programming, we have a concept called blocks. Because we know so much about the records, we can load data as a block, where a block is a group of records. As long as the program is going to read one record, followed by the next, followed by the next, and so on, we can get the records we want, in order, into what's called a buffer. Take a look at this. We can see a block of records loaded into a buffer, where the overall record length is the size of each field combined. We set that overall block size in the buffer with the BLOCK CONTAINS clause.
Earlier, we used the ASSIGN clause to describe a dataset source file. We said SELECT ACCOUNT-REC, with a dash in it, ASSIGN TO ACCTREC. If we're running this in z/OS using JCL, we need to have a JCL DD statement to link ACCOUNT-REC to that actual dataset.
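As a rough illustration of how record length and blocking fit together, the sketch below adds RECORD CONTAINS and BLOCK CONTAINS to the FD for ACCOUNT-REC. The three fields and their lengths are hypothetical, as are the program name and the choice of ten records per block; RECORDING MODE F is an IBM extension for fixed-length records and assumes a z/OS-style compiler.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. BLKDEMO.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT ACCOUNT-REC ASSIGN TO ACCTREC.
       DATA DIVISION.
       FILE SECTION.
      * Record length = 8 + 20 + 15 = 43 bytes (hypothetical fields).
      * Ten records per block gives a 430-byte block in the buffer.
       FD  ACCOUNT-REC
           RECORDING MODE F
           RECORD CONTAINS 43 CHARACTERS
           BLOCK CONTAINS 10 RECORDS.
       01  ACCT-FIELDS.
           05  ACCT-NO        PIC X(8).
           05  LAST-NAME      PIC X(20).
           05  FIRST-NAME     PIC X(15).
       PROCEDURE DIVISION.
      * Declarations only; reading the file is covered separately.
           STOP RUN.
```

On z/OS, it is common to code BLOCK CONTAINS 0 RECORDS instead, which leaves the block size to be picked up at run time from the DD statement or the dataset itself rather than fixing it in the program.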
The JCL required to make the connection back to the actual dataset looks something like this: //ACCTREC DD DSN=MY.DATA,DISP=SHR. If you're unfamiliar with JCL, this is a data definition statement. That's what the DD stands for. It says that for ACCTREC, the name ACCOUNT-REC is assigned to, look here at MY.DATA. The DISP=SHR just means that it expects the dataset to exist before we try to use it, and that other people can use it at the same time. We don't require exclusive access.
Now, I don't know about you, but I want to see this all spelled out from end to end. So we'll start out with ACCOUNT-REC. That's what we reference within our procedure division, in the actual code. ACCOUNT-REC is defined in the environment division's input/output section, where it's assigned to ACCTREC. //ACCTREC, seven characters, then gets connected via a DD statement to MY.DATA, the actual dataset on disk.
Now, don't sweat the JCL stuff too much right now. I know you came here to learn COBOL, not JCL, and chances are, if you're given some code to write or modify, somebody will prepare that JCL for you or at least tell you what datasets and names to use, so you won't have to guess. There's a link to an online resource for basic JCL concepts in the course notes, if you're curious though.
One other important note. The COBOL compiler assumes that what you tell it for the file names is true and correct, and that it will work when it goes to run, because it doesn't have any way of checking what's actually in your JCL or actually on the file system. If any part of that chain is broken or not correctly linked, you're going to get an error at runtime. That's why it's important to have a good naming scheme, because chasing those types of errors is no fun at all. Up next, we'll talk about putting all of that to use in the procedure division.
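One way to see a broken link in that chain at run time, rather than have the program simply abend, is to ask for a file status code and test it after the OPEN. This is only a sketch under assumptions: the FILE STATUS clause and the names CHKOPEN and WS-ACCT-STATUS are not part of the walkthrough above, and the 43-byte record is a placeholder that would have to match the real dataset.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CHKOPEN.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      * ACCTREC must match the DD name in the JCL, for example:
      *   //ACCTREC  DD DSN=MY.DATA,DISP=SHR
           SELECT ACCOUNT-REC ASSIGN TO ACCTREC
               FILE STATUS IS WS-ACCT-STATUS.
       DATA DIVISION.
       FILE SECTION.
      * Placeholder layout; the real record length may differ.
       FD  ACCOUNT-REC.
       01  ACCT-FIELDS        PIC X(43).
       WORKING-STORAGE SECTION.
      * WS-ACCT-STATUS is a hypothetical name for the status field.
       01  WS-ACCT-STATUS     PIC XX.
       PROCEDURE DIVISION.
           OPEN INPUT ACCOUNT-REC
      * "00" means the OPEN worked; anything else is a problem.
           IF WS-ACCT-STATUS NOT = "00"
               DISPLAY "OPEN FAILED, FILE STATUS " WS-ACCT-STATUS
           ELSE
               CLOSE ACCOUNT-REC
           END-IF
           STOP RUN.
```

Reading the sketch from top to bottom follows the same chain: ACCOUNT-REC in the code, ACCTREC in the SELECT clause, the //ACCTREC DD statement in the JCL, and finally MY.DATA.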