In this lecture, I'm going to provide a brief introduction to both HTML and XML. Now, both XML and HTML are markup languages. What does this mean? It means that there's text that you describe with tags, and those tags tell you something about the text.
So, let's look at XML first. XML stands for Extensible Markup Language, and its purpose is to describe data. In XML, we get to define our own tags, and tags can be nested within one another, so we can have hierarchically structured documents. Here's a simple XML description of an appointment; say I'm making an appointment calendar. I've defined in this document a bunch of tags, which are the things in the angle brackets here, and they provide metadata. They tell me about the text that's going to follow. So, when I have an appointment, the "to" tag tells me that's the person I'm making the appointment with. The "time" tag says this is the time that the appointment is going to happen. Note that these tags don't have any predefined meaning. They mean whatever I say they mean. This is just a mechanism for structuring hierarchical data.
Now, HTML is a markup language that's specifically designed to describe how to display data. An HTML document has two basic areas. We have a header that describes information about the document, and then we have the body, and the tags that we're using describe how we should display the contents. So, a heading, "h1", means we should display that content in a large font, and "p" means paragraph. That's just body text that we should display in paragraph form.
So, let's look a little bit more at the structure of HTML. First, we have a tag that says this is HTML so that the browser knows how to interpret it. Then we have a header section that describes metadata about this document.
So, for example, we could put the title of the web page there, and that's what shows up in the browser tab when we're looking at this particular page. Then we have the data about the page itself. This particular web page is going to be very boring. It's going to have one top-level heading and a paragraph. This is what it looks like when you display it in the browser. The browser understands this set of tags and can display the content appropriately.
So, what are the elements that we're looking at for HTML documents? At a very high level, there aren't too many of them. We have headings that give top-level descriptions of content, we have paragraphs, we have links to other content, we have images, we have inputs such as radio buttons and check boxes and simple buttons that allow forms to be submitted, and then finally we have lists and tables. So, let's take a look at what those look like in a document. We have a header with a title, and we have a body that has different levels of headings; HTML allows six different levels of headings. We have a paragraph, we have a link that takes us back to Coursera, we have an image, and we have a button that we could use to submit a form. When the browser displays this, you can see all the visual elements presented. The way HTML is structured, it's sort of like a scroll that goes from top to bottom. So, as you put elements on the page, they're laid out from the top down.
Now, there's much, much more to HTML than that. Nobody builds web pages that look like this anymore. The primary thing that's changed is that there is heavy use of styles. There's something called cascading style sheets that says, when I see a heading, I want you to use this font and this background and other things, and they can be very, very sophisticated. What we can do in a document is have divisions in it, and each of those divisions can be styled differently to change how the information is presented.
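To make the basic elements listed above a bit more concrete, here is a minimal sketch of how a program might pull the title, headings, links, and images out of a simple page. It uses Python's standard `html.parser` module. The sample page, the `ElementCollector` class, the `collect_elements` helper, the file name `logo.png`, and the exact Coursera URL are all assumptions made up for illustration; they are not the page shown in the lecture.

```python
from html.parser import HTMLParser

# Hypothetical sample page, loosely modeled on the elements described in the lecture.
SAMPLE_PAGE = """<!DOCTYPE html>
<html>
<head><title>My First Page</title></head>
<body>
<h1>Top-level heading</h1>
<h2>Second-level heading</h2>
<p>A paragraph of body text.</p>
<a href="https://www.coursera.org">Back to Coursera</a>
<img src="logo.png" alt="Logo">
<button type="submit">Submit</button>
</body>
</html>"""


class ElementCollector(HTMLParser):
    """Collects the title, headings, link targets, and image sources of a page."""

    HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}  # HTML's six heading levels

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []      # (tag, text) pairs in document order
        self.links = []         # href attribute values
        self.images = []        # src attribute values
        self._open_tag = None   # title or heading tag whose text is being read

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title" or tag in self.HEADING_TAGS:
            self._open_tag = tag
        elif tag == "a" and "href" in attributes:
            self.links.append(attributes["href"])
        elif tag == "img" and "src" in attributes:
            self.images.append(attributes["src"])

    def handle_data(self, data):
        if self._open_tag == "title":
            self.title += data
        elif self._open_tag in self.HEADING_TAGS:
            self.headings.append((self._open_tag, data.strip()))

    def handle_endtag(self, tag):
        if tag == self._open_tag:
            self._open_tag = None


def collect_elements(page):
    """Parse the page text and return the populated collector."""
    collector = ElementCollector()
    collector.feed(page)
    collector.close()
    return collector


collected = collect_elements(SAMPLE_PAGE)
print(collected.title)
print(collected.headings)
print(collected.links)
print(collected.images)
```

This sketch only handles well-formed, very simple pages. Real pages are far messier, so treat it as an illustration of the idea rather than a robust extractor.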
Then we can use lists and tables to present complex data in a structured way. We can also use JavaScript to build arbitrarily complex user interfaces, like games, that are actually downloaded into the browser and then run as if they were any other program. Finally, with HTML5, we can define semantic elements. So, it's starting to bridge the gap a little bit between HTML and XML, and each of those semantic elements can have a style associated with it.
This is complex enough that we can't possibly cover it in this course. What we're going to do is describe how to extract certain HTML elements and XML elements so that we can test them. But if you want to know more about how HTML and XML are structured, there are entire Coursera courses on this. So, I would encourage you to go out and take one of these courses and gain a more in-depth understanding of XML and HTML.
But let's look at a complex page, just so you can see the depth of structure that you're likely to find. This is the web page again for our software engineering center, and as modern web pages go, this is actually relatively simple. But still, it's much more complicated than the examples that we've shown in the lecture so far. One nice thing is that current browsers can show you the HTML that is associated with the page that you see and allow you to navigate within it. In Chrome, there are a couple of different ways to do that. First, I can just hit the F12 key, and that brings up the developer tools, where I can see the page source. I can also right-click on a page and use the "View page source" command.
So, let's look at what we see within a web page. First, in the header, there's a bunch of information about where this content came from, and more importantly, there's a bunch of these cascading style sheets that tell the browser how to display different kinds of content.
These cascading style sheets are actually separate files that have their own URLs and get linked in as part of the load process for this web page. So, we can take a look at where these come from. We also have some JavaScript on the web page that's going to be used for different capabilities, probably search. We load that in from different websites, and that all gets done as part of processing the header, the metadata that's associated with the page. Then, when we look at the page itself, we start to see a bunch of divisions, which are going to be used for different formatting purposes, to allow the style sheets to properly style the content of the page. If we scroll down a bit, we can see that by far the largest chunk of the web page is managed by this particular division. We can look into it and see that this subsection deals with the left-side menu, and we have the grid that deals with the paper contents and so on. So, we can use this ability of the browser to map between the HTML and the visual display to get an understanding of how the page is laid out.
Another thing that we can do: if we want to look at a particular graphical element, say I want to find the search box, I can select it and then inspect it, and that will take me directly to that element of the HTML document. This is super handy for navigation, and when we get to automated tests using programs, we're going to use this capability to figure out how to navigate programmatically to the different parts of the page that we want to test.
So, to recap, HTML and XML are both markup languages. That is, they contain data and they contain metadata about the data. In XML, the focus is on the data itself. We're trying to describe hierarchically structured data, and we use the markup tags to show that hierarchy. In HTML, we're focused on presentation. So, the metadata, the tags, describe different ways of displaying the content.
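To illustrate the XML half of that recap, here is a minimal sketch of an appointment document and a small program that walks its hierarchy, using Python's standard `xml.etree.ElementTree` module. The "to" and "time" tags come from the appointment example in this lecture; the "location", "building", and "room" tags, all of the values, and the `describe` helper are assumptions invented for illustration, added only to show nesting.

```python
import xml.etree.ElementTree as ET

# Hypothetical appointment document. The tags mean whatever we say they mean;
# the nested "location" element is invented here to show hierarchy.
APPOINTMENT_XML = """<appointment>
    <to>Alice</to>
    <time>2:00 PM</time>
    <location>
        <building>Main Hall</building>
        <room>101</room>
    </location>
</appointment>"""


def describe(element, depth=0):
    """Return one line per element, indented to show its nesting depth."""
    text = (element.text or "").strip()
    line = "  " * depth + element.tag
    if text:
        line += ": " + text
    lines = [line]
    for child in element:  # iterating an element yields its direct children
        lines.extend(describe(child, depth + 1))
    return lines


root = ET.fromstring(APPOINTMENT_XML)
outline = describe(root)
print("\n".join(outline))
```

The tags carry the metadata and the nesting carries the structure, so the program can recover who the appointment is with by asking for the "to" element, for example with `root.find("to").text`.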
Both of these are easy to start with but hard to master. It's very simple to write your first web page or a very simple XML description of data, but it's hard to do it well. Your first web page will definitely look like it was produced by an amateur. So, if you want to get good at this, check out one of the other Coursera courses that's entirely devoted to describing how XML and HTML fit together.