SGML, HTML, and XML
The relationship between SGML and the Hypertext Markup Language (HTML) needs to be clearly understood. Although not originally designed as such, HTML is now an SGML application, even though many HTML documents exist that cannot be validated according to the rules of SGML. HTML consists of a set of elements that are interpreted by Web browsers for display purposes. The HTML tags were designed for display and not for other kinds of analysis, which is why only crude searches are possible on Web documents. HTML is a rather curious mixture of elements. Larger ones, such as <body>; <h1>, <h2>, and so on for head levels; <p> for paragraph; and <ul> for unordered list, are structural, but the smaller elements, such as <b> for bold and <i> for italic, are typographic, which, as we have seen above, are ambiguous and thus cannot be searched effectively. HTML version 3 attempts to rectify this ambiguity somewhat by introducing a few semantic level elements, but these are very few in comparison with those
identified in the TEI core set. HTML can be a good introduction to structured markup. Since it is so easy to create, many project managers begin by using HTML and graduate to SGML once they become used to working with structured text and begin to see the weakness of HTML for anything other than the display of text. SGML can easily be converted automatically to HTML for delivery on the Web, and Web clients have been written for the major SGML retrieval programs.
The move from HTML to SGML can be substantial, and in 1996 work began on XML (Extensible Markup Language), which is a simplified version of SGML for delivery on the Web. It is "an extremely simple dialect of SGML," the goal of which "is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML" (see http://www.w3.org/TR/REC-xml ). XML is being developed under the auspices of the World Wide Web Consortium, and the first draft of the specification for it was available by the SGML conference in December 1996. Essentially XML is SGML with some of the more complex and esoteric features removed. It has been designed for interoperability with both SGML and HTML-that is, to fill the gap between HTML, which is too simple, and fullblown SGML, which can be complicated. As yet there is no specific XML software, but the work of this group has considerable backing and the design of XML has proceeded quickly.