XML: The Making Of A Markup Language
October 1998
When was the last time a markup language did anything useful for your company? HTML can paint some pretty pictures in your Web browser. Dynamic HTML, Cascading Style Sheets, and a raft of new tags have been added, yet serious applications are advancing much faster on the Web server than in the browser. The browser is burdened by legacy markup constructs and a fixed set of markup tags. And what about Standard Generalized Markup Language? SGML, a pre-Web technology, became a standard in 1986 and has since been widely used to encode documents of all types. However, it hasn't generated nearly as much excitement in the business marketplace as its most recent offspring, XML. You may have heard of XML, but chances are you haven't seen any yet. That's because XML, the Extensible Markup Language, is still in its infancy: standards are being negotiated and tools are being built. Companies such as Adobe Systems, IBM, Microsoft, Netscape, and Sun Microsystems are proposing XML-based standards. They are also beginning, in a limited fashion, to support XML in their existing products. Substantial implementations don't exist yet, but XML's impact on the Web, and on business in general, promises to be far-reaching.
Reasons To Believe
There are good reasons for that optimism. XML is really a language for creating other markup languages. It makes it easy to create structured, human-readable documents whose tags describe their content. Such documents are easy to exchange, and properly written applications can interpret them. Search engines, intelligent agents, EDI applications, database repositories, query systems, and online catalogs are just a few of the applications that stand to benefit from XML's structured document definitions.
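To make the contrast with HTML concrete, here is a minimal sketch of how an application can query XML documents by what their tags mean rather than by how they look. It assumes Python's standard `xml.etree.ElementTree` module. The tag names (ARTICLE, HEADLINE, AUTHOR, MAGAZINE), the helper `headlines_by`, and all of the sample data are hypothetical and were invented for illustration.

```python
import xml.etree.ElementTree as ET

# In HTML the same record might read
#   <P><B>Sample Headline One</B><BR><I>A. Writer</I></P>
# and a program has no reliable way to tell which text is the author.
# In these hypothetical XML records, the tags say what each piece of text is.
ARTICLES = [
    "<ARTICLE><HEADLINE>Sample Headline One</HEADLINE>"
    "<AUTHOR>A. Writer</AUTHOR><MAGAZINE>Magazine A</MAGAZINE></ARTICLE>",
    "<ARTICLE><HEADLINE>Sample Headline Two</HEADLINE>"
    "<AUTHOR>B. Editor</AUTHOR><MAGAZINE>Magazine A</MAGAZINE></ARTICLE>",
    "<ARTICLE><HEADLINE>Sample Headline Three</HEADLINE>"
    "<AUTHOR>A. Writer</AUTHOR><MAGAZINE>Magazine B</MAGAZINE></ARTICLE>",
]


def headlines_by(author: str, documents: list[str]) -> list[str]:
    """Return the HEADLINE text of every article whose AUTHOR matches."""
    results = []
    for text in documents:
        article = ET.fromstring(text)
        # findtext() finds the first child with this tag and returns its text.
        if article.findtext("AUTHOR") == author:
            results.append(article.findtext("HEADLINE"))
    return results


print(headlines_by("A. Writer", ARTICLES))
# ['Sample Headline One', 'Sample Headline Three']
```

The same lookup works across every magazine's articles because the search relies only on the shared tag vocabulary and ignores page layout.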
XML's advantages will also reach beyond the Web, to applications such as document management and the migration of legacy data from one system to another. Two things are slowing the adoption of XML on the Web. First, XML is a very new standard. The World Wide Web Consortium granted it formal Recommendation status only in February. It still awaits several infrastructure technologies (see chart, bottom) before it becomes truly useful. Second, widespread development won't take off until standardized markup languages are agreed upon and developed for vertical markets.
How XML Works
The best way to get a feel for the power of XML is to consider a concrete example. The language is well suited to projects with large quantities of similarly structured data. One such body of data is the universe of articles published by the various magazines owned by CMP Media Inc., InformationWeek's parent company. In our fictitious example, CMP Media will create an in-house publishing system that works with all of its magazines. In particular, CMP wants to use XML to track articles, manage copy flow, and build a rich, searchable database of articles. Fortunately, XML is a great choice for designing the document structures for all of these tasks. Note that XML is not a programming language and therefore does not produce executable binaries of any kind. XML simply lets you define your tags and the relationships between them. The XML-encoded articles will have a rich structure that makes them easy to track, format, and manipulate.
Inside XML
XML files generally have two parts. One part is the XML tags and content itself. The other is the Document Type Definition (DTD), which defines the tags and their relationships. The DTD can reside in the same file as the XML source, or it can be in a separate file. The tables on page 71 show a sample XML file for InformationWeek's InternetView column.
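The two-part structure is easiest to see when the DTD sits inside the file itself, in the bracketed internal subset of the DOCTYPE declaration. The following minimal Python sketch shows such a file. The NOTE, TO, and BODY elements and the helper `child_tags` are hypothetical. The sketch also illustrates a caveat. Python's standard-library parser (expat, which `xml.etree.ElementTree` uses) is non-validating, so it parses the DTD but does not check the content against the element declarations. Enforcing a DTD requires a validating parser.

```python
import xml.etree.ElementTree as ET

# Part one: the DTD, carried inside the DOCTYPE's square brackets.
# Element names are hypothetical.
DTD_PART = """<?xml version="1.0" standalone="yes"?>
<!DOCTYPE NOTE [
  <!ELEMENT NOTE (TO, BODY)>
  <!ELEMENT TO   (#PCDATA)>
  <!ELEMENT BODY (#PCDATA)>
]>
"""

# Part two: the tagged content. The first follows the DTD's order;
# the second reverses TO and BODY, which breaks the declared rule.
VALID_DOC = DTD_PART + "<NOTE><TO>Copy desk</TO><BODY>Column is ready.</BODY></NOTE>"
INVALID_DOC = DTD_PART + "<NOTE><BODY>Column is ready.</BODY><TO>Copy desk</TO></NOTE>"


def child_tags(xml_text: str) -> list[str]:
    """Parse with the stdlib (non-validating) parser and list the root's children."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root]


print(child_tags(VALID_DOC))    # ['TO', 'BODY']
print(child_tags(INVALID_DOC))  # ['BODY', 'TO'] (accepted, because no validation occurs)
```

When the DTD lives in a separate file instead, the DOCTYPE line names that file rather than enclosing the declarations in brackets.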
The first box of sample code shows the file column.xml, a recent InternetView column encoded in XML. (The body of the column is abbreviated in this example.) Anyone who reads English can easily read the XML file. I made up the various markup tags, and their names make clear what they refer to. Perhaps the only difficult lines are the first two. They declare that the file is XML 1.0 compliant and that it is not a standalone file (it depends on the file column.dtd). The second line, the document type declaration, gives the location of the DTD; in this case, it's the file column.dtd. DTD files define the tags and structure of the associated XML file, but unlike XML files, they are clearly not meant to be read by humans. The first ELEMENT statement declares COLUMN as the top-level element. It also defines the order in which the other elements must appear inside COLUMN: HEADLINE, AUTHOR, AUTHORPHOTO, COLUMNBODY, and SIGNATURE. The COLUMN tag also has three attributes. The tagline tells us it's the InternetView column, the version gives the current version of the column, and the date gives the issue in which the column will appear. Each attribute is of type CDATA (character data), a plain text string that the XML parser does not interpret as markup. All of the other elements contain PCDATA (parsed character data), which may include HTML tags or other markup. The XML parser checks any tags in the PCDATA to ensure that they follow XML syntax rules (for example, every tag must be closed). The DTD in this example gives our custom XML-aware application a clear view of the document's structure. Through its definitions of the tags, it also conveys what the content means. It offers no clue, however, about how the document should be formatted. There is none of the information you might find in a Microsoft Word file about which font is used, which characters are bold, or how the text is justified. Style sheets will handle these display issues in the near future.
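The page 71 listings are not reproduced here, so the following Python sketch uses a hypothetical stand-in for column.xml built only from the description above. It has a COLUMN element with three CDATA attributes, which are named TAGLINE, VERSION, and DATE here by assumption. Its children are HEADLINE, AUTHOR, AUTHORPHOTO, COLUMNBODY, and SIGNATURE, in that order. Python's standard library does not validate documents against a DTD, so the sketch checks the required child order by hand. The helpers `follows_required_order` and `is_well_formed` are illustrative names, not part of any library. The sketch also shows the parser rejecting a document with an unclosed tag, the kind of syntax error that every XML parser catches.

```python
import xml.etree.ElementTree as ET

# Hypothetical DTD declarations matching the description (not the original file):
#   <!ELEMENT COLUMN (HEADLINE, AUTHOR, AUTHORPHOTO, COLUMNBODY, SIGNATURE)>
#   <!ATTLIST COLUMN TAGLINE CDATA #REQUIRED
#                    VERSION CDATA #REQUIRED
#                    DATE    CDATA #REQUIRED>
REQUIRED_ORDER = ["HEADLINE", "AUTHOR", "AUTHORPHOTO", "COLUMNBODY", "SIGNATURE"]

# Hypothetical stand-in for column.xml; attribute values and content are invented.
COLUMN_XML = """<COLUMN TAGLINE="InternetView" VERSION="1" DATE="issue date">
  <HEADLINE>Headline goes here</HEADLINE>
  <AUTHOR>Author name</AUTHOR>
  <AUTHORPHOTO>photo.gif</AUTHORPHOTO>
  <COLUMNBODY>Body of the column (abbreviated)</COLUMNBODY>
  <SIGNATURE>Signature line</SIGNATURE>
</COLUMN>"""


def follows_required_order(xml_text: str) -> bool:
    """True if the root is COLUMN and its children appear exactly in REQUIRED_ORDER."""
    root = ET.fromstring(xml_text)
    return root.tag == "COLUMN" and [child.tag for child in root] == REQUIRED_ORDER


def is_well_formed(xml_text: str) -> bool:
    """True if the parser accepts the text; an unclosed tag raises ParseError."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


column = ET.fromstring(COLUMN_XML)
print(column.get("TAGLINE"))                             # InternetView
print(follows_required_order(COLUMN_XML))                # True
print(is_well_formed("<COLUMN><HEADLINE>Oops</COLUMN>"))  # False
```

A validating parser would perform the order check automatically from the DTD. The hand-written check only makes the rule visible.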
A standard called Extensible Stylesheet Language (XSL) is expected to be issued as a W3C Proposed Recommendation in about a year. XSL will be based on DSSSL, the SGML style-sheet standard, and will be compatible with CSS, the HTML style-sheet standard. Style sheets will typically be kept in external files and referenced from within an XML file with a line such as this:
<?xml-stylesheet href="column.xsl" type="text/xsl" ?>
The diagram on page 71 includes a simple XSL style sheet that could be used to display the column in a Web browser. This XSL style sheet uses CSS flow conventions and HTML tags. In general, XSL style sheets will be much more complicated than this example and will rarely, if ever, be written entirely by hand. Tools such as ArborText's XMLStyler offer a graphical user interface for creating XSL style sheets. Standards groups are working to make XML as Web-friendly as possible. It's even possible, though the syntax is awkward, to embed JavaScript in XML files. The preferred approach is to put the JavaScript in external files and refer to them with the SRC attribute of the HTML <SCRIPT> tag.
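The style-sheet line is a processing instruction, not an element. An application that wants to honor it must find that instruction and read its pseudo-attributes. Here is a minimal sketch using Python's standard `xml.dom.minidom` module, which keeps processing instructions in the parsed document tree. The sample document and the helper `stylesheet_links` are hypothetical. The sketch only locates the linked style sheet and does not apply it.

```python
import re
from xml.dom import minidom

# Hypothetical document linking to a style sheet, as described above.
DOC = """<?xml version="1.0"?>
<?xml-stylesheet href="column.xsl" type="text/xsl" ?>
<COLUMN><HEADLINE>Headline goes here</HEADLINE></COLUMN>"""


def stylesheet_links(xml_text: str) -> list[tuple[str, str]]:
    """Return (href, type) pairs from xml-stylesheet processing instructions."""
    document = minidom.parseString(xml_text)
    links = []
    for node in document.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            # A PI's data is plain text shaped like attributes, so the
            # name="value" pairs are pulled out with a regular expression.
            pseudo = dict(re.findall(r'(\w+)="([^"]*)"', node.data))
            links.append((pseudo.get("href", ""), pseudo.get("type", "")))
    return links


print(stylesheet_links(DOC))  # [('column.xsl', 'text/xsl')]
```

The XML declaration on the first line is not reported as a processing instruction, so only the style-sheet link is returned.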