A Short Introduction to XML

First and foremost, a CSIXML file is a well-formed XML document, which means that every CSIXML file conforms to the syntax rules of XML. A well-formed XML document has a tree structure made up of elements and element attributes. The document must have a single root element, which can contain any number of sub-elements, each of which can in turn contain any number of sub-elements and/or other content. Every element must have a name and can optionally carry a set of attributes: name/value pairs in which each name is unique within that element.
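The tree structure described above can be explored with a short Python sketch that uses the standard library's xml.etree.ElementTree module. The element names (station, sensor) and attribute names (name, id, type) are hypothetical and are not taken from the CSIXML format; they only illustrate a root element, nested sub-elements, and attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element and attribute names are illustrative only.
DOC = """<?xml version="1.0" standalone="yes"?>
<station name="demo" id="1">
  <sensor type="temperature">21.5</sensor>
  <sensor type="humidity">40</sensor>
</station>"""

root = ET.fromstring(DOC)


def describe(element, depth=0):
    """Return one indented line per element, walking the tree depth-first."""
    lines = ["  " * depth + f"{element.tag} {element.attrib}"]
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines


tree_lines = describe(root)
print("\n".join(tree_lines))

# Repeating an attribute name on one element breaks well-formedness,
# so the parser is expected to reject it.
try:
    ET.fromstring('<station id="1" id="2"/>')
    duplicate_rejected = False
except ET.ParseError:
    duplicate_rejected = True
```

The final check reflects the uniqueness rule for attribute names: a conforming parser reports an error rather than silently picking one of the two values.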

Most XML files begin with an XML declaration that identifies the file as XML and can also specify its character encoding (if no character encoding is specified, the file is assumed to use the UTF-8 Unicode character encoding). The following example shows this declaration as it appears in CSIXML data files:

<?xml version="1.0" standalone="yes"?>
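The effect of the encoding rule can be checked with a small Python sketch, again using xml.etree.ElementTree. The note element and its text are hypothetical. The first document omits the encoding and is stored as UTF-8; the second declares ISO-8859-1 explicitly and is stored in that encoding. Both should decode to the same text.

```python
import xml.etree.ElementTree as ET

# No encoding declared: the parser treats the bytes as UTF-8.
utf8_bytes = (
    '<?xml version="1.0" standalone="yes"?><note>caf\u00e9</note>'
).encode("utf-8")
utf8_text = ET.fromstring(utf8_bytes).text

# Encoding declared explicitly: the same text stored as ISO-8859-1 bytes.
latin1_bytes = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><note>caf\u00e9</note>'
).encode("iso-8859-1")
latin1_text = ET.fromstring(latin1_bytes).text

print(utf8_text == latin1_text)
```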

XML is derived from SGML (Standard Generalized Markup Language) and shares many of its syntax rules. HTML (HyperText Markup Language) is also derived from SGML and, as a result, closely resembles XML.

XML elements are represented using tags. A start tag begins with the less-than character (<) followed immediately by the name of the element. If the element has attributes, they follow the element name using the name="value" syntax. At least one white space character must separate the element name from the first attribute and each attribute from the next. Beyond this requirement, XML parsers ignore additional white space between the name, the attributes, and the end of the tag. If an element is empty (it has no child elements or other content), the tag can end with a slash character (/) followed by the greater-than character (>). If the element is not empty, the start tag ends with a greater-than character (>) and the child elements or other content follow. In this case, the end of the element is marked by an end tag: a less-than character (<) followed by a slash character (/), the element name, and a greater-than character (>).

The following example shows how an element with content may appear:

The following example shows how an empty element may appear:
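As a rough illustration of both forms, the Python sketch below parses an element with content, an empty element written with the short /> ending, and the same empty element written with separate start and end tags. The element name sensor and the attribute type are hypothetical and are not taken from the CSIXML format.

```python
import xml.etree.ElementTree as ET

# Element with content: start tag, text content, matching end tag.
with_content = ET.fromstring('<sensor type="temperature">21.5</sensor>')

# Empty element: the tag ends with "/>" and there is no end tag.
empty_short = ET.fromstring('<sensor type="temperature"/>')

# The same empty element written with explicit start and end tags.
empty_long = ET.fromstring('<sensor type="temperature"></sensor>')

# Extra white space between the name, attributes, and tag end is ignored.
spaced = ET.fromstring('<sensor   type = "temperature"   />')

# An end tag whose name does not match the start tag is not well formed.
try:
    ET.fromstring('<sensor type="temperature">21.5</probe>')
    mismatch_rejected = False
except ET.ParseError:
    mismatch_rejected = True
```

ElementTree reports the text of an empty element as None, so the two empty forms are indistinguishable after parsing.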

Because XML reserves certain characters for its markup, all XML parsers recognize a set of predefined entities that stand in for those characters. The five predefined entities are:

Pre-Defined XML Entities
&lt;	less than sign (<)
&gt;	greater than sign (>)
&amp;	ampersand (&)
&quot;	double quote (")
&apos;	apostrophe or single quote (')
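The round trip between reserved characters and the predefined entities (&lt; &gt; &amp; &quot; &apos;) can be sketched in Python. The escape function from xml.sax.saxutils replaces &, < and > by default; the extra mapping passed here also covers the two quote characters. The expr element name and the sample text are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Sample text containing all five reserved characters.
raw = 'if a < b & c > d: say "hi", it\'s fine'

# Replace each reserved character with its predefined entity.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})

# A parser turns the entities back into the original characters.
element = ET.fromstring(f"<expr>{escaped}</expr>")

print(escaped)
print(element.text == raw)
```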

In addition to these predefined entities, arbitrary Unicode characters can be represented by using the sequence &#xxx; where xxx is the decimal Unicode code point of the desired character (or &#xhhh; where hhh is the same code point in hexadecimal).
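Numeric character references are written with a number sign after the ampersand. The short Python sketch below, which uses a hypothetical temp element, shows the degree sign (code point 176 in decimal, B0 in hexadecimal) written both ways and confirms that a parser resolves each form to the same character.

```python
import xml.etree.ElementTree as ET

# Decimal reference: &#176; is code point 176, the degree sign.
decimal_ref = ET.fromstring("<temp>21.5 &#176;C</temp>").text

# Hexadecimal reference: &#xB0; names the same code point.
hex_ref = ET.fromstring("<temp>21.5 &#xB0;C</temp>").text

# Building a decimal reference from a character's code point.
built_ref = f"&#{ord(chr(176))};"

print(decimal_ref == hex_ref)
```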

For more details regarding XML documents and their contents, you can visit the W3C web page at www.w3.org/XML/. In addition, W3Schools, which is independent of the W3C, offers a tutorial at www.w3schools.com/xml/default.asp.