Bristle Software XML Tips

This page is offered as a service of Bristle Software, Inc.  New tips are sent to an associated mailing list when they are posted here.  Please send comments, corrections, any tips you'd like to contribute, or requests to be added to the mailing list, to tips@bristle.com.

Table of Contents:

  1. What is XML?
  2. Differences from HTML syntax
  3. Tags vs Attributes
  4. Escaping special chars
  5. XML Parsers
    1. Microsoft MSXML
      1. Using MSXML from VB
      2. Using MSXML from IE via VBScript
      3. Using MSXML from IE via JavaScript
  6. DTD Tips
    1. DTD Quick Reference
    2. DTD Conditional Compilation
  7. See Also

Details of Tips:

  1. What is XML?

    Original Version: 5/11/1999
    Last Updated: 2/12/2013
    Applies to:  XML 1.0+

    XML (eXtensible Markup Language) is syntactically similar to HTML (HyperText Markup Language).  They both consist basically of regular text marked up with tags and attributes.  However, the purposes of XML and HTML are very different.  The purpose of HTML is to describe the layout and physical appearance of the embedded text, to be displayed as a Web page.  The purpose of XML is to describe the structure and semantics of the embedded text, to be manipulated programmatically as data.  With HTML, you use the one set of predefined tags with their predefined display-oriented meanings.  With XML, you choose a set of tags that are in common use in your application area, or invent your own set.

    Prior to XML, you could pass data between programs in proprietary formats, or as databases, or as CSV (comma-separated values) files.  The producer and consumer of the data generally needed a common understanding of how the data was packaged so that the consumer could correctly interpret what the producer generated.  To send an address, there had to be agreement about how to recognize the street, city, state, etc. as parts of the address.  For example, it might be agreed that the 3rd field in a CSV file was the state, as:

    	"123 Main St", "Malvern", "PA", "19355"

    With XML, the data stream is self describing:

    	<address>
    		<street>123 Main St</street>
    		<city>Malvern</city>
    		<state>PA</state>
    		<zip>19355</zip>
    	</address>

    This structured approach lends itself to reusable parsing tools, automatic data transformations, debugging tools, and other reuse opportunities.  No longer is the consumer required to write a parser to interpret the incoming data stream.  It simply uses an existing XML parser, asking it to find the value of "state", for example.
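
    As a minimal sketch of that difference, the following Python uses the standard csv and xml.etree.ElementTree modules.  The function names are hypothetical, chosen only for illustration.  The CSV reader has to know that the state is the 3rd field, while the XML reader simply asks for it by name:

```python
import csv
import io
import xml.etree.ElementTree as ET

CSV_TEXT = '"123 Main St","Malvern","PA","19355"\n'

XML_TEXT = """<address>
    <street>123 Main St</street>
    <city>Malvern</city>
    <state>PA</state>
    <zip>19355</zip>
</address>"""


def state_from_csv(text):
    # Positional lookup: relies on the prior agreement that field 3 is the state.
    row = next(csv.reader(io.StringIO(text)))
    return row[2]


def state_from_xml(text):
    # Lookup by name: the data stream itself says which value is the state.
    root = ET.fromstring(text)
    return root.findtext("state")


csv_state = state_from_csv(CSV_TEXT)
xml_state = state_from_xml(XML_TEXT)
```

    If the producer later adds a field before the state, the CSV lookup silently returns the wrong value, while the XML lookup is unaffected.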

    The tags and attributes can be formally defined via a DTD (Document Type Definition) or a more modern XML Schema.  These allow you to specify the acceptable nesting of tags (street must be inside address), which tags and attributes are optional, default values, data types, etc.  DTDs are not new to XML; they were inherited from SGML.  There is a DTD that defines the HTML syntax also.  (In loose terms, HTML looks like an instance of XML with a specific set of tags and attributes, though strictly speaking HTML 4 is defined as an application of SGML, the parent of XML.)  XML Schemas are a more powerful replacement for DTDs, with several advantages that will be discussed in future tips.

    Don't underestimate the value of XML.  It is the single most important development in the computer field since ASCII text.  In the long run, it will have more impact than even HTML.  HTML is a less than perfect, but very widely adopted standard for marking up documents for display.  Its popularity made the World Wide Web possible.  XML is a less than perfect, but very widely adopted standard for the exchange of data.  As with HTML, the value of the standard is that it is good enough for now, and already wildly popular.  Henceforth, we will always have a standard.  The current version of XML may be replaced soon and often, with successively better standards, but there will never again be a day where there is no standard for exchange of data between different types of computer systems.

    11/18/2011 Update:  JSON is the new XML.

    2/12/2013 Update: Companies are dropping support for XML in favor of JSON:

    --Fred

  2. Differences from HTML syntax

    Last Updated: 2/25/2001
    Applies to:  XML 1.0+

    Here are some ways in which XML is syntactically different from HTML.  Also noted are suggestions for how to write your HTML (even with older browsers) to be more like XML so that it is likely to comply with the emerging XHTML standard.

    1. End tags required.

      HTML allows <br> without </br>.  XML always requires an end tag, though it can be included in the begin tag using the <xxx /> format.

      HTML Suggestion:
        Use the <xxx /> format for tags that don't require an end tag. Be sure to put a space before the slash.  Otherwise, old browsers may ignore the tag entirely.

    2. Properly nested tags.

      HTML browsers tolerate improperly nested tags, as:
          <center><b>...</center></b>
      XML requires proper nesting.  Each tag must be ended before any enclosing tag is ended, as:
          <center><b>...</b></center>

      HTML Suggestion:
        Always nest tags properly.

       
    3. Case sensitive.

      XML is case sensitive.  The tag <address> is not the same as the tag <ADDRESS>.  XML itself does not mandate a case, but the common convention (and a requirement of XHTML) is to use all lowercase letters for tags and attributes.

      HTML Suggestion:
        Always use lowercase.

    4. Quotes required around attribute values.

      HTML allows:
          <tag attribute=value>
      XML always requires quotes (single or double), as:
          <tag attribute="value">
      or:
          <tag attribute='value'>

      HTML Suggestion:
        Always use quotes, even around numbers and percent values.

    5. Attribute values required.

      HTML allows:
          <tag attribute>
      XML always requires each attribute to have a value, as:
          <tag attribute="value">
      or:
          <tag attribute='value'>

      HTML Suggestion:
        Provide a dummy value when no value is required.  The XHTML convention is to repeat the attribute name as the value.
        Example:     <option selected='selected'>
        instead of:  <option selected>
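
      The five rules above can be checked with any XML parser.  The following is a minimal sketch using Python's standard xml.etree.ElementTree module.  The helper name is_well_formed and the sample snippets are assumptions made for illustration:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# HTML-style snippets, one per rule above, that an XML parser rejects.
REJECTED = [
    "<p>one<br>two</p>",             # 1. <br> has no end tag
    "<center><b>x</center></b>",     # 2. improperly nested
    "<Address>x</address>",          # 3. begin and end tags differ in case
    "<tag attribute=value></tag>",   # 4. unquoted attribute value
    "<option selected>x</option>",   # 5. attribute without a value
]

# The same snippets rewritten to follow the rules.
ACCEPTED = [
    "<p>one<br />two</p>",
    "<center><b>x</b></center>",
    "<address>x</address>",
    "<tag attribute='value'></tag>",
    '<option selected="selected">x</option>',
]

results = {snippet: is_well_formed(snippet) for snippet in REJECTED + ACCEPTED}
```

      Running each snippet through the parser this way is a quick test of whether a fragment of HTML is already XML-compatible.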

    6. Whitespace-sensitive.

      HTML parsers (browsers) ignore almost all whitespace.  This includes whitespace between tags, as:
      	<hr>	<p>

      as well as whitespace inside of tags, as:

      	<font    face =  "Arial"
      	>

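      XML parsers do not ignore whitespace between tags.  They pass it to the application as character data, so it may matter to the consumer of the data.  Whitespace inside a tag (around the = and before the >), including linebreaks, is permitted by the XML specification.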

      HTML Suggestion:
        This is a theoretical problem that has never been a problem for me (or anyone I know) in actual practice.  All XML parsers I've used seem relatively whitespace insensitive.  Don't worry about it for now.  Using whitespace for indentation and line breaks to format your HTML more readably is far too valuable to give up without a good reason.
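
      A minimal sketch of both cases, using Python's standard xml.etree.ElementTree module (the variable names are only illustrative).  Whitespace inside a tag is accepted, and whitespace between tags is preserved as text rather than discarded:

```python
import xml.etree.ElementTree as ET

# Whitespace inside a tag, including a line break, is accepted by the parser.
font = ET.fromstring('<font    face =  "Arial"    \n></font>')
face = font.get("face")

# Whitespace between tags is kept and handed to the application as text.
root = ET.fromstring("<a>\t<b></b>\n</a>")
leading = root.text       # the tab between <a> and <b>
trailing = root[0].tail   # the newline between </b> and </a>
```

      A consumer that only looks up elements by name never sees the difference, which may be why this rarely causes trouble in practice.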

    7. Predefined entities.

      HTML has hundreds of predefined "entities", including:
      	&nbsp;	non-breaking space
      	&copy;	copyright symbol
      	&amp;	ampersand (&)
      	&lt;	less than (<)
      	etc...

      XML has only five predefined entities:

      	&amp;	ampersand (&)
      	&quot;	double quote (")
      	&apos;	apostrophe (')
      	&lt;	less than (<)
      	&gt;	greater than (>)

      These are exactly the five that you need because they are a fundamental part of the XML syntax.  Since XML is the "eXtensible Markup Language", you can define as many additional entities as you want in your DTD.  (XML Schema has no mechanism for declaring entities.)  If you want to define some of the standard HTML entities, you can simply copy them into your own DTD from the DTD that defines HTML:
          http://www.w3.org/TR/html401/sgml/entities.html

      If you don't want to bother creating a DTD, there is an easier way.  In both XML and HTML, you can use the following to insert any special character via its numeric code:

      	&#nn;	where nn is a decimal number
      	&#xnn;	where nn is a hexadecimal number

      For example, in both HTML and XML, you can use &#160; instead of &nbsp; and &#169; instead of &copy;.  For a complete list of predefined HTML entities and their numeric codes, see:
          http://hotwired.lycos.com/webmonkey/reference/special_characters/

      Thanks to Howard Kapustein for reminding me about this difference.
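
      The entity rules above can be seen in a minimal sketch using Python's standard xml.etree.ElementTree module.  The helper name element_text is hypothetical.  The five predefined entities and the numeric codes parse without a DTD, while an HTML-only entity such as &copy; is rejected:

```python
import xml.etree.ElementTree as ET


def element_text(xml_text):
    """Return the text of the root element, or None if parsing fails."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError:
        return None


# The five predefined XML entities need no declaration.
predefined = element_text("<t>&amp; &quot; &apos; &lt; &gt;</t>")

# Numeric codes work for any character, in decimal or hexadecimal.
numeric = element_text("<t>&#169; &#xA9;</t>")
nbsp = element_text("<t>a&#160;b</t>")

# An HTML entity that no DTD has declared is an error in XML.
undefined = element_text("<t>&copy;</t>")
```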

    --Fred

  3. Tags vs Attributes

    Last Updated: 7/5/2000
    Applies to:  XML 1.0+

    Here are some things to keep in mind when choosing between a tag and an attribute to store a piece of data in XML:

    1. You can usually get away with either. For example:
      	<address city="Tucson" state="AZ"></address>

      or:

      	<address>
      		<city>Tucson</city>
      		<state>AZ</state>
      	</address>

    2. Attributes are the leaves of the tree.  You cannot nest an attribute or a tag inside of an attribute.  Only use an attribute when you are positive that the data is atomic and will never need to be further described.

    3. Attributes don't allow multiple values.  For example, you can write:
      	<parent>
      		<child>Billy</child>
      		<child>Mary</child>
      	</parent>

      but not:

      	<parent child="Billy" child="Mary"></parent>

      and you don't want to get stuck having to parse the multiple values out of a single attribute, as:

      	<parent child="Billy,Mary"></parent>
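
    The multiple-value cases above can be sketched with Python's standard xml.etree.ElementTree module.  The variable names are only illustrative.  Repeated child tags come back as a list, a repeated attribute is rejected by the parser, and a packed attribute has to be split by hand:

```python
import xml.etree.ElementTree as ET

# Repeated tags: the parser hands back each value separately.
nested = ET.fromstring(
    "<parent><child>Billy</child><child>Mary</child></parent>")
names_from_tags = [child.text for child in nested.findall("child")]

# Repeated attribute: not well-formed, so the parser refuses it.
try:
    ET.fromstring('<parent child="Billy" child="Mary"></parent>')
    duplicate_accepted = True
except ET.ParseError:
    duplicate_accepted = False

# Packed attribute: parses, but the consumer must split the values itself
# and must agree with the producer on the separator (a comma is assumed here).
packed = ET.fromstring('<parent child="Billy,Mary"></parent>')
names_from_attribute = packed.get("child").split(",")
```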

    --Fred

  4. Escaping special chars

    Last Updated: 7/5/2000
    Applies to:  XML 1.0+

    You can use CDATA in an XML document to escape all special characters in a block of text, rather than using entities for each special character, as:

    	<![CDATA[ this text escaped ]]>
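
    As a minimal sketch, the following Python compares the two approaches using the standard xml.etree.ElementTree and xml.sax.saxutils modules.  The sample text and the code element name are assumptions made for illustration.  Both forms give the parser's caller the same text.  Note that a CDATA section cannot itself contain the sequence ]]>, since that ends the section:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = 'if (a < b && b > c) { print("ok"); }'

# Option 1: replace each special character with an entity.
with_entities = "<code>" + escape(raw) + "</code>"

# Option 2: wrap the whole block in a CDATA section, leaving it untouched.
with_cdata = "<code><![CDATA[" + raw + "]]></code>"

text_from_entities = ET.fromstring(with_entities).text
text_from_cdata = ET.fromstring(with_cdata).text
```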

    --Fred

  5. XML Parsers

    1. Microsoft MSXML

      Last Updated: 2/2/2001
      Applies to:  XML 1.0+

      Microsoft includes an XML parser with IE 5.0+. You can also download it from Microsoft at:

              http://msdn.microsoft.com/xml/

      It is an ActiveX component, so you can use it from VB, ASP, IE, etc. 

      Keep in mind however that some aspects of XML (especially XSL) are evolving rapidly, so newer versions are not always compatible with older versions.  For example, MSXML version 2.5 supports an older XSL syntax for sorting, using the order-by attribute of elements like for-each and apply-templates, while the newer MSXML 3.0 supports the newer XSLT syntax, using the sort element.

      --Fred

      1. Using MSXML from VB

        Last Updated: 2/2/2001
        Applies to:  XML 1.0+, MSXML 2.0+, VB5+

        To use MSXML from a VB application, add a reference to your VB project via the Project | References... menu.  The one I use in this sample is: Microsoft XML, version 2.0

        You can then write code like:

            Dim xmlDOM As msxml.DOMDocument
            Set xmlDOM = New msxml.DOMDocument
        
            ' Load XML data from a URL into the XML DOM (Document 
            ' Object Model).
            xmlDOM.async = False
            xmlDOM.Load("some URL that returns an XML stream")
        
            ' Iterate over the XML DOM tree to get the list of items,
            ' loading them into a VB Combo Box.    
            cboItems.Clear
            Dim xmlNode As msxml.IXMLDOMNode
            For Each xmlNode In xmlDOM.childNodes(1).childNodes
                cboItems.AddItem xmlNode.Text
            Next

        You manipulate the objects in the XML DOM like any other objects in VB.  The names of the objects, methods, properties, parameters, etc., pop up automatically as you type, via VB's "Intellisense", just as they do for other VB objects.  To learn more about the objects in the DOM, hit F2 to view them in the VB Object Browser, or read the docs at the Microsoft MSDN Web site, currently:

                http://msdn.microsoft.com/library/psdk/xmlsdk/xml_9yg5.htm

        --Fred

      2. Using MSXML from IE via VBScript

        Last Updated: 2/2/2001
        Applies to:  XML 1.0+, MSXML 2.0+

        You can use MSXML from VBScript code running in IE5.  The code looks like:

            Dim xmlDOM
            Set xmlDOM = CreateObject("msxml.DOMDocument")
        
            ' Load XML data from a URL into the XML DOM (Document 
            ' Object Model).
            xmlDOM.async = False
            xmlDOM.Load("some URL that returns an XML stream")
        
            ' Iterate over the XML DOM tree to get the list of items,
            ' loading them into an HTML SELECT control via DHTML.
            document.all.selItems.length = 0
            Dim xmlNode
            For Each xmlNode In xmlDOM.childNodes(1).childNodes
                Dim optNew
                Set optNew = document.createElement("OPTION")
                optNew.text = xmlNode.childNodes(0).text
                optNew.value = optNew.text
                document.all.selItems.options.add(optNew)
            Next

        Note that you are manipulating 2 document object models here.  The XML DOM manipulations (the calls on xmlDOM and xmlNode) get the data, and the DHTML DOM manipulations (the calls on document and optNew) insert the data into the HTML SELECT control in the Web page.  To learn more about the objects in the DOMs, use the VB Object Browser, or read the docs at the Microsoft MSDN Web site, currently:

                XML DOM:     http://msdn.microsoft.com/library/psdk/xmlsdk/xml_9yg5.htm
                DHTML DOM: http://msdn.microsoft.com/workshop/author/dhtml/reference/dhtmlrefs.asp

        --Fred

      3. Using MSXML from IE via JavaScript

        Last Updated: 2/2/2001
        Applies to:  XML 1.0+, MSXML 2.0+

        You can use MSXML from JavaScript code running in IE5.  The code looks like:

            var xmlDOM = new ActiveXObject("msxml.DOMDocument");
        
            // Load XML data from a URL into the XML DOM (Document 
            // Object Model).
            xmlDOM.async = false;
            xmlDOM.load("some URL that returns an XML stream");
        
            // Iterate over the XML DOM tree to get the list of items,
            // loading them into an HTML SELECT control via DHTML.
            document.all.selItems.length = 0;
            var xmlNodes = xmlDOM.childNodes[1].childNodes;
            for (var xmlNode = xmlNodes.nextNode();
                 xmlNode;
                 xmlNode = xmlNodes.nextNode())
            {
                var optNew = document.createElement("OPTION");
                optNew.text = xmlNode.childNodes[0].text;
                optNew.value = optNew.text;
                document.all.selItems.options.add(optNew);
            }

        Note that you are manipulating 2 document object models here.  The XML DOM manipulations (the calls on xmlDOM, xmlNodes, and xmlNode) get the data, and the DHTML DOM manipulations (the calls on document and optNew) insert the data into the HTML SELECT control in the Web page.  To learn more about the objects in the DOMs, use the VB Object Browser, or read the docs at the Microsoft MSDN Web site, currently:

                XML DOM:     http://msdn.microsoft.com/library/psdk/xmlsdk/xml_9yg5.htm
                DHTML DOM: http://msdn.microsoft.com/workshop/author/dhtml/reference/dhtmlrefs.asp

        --Fred

  6. DTD Tips

    1. DTD Quick Reference

      Last Updated: 7/5/2000
      Applies to:  XML 1.0+

      1. !ATTLIST ElementName AttrName AttrType AttrDef AttrName AttrType AttrDef
      2. Attribute Type:
        CDATA Character data, no markup
        ID Unique ID value
        IDREF Reference to the ID of another element
        ENTITY, ENTITIES Name(s) of external entity
        NMTOKEN, NMTOKENS Only chars valid in a name (letters, digits, periods, dashes, underscores, colons)
        NOTATION Name of a notation
        (this|that) Alternation of literal values
        NOTATION(this|that) Alternation of notation names
      3. Attribute Default:
        #REQUIRED  
        #IMPLIED  
        #FIXED value  
        default_value  
      4. !DOCTYPE Name External_ID
      5. !ELEMENT - Tags and enclosed data.  Example:  <lastname>Stluka</lastname>
      6. Element Structure Symbols:
        | Alternation (one or the other)
        , Sequence (one, then the other)
        ? Zero or one
        <no symbol> Exactly one
        * Zero or more
        + One or more
        () Grouping
        ANY Anything goes
        EMPTY No contents allowed
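        #PCDATA Parsed character data (text in which entity references are expanded; no child elements)
        CDATA Unparsed character data (not an element content keyword in XML; used only as an attribute type or in a <![CDATA[ ... ]]> section)
      7. !ENTITY - Like a #define macro in C/C++.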
        1. Predefined General Entity
          &amp; Ampersand (&)
          &lt; Less than (<)
          &gt; Greater than (>)
          &apos; Apostrophe (')
          &quot; Double quote (")
        2. General Entity
          - Defined in DTD; used in document
          - Example:  <!ENTITY amp "&#38;#38;">
          - Example:  <!ENTITY ProductName "Super Duper Editor">
        3. Parameter Entity
          - Defined in DTD; used in DTD
          - Example:  <!ENTITY % Description "This is the description.">
          1. Can be an External Entity -- Evaluates to a URL string, the contents of which are substituted like a #include in C/C++.
      8. !NOTATION Name External_ID
        - Identifies an external program to handle some data.
      9. External_ID
        - Example:  SYSTEM "http://.../myname.dtd"
        - Example:  PUBLIC "-//Company//DTD//myname//EN" "http://.../myname.dtd"
      10. Comment - Syntax:   <!-- This is a comment -->
      11. Marked Section:
        1. <![CDATA[ ... ]]>
        2. <![IGNORE[ ... ]]>
        3. <![INCLUDE[ ... ]]>
      12. Tag - Example:  <lastname>
      13. Valid XML - Well-formed and validated against a DTD.
      14. Well-formed XML - Syntactically correct, whether or not it has been validated against a DTD.
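
      As a minimal sketch of a general entity in use, the following Python declares the ProductName entity from the list above in an internal DTD subset and lets the standard xml.etree.ElementTree module expand it.  The product element name and the sentence around the entity are assumptions made for illustration.  ElementTree does not validate against the DTD; it only expands the declared entity:

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE product [
  <!ENTITY ProductName "Super Duper Editor">
]>
<product>Welcome to &ProductName;!</product>"""

# The entity is defined in the DTD and used in the document; the parser
# substitutes its value before the application sees the text.
root = ET.fromstring(DOC)
expanded = root.text
```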

      --Fred

    2. DTD Conditional Compilation

      Last Updated: 7/5/2000
      Applies to:  XML 1.0+

      You can enable and disable sections of your DTD (similar to #if or #ifdef in C/C++), as:

      	<!ENTITY % part1 "IGNORE">
      	<!ENTITY % part2 "INCLUDE">
      	<![%part1;[ ... ]]>
      	<![%part2;[ ... ]]>

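      The first 2 lines define the parameter entities part1 and part2 to have the values IGNORE and INCLUDE, respectively.  Once the entity references are expanded, the last 2 lines look like:

      	<![IGNORE[ ... ]]>
      	<![INCLUDE[ ... ]]>

      so that the enclosed DTD statements are disabled or enabled.  To flip a section, change the value of its entity.  Note that conditional sections are allowed only in the external DTD subset, not in an internal subset embedded in the document.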

      --Fred

  7. See Also

    Last Updated: 7/5/2000
    Applies to:  XML 1.0+

    The following are good sources of info about XML:

    1. St. Laurent, Simon.  XML, A Primer.  Foster City, CA: MIS:Press, 1998.  ISBN 1-55828-592-X.
    2. McLaughlin, Brett.  Java and XML.  O'Reilly.  ISBN 0-596-00016-2.
    3. Box, Don, et al.  Essential XML, Beyond Markup.  Addison Wesley.  ISBN 0-201-70914-7.
    4. Britt, James, et al.  Professional Visual Basic 6 XML.  Wrox.  ISBN 1-861003-32-3.

    --Fred

©Copyright 2000-2021, Bristle Software, Inc.  All rights reserved.