This page is offered as a service of Bristle Software, Inc. New tips are sent to an associated mailing list when they are posted here. Please send comments, corrections, any tips you'd like to contribute, or requests to be added to the mailing list, to tips@bristle.com.
Original Version: 5/11/1999
Last Updated: 2/12/2013
Applies to: XML 1.0+
XML (eXtensible Markup Language) is syntactically similar to HTML (HyperText Markup
Language). They both consist basically of regular text marked up with tags and
attributes. However, the purposes of XML and HTML are very different. The
purpose of HTML is to describe the layout and physical appearance of the embedded text, to
be displayed as a Web page. The purpose of XML is to describe the structure and
semantics of the embedded text, to be manipulated programmatically as data. With
HTML, you use the one set of predefined tags with their predefined display-oriented
meanings. With XML, you choose a set of tags that are in common use in your
application area, or invent your own set.
Prior to XML, you could pass data between programs in proprietary formats, or as
databases, or as CSV (comma-separated values) files. The producer and consumer of
the data generally needed a common understanding of how the data was packaged so that the
consumer could correctly interpret what the producer generated. To send an address,
there had to be agreement about how to recognize the street, city, state, etc. as parts of
the address. For example, it might be agreed that the 3rd field in a CSV file was
the state, as:
"123 Main St", "Malvern", "PA", "19355"
With XML, the data stream is self-describing:
<address>
  <street>123 Main St</street>
  <city>Malvern</city>
  <state>PA</state>
  <zip>19355</zip>
</address>
This structured approach lends itself to reusable parsing tools, automatic data transformations, debugging tools, and other reuse opportunities. The consumer no longer has to write its own parser to interpret the incoming data stream. It simply uses an existing XML parser, asking it for the value of "state", for example.
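For instance, here is a minimal sketch using Python's standard xml.etree.ElementTree module. It stands in for "an existing XML parser" and is only one choice among many, since the tip does not name a particular one. The consumer asks for the state by name instead of by field position:

```python
import xml.etree.ElementTree as ET

# The self-describing address record from the example above.
doc = """<address>
  <street>123 Main St</street>
  <city>Malvern</city>
  <state>PA</state>
  <zip>19355</zip>
</address>"""

root = ET.fromstring(doc)       # an off-the-shelf parser does all the parsing
state = root.findtext("state")  # look the field up by its tag name, not its position
print(state)                    # prints PA
```

If the producer later reorders the fields or adds new ones, this lookup keeps working. A CSV consumer that relies on "the 3rd field" would not.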
The tags and attributes can be formally defined via a DTD (Document Type Definition) or a more modern XML Schema. These allow you to specify the acceptable nesting of tags (street must be inside address), which tags and attributes are optional, default values, data types, etc. DTDs did not originate with XML. There is also a DTD that defines the HTML syntax. (In fact, in loose terms, HTML is an instance of SGML, the older and more complex language from which XML was derived, with a specific set of tags and attributes. XHTML is the corresponding instance of XML.) XML Schemas are a more powerful replacement for DTDs, with several advantages that will be discussed in future tips.
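A DTD only constrains a document when a validating parser checks the document against it. As an illustration, Python's xml.etree.ElementTree is built on the non-validating expat parser. It accepts a document that breaks its own internal DTD, as long as the document is well-formed. The DTD below is a hypothetical one written for the address example:

```python
import xml.etree.ElementTree as ET

# The internal DTD requires street, city, state and zip, in that order.
doc = """<!DOCTYPE address [
  <!ELEMENT address (street, city, state, zip)>
  <!ELEMENT street (#PCDATA)>
  <!ELEMENT city (#PCDATA)>
  <!ELEMENT state (#PCDATA)>
  <!ELEMENT zip (#PCDATA)>
]>
<address><city>Malvern</city></address>"""

# expat checks well-formedness only, so this invalid document still parses.
root = ET.fromstring(doc)
print(root.find("street") is None)  # prints True
```

Enforcing the DTD requires a validating parser. That is a separate tool choice, and this sketch does not cover it.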
Don't underestimate the value of XML. It is the single most important development in the computer field since ASCII text. In the long run, it will have more impact than even HTML. HTML is a less-than-perfect but very widely adopted standard for marking up documents for display. Its popularity made the World Wide Web possible. XML is a less-than-perfect but very widely adopted standard for the exchange of data. As with HTML, the value of the standard is that it is good enough for now, and already wildly popular. Henceforth, we will always have a standard. The current version of XML may be replaced soon and often by successively better standards, but there will never again be a day when there is no standard for exchange of data between different types of computer systems.
11/18/2011 Update: JSON is the new XML.
2/12/2013 Update: Companies are dropping support for XML in favor of JSON:
--Fred
Last Updated: 2/25/2001
Applies to: XML 1.0+
Here are some ways in which XML is syntactically different from HTML. Also noted are suggestions for how to write your HTML (even with older browsers) to be more like XML so that it is likely to comply with the emerging XHTML standard.
HTML tolerates whitespace between tags, as:
<hr> <p>
as well as whitespace inside of tags, as:
<font face = "Arial" >
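XML parsers are sometimes said to be less tolerant of such whitespace, especially line breaks. In fact, the XML 1.0 grammar allows whitespace, including line breaks, between tags, around the = of an attribute, and before the closing > of a tag. Only whitespace immediately after the opening < (or </) is forbidden.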
HTML Suggestion: This has never been a problem for me (or anyone I know) in actual practice. All XML parsers I've used are relatively whitespace-insensitive, so don't worry about it for now. Using whitespace for indentation and line breaks to format your HTML readably is far too valuable to give up without a good reason.
HTML has many predefined entities, such as:
&nbsp; non-breaking space
&copy; copyright symbol (©)
&amp; ampersand (&)
&lt; less than (<)
etc...
XML has only five predefined entities:
&amp; ampersand (&)
&quot; double quote (")
&apos; apostrophe (')
&lt; less than (<)
&gt; greater than (>)
These are exactly the five that you need, because the characters they stand for are a fundamental part of the XML syntax. Since XML is the "eXtensible Markup Language", you can define as many additional entities as you want in your DTD. (XML Schema has no way to declare entities, so entities always require a DTD.) If you want to define some of the standard HTML entities, you can copy them into your own DTD from the DTD that defines HTML. Drop the SGML-only CDATA keyword and the -- comments from each declaration first:
http://www.w3.org/TR/html401/sgml/entities.html
If you don't want to bother creating a DTD, there is an easier way. In both XML and HTML, you can use the following to insert any special character via its numeric code:
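Here is a minimal sketch of the entity rules above, again using Python's xml.etree.ElementTree purely for illustration. The five predefined entities need no declaration. An HTML entity such as &nbsp; is rejected until it is declared. A declaration in the document's own internal DTD subset is enough. The declaration uses plain XML syntax and is not copied verbatim from the HTML file:

```python
import xml.etree.ElementTree as ET

# The five predefined entities work in any XML document, with no DTD at all.
five = ET.fromstring("<t>&amp; &quot; &apos; &lt; &gt;</t>").text
print(five)  # prints & " ' < >

# An HTML entity such as &nbsp; is undefined in plain XML, so parsing fails.
try:
    ET.fromstring("<t>a&nbsp;b</t>")
    nbsp_ok = True
except ET.ParseError:
    nbsp_ok = False
print(nbsp_ok)  # prints False

# Declaring the entity in the internal DTD subset makes the same text legal.
doc = """<!DOCTYPE t [
  <!ENTITY nbsp "&#160;">
]>
<t>a&nbsp;b</t>"""
declared = ET.fromstring(doc).text
print(declared == "a\u00a0b")  # prints True
```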
&#nn; where nn is a decimal number
&#xnn; where nn is a hexadecimal number
For example, in both HTML and XML, you can use &#160; instead of &nbsp; and &#169; instead of &copy;
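As a quick check with Python's xml.etree.ElementTree (an illustrative parser choice), decimal and hexadecimal character references resolve to the same characters in a document that has no DTD at all:

```python
import xml.etree.ElementTree as ET

# &#169; / &#xA9; is the copyright sign; &#160; / &#xA0; is a non-breaking space.
text = ET.fromstring("<t>&#169; &#xA9; &#160; &#xA0;</t>").text
print(text == "\u00a9 \u00a9 \u00a0 \u00a0")  # prints True
```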
For a complete list of predefined HTML entities and their numeric codes, see:
http://hotwired.lycos.com/webmonkey/reference/special_characters/
Thanks to Howard Kapustein for reminding me about this difference.
--Fred
Last Updated: 7/5/2000
Applies to: XML 1.0+
Here are some things to keep in mind when choosing between a tag and an attribute to store a piece of data in XML:
A value that occurs only once can be stored either as an attribute, as:
<address city="Tucson" state="AZ"></address>
or:
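<address>
  <city>Tucson</city>
  <state>AZ</state>
</address>
A value that can occur more than once should be stored as repeated tags, as:
<parent>
  <child>Billy</child>
  <child>Mary</child>
</parent>
but not as repeated attributes. The following is not even well-formed XML, because an element may not have two attributes with the same name:
<parent child="Billy" child="Mary"></parent>
You also don't want to get stuck having to parse the multiple values out of a single attribute, as:
<parent child="Billy,Mary"></parent>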
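This minimal sketch uses Python's xml.etree.ElementTree as an example parser to show the three cases above. Repeated child tags come back as separate nodes. A duplicated attribute is rejected outright. A packed attribute leaves the splitting, and any escaping of the separator, to your own code:

```python
import xml.etree.ElementTree as ET

# Repeated values as child elements: each one is its own node.
parent = ET.fromstring("<parent><child>Billy</child><child>Mary</child></parent>")
children = [c.text for c in parent.findall("child")]
print(children)  # prints ['Billy', 'Mary']

# The same attribute twice on one element is not well-formed XML.
try:
    ET.fromstring('<parent child="Billy" child="Mary"></parent>')
    duplicate_ok = True
except ET.ParseError:
    duplicate_ok = False
print(duplicate_ok)  # prints False

# Several values packed into one attribute must be split by hand.
packed = ET.fromstring('<parent child="Billy,Mary"></parent>').get("child").split(",")
print(packed)  # prints ['Billy', 'Mary']
```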
--Fred
Last Updated: 7/5/2000
Applies to: XML 1.0+
You can use CDATA in an XML document to escape all special characters in a block of text, rather than using entities for each special character, as:
<![CDATA[ this text escaped ]]>
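As an illustration with Python's xml.etree.ElementTree, a CDATA section and the equivalent entity-escaped text produce the same character data. Inside the section, < and & need no escaping. The only sequence that cannot appear there is ]]>, because it ends the section:

```python
import xml.etree.ElementTree as ET

# Raw < and & are allowed inside the CDATA section.
code = ET.fromstring("<script><![CDATA[if (a < b && c > d) { x = 1; }]]></script>").text
print(code)  # prints if (a < b && c > d) { x = 1; }

# The same content written with entities instead of CDATA.
escaped = ET.fromstring(
    "<script>if (a &lt; b &amp;&amp; c &gt; d) { x = 1; }</script>").text
print(code == escaped)  # prints True
```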
--Fred
Last Updated: 2/2/2001
Applies to: XML 1.0+
Microsoft includes an XML parser with IE 5.0+. You can also download it from Microsoft at:
http://msdn.microsoft.com/xml/
It is an ActiveX component, so you can use it from VB, ASP, IE, etc.
Keep in mind, however, that some aspects of XML (especially XSL) are evolving rapidly, so newer versions are not always compatible with older versions. For example, MSXML version 2.5 supports an older XSL syntax for sorting, using the order-by attribute of elements like xsl:for-each and xsl:apply-templates, while the newer MSXML 3.0 supports the newer XSLT syntax, using the xsl:sort element.
--Fred
Last Updated: 2/2/2001
Applies to: XML 1.0+, MSXML 2.0+, VB5+
To use MSXML from a VB application, add a reference to your VB project via the Project | References... menu. The one I use in this sample is: Microsoft XML, version 2.0
You can then write code like:
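Dim xmlDOM As msxml.DOMDocument
Set xmlDOM = New msxml.DOMDocument

' Load XML data from a URL into the XML DOM (Document
' Object Model).  Load returns False if the data can't be parsed.
' (Exit Sub assumes this code is inside a Sub, such as Form_Load.)
xmlDOM.async = False
If Not xmlDOM.Load("some URL that returns an XML stream") Then
    MsgBox "XML error: " & xmlDOM.parseError.reason
    Exit Sub
End If

' Iterate over the XML DOM tree to get the list of items,
' loading them into a VB Combo Box.  documentElement is the
' root element, even when an <?xml ...?> declaration precedes it.
cboItems.Clear
Dim xmlNode As msxml.IXMLDOMNode
For Each xmlNode In xmlDOM.documentElement.childNodes
    cboItems.AddItem xmlNode.Text
Next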
You manipulate the objects in the XML DOM like any other objects in VB. The names of the objects, methods, properties, parameters, etc., pop up automatically as you type, via VB's "Intellisense", just as they do for other VB objects. To learn more about the objects in the DOM, hit F2 to view them in the VB Object Browser, or read the docs at the Microsoft MSDN Web site, currently:
http://msdn.microsoft.com/library/psdk/xmlsdk/xml_9yg5.htm
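For comparison, here is a rough Python analogue of the same DOM walk. It uses the standard xml.dom.minidom module instead of MSXML, so the object names differ from the MSXML ones. The inline XML string is a hypothetical stand-in for "some URL that returns an XML stream". The loop collects item text into a list instead of filling a combo box:

```python
from xml.dom import minidom

# Hypothetical sample data standing in for the XML stream from a URL.
xml_text = """<?xml version="1.0"?>
<items>
  <item>Apples</item>
  <item>Pears</item>
</items>"""

dom = minidom.parseString(xml_text)
items = []
# documentElement is the root element (<items>), whatever precedes it.
for node in dom.documentElement.childNodes:
    if node.nodeType != node.ELEMENT_NODE:
        continue  # skip the whitespace text nodes between elements
    # Join the element's text children, roughly what MSXML's .Text returns here.
    items.append("".join(child.data for child in node.childNodes
                         if child.nodeType == child.TEXT_NODE))
print(items)  # prints ['Apples', 'Pears']
```

The explicit node-type check is needed because minidom keeps the indentation whitespace as text nodes.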
--Fred
Last Updated: 2/2/2001
Applies to: XML 1.0+, MSXML 2.0+
You can use MSXML from VBScript code running in IE5. The code looks like:
Dim xmlDOM
Set xmlDOM = CreateObject("msxml.DOMDocument")

' Load XML data from a URL into the XML DOM (Document
' Object Model).  (A VBScript call statement takes no parentheses.)
xmlDOM.async = False
xmlDOM.Load "some URL that returns an XML stream"

' Iterate over the XML DOM tree to get the list of items,
' loading them into an HTML SELECT control via DHTML.
' documentElement is the root element, even after an <?xml ...?> declaration.
document.all.selItems.length = 0
Dim xmlNode, optNew
For Each xmlNode In xmlDOM.documentElement.childNodes
    Set optNew = document.createElement("OPTION")
    optNew.text = xmlNode.childNodes(0).text
    optNew.value = optNew.text
    document.all.selItems.options.add optNew
Next
Note that you are manipulating two document object models here. The XML DOM (xmlDOM, xmlNode) is used to get the data, and the DHTML DOM (document.all.selItems, optNew) is used to insert the data into the HTML SELECT control in the Web page. To learn more about the objects in the DOMs, use the VB Object Browser, or read the docs at the Microsoft MSDN Web site, currently:
XML DOM: http://msdn.microsoft.com/library/psdk/xmlsdk/xml_9yg5.htm
DHTML DOM: http://msdn.microsoft.com/workshop/author/dhtml/reference/dhtmlrefs.asp
--Fred
Last Updated: 2/2/2001
Applies to: XML 1.0+, MSXML 2.0+
You can use MSXML from JavaScript code running in IE5. The code looks like:
var xmlDOM = new ActiveXObject("msxml.DOMDocument");

// Load XML data from a URL into the XML DOM (Document
// Object Model).
xmlDOM.async = false;
xmlDOM.load("some URL that returns an XML stream");

// Iterate over the XML DOM tree to get the list of items,
// loading them into an HTML SELECT control via DHTML.
// documentElement is the root element, even after an <?xml ...?> declaration.
document.all.selItems.length = 0;
var xmlNodes = xmlDOM.documentElement.childNodes;
for (var xmlNode = xmlNodes.nextNode(); xmlNode; xmlNode = xmlNodes.nextNode()) {
    var optNew = document.createElement("OPTION");
    optNew.text = xmlNode.childNodes.item(0).text;
    optNew.value = optNew.text;
    document.all.selItems.options.add(optNew);
}
Note that you are manipulating two document object models here. The XML DOM (xmlDOM, xmlNodes, xmlNode) is used to get the data, and the DHTML DOM (document.all.selItems, optNew) is used to insert the data into the HTML SELECT control in the Web page. To learn more about the objects in the DOMs, use the VB Object Browser, or read the docs at the Microsoft MSDN Web site, currently:
XML DOM: http://msdn.microsoft.com/library/psdk/xmlsdk/xml_9yg5.htm
DHTML DOM: http://msdn.microsoft.com/workshop/author/dhtml/reference/dhtmlrefs.asp
--Fred
Last Updated: 7/5/2000
Applies to: XML 1.0+
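Attribute types:
  CDATA                 Character data, no markup
  ID                    Unique ID value
  IDREF, IDREFS         Reference(s) to the ID of another element
  ENTITY, ENTITIES      Name(s) of unparsed external entities declared in the DTD
  NMTOKEN, NMTOKENS     Name token(s): only chars valid in a name (letters, digits, periods, dashes, underscores, colons)
  NOTATION (this|that)  Name of one of the listed notations
  (this|that)           One of the listed literal values

Attribute defaults:
  #REQUIRED             A value must be specified on every element
  #IMPLIED              Optional, with no default value
  #FIXED value          Always has this value; if specified, it must match
  default_value         Value used when the attribute is omitted

Element content model symbols:
  a | b                 Alternation (one or the other)
  a, b                  Sequence (one, then the other)
  a?                    Zero or one
  a                     Exactly one (no symbol)
  a*                    Zero or more
  a+                    One or more
  (a, b)                Grouping

Element content keywords:
  ANY                   Anything goes
  EMPTY                 No contents allowed
  #PCDATA               Parsed character data: text in which entity references are expanded; use (#PCDATA | a | b)* to allow text mixed with child elements
  (no #CDATA)           There is no #CDATA content keyword; CDATA is only an attribute type. To include unparsed text, use a CDATA section (<![CDATA[ ... ]]>) in the document.

Predefined entities:
  &amp;                 Ampersand (&)
  &lt;                  Less than (<)
  &gt;                  Greater than (>)
  &apos;                Apostrophe (')
  &quot;                Quote (")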
--Fred
Last Updated: 7/5/2000
Applies to: XML 1.0+
You can enable and disable sections of your DTD (similar to #if or #ifdef in C/C++), as:
<!ENTITY % part1 "IGNORE">
<!ENTITY % part2 "INCLUDE">
<![%part1;[ ... ]]>
<![%part2;[ ... ]]>
The first 2 lines define the parameter entities part1 and part2, with the values IGNORE and INCLUDE. The second 2 lines are expanded to:
<![IGNORE[ ... ]]>
<![INCLUDE[ ... ]]>
so that the DTD statements enclosed in the first section are disabled and those in the second are enabled. To flip a section, change the value of its entity. Note that conditional sections are allowed only in the external DTD subset (a separate .dtd file), not in the internal subset inside a document's DOCTYPE.
--Fred
Last Updated: 7/5/2000
Applies to: XML 1.0+
The following are good sources of info about XML:
--Fred
©Copyright 2000-2021, Bristle Software, Inc. All rights reserved.