Section 6: XML Best Practice
What are valid XML documents?
An XML document can go wrong in several places. A valid document is one that is well formed, meaning its markup follows the syntax rules of XML, and that also conforms to its document type definition (DTD): every element and attribute it uses is declared, and each one appears where the DTD allows it.
To create a valid XML document, you need to create a DTD. For those who dislike DTDs this can be a bit of a hassle, but it’s necessary if the program reading the XML document is going to know which elements are allowed and how they fit together.
A stylesheet, whether it’s CSS or XSL, is not required for validity, but the documents in this section reference one so that the content can be formatted properly for the screen or for another program.
Open up your preferred text editor or XML editor, and begin a new document. Make your XML declaration, reference a stylesheet (it can be a made-up stylesheet name for this exercise), and then create a short inline DTD (defining at least 3 elements.)
Once you’ve finished this, go ahead and create a brief XML document, choosing one of your elements as a root element and placing some brief content within the other two.
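Your finished document should look a little something like this (the element and stylesheet names here are only one possible choice):
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="cats.css"?>
<!DOCTYPE cat [
  <!ELEMENT cat (name, color)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT color (#PCDATA)>
]>
<cat>
  <name>Tooter</name>
  <color>Black</color>
</cat>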
If you wish, you could then go in and create your stylesheet file so that the document displays with formatting. The stylesheet affects only how the document looks; whether it is valid depends on the DTD.
What are well formed XML documents?
A well-formed XML document follows the basic syntax rules of XML: it has a single root element, every opening tag has a matching closing tag, elements are nested without overlapping, and attribute values are quoted. You don’t need to use a DTD. The elements are defined by their usage, not by a set of pre-made rules that you declare early in the file. A stylesheet can still be used to format the final work.
Though there are many places where you can run your code together without encountering errors, doing so makes it more difficult to troubleshoot or modify your code later (and extremely difficult for someone else to do anything with it!). Whenever possible, you should use appropriate white space and indentation to make your code more easily readable.
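As a rough sketch of what well-formedness means in practice, the hypothetical document below (the `pets` root element is made up for illustration) obeys the basic syntax rules, and the closing comment lists a few common mistakes that a parser would reject:

```xml
<?xml version="1.0"?>
<!-- Well formed: one root element, every tag closed, tags nested in order -->
<pets>
  <cat name="Tooter">Fat black cat</cat>
  <dog name="Shadow">Wonderful dog</dog>
</pets>
<!-- Each of these would break well-formedness if placed inside pets:
     <cat>Tooter                      (no closing tag)
     <cat><name>Tooter</cat></name>   (tags overlap)
     <cat name=Tooter>...</cat>       (attribute value not quoted)
     <Cat>Tooter</cat>                (names are case-sensitive)
-->
```

None of this requires a DTD; a parser can check all of these rules from the markup alone.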
The early work that you did with stylesheets and XML documents was an example of creating a well-formed XML document. Open up your editor and go through the same process, but talk about something other than just Tooter and Shade this time. The stylesheet reference needs to be there, but it doesn’t need to be an actual stylesheet at this point.
Your finished document might look a little something like this:
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="dogs.css"?>
<dogs_info>
  <dogs>Tamerlane, Shadow, and Jake are wonderful dogs!</dogs>
  <neighbor_dogs>Of course, Lucy and Sarah are good dogs, too.</neighbor_dogs>
</dogs_info>
The display properties of each element should be defined in dogs.css, just as they were when dealing with our cat examples.
How do you go about validating XML documents?
There are several ways to validate XML documents, both from your own computer and online.
As an example of one way to validate an XML document, open up an internet browser such as Microsoft Internet Explorer. Your browser may have validating tools built in, or you may have to download them from the website of the browser’s creator. In the case of Internet Explorer, the validation software is a file known as iexmltls.exe, which can be downloaded from Microsoft at http://www.microsoft.com/downloads/. Load your XML document using this tool, and it will scan the code for errors. If the document is well formed and conforms to its DTD, it will be reported as valid.
There are also online validation checkers, such as the one located at one of the subpages of the Edinburgh Language Technology Group at http://www.ltg.ed.ac.uk/~richard/xml-check.html. To use this validation technique, you simply input the URL of your XML document and select the appropriate options. The program runs, checks the validity of your document, and then displays the results for you.
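To see what a validator adds over a plain well-formedness check, consider this sketch, which assumes an inline DTD written for the dog example. The document is well formed, so a non-validating parser would accept it, but a validating one should report an error because a required element is missing:

```xml
<?xml version="1.0"?>
<!DOCTYPE dogs_info [
  <!ELEMENT dogs_info (dogs, neighbor_dogs)>
  <!ELEMENT dogs (#PCDATA)>
  <!ELEMENT neighbor_dogs (#PCDATA)>
]>
<!-- Well formed, but not valid: the DTD requires neighbor_dogs after dogs -->
<dogs_info>
  <dogs>Tamerlane, Shadow, and Jake are wonderful dogs!</dogs>
</dogs_info>
```

Adding the missing neighbor_dogs element after dogs would make the document valid again.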
What are rules for naming elements?
The rules for element names are fairly open, though it’s best to choose names that are easy to read and that make it clear what data the element contains.
The basic rules for naming elements are as follows:
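Element names can contain letters, numbers, and certain punctuation characters (the underscore, hyphen, period, and colon)
Element names start with a letter or an underscore; they cannot start with a number, a hyphen, or a period
Element names cannot start with the letters ‘xml’ (in any mix of upper and lower case), which are reserved for the XML standard itself
Element names cannot contain spaces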
For a bit of practice, consider the following incorrect XML names. Using the same general name, create a name that follows the rules above.
<xmlData>
<4Cats>
<?s_for_Bob>
<cat info>
Possible answers to this practice are:
<DataForXml>
<FourCats>
<Questions_for_Bob>
<cat_info>
Note the use of capitalization to make reading the names easier. Also, while the period, the colon, and the hyphen are allowed in names, you should avoid them. The colon is reserved for namespaces, and software that processes your data may read a hyphen as a minus sign or a period as something other than part of a name. You should also be careful when using non-English characters: XML itself permits them, but not every program that handles your file will necessarily cope with them.
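The naming rules can be tried out in a small test file. The sketch below uses a made-up root element, `name_examples`, and relies on the fact that the XML specification lets a name begin with a letter or an underscore. The illegal names appear only inside a comment, since putting them in real tags would stop the parser:

```xml
<?xml version="1.0"?>
<name_examples>
  <!-- Legal: starts with a letter -->
  <FourCats>Four cats live here.</FourCats>
  <!-- Legal: starts with an underscore -->
  <_notes>Names may begin with an underscore.</_notes>
  <!-- Legal but discouraged: hyphen and period -->
  <cat-info>Accepted by the parser, but best avoided.</cat-info>
  <cat.info>Accepted by the parser, but best avoided.</cat.info>
  <!-- Not legal, so shown only in this comment: <4Cats>, <cat info>, <?s_for_Bob> -->
</name_examples>
```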
How do you use attributes?
Attributes can be used in a variety of ways, both to contain data and to help further define the data that’s held by elements. They can provide information that’s not included in the element itself but can make a difference in the way that processors handle the data.
Consider the following example:
<cat name="Tooter">Fat black cat</cat>
The attribute here is name, and its value, “Tooter”, might be used to reference other information, ranging from links, to pictures, to additional information about everyone’s favorite fat black cat.
Quotation marks must always be used around attribute values, but they can be either single or double quotes. If the attribute value contains double quotes, you should use single quotes around it; if it contains single quotes, use double quotes around it.
Examples:
<actor name='Robert "Bob" Chapin'>
<swordmaster name="Robert 'Bob' Chapin">
If your document uses a DTD, just be sure that you declare your attributes in it before using them, or else a validating processor will report them as errors.
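As a minimal sketch of declaring attributes, the inline DTD below declares the name attribute from the cat example with an ATTLIST declaration. The nickname attribute is hypothetical, added only to show the predefined `&quot;` entity, which lets a double quote appear inside a double-quoted value:

```xml
<?xml version="1.0"?>
<!DOCTYPE cat [
  <!ELEMENT cat (#PCDATA)>
  <!-- Declare the name attribute: character data, and required -->
  <!ATTLIST cat name CDATA #REQUIRED>
  <!-- Hypothetical optional attribute, added for illustration -->
  <!ATTLIST cat nickname CDATA #IMPLIED>
]>
<cat name="Tooter" nickname="&quot;Toot&quot;">Fat black cat</cat>
```

A well-formed document without a DTD can use attributes freely; the declarations matter only when the document is being validated.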
How do you use comments?
Good use of comments can make troubleshooting and reading code much easier. Comments can tell you what each section of elements does, which attributes are used for what, and can also contain copyright and author information.
As an example, let’s refer to the dog example that I used earlier. You might remember that it looked something like this:
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="dogs.css"?>
<dogs_info>
  <dogs>Tamerlane, Shadow, and Jake are wonderful dogs!</dogs>
  <neighbor_dogs>Of course, Lucy and Sarah are good dogs, too.</neighbor_dogs>
</dogs_info>
Using comments, add a little information about the creator of the file, when it was made, and a description of what the dogs_info element is going to contain.
Don’t forget to be careful where you place your comments, and remember to close them when you’re done. In particular, the XML declaration must be the very first thing in the file, so even a copyright comment has to come after it. The commented version might look something like this:
<?xml version="1.0"?>
<!-- Copyright 2005 AnotherPuppyPage.net -->
<?xml-stylesheet type="text/css" href="dogs.css"?>
<!-- Information about my favorite dogs -->
<dogs_info>
  <dogs>Tamerlane, Shadow, and Jake are wonderful dogs!</dogs>
  <neighbor_dogs>Of course, Lucy and Sarah are good dogs, too.</neighbor_dogs>
</dogs_info>
Someone else reading the source file would then be able to easily determine who created the file, when, and what was coming without having to read all of the information contained within the root element.
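For reference, here is a short sketch of where comments may and may not go, reusing the dog example. A comment opens with `<!--` and closes with `-->`, and the text in between must not contain two hyphens in a row:

```xml
<?xml version="1.0"?>
<!-- Comments may appear after the XML declaration, never before it -->
<dogs_info>
  <!-- Comments may sit between elements -->
  <dogs>Tamerlane, Shadow, and Jake are wonderful dogs!</dogs>
  <!-- A comment can hide markup: <neighbor_dogs>Lucy</neighbor_dogs> -->
</dogs_info>
<!-- Not allowed: a comment inside a tag, or a double hyphen within a comment -->
```

Because the parser ignores everything inside a comment, commenting out an element, as in the third comment above, is a handy way to disable it temporarily while troubleshooting.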