Learn XML Programming

How is the XML declaration used?

Ok, I’m going to let you in on a secret.  The XML declaration, much like the DOCTYPE declaration in HTML documents, is actually optional (at least in XML 1.0).  It doesn’t have to be there… many programs can still handle XML data fine even if it’s not.  So why should you even include it?

The answer is simple, really… the declaration makes sure that whatever encounters the file knows that it’s XML.  If it’s a human user looking at the file, then they know what they’re seeing and which version of it they’re going to be dealing with.  If it’s a program, the declaration confirms that the file should be processed as XML, and as which version, in case the program didn’t already assume so.  Besides, it’s just one line… that’s not a lot of typing for a guaranteed statement that you’re using XML.

Of course, there are different ways to write your XML declaration… the one that’s been referenced here is just the most basic.  In all, there are four common forms, depending upon the information that you want to pass on:

<?xml version="1.0"?>

<?xml version="1.0" encoding="UTF-8"?>

<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>

<?xml version="1.0" standalone="yes"?>

Notice that all four give the version of XML that’s being used.  The second introduces encoding, which names the character set that the document is written in.  The third lists a different encoding (notice that there is more than one option), and also introduces standalone… a way to let the reader or program know whether or not the XML document depends on declarations in an external DTD.  And of course, the fourth omits the encoding but keeps the standalone information.  Whichever one you use, the pieces always appear in the same order: version first, then encoding, then standalone.
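To see what a program actually gets out of each declaration, here is a minimal Python sketch using the standard library’s xml.parsers.expat module.  The helper name read_declaration and the <cats/> sample documents are made up for illustration; only the parser calls themselves come from the library.

```python
import xml.parsers.expat


def read_declaration(xml_bytes):
    """Return the fields of the XML declaration, or {} if there is none."""
    info = {}

    def on_declaration(version, encoding, standalone):
        info["version"] = version
        info["encoding"] = encoding  # None when the declaration omits it
        # expat reports standalone as 1 (yes), 0 (no) or -1 (omitted)
        info["standalone"] = {1: "yes", 0: "no"}.get(standalone)

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_declaration
    parser.Parse(xml_bytes, True)  # True marks the end of the input
    return info


# Hypothetical sample documents: one with no declaration, then the four forms.
samples = [
    b'<cats/>',
    b'<?xml version="1.0"?><cats/>',
    b'<?xml version="1.0" encoding="UTF-8"?><cats/>',
    b'<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?><cats/>',
    b'<?xml version="1.0" standalone="yes"?><cats/>',
]

for sample in samples:
    print(read_declaration(sample))
```

The first sample has no declaration at all, so the handler never runs and the result is an empty dictionary… a small reminder that the declaration really is optional.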

So what does all of this mean in regards to creating XML?  Not a lot, really.  The basic declaration is all that you’ll need for most of your XML projects, as long as your files are saved as UTF-8 (or UTF-16)… that’s what a parser assumes when no encoding is given.  If you save your document in a different character set, such as ISO-8859-1, then you do need to list the encoding; otherwise the parser may stop with an error at the first character that it can’t decode.  The standalone option matters far less, and most documents can simply leave it out.  If you really want to be thorough, though, feel free to use one of the more elaborate declarations… after all, if you’ve got the information in the declaration then you know that you’ve covered all of your bases.

How do you go about including processing instructions?

Processing instructions are simply short lines, wrapped in <? and ?>, that pass instructions along to the program accessing your document… very often, where to go to find additional information needed to process the document.  You’ve been doing this already… every time you declare a stylesheet, as in the example of:

<?xml-stylesheet type="text/css" href="cats.css"?>

you’re using processing instructions.  When the browser or program reaches that line, the XML document is basically telling it, “Hey!  There’s more information that you need in order to do this right… you need a stylesheet.  It’s a CSS file, written in text, and it’s located over there at cats.css.”

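Look back through some of the examples that you’ve seen and some of the work that you’ve done.  See the processing instructions?  They all follow the same pattern: <?, a target name that says who the instruction is for, the instruction itself, and then ?>.  Don’t confuse them with the <!DOCTYPE> line that you use when referencing external DTD’s, though… that’s a document type declaration, not a processing instruction, even though it also sends the program somewhere else so that it can gather the information that it needs and then come back.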
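If you’d like to see a processing instruction the way a program sees it, here is a small Python sketch using the standard library’s xml.dom.minidom module.  The function name list_processing_instructions and the little cats document are assumptions made for this example.

```python
from xml.dom import minidom


def list_processing_instructions(xml_text):
    """Return (target, data) pairs for top-level processing instructions."""
    document = minidom.parseString(xml_text)
    return [
        (node.target, node.data)
        for node in document.childNodes  # only looks at the top level
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
    ]


# Hypothetical document that declares the cats.css stylesheet.
sample = (
    '<?xml-stylesheet type="text/css" href="cats.css"?>'
    '<cats><cat>Tom</cat></cats>'
)

for target, data in list_processing_instructions(sample):
    print("target:", target)
    print("data:  ", data)
```

The parser hands over the target (xml-stylesheet) and the rest of the line as plain text; it’s up to the browser or program to decide what to do with it.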

What are non-permissible characters in XML?

There aren’t many non-permissible characters in XML, but using one can cause serious problems with your data.  The main problems that arise usually come from incorrect use of punctuation.

Periods and hyphens deserve special care in the naming of XML elements.  They’re allowed inside a name, but a name can’t start with one (or with a digit), and it’s best to use them sparingly… a program working with your XML may think that something else is going on, such as a filename being listed or a subtraction being performed.  Colons are a different matter: they’re reserved for namespaces, so a colon in an element name makes a namespace-aware program think that you’re assigning a namespace to the element that hasn’t been defined.  In that instance, you’ll end up getting an error returned.

Other characters that can’t be used in the naming of elements for (hopefully) obvious reasons include <, >, [, ], {, }, as well as the comma and the space.  As a general rule, it’s best to avoid punctuation except for where it’s part of the code or contained within an element’s content, or else some generally bad and unwanted effects can occur.  Even within content, a literal < or & has to be written as &lt; or &amp;.
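If you’re ever unsure whether a particular name is acceptable, one quick way to find out is to hand it to a real parser.  The sketch below does that with Python’s xml.parsers.expat module; the function name is_valid_element_name and the candidate names are made up for illustration, and the parser is used without namespace processing, so it doesn’t check colons against declared namespaces.

```python
import xml.parsers.expat


def is_valid_element_name(name):
    """Check a name by parsing an empty element that uses it."""
    seen = []
    parser = xml.parsers.expat.ParserCreate()  # no namespace processing
    parser.StartElementHandler = lambda tag, attrs: seen.append((tag, attrs))
    try:
        parser.Parse(f"<{name}/>", True)
    except xml.parsers.expat.ExpatError:
        return False
    # Guard: the parser must have seen exactly this name, with no attributes.
    return seen == [(name, {})]


# Hypothetical candidate names to try out.
for candidate in ["cat", "cat_name", "-cat", "1cat", "cat,name", "cat name", "cat{0}"]:
    verdict = "ok" if is_valid_element_name(candidate) else "not allowed"
    print(f"{candidate!r}: {verdict}")
```

Letting the parser decide avoids having to memorize the full list of permitted characters, which is much longer than the handful of punctuation marks covered here.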

What are XML errors, and what are fatal errors?

Nobody wants to encounter errors in their work, but it’s going to happen eventually.  Perhaps you forgot to close a tag, or misspelled a word… regardless, there’s an error somewhere in the document.

Errors come in two basic types… your general errors, and your fatal errors.  Don’t let the word “fatal” get you upset; as a general rule, the word “fatal” in the computer world translates into “mildly uncomfortable” in the real world.  It’s not a good thing to have a fatal error, but at least it can be corrected.

General errors occur because there’s something out of place and the processing program doesn’t like it.  Often a message will pop up on the screen to let you know that there are errors on the page (and you usually have a chance to view them), and the page will still display (though some of the formatting may be incorrect due to the error).  If you encounter an error, don’t panic… simply check (if you can) to see where the error occurs, and go into the code to try to fix it.  If the error came from an external file and only became a true error once it hit the XML code, this might take a little bit of work… tracing errors back to their origin can sometimes be quite taxing.  Luckily, you can use the document that was displayed when the error was reported as a guide to tell whether the error is in the formatting or somewhere else.

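A fatal error, on the other hand, means that something happened that can’t be easily handled by the processor.  In XML, that usually means the document isn’t well-formed… a tag was never closed, two tags overlap, or an attribute value is missing its quotes, for example.  The attempt to use the document has been terminated (hence, the term “fatal”) and you need to fix it before the data in the document can be used.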

This is where it gets tricky… fatal errors are often much harder to trace back to their source, though they’re usually caused by an easily repairable mistake.  You have no messed-up document to work from, though you may have a line number or a code snippet that was displayed with your error message.  The best thing that you can do with a fatal error is to try to fix it a little at a time… go into the code to the approximate area where the error occurred, fix what you think caused it, and attempt to reload the page.  If it loads, then you’ve most likely fixed the problem… if it doesn’t, move on to the next possible problem and repeat the process.  Sometimes it’s a problem that requires you to completely recode a portion of your document (or perhaps all of it!), but most errors can be easily repaired once you locate them.

It’s the finding them that can be the tricky part.
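A parser can help with the finding.  As a rough sketch, the Python below uses the standard library’s xml.etree.ElementTree module to try to load a document and, if a fatal error stops it, report the line and column where the parser gave up.  The function name find_fatal_error and the broken cats document are made up for this example.

```python
import xml.etree.ElementTree as ET


def find_fatal_error(xml_text):
    """Return (line, column, message) for the first fatal error, or None."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        line, column = error.position  # where the parser stopped
        return line, column, str(error)
    return None  # the document is well-formed


# Hypothetical document: the second <cat> tag is never closed.
broken = """<cats>
  <cat>Tom</cat>
  <cat>Felix
</cats>"""

problem = find_fatal_error(broken)
if problem is None:
    print("No fatal errors found.")
else:
    line, column, message = problem
    print(f"Fatal error near line {line}, column {column}: {message}")
```

Keep in mind that the position reported is where the parser noticed the problem, which isn’t always where the mistake was made… here the missing closing tag belongs on the line before the one that gets reported.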