Learn XML Programming

Section 5:  Document Type Definitions

What are Document Type Definitions, or DTDs?

Document Type Definitions, also known as DTDs, are one of the more common ways of describing the structure of an XML document.  A DTD declares which elements and attributes a document may contain and how they may be nested, so you can pin down exactly what a valid document of that type looks like.

There are two different ways that you can work with DTDs.  The first is by using inline DTDs (also called internal DTDs), which are created within the document itself.  An inline DTD is placed at the beginning of the document, before the root element, and defines the root element and the elements contained within it.  An example of this would look like:

<!DOCTYPE cats_info [

<!ELEMENT cats_info (#PCDATA)>

]>

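Note the use of exclamation points and brackets, and how the element declaration is embedded within the square brackets of the DOCTYPE declaration, which is closed with ]>.  XML keywords are case-sensitive, so DOCTYPE, ELEMENT, and #PCDATA must be written in uppercase.  #PCDATA (parsed character data) is a reserved XML keyword which lets the parser know that cats_info can only contain character data, not child elements.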
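Putting the pieces together, a complete document with an inline DTD might look like the following minimal sketch.  The text inside cats_info is made up for illustration.  Because XML keywords are case-sensitive, DOCTYPE, ELEMENT, and #PCDATA appear in uppercase, and the DTD is closed with ]> before the root element begins.

```xml
<?xml version="1.0"?>
<!-- Inline (internal) DTD: the declarations live inside the square brackets. -->
<!DOCTYPE cats_info [
  <!ELEMENT cats_info (#PCDATA)>
]>
<!-- The root element must match the name given after DOCTYPE. -->
<cats_info>Tooter and Shade are both indoor cats.</cats_info>
```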

The second way to create a DTD is to create an external DTD, a separate file that contains all of your DTD declarations and is referenced from within the XML file (much like CSS and XSL files are referenced).

The file that you create will be saved with a .dtd extension, such as cats.dtd.  You will then need to reference it from your XML file, using the <!DOCTYPE> declaration as well as the SYSTEM keyword.  Your reference declaration will look something like this:

<!DOCTYPE cats_info SYSTEM "cats.dtd">

You would then declare your elements in cats.dtd and create the rest of your document as you did before.
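As a sketch of how the two files fit together, assume both are saved in the same directory under the names cats.dtd and cats.xml (the sample text is invented for illustration).  The external file holds only the declarations:

```xml
<!-- cats.dtd: only declarations go here, with no DOCTYPE wrapper. -->
<!ELEMENT cats_info (#PCDATA)>
```

The XML file then points at it:

```xml
<?xml version="1.0"?>
<!-- cats.xml: SYSTEM points the parser at the external DTD file. -->
<!DOCTYPE cats_info SYSTEM "cats.dtd">
<cats_info>Tooter and Shade are both indoor cats.</cats_info>
```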

What is DTD Validation?

Validation is the process by which a validating parser (in a browser or any other XML tool) uses a DTD to determine whether the content of each element is appropriate for it.  You’ve already seen a little bit of DTD validation in the example above: the reserved keyword #PCDATA is used for validation, making sure that the element it defines contains only character data.  If the element contains anything else, such as a child element, a validating parser will report a validity error, and the application may refuse to display or process the document.

Here are other examples of validating rules that are used in DTDs:

(#PCDATA)* Indicates that the element can only contain character data (zero or more characters); it is equivalent to (#PCDATA).

(ElementOne) Indicates that the element contains one instance of ElementOne.

(ElementOne+) Indicates that the element contains one or more instances of ElementOne.

(ElementOne?) Indicates that the element contains zero or one instance of ElementOne.

(ElementOne*) Indicates that the element contains zero or more instances of ElementOne.

(ElementOne, ElementTwo) Indicates that the element contains one instance each of ElementOne and ElementTwo, in that order.

(ElementOne | ElementTwo) Indicates that the element contains one instance of ElementOne or it contains one instance of ElementTwo.

(#PCDATA | ElementOne)* Indicates mixed content: the element may contain any mix of character data and ElementOne elements, in any order and any number of times.  Note that #PCDATA needs to appear first.

EMPTY Indicates that the element contains no content.

ANY Indicates that the element may contain any content.  Note that this should only be used when creating and checking documents, and shouldn’t appear in the final product (as it opens too many possibilities for errors).
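Several of these rules can be combined in a single DTD.  The following is a minimal sketch; the element names (shelter, cat, name, nickname, toy, note, b, chipped) and the sample data are hypothetical and chosen only to exercise each rule.

```xml
<?xml version="1.0"?>
<!DOCTYPE shelter [
  <!-- + means one or more cat elements. -->
  <!ELEMENT shelter (cat+)>
  <!-- A name, then zero or one nickname (?), then zero or more toys (*),
       then either a note or a chipped marker (|). -->
  <!ELEMENT cat (name, nickname?, toy*, (note | chipped))>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT nickname (#PCDATA)>
  <!ELEMENT toy (#PCDATA)>
  <!-- Mixed content: text and b elements in any order. -->
  <!ELEMENT note (#PCDATA | b)*>
  <!ELEMENT b (#PCDATA)>
  <!-- EMPTY: the element may not contain anything. -->
  <!ELEMENT chipped EMPTY>
]>
<shelter>
  <cat>
    <name>Tooter</name>
    <nickname>Toot</nickname>
    <toy>feather wand</toy>
    <toy>paper bag</toy>
    <note>Likes <b>tuna</b> best.</note>
  </cat>
  <cat>
    <name>Shade</name>
    <chipped/>
  </cat>
</shelter>
```

Both cat elements satisfy the same declaration: the first uses every optional part, while the second leaves out the nickname and toys and chooses chipped instead of note.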

What is a Document Type Declaration?

The Document Type Declaration is something that you should be familiar with as well; it is the <!DOCTYPE ...> statement from the previous examples.  It lets the parser know which root element the document uses, and it either contains the DTD (inline) or points to it (external).  The <!ELEMENT ...> statements that appear inside a DTD are element declarations: each one tells the parser which part of the document you’re defining and what that part may contain.

Element declarations aren’t limited to defining the rules for the root element.  If you have several elements that are contained within another element, you can use an element declaration to declare that they’re contained elements.

In cats.xml, you created the elements cats and friends within the root element, cats_info.  If all of that was a part of a larger document, though, you might want to make cats and friends elements contained within cats_info, which would then be a container element instead of a root element.  The declaration in your DTD to connect these three elements would look something like this:

<!ELEMENT cats_info (cats, friends)>

Notice that the contained elements are in parentheses, and are separated by commas.  Elements with namespace prefixes can be contained in this manner as well, but DTDs know nothing about namespaces, so the elements need to be referred to by their full prefixed names, exactly as they appear in the document.  In other words, you’d use the following instead:

<!ELEMENT cats_info (my_cats:cats, my_cats:friends)>

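Any number of elements may be contained in the group.  Within the parentheses, a comma-separated list fixes the order in which the contained elements must appear in the XML document.  The element declarations themselves, however, don’t necessarily have to be listed in the DTD in the same order as the elements appear in the document.  For the sake of consistency, though, it’s a good idea; that way, you don’t have to keep searching because they’re listed out of order in the DTD.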
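A minimal sketch of a document that satisfies the cats_info declaration, assuming (purely for illustration) that cats and friends each hold plain text:

```xml
<?xml version="1.0"?>
<!DOCTYPE cats_info [
  <!-- The comma-separated group fixes the order: cats first, then friends. -->
  <!ELEMENT cats_info (cats, friends)>
  <!-- These two declarations could be listed in either order. -->
  <!ELEMENT friends (#PCDATA)>
  <!ELEMENT cats (#PCDATA)>
]>
<cats_info>
  <cats>Tooter and Shade</cats>
  <friends>The dog next door</friends>
</cats_info>
```

Swapping the cats and friends elements in the document body would make the document invalid, while swapping the two <!ELEMENT> lines for them in the DTD changes nothing.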

What is a PUBLIC source specifier?

“Source specifier” is a fancy term for the bit of code that tells your XML document where to find the DTD it’s looking for.  If you use the PUBLIC source specifier, then you’re typically referencing an external DTD that’s held somewhere other than on your system.  The PUBLIC keyword is followed by two quoted strings: a public identifier, which is a name for the DTD that a program may be able to look up on its own, and the source that you provide (usually in URL form), which tells the program where to go to find the file.

As an example, let’s take the previously mentioned cats.dtd as a source.  You may remember that it looked a little something like this:

<!DOCTYPE cats_info SYSTEM "cats.dtd">

Unfortunately, for this scenario we’ll assume that you’re working with someone else, and they have the .dtd file on their system.  Looks like it’s the PUBLIC keyword to the rescue!

<!DOCTYPE cats_info PUBLIC "cats.dtd" "">

Notice that the second quoted string is where the full URL for the resource is given, so that the program knows exactly where to look for the DTD file.
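As a hedged sketch of a complete PUBLIC reference: both the public identifier and the URL below are hypothetical placeholders (example.com is a domain reserved for documentation), not the location of a real cats.dtd.

```xml
<?xml version="1.0"?>
<!-- First string: public identifier (a hypothetical name for the DTD).
     Second string: system identifier, the URL used to fetch the file. -->
<!DOCTYPE cats_info PUBLIC "-//Example//DTD Cats Info 1.0//EN"
                           "http://www.example.com/dtd/cats.dtd">
<cats_info>Tooter and Shade are both indoor cats.</cats_info>
```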

What is a SYSTEM source specifier?

If you’re using a SYSTEM source specifier, then you’re simply including a reference to a .dtd file or other resource, typically one that’s located on the same system as the file that the parser is reading.  As with many of the items in this part of the tutorial, you’ve already seen this; a SYSTEM source specifier was used in the initial example of an external DTD.

When you declare a source using SYSTEM and give just a file name, the file is assumed to be in the same directory as the XML document that references it.  Take the example from above:

<!DOCTYPE cats_info SYSTEM "cats.dtd">

Whatever folder or directory the cats.xml file is in, that’s the folder or directory that the parser is going to search for the cats.dtd file.  This can cause problems if the file is in a subdirectory; that’s why you always need to spell out the location of the file if the two aren’t in the same directory.  If you keep all of your .dtd files in a subdirectory named DTD, you need to specify it:

<!DOCTYPE cats_info SYSTEM "DTD/cats.dtd">

You don’t need to put the entire path in there, though; just the path from the current directory to the one that contains your files.  (If the .dtd file is in another directory that isn’t beneath the one your XML file is in, then you need to supply the entire path for it, and you might also want to consider using subdirectories to make things a little easier.)
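The path rules can be sketched in one document.  This assumes a DTD subdirectory sitting beneath the folder that holds the XML file; the other layouts listed in the comment are hypothetical and shown only for comparison.

```xml
<?xml version="1.0"?>
<!-- Relative path: resolved against the directory that holds this XML file,
     so the parser looks for DTD/cats.dtd beneath it. -->
<!DOCTYPE cats_info SYSTEM "DTD/cats.dtd">
<!-- Other forms the quoted path could take (hypothetical layouts):
       "cats.dtd"             same directory as this file
       "../shared/cats.dtd"   a sibling directory named shared
       "/DTD/cats.dtd"        absolute path, starting from the root -->
<cats_info>Tooter and Shade are both indoor cats.</cats_info>
```

Note that a leading slash turns the value into an absolute path, which is not the same thing as a subdirectory of the current folder.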

How do you declare entities?

One useful thing that you can do with DTDs is create entities.  An entity can exist in various forms within XML.  One type of entity, known as a parameter entity, is a piece of text that can be referenced repeatedly within the DTD without having to retype it every time.  It’s kind of like being able to create your own form of shorthand within the DTD, so that you can use a much smaller letter combination to refer to a larger string of text.

As an example, let’s take the names of our cats, Tooter and Shade.  Instead of manually typing “Tooter and Shade” every time we encounter the pairing like that in our DTD, we can create the entity, TS, to represent the longer phrase.  The syntax for doing this looks a little something like this:

<!ENTITY % TS "Tooter and Shade">

Now, every time we need to have “Tooter and Shade” appear within our DTD, we can simply write %TS; in its place and the parser will know what we mean.  Note that the exclamation point, the percent sign (with a space on either side of it), and the quotes in the declaration are all needed for the code to work correctly.
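A minimal sketch of the parameter entity in use, assuming the declarations live in an external DTD such as cats.dtd (inside an inline DTD, XML does not allow a parameter entity reference within another declaration).  The caption entity and its text are hypothetical, added only to show %TS; being expanded.

```xml
<!-- cats.dtd -->
<!-- Declare the shorthand once... -->
<!ENTITY % TS "Tooter and Shade">
<!-- ...then reuse it inside another declaration: the value of caption
     becomes "Tooter and Shade at home". -->
<!ENTITY caption "%TS; at home">
<!ELEMENT cats_info (#PCDATA)>
```

A document whose parser reads this external DTD could then write &caption; inside cats_info to pull in the full phrase.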

Other entities, known as general entities, are referenced from the document itself rather than from the DTD, and can be used to define other types of text or data.  For instance, if we want to reference a picture of our favorite felines throughout different parts of a larger document, we might want to create an entity that represents that picture.  The declaration will look a little different, but the end result will be the same: having a shorter name that we can substitute for a longer filename.  Here’s what it would look like:

<!ENTITY PIC SYSTEM "tooter_and_shade.gif" NDATA gif>

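The NDATA gif part of the entity declaration lets the parser know that the entity is unparsed data in a notation named gif (which must itself be declared in the DTD with <!NOTATION>), and the SYSTEM keyword gives the location of the file, here a picture named tooter_and_shade.gif in the same directory as the document.  The parser doesn’t read the picture itself; whenever an attribute of type ENTITY refers to PIC afterwards, the parser passes the file’s location and notation along to the application, which can then put the picture in the entity’s place.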
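A fuller sketch of how an entity like PIC is typically wired up.  The NOTATION and ATTLIST declarations, the photo element, its src attribute, and the image/gif identifier are assumptions added for illustration; only the ENTITY line comes from the example above.

```xml
<?xml version="1.0"?>
<!DOCTYPE cats_info [
  <!ELEMENT cats_info (photo)>
  <!-- photo is empty; it points at the picture through its src attribute. -->
  <!ELEMENT photo EMPTY>
  <!-- An ENTITY-typed attribute must name a declared unparsed entity. -->
  <!ATTLIST photo src ENTITY #REQUIRED>
  <!-- The notation named after NDATA has to be declared. -->
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY PIC SYSTEM "tooter_and_shade.gif" NDATA gif>
]>
<cats_info>
  <!-- The attribute value is the bare entity name, without & and ;. -->
  <photo src="PIC"/>
</cats_info>
```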