Learn XML Programming

How do you declare empty elements?

Sometimes you need an element that never holds any content of its own: no text and no child elements.  It might act as a marker, or carry all of its information in attributes.  XML handles this situation with the appropriately named EMPTY rule, which you use when declaring the element in a DTD.

You may remember seeing EMPTY on the list of validation rules earlier… it might even have struck you as a bit odd to see a rule that creates elements with nothing in them.  Elements declared this way usually carry attributes that hold the meaningful values, so the empty element serves as a container for its attributes.

To create an empty element, use the following format:

<!ELEMENT cats_info EMPTY>

Here, cats_info is the element being declared as empty.  Note that DTD keywords such as ELEMENT and EMPTY are case-sensitive and must be written in uppercase.
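To see the declaration in context, here is a minimal sketch of a complete document that declares cats_info as empty and then uses it.  The owner attribute and its value are assumptions added purely for illustration; they don’t come from the earlier examples.

```xml
<?xml version="1.0"?>
<!DOCTYPE cats_info [
  <!-- cats_info may never contain text or child elements -->
  <!ELEMENT cats_info EMPTY>
  <!-- owner is a hypothetical attribute added for this example -->
  <!ATTLIST cats_info owner CDATA #IMPLIED>
]>
<!-- An empty element is normally written with the self-closing form -->
<cats_info owner="Sam"/>
```

Writing <cats_info owner="Sam"></cats_info> with nothing between the tags is also valid, but putting any text or child element inside it would cause a validating parser to reject the document.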

What is some of the jargon used with attributes?

Attribute declarations come with some jargon of their own.  An attribute list declares the attributes that an element may carry, and each attribute in the list can be marked as either required or implied.

To create an attribute list, begin by naming the element that the attributes belong to, like so:

<!ATTLIST cat_characteristics

ATTLIST stands for attribute list, and cat_characteristics is the name of the element that the attributes belong to.  From there, it’s time to begin listing attributes.

<!ATTLIST cat_characteristics

hair_length

hair_color

name

>

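Note that the closing > was placed on a line of its own.  This makes things a lot easier if you have to add or modify any of the listings… just make sure that you don’t forget it.  Keep in mind that this listing is only a skeleton: before a parser will accept it, each attribute also needs a type (such as CDATA for plain text) and a rule saying whether it is required.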

As was mentioned a moment ago, an attribute can be either required or implied.  The keywords #REQUIRED and #IMPLIED tell a validating parser whether the attribute has to appear.  #REQUIRED means that every instance of the element must supply a value for the attribute, or the document is invalid.  #IMPLIED means that the attribute is optional: it may be left out, and no default value is supplied, so a program reading the document shouldn’t count on it being there.  Like other DTD keywords, both are written in uppercase.

Using these to define rules for attributes, with CDATA (plain character data) as the type of each attribute, would look like this:

<!ATTLIST cat_characteristics

hair_length CDATA #IMPLIED

hair_color CDATA #REQUIRED

name CDATA #REQUIRED

>

This way, the attributes hair_color and name must be supplied on every cat_characteristics element… hair_length, on the other hand, may be left out, so nothing should rely on it being present.
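As a rough sketch of how such an attribute list behaves in practice, the following complete document declares the three attributes and then uses the element once.  Declaring cat_characteristics as EMPTY, using CDATA as the attribute type, and the attribute values themselves are all assumptions made for the sake of the example.

```xml
<?xml version="1.0"?>
<!DOCTYPE cat_characteristics [
  <!-- Assumed for this sketch: the element carries attributes only -->
  <!ELEMENT cat_characteristics EMPTY>
  <!ATTLIST cat_characteristics
    hair_length CDATA #IMPLIED
    hair_color  CDATA #REQUIRED
    name        CDATA #REQUIRED
  >
]>
<!-- Valid: both required attributes are present, hair_length is omitted -->
<cat_characteristics hair_color="black" name="Shade"/>
<!-- Leaving out name or hair_color would make the document invalid -->
```

A non-validating parser would still read the document if a required attribute were missing; the rule is only enforced when the document is validated against its DTD.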

What is the ANY content model?

As mentioned in a note earlier, the ANY content model can cause problems if not handled correctly.  Basically, it sets the rule for your element to “anything goes”: the element will accept text, any other element declared in the DTD, or any mixture of the two, in any order.  (ANY applies to elements only; attributes have their own types.)  This is fine for testing or early drafts, but it can lead to problems down the line if a program reading the document expects the element to have a specific structure.  Because a validating parser will accept almost anything inside an ANY element, mistakes there go unnoticed until the program that consumes the data stumbles over them.

If you want to use the ANY model, though, the format is very simple.  Going back to the previous example in which cats_info was declared as a text element, substituting ANY for the text declaration will give you the following:

<!DOCTYPE cats_info [

<!ELEMENT cats_info ANY>

]>

When a parser reads this DTD, it will know that cats_info is an element that accepts any mixture of text and declared elements.  Just remember to use caution when giving elements this setting.
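For a fuller picture, here is a small sketch of an ANY element in use.  The cat element and the text around it are assumptions added for illustration; the point is only that a validating parser accepts a free mixture of text and declared elements inside cats_info.

```xml
<?xml version="1.0"?>
<!DOCTYPE cats_info [
  <!-- ANY is a keyword and is written without parentheses -->
  <!ELEMENT cats_info ANY>
  <!-- cat must still be declared: ANY only admits declared elements -->
  <!ELEMENT cat (#PCDATA)>
]>
<cats_info>
  My cats are <cat>Tooter</cat> and <cat>Shade</cat>.
</cats_info>
```

An element that has no declaration anywhere in the DTD would still be reported as invalid, even inside cats_info.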

What are repeated elements, and how are they handled?

Sometimes, you’ll have a certain element that you want to use several times… each instance might have slightly different data in it, but it all falls into more or less the same type of data and covers the same topic.  Rather than declaring a separate element for each piece of data, you can use a repeated element.

A repeated element is just what it sounds like… an element that’s repeated and used several times within the same document.  If you were working on an XML document and wanted to list the names of your cats and the cats of your friends, you wouldn’t necessarily have to list each cat as a separate and distinct element… as an example, consider the following:

<cats_info>

<cat>Tooter</cat>

<cat>Shade</cat>

<cat>Loki</cat>

<cat>Sally</cat>

</cats_info>

Even though each instance contains different data, the repeating element cat is used for each one.  The instances don’t need unique content, either: two cat elements may hold the same text and the document is still valid, because XML tells them apart by their position in the document.  If your program needs a unique identifier for each one, an attribute of type ID is the usual way to provide it.
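The listing above shows only the document itself.  A DTD that permits the repetition might look like the following sketch, which assumes that cats_info must contain at least one cat and that each cat holds plain text.

```xml
<?xml version="1.0"?>
<!DOCTYPE cats_info [
  <!-- The + means one or more cat elements; * would allow zero or more -->
  <!ELEMENT cats_info (cat+)>
  <!ELEMENT cat (#PCDATA)>
]>
<cats_info>
  <cat>Tooter</cat>
  <cat>Shade</cat>
  <cat>Loki</cat>
  <cat>Sally</cat>
</cats_info>
```

Without the + (or *) after cat, the DTD would allow exactly one cat element, and the second one would fail validation.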

What about mixed content?  How is it used and handled?

Mixed content in DTDs can present problems, but luckily the problems are easily avoidable.  All that it takes is the use of DTD element definitions to set up rules for how the content should be handled.

Consider the following scenario.  You’re defining the cats element, and the name elements inside it will sometimes hold just a string of text and sometimes hold text with other elements nested alongside it.  This could cause problems, because most programs will be expecting either one or the other… unless they know that it’s coming beforehand.  That’s where a mixed content declaration comes into play.  Consider the following code:

<!ELEMENT cats (name+)>
<!ELEMENT name (#PCDATA | color)*>
<!ELEMENT color (#PCDATA)>

What you’re seeing here is the definition for the element cats.  When a parser reaches cats, it sees that there must be one or more instances of the element name inside of cats.  (Remember the previous listing… that’s what the + means.)  When it gets to name, it sees that name is mixed content: it can hold text, color elements, or any combination of the two, in any order and any number of times.  That is what the trailing * means, and in a mixed content declaration #PCDATA must always come first in the list.  The color element, then, is defined as a string of text.  Because color has a declaration all its own, it can also be referenced from other elements, not just from name.
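A short sketch of a document that is valid against these three declarations may help.  The cat names and colors are made up for the example.

```xml
<?xml version="1.0"?>
<!DOCTYPE cats [
  <!ELEMENT cats (name+)>
  <!ELEMENT name (#PCDATA | color)*>
  <!ELEMENT color (#PCDATA)>
]>
<cats>
  <!-- Text only -->
  <name>Tooter</name>
  <!-- Text followed by a nested color element -->
  <name>Shade <color>black</color></name>
  <!-- The order is free: color may also come before the text -->
  <name><color>gray</color> Loki</name>
</cats>
```

All three name elements pass validation, which is the flexibility that a mixed content declaration buys you.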

Of course, color could be defined in other ways… perhaps you want to have a picture appear, or have a more complex element.  Maybe color will have several options available to it, as well.  If you wanted, you could create a separate element for each possible color, and have all of them listed as options for the color element.  It’s entirely up to you.
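As one possible sketch of the “separate element for each color” idea, the declaration below lets color contain exactly one of three empty elements.  The names black, gray, and orange are hypothetical choices for this example, not part of the earlier listing.

```xml
<?xml version="1.0"?>
<!DOCTYPE color [
  <!-- color must contain exactly one of the listed options -->
  <!ELEMENT color (black | gray | orange)>
  <!ELEMENT black EMPTY>
  <!ELEMENT gray EMPTY>
  <!ELEMENT orange EMPTY>
]>
<color><gray/></color>
```

With this approach a misspelled or unlisted color is caught at validation time, at the cost of one extra declaration per color.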