What's a DTD and why do I need one (or not)?
One of the keys to XML (and SGML) is the Document
Type Definition (DTD), the specification for a particular class of
documents. A DTD says what types of elements are meaningful for this
type of document, and how they fit together. DTDs are important
because they provide a basis for creating and managing whole groups of
documents of the same type in a uniform way. This improves the quality
of your information, because at least one source of error or confusion
has been removed.
HTML's use of DTDs was fairly static (see HTML DTDs), but in XML it is different: you
actively choose whether or not to use a DTD. If your editing software is smart enough
(or if you are accurate enough), you can omit any DTD and fabricate
the markup on the fly to suit the occasion. It just has to follow the
rules in the XML specification for DTD-less markup, known as
standalone XML (it must be what the specification calls well-formed),
so that browsers can read it without error.
HTML DTDs:
For HTML, several different DTDs evolved (with associated specifications: 2.0, 3, 3.2, 4.0, Pro,
etc.), but they all defined broadly the same basic
underlying markup, with the later ones adding more recent
features.
Because current browsers are hard-wired to interpret
only HTML markup and nothing else, silently ignoring markup they
don't recognize, HTML DTDs tend to be used only by software which
performs full SGML checking of the markup (validation), such as
SoftQuad's HoTMetaL editor, or other SGML-based formatting,
conversion, or data management software.
If your pages don't need checked HTML, or if they
have to use the features of a particular editor which generates it, or
if you need some private markup recognized by a specific browser, then
the answer to the question is that you probably don't want or
need any DTD.
Following the rules for DTD-less XML is fairly straightforward:
- All elements must have start-tags and end-tags even if
there's nothing between them (you can't omit things like
</P>!).
- In the special case of elements which are always empty
(i.e., those that in HTML do not possess an end-tag at all, such as BR)
you can use the special form of abbreviation if you wish: a start-tag
with a trailing slash (like
<BR/>).
- Elements cannot overlap (same rule as for HTML): they
must be nested inside one another (so
<B>bold
<I>italic</B></I> is an error: it must be <B>bold
<I>italic</I></B>).
- You must put all attribute values in quotes (e.g.,
<link
to="http://www.foo.bar/">), and you can't have default
values or automated type-recognition like NUMBER or
ID.
- You must use
&lt; and &amp; for < and
& when they occur as ordinary text characters.
- You have to tell XML processors that they
should not expect a DTD anywhere (the standalone="yes" in the XML
declaration), and if there's any special processing or
formatting, you need to provide a stylesheet:
<?xml version="1.0" standalone="yes"?>
<?xml-stylesheet href="quickmessage.xsl" type="text/xsl"?>
<message stamp="1998-08-18T11:32:45.26+0000">
<to address="mike@foo.com">Mike</to>
<text>Are you free for lunch at 1.00pm today?</text>
<from address="pete@foo.com">Peter</from>
</message>
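
The rules listed above can be checked mechanically. The following is a minimal sketch in Python, assuming the standard library's xml.etree.ElementTree parser (not something the original text mentions), which checks well-formedness only and never validates against a DTD. The helper name is_well_formed and the sample strings are hypothetical, chosen to mirror the rules in the list.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the parser accepts the markup as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Properly nested elements are accepted.
nested_ok = is_well_formed("<P><B>bold <I>italic</I></B></P>")

# Overlapping elements are rejected.
overlap_ok = is_well_formed("<P><B>bold <I>italic</B></I></P>")

# An omitted end-tag is rejected.
missing_end_ok = is_well_formed("<P>no end-tag here")

# The abbreviated empty-element form is accepted.
empty_ok = is_well_formed("<P>line one<BR/>line two</P>")

# An unquoted attribute value is rejected.
unquoted_ok = is_well_formed("<link to=http://www.foo.bar/></link>")

# A bare ampersand is rejected; the escaped form is accepted.
bare_amp_ok = is_well_formed("<text>fish & chips</text>")
escaped_amp_ok = is_well_formed("<text>fish &amp; chips</text>")

for label, result in [
    ("nested", nested_ok),
    ("overlapping", overlap_ok),
    ("missing end-tag", missing_end_ok),
    ("empty element", empty_ok),
    ("unquoted attribute", unquoted_ok),
    ("bare ampersand", bare_amp_ok),
    ("escaped ampersand", escaped_amp_ok),
]:
    print(f"{label}: {'well-formed' if result else 'error'}")
```

Any XML parser should report the same accept/reject decisions, since these rules come from the XML specification rather than from a particular tool.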
But if you are going to create the same type of
document again and again (and most people do seem to want to do this),
it's a lot easier if you use a standard structure (and vary the
appearance with a stylesheet if you wish). A DTD provides this
structure.
But you're not constrained
to using an existing DTD: you can write your own, which is why so
many groups of potential users are busy producing them. A DTD for the
example above would look something like
this:
<!ELEMENT message (to,text,from)>
<!ATTLIST message stamp CDATA #REQUIRED>
<!ELEMENT to (#PCDATA)>
<!ATTLIST to address CDATA #REQUIRED>
<!ELEMENT text (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ATTLIST from address CDATA #REQUIRED>
The declaration for each element type provides an
expression which models what it may contain: usually either more
element types (here to, text, from, in that order), or Parsed
Character Data (text), or possibly a mixture. Attribute lists are
declared giving an attribute type and default status (required,
implied, etc., or an actual default value). Any XML
system using this DTD therefore knows which elements go where and
can use their names for matching with a style in a stylesheet, or
performing a search, or controlling your editor.
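
To make concrete what "knowing which elements go where" involves, here is a rough sketch in Python. It assumes the DTD is carried inside the document in a DOCTYPE declaration (the internal subset, one of the placements XML allows) and uses the standard library's xml.etree.ElementTree, which reads the declarations but does not validate against them. The tables CONTENT_MODEL and REQUIRED_ATTRS and the function check are hypothetical, hand-written mirrors of the declarations above; they stand in for what a real validating parser would derive from the DTD itself.

```python
import xml.etree.ElementTree as ET

# The message from earlier, with the DTD placed in the internal subset.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE message [
<!ELEMENT message (to,text,from)>
<!ATTLIST message stamp CDATA #REQUIRED>
<!ELEMENT to (#PCDATA)>
<!ATTLIST to address CDATA #REQUIRED>
<!ELEMENT text (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ATTLIST from address CDATA #REQUIRED>
]>
<message stamp="1998-08-18T11:32:45.26+0000">
<to address="mike@foo.com">Mike</to>
<text>Are you free for lunch at 1.00pm today?</text>
<from address="pete@foo.com">Peter</from>
</message>"""

# Hand-written mirror of the DTD (an assumption for illustration only):
# the exact sequence of child elements each element type must contain...
CONTENT_MODEL = {
    "message": ["to", "text", "from"],
    "to": [],    # (#PCDATA): text only, no child elements
    "text": [],
    "from": [],
}
# ...and the attributes declared #REQUIRED for each element type.
REQUIRED_ATTRS = {
    "message": ["stamp"],
    "to": ["address"],
    "text": [],
    "from": ["address"],
}


def check(element):
    """Return a list of problems found in element and its descendants."""
    if element.tag not in CONTENT_MODEL:
        return [f"undeclared element <{element.tag}>"]
    problems = []
    children = [child.tag for child in element]
    expected = CONTENT_MODEL[element.tag]
    if children != expected:
        problems.append(
            f"<{element.tag}> contains {children}, expected {expected}"
        )
    for name in REQUIRED_ATTRS[element.tag]:
        if name not in element.attrib:
            problems.append(f"<{element.tag}> lacks required attribute {name}")
    for child in element:
        problems.extend(check(child))
    return problems


root = ET.fromstring(DOCUMENT)
problems = check(root)

# A well-formed message whose children are in the wrong order and incomplete.
bad = ET.fromstring(
    '<message stamp="x"><text>hi</text><to address="a">A</to></message>'
)
bad_problems = check(bad)

print(problems)
print(bad_problems)
```

The second document is perfectly well-formed, so a DTD-less processor accepts it; only a check against the declared structure reveals that it is not a proper message.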



