Understanding DTDs
A document type definition (DTD) is the basic building block for constructing authoring applications. It defines the valid elements in the document type and their relationships to one another. The .dtd file contains the DTD.
Arbortext Architect includes a document type editor. The Arbortext DTD Editor is a text-based application for creating, editing, and validating SGML or XML document types.
SGML DTDs must comply with the syntax defined by ISO 8879. XML DTDs must comply with the syntax defined by the XML standard. (Refer to www.w3.org for the latest version of the XML standard.) Arbortext Architect provides an online parser to check that your DTD is valid. Once the DTD has been parsed without error, you can compile and test the document type and build an application based on it.
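Within Architect, the DTD is checked by Arbortext's own parser. If you want to experiment outside Architect, the following Python sketch uses the standard-library expat parser to check XML DTD declarations placed in a document's internal subset. This is not part of Arbortext. Expat checks syntax only and does not validate a document against the DTD. It also does not recognize SGML features such as tag minimization, which is why the second call reports an error. The helper name check_internal_subset is hypothetical.

```python
from typing import Optional
from xml.parsers import expat


def check_internal_subset(root: str, subset: str) -> Optional[str]:
    """Hypothetical helper: return None if expat accepts the declarations,
    otherwise expat's error message. Syntax check only; no validation."""
    # expat needs a root element after the DOCTYPE to finish parsing.
    doc = f"<!DOCTYPE {root} [\n{subset}\n]>\n<{root}/>"
    parser = expat.ParserCreate()
    try:
        parser.Parse(doc, True)
    except expat.ExpatError as err:
        return str(err)
    return None


xml_subset = """<!ELEMENT topic (title, par+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT par (#PCDATA)>"""

# SGML tag minimization ("- o") is not part of XML DTD syntax.
sgml_subset = """<!ELEMENT topic - o (title, par+)>
<!ELEMENT title - o (#PCDATA)>
<!ELEMENT par - o (#PCDATA)>"""

print(check_internal_subset("topic", xml_subset))   # None
print(check_internal_subset("topic", sgml_subset))  # prints expat's error message
```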
The following is an example of an SGML DTD:
<!DOCTYPE report [
<!ENTITY % Emph "emph | keyword | wordbox" >
<!ELEMENT report - - (front, (part | chapter), gloss?) >
<!ELEMENT front - o (title | author?) >
<!ELEMENT part - o (title, par+, chapter*) >
<!ELEMENT chapter - o (title, par+, section*) >
<!ELEMENT section - o (title, par+, topic*) >
<!ELEMENT topic - o (title, par+) >
<!ELEMENT gloss - o (keyword, par)* >
<!ELEMENT title - o (#PCDATA) >
<!ATTLIST title Id CDATA #IMPLIED >
<!ELEMENT author - o (#PCDATA) >
<!ELEMENT par - o (#PCDATA | %Emph;)* >
<!ELEMENT emph - o (#PCDATA) >
<!ELEMENT keyword - o (#PCDATA) >
<!ELEMENT wordbox - o (#PCDATA) > ]>
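The example above is an SGML DTD. Its element declarations carry minimization parameters (- - and - o), and the par declaration uses the %Emph; parameter entity inside a markup declaration, which XML does not allow in an internal subset. The Python sketch below is offered as an illustration only. It parses a hypothetical XML adaptation of the same DTD, with minimization removed and %Emph; written out inline, using the standard-library expat module. The ElementDeclHandler callback records each declared element name in order.

```python
from xml.parsers import expat

# Hypothetical XML adaptation of the SGML "report" example:
# minimization parameters are dropped and %Emph; is expanded inline.
REPORT_XML = """<!DOCTYPE report [
<!ELEMENT report (front, (part | chapter), gloss?)>
<!ELEMENT front (title | author?)>
<!ELEMENT part (title, par+, chapter*)>
<!ELEMENT chapter (title, par+, section*)>
<!ELEMENT section (title, par+, topic*)>
<!ELEMENT topic (title, par+)>
<!ELEMENT gloss (keyword, par)*>
<!ELEMENT title (#PCDATA)>
<!ATTLIST title Id CDATA #IMPLIED>
<!ELEMENT author (#PCDATA)>
<!ELEMENT par (#PCDATA | emph | keyword | wordbox)*>
<!ELEMENT emph (#PCDATA)>
<!ELEMENT keyword (#PCDATA)>
<!ELEMENT wordbox (#PCDATA)>
]>
<report/>"""

declared = []


def on_element_decl(name, content_model):
    # expat passes the element name plus a nested tuple describing its
    # content model; only the name is kept here.
    declared.append(name)


parser = expat.ParserCreate()
parser.ElementDeclHandler = on_element_decl
parser.Parse(REPORT_XML, True)
print(declared)
```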
The following table describes the syntax of DTD declarations:

Syntax for Declarations

| Markup | Structure |
|---|---|
| `<! … >` | A markup declaration. |
| `<!-- … -->` or `-- … --` | A comment. A markup declaration can consist entirely of a comment, or a comment can be nested within another type of markup declaration. |
| `<!DOCTYPE doc_name [other_declarations] >` | The document type declaration, which begins the DTD. Everything between the square brackets ([ and ]) is part of the document type named doc_name. |
| `<!ELEMENT element_name minimization (content_model)>` | An element declaration. Elements translate into tags and tag pairs in Arbortext Editor. The maximum name length is 8 characters (in the reference concrete syntax). The minimization parameter is SGML only; XML element declarations omit it. |
| `<!ATTLIST element_name attribute_name declared_value default>` | An attribute declaration for a particular element or group of elements. It must always be associated with an element declaration. |
| `<!ENTITY … >` | An entity declaration. Once an entity has been declared, it can be referenced by an entity reference. Entity declarations must precede entity references. There are two types of entity declarations: general entities, which are used for text replacement, and parameter entities, which are used within the DTD itself, for example to substitute a model group (such as %Emph; in the example above). |

Entity references provide a shorthand way to refer to a larger body of material or to a common piece of information, such as a file whose exact location on the system may not be known. The entity declaration identifies the entity so that it can be referenced later.
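A general entity in an XML DTD performs the same kind of text replacement. In the minimal Python sketch below, the entity name co, its replacement text, and the note element are all hypothetical. Expat expands the &co; reference using the declaration in the internal subset and delivers the replacement text through its character-data callback.

```python
from xml.parsers import expat

# Hypothetical general entity "co", declared in the internal subset and
# referenced in the element content.
DOC = """<!DOCTYPE note [
<!ENTITY co "Example Company">
<!ELEMENT note (#PCDATA)>
]>
<note>Published by &co;.</note>"""

text_parts = []
parser = expat.ParserCreate()
# expat replaces internal general entity references with their text and
# reports the result through this callback, possibly in several chunks.
parser.CharacterDataHandler = text_parts.append
parser.Parse(DOC, True)
print("".join(text_parts))  # Published by Example Company.
```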
Element declarations identify the names of tags (elements) and also define their relationships to one another. The name of the element follows the <!ELEMENT keyword. In the following element declaration, bdy is the name of the element being declared and (part+|chp+) is its content model:
<!ELEMENT bdy (part+|chp+)>
The content model lists the subelements allowed within the element being declared. Each of these subelements, in turn, has its own content model describing what is allowed within it.
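The bdy declaration has no minimization parameter, so it is also valid XML syntax. The Python sketch below parses it with expat. This is an illustration of expat, not of Arbortext behavior. Expat's ElementDeclHandler receives each content model as a nested (type, quantifier, name, children) tuple, and the sketch rebuilds the model text from that tuple. The part and chp declarations and the model_to_str helper are assumptions added so the example is complete.

```python
from xml.parsers import expat
from xml.parsers.expat import model as m

# Map expat's quantifier constants back to DTD occurrence indicators.
QUANT = {
    m.XML_CQUANT_NONE: "",
    m.XML_CQUANT_OPT: "?",
    m.XML_CQUANT_REP: "*",
    m.XML_CQUANT_PLUS: "+",
}


def model_to_str(node):
    """Hypothetical helper: rebuild content-model text from expat's tuple."""
    ctype, quant, name, children = node
    if ctype == m.XML_CTYPE_EMPTY:
        return "EMPTY"
    if ctype == m.XML_CTYPE_ANY:
        return "ANY"
    if ctype == m.XML_CTYPE_NAME:
        return name + QUANT[quant]
    if ctype == m.XML_CTYPE_MIXED:
        items = ["#PCDATA"] + [model_to_str(c) for c in children]
        return "(" + " | ".join(items) + ")" + QUANT[quant]
    # Remaining types are model groups: a choice (|) or a sequence (,).
    sep = " | " if ctype == m.XML_CTYPE_CHOICE else ", "
    return "(" + sep.join(model_to_str(c) for c in children) + ")" + QUANT[quant]


models = {}


def on_element_decl(name, content_model):
    # Store the rebuilt model text under the declared element's name.
    models[name] = model_to_str(content_model)


parser = expat.ParserCreate()
parser.ElementDeclHandler = on_element_decl
parser.Parse("""<!DOCTYPE bdy [
<!ELEMENT bdy (part+|chp+)>
<!ELEMENT part (#PCDATA)>
<!ELEMENT chp (#PCDATA)>
]>
<bdy/>""", True)
print(models["bdy"])  # (part+ | chp+)
```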
The following Occurrence Indicators are used in a content model to specify how many times an element or model group may occur:

Occurrence Indicators

| Symbol | Indicates |
|---|---|
| `+` | Required and repeatable. The element or model group must occur one or more times. |
| `*` | Optional and repeatable. The element or model group may occur zero or more times. |
| `?` | Optional. The element or model group may occur once or not at all. |

If no occurrence indicator is given, the element or model group is required and must occur exactly once.
In the following example, chapters are optional; however, if chapters are used, there must be at least two:
<!ELEMENT part - o (title, par+, (chapter, chapter+)?)>
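A content model works like a regular expression over the sequence of child element names, so the occurrence indicators map directly onto regex quantifiers. The minimal Python sketch below encodes the part model from the example above. It assumes that the child element names have already been extracted into a list. The pattern and the part_children_ok helper are hypothetical, and Arbortext's parser performs this check itself.

```python
import re

# Hypothetical regex translation of the content model
#   (title, par+, (chapter, chapter+)?)
# Every child element name is followed by one space, so the DTD occurrence
# indicators "+" and "?" carry over unchanged as regex quantifiers.
PART_MODEL = re.compile(r"title (par )+(chapter (chapter )+)?")


def part_children_ok(children):
    """Return True if a list of child element names fits the part model."""
    subject = "".join(name + " " for name in children)
    return PART_MODEL.fullmatch(subject) is not None


print(part_children_ok(["title", "par"]))                               # True: no chapters
print(part_children_ok(["title", "par", "chapter"]))                    # False: one chapter
print(part_children_ok(["title", "par", "par", "chapter", "chapter"]))  # True
```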
The following symbols, called Ordering Connectors, are used in element declarations to separate the elements of a model group and to specify their order and relationship to one another. Several types exist, but only one type may appear in a given model group.

Ordering Connectors

| Symbol | Indicates |
|---|---|
| `,` | All elements occur, in the specified order. |
| `&` | All elements occur, in any order. (SGML only; XML DTDs do not support this connector.) |
| `\|` | One element or the other occurs, but not both. |

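The same regular-expression view extends to the ordering connectors. A comma becomes concatenation, a vertical bar becomes alternation, and the SGML & connector becomes an alternation over every ordering of the group. The Python sketch below handles only flat groups of element names without occurrence indicators. The group_regex and matches helpers are hypothetical.

```python
import re
from itertools import permutations


def group_regex(connector, names):
    """Translate a flat model group of element names into a regex (hypothetical helper).

    ","  all names, in the given order
    "|"  exactly one of the names
    "&"  all names, in any order (SGML only)
    """
    # Each child name is followed by one space in the string being matched.
    tokens = [re.escape(name) + " " for name in names]
    if connector == ",":
        return "".join(tokens)
    if connector == "|":
        return "(?:" + "|".join(tokens) + ")"
    if connector == "&":
        # Every ordering of the group is acceptable.
        return "(?:" + "|".join("".join(p) for p in permutations(tokens)) + ")"
    raise ValueError(f"unknown connector: {connector!r}")


def matches(connector, names, children):
    """True if the list of child element names satisfies the model group."""
    subject = "".join(child + " " for child in children)
    return re.fullmatch(group_regex(connector, names), subject) is not None


print(matches(",", ["title", "author"], ["author", "title"]))  # False: wrong order
print(matches("&", ["title", "author"], ["author", "title"]))  # True: any order
print(matches("|", ["title", "author"], ["title", "author"]))  # False: only one allowed
```

Because & expands to every permutation, the pattern grows factorially with the size of the group. That is acceptable for an illustration but not for large groups.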
An attribute list names properties associated with an element that typically have nothing to do with the element's formatted appearance, such as an identifier used for cross-referencing. Attributes appear in the Modify Attributes dialog box for the tag. (To open the Modify Attributes dialog box for a tag, position the cursor after the tag and choose Edit > Modify Attributes from the menu bar.)
The following table lists keywords and definitions for attribute declared values.

Keywords for Attribute Declared Values

| Keyword | Definition |
|---|---|
| `CDATA` | Zero or more characters. No markup is recognized other than the delimiters that end the character data. |
| `ENTITY` | The name of an entity, that is, a unit of information that may be referred to by a symbol in a DTD or in a document instance. |
| `ID` | An identifier that must begin with a name start character and must be unique. |
| `IDREF` | A name previously entered as the unique identifier (ID) of another element. |
| `MINIMIZATION` | When the context of a textual element implies how it should be marked up, some markup (usually end tags) may be omitted. This is called markup minimization. |
| `NAME` | A name. It must begin with a name start character (a letter). |
| `NAMES` | A list of NAME values separated by spaces. |
| `NMTOKEN` | A name token. Unlike a name, it may begin with any name character, such as a digit or a letter. |
| `NMTOKENS` | A list of NMTOKEN values separated by spaces. |
| `NOTATION` | Non-SGML content, identified by the name of a notation specified in a NOTATION declaration. |
| `NUMBER` | A number. It consists of digits only. |
| `NUMBERS` | A list of NUMBER values separated by spaces. |
| `NUTOKEN` | A number token. It must begin with a digit. |
| `NUTOKENS` | A list of NUTOKEN values separated by spaces. |
| `TAG` | A symbol delimiting a logical element inside a document, defined by the SGML standard as descriptive markup. There are start tags and end tags. |

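NAME, NMTOKEN, NUMBER, and NUTOKEN differ only in which characters may start a token and which may follow. The minimal Python sketch below assumes the SGML reference concrete syntax, in which name start characters are letters and name characters are letters, digits, period, and hyphen. It treats the plural keywords as space-separated lists, as the table describes. The value_ok helper is hypothetical, and the sketch does not enforce name length limits or case folding.

```python
import re

# Lexical patterns, assuming the SGML reference concrete syntax: name start
# characters are letters; other name characters are letters, digits, "." and "-".
# The NAMELEN limit of 8 characters is not enforced in this sketch.
NAME = r"[A-Za-z][A-Za-z0-9.\-]*"
NMTOKEN = r"[A-Za-z0-9.\-]+"
NUMBER = r"[0-9]+"
NUTOKEN = r"[0-9][A-Za-z0-9.\-]*"

SINGLE = {"NAME": NAME, "NMTOKEN": NMTOKEN, "NUMBER": NUMBER, "NUTOKEN": NUTOKEN}
# Plural keywords take one or more tokens of the singular form, separated by spaces.
PLURAL = {"NAMES": NAME, "NMTOKENS": NMTOKEN, "NUMBERS": NUMBER, "NUTOKENS": NUTOKEN}


def value_ok(declared_value: str, value: str) -> bool:
    """Check the lexical form of an attribute value (hypothetical helper)."""
    if declared_value in SINGLE:
        pattern = SINGLE[declared_value]
    elif declared_value in PLURAL:
        token = PLURAL[declared_value]
        pattern = rf"{token}(?: +{token})*"
    else:
        raise ValueError(f"not handled in this sketch: {declared_value}")
    return re.fullmatch(pattern, value) is not None


print(value_ok("NAME", "fig1"), value_ok("NAME", "1fig"))        # True False
print(value_ok("NUTOKEN", "1fig"), value_ok("NUTOKEN", "fig1"))  # True False
print(value_ok("NUMBERS", "10 20 30"))                           # True
```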
The following table lists the default value keywords and their definitions.

Default Value Definitions

| Default Value | Definition |
|---|---|
| `#REQUIRED` | The attribute value is required and must always be specified. |
| `#IMPLIED` | The attribute is optional. If no value is specified, the application may supply one. |
| `#CURRENT` | If no value is specified, the most recently specified value for the attribute is used. (SGML only; XML DTDs do not support #CURRENT.) |

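Python's expat parser reports how each XML attribute default is declared, which makes the difference between these keywords easy to see. In the sketch below, the fig element and its attributes are hypothetical. The AttlistDeclHandler callback receives the default value, which is None for #REQUIRED and #IMPLIED, and a required flag, which is true for #REQUIRED (and for #FIXED). #CURRENT does not appear because XML does not support it.

```python
from xml.parsers import expat

# Hypothetical element and attributes covering #REQUIRED, #IMPLIED,
# and a literal default value.
DOC = """<!DOCTYPE fig [
<!ELEMENT fig EMPTY>
<!ATTLIST fig
    Id     ID     #REQUIRED
    label  CDATA  #IMPLIED
    border CDATA  "0">
]>
<fig Id="f1"/>"""

declared = []


def on_attlist_decl(elname, attname, att_type, default, required):
    # Called once per attribute. default is None for #REQUIRED and #IMPLIED;
    # required is true for #REQUIRED (and for #FIXED, which also has a default).
    declared.append((attname, att_type, default, bool(required)))


parser = expat.ParserCreate()
parser.AttlistDeclHandler = on_attlist_decl
parser.Parse(DOC, True)
for entry in declared:
    print(entry)
```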
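The following table lists the declared content keywords and their definitions.

Declared Content Definitions

| Declared Content | Definition |
|---|---|
| `CDATA` | Contains only valid SGML characters that need no further processing. No markup is recognized other than the delimiter that ends the element. (SGML only.) |
| `RCDATA` | Contains character references and/or entity references, which are resolved to character data. Other markup is not recognized. (SGML only.) |
| `EMPTY` | The element has no content. |
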
The following table lists the content keywords and their definitions.

Content Definitions

| Content | Definition |
|---|---|
| `#PCDATA` | Parsed character data: text that is parsed so that embedded tags and/or references are resolved. |
| `ANY` | Mixed content in which #PCDATA and any elements defined in the same DTD are allowed. |

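XML keeps EMPTY, ANY, and #PCDATA (mixed content) but drops the SGML-only CDATA and RCDATA declared content. The small Python sketch below uses expat to show how each keyword is classified and that the SGML-only forms are rejected. The content_kind helper and the element name x are hypothetical.

```python
from typing import Optional
from xml.parsers import expat
from xml.parsers.expat import model

# expat's top-level content-model type constants, mapped to DTD keywords.
KINDS = {
    model.XML_CTYPE_EMPTY: "EMPTY",
    model.XML_CTYPE_ANY: "ANY",
    model.XML_CTYPE_MIXED: "#PCDATA (mixed)",
    model.XML_CTYPE_CHOICE: "element content",
    model.XML_CTYPE_SEQ: "element content",
}


def content_kind(decl: str) -> Optional[str]:
    """Classify one declaration for an element named x; None if expat rejects it."""
    found = []

    def on_element_decl(name, content_model):
        # content_model[0] is the top-level content type constant.
        found.append(KINDS[content_model[0]])

    parser = expat.ParserCreate()
    parser.ElementDeclHandler = on_element_decl
    try:
        parser.Parse(f"<!DOCTYPE x [\n{decl}\n]>\n<x/>", True)
    except expat.ExpatError:
        return None
    return found[0]


for decl in ("<!ELEMENT x EMPTY>", "<!ELEMENT x ANY>", "<!ELEMENT x (#PCDATA)>",
             "<!ELEMENT x CDATA>", "<!ELEMENT x RCDATA>"):
    print(decl, "->", content_kind(decl))
```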