4. Essential OEB Elements
You now know the minimum structural requirements of creating a publication in OEB format. You're beginning to understand some of the reasoning behind recent directions in information storage formats such as XML, on which OEB is based. Right now, you could place almost any work in a few OEB documents and an OEB package, and it would at least be legal OEB and work with a compliant OEB reading system. You also know how to create style sheets and associate them with your documents.
What you've learned so far will work fine for your one-paragraph masterpiece. But once you try to use OEB to create a slightly longer sequel, you'll soon run into some very real needs: "How do I display lists?" "How do I change the font?" "How do I create links from one section of my book to another?" The OEB Publication Structure has some very real answers to these questions. Better yet, since OEB is built upon XML, the following chapters will show you how to create your own answers in areas that OEB does not yet address.
You've had a glance at a few elements defined by OEB, such as <p>, <em> and <dfn>. OEB has a substantial number of other tags already defined for you to use; obviously, you'll need to know at least some of the others to do anything useful with OEB. We'll now look at essential elements and styles that you'll need in day-to-day use of OEB. If you've already started putting together an OEB publication, hopefully we'll answer some of the issues you've already encountered. Otherwise, we'll get a lot of common questions out of the way and prepare you for the real-world OEB creation project found in the next chapter.
It's important to note that the OEB elements discussed here are a basic set of elements defined by OEB. OEB is flexible enough to allow extended documents that include user-defined elements, and this process is likely to be enhanced even more in future OEB versions. Creating user-defined elements is beyond the scope of this current edition, but will likely be addressed in an upcoming version of this book.
Inline Elements
Although XML allows any number of elements to be defined, OEB has defined a certain set of elements to be used in documents. You've seen and used a few of those, such as <p>, <em> and <dfn>. Besides defining which tags can be used, OEB also specifies the location and contexts in which these elements can appear. You've probably understood intuitively, for example, that emphasized text goes inside a paragraph, like this:
<p>This is <em>emphasized</em> text.</p>
You probably didn't even consider putting a paragraph inside of emphasized text:
Illegal:
<p>This is <em>emphasized text with a <p>paragraph</p> inside</em> the emphasized text.</p>
As you might have guessed, this sort of construction is not allowed. The elements <em> and <dfn> can only appear inside paragraphs (and lists and other similar elements), and are therefore considered inline elements. Those are the elements we'll examine here.
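The nesting rule above can be checked mechanically. The following is a minimal sketch in Python, not part of OEB itself; the INLINE and BLOCK sets and the misplaced_blocks function are names assumed here for illustration, and they cover only the elements discussed in this chapter rather than the full OEB content model. The check catches only a block element placed directly inside an inline element.

```python
import xml.etree.ElementTree as ET

# Illustrative subsets only; the real OEB content model is larger.
INLINE = {"em", "strong", "dfn", "code", "cite", "span", "a", "br"}
BLOCK = {"p", "div", "blockquote", "ol", "ul", "li",
         "h1", "h2", "h3", "h4", "h5", "h6"}


def misplaced_blocks(markup):
    """Return (inline_parent, block_child) tag pairs found in the markup."""
    root = ET.fromstring(markup)
    problems = []
    for parent in root.iter():
        if parent.tag in INLINE:
            for child in parent:
                if child.tag in BLOCK:
                    problems.append((parent.tag, child.tag))
    return problems


legal = "<p>This is <em>emphasized</em> text.</p>"
illegal = ("<p>This is <em>emphasized text with a <p>paragraph</p> "
           "inside</em> the emphasized text.</p>")

print(misplaced_blocks(legal))    # []
print(misplaced_blocks(illegal))  # [('em', 'p')]
```

A validating parser working from the OEB document type definition would be the thorough way to check this; the sketch only illustrates the idea of inline elements living inside block elements.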
The <em> Element
You've already seen the <em> element, perhaps more than you've wanted. It bears repeating that the <em> element should be used instead of the <i> (italics) element in most cases, designating that the text should be emphasized but not specifying how the emphasized text should appear. An example of how specific text could be emphasized might be this:
<p>Although the venture capital company seemed <em>really</em> interested in our project, perhaps the representative only <em>seemed</em> really interested.</p>
Although the venture capital company seemed really interested in our project, perhaps the representative only seemed really interested.
The <strong> Element
The <strong> element is similar to the <em> element in that it specifies that a section of text should be emphasized, but it is used in most cases where bold text would be used. In fact, the default rendering of the <strong> element uses a bold font, although we've seen that any default rendering can be changed using styles.
Also like the <em> tag, the <strong> tag has a counterpart carried over from HTML that functions similarly but specifies actual formatting: the <b> tag, representing bold text. For reasons we've explained earlier, we don't recommend using tags that specify presentation information within a document itself. Therefore, you should in most cases use <strong> rather than <b> whenever marking up text usually rendered in bold.
<p>The Hindi letter "ka" is pronounced similarly to the first part of the English word, "<strong>cu</strong>p".</p>
The Hindi letter "ka" is pronounced similarly to the first part of the English word, "cup".
The <dfn> Element
We've already discussed using the <dfn> element to represent a word or words that are being defined for the first time.
<p>The Hindi alphabet is usually specified as being a <dfn>syllabary</dfn>, since each letter of a word represents a syllable.</p>
The Hindi alphabet is usually specified as being a syllabary, since each letter of a word represents a syllable.
The <code> Element
OEB has several inline elements, some of which you'll use and some of which you'll never need unless creating certain esoteric documents. We mention the <code> element here because, since OEB was created by the computer-using community, it's likely that the first applications of OEB (this work included) will refer to computer programs or software.
The <code> element was created to represent a section of computer program code, or data that should be entered by the user. This element is usually rendered in a monospaced font such as Courier, but as we've repeatedly stressed, you can change this behavior using styles.
<p>In many programming languages, the statement <code>variable=16</code> represents an assignment operation, assigning the value on the right to the variable on the left of the equals sign.</p>
In many programming languages, the statement variable=16 represents an assignment operation, assigning the value on the right to the variable on the left of the equals sign.
The <cite> Element
Many nonfiction works include information from other sources, and when they do so it is proper to cite the source from which the material was derived. The <cite> element provides a standard way to indicate a cited source.
<p>"The UN, like the League of Nations before it, was designed around the concept of state sovereignty" (<cite>Calvocoressi 1996</cite>).</p>
"The UN, like the League of Nations before it, was designed around the concept of state sovereignty" (Calvocoressi 1996).
The <span> Element
Our discussion of inline elements has thus far assumed that, if you looked hard enough, you could find an OEB element that represented more or less the meaning of the section of text to which you're referring. We've stressed that you can always later change the style of the particular element you chose.
What if you can't find an element that's appropriate, but still want to specify a style for a section of text? OEB provides (again borrowed from HTML) a generic element, <span>, that has no meaning other than to mark a section of text. The <span> element has the normal style and class attributes, allowing you to specify style for an arbitrary section of text. For example, imagine you want to somehow highlight the vowels in an alphabet, but don't want to use <em> because you'd like to use some separate style. You could always create a specific style class for <em>, but you might rather specify style information from scratch using <span>, like this:
<p>English Alphabet: <span style="color: red">A</span> B C D <span style="color: red">E</span> F G...</p>
English Alphabet: A B C D E F G...
You should immediately protest that actual style information should not be included in the document itself. A slight modification resolves this problem and makes the use of <span> acceptable. First specify a style class, such as .vowel {color:red}, and then use this class in the <span> element:
Style Sheet: .vowel {color:red}
Document:
<p>English Alphabet: <span class="vowel">A</span> B C D <span class="vowel">E</span> F G...</p>
English Alphabet: A B C D E F G...
The <br> Element
You learned in an earlier chapter that multiple adjacent whitespace characters, such as spaces and line breaks, are always replaced with a single space character before the text is displayed. This seems reasonable until you encounter a situation in which you'd like to display text on a separate line, perhaps like this:
<p>...Karl had three siblings:
Kris
Krista
Karla</p>
As you'll soon realize, if you've forgotten our earlier discussion about whitespace, what is displayed is not exactly what was entered:
...Karl had three siblings: Kris Krista Karla
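To see why the line breaks disappear, the whitespace rule can be imitated in a few lines of Python. This is only a sketch of the behavior described above, not how any particular reading system is implemented; collapse_whitespace is a name assumed here for illustration. Note that this version also trims whitespace at the very start and end of the text.

```python
def collapse_whitespace(text):
    """Replace every run of spaces, tabs, and line breaks with one space."""
    # str.split() with no arguments splits on any run of whitespace.
    return " ".join(text.split())


source = """...Karl had three siblings:
Kris
Krista
Karla"""

print(collapse_whitespace(source))
# ...Karl had three siblings: Kris Krista Karla
```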
You could always put the name of each of Karl's siblings in a separate paragraph, but they aren't really separate paragraphs. Besides, you don't want to risk their being formatted like paragraphs when displayed (either indented or separated by blank lines, depending on the reading system).
The real solution here is to use a list element, which you'll learn about later in this chapter. But you might insist that these items should go inside the paragraph, and you want to choose where the line breaks appear. A better example might be a poem, in which you'd like to guarantee that a line break appears after each line:
<p>There was a young creature named Karl
Whose siblings would say with a snarl
We'll share what we eat:
Just some bones from the meat
And a little of "ic" from the "garl".</p>
Here again, it would be preferable if there were a <poem> element in OEB. There isn't. Short of creating your own tag for this situation, OEB provides an element that specifies that a line break should appear: the <br> element.
The <br> element is different from the elements examined so far in that it cannot have content; since it signifies a line break at a particular location in the text, it doesn't display text and has no need to hold text. The <br> element is therefore referred to as an empty element. You might expect the <br> element to simply have a beginning and ending tag with nothing in between (<br></br>). However, XML specifies a special format for empty elements that combines the beginning and ending tags into one tag: <br />.
- XML Rule 6: (Empty Elements) An element that cannot have text between its beginning and ending tags is classified as an empty element, and has a special form of a single tag with the element name followed by a slash (/):
<name />
Important: While not required by XML, OEB specifies that all empty tags must have a space between the tag name and the slash character. This is to ensure that OEB documents can be displayed more or less correctly in HTML browsers.
The <br> element might therefore be used in a re-write of Lewis Carroll's Alice's Adventures in Wonderland:
<p>Alice fell down the rabbit hole...<br />
Down...<br />
Down...<br />
Down...</p>
This would be correctly displayed as expected:
Alice fell down the rabbit hole...
Down...
Down...
Down...
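The empty-element syntax can be tried out with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree module; is_well_formed is a helper name assumed here for illustration. It shows that a generic XML parser accepts the combined tag, also accepts an explicit beginning and ending tag pair, and rejects an HTML-style <br> that is never closed. The parser checks XML well-formedness only, so it knows nothing about OEB's additional rule requiring a space before the slash.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<p>Down...<br />Down...</p>"))     # True
print(is_well_formed("<p>Down...<br></br>Down...</p>"))  # True
print(is_well_formed("<p>Down...<br>Down...</p>"))       # False
```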
The <br> element, however, has the potential of being abused and overused. In most places, items might more appropriately be placed in separate paragraphs, or perhaps in a list. In keeping with our goal of using markup to encode meaning into a document, it would probably be better to place a poem inside a <poem> element or something similar, although in this case we would have to define such an element before it could be used. The use of <br> to show the plight of Alice, above, is certainly the easiest and perhaps even an appropriate way to create the desired visual effect. We'd just like to encourage you to make sure that the <br> element is appropriate for the situation before using it.
The <a> Element
The anchor element, <a>, was first made popular by HTML. Since the <a> element is responsible for linking documents and sections of documents, it is responsible for the "hypertext" part of HTML. Without <a>, HTML might otherwise have only been "TML", a text markup language with no linking capabilities. The OEB Publication Structure incorporated <a> into its tag set with hardly any modifications to its fundamental form.
With its linking capabilities, <a> is the first tag we've discussed that starts to allow static pages in a book to come to life, to allow interaction with the user. The Open eBook specification begins with the assumption that a user will be provided with a paging function that will allow traversal through the contents of a book. The most fundamental purpose for hypertext anchors might be to link to reference sections or a glossary; however, the <a> element allows the author to provide many more complex navigation capabilities, even allowing readers to choose an arbitrary path through the book as they read.
The most important attribute of the <a> element is href. As you've seen in elements both in the OEB package and in OEB style sheets, the href attribute specifies a "hypertext reference" location. Usually, the value of this attribute refers to a file; in other instances it can refer to a specific location within a file.
Let's revisit a section from the first work we created.
<p>Years ago, when strange creatures ruled the earth, the seas were beginning to form, and humans had yet to appear, there lived a young blovjus named Karl.</p>
Now, some uninformed readers may not know what a "blovjus" is. You may wish to provide a definition for them. Having a definition in the text is unacceptable; you don't want to bother your many readers who know exactly what a blovjus is and do not want to be told again. Instead, you elect to place the definition in a separate OEB document file named blovjus.html:
<?xml version='1.0'?>
<!DOCTYPE html PUBLIC "+//ISBN 0-9673008-1-9//DTD OEB 1.0.1 Document//EN" "http://openebook.org/dtds/oeb-1.0.1/oebdoc101.dtd">
<html>
<body>
<p><strong>blovjus</strong> A strange, mythical creature which lived many years ago; sometimes it stole supper from its siblings.</p>
</body>
</html>
Using the <a> tag, it's a simple job to link "blovjus" in the text to its definition:
<p>Years ago, when strange creatures ruled the earth, the seas were beginning to form, and humans had yet to appear, there lived a young <a href="blovjus.html">blovjus</a> named Karl.</p>
Anchor elements can also be used to mark, or anchor, a section of text in a document (although this function is less important since most elements in OEB accept an id attribute). This way, two <a> elements can be used together, one to mark a location and another to link to that location. This allows links not only to files, but to a specific location in a file. The tag serving as an anchor will use the id attribute to provide a name for the anchor. The tag serving as a link will use the href attribute as before to refer to a file, except that a pound sign (#) will be appended, followed by the id of the anchor which serves as the link target.
This is actually quite simple in practice. Assume that you have so many uninformed readers that you've created an entire glossary with many definitions. This glossary, stored in a document file named glossary.html, replaces the blovjus.html document file you created earlier and contains "blovjus" and other terms:
<?xml version='1.0'?>
<!DOCTYPE html PUBLIC "+//ISBN 0-9673008-1-9//DTD OEB 1.0.1 Document//EN" "http://openebook.org/dtds/oeb-1.0.1/oebdoc101.dtd">
<html>
<body>
<p><a id="blovjus"><strong>blovjus</strong></a> A strange, mythical creature which lived many years ago; sometimes it stole supper from its siblings.</p>
<p><a id="earth"><strong>earth</strong></a> The third planet from the sun.</p>
</body>
</html>
Note that we've placed an <a> element around each term to serve as an anchor to mark the link target. We've specified a name for each target using the id attribute. Here, we've used names that match the terms we're defining, but we could have used any names as long as they are unique and we use the same names in the links. Here's what the links look like in our original file:
<p>Years ago, when strange creatures ruled the <a href="glossary.html#earth">earth</a>, the seas were beginning to form, and humans had yet to appear, there lived a young <a href="glossary.html#blovjus">blovjus</a> named Karl.</p>
In each link, we specify the document in which the definitions reside (glossary.html in this example), followed by a pound sign (#) and then the ID of the appropriate definition (here, earth and blovjus). We must always make sure that the name in the href attribute matches the name in the target anchor tag's id attribute.
This application of the anchor tag as a true anchor is less useful since OEB provides an id attribute for most elements. Instead of adding an <a> element and id attribute to serve as an anchor, you can instead add an id to the element to which you want to link. The above example, then, would appear like this:
<?xml version='1.0'?>
<!DOCTYPE html PUBLIC "+//ISBN 0-9673008-1-9//DTD OEB 1.0.1 Document//EN" "http://openebook.org/dtds/oeb-1.0.1/oebdoc101.dtd">
<html>
<body>
<p id="blovjus"><strong>blovjus</strong> A strange, mythical creature which lived many years ago; sometimes it stole supper from its siblings.</p>
<p id="earth"><strong>earth</strong> The third planet from the sun.</p>
</body>
</html>
This version of specifying a link target is the recommended one: there is no need to specify an anchor for the target because almost any OEB element can contain an ID attribute. The <a> element will still be needed, of course, to link to the referenced location.
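How a link of the form glossary.html#earth leads to its target can be sketched in Python. This is an illustration under stated assumptions, not a description of how a real reading system works: find_target and the documents dictionary (which maps file names to their contents) are hypothetical names, and the glossary text is abbreviated to a single document body. The standard urldefrag function splits the href at the pound sign, and the id attributes are then searched for a match.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

glossary = """<html><body>
<p id="blovjus"><strong>blovjus</strong> A strange, mythical creature.</p>
<p id="earth"><strong>earth</strong> The third planet from the sun.</p>
</body></html>"""

# Hypothetical stand-in for the files of a publication.
documents = {"glossary.html": glossary}


def find_target(href, documents):
    """Return the element an href points to, or None if nothing matches."""
    filename, fragment = urldefrag(href)  # "glossary.html", "earth"
    source = documents.get(filename)
    if source is None:
        return None
    root = ET.fromstring(source)
    if not fragment:
        return root  # a link to the file as a whole
    for element in root.iter():
        if element.get("id") == fragment:
            return element
    return None


target = find_target("glossary.html#earth", documents)
print(" ".join("".join(target.itertext()).split()))
# earth The third planet from the sun.
print(find_target("glossary.html#moon", documents))  # None
```

The second call shows what happens when the name in the href attribute does not match any id in the target document: nothing is found, which is why the two names must always agree.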
Anchor tags are the first step in leveraging the capabilities of electronic books which static books do not have. Hypertext linking has many uses, from creating tables of contents to providing user-initiated changes in a plot. We'll address some of these uses in the following chapters.
Block Elements
Block elements could be considered the opposite of inline elements. They are the enclosing elements within which inline elements are placed. That is, inline elements have to have some element to be inside; this element is ultimately a block element (although inline elements can appear inside other inline elements). You've already seen one example of a block element, the <p> element representing a paragraph. Block elements could also be classified as elements that automatically have a line break before and after them; they might therefore also be appropriately called "out-of-line" elements.
Traditionally, block elements in HTML also had a blank line immediately before and immediately after them, but OEB reading systems may prefer to display block elements differently, indenting the first line of text in a <p> element, for example.
The <p> Element
The <p> element is probably the most straightforward block-level element, and probably the most common. Every paragraph in OEB should be surrounded by <p>...</p>. As we've certainly used plenty of paragraphs up to this point, we won't give any examples here. Instead, we'll just give one precaution: be careful not to overuse the <p> tag. Make sure the block of text you're marking up is really a paragraph and not, say, a list or a heading, both of which are covered in the sections below.
The <h1>...<h6> Heading Elements
Perhaps the second most common block element is actually a group of similar elements: <h1>, <h2>, <h3>, <h4>, <h5>, and <h6>. These elements represent different levels of headings in your document.
What does "heading" mean? It represents whatever you want it to represent. You could use <h1> to represent the title of your book on the title page, and use <h2> to represent the title of each chapter. Alternatively, you could use <h1> to represent each chapter title, <h2> to represent each chapter subtitle, and use a completely separate style class (or custom XML element) to represent the book title on the title page.
All this is at your discretion because the heading elements do not directly correspond to any particular division of a book; they do not have a particular meaning, such as "chapter title" or "subtitle". The only thing that you can be sure of is that, by default, each lower-numbered heading (such as <h1>) will be rendered larger than a higher-numbered heading (such as <h2>).
The lack of a particular meaning for the <hX> elements makes them slightly less useful than one might expect. Some early eBook reading systems assigned meanings to the <hX> elements, allowing the reading system to automatically find and understand when chapters begin, for example. Other markup languages have specific tags that represent chapters and other divisions. OEBPS 1.0, however, has no elements that specifically indicate book structure, so you'll have to make do with the heading elements. While OEB may introduce such elements in the future, for now using the heading elements is highly preferable to specifying the styles of headings manually, of course. Just remember that the meanings assigned to the heading elements are completely up to you. You might choose, for example, to use them like this:
<h1>Karl the Creature</h1>
<h2>Chapter 1: Karl as a Kid</h2>
<p>Years ago...</p>
Karl the Creature
Chapter 1: Karl as a Kid
Years ago...
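Because the heading elements carry only a level and not a meaning, any tool that reads them can recover no more than an indented outline; deciding that level 2 means "chapter" remains the author's convention. The following Python sketch illustrates this. The outline function is a hypothetical helper, and the <body> wrapper is added only so that the fragment has a single root element.

```python
import re
import xml.etree.ElementTree as ET


def outline(markup):
    """List heading text in document order, indented by heading level."""
    root = ET.fromstring(markup)
    entries = []
    for element in root.iter():
        match = re.fullmatch(r"h([1-6])", element.tag)
        if match:
            level = int(match.group(1))
            # Gather the heading text and normalize its whitespace.
            title = " ".join("".join(element.itertext()).split())
            entries.append("  " * (level - 1) + title)
    return entries


document = """<body>
<h1>Karl the Creature</h1>
<h2>Chapter 1: Karl as a Kid</h2>
<p>Years ago...</p>
</body>"""

for line in outline(document):
    print(line)
# Karl the Creature
#   Chapter 1: Karl as a Kid
```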
Lists, Ordered (<ol>) and Unordered (<ul>)
Almost every work, especially a non-fiction educational work (like this one), has instances in which a list of items must be displayed. Many times the items in these lists are shown in a particular order, each item with a particular number. These lists are called ordered lists:
The names of the first three planets from the sun, in order, are:
1. Mercury
2. Venus
3. Earth
Not only must the numbers of each item in the list be carefully considered, care must be taken to ensure that the list is formatted correctly. Whenever the list is modified or reordered, care must be taken in modifying the numbering involved. Furthermore, there's no indication encoded in the file that this is a list; no meaning has been added to text that a computer or data-retrieval program could extract.
OEB provides elements that solve all of these problems. In this case, we can use markup to specify that we have an ordered list (using the <ol> element), and that each item in the list is (as you would expect) a list item (using the <li> element). Ordered lists in OEB therefore consist of two separate elements, <ol> and <li>, used in conjunction like this:
<p>The names of the first three planets from the sun, in order, are:</p>
<ol>
<li>Mercury</li>
<li>Venus</li>
<li>Earth</li>
</ol>
The names of the first three planets from the sun, in order, are:
1. Mercury
2. Venus
3. Earth
Notice two things: first, the number of each list item does not need to be specified; it is supplied automatically when the list is displayed. Second, the introductory statement, "The names of the first three planets...", is not technically part of the list, so it is not placed within the <ol>...</ol> tags. The OEB publication structure allows you to specify the formatting, the type, and even a language-specific representation of the numbers used in a list.
Some types of lists do not have numbers associated with them; they are unordered lists. If you're listing items in a grocery list, for example, you may not care about the order in which they are purchased. You would therefore use the <ul> element for the unordered list, which functions exactly like the <ol> element used for ordered lists:
<p>Please purchase the following items:</p>
<ul>
<li>Bread</li>
<li>Eggs</li>
<li>Milk</li>
</ul>
Please purchase the following items:
- Bread
- Eggs
- Milk
The default rendering method for unordered lists is to display a small round bullet next to each item. You'll learn later how to use styles to modify this behavior. Most importantly, we've specified that the information is actually a list of items, and we can later, if we wish, arbitrarily change the way this list is displayed using styles, without changing the actual text of the document.
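The benefit of leaving the numbering to the reading system can be imitated with a short Python sketch. It is only an illustration of the idea; render_list is a hypothetical helper, and the plain-text markers it produces ("1." and "-") are assumptions standing in for whatever a real reading system would draw.

```python
import xml.etree.ElementTree as ET


def render_list(markup):
    """Render an <ol> or <ul> element as a list of plain-text lines."""
    root = ET.fromstring(markup)
    lines = []
    for number, item in enumerate(root.findall("li"), start=1):
        # Ordered lists get a generated number, unordered lists a marker.
        marker = f"{number}." if root.tag == "ol" else "-"
        text = " ".join("".join(item.itertext()).split())
        lines.append(f"{marker} {text}")
    return lines


planets = "<ol><li>Mercury</li><li>Venus</li><li>Earth</li></ol>"
groceries = "<ul><li>Bread</li><li>Eggs</li><li>Milk</li></ul>"

print(render_list(planets))    # ['1. Mercury', '2. Venus', '3. Earth']
print(render_list(groceries))  # ['- Bread', '- Eggs', '- Milk']
```

Because the numbers are generated at display time, inserting or reordering <li> elements never requires renumbering anything by hand.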
The <div> Element
The <span> element, as you saw earlier, provided a convenient way to specify style information about an arbitrary set of characters inside a block element. That last aspect somewhat limits its applicability, though: since <span> is an inline element, it can't be used to specify style information for more than one block element. That's why <div> is included in OEB.
The <div> element is the block-level equivalent of the inline <span> element. It has no meaning in itself; its sole purpose is to group several block-level elements, for the purpose of applying styles, for example. To see an example of where the <div> element could be applied, let's revisit an example of an inappropriate use of the <em> element:
Illegal:
<em>
<p>Paragraph 1</p>
<p>Paragraph 2</p>
</em>
As we noted when discussing the OEB content model, the <em> element, being an inline element, cannot enclose the <p> element, a block element. We explained that the <em> element could simply be moved inside the <p>, like this:
Legal OEB:
<p><em>Paragraph 1</em></p>
<p><em>Paragraph 2</em></p>
The same effect could be achieved using the <div> element in a similar manner to that used in the first example. An emphasis style class could be created and applied to a surrounding <div> element:
Style Sheet: .emphasis {font-style: italic}
Document:
<div class="emphasis">
<p>Paragraph 1</p>
<p>Paragraph 2</p>
</div>
As with the <span> element, the <div> element is another carryover from HTML that allows one to simulate the creation of a custom tag. And as with <span>, XML allows true custom elements to be created, making <div>, although convenient, somewhat redundant. It's this convenience that makes <div> quite attractive and perhaps acceptable in some situations. Before using it, however, make sure that a custom XML element wouldn't be more appropriate.
The <center> Element (deprecated)
OEB includes <center> but marks it as deprecated: its use is allowed so that HTML documents will not require much modification, but its use is discouraged. In fact, the <center> element is mentioned here only because its use has become very popular over the years in HTML documents. We echo the exhortation of the OEBPS specification (and the latest version of HTML) that the <center> element should not be used in new OEB documents; explicitly specifying that a section of text should be centered goes against the concept of separation of content and presentation.
Deprecated:
<center>Chapter 1: Karl as a Kid</center>
Chapter 1: Karl as a Kid
As an alternative, OEB (like HTML) allows text to be centered using styles. Specifically, the text-align property, which we'll discuss later in this chapter, allows a "center" value that gives the desired effect. Using style classes with the <div> element we just discussed might yield something like this:
Style Sheet: .chapterhead {text-align: center}
Document:
<div class="chapterhead">Chapter 1: Karl as a Kid</div>
As is usually the case with <div>, there are better ways to specify which text should be centered. If you're already using <h1> for chapter headings, for example, specifying that the chapter headings should be centered is quite easy using styles, and illustrates how convenient and appropriate style sheets can be:
Style Sheet: h1 {text-align: center}
Document:
<h1>Chapter 1: Karl as a Kid</h1>
Whatever method you choose to center text, we encourage you not to use the <center> element.
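If you are converting existing HTML that already relies on <center>, the replacement described above can be automated. The Python sketch below is one possible approach under stated assumptions, not a tool defined by OEB: replace_center is a hypothetical helper, the <body> wrapper simply gives the fragment a single root element, and the class name must still be paired with a matching rule such as .chapterhead {text-align: center} in your style sheet.

```python
import xml.etree.ElementTree as ET


def replace_center(markup, class_name):
    """Rewrite every <center> element as <div class="...">."""
    root = ET.fromstring(markup)
    # Copy the matches into a list before changing any tags.
    for element in list(root.iter("center")):
        element.tag = "div"
        element.set("class", class_name)
    return ET.tostring(root, encoding="unicode")


old = "<body><center>Chapter 1: Karl as a Kid</center></body>"
print(replace_center(old, "chapterhead"))
# <body><div class="chapterhead">Chapter 1: Karl as a Kid</div></body>
```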
The <blockquote> Element
In contrast to the <center> element, <blockquote> is a good example of how elements should encode meaning into a document and assist in separating content from presentation. Many nonfiction works have sentences quoted from other works. If a quote is several sentences or even several paragraphs long, it is usually placed in a separate, indented paragraph or group of paragraphs. The <blockquote> element allows text to be specified as a block of quoted text without worrying how it will be formatted. Usually, the default indented style is acceptable, but this can easily be changed using styles.
The <blockquote> element has one optional attribute, cite, which allows the web address of the quote's source to be specified. Note that the inline element with the same name as the attribute, <cite>, is often used in conjunction with the <blockquote> element:
<blockquote cite="http://www.un.org/Overview/rights.html">
Everyone is entitled to all the rights and freedoms set forth in this Declaration, without distinction of any kind, such as race, colour, sex, language, religion, political or other opinion, national or social origin, property, birth or other status. Furthermore, no distinction shall be made on the basis of the political, jurisdictional or international status of the country or territory to which a person belongs, whether it be independent, trust, non-self-governing or under any other limitation of sovereignty. (<cite>UN Universal Declaration of Human Rights, Article 2, December 10, 1948</cite>)
</blockquote>
Everyone is entitled to all the rights and freedoms set forth in this Declaration, without distinction of any kind, such as race, colour, sex, language, religion, political or other opinion, national or social origin, property, birth or other status. Furthermore, no distinction shall be made on the basis of the political, jurisdictional or international status of the country or territory to which a person belongs, whether it be independent, trust, non-self-governing or under any other limitation of sovereignty. (UN Universal Declaration of Human Rights, Article 2, December 10, 1948)