HTML5 And The Document Outlining Algorithm
By now, we all know that we should be using HTML5 to build websites. The discussion now is moving on to how to use HTML5 correctly. One important part of HTML5 that is still not widely understood is sectioning content: section, article, aside and nav. To understand sectioning content, we need to grasp the document outlining algorithm.
Understanding the document outlining algorithm can be a challenge, but the rewards are well worth it. No longer will you agonize over whether to use a section or div element; you will know straight away. Moreover, you will know why these elements are used, and this knowledge of semantics is the biggest benefit of learning how the algorithm works.
What Is The Document Outlining Algorithm?
The document outlining algorithm is a mechanism for producing outline summaries of Web pages based on how they are marked up. Every Web page has an outline, and checking it is easy using a really simple free online tool, which we’ll cover shortly.
So, let’s start with a sample outline. Imagine you have built a website for a horse breeder, and he wants a page to advertise horses that he is selling. The structure of the page might look something like this:
That’s all it is: a nice, clean, easy-to-follow list of headings, displayed in a hierarchy — much like a table of contents.
To make things even simpler, only two things in your mark-up affect the outline of a Web page:
- heading content (h1 to h6 and hgroup),
- sectioning content (section, article, aside and nav).
Obviously, the sectioning of content is the new HTML5 way to create outlines. But before we get into that, let’s go back to HTML 101 and review how we should all be using headings.
Creating Outlines With Heading Content
To create a structure for the horses page outlined in figure 1, we could use mark-up like the following:
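A minimal sketch of such heading-only mark-up is shown here. The horse names other than Brown Biscuit and all of the descriptions are placeholders assumed for illustration; only the heading levels matter for the outline.

```html
<!-- Sketch only: names other than "Brown Biscuit" are assumed placeholders.
     Expected outline:
       1. Horses for sale
          1.1 Mares
              1.1.1 Pink Diva
              1.1.2 Ring a Rosies
          1.2 Stallions
              1.2.1 Sea Pioneer
              1.2.2 Brown Biscuit -->
<h1>Horses for sale</h1>

<h2>Mares</h2>
<h3>Pink Diva</h3>
<p>Description of Pink Diva.</p>
<h3>Ring a Rosies</h3>
<p>Description of Ring a Rosies.</p>

<h2>Stallions</h2>
<h3>Sea Pioneer</h3>
<p>Description of Sea Pioneer.</p>
<h3>Brown Biscuit</h3>
<p>Description of Brown Biscuit.</p>

<!-- No later heading closes "Brown Biscuit", so this paragraph
     belongs to that implicit section. -->
<p>All our horses come with full paperwork and a family tree.</p>
```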
It’s as simple as that. The outline in figure 1 is created by the levels of the headings.
Just so you know that I’m not making this up, you should copy and paste the code above into Geoffrey Sneddon’s excellent outlining tool. Click the big “Outline this” button, et voila!
An outline created with heading content this way is said to consist of implicit, or implied, sections. Each heading creates its own implicit section, and any subsequent heading of a lower level starts an implicit sub-section nested within it.
An implicit section is ended by a heading of the same level or higher. In our example, the “Mares” section is ended by the beginning of the “Stallions” section, and each section that contains details of an individual horse is ended by the beginning of the next one.
Figure 3 below is an example of an implicit section that ends with a heading of the same level. And figure 4 is an implicit section that ends with a heading of a higher level.
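Both ways of ending an implicit section can be sketched in a few lines. The horse names here are assumed placeholders, and the comments describe what each heading does to the sections opened before it.

```html
<h2>Mares</h2>          <!-- opens an implicit section -->

<h3>Pink Diva</h3>      <!-- lower level: implicit sub-section of "Mares" -->
<p>Description of Pink Diva.</p>

<h3>Ring a Rosies</h3>  <!-- same level: ends the "Pink Diva" section -->
<p>Description of Ring a Rosies.</p>

<h2>Stallions</h2>      <!-- higher level than h3: ends "Ring a Rosies";
                             same level as "Mares", so it ends that too -->
```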
Creating Outlines With Sectioning Content
Now that we know how heading content works in creating an outline, let’s mark up our horses page using some new HTML5 structural elements:
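The following is a sketch of that kind of mark-up, with heading levels chosen arbitrarily on purpose. The horse names other than Brown Biscuit are assumed placeholders, as are the specific heading levels.

```html
<!-- Sketch only: the heading levels are deliberately arbitrary.
     Expected outline (same shape as the headings-only version):
       1. Horses for sale
          1.1 Mares
              1.1.1 Pink Diva
              1.1.2 Ring a Rosies
          1.2 Stallions
              1.2.1 Sea Pioneer
              1.2.2 Brown Biscuit -->
<h1>Horses for sale</h1>

<section>
  <h6>Mares</h6>
  <article>
    <h2>Pink Diva</h2>
    <p>Description of Pink Diva.</p>
  </article>
  <article>
    <h4>Ring a Rosies</h4>
    <p>Description of Ring a Rosies.</p>
  </article>
</section>

<section>
  <h3>Stallions</h3>
  <article>
    <h5>Sea Pioneer</h5>
    <p>Description of Sea Pioneer.</p>
  </article>
  <article>
    <h1>Brown Biscuit</h1>
    <p>Description of Brown Biscuit.</p>
  </article>
</section>

<!-- Outside all sectioning content, so it belongs to the
     top-level "Horses for sale" section. -->
<p>All our horses come with full paperwork and a family tree.</p>
```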
Now, I know what you’re thinking, but I haven’t taken leave of my senses with these crazy headings. I am making a very important point, which is that the outline is created by the sectioning content, not the headings.
Go ahead and copy and paste that code into the outliner, and you will see that the heading levels have absolutely no effect on the outline where sectioning content is used.
The section, article, aside and nav elements are what create the outline, and this time the sections are called explicit sections.
One of the most talked about features of HTML5 is that multiple h1 elements are allowed, and this is why. It’s not an open invitation to mark up every heading on the page as h1; rather, it’s an acknowledgement that where sectioning content is used, it creates the outline, and that each explicit section has its own heading structure.
The part of the HTML5 spec that deals with headings and sections makes this clear:
“Sections may contain headings of any rank, but authors are strongly encouraged to either use only h1 elements, or to use elements of the appropriate rank for the section’s nesting level.”
Until browsers and, more critically, screen readers understand that sectioning content introduces a sub-section, I would strongly advise against relying on multiple h1 elements. It is less safe than using a heading structure that reflects the level of each heading in the document, as shown in figure 6 below.
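A sketch of this safer approach pairs sectioning content with heading levels that match each section’s nesting depth. The horse names other than Brown Biscuit are assumed placeholders.

```html
<!-- Sketch only: heading rank mirrors nesting depth, so the implicit
     (heading-based) and explicit (section-based) outlines agree. -->
<h1>Horses for sale</h1>

<section>
  <h2>Mares</h2>
  <article>
    <h3>Pink Diva</h3>
    <p>Description of Pink Diva.</p>
  </article>
  <article>
    <h3>Ring a Rosies</h3>
    <p>Description of Ring a Rosies.</p>
  </article>
</section>

<section>
  <h2>Stallions</h2>
  <article>
    <h3>Sea Pioneer</h3>
    <p>Description of Sea Pioneer.</p>
  </article>
  <article>
    <h3>Brown Biscuit</h3>
    <p>Description of Brown Biscuit.</p>
  </article>
</section>

<p>All our horses come with full paperwork and a family tree.</p>
```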
This means that user agents that haven’t implemented the outlining algorithm can use implicit sectioning, and those that have implemented it can effectively ignore the heading levels and use sectioning content to create the outline.
At the time of this writing, no browsers or screen readers have implemented the outlining algorithm, which is why we need third-party testing tools such as the outliner. The latest versions of Chrome and Firefox style h1 elements in nested sections differently, but that is very different from actually implementing the algorithm.
When most user agents finally do support it, using an h1 in every explicit section will be the preferred option. It will allow syndication tools to handle articles without needing to reformat any heading levels in the original content.
One other point worth noting here is the position of the paragraph “All our horses come with full paperwork and a family tree.” In the example that used headings to create the outline (figure 2), this paragraph is part of the implicit section created by the “Brown Biscuit” heading. Human readers will clearly see that this text applies to the whole document, not just Brown Biscuit.
Sectioning content solves this problem quite easily, moving it back up to the top level, headed by “Horses for sale.”
Mixing It Up
So, what happens when implicit sections and explicit sections are combined? As long as you remember that implicit sections can go inside explicit sections, but not the other way round, you will be fine. For example, the following works well and is perfectly valid:
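A sketch of implicit sections nested inside an explicit one could look like this. The horse names are assumed placeholders.

```html
<!-- Sketch only. Expected outline:
       1. Horses for sale
          1.1 Mares
              1.1.1 Pink Diva
              1.1.2 Ring a Rosies -->
<h1>Horses for sale</h1>

<section>
  <h1>Mares</h1>

  <!-- These h2 elements create implicit sub-sections
       inside the explicit "Mares" section. -->
  <h2>Pink Diva</h2>
  <p>Description of Pink Diva.</p>

  <h2>Ring a Rosies</h2>
  <p>Description of Ring a Rosies.</p>
</section>
```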
And it creates a sensible hierarchical outline:
However, if you hope to achieve the same outline by nesting an explicit section inside an implicit section, it won’t work. The sectioning element will simply close the implicit section created by the heading and create a very different outline, as shown below:
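A sketch of the problematic nesting, again with assumed placeholder names, shows how each article attaches to the nearest ancestor sectioning root (the body) rather than to the preceding h2.

```html
<!-- Sketch only. Each article closes the implicit section opened by the
     h2 before it, so the outline should come out flat under the h1:
       1. Horses for sale
          1.1 Mares
          1.2 Pink Diva
          1.3 Ring a Rosies
          1.4 Stallions
          1.5 Sea Pioneer
          1.6 Brown Biscuit -->
<h1>Horses for sale</h1>

<h2>Mares</h2>
<article>
  <h1>Pink Diva</h1>
  <p>Description of Pink Diva.</p>
</article>
<article>
  <h1>Ring a Rosies</h1>
  <p>Description of Ring a Rosies.</p>
</article>

<h2>Stallions</h2>
<article>
  <h1>Sea Pioneer</h1>
  <p>Description of Sea Pioneer.</p>
</article>
<article>
  <h1>Brown Biscuit</h1>
  <p>Description of Brown Biscuit.</p>
</article>
```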
This would produce the following outline:
There is no way to make the explicit sections created by the article elements become sub-sections of the implicit “Mares” section.
You can use headings to split up the content of sectioning elements, but not the other way round.
Things To Watch Out For
Untitled Sections
Until now we haven’t really looked at nav and aside, but they work exactly the same as section and article. If you have secondary content that is generally related to your website (say, horse-training tips and industry news), you would mark it up as an aside, which creates an explicit section in the document outline. Similarly, major navigation would be marked up as nav, again creating an explicit section.
There is no requirement to use headings for aside and nav, so they can appear in the outline as untitled sections. Go ahead and try the following code in the outliner:
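A minimal sketch to try, assuming hypothetical link targets and a single placeholder section:

```html
<!-- Sketch only: the href values are hypothetical.
     Expected outline:
       1. Horses for sale
          1.1 (untitled section created by nav)
          1.2 Mares -->
<h1>Horses for sale</h1>

<nav>
  <!-- No heading here, so this explicit section has no title. -->
  <ul>
    <li><a href="/mares">Mares</a></li>
    <li><a href="/stallions">Stallions</a></li>
  </ul>
</nav>

<section>
  <h2>Mares</h2>
  <p>Details of the mares for sale.</p>
</section>
```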
The nav appears as an untitled section. Now, this generally wouldn’t be a problem and is not considered bad HTML5 code, although in his recent HTML5 Doctor article on outlining, Mike Robinson recommends using headings for all sectioning content in order to increase accessibility.
Untitled section and article elements, on the other hand, are generally to be avoided. In fact, if you’re unsure whether to use a section or article, a good rule of thumb is to see whether the content has a natural, logical heading. If it doesn’t, then you will more than likely be wiser to use a good old div.
Now, the spec doesn’t actually require section elements to have a title. It says:
“The section element represents a generic section of a document or application. A section, in this context, is a thematic grouping of content, typically with a heading.”
Your interpretation of this probably hinges on your understanding of the word “typically.” I take it to mean that you need a damn good reason not to use headings with section elements. I do not take it to mean that you can ignore it whenever you feel the urge to use a new HTML5 element.
Where the article element is specified, the spec goes even further by showing an example of blog comments marked up as untitled article elements, so there are exceptions. However, if you see an untitled section or article in the outline, make sure you have a good reason for not giving it a title.
If you are unsure whether your untitled section is a nav, aside, section or article, a very handy Opera extension will let you know which type of sectioning content you have left untitled. The tool will also let you view the outline without leaving the page, which can be hugely beneficial when you’re debugging sections.
Sectioning Root
The eagle-eyed among you will have noticed something about the example I used when I said that sectioning content cannot create a sub-section of an implicit section. It had an h1 (“Horses for sale”) that was not in sectioning content, immediately followed by a section (“Mares”), and the sectioning content did actually create a sub-section of the h1.
The reason for this is sectioning root. As the spec says, sectioning elements create sub-sections of their nearest ancestor sectioning root or sectioning content.
“Sectioning content elements are always considered subsections of their nearest ancestor sectioning root or their nearest ancestor element of sectioning content, whichever is nearest, regardless of what implied sections other headings may have created.”
The body element is a sectioning root. So, if you paste the code from figure 7 into the outliner, the h1 would be the heading of the body sectioning root, and the section element would be a sub-section of that sectioning root.
The body element is not the only one that acts as a sectioning root. There are five others:
1. blockquote
2. details
3. fieldset
4. figure
5. td
The status of these elements as sectioning root has two implications. First, each can have its own outline. Secondly, the outline of nested sectioning root does not appear in, nor does it have an effect on, the outline of its parent sectioning root.
In practice, this means that headings inside any of the five sectioning root elements listed above do not affect the outline of the document that they are a part of.
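As a sketch of this behaviour, consider a heading inside a blockquote. The quoted text and its heading are assumed placeholders.

```html
<!-- Sketch only. Expected document outline:
       1. Horses for sale
          1.1 Mares
     The heading inside the blockquote belongs to the blockquote's own
     outline and does not appear in the outline of the body. -->
<h1>Horses for sale</h1>

<blockquote>
  <h2>What our customers say</h2>
  <p>A placeholder customer quotation.</p>
</blockquote>

<h2>Mares</h2>
<p>Details of the mares for sale.</p>
```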
The final thing (you’ll be glad to hear) that I’ll say about sectioning root is that the first heading in the document that is not inside sectioning content is considered to be the document title.
Try the following code in the outliner to see what happens:
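One hypothetical test case for this rule is sketched below: sectioning content comes first, and the root-level headings follow it. The structure and heading levels are assumptions chosen for experimentation, and the resulting outline is deliberately not stated here, since it is best checked in the outliner.

```html
<!-- Hypothetical test case (assumed structure, not a definitive example).
     The root-level headings appear after the sectioning content. -->
<body>
  <section>
    <h1>Mares</h1>
    <p>Details of the mares for sale.</p>
  </section>

  <!-- Implicit sections at the root level: try changing these
       heading levels and compare the outlines. -->
  <h3>Horses for sale</h3>
  <h4>Stallions</h4>
  <p>Details of the stallions for sale.</p>
</body>
```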
I won’t try to explain this to you because it will probably only confuse both of us, so I’ll let you play with it in the outliner. Hint: try using different heading levels for the implicit sections to see how the outline is affected; for example, h3 and h4, or two h5 elements.
Untitled Documents
If no heading is at the root level of the document (i.e. not inside sectioning content), then the document itself will be untitled. This is a pretty serious problem, and it can occur either through carelessness or, paradoxically, by thinking carefully about how sectioning content should be used.
Roger Johansson addresses this issue in his excellent article on document outlines and HTML5 and the follow-up article.
Johansson asks how a proper document outline is supposed to be created for a blog post or other news-type item using HTML5. If you subscribe to the belief that your logo or website name should not be in an h1 element, you could mark up your blog post along the lines of the following:
<body>
  <article>
    <h1>Blog post title</h1>
    <p>Blog post content</p>
  </article>
</body>
The document is untitled. Somewhat reluctantly, Johansson settles on marking up the website’s title in h1 and using another h1 to mark up the article’s title. This is a sensible solution and is backed up by the results of the WebAIM screen reader user survey, in which the majority of respondents stated a preference for two top-level headings in exactly this format.
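A sketch of this two-heading approach, assuming a placeholder website name:

```html
<!-- Sketch only: "Website name" is a placeholder.
     Expected outline:
       1. Website name
          1.1 Blog post title -->
<body>
  <h1>Website name</h1>
  <article>
    <h1>Blog post title</h1>
    <p>Blog post content</p>
  </article>
</body>
```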
This same approach is also widely used on static pages that are built with HTML5 structural elements, and it could be very useful indeed for screen reader users. Imagine that you are using a screen reader to find a decent recipe for chicken pie, and you have a handful of recipe websites open for comparison. Being able to quickly find out which website you are on using the shortcut key for headings would be much more useful than seeing only “chicken pie” on each one.
Not too far behind two top-level headings in the screen reader user survey was one top-level heading for the document. This is probably my preferred option in most cases; but as we have already seen, it creates an untitled body, which is undesirable.
In my opinion, there is an easy way around this problem: don’t use article as a wrapper for single blog posts, news items or the main content of static pages. Remember that article is sectioning content: it creates a sub-section of the document. But in these cases, the document is the content, and the content is the document. Setting aside the name of the element, why would we want to create a sub-section of a document before it has even begun?
Remember, you can still use div!
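A sketch of the div-based alternative follows; the class name is a hypothetical styling hook. Because div is not sectioning content, the post’s heading sits at the root level and titles the document.

```html
<!-- Sketch only: the class name "post" is hypothetical.
     Expected outline:
       1. Blog post title -->
<body>
  <div class="post">
    <h1>Blog post title</h1>
    <p>Blog post content</p>
  </div>
</body>
```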
hgroup
This is the final item in the list of things to watch out for, and it’s very easy to understand. The hgroup element can contain only headings (h1 to h6), and its purpose is to remove all but the highest-level heading it contains from the outline.
It has been and continues to be the subject of controversy, and its inclusion in the specification is by no means a given. However, for now, it does exactly what it says on the tin: it groups headings into one, as far as the outlining algorithm is concerned.
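A minimal sketch of hgroup in use, with an assumed placeholder subtitle:

```html
<!-- Sketch only: the subtitle text is a placeholder.
     Expected outline:
       1. Horses for sale
          1.1 Mares
     The h2 inside hgroup is treated as a subtitle and is left out. -->
<hgroup>
  <h1>Horses for sale</h1>
  <h2>Mares and stallions from our own stables</h2>
</hgroup>

<section>
  <h2>Mares</h2>
  <p>Details of the mares for sale.</p>
</section>
```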
In Conclusion
The logic behind the document outlining algorithm can be hard to grasp, and the spec can sometimes feel like physics: understandable as you’re reading it, but when you try to confirm your understanding, it dissolves and you find yourself re-reading it again and again.
But if you remember the basics, namely that section, article, aside and nav create sub-sections on Web pages, then you are 90% of the way there. Get used to marking up content with sectioning elements and to checking your pages in the outliner, because the more you practice creating well-outlined documents, the sooner you will grasp the algorithm.
I promise, you will have it cracked after only a handful of times, and you will never look back. And from that moment on, every Web page you create will be structured, semantic, robust, well-outlined content.
Other Resources
- “Creating an Outline” The full W3C specification of the outlining algorithm.
- “Document Outlines” An explanation of the document outlines from the HTML5 Doctor.
- “Sectioning content” and “Sectioning root” The W3C specifications of sectioning content and sectioning root.
- “HTML5 Sectioning Elements, Headings, and Document Outlines” and “HTML5 Document Outline Revisited” Roger Johansson’s articles on the issue of sectioning content creating untitled documents.