The doctype tells a browser how to interpret the HTML content of a page. Over the years, there have been many different doctypes, and they are usually long and cumbersome.
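For illustration, here are two of the older doctypes. The comments only label the two alternatives; a real document would contain exactly one doctype, with nothing before it.

```html
<!-- HTML 4.01 Strict -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">

<!-- XHTML 1.0 Transitional -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
```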
Here are some points to remember when dealing with doctypes:
- The doctype must be the first thing that a browser reads; this means that it should usually be present on line 1 of an HTML document
  - If this is not the case, older versions of IE will go into quirks mode, which can cause many strange and difficult to troubleshoot issues
  - If a server-side language is rendering the page, code may show up above the doctype (as with a PHP page that does a server-side include at the top of the page), as long as this code does not render anything in the final HTML output
- The doctype must be valid and well formed
It is becoming standard practice to use the HTML5 doctype, even if other HTML5 features aren't being used. It is simply <!DOCTYPE html>.
This doctype is easy to remember and is recognized by all modern browsers (and even by Internet Explorer 6). It is flexible in the sense that both HTML and XHTML syntax are valid under this doctype, and it is also necessary to use this doctype for HTML5 features to work correctly.
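A minimal sketch of a page using the HTML5 doctype might look like the following. The title and body content are placeholders; the point is that the doctype sits on line 1 with nothing before it.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8" />
  <title>Page title</title>
</head>
<body>
  <p>Content goes here.</p>
</body>
</html>
```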
HTML5 is the latest version of HTML and XHTML. The HTML5 draft specification defines a single language that can be written in either HTML or XML syntax. It attempts to solve issues found in previous iterations of HTML and addresses the needs of web applications, an area previously not adequately covered by HTML.
Making HTML5 Work
The new semantic elements available in HTML5 are helpful—they make markup more readable, both by humans and machines—but there's one major issue to fix before you can use them: Internet Explorer (below version 9) won't display unrecognized elements correctly.
Mark Pilgrim's Dive Into HTML5 goes into some detail about all of this.
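One common workaround, sketched here under the assumption that only a handful of the new elements are in use, is to create each element once with JavaScript inside an IE conditional comment and to declare the elements block-level in CSS. The list of element names is illustrative; libraries such as html5shiv package the same idea.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8" />
  <title>HTML5 elements in older IE</title>
  <!--[if lt IE 9]>
  <script>
    // Creating each element once lets old IE parse and style it.
    var names = ['article', 'aside', 'footer', 'header', 'nav', 'section'];
    for (var i = 0; i < names.length; i++) {
      document.createElement(names[i]);
    }
  </script>
  <![endif]-->
  <style>
    /* Unrecognized elements default to inline, so declare them block-level. */
    article, aside, footer, header, nav, section { display: block; }
  </style>
</head>
<body>
  <header>
    <h1>Site name</h1>
  </header>
  <article>
    <p>Article content.</p>
  </article>
</body>
</html>
```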
HTML5 and Friends
When people talk about HTML5, they are usually referring to the 75 or so related features and specifications that make up the "modern web" movement. A comprehensive list of these technologies (along with browser support and usage recommendations) can be found at html5please.com.
While the adoption of new technologies is encouraged, the recommendation is to do what makes sense for your project. If your team wants to explore the possibility of leveraging the <canvas> element (for example) to do some rich animations and interactions, that's awesome. Just make sure of the following:
Don't agree to something that is totally beyond your skill set; creating a quick proof of concept to ensure that the solution is feasible can help keep you from getting in over your head
- Flexibility of the experience
Most clients require us to support browsers that don't yet handle HTML5 technologies; make sure that everyone knows that there will be a baseline experience for these browsers, with additional functionality for those browsers that can handle it (this is called progressive enhancement)
Having different experiences for different browsers requires increased collaboration between the development team and the design team; make sure that the creative and project leadership are aware of this fact
- Budget and Deadlines
Increased technical complexity, multiple versions of one experience, and increased back-and-forth with designers all take time; ensure that all of this is baked into your estimates regarding level of effort and confidence in meeting deadlines
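As a rough sketch of progressive enhancement with <canvas>, the markup below provides a baseline image for browsers without canvas support and only draws when the drawing API is detected. The id, file name, and drawing code are hypothetical placeholders.

```html
<canvas id="sales-chart" width="400" height="200">
  <!-- Baseline experience: rendered only when <canvas> is unsupported. -->
  <img src="sales-chart.png" alt="Bar chart of quarterly sales" width="400" height="200" />
</canvas>
<script>
  var canvas = document.getElementById('sales-chart');
  // Feature detection: enhance only when the 2D drawing API exists.
  if (canvas.getContext) {
    var ctx = canvas.getContext('2d');
    ctx.fillStyle = '#336699';
    ctx.fillRect(20, 80, 60, 100); // one illustrative bar
  }
</script>
```

Both experiences need to be designed, built, and tested, which is where the extra collaboration and budget mentioned above come in.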
Generally speaking, it is a good idea for your markup to conform to the XHTML syntax rules. This is obviously necessary when using an XHTML doctype (syntax errors will cause your page to fail validation, and can cause rendering issues). It is wise to use the XHTML syntax even when using the HTML5 doctype mentioned above, for a number of reasons. Markup conforming to XHTML syntax rules is:
- flexible and stable: HTML5 allows XHTML syntax, but XHTML does not allow HTML syntax; this means that the doctype can be changed between the two types without worry of causing syntax-related issues
- familiar: both to interactive developers and to back-end developers who are used to working with XML
So what are these rules?
- Make all your tags lower case
- Close all your tags, even empty ones (<hr /> instead of <hr>)
- Make all attribute names lower case and quote all attribute values; for example, <td colspan="2">
- Give empty attributes a value, such as <input type="checkbox" checked="checked" /> instead of <INPUT TYPE=checkbox CHECKED>
- Nest all your tags correctly
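Taken together, the rules above turn markup like the first snippet into the second. The table and checkbox here are made-up examples.

```html
<!-- Avoid: upper-case tags, unquoted values, unclosed tags, minimized attributes -->
<TABLE>
  <TR><TD COLSPAN=2><INPUT TYPE=checkbox CHECKED> Remember me<BR></TD></TR>
</TABLE>

<!-- Prefer: lower case, quoted values, every tag closed, attributes given values -->
<table>
  <tr>
    <td colspan="2">
      <input type="checkbox" checked="checked" /> Remember me<br />
    </td>
  </tr>
</table>
```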
Validating your HTML markup can be one of the easiest ways of finding otherwise difficult-to-explain issues. Validation can identify errors that, if left uncorrected, can hurt page performance, accessibility, and search engine optimization. The W3C Markup Validation Service is the standard tool for this.
Avoid Deprecated Elements
This is related to the note on validation, above, as including deprecated elements in your markup can potentially cause validation errors.
Many elements are no longer part of the HTML specification and should be avoided (this text is taken directly from the specification).
The following elements are not in HTML5 because their effect is purely presentational and their function is better handled by CSS: <basefont>, <big>, <center>, <font>, <strike>, and <tt>.
The following elements are not in HTML5 because using them damages usability and accessibility: <frame>, <frameset>, and <noframes>.
The following elements are not included because they have not been used often, created confusion, or their function can be handled by other elements:
- <acronym> is not included because it has created a lot of confusion. Authors are to use <abbr> for abbreviations
- <applet> has been made obsolete in favor of <object>
- <isindex> usage can be replaced by usage of form controls
- <dir> has been made obsolete in favor of <ul>
Finally, the <noscript> element is only conforming in the HTML syntax. It is not included in the XML syntax as its usage relies on an HTML parser (so don't include it if you're using an XHTML doctype).
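As a small illustration, presentational elements can usually be swapped for semantic markup plus a CSS rule. The class name and text below are placeholders.

```html
<!-- Avoid: presentational elements that are not part of HTML5 -->
<center><font color="red"><big>Sale ends Friday</big></font></center>
<acronym title="World Wide Web Consortium">W3C</acronym>

<!-- Prefer: semantic markup, with the presentation moved to CSS -->
<style>
  .notice { text-align: center; color: red; font-size: larger; }
</style>
<p class="notice">Sale ends Friday</p>
<abbr title="World Wide Web Consortium">W3C</abbr>
```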
It is important that your HTML document be well organized and contain elements that follow a formal structure. This makes it readable both to humans who aren't necessarily seeing the visual experience of the page (such as a blind user who is using a screen reader), and to the machines that read every page on the internet (like the bots employed by Google to determine search results).
Header Tags in HTML5 and XHTML
Regardless of the doctype being used, one of the best ways to make sure that your document has a correct outline is to properly use header tags.
The <h1> tag is the most important of the semantic tags, and its contents are weighted heavily by search bots when determining page relevance.
In XHTML, there should be only one <h1> tag on the page (often the company name/logo, or the title of an article for blogs). Subsections are each headed by <h2> elements and may be subdivided further by using headings of the subsequent level (all the way to <h6>). Heading levels should not be skipped.
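A heading structure that follows these rules might look like this sketch (the headings themselves are invented). There is a single <h1>, and no level is skipped on the way down.

```html
<h1>Acme Widgets</h1>
<h2>Products</h2>
<h3>Indoor widgets</h3>
<h3>Outdoor widgets</h3>
<h2>Support</h2>
<h3>Contact us</h3>
```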
When using an HTML5 document structure, you can check the outline of your document by using something like this HTML5 outliner bookmarklet. To use it, simply download the page and open it, then bookmark the link shown. When you click the bookmark, the outline of the page you are visiting will be shown.
HTML5 introduces the concept of "sections" in documents, and each section can have an <h1> element, followed by the rest of the headers. For a fantastic explanation of semantics in HTML5, and the header elements in particular, see this article by Mark Pilgrim over at diveintohtml5.info.
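The section-based structure described above could be sketched as follows, with made-up headings. How browsers, search bots, and assistive technology interpret nested <h1> elements is an assumption worth checking with an outliner tool before relying on it.

```html
<body>
  <h1>Acme Widgets Blog</h1>
  <article>
    <h1>Choosing a widget</h1>
    <p>Introductory text.</p>
    <section>
      <h1>Indoor models</h1>
      <p>Details about indoor models.</p>
    </section>
  </article>
  <article>
    <h1>Widget maintenance</h1>
    <p>Maintenance tips.</p>
  </article>
</body>
```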
Use Tables for Tabular Data (Only)
For years, HTML tables had been used to visually lay out elements on a page. This was necessary at first, as there was no other way of creating many common page layouts. The practice continued even after CSS made table-less layout possible because developers were used to tables, and it can be difficult to get the hang of using CSS for complex layouts.
Using modern techniques, there are no layouts that can't be built by using CSS. Unless you need to display tabular data in rows and columns, tables should not be used. This is for a number of reasons:
- Tables are not good for accessibility; in order to describe a set of tabular data, screen readers have to wade through numerous bodies, rows, headers, and cells; imagine listening to a screen reader saying "table cell" over and over again when you're just trying to read the content of a page
- Tables negatively impact page render time beyond simply adding markup to be parsed; because table width is dictated by a sum of column widths, and column widths can be influenced by the width of all contained elements, tables must go through multiple render cycles before the browser can move on to the rest of the document
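When the content really is tabular, a table is the right tool. The sketch below uses invented sales figures and shows a caption plus header cells with scope attributes, which help screen readers associate each value with its row and column.

```html
<table>
  <caption>Widget sales by region</caption>
  <thead>
    <tr>
      <th scope="col">Region</th>
      <th scope="col">Q1</th>
      <th scope="col">Q2</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th scope="row">North</th>
      <td>120</td>
      <td>135</td>
    </tr>
    <tr>
      <th scope="row">South</th>
      <td>98</td>
      <td>110</td>
    </tr>
  </tbody>
</table>
```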
Avoid Unnecessary Elements
<div> tags are useful block-level elements for organizing sections of code (especially when something like an <article> wouldn't really make sense), but too often developers moving away from table-based layout think that divs do something more than that. They don't.
If your markup has divs within divs within divs, take a step back and ask yourself what you're really trying to accomplish with these elements. There may be a more semantically appropriate way to meet your needs.
Back in 2009, Smashing Magazine (which is a great resource for all aspects of web design and development) posted an article that dives deeply into the topic of "divitis".
More generally speaking, keeping the number of elements on a page to a minimum is a good practice for optimizing the page's performance. Read about this in Yahoo!'s developer documentation.
Unnecessary Classes and IDs
It is sometimes tempting to add class attributes to any element that needs to be styled. When taken to the extreme, each element ends up with its own class. Not only does this make the source of the page difficult to read, but all of those extra characters need to be downloaded by the browser, resulting in slower page loads.
As a best practice, wait to assign a class or id to an element until you know that you need it. This isn't to say not to use them at all, of course. Appropriate usage of classes is fundamental to writing modular CSS, and the CSS parser has an easier time applying rules to class-based ancestral selectors than to generic tag names. So don't be stingy with classes when it makes sense to use them.
A case can be made for avoiding the use of id attributes altogether, since they are difficult to override in the cascade. See writing good selectors for more information.
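To make the trade-off concrete, here is a sketch of the same navigation list written both ways. The class names and URLs are hypothetical.

```html
<!-- Avoid: a class on every element -->
<ul class="nav-list">
  <li class="nav-item"><a class="nav-link" href="/">Home</a></li>
  <li class="nav-item"><a class="nav-link" href="/about">About</a></li>
</ul>

<!-- Prefer: one class on the module root, with descendants styled through it -->
<style>
  .nav li { display: inline; }
  .nav a { padding: 0 10px; }
</style>
<ul class="nav">
  <li><a href="/">Home</a></li>
  <li><a href="/about">About</a></li>
</ul>
```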
When using image tags, always specify alt, height, and width attributes. alt is required for accessibility, and specifying height and width attributes improves page performance.
The reason for the latter is that a browser, when it encounters an <img> tag, will initiate the download of the image, and then move on to the next element in the HTML page. If the height and width attributes are specified in the HTML, the browser knows what size placeholder to put in for the image as it is downloading and doesn't have to look back.
If height and width are not specified in the markup, the browser has to wait until the image finishes downloading, then go back to check its dimensions, and adjust the layout of the entire page accordingly (this causes what are known as reflow and repaint cycles, which are costly in terms of page performance).
Lastly, never use spacer images. If you know what spacer images are, then you should know enough by now not to use them.
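A fully specified image tag might look like this sketch. The file name, description, and dimensions are placeholders; the dimensions should match the size at which the image is displayed.

```html
<img src="team-photo.jpg"
     alt="The development team gathered around a whiteboard"
     width="640"
     height="360" />
```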
All markup should be delivered as UTF-8, as it is the most friendly for internationalization. The encoding should be designated in both the HTTP header and the head of the document.
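In the document head, the declaration could be sketched as below. The first form is the one used with the HTML5 doctype; the second is the older equivalent for HTML 4 and XHTML doctypes, so a real page would use only one of them. On the server side, the matching HTTP response header is Content-Type: text/html; charset=utf-8 (how it is configured depends on your server, which is not covered here).

```html
<head>
  <!-- HTML5 doctype -->
  <meta charset="utf-8" />

  <!-- HTML 4 / XHTML doctypes (use instead of the line above) -->
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

  <title>Page title</title>
</head>
```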
- Make use of <dl> (description list) and <blockquote> elements, when appropriate
- Items in list form should always be contained within a list element (<ul>, <ol>, or <dl>), never a set of generic elements such as <div> or <p>
- Use <label> elements to label each form field. The for attribute of the <label> should match the ID of the associated input field, so that users can click on labels to bring focus to text inputs, toggle checkboxes, etc. You should also set the cursor CSS property of the label to "pointer", to give a visual cue that this interaction is possible
- Do not use the size attribute on input fields. The size attribute is relative to the font-size of the text inside the input; instead, use the CSS width property. It is often best to use CSS to size <textarea> elements as well, since it can be difficult to match the dimensions of the design while using the rows and cols attributes
- Place an HTML comment on some closing <div> tags to indicate which element you're closing. This helps with readability when there are a lot of nested elements
- Use microformats where appropriate
- Make use of <th> elements (and the summary attribute) in tables when appropriate
- Always use title case for headers and titles. Do not use all-caps or all-lowercase titles in markup; instead, apply the CSS property text-transform
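Several of these points can be seen together in a small form. This is only a sketch: the ids, field names, form action, and pixel sizes are made up, and text-transform is assumed to be the CSS property meant for changing the case of titles.

```html
<style>
  label { cursor: pointer; }                   /* visual cue that labels are clickable */
  #email { width: 240px; }                     /* CSS width instead of the size attribute */
  #comments { width: 240px; height: 120px; }   /* CSS sizing for the textarea */
  #contact-form h2 { text-transform: uppercase; } /* title case in markup, caps via CSS */
</style>

<div id="contact-form">
  <h2>Contact Us</h2>
  <form action="/contact" method="post">
    <div>
      <label for="email">Email address</label>
      <input type="text" id="email" name="email" />
    </div>
    <div>
      <input type="checkbox" id="subscribe" name="subscribe" />
      <label for="subscribe">Send me updates</label>
    </div>
    <div>
      <label for="comments">Comments</label>
      <textarea id="comments" name="comments" rows="5" cols="30"></textarea>
    </div>
  </form>
</div><!-- /#contact-form -->
```

Clicking "Email address" focuses the text input and clicking "Send me updates" toggles the checkbox, because each for attribute matches the id of its field.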