Quick XHTML
So You Wanna Be a Boxer?
You want to use XHTML. That, my friend, isn’t as easy as the W3C, the Web 2.0 crowd, and every second-rate web-whacker this side of Carthage want you to believe. There are several drawbacks, and few advantages. Let’s get to it.
Question: “Why do I want to use XHTML?” There are no stock answers, but here are a few we have run across before:
- “It is stricter than HTML” – This is a common misunderstanding. Because XHTML is an XML-based language, as opposed to HTML which is an application of SGML, different rules apply; for the most part these are no different from an author’s point of view. (footnote 1)
- “It is more semantic than HTML” – Also a misunderstanding. No elements were added between HTML 4.01 and XHTML 1.1 (footnote 2), nor will there ever be. Only by adding parts of other markup languages through the namespaces mechanism can XHTML 1.0 or 1.1 have extended semantics – but there is a but.
- “With XHTML I can extend the language through the use of namespaces” – You could, in theory, do that. However, it requires that the XHTML document is not only parsed by the browser as if it was XML, but that the browser understands namespaces and is able to create a coherent whole out of the various languages involved. Semantically, none can do that today. (footnote 3)
- “XHTML is compatible with HTML” – This may be the greatest misunderstanding of all – no, it isn’t. Only by telling browsers that the page is actually HTML will a majority of today’s browsers handle the content, and then only because they have gotten very good at handling incorrectly written markup.
- “I use it to be forwards compatible.” – The likelihood of any current, or future, web browsers actually giving up support for HTML is rather on the low side. Do you really believe that, tomorrow, a user-agent won’t handle today’s HTML?
Remember, regardless of your arguments, one thing remains fact: a majority of web browsers do not, nor will they for the next several years, support XHTML, while it is highly unlikely that any of them will ever cease supporting HTML.
I don’t remember names …
Here is the list of things you must remember:
- Always, without fail, write element- and attribute names in lower case. There is no onLoad, only onload; no P, only p. This way, or the highway.
- Always, without fail, close all elements. Those that do not have ending tags should be written as follows and in no other way:
  <area … />
  <base … />
  <br … />
  <hr … />
  <img … />
  <input … />
  <link … />
  <meta … />
  <param … />
- Configure your text editor to save using UTF-8 encoding. Configure your web server to send charset=utf-8 when it serves pages. Use UTF-8 throughout the process; it will cut down on your problems now, and later.
- Don’t include the XML declaration – that is, make certain your document does not start with <?xml version="1.0" encoding="utf-8"?>.
- Use both the lang and xml:lang attributes to specify the natural language used in the document – for instance: <html lang="en-GB" xml:lang="en-GB">.
- Move your scripts and your styles out into external files, and reference them using <link … href="…" … /> and <script … src="…" …></script>. Don’t use historical “comment” hacks to hide them, as this can have peculiar effects indeed in XHTML.
- Always use a DOCTYPE. Always. Select the one which best fits the markup language you have written. Don’t select it based on whether or not it will change a particular browser’s rendering mode. You have, in effect, two choices: XHTML 1.0 Strict or XHTML 1.1. (footnote 4)
- Configure your web server to send XHTML documents with the content-type text/html. This way the browser will always parse the content using the legacy HTML parser. Sounds illogical? Yes, and it is. However, if you want your XML to be parsed as XML and set the application/xhtml+xml content-type, a majority of browsers – including Internet Explorer up to and through version 7 – will prompt a download, as they don’t understand the type. (footnote 5)
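To make the checklist concrete, here is a minimal sketch in Python. It holds a small page written to those rules and runs it through a non-validating XML parser, then looks for element or attribute names that are not lower case. The page content, the file names style.css and behaviour.js, and the helper name uppercase_names are all made up for illustration; the xmlns attribute is not in the list above but is required on the html element by XHTML 1.0. The parser checks well-formedness only, not validity against the DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical page following the checklist: no XML declaration, a DOCTYPE,
# lower-case names, every element closed, lang and xml:lang both set, and
# styles and scripts kept in external files (the file names are made up).
PAGE = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-GB" xml:lang="en-GB">
<head>
<title>Quick XHTML</title>
<link rel="stylesheet" type="text/css" href="style.css" />
<script type="text/javascript" src="behaviour.js"></script>
</head>
<body>
<p>First line<br />second line</p>
</body>
</html>
"""


def uppercase_names(source):
    """Return element and attribute names that are not entirely lower case.

    Raises xml.etree.ElementTree.ParseError if the source is not well-formed,
    for instance when an element is left unclosed.
    """
    root = ET.fromstring(source)
    offenders = []
    for element in root.iter():
        for name in [element.tag] + list(element.attrib):
            local = name.rsplit("}", 1)[-1]  # drop any "{namespace}" prefix
            if local != local.lower():
                offenders.append(local)
    return offenders


print(uppercase_names(PAGE))
```

Feeding the same function markup such as <body onLoad="…"> or <P> should report the offending names, and markup with an unclosed <br> should raise a parse error instead.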
At this point in the narrative you might have begun wondering what the point is of using XHTML if, in order for it to be understood at all, you must pretend that it is oddly coded HTML, thereby losing out on any benefits XML might give you. This is an excellent question, and with it in mind we suggest you go back and, yet again, ask yourself: why XHTML?
I’m feeling Fine
- Study the HTTP Accept header, and study it well. Not only will it, under ideal circumstances, tell you which markup languages the user-agent supports, but also to which degree it prefers one over the other. (footnote 6)
- Decide how you want to store your information. There are several options, of which some even make sense:
  - As XHTML 1.0 or 1.1. If a browser requires HTML, you simply transform the structure, using either XSL or a similar method. No information – save all the Ruby markup – is lost.
  - As XHTML 2.0. The same applies, but here you’ll lose information, since several constructs in the 2.0 version cannot be transformed to equivalent structures under either 1.0, 1.1, or HTML of any version.
  - As a more expressive, public or private, XML- or SGML-based language. The same effect – lost information – as for XHTML 2.0 applies.
  - In a database. From there you can produce the markup language desired. Given, of course, that it can express the structures you need.
- And, finally, make sure you have a little caching turned on. After all, you are now dynamically changing the structure of documents. The question, of course, remains: why are you spending all these resources – just to support XHTML?
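What studying the Accept header amounts to can be sketched roughly as follows, in Python. This is an illustration under stated assumptions, not a complete negotiation algorithm: the function names are made up, media-range parameters other than q are ignored, and the policy of serving application/xhtml+xml only when the user-agent lists it explicitly (rather than through a wildcard such as */*) is this sketch’s own choice.

```python
def parse_accept(header):
    """Parse an Accept header into a list of (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # RFC 2616: a missing q parameter means q=1
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # assumption: an unparsable q counts as "not acceptable"
        ranges.append((media_range, q))
    return ranges


def quality(media_type, ranges):
    """Return the q value of the most specific range matching media_type."""
    main_type = media_type.split("/")[0]
    best_specificity, best_q = -1, 0.0
    for media_range, q in ranges:
        if media_range == media_type:
            specificity = 2
        elif media_range == main_type + "/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def choose_content_type(accept_header):
    """Pick application/xhtml+xml only when it is listed and preferred."""
    ranges = parse_accept(accept_header or "")
    xhtml_q = quality("application/xhtml+xml", ranges)
    html_q = quality("text/html", ranges)
    listed = any(r == "application/xhtml+xml" for r, _ in ranges)
    if listed and xhtml_q > 0 and xhtml_q >= html_q:
        return "application/xhtml+xml"
    return "text/html"


print(choose_content_type("*/*"))
print(choose_content_type("application/xhtml+xml;q=0.5, text/html"))
```

The point of comparing q values rather than merely searching the header for the string application/xhtml+xml is that a user-agent may list the type while still preferring text/html. If the response varies with the Accept header, caches need to be told so, which in HTTP/1.1 is the job of the Vary response header.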
Yes. You could just change the content-type to text/html and leave the error handling to the browser. With such a suggested “solution” in mind, please re-read the argument above regarding how XHTML is stricter than HTML …
Tomorrow
“What about XHTML 2?” – I can hear you ask. At the time of writing – June 2006 – this new markup language is a proposal, and nothing more. In time it might become a standard, and with even more time current browsers might support it.
Being a draft, the new language does not yet have a media type associated with it, but we can assume two things:
- It’ll be application/xml
- text/html will still work.
It has already been announced that Microsoft shall not support XHTML 1.0 or 1.1 in their upcoming Internet Explorer 7. They do, however, already support XML parsing, which means that sticking with a generic application/xml content-type will make everything groovy.
Not quite. IE, like Opera, Firefox, Mozilla, Safari, and any other XML-enabled user-agent, is entirely able to read, parse, and even apply CSS to a generic XML-based language, which is what XHTML sent as application/xml would be. None of them will, today, understand any of it. For that to happen, they need to be taught the language. We leave – as yet another exercise – the analysis of when, given the current browser population and development cycles, it may be feasible to deploy XHTML 2 in a production environment.
Despite claims such as those made by the W3C in http://www.w3.org/MarkUp/2004/xhtml-faq (“Much of XHTML 2 works already in existing browsers, …”), it is a trivial task to find a user-agent today in which their linked example does not work. Hint: try finding the document through Google.
As an entirely hypothetical question, we wonder: what will happen to an XHTML 2 document served as text/html? It surely can’t be perceived as worse tag soup than what real-life user-agents have to deal with on an everyday basis. The question, perhaps, is whether their error correction algorithms are good enough to, for instance, apply styles to elements they don’t otherwise know anything about. Idle minds.
Notes
(footnote 1) This is something of a myth, which springs from the fact that the XML specification clearly states that, for all fatal errors, processing should stop. Leaving aside the complications that relate to the accessibility of such a philosophy, it is worth noting that only fatal errors should be treated in this manner. Hence,
<html><head></head> <body><html /></body></html>
would lead to an error which doesn’t lead to a stop (it violates a validity constraint in not having a title element), while
<html><head><title>foo</head> <body><html /></body></html>
will be a fatal error for violating the well-formedness constraint. And then we’ll stop.
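The difference between the two documents in this note can be seen with any non-validating XML parser. The following is a small sketch using Python’s standard xml.etree.ElementTree module; the helper name is_well_formed is made up for illustration. Such a parser does not check validity at all, so the missing title goes unnoticed, while the unclosed title is fatal.

```python
import xml.etree.ElementTree as ET

# The two documents from the note: the first is well-formed but invalid
# (no title element), the second is not well-formed (title is never closed).
INVALID_BUT_WELL_FORMED = "<html><head></head> <body><html /></body></html>"
NOT_WELL_FORMED = "<html><head><title>foo</head> <body><html /></body></html>"


def is_well_formed(source):
    """Return True if a non-validating XML parser accepts the source."""
    try:
        ET.fromstring(source)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(INVALID_BUT_WELL_FORMED))
print(is_well_formed(NOT_WELL_FORMED))
```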
(footnote 2)
Well caught indeed – yes, the Ruby elements were added
to XHTML 1.1. The support is, to put
it mildly, limited.
(footnote 3) You are right – if the browser in question understands XHTML (Firefox and Opera do), understands namespaces (again, they do), and via them pulls in other XML-based languages it also understands – MathML, SVG, etc. – then it can create a coherent understanding of even semantics. Congratulations, you have stumbled across the one reason for employing XHTML. It still won’t work outside a minority of browsers, but it is a reason.
(footnote 4) Yes, there exist both frameset and transitional DTDs for XHTML 1.0, but we have purposefully ignored them. This is also why <basefont … /> and <frame … /> are not mentioned above.
(footnote 5) Yes, even with XHTML 1.1. It is a common myth, but still a myth, that the 1.1 version cannot be sent as text/html. You would be well within the standards to do so, though it is recommended against. It is also … pointless.
(footnote 6) This, however, seems terribly difficult. The reader is encouraged to take a look at the article The Road to XHTML 2.0: MIME Types for an illustration of how not to go about this task. The important thing to notice is how the author suggests handling the q-parameters in the Accept header. For an illustration of the exact opposite, see the HTTP::Negotiate Perl library by Gisle Aas.
References
- Bray, Tim et al. Extensible Markup Language (XML) 1.0. W3C. February 2004.
- Raggett, Dave. HTML 4.01 Specification. W3C. December 1999.
- Pemberton, Steven et al. HTML and XHTML Frequently Answered Questions. W3C. July 2004.
- Fielding, R. et al. Hypertext Transfer Protocol -- HTTP/1.1. RFC 2616. June 1999.
- Sawicki, Marcin et al. Ruby Annotation. W3C. May 2001.
- Hickson, Ian. Sending XHTML as text/html Considered Harmful. Online. September 2002.
- ISHIKAWA, Masayasu et al. XHTML Media Types. W3C. August 2002.
- Pemberton, Steven et al. XHTML 1.0 The Extensible HyperText Markup Language (Second Edition). W3C. January 2000, August 2002.
- Altheim, Murray et al. XHTML 1.1 – Module-based XHTML. W3C. May 2001.
- Axelsson, Jonny et al. XHTML 2.0. W3C. May 2005.
- Clark, James. XSL Transformations (XSLT). W3C. November 1999.
Examples
XHTML 1.0 Strict
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
XHTML 1.1
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
Acknowledgements
- David Dorward, for grammar and catching my most horrid mistakes …
- Jörgen Andreasen, for explanations of where I have assumed prerequisites the target audience cannot be expected to have …
- Bugsy Malone for the headings.