The Coming HTML 5 Train Wreck (Revisited)

Kurt Cagle

On the older site, this particular article was one of the most popular - or at least the most controversial. As part of my restoration, I'm reposting it, but I also wanted to add a couple of additional thoughts I've had about the issue since I first wrote it.


Every so often, developments in the technology world just baffle me - standards that I would think would be widely adopted stay on the very fringes, while technologies that seem half-baked, poorly designed, and completely counterproductive manage to take off like rockets. A few of them fall into the category of "oh, this just can't end well." I'd like to formally nominate HTML 5 for that particular category.

Now, I'm all for the idea of revisiting HTML. It's been more than a decade since the last time this particular Pandora's box was opened, and the state of the art has changed pretty dramatically, with the emergence of AJAX, a rather dramatic shift towards RESTful architectures and the quiet rise of XML as an intrinsic part of the server-side environment, if not necessarily the client-side one. Moving HTML so that it is at least well-formed XML would be a huge step forward, one that more and more vendors are now taking anyway because of their recognition that the various XML tools provide a remarkably powerful piece of any workflow solution. So opening up the HTML box makes sense.

Unfortunately, somewhere along the line, bad things happened. One of the first was that the HTML 5 process ended up being driven largely by a small cabal of developers who all seem to have been struck from the same anti-XML mold. Far from requiring that HTML 5 be at least well-formed with appropriate closures, they have done their damnedest to throw out all vestiges of XML and come up with HTML content that teeters precariously even on the edge of compliance with SGML. For those of you who were entranced by the potential power of RDFa to marry the semantic web with the "regular web", HTML 5 thumbs its nose at you - the technology is considered too advanced for all of those granny coders out there, and CURIEs, which use a namespace-like prefix notation to build a better microformat, are tainted by the presence of the too-obscene-to-be-mentioned colon character ":".
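For readers who haven't run into CURIEs, the prefix notation mentioned above can be sketched in a few lines of Python. This is only an illustration: the prefix table and the function name `expand_curie` are my own assumptions, not anything taken from a spec or library, and real RDFa processing has more rules than this shows.

```python
# Hypothetical prefix table: each short prefix stands for a long base IRI.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand_curie(curie: str, prefixes: dict) -> str:
    """Expand a compact 'prefix:reference' string into a full IRI.

    Illustrative helper only; it ignores default prefixes and
    bracketed "safe" CURIEs.
    """
    prefix, colon, reference = curie.partition(":")
    if not colon:
        raise ValueError(f"not a CURIE (no colon): {curie!r}")
    if prefix not in prefixes:
        raise KeyError(f"unknown prefix: {prefix!r}")
    # The full IRI is simply the base IRI with the reference appended.
    return prefixes[prefix] + reference


if __name__ == "__main__":
    print(expand_curie("dc:title", PREFIXES))
    print(expand_curie("foaf:name", PREFIXES))
```

The colon is the whole mechanism: everything before it selects a vocabulary, everything after it names a term within that vocabulary.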

At Balisage a few months ago, this particular facet was brought up in a heated discussion with some of the biggest names in XML - from Uche Ogbuji and Micah Dubinko to Liam Quin - trying to find some kind of namespace harmonization mechanism that could be used within HTML 5. The solution they came up with was nicely elegant, and could serve as a model for improving namespaces in the future in the XML world. However, by all indications, it's fallen on completely deaf ears in the HTML 5 community, as has a related extensibility proposal by Microsoft (a good indication of just how mature these deliberations are can be found in these IRC threads from a recent WHATWG meeting). The running consensus from this cabal seems to be that since they personally don't have any use for extensibility beyond their pristine garden language, no one else does either. For a supposedly international standard, this naivety is not just discouraging, it's frightening.

Similar problems can be seen with the widespread and extensive use of binary attributes (what the HTML specs call boolean attributes) in the HTML 5 spec. Binary attributes are attributes that don't take a specific value, meaning that they are simply tokens in an HTML tag. Such attributes currently in use in HTML 4 include the @selected attribute for option elements and the @checked attribute for input controls. These attributes are not only a massive source of confusion for HTML developers but are also (deliberately?) not XML conformant - they require special HTML processors to handle them. This makes them a pain to store in XML databases (you have to store them as text, losing a lot of the indexing goodness that comes with having XPath-compliant XML), a pain to parse, and a pain to transform. Getting rid of them is on the wishlist that almost everyone has for an improved HTML spec, except, apparently, for the WHATWG.
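To make the complaint concrete, here is a small sketch using only Python's standard library. It feeds the same option element, once with a bare @selected token and once with the attribute spelled out as selected="selected", to an XML parser and to an HTML parser. The sample markup and the helper names (`is_well_formed_xml`, `html_attrs`) are assumptions made up for this illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Same element twice: a bare token vs. an explicit name="value" pair.
MINIMIZED = "<option selected>Red</option>"
EXPANDED = '<option selected="selected">Red</option>'


def is_well_formed_xml(markup: str) -> bool:
    """Return True if a generic XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class _AttrCollector(HTMLParser):
    """Collects the (name, value) attribute pairs of every start tag."""

    def __init__(self):
        super().__init__()
        self.attrs = []

    def handle_starttag(self, tag, attrs):
        self.attrs.extend(attrs)


def html_attrs(markup: str) -> list:
    """Return the attributes an HTML-aware parser sees in the markup."""
    parser = _AttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.attrs


if __name__ == "__main__":
    for markup in (MINIMIZED, EXPANDED):
        print(markup, is_well_formed_xml(markup), html_attrs(markup))
```

The XML parser rejects the bare-token form outright, while the HTML parser accepts it and reports the attribute with no value (`None` in Python's case). Only the spelled-out form survives both, which is why the bare form cannot go through XPath, XSLT or an XML database without special handling.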

Oh, there is a tag-along XHTML spec as part of the document, though its development has been stunted from birth. There's almost no real documentation on the XHTML side, not even answers resolving such issues as what to do with these aforementioned attributes, and I fully expect, given the very autocratic manner in which this whole spec has moved through the development process, that it will be quietly jettisoned between candidate recommendation and formal recommendation status. Earlier this year, the XML side of the W3C, in a bid to promote harmonization around an HTML standard, quietly shut down their XHTML 2.0 development efforts and agreed to play in the HTML sandbox. The problem, of course, is that these experts, many of whom were around to create the original HTML standard, are being treated like five-year-olds by eight-year-old bigger brothers, their own efforts ignored because the "big boys" don't really want to play with them.

In principle I applaud the idea of saying we need to have a better containment model for documents, but after reading the spec I'm rather at a loss to tell what many of these structures actually look like, or how they interact. I'm trying to figure out the difference between a element and a regular , particularly problematic since it's often hard to determine what constitutes a navigation space in most contemporary web portals. A better capability sounds intriguing, but even here the ambiguity of the spec made me scratch my head repeatedly trying to figure out what exactly it was supposed to do. and support are welcome additions, and should go a long way towards creating better multimedia support in web pages, but even here the spec spends a great deal of time discussing the element without telling me exactly WHAT this element is supposed to be doing.

Ambiguity is unfortunately one of the consequences of standards development - the larger the spec and more complex the task at hand, the more potential points need to be nailed down, so a certain level of ambiguity in a working draft simply indicates that it is, well, a working draft. However, HTML 5 is seemingly being pushed inexorably towards recommendation status with most of these ambiguities still intact, with almost no community feedback (beyond the half-dozen or so committee implementers) and with remarkably little rigor being applied to what is, without a doubt, one of the most important potential specifications to emerge from the W3C in a long time.

My prediction? We're looking at a train wreck about to happen. Vendors will implement those parts of the HTML 5 spec that happen to best fulfill their own particular objectives, and will be sloppy about implementing anything else - sloppy specs produce sloppy conformance. We'll be back to the days of the late HTML 3 spec, where web designers despaired of having their web pages act even remotely consistently between browsers, where coders will continue to learn bad habits that not only create more headaches for other coders but also contribute to the overall cost of products, and where web browsers on the desktop will be increasingly out of step with their XML-compliant counterparts in mobile devices. Already, I'm hearing from people who should know better that HTML 5 shouldn't be seen as a complete spec, but as a grab bag of features that can be implemented or not as budget and desire allow.

My hope? At some point in the very near future, older and wiser heads will prevail, will take the kids out of the sandbox and take them inside for a bath, then spend some time cleaning up the mess. There are some good ideas buried in HTML 5, but shoveling the muck to get to them is probably not worth the effort.


This article, when it was originally published, drew a lot of commentary and not a little bit of ire from developers, standards gurus and others, enough so that I had to think long and hard about my own position.

I was a little disturbed by the infantilism of some of the responses - a great deal of hay was made around the fact that the last version of this site didn't properly validate, primarily due to a bug which I went to some lengths to finally track down. To call me out on this was legitimate; to turn it into ad hominem attacks was not, and only seems to confirm at least a few of the assertions I made above.

However, despite a few bad eggs, there were some people who also made some very good observations. The first is that in terms of specificity, HTML 5 is far better specified than HTML 4. I looked - it was. The interfaces are tight and well designed, and a lot of thought has been given to some of the more complex issues involved, especially with regard to video and audio, which, from personal experience, can be maddeningly difficult to get right.

Not meaning to be self-serving, but my own experiences after looking closer at the spec made me decide to experiment with these components and write up the results, which I've since had published on DevX as http://www.devx.com/webdev/Article/43324. Similarly, I spent some time looking at the forms interfaces, and wrote up those experiences as Exploring HTML 5 Forms.

Similarly, while I still contend that the process with regard to HTML 5 needs to be more sensitive to namespace issues and other factors that come about because of XHTML co-implementation, there are some signs that this particular issue is gaining sufficient visibility that it is being at least revisited by all involved. (I particularly like this recommendation by Liam Quin http://www.barefootliam.org/xml/20091111-unobtrusive-namespaces).

My central concern is, and long has been, unchanged - that, in the rush to implement HTML 5 specification functionality, browser vendors will decide to cut corners by dropping support for the XHTML 5 side of the spec. It may seem a comparatively small share of the market to those same vendors, but being able to have that support is critical to those of us who work in the XRX space.

I'm not completely sold yet on HTML 5, even though I've been writing about it fairly extensively of late, but nor am I at the stage anymore where I'm deeply suspicious of the motives of the players involved.

Re: The Coming HTML 5 Train Wreck (Revisited)

Dominique Rabeuf

Without XML Schema, HTML 5 is just nothing (Much Ado About Nothing, as Shakespeare wrote).