
The pain of XML in Web 2.0

aspyker's picture

I have continued to think about the end-to-end XML story across the database, the middle tier, and the browser client. I have talked to many organizations that work with standard industry XML documents (HL7, OAGIS, ACORD, etc.) where the XML view unifies the data of their entire enterprise. These organizations carry that data from the message queues to data storage to the middle tier. However, what does it look like when they want to expose this data to the web tier? There are products that handle this well, like Lotus Forms, based on XML-centric standards like XForms. But what about Web 2.0 libraries like DOJO or jQuery? I took the download sample I described here and tried to visualize the data in Web 2.0 web pages. The XML format of interest is:

<downloads>
  ... Lots of data before this summary ...
  <monthByMonthDownloadStats>
    <month codeDownloads="100" docDownloads="100" ending="2009-11-30" starting="2009-11-01" totalDownloads="200"/>
    <month codeDownloads="200" docDownloads="200" ending="2009-12-31" starting="2009-12-01" totalDownloads="400"/>
    ... many other months here
  </monthByMonthDownloadStats>
  ... Other summaries here ...
</downloads>

First, I looked at the DOJO bar chart code and did something like this in a server-side XQuery program that generated HTML containing the following JavaScript:

  var chart1 = new dojox.charting.Chart2D("simplechart1");
  chart1.addPlot("default", {{ type: "Columns", gap: 5 }});
  chart1.addAxis("x", {{ labels: [
  {
    for $i in (1 to $numMonths)
    let $label := concat(
      '{value: ', $i, ', text: "', my:getMonthLabel($i), '"}',
      if ($i eq $numMonths) then '' else ',')
    return $label
  }
  ] }});

This works because XQuery can return a sequence of atomic values; in this case, I'm just returning strings and inserting them inside the JavaScript code that expects value/text pairs. But what if I want to have a REST endpoint serve up XML directly and have the browser consume it? A DOJO DataGrid can read from a data store, and that store can be an XmlStore. This means I can use a browser-side control to read from my server-side XML. All seems good until you get into the details. Here are snippets of the code to make this "work":

<div dojoType="dojox.data.XmlStore"
 url="http://server.com/summary.xml"
 jsId="summaryStore" label="title"
 attributeMap='{"codeDownloads":"@codeDownloads",
   "docDownloads":"@docDownloads",
   "starting":"@starting"}'

 rootItem="month">
</div>
<div id="grid" style="width: 600px; height: 300px;" dojoType="dojox.grid.DataGrid"
  store="summaryStore" structure="layoutDownloads" query="{}" rowsPerPage="40">
</div>
<script>
  dojo.require("dojox.grid.DataGrid");
  dojo.require("dojox.data.XmlStore");

  var layoutDownloads = [
    [{
      field: "codeDownloads", name: "Code Downloads", width: 10,
      formatter: function(item) {
        return item.toString();
      }
    },
    {
      field: "docDownloads", name: "Documentation Downloads", width: 10,
      formatter: function(item) {
        return item.toString();
      }
    },
    {
      field: "starting", name: "Starting Date", width: 10,
      formatter: function(item) {
        return item.toString();
      }
    }
  ]];
</script>

What are some of the issues with this? First, the XmlStore has to map the XML data to a simpler format for the DataGrid to understand it. That is why I had to manually tell the XmlStore (through attributeMap) to expose each XML attribute under a similarly named field. Nicely, the XmlStore lets you drill down to something other than the root element for the data, but it really just allows you to pick the name of an element (you'll see I specified "month").

The second problem is that for any complex industry-specific data, that likely wouldn't be sufficient. What if I had multiple month elements in different parts of the XML tree? I'd end up getting a table that combined months that meant different things. What I'd really want is XPath as the root selector. Third, even though the Store abstraction is nice for handling multiple data formats, if I wanted to combine data from different parts of the XML tree or from multiple trees, what I would really like is XPath available from the DataGrid formatter function itself.
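To make the name-versus-path problem concrete, here is a minimal sketch in plain JavaScript. It does not use Dojo or the browser DOM: the `el` node shape, the `selectByName` and `selectByPath` helpers, and the second `monthByMonthForumStats` summary are all hypothetical and exist only to illustrate the ambiguity.

```javascript
// Hypothetical in-memory stand-in for a parsed XML tree: each node has a
// name, an attribute map, and a list of child nodes. Not a Dojo or DOM API.
function el(name, attrs, children) {
  return { name: name, attrs: attrs || {}, children: children || [] };
}

// Name-based selection: every descendant called `name`, wherever it sits.
function selectByName(node, name) {
  var found = [];
  node.children.forEach(function (child) {
    if (child.name === name) found.push(child);
    found = found.concat(selectByName(child, name));
  });
  return found;
}

// Path-based selection: follow child steps such as "a/b" starting at `node`.
function selectByPath(node, path) {
  return path.split("/").reduce(function (current, step) {
    var next = [];
    current.forEach(function (n) {
      n.children.forEach(function (child) {
        if (child.name === step) next.push(child);
      });
    });
    return next;
  }, [node]);
}

var downloads = el("downloads", {}, [
  el("monthByMonthDownloadStats", {}, [
    el("month", { starting: "2009-11-01", totalDownloads: "200" }),
    el("month", { starting: "2009-12-01", totalDownloads: "400" })
  ]),
  // Hypothetical second summary that also happens to use <month> elements.
  el("monthByMonthForumStats", {}, [
    el("month", { starting: "2009-11-01", posts: "12" })
  ])
]);

var byName = selectByName(downloads, "month");
var byPath = selectByPath(downloads, "monthByMonthDownloadStats/month");
console.log(byName.length, byPath.length); // 3 2
```

Selecting by name alone mixes the forum month into the download months, while the path keeps only the two download months. That is the behavior I would want from an XPath root selector.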

Assuming this might be easier in the other very popular JavaScript query library, I went off and investigated jQuery. I quickly found articles that talked about jQuery and XML. I patterned the next part of the article after this example. So, rewriting, I ended up with:

<script>
  $(document).ready(function(){
    $.ajax({ type: "GET", url: "http://server.com/summary.xml", dataType: "xml",
      success: function(xml) {
        $(xml).find('monthByMonthDownloadStats').find('month').each(function(){
          var cd = $(this).attr('codeDownloads');
          var dd = $(this).attr('docDownloads');
          var st = $(this).attr('starting');
          $('<div class="items" id="month_' + st + '"></div>').
            html(
              '<h2>Month starting ' + st + '</h2>' +
              '<p>Code Downloads:  ' + cd + '</p>' +
              '<p>Doc Downloads:  ' + dd + '</p>'
          ).appendTo("#page-wrap");
        });
      }
    });
  });
</script>
<body>
  <div id="page-wrap">
    <h1>Reading XML with jQuery</h1>
  </div>
</body>

Now, with jQuery, I'm actually able to do a little more "native" XML querying. You'll see that I can access attributes directly and that I can navigate to just the months under monthByMonthDownloadStats. However, as someone who knows XQuery, I find this syntax very unnatural (I'm sure it's very clear to JavaScript and/or CSS writers). Unnaturalness aside, it also seems more verbose. In XQuery I can write this like:

<div id="page-wrap">
<h1>Reading XML with XQuery</h1>
{
  for $month in downloads/monthByMonthDownloadStats/month
  let $cd := data($month/@codeDownloads)
  let $dd := data($month/@docDownloads)
  let $st := data($month/@starting)
  let $id := concat("month_", $st)
  return
    <div class="items" id="{$id}">
      <h2>Month starting {$st}</h2>
      <p>Code Downloads:  {$cd}</p>
      <p>Doc Downloads:  {$dd}</p>
    </div>
}
</div>

With this I get all of the same benefits that jQuery has, plus more: I'm almost sure jQuery wouldn't support the rich functions and operators of XPath 2.0 or the mixed XML content common in document-centric XML approaches. In my opinion, XQuery mixes the construction of the content with the query of the input much better (I believe that if we showed a date comparison, for example, jQuery would compare even worse). Of course, the benefit of jQuery over XQuery is that XQuery doesn't run in the browser; I had to run the previous XQuery sample on the server. That is a pretty big benefit.
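As a rough illustration of the date comparison point, here is a minimal sketch of what the browser side might look like once the month attributes have been read into plain objects. The `months` array and the `startingOnOrAfter` helper are illustrative names, not part of jQuery, and the sketch assumes a JavaScript engine that parses ISO 8601 date strings (ES5 or later). In XQuery the same filter would be a predicate along the lines of `month[xs:date(@starting) ge xs:date("2009-12-01")]`.

```javascript
// Attribute values arrive as plain strings, so dates must be parsed by hand.
var months = [
  { starting: "2009-11-01", totalDownloads: "200" },
  { starting: "2009-12-01", totalDownloads: "400" }
];

// Hypothetical helper: keep the months whose starting date is on or after
// `cutoff`. Date.parse turns an ISO date string into a millisecond timestamp.
function startingOnOrAfter(items, cutoff) {
  var cutoffTime = Date.parse(cutoff);
  return items.filter(function (m) {
    return Date.parse(m.starting) >= cutoffTime;
  });
}

var recent = startingOnOrAfter(months, "2009-12-01");
// Only the month starting 2009-12-01 remains.
console.log(recent.map(function (m) { return m.starting; }));
```

The typing work (string to date) sits in the script here, whereas the XQuery predicate states it inline next to the path.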

I think the summary of all of this, if you stayed with me this long, is that Web 2.0 technology in the browser isn't really ready to handle the complex XML documents that exist within most enterprises. This means that if you want to marry Web 2.0 with enterprise XML data, you'll need to either write data conversions that simplify the data, essentially extending the presentation tier across the browser and middle tier, or use a feature like the Web 2.0 Feature Pack to do this for you.
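A data conversion of the kind described above could be as small as the following sketch. It assumes the attributes of each month element have already been extracted into string maps by whatever tier runs the conversion. The `toGridItems` name and the `{ items: [...] }` envelope are my own illustrative choices, not something a particular library requires.

```javascript
// Hypothetical conversion step: turn the string-valued attributes of each
// <month> element into flat records with numeric counts.
function toGridItems(monthAttrs) {
  return monthAttrs.map(function (a) {
    return {
      starting: a.starting,
      ending: a.ending,
      codeDownloads: Number(a.codeDownloads),
      docDownloads: Number(a.docDownloads),
      totalDownloads: Number(a.totalDownloads)
    };
  });
}

// The two sample months from the XML shown earlier, as attribute maps.
var items = toGridItems([
  { codeDownloads: "100", docDownloads: "100", ending: "2009-11-30",
    starting: "2009-11-01", totalDownloads: "200" },
  { codeDownloads: "200", docDownloads: "200", ending: "2009-12-31",
    starting: "2009-12-01", totalDownloads: "400" }
]);

// Serialize the simplified records for a browser-side widget to consume.
var json = JSON.stringify({ items: items });
console.log(json);
```

The cost is exactly the one described here: the shape of the data is now defined twice, once in the XML and once in this conversion.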

Also, you'll need to learn two languages (arguably three if you consider jQuery a language) and programming styles when dealing with the XML data. Given that I look at WebSphere XML strategy, I'm not sure I'm happy with this answer. I am currently looking towards other solutions to this issue. Given that I'm rather new to Web 2.0, feel free to point out other things I didn't consider in the Web 2.0 space for XML processing (outside of XForms, of course).

Re: The pain of XML in Web 2.0

Andy Bunce's picture

I am a great Dojo fan, but it is sadly true that it is not very XML friendly. XForms has problems both with finding a cross-browser implementation and in that its widgets are difficult to extend, for example a calendar that styles particular dates differently based on other data. Tasks like allowing a user to sort a list seem to me to be difficult to modularise for reuse in XForms, and what about then doing this via drag and drop?

However, I have just seen http://www.betterform.de/ which claims to implement XForms 1.1 in Dojo and to work with eXist. The dream team?

Re: The pain of XML in Web 2.0

Dominique Rabeuf's picture

I see no reason to cling to libraries like jQuery or Dojo in order to process XML data with the interactive and flexible mechanisms on which the so-called Web 2.0 is based (mostly Ajax) when one has an XForms client.

Through instance submission, XForms natively provides an equivalent to Ajax operations.

Do not forget that Web 2.0 is not a set of technical specifications.

As for XPath 2.0 and XSLT 2.0, I do not think that browsers will support them for a long time.

XSLT 1.0 is quite well supported by all recent browsers, while XPath 1.0 is not uniformly supported by these browsers.

Re: The pain of XML in Web 2.0

aspyker's picture

As an XML guy, I agree. However, in talking with non-XML experts who are writing Web 2.0 applications, it's VERY hard for them to consider an XForms-based approach, especially for data-oriented folks. Given that XForms still requires a browser plugin and Ubiquity is still evolving in terms of support, it's a hard sell. It may be easier for truly document-oriented folks who are open to more risk and/or OK with more preinstalled browser-side requirements. Also, considering the HTML5 direction away from well-formed XML, I can't say even the technical specifications are moving in our direction.

Re: The pain of XML in Web 2.0

Kurt Cagle's picture

You've hit one of the biggest problems I've seen both in XForms adoption and in web development in general. For the most part, web developers are not declarative programmers. Many come from one of two camps - either the Java/C++/C# developer who sees everything as classes and compiled code, and for whom tools such as GWT are considered the height of web development environments, or the PHP/Ruby/Python crowd who for the most part are looking for the best set of framework tools to minimize the development effort. Most aren't generating XML on the server side, and see XML as being something foreign and unwieldy because it doesn't fit into their notion of classes and class encoding.

This is why XQuery is so very important. XQuery is, especially in conjunction with a low level REST interface, a language designed to work with XML within context - you never have to worry about invoking parsers, establishing transformers, running validators, creating input and output pipes and so forth, because these things flow naturally from the underlying data model. Once that happens on the server - once you are serving XML to the client - then XForms provides a logical UI for this. However, without the XML context from the server (and a move towards RESTful services), XForms is fairly useless.

XForms is going to be a secret weapon for some time. I've been working on updating a PHP legacy application, and it's a real nightmare because, even though at some point there is XML being sent to the server and retrieved from the server, the PHP application is built in such a way that it takes almost no advantage of the XML-ness of the inbound or outbound streams. As a result, things that should be simple, such as adding a new field to a data model, instead become complex and cumbersome. Yet I can build (and have built) XForms applications in minutes that might take days to build, debug and deploy otherwise. After a while, results like that will begin to get attention.

Re: The pain of XML in Web 2.0

Kurt Cagle's picture

Andrew,

I've reformatted your article. In general, if you are working on something that contains embedded XML, HTML or other languages, your best bet is to set the Input format (just below the text box in the edit view) to Filtered HTML. I'm actually going to set that as the default. You can then format content by using the [geshifilter-xml]....[/geshifilter-xml],  [geshifilter-javascript]....[/geshifilter-javascript] or generalized [geshifilter-code]....[/geshifilter-code] macros.