Gnome World DOMination

Raph Levien, 14 Apr 99

An earlier draft of this document is online, containing some more detailed design discussions.

The Gnome DOM (Document Object Model) architecture promises to create a framework for seamlessly integrating smaller components into polished applications. Authors choosing to write DOM-based applications need only write code for rendering and editing the document fragments specific to the application. Much of the remainder of the application is provided by the framework, including XML-based loading and saving, undo, and plugins.

Perhaps the most exciting implication for Gnome is the ease with which components can be reused and recombined. For example, once the rich text component is done, authors can include rich text in their applications simply by enabling the component.

The hope of the Gnome DOM architecture is that authors will write a library of useful components--manageable, self-contained packages of code to render and edit document fragments of a specific type. Once this library is in place, other authors should be able to assemble these components into complete applications with a minimum of work.

The central piece of the Gnome DOM architecture is Gdome, the Gnome DOM engine. Generic DOM engines have three major shortcomings that are addressed in the Gdome design:

Forcing apps to use the DOM's data structures for storage of all the app's state.
Extravagant memory usage--10x the document size is typical.
Difficult garbage collection, particularly in distributed operation such as with plugins.

As we shall see below, Gdome addresses these issues by using the "sliding DOM" concept. In short, the sliding DOM interface shifts some of the responsibility of keeping track of changes in the tree structure from the server to the client. In the traditional DOM scenario, clients are written in scripting languages and servers are generally implemented in web browsers. In the Gnome World DOMination scenario, by contrast, the emphasis is on making applications as easy to write as possible, including integrating with existing codebases that use their own data structures to represent document contents and state.

The sliding DOM is implemented as an extension to the standard DOM interface. All of the traditional DOM methods are available, and code using the Gdome interface interoperates fully with standard DOM clients and servers. When not using the sliding DOM interface, memory usage increases to levels comparable to generic DOM implementations.

Components use the Gnome Canvas as their primary rendering interface, and can make use of the full canvas imaging model, including the alpha transparency, antialiasing, rotation and other advanced features of the Canvas's antialiased rendering engine.

Gdome supports a "model/view" organization. The model (i.e. document contents) is stored in the DOM engine. Views (of which there can be multiple per model) are rendered in the canvas, and automatically updated when the document contents change, using the DOM's listener/event mechanism. Thus, writing code to edit the document is as simple as modifying the document tree using the Gdome interface.

Gdome also exports the DOM interface (extended with the sliding DOM) through CORBA, supporting plug-ins, scripting, and other forms of distributed applications. Code written to the Gdome interface automatically negotiates the sliding DOM extension when connecting to a Gdome remote server.

Why DOM?

DOM is fundamentally a module for storing document contents and state in a tree structure. The DOM is designed to capture the structure of XML documents but not the syntax. One way of thinking about XML and DOM is that XML is a serialized human-readable representation of DOM contents. Thus, loading and saving as XML files are automatic operations for any DOM-based application. Loading and saving non-XML file formats can be done, but of course requires custom code.

A DOM engine stores document contents and makes them accessible through one or more interfaces. The DOM Level 1 recommendation specifies several of these interfaces (basically Java and ECMAScript language bindings and a CORBA IDL). The Level 2 working draft proposes a number of extensions, most importantly an event/listener interface for keeping DOM clients notified about changes to the document.

To these basic interfaces, Gdome adds simple C language bindings (based closely on the IDL to C language bindings specified by CORBA) and extra sliding DOM access methods. These additional methods make Gdome more appropriate for the Gnome context of high performance C applications, as opposed to the original design context of the DOM.

The origins of the DOM were to standardize JavaScript language bindings for modifying HTML document appearance. A ubiquitous example on the Web is "roll-over" replacement of images in the document. In this example, the JavaScript code attaches listeners to UI events (such as onMouseOver), then replaces the images by setting the img.src attribute. The DOM implementation implicitly listens to these mutation events and updates the screen.

The Gdome architecture makes the listening to mutation events explicit. It is not the responsibility of the DOM engine to directly update the screen. Rather, when the document is displayed, the display code attaches listeners to the tree, and mutation requests are forwarded to the display module, where the actual screen update takes place.
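The explicit-listener arrangement can be sketched as follows. This is a minimal illustration in Python, chosen for brevity; Gdome itself targets C, and every name here (Node, ImageView, add_listener) is hypothetical rather than part of the Gdome or W3C DOM interfaces.

```python
# Hypothetical sketch: none of these names come from Gdome or the DOM bindings.

class Node:
    """A tiny tree node that notifies listeners when an attribute changes."""

    def __init__(self, tagname, **attributes):
        self.tagname = tagname
        self.attributes = dict(attributes)
        self.listeners = []

    def add_listener(self, callback):
        self.listeners.append(callback)

    def set_attribute(self, name, value):
        old = self.attributes.get(name)
        self.attributes[name] = value
        # The engine only reports the mutation; it never touches the screen.
        for callback in self.listeners:
            callback(self, name, old, value)


class ImageView:
    """Stand-in for display code; it records what it would redraw."""

    def __init__(self, node):
        self.shown = node.attributes.get("src")
        self.redraws = 0
        node.add_listener(self.on_mutation)

    def on_mutation(self, node, name, old, new):
        if name == "src":
            self.shown = new
            self.redraws += 1


img = Node("img", src="plain.png")
view = ImageView(img)
img.set_attribute("src", "highlight.png")  # the roll-over case from the text
```

The node has no knowledge of the view. Several views could attach to the same node, and each would be updated by the same mutation.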

Component rendering architecture

Aside from the DOM engine itself, the core of the DOMination architecture is support for rendering the document fragments belonging to individual components and composing them into a unified display.

Individual renderers take a fragment of the document tree as an argument, and return a DOMination Rendering object, which contains size info and a Gnome Canvas item. The renderer also installs event listeners at this time, so that when the fragment changes, the Canvas item and size info automatically get updated.

For simple "documents," this method is all that's needed--the application just passes in the DOM tree and displays the resulting Canvas item. However, the real power of the DOMination architecture comes from rendering contexts and DOMination containers.

The primary function of rendering contexts is to dispatch different renderers depending on the tagname of the DOM node. For example, in a simple HTML-like document structure, imagine that we have a paragraph renderer, an image renderer, and a table renderer. The rendering context would contain a mapping that looks something like this:
   <p>      -->  render_paragraph
   <img>    -->  render_image
   <table>  -->  render_table
Now, to render a document fragment, you don't have to know which renderer to invoke--you just look it up in the rendering context.
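The lookup described above might look something like the following Python sketch. The RenderingContext class, the dictionary-based nodes, and the string "renderings" are all assumptions made for illustration; a real renderer would return a Rendering object holding a Canvas item.

```python
class RenderingContext:
    """Maps tagnames to renderer functions (all names are hypothetical)."""

    def __init__(self):
        self._renderers = {}

    def register(self, tagname, renderer):
        self._renderers[tagname] = renderer

    def render(self, node):
        # Dispatch on the tagname; the caller never names a renderer.
        try:
            renderer = self._renderers[node["tag"]]
        except KeyError:
            raise LookupError("no renderer registered for <%s>" % node["tag"])
        return renderer(self, node)


def render_paragraph(context, node):
    return "paragraph(%s)" % node["text"]


def render_image(context, node):
    return "image(%s)" % node["src"]


def render_table(context, node):
    # A container hands each child back to the context for dispatch.
    cells = [context.render(child) for child in node["children"]]
    return "table[%s]" % ", ".join(cells)


context = RenderingContext()
context.register("p", render_paragraph)
context.register("img", render_image)
context.register("table", render_table)

document = {"tag": "table", "children": [
    {"tag": "p", "text": "hello"},
    {"tag": "img", "src": "logo.png"},
]}
rendered = context.render(document)
```

Because every renderer receives the context, a container never needs to know which leaf renderers exist.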

Rendering contexts make generic containers practical. Consider the table example--the table renderer simply invokes the appropriate renderer for each cell, lays out the table using the returned size information objects, and places the returned Canvas items into a Canvas group. It also places event listeners on the DOM for inserting and deleting cells, and also on the size info for redoing the layout when cells change size.
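As a rough sketch of such a container, the following Python code implements a vertical box rather than a full table, to keep the layout trivial. The Rendering class, its size_listeners list, and the dictionary used as a Canvas group are assumptions for illustration only. Listeners for inserting and deleting children are left out.

```python
class Rendering:
    """Hypothetical stand-in for a Rendering object: size info plus an item."""

    def __init__(self, width, height, item):
        self.width = width
        self.height = height
        self.item = item
        self.size_listeners = []

    def resize(self, width, height):
        self.width, self.height = width, height
        for callback in list(self.size_listeners):
            callback(self)


def render_leaf(context, node):
    return Rendering(node["width"], node["height"], "leaf:" + node["name"])


def render_vbox(context, node):
    # Invoke the appropriate renderer for each child via the context.
    children = [context[child["tag"]](context, child)
                for child in node["children"]]
    box = Rendering(0, 0, {"group": [c.item for c in children], "offsets": []})
    box.children = children

    def layout(_changed=None):
        # Stack children top to bottom using their reported sizes.
        offsets, y = [], 0
        for child in children:
            offsets.append(y)
            y += child.height
        box.item["offsets"] = offsets
        # Resizing the box notifies any enclosing container in turn.
        box.resize(max((c.width for c in children), default=0), y)

    for child in children:
        child.size_listeners.append(layout)  # redo layout on size change
    layout()
    return box


context = {"leaf": render_leaf, "vbox": render_vbox}
document = {"tag": "vbox", "children": [
    {"tag": "leaf", "name": "a", "width": 10, "height": 5},
    {"tag": "leaf", "name": "b", "width": 20, "height": 7},
]}
box = render_vbox(context, document)
```

Since the box is itself a Rendering, boxes can nest, and a size change in a leaf propagates upward through each enclosing container.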

This framework is sufficient to build all kinds of generic containers, including hbox and vbox, tables, frames, collapsible trees, layers, and so on. If these and the leaf node renderers are pre-made components, all the application author has to do is construct the rendering context referencing them, invoke the toplevel renderer, and display the Canvas item it gets back.

Composition of DOMination components is very much like composition of UI elements. The primary differences are the full power of the Canvas imaging model (as opposed to the rectangular windows of widget allocation) and opening up the data structure so that it can be loaded, saved, edited, etc.

For the size info, I plan to reuse the size negotiation logic from Gzilla. This was designed to support word wrapping and related forms of size negotiation, in which the height of the item can depend on the width allocated (e.g. to display a paragraph of text, you can have a tall skinny box or a short wide one). This logic was also designed to support HTML tables (even though Gzilla doesn't actually implement these yet).

Generic application services

One of the great promises of the DOM approach is that it enables generic implementations of application services such as loading, saving, undo/redo, and plugins.

Loading and saving are perhaps the easiest--it's just a question of hooking an XML parser to the DOM. The Gdome prototype already contains hooks to Daniel Veillard's Gnome-XML library, so this is not a problem. Of course, this is for "native" XML file formats only. To load and save other file formats, you'll need to write the conversion code.
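To illustrate how little code generic load/save needs once a parser is hooked to a DOM, here is a sketch using Python's standard xml.dom.minidom module as a stand-in. It is not Gdome or Gnome-XML, and the chart markup is invented for the example.

```python
from xml.dom import minidom


def load(xml_text):
    """Parse XML text into a DOM document."""
    return minidom.parseString(xml_text)


def save(document):
    """Serialize the document element back to XML text."""
    return document.documentElement.toxml()


# Hypothetical document: load it, edit it through the DOM, save it again.
doc = load('<chart type="bar"><point x="1" y="2"/></chart>')
doc.documentElement.setAttribute("type", "pie")
saved = save(doc)
reloaded = load(saved)
```

The application code touches only the tree. Nothing in it is specific to the chart format.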

Undo is not quite so trivial, but is still conceptually simple. The "undo engine" can be implemented by listening to and recording all mutations to the tree, then playing back the reverse mutations when called upon to undo. The cool thing here is that it only needs to be written once for all applications that use the DOM.
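A minimal sketch of such an undo engine follows, in Python for brevity. The Document and UndoEngine classes and the event tuples are assumptions for illustration, not Gdome interfaces. The engine records each mutation with enough information to reverse it, and stops recording while it plays a reversal back.

```python
class Element:
    def __init__(self, tag):
        self.tag = tag
        self.attrs = {}
        self.children = []


class Document:
    """Hypothetical tree wrapper that reports every mutation to listeners."""

    def __init__(self, root):
        self.root = root
        self.listeners = []

    def _emit(self, event):
        for listener in list(self.listeners):
            listener(event)

    def insert(self, parent, index, child):
        parent.children.insert(index, child)
        self._emit(("insert", parent, index, child))

    def remove(self, parent, index):
        child = parent.children.pop(index)
        self._emit(("remove", parent, index, child))

    def set_attr(self, element, name, value):
        old = element.attrs.get(name)
        if value is None:
            element.attrs.pop(name, None)  # None means "attribute absent"
        else:
            element.attrs[name] = value
        self._emit(("attr", element, name, old))


class UndoEngine:
    def __init__(self, document):
        self.document = document
        self.log = []
        self._replaying = False
        document.listeners.append(self._record)

    def _record(self, event):
        if not self._replaying:
            self.log.append(event)

    def undo(self):
        """Reverse the most recent mutation; return False if none is left."""
        if not self.log:
            return False
        kind, target, key, payload = self.log.pop()
        self._replaying = True
        try:
            if kind == "insert":
                self.document.remove(target, key)
            elif kind == "remove":
                self.document.insert(target, key, payload)
            else:
                self.document.set_attr(target, key, payload)
        finally:
            self._replaying = False
        return True


root = Element("chart")
document = Document(root)
undo_engine = UndoEngine(document)
document.insert(root, 0, Element("point"))
document.set_attr(root, "type", "bar")
document.set_attr(root, "type", "pie")
document.remove(root, 0)
```

Note that the engine holds on to removed nodes so that it can reinsert them, which is one reason undo interacts with the storage reclamation issues discussed later.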

Similarly, a generic plugin module can be developed that takes care of launching the plugin, passing it a CORBA object reference, cleaning up afterwards, and keeping track of which plugins are available. The actual work of communicating the document to the plugin and getting the modifications back is already taken care of--it's the generic CORBA DOM interface (with optional sliding DOM extensions, described below).

Plugin renderers

Let's say you want to build a component for displaying charts. You want to design it so that the DOM contains the chart data at a fairly high level, so that for example you can switch between bar and pie charts by changing an attribute.

Now, it would be nice to be able to embed such a chart into a word processing document, but the word processor knows nothing about your chart datatype.

One solution is to enable dynamic loading of renderers. On startup, the application can load a (systemwide maintained) table of tagnames and the corresponding .so files containing renderers. Then, when the word processor comes across the <chart> tag, it loads the .so file and calls that to render.

Another possibility is to convert from one fragment type to another. For example, maybe the word processor doesn't understand your charts, but does know how to call the appropriate library to render SVG (Scalable Vector Graphics). It can invoke a plugin to do the conversion (not necessarily in the same address space) and render the SVG it gets back.

It's not hard to imagine implementing a simple such converter in a few dozen lines of Perl.
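The same idea can be sketched in Python instead of Perl. The chart markup below (a chart element with a type attribute and point children carrying a value) is invented for the example, as are the function name and its parameters. Only bar charts are handled.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def chart_to_svg(chart_xml, bar_width=20, scale=10):
    """Convert a hypothetical <chart type="bar"> fragment to SVG text."""
    chart = ET.fromstring(chart_xml)
    if chart.get("type") != "bar":
        raise ValueError("this sketch only converts bar charts")
    values = [float(point.get("value")) for point in chart.findall("point")]
    height = max(values, default=0) * scale
    svg = ET.Element("svg", {
        "xmlns": SVG_NS,
        "width": str(bar_width * len(values)),
        "height": str(height),
    })
    for i, value in enumerate(values):
        # One rectangle per data point, with bars growing up from the bottom.
        ET.SubElement(svg, "rect", {
            "x": str(i * bar_width),
            "y": str(height - value * scale),
            "width": str(bar_width),
            "height": str(value * scale),
        })
    return ET.tostring(svg, encoding="unicode")


svg_text = chart_to_svg(
    '<chart type="bar"><point value="3"/><point value="5"/>'
    '<point value="2"/></chart>')
```

A converter like this needs no knowledge of the host application. It takes one fragment type in and hands another back.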

Note: one concern here is that if not all the tagnames "belong" to the same application, there may be conflicts in the space of tagnames and thus ambiguity in which renderer to invoke. We need to work out how we want to manage this. Partial answers may reside in the XML namespace and RDF specifications.

Scaffolding

One of the other areas where the DOM architecture may be helpful to developers is to provide scaffolding during application development. The core modules of any app are rendering and editing. However, in the absence of these modules, totally generic tree viewers and editors may be substituted. Even after the rendering and editing modules are complete, the tree view may be helpful for more precisely visualizing internal application state and putting plugins through their paces.

Not just Canvas items

I anticipate that most applications will use the Gnome canvas as the basis for rendering. However, several other interesting possibilities exist. For one, it may be desirable to render to Gtk+ widgets. Gtk+ widgets may be more appropriate for rendering in some cases, for example to use a CTree to render generic XML trees. It may also be useful for implementing XML-based UI building tools such as XUL and Glade.

Another interesting rendering target is not an on-screen display at all, but rather to a printing engine such as Gnome-print.

Not just DOM

The "rendering context" framework may be useful outside the DOM context. For example, the implementation of the <img> tag may be to receive a byte stream and MIME type for the image, either from the file system or over HTTP. The MIME rendering context dispatches the renderer based on the MIME type (for example, a JPEG decoder for image/jpeg), which converts the byte stream to a DOMination Rendering object. This approach is similar to the GzillaGzwWeb dispatching mechanism in Gzilla.

It's worth noting that MIME dispatching has none of the namespace issues of XML tagnames, as the MIME content type namespace is well managed by the IANA. With no significant exceptions, no two distinct content types map to the same MIME content type, and conversely, each content type has a single MIME name.

Of course, going outside the DOM framework leaves behind some of the integration advantages, such as generic load/save and undo. But the DOM is simply inadequate for binary data such as images. It may be worth considering a document type that includes a forest of DOM trees and "blob" objects. RDF deals with some of these issues and deserves a closer look. Another possibility is to attach a MIME type to each DOM tree and blob, and use the MIME type to choose a single rendering context for all the content it dominates.

The sliding DOM

The basic DOM design, while quite appropriate for its original purposes, runs into problems when exporting highly structured application data over a CORBA interface.

The fundamental problem is that the CORBA interface exports object references that must persist across changes to the tree. Even nodes which get removed from the tree are still accessible through a CORBA object reference--a nightmare for trying to reclaim the storage used by deleted nodes (at least until distributed GC for CORBA becomes reality). A lesser but still significant problem is keeping track of the mapping between object references and locations within the document even as the tree is rearranged.

These are serious concerns because one of the goals of Gdome is to allow efficient custom representations of parts of the tree. For example, a charting application may store a list of (x, y) data points simply as a flat array of doubles. Doing it this way gets you better compactness of representation, processing speed, and integration with existing codebases.

The problem is that a flat representation such as an array of doubles has no place to store removed nodes, nor does it automatically keep track of the locations of object references to specific (x, y) nodes as they move around due to insertions and deletions.

Note that the "traditional" DOM implementation of allocating a unit of storage for each node doesn't have these difficulties with mapping. So, the DOM contains an inherent bias towards node-based representations and away from flat representations. Using nodes doesn't help reclaim leaky storage, however.

The Gdome solution to these problems is the "sliding DOM" interface, designed as both an internal interface for accessing custom representations, and as an extension to the standard DOM interface exported over CORBA. The essence of the sliding DOM is that nodes are accessed by their position in the tree rather than by creating object references when traversing the tree. Thus, after a node is deleted from the tree, there is no way to access it through the CORBA interface, so the storage may simply be deleted. Similarly, data in a flat representation can be moved around without having to worry about keeping references into the tree consistent.
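To make the flat-representation case concrete, here is a sketch of chart points stored as a flat array of doubles and accessed purely by position. The FlatPoints class and its method names are assumptions for illustration. Positions are 1-based to match the slide example in the text.

```python
from array import array


class FlatPoints:
    """(x, y) points kept in one flat array of doubles, no per-node objects."""

    def __init__(self, points=()):
        self._data = array("d")
        for x, y in points:
            self._data.extend((x, y))

    def child_count(self):
        return len(self._data) // 2

    def _offset(self, position):
        if not 1 <= position <= self.child_count():
            raise IndexError("no point at position %d" % position)
        return 2 * (position - 1)

    def get_point(self, position):
        i = self._offset(position)
        return (self._data[i], self._data[i + 1])

    def insert_point(self, position, x, y):
        if not 1 <= position <= self.child_count() + 1:
            raise IndexError("cannot insert at position %d" % position)
        i = 2 * (position - 1)
        self._data[i:i] = array("d", (x, y))

    def delete_point(self, position):
        # Nothing can still refer to the point, so its storage just goes away.
        i = self._offset(position)
        del self._data[i:i + 2]


points = FlatPoints([(0, 0), (1, 1), (2, 4)])
points.delete_point(2)
points.insert_point(1, -1, 1)
```

Since callers only ever hold positions, no per-node bookkeeping is needed when points shift.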

Gdome uses a "slide" to reference the tree--simply a list of child indices. Thus, the slide [1, 2] refers to the second child of the first child of the document root. Operations (such as getting the tagname or inserting new nodes) are resolved directly without creating object references to nodes internal to the tree.
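Resolving a slide amounts to a short walk down the tree. The following Python sketch assumes dictionary-based nodes and hypothetical helper names; it uses the 1-based indexing implied by the [1, 2] example, with the empty slide naming the root.

```python
def resolve(root, slide):
    """Follow a list of 1-based child indices down from the root."""
    node = root
    for index in slide:
        if index < 1:
            raise IndexError("slide indices start at 1")
        node = node["children"][index - 1]
    return node


def get_tagname(root, slide):
    return resolve(root, slide)["tag"]


def insert_child(root, slide, position, tag):
    """Insert a new empty node as child number `position` of the slide's node."""
    parent = resolve(root, slide)
    parent["children"].insert(position - 1, {"tag": tag, "children": []})


root = {"tag": "doc", "children": [
    {"tag": "a", "children": [
        {"tag": "b", "children": []},
        {"tag": "c", "children": []},
    ]},
    {"tag": "d", "children": []},
]}
second_of_first = get_tagname(root, [1, 2])
```

The caller passes plain lists of integers and receives plain values back, so the server never has to hand out a reference to an internal node.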

Gdome retains compatibility with the standard DOM interface by building an explicit mapping table between object references and tree locations. It then uses the mutation event listener mechanism to keep this mapping up to date.
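One way such a mapping table could work is sketched below in Python. The ReferenceTable class, the integer handles, and the on_insert and on_remove callbacks are all assumptions for illustration. As a simplification, a reference to a removed node is marked dangling, whereas the standard DOM keeps removed nodes accessible.

```python
class ReferenceTable:
    """Maps opaque handles to slides and patches them on mutation events."""

    def __init__(self):
        self._slides = {}
        self._next_handle = 1

    def reference(self, slide):
        handle = self._next_handle
        self._next_handle += 1
        self._slides[handle] = list(slide)
        return handle

    def lookup(self, handle):
        """Return the current slide, or None if the node was removed."""
        return self._slides[handle]

    def on_insert(self, parent, index):
        # A node was inserted as child `index` of `parent`: later siblings
        # and everything beneath them shift one position to the right.
        depth, parent = len(parent), list(parent)
        for slide in self._slides.values():
            if (slide is not None and len(slide) > depth
                    and slide[:depth] == parent and slide[depth] >= index):
                slide[depth] += 1

    def on_remove(self, parent, index):
        depth, parent = len(parent), list(parent)
        for handle, slide in self._slides.items():
            if slide is None or len(slide) <= depth or slide[:depth] != parent:
                continue
            if slide[depth] == index:
                self._slides[handle] = None  # the node or an ancestor is gone
            elif slide[depth] > index:
                slide[depth] -= 1


table = ReferenceTable()
h = table.reference([1, 2])
k = table.reference([3])
table.on_insert([1], 1)  # new first child under node [1]
table.on_insert([], 1)   # new first child under the root
```

Every mutation costs a scan of the table in this sketch, which shows why a table filled by a large traversal becomes a burden.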

Of course, if you traverse a large tree using the standard DOM interface, the mapping table can grow quite large, and it may be difficult to reclaim the space. But this tradeoff is not significantly worse than a node-based representation.

The Gdome design is such that the negotiation between interfaces is done automatically. Applications may use the standard DOM or sliding DOM as appropriate, and the "Right Thing" happens.

Limitations of the DOM

I'm very excited about the possibilities of the DOMination architecture for enabling a wealth of useful component-based applications. However, the DOM, like anything else, is not "magic pixie dust" that can be sprinkled on applications to slash development time, integrate with other modules, and (perhaps most importantly) turn them into full-fledged distributed systems. Yes, CORBA does give you remote invocation of objects, but it doesn't really deal with locking, transactions, distributed garbage collection, partial failure, and so on. It also doesn't automatically solve performance problems and in fact creates some of its own.

Probably the biggest single problem is that DOM operations are very fine-grained, so a great many of them are required to traverse a large tree--each incurring its own roundtrip latency. Given that interprocess CORBA latency is about 0.3 ms on a good Linux box, it's clear that this can result in noticeable sluggishness. Things get much worse when you export DOM interfaces across the Internet. It's tempting to try to build things like distributed whiteboards on top of the DOM, but I wouldn't recommend it.
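A back-of-the-envelope calculation shows the scale of the problem. The 0.3 ms figure is the one quoted above; the node count and the number of calls per node are assumptions picked for illustration.

```python
ROUNDTRIP_MS = 0.3  # interprocess CORBA latency quoted in the text


def traversal_seconds(node_count, calls_per_node, roundtrip_ms=ROUNDTRIP_MS):
    """Time spent on roundtrip latency alone when every call is remote."""
    return node_count * calls_per_node * roundtrip_ms / 1000.0


# Assumed workload: 10,000 nodes, 3 calls each (for example tagname,
# first child, next sibling).
estimate = traversal_seconds(10000, 3)
```

Under these assumptions the traversal spends about 9 seconds waiting on roundtrips, before any real work is done.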

Open issues

This whitepaper presents an outline of how we plan to build the DOMination infrastructure and applications. However, there are quite a few open issues. I believe that the best way to resolve these is to build the prototypes and see what happens.

Some of these issues include:

Namespaces of XML tagnames.
Images and other embedded binary data.
Locking and/or transactions.
Transitory state such as cursors and selections.
Dispatching of UI events.

Conclusion

The DOMination architecture is a framework for component software. It combines the integration advantages of the standard DOM interface with the graphics power of the Gnome canvas and the performance improvements of the sliding DOM. Generic services such as load/save and undo should substantially cut the amount of work needed to build polished apps.

Component software has had a mixed track record. While it sounds good in theory, in practice when you buy a component, you often find that it doesn't do quite what you needed, or that there are performance problems or other limitations.

The combination of free software and component architecture has the potential to be a killer. For one, individual components are far more manageable in scope than monolithic applications. Much of the vitality of the Gimp stems from the plug-in architecture, a somewhat crude but effective form of components (in fact, Peter Mattis has said that he wished that he had designed all of Gimp to be plugins). Similarly, the Linux kernel scales as well as it does largely due to the modular driver architecture.

Free software tends to be a lot more "cumulative" than the throwaway culture of the proprietary world. As more components are developed, future programmers get a growing library to depend on. It's easy to imagine that components for big, rich XML languages such as XHTML (basically HTML with XML-compliant syntax) and SVG will find widespread use.

Of course, all these rosy projections are useless if the component framework presents too much of a drag on getting the actual work done. This is exactly the goal of the DOMination architecture--to provide high performance implementations of all the interfaces needed to integrate components, then get out of the way and let the real work begin.

With luck, Gdome and related pieces will play a key role in achieving World DOMination of the desktop, extending the technical and popular successes of free software seen in the kernel and server arenas.

Join us now and share the software!
