Plans for world DOMination

Draft
Raph Levien, 27 March 1999

See also: Design considerations for a Gnome DOM and Proposal for a Gnome DOM Engine. The Gdome mailing list archives may also be of interest.

For some reason, lately I find myself rather obsessed with the Document Object Model. It seems to me that with some good design and implementation, it could serve as a central piece of Gnome's component integration, and I think it could drastically simplify what people have to do to write polished applications.

First, I'm going to explain roughly what I'm trying to do. Some of the pieces will probably be old news for many of you, and other pieces will be rocket science. Bear with me.

I want to build Gnome applications around the concept of the DOM. A DOM, very briefly, is a generic tree structure (think XML) plus a bunch of API for accessing it in a uniform way. The DOM CORBA interface has already been standardized by the W3C.

I'm going to use a Type 1 font editor as a running example here. Currently, gfonted has its own data structures for representing the font, pretty much what you'd expect from a hardcore C programmer (rtfs if you're really curious about the details). To go the DOM route, I would ditch all these hand-rolled data structures and represent the font as an XML document instead. Something like this:
<font>
 <glyph name="A" width="650">
  <bezier>
   <moveto x="0" y="0"/>
   <lineto x="700" y="300"/>
   <lineto x="0" y="600"/> 
   <closepath/>
  </bezier>
 </glyph>
</font>
As you can see, not rocket science, and actually a moderately sane way to represent a font internally. There's also some XML-based "Font Description Language" work out there (Apple folks, I think) that we might be able to just use.
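
A minimal sketch of what "accessing it in a uniform way" looks like in practice, written in Python purely for brevity (a Gnome app would do this in C). It loads the font example with the standard library's xml.dom.minidom, a W3C-style DOM, and reads a glyph back through generic DOM calls. The glyph_outline helper is hypothetical, and the width attribute is quoted because XML requires quoted attribute values.

```python
from xml.dom import minidom

# The font example, with the width attribute quoted so it is well-formed XML.
FONT_XML = """<font>
 <glyph name="A" width="650">
  <bezier>
   <moveto x="0" y="0"/>
   <lineto x="700" y="300"/>
   <lineto x="0" y="600"/>
   <closepath/>
  </bezier>
 </glyph>
</font>"""


def glyph_outline(doc, name):
    """Hypothetical helper: return (width, [(op, x, y), ...]) for glyph `name`."""
    for glyph in doc.getElementsByTagName("glyph"):
        if glyph.getAttribute("name") != name:
            continue
        bezier = glyph.getElementsByTagName("bezier")[0]
        ops = []
        for node in bezier.childNodes:
            if node.nodeType != node.ELEMENT_NODE:
                continue  # skip the whitespace text nodes between elements
            if node.hasAttribute("x"):
                ops.append((node.tagName, int(node.getAttribute("x")),
                            int(node.getAttribute("y"))))
            else:
                ops.append((node.tagName, None, None))  # e.g. <closepath/>
        return int(glyph.getAttribute("width")), ops
    raise KeyError(name)


doc = minidom.parseString(FONT_XML)
width, ops = glyph_outline(doc, "A")
print(width, ops)
```

Saving is the reverse direction: doc.toxml() writes the tree back out as XML.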

Ok, so how do I structure my application? We'll start with how to display the bezier on the screen. The core interface here is a function that takes a bezier DOM node as an argument and returns a canvas item. Pass in this particular bezier node, and you get a canvas item that displays a triangle.

Now for cool part #1. When you invoke this function, it also attaches gtk+ signal handlers to the bezier node so that whenever something changes inside the bezier, the canvas view updates as well.

To edit the bezier curve, there's a module defined somewhere that takes mouse events from the canvas and updates the curve through the DOM API. Thus, whenever the curve is updated, the signal handlers invoke the appropriate code to update the screen.
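
The display function and the edit module can be sketched together in Python over xml.dom.minidom. minidom has no change notification, so this assumes a hypothetical ChangeSignals object, standing in for gtk+ signals, through which every edit passes. PolygonView stands in for a canvas item, and bezier_to_view and drag_point are illustrative names rather than an existing API.

```python
from xml.dom import minidom


class ChangeSignals:
    """Hypothetical stand-in for gtk+ signals on DOM nodes. Edits made through
    set_attrs notify handlers on the edited node and on all of its ancestors."""

    def __init__(self):
        self._handlers = {}  # id(node) -> list of callbacks

    def connect(self, node, callback):
        self._handlers.setdefault(id(node), []).append(callback)

    def set_attrs(self, node, attrs):
        for name, value in attrs.items():
            node.setAttribute(name, value)
        n = node
        while n is not None:  # "something changed inside" every ancestor
            for callback in self._handlers.get(id(n), []):
                callback(n)
            n = n.parentNode


class PolygonView:
    """Stand-in for a canvas item: records the points it would draw."""

    def __init__(self):
        self.points = []
        self.redraws = 0


def bezier_points(bezier):
    return [(int(e.getAttribute("x")), int(e.getAttribute("y")))
            for e in bezier.childNodes
            if e.nodeType == e.ELEMENT_NODE and e.hasAttribute("x")]


def bezier_to_view(bezier, signals):
    """Node-to-view function: build the view and hook it to the node."""
    view = PolygonView()

    def refresh(_node):
        view.points = bezier_points(bezier)
        view.redraws += 1

    refresh(bezier)
    signals.connect(bezier, refresh)
    return view


def drag_point(signals, point, dx, dy):
    """Edit module: a mouse drag moves one point, purely through the DOM."""
    signals.set_attrs(point, {"x": str(int(point.getAttribute("x")) + dx),
                              "y": str(int(point.getAttribute("y")) + dy)})


doc = minidom.parseString('<bezier><moveto x="0" y="0"/><lineto x="700" y="300"/>'
                          '<lineto x="0" y="600"/><closepath/></bezier>')
signals = ChangeSignals()
view = bezier_to_view(doc.documentElement, signals)
drag_point(signals, doc.getElementsByTagName("lineto")[0], -50, 10)
print(view.points)  # [(0, 0), (650, 310), (0, 600)]
```

Note that drag_point never touches the view; the view catches up only because the handler attached by bezier_to_view fired.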

A lot of the framework for the application can now be made quite generic. A native file load/save function is trivially implemented by just plugging in an XML parser. Further, a generic undo capability can be defined by writing a module that extends the DOM with "undo_begin", "undo_commit", "undo", and "redo" methods. The back end of this module is the generic DOM. So if you do something and then invoke the "undo" method, it just sends the appropriate inverse DOM edits to the back end. A plugin interface (more about that later) can also be made quite generic.
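
A sketch of such an undo module in Python over xml.dom.minidom, under two assumptions: every edit goes through the module's own set_attr method, and only attribute edits are covered (inserting or removing nodes would need their own inverse records). UndoModule is a hypothetical name; undo_begin, undo_commit, undo, and redo follow the method names above.

```python
from xml.dom import minidom


class UndoModule:
    """Hypothetical undo layer over a DOM. Edits made through set_attr are
    logged with their old values so they can be inverted later."""

    def __init__(self):
        self._undo, self._redo = [], []
        self._group = None  # edits collected between undo_begin and undo_commit

    def undo_begin(self):
        self._group = []

    def undo_commit(self):
        if self._group:
            self._push(self._group)
        self._group = None

    def _push(self, group):
        self._undo.append(group)
        self._redo.clear()  # a fresh edit invalidates the redo history

    @staticmethod
    def _apply(node, name, value):
        if value is None:
            node.removeAttribute(name)  # the attribute did not exist before
        else:
            node.setAttribute(name, value)

    def set_attr(self, node, name, value):
        old = node.getAttribute(name) if node.hasAttribute(name) else None
        self._apply(node, name, value)
        edit = (node, name, old, value)
        if self._group is not None:
            self._group.append(edit)
        else:
            self._push([edit])  # an edit outside a group is its own undo step

    def undo(self):
        group = self._undo.pop()
        for node, name, old, _new in reversed(group):
            self._apply(node, name, old)
        self._redo.append(group)

    def redo(self):
        group = self._redo.pop()
        for node, name, _old, new in group:
            self._apply(node, name, new)
        self._undo.append(group)


doc = minidom.parseString('<glyph name="A" width="650"/>')
glyph = doc.documentElement
u = UndoModule()
u.undo_begin()
u.set_attr(glyph, "width", "700")
u.set_attr(glyph, "name", "B")
u.undo_commit()
u.undo()  # both edits are reverted as one step
print(glyph.getAttribute("name"), glyph.getAttribute("width"))  # A 650
u.redo()
print(glyph.getAttribute("name"), glyph.getAttribute("width"))  # B 700
```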

Now for cool part #2. Containers can also be built the same way, and the pieces just compose. I'm thinking of a simple example of a "vbox" container. For this to work, the canvas items need to be extended with some kind of size negotiation methods. The vbox node-to-canvas function simply invokes the node-to-canvas functions on each of the children, goes through size negotiation, and positions each of the child canvas items in a canvas group.

The size negotiation should of course be done with "request" signals from child to parent and so on, so that if a particular element needs to change size, the parent gets notified, and moves stuff around appropriately. The container need only maintain child items for children that are visible in the canvas window, so that all these operations remain constant time even for huge scrolling vboxes (the canvas code may need some extensions for visibility. Federico?)
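
A rough Python sketch of the size negotiation and visibility ideas, with hypothetical Child and VBox classes standing in for child canvas items and the vbox container. A child's request_height plays the role of the child-to-parent "request" signal, and visible() picks out only the children overlapping the viewport, so real canvas items would be needed only for those. In this sketch a resize recomputes all offsets (linear in the number of children) while the visibility lookup is a binary search; keeping subtree heights in a balanced tree would be one way to make resizes cheap as well.

```python
import bisect
from itertools import accumulate


class Child:
    """Hypothetical sized element, standing in for a child canvas item."""

    def __init__(self, name, height):
        self.name, self.height, self.parent = name, height, None

    def request_height(self, height):
        # Child-to-parent "request": the parent re-runs size negotiation.
        self.height = height
        if self.parent is not None:
            self.parent.child_resized(self)


class VBox:
    """Hypothetical vbox: stacks children vertically."""

    def __init__(self, children):
        self.children = children
        for child in children:
            child.parent = self
        self.layout()

    def layout(self):
        # offsets[i] is the top of child i; offsets[-1] is the total height.
        self.offsets = [0] + list(accumulate(c.height for c in self.children))

    def child_resized(self, child):
        self.layout()

    def visible(self, top, bottom):
        """(name, y) for children overlapping [top, bottom); only these need
        real canvas items."""
        first = max(bisect.bisect_right(self.offsets, top) - 1, 0)
        last = bisect.bisect_left(self.offsets, bottom)
        return [(c.name, self.offsets[i])
                for i, c in enumerate(self.children[first:last], start=first)]


rows = [Child(f"row{i}", 20) for i in range(10_000)]
box = VBox(rows)
print(box.visible(1000, 1050))  # [('row50', 1000), ('row51', 1020), ('row52', 1040)]
rows[0].request_height(40)      # everything below row0 shifts down by 20
print(box.visible(1000, 1050))  # [('row49', 1000), ('row50', 1020), ('row51', 1040)]
```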

All the other forms of containment are equally possible, including tabbed notebooks, tables, layers, etc.

I hope you're now starting to see how this might decrease the amount of work needed to build an app.

Ok, now for the hard parts. I've mentioned the mapping of node to canvas item in a general way, but haven't said how it's done. Writing these functions isn't such a big deal, but in a large app, finding which function to use may not be trivial. Basically, you need a map that specifies such a function for each (interface, node type) pair, interface being "make gnome canvas item" in this case. Within an app, you just keep this as a table and register the various components in it. For example, you'd register "bezier_component_to_canvas_item()" at app startup. Then, you'd have a generic function to map DOM nodes to canvas items, and when it came across a bezier node, it would call that function. Similarly for vbox nodes, etc.
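
Within a single app, the table can be as simple as a dictionary keyed by (interface, tag name). This Python sketch over xml.dom.minidom assumes a hypothetical "canvas-item" interface name, and its converters return descriptive tuples instead of real canvas items. bezier_component_to_canvas_item takes its name from the text; register, node_to, and the vbox converter are illustrative.

```python
from xml.dom import minidom

_converters = {}  # hypothetical app-wide table: (interface, tag name) -> function


def register(interface, tag, func):
    _converters[(interface, tag)] = func


def node_to(interface, node):
    """Generic mapper: find the converter registered for this node's tag."""
    try:
        func = _converters[(interface, node.tagName)]
    except KeyError:
        raise LookupError(f"no {interface!r} converter for <{node.tagName}>") from None
    return func(node)


def element_children(node):
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]


# Stand-ins for real converters: they return tuples instead of canvas items.
def bezier_component_to_canvas_item(node):
    return ("polygon", len(element_children(node)))


def vbox_component_to_canvas_item(node):
    # Containers recurse through the same generic mapper, so pieces compose.
    return ("group", [node_to("canvas-item", c) for c in element_children(node)])


# At app startup:
register("canvas-item", "bezier", bezier_component_to_canvas_item)
register("canvas-item", "vbox", vbox_component_to_canvas_item)

doc = minidom.parseString(
    "<vbox><bezier><moveto/><lineto/><closepath/></bezier><bezier/></vbox>")
print(node_to("canvas-item", doc.documentElement))
# ('group', [('polygon', 3), ('polygon', 0)])
```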

All this is pretty straightforward within an app. The "type" of a node can simply be the tag name, and the map described above uses the appropriate tag names for the XML structure chosen by the application. But when we move to the plugin and interapp communication case, we instantly have a problem. "bezier" in my app means something different than Gimp's "bezier" node type. How do we find out what is really meant?

Fortunately, XML namespaces provide an approach to this problem. If a component is going to exist in interapp contexts, then we have to define a namespace for it. The XMLNS spec uses URIs to identify namespaces. Thus, we'd have "http://www.levien.com/gfonted" as the namespace for my bezier component, and "http://www.gimp.org/" as the namespace for the Gimp one. As far as I can tell, these URIs aren't actually resolved at any time, but that's likely to change. It would also seem likely that www.gnome.org would get into the namespace assignment business. I personally would say that these assignments are as important as MIME content types. MIME types are a sterling example of a namespace assignment done well, and there are plenty of cautionary tales of namespace assignments done really badly (ASN.1 object identifiers come to mind immediately).

Thus, the map now defines a function for each (interface, namespace, tag name) triple. It's not hard to imagine a global map that specifies .so files for plugins. Thus, if gnumeric comes across a bezier curve, and the node type resolves correctly in the table, it would dynamically load the appropriate .so file containing the converter to canvas item.
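
A Python sketch of the namespace-qualified map, using xml.dom.minidom, which reports each element's namespaceURI and localName when it parses. ConverterMap and the loader functions are hypothetical, and "canvas-item" is an assumed interface name. Each entry stores a loader that runs on first lookup, standing in for dynamically loading a plugin's .so file (in C, something along the lines of dlopen).

```python
from xml.dom import minidom

GFONTED_NS = "http://www.levien.com/gfonted"
GIMP_NS = "http://www.gimp.org/"


class ConverterMap:
    """Hypothetical system-wide map from (interface, namespace URI, local name)
    to a converter. Each entry holds a loader that runs on first use, standing
    in for dynamically loading the plugin's .so file."""

    def __init__(self):
        self._loaders = {}
        self._loaded = {}

    def register(self, interface, ns, local_name, loader):
        self._loaders[(interface, ns, local_name)] = loader

    def lookup(self, interface, node):
        key = (interface, node.namespaceURI, node.localName)
        if key not in self._loaded:
            self._loaded[key] = self._loaders[key]()  # KeyError if unregistered
        return self._loaded[key]


loads = []  # records which (pretend) plugins were loaded


def load_gfonted_bezier():
    loads.append("gfonted")
    return lambda node: "gfonted outline"


def load_gimp_bezier():
    loads.append("gimp")
    return lambda node: "gimp path"


cmap = ConverterMap()
cmap.register("canvas-item", GFONTED_NS, "bezier", load_gfonted_bezier)
cmap.register("canvas-item", GIMP_NS, "bezier", load_gimp_bezier)

# Two different "bezier" elements, told apart only by namespace.
doc = minidom.parseString(
    '<sheet xmlns:f="http://www.levien.com/gfonted" xmlns:g="http://www.gimp.org/">'
    '<f:bezier/><g:bezier/><f:bezier/></sheet>')
for node in doc.documentElement.childNodes:
    print(node.tagName, "->", cmap.lookup("canvas-item", node)(node))
print(loads)  # ['gfonted', 'gimp']: each loader ran once
```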

Managing this system-wide map is quite nontrivial (you want all its entries to goddamn work, plus you don't want to be missing any important ones), but it seems to me like this is a problem Gnome will have to solve anyway if it is ever going to fly.

I've talked about the canvas interface, but it's quite clear that there are a number of others worth considering. For one, you can convert to a Gtk+ widget, or perhaps to a CTree. The latter example is particularly interesting because it gives you a generic XML editor. I think this is especially useful for lazy application developers who don't want to write display and edit engines for every single node type in the universe. In other words, it's the Y2K equivalent of using vi to edit application files directly :)
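
The display half of such a generic editor needs no per-node-type code at all. This Python sketch over xml.dom.minidom flattens any subtree into (depth, label) rows of the kind a CTree-style widget would show; tree_rows is a hypothetical helper, and a real editor would also map each row back to its node for editing.

```python
from xml.dom import minidom


def tree_rows(node, depth=0):
    """Flatten any DOM subtree into (depth, label) rows. It never looks at
    what a tag means, so it works for node types nobody wrote a viewer for."""
    rows = []
    if node.nodeType == node.ELEMENT_NODE:
        attrs = " ".join(f'{name}="{value}"' for name, value in node.attributes.items())
        rows.append((depth, f"<{node.tagName}{' ' + attrs if attrs else ''}>"))
        for child in node.childNodes:
            rows.extend(tree_rows(child, depth + 1))
    elif node.nodeType == node.TEXT_NODE and node.data.strip():
        rows.append((depth, repr(node.data.strip())))  # skip whitespace-only text
    return rows


doc = minidom.parseString('<font><glyph name="A" width="650"><bezier>'
                          '<moveto x="0" y="0"/><closepath/></bezier></glyph></font>')
rows = tree_rows(doc.documentElement)
for depth, label in rows:
    print("  " * depth + label)
```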

Plug-ins can be written in one of two ways. The most obvious way is to export the document structure to the plugin using the CORBA DOM interface. This has the neat property of allowing interactive modifications by the plugin.

Even simpler plugin architectures are possible. For example, you could just have XML on stdin and stdout. If you ran such a plugin on a document node, the host would write that node out as XML to the plugin's stdin, then parse the XML the plugin returns on stdout and stick it back into the DOM. This opens the door to extremely lightweight scripts (in terms of lines of code).
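
A sketch of the stdin/stdout plugin protocol in Python, using subprocess and xml.dom.minidom. The plugin is a hypothetical few-line script that doubles a glyph's width, and run_filter_plugin is likewise an illustrative name. The host serializes the node, reads the plugin's reply, imports it into its own document, and swaps it in for the original.

```python
import subprocess
import sys
from xml.dom import minidom

# Hypothetical plugin: one XML element in on stdin, the edited element out on stdout.
PLUGIN = '''
import sys
from xml.dom import minidom
el = minidom.parseString(sys.stdin.read()).documentElement
el.setAttribute("width", str(int(el.getAttribute("width")) * 2))
sys.stdout.write(el.toxml())
'''


def run_filter_plugin(argv, node):
    """Send `node` to the plugin as XML, parse its reply, splice it into the DOM."""
    result = subprocess.run(argv, input=node.toxml(), capture_output=True,
                            text=True, check=True)
    reply = minidom.parseString(result.stdout).documentElement
    new_node = node.ownerDocument.importNode(reply, True)  # deep copy into our doc
    node.parentNode.replaceChild(new_node, node)
    return new_node


doc = minidom.parseString('<font><glyph name="A" width="650"/></font>')
glyph = doc.getElementsByTagName("glyph")[0]
new_glyph = run_filter_plugin([sys.executable, "-c", PLUGIN], glyph)
print(doc.documentElement.toxml())
```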

Ok, so what's needed? For one, the Gnome DOM needs quite a bit of work. Daniel Veillard has gotten a good start on gnome-dom, but I see a few initial problems. First, there is no mechanism for gtk+-style signal handlers. This is the main thing I need right now (I'm happy to build non-CORBA-fied apps that hook the DOM for the time being). Also, I'm concerned that the tree representation Daniel has chosen is profligate enough in memory usage that it could discourage app writers (especially those who have invested time and energy in nice compact data structures). I'd like to define a DOM interface for people to use that does not allow traversing the node structure directly, but uses accessor methods. This would allow a simple implementation such as Daniel's, but would also allow much more compact representations. With sufficiently clever code, tree layout in memory could be so good that it would be a compelling reason for apps to use the DOM as opposed to hand-rolling their own structures. You could even gzip pieces of the tree that haven't been touched in a while.
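
A rough Python sketch of the accessor-only idea. FlatTree and walk are hypothetical: nodes are plain integer handles into parallel int arrays with interned tag names, so there are no per-node objects at all, yet walk is written purely against the tag/first_child/next_sibling accessors and would run unchanged over a conventional pointer-based tree offering the same methods.

```python
from array import array

NONE = -1  # "no node" handle


class FlatTree:
    """Hypothetical compact backing store. Nodes are integer handles into
    parallel int arrays, tag names are interned, and attributes live in a
    sparse dict. Callers see only the accessor methods."""

    def __init__(self):
        self._tag = array("i")
        self._first_child = array("i")
        self._last_child = array("i")  # only used to make appends O(1)
        self._next_sibling = array("i")
        self._names, self._name_ids = [], {}
        self._attrs = {}

    def new_node(self, tag, parent=NONE, attrs=None):
        node = len(self._tag)
        if tag not in self._name_ids:
            self._name_ids[tag] = len(self._names)
            self._names.append(tag)
        self._tag.append(self._name_ids[tag])
        for column in (self._first_child, self._last_child, self._next_sibling):
            column.append(NONE)
        if attrs:
            self._attrs[node] = dict(attrs)
        if parent != NONE:
            last = self._last_child[parent]
            if last == NONE:
                self._first_child[parent] = node
            else:
                self._next_sibling[last] = node
            self._last_child[parent] = node
        return node

    # The accessor interface; a pointer-based tree could implement it too.
    def tag(self, node):
        return self._names[self._tag[node]]

    def first_child(self, node):
        return self._first_child[node]

    def next_sibling(self, node):
        return self._next_sibling[node]

    def get_attr(self, node, name):
        return self._attrs.get(node, {}).get(name)


def walk(tree, node, depth=0):
    """Generic depth-first traversal that uses only the accessors."""
    yield depth, tree.tag(node)
    child = tree.first_child(node)
    while child != NONE:
        yield from walk(tree, child, depth + 1)
        child = tree.next_sibling(child)


t = FlatTree()
font = t.new_node("font")
glyph = t.new_node("glyph", font, {"name": "A", "width": "650"})
bezier = t.new_node("bezier", glyph)
for op in ("moveto", "lineto", "lineto", "closepath"):
    t.new_node(op, bezier)
print(list(walk(t, font)))
```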

This is all pretty roughly formed in my mind right now. There are enough layers of abstraction to be really concerned about both performance and impenetrability. There is enough complexity in the alphabet soup of TLAs to frighten away even experienced programmers. But if it flies, I think it could have really profound implications for Gnome.

Raph

links for the hotbot-impaired:

DOM: http://www.w3.org/DOM
XML: http://www.w3.org/XML
XMLNS: http://www.w3.org/TR/REC-xml-names/
RDF: http://www.mozilla.org/rdf/doc/

random notes follow:

I'm concentrating on rendering to canvas as the main interface. This is partly because I like the canvas, but also because I had a chance to test-drive Ian Main's edox. This is still in pretty early form, but he uses gnome-canvas as the rendering back-end, and it has a very solid feel. A lot of the "alpha" word processors I've seen have jerky cursor movement and flickery redisplay, but I think edox proves that the canvas is a viable rendering platform for this type of app.

I also turned on the aa (antialiasing) flag just for fun, and found that performance degraded noticeably, but I'd still consider it within the usable range. There is quite a bit of optimization left to be done on the aa canvas renderer, and of course chips are constantly getting faster :)

One other interface I'm considering is format conversion. For example, SVG (Scalable Vector Graphics) is a DOM-based format that looks like it's going to be powerful enough to display just about anything you'd want to use a canvas for. I'm envisioning a plotting component that just takes simple lists of numbers as input and produces SVG as output. This opens the possibility of applications loading only an SVG renderer into their address space, while including the plotting component through the DOM.

Further, leveraging the power of SVG ought to make writing such a plotting component easy. It's not hard to imagine writing a simple plot component in a few dozen lines of Perl.
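
Here is the plotting component idea sketched in Python rather than Perl, building the output with xml.dom.minidom. plot_to_svg and its width/height defaults are hypothetical; the namespace URI is the standard SVG one.

```python
from xml.dom import minidom

SVG_NS = "http://www.w3.org/2000/svg"


def plot_to_svg(values, width=200, height=100):
    """Hypothetical plotting component: a list of numbers in, an SVG DOM out."""
    doc = minidom.getDOMImplementation().createDocument(SVG_NS, "svg", None)
    svg = doc.documentElement
    svg.setAttribute("xmlns", SVG_NS)  # minidom does not write the declaration itself
    svg.setAttribute("width", str(width))
    svg.setAttribute("height", str(height))
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero on flat data
    step = width / max(len(values) - 1, 1)
    # SVG's y axis points down, so larger values map to smaller y.
    points = " ".join(f"{i * step:g},{height - (v - lo) / span * height:g}"
                      for i, v in enumerate(values))
    line = doc.createElementNS(SVG_NS, "polyline")
    line.setAttribute("points", points)
    line.setAttribute("fill", "none")
    line.setAttribute("stroke", "black")
    svg.appendChild(line)
    return doc


print(plot_to_svg([3, 1, 4, 1, 5, 9, 2, 6]).toxml())
```

Because the result is itself a DOM, the host can either serialize it or graft it into a larger document.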

I'll want to attach gtk+-style signals in two ways: notify me when this node changes, and notify me when this node or any of its children change.
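
The two flavors of signal can be sketched in Python over xml.dom.minidom, which has no change events of its own, so edits are assumed to go through a hypothetical NodeSignals object. The signal names "changed" and "subtree-changed" are illustrative; the second fires on the edited node and on every ancestor, with the edited node passed along as origin.

```python
from xml.dom import minidom


class NodeSignals:
    """Hypothetical gtk+-style signals over minidom (which has no change events).
    "changed" fires for the edited node only; "subtree-changed" fires for the
    edited node and every ancestor, with the edited node passed as `origin`."""

    SIGNALS = ("changed", "subtree-changed")

    def __init__(self):
        self._handlers = {}  # (id(node), signal) -> list of callbacks

    def connect(self, node, signal, callback):
        if signal not in self.SIGNALS:
            raise ValueError(f"unknown signal {signal!r}")
        self._handlers.setdefault((id(node), signal), []).append(callback)

    def _emit(self, node, signal, origin):
        for callback in self._handlers.get((id(node), signal), []):
            callback(node, origin)

    def set_attr(self, node, name, value):
        node.setAttribute(name, value)
        self._emit(node, "changed", node)
        ancestor = node
        while ancestor is not None:
            self._emit(ancestor, "subtree-changed", node)
            ancestor = ancestor.parentNode


doc = minidom.parseString('<glyph><bezier><lineto x="700" y="300"/></bezier></glyph>')
glyph = doc.documentElement
lineto = doc.getElementsByTagName("lineto")[0]
log = []
signals = NodeSignals()
signals.connect(glyph, "changed", lambda node, origin: log.append("glyph changed"))
signals.connect(glyph, "subtree-changed",
                lambda node, origin: log.append("glyph subtree: " + origin.tagName))
signals.connect(lineto, "changed", lambda node, origin: log.append("lineto changed"))
signals.set_attr(lineto, "x", "650")
print(log)  # ['lineto changed', 'glyph subtree: lineto']
```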

I'm talking to Elliot and Chris about how to actually implement the DOM on top of a flattened representation. The best idea we've come up with so far is to store a unique node ID for each node in the flattened representation (i.e., a 4-byte-per-node overhead), then keep a small cache mapping node IDs to physical locations in the flat representation. In case of a miss, you can search the flat representation for the node's ID. The ORBit object ID for objects of type Node would include the node ID. DOM traversal methods would always insert the resulting node ID into the cache. Thus, remote tree traversal starting from the root would always hit in the cache.
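
A Python sketch of the node ID cache, with every representational choice assumed: the flat form is a list of ("open", id, tag) and ("close",) tokens, the cache is a small LRU keyed by node ID, and a miss falls back to a linear scan. FlatDoc and walk are hypothetical names. first_child and next_sibling put the node they return into the cache, so a depth-first walk from the root (whose location is seeded at construction) hits every time here; with a cache smaller than a subtree, returning to an ancestor could still miss and fall back to the scan.

```python
from collections import OrderedDict


class FlatDoc:
    """Hypothetical flattened tree: a token list with a node ID stored in each
    'open' token, plus a small LRU cache from node ID to token offset."""

    def __init__(self, tokens, cache_size=4):
        self.tokens = tokens        # ("open", node_id, tag) or ("close",)
        self.cache = OrderedDict()  # node_id -> offset of its "open" token
        self.cache_size = cache_size
        self.hits = self.misses = 0
        self._remember(tokens[0][1], 0)  # seed the root's location

    def _remember(self, node_id, offset):
        self.cache[node_id] = offset
        self.cache.move_to_end(node_id)
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict the least recently used entry

    def locate(self, node_id):
        if node_id in self.cache:
            self.hits += 1
            self.cache.move_to_end(node_id)
            return self.cache[node_id]
        self.misses += 1
        for offset, tok in enumerate(self.tokens):  # miss: scan the flat form
            if tok[0] == "open" and tok[1] == node_id:
                self._remember(node_id, offset)
                return offset
        raise KeyError(node_id)

    def _end(self, offset):
        """Offset just past the 'close' matching the 'open' at `offset`."""
        depth = 0
        for i in range(offset, len(self.tokens)):
            depth += 1 if self.tokens[i][0] == "open" else -1
            if depth == 0:
                return i + 1
        raise ValueError("unbalanced tokens")

    def _node_at(self, offset):
        if offset >= len(self.tokens) or self.tokens[offset][0] != "open":
            return None
        node_id = self.tokens[offset][1]
        self._remember(node_id, offset)  # traversal results always enter the cache
        return node_id

    def first_child(self, node_id):
        return self._node_at(self.locate(node_id) + 1)

    def next_sibling(self, node_id):
        return self._node_at(self._end(self.locate(node_id)))

    def tag(self, node_id):
        return self.tokens[self.locate(node_id)][2]


def walk(doc, node_id):
    """Depth-first traversal from a node, using only ID-based calls."""
    tags = [doc.tag(node_id)]
    child = doc.first_child(node_id)
    while child is not None:
        tags += walk(doc, child)
        child = doc.next_sibling(child)
    return tags


# <font><glyph><bezier/></glyph><glyph/></font>, node IDs 1-4
tokens = [("open", 1, "font"), ("open", 2, "glyph"), ("open", 3, "bezier"),
          ("close",), ("close",), ("open", 4, "glyph"), ("close",), ("close",)]
doc = FlatDoc(tokens)
tags = walk(doc, 1)
print(tags, doc.hits, doc.misses)
```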

It's not hard to see that this would be a workable framework for configuration files. Support for "advanced" features such as "hot config" would be quite straightforward.
