The configuration module of Rebar

Raph Levien <raph@acm.org>
31 Mar 2000

This document describes the configuration and dependency analysis modules of Rebar, a proposal for the SC Build and SC Config sections of the Software Carpentry design competition. These modules fall under the SC Config category. A separate submission covers the Build module. That document should be read after this one.

Rebar is a radically simple proposal, motivated in large part by the needless complexity of the existing toolset. One can expect many advantages of this simplicity. Further, if the design is found to err on the side of too simple, it should be easy to add needed features later.

A great deal of the needless complexity of the existing make/autoconf toolset comes from improper factorization of the problem. The Rebar design addresses this by proposing a single tool to take on the functions traditionally performed by both make and autoconf, as well as automake, libtool, and gnome-config. The Rebar design is quite simple in spite of this ambitious scope.

The operation of Rebar on a project consists of three phases:

  1. Resolving system-dependent configuration parameters.
  2. Analyzing the dependencies of the project.
  3. Resolving these dependencies.

This document covers the first two phases, as these are the functions corresponding most closely to the role of autoconf and friends in the traditional toolset.

Interface

The interface at the heart of Rebar is the project file. This file contains a description of all targets in the project in a declarative format. The syntax is XML-based and is designed to be very simple. A specific goal is that many tools besides Rebar can process the format easily, including visualizing it in a graphical user interface, analyzing it, etc. Hence the first radical simplification: the project file contains no code.

Namespaces

The configuration module of Rebar depends on resolving names in three namespaces. First, system-dependent configuration parameters (such as byte order, word size, and so on) get resolved into actual values. Second, the location of libraries is resolved. Third, the actual commands needed to compile targets are resolved from the declarative descriptions in the project file.

Some of these namespaces will require coordination to avoid collisions. To this end, the namespace of libraries is hierarchical, so that for example Gnome/libxml can exist without conflict with any other library named libxml. Obviously, each level of the hierarchy must be managed to remain collision-free.

Namespace lookup vs. checking

The Imake mechanism for configuring X is based on looking up configuration variables in a system-specific configuration file (for example, /usr/X11R6/lib/X11/config/linux.cf on my system). By contrast, autoconf eschews a system-dependent config file and instead determines the values of the configuration parameters by checking at ./configure-time.

Both approaches have their merits, and Rebar will support both. In particular, lookup enables cross-building (an extension of cross-compilation) if the config file for the target system is available. Conversely, performing the checking detects problems when the config file is out of sync with the actual state of the system, a common problem noted with Imake.
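One way the two approaches might be combined is sketched below in Python: look the parameter up first, fall back to a live check, and optionally verify a looked-up value against the check. This is only an illustration. The function resolve, the CHECKS registry, and the dictionary standing in for a config file are assumptions, not part of the design.

```python
import sys

def check_words_bigendian():
    """Live check: ask the running system for its byte order."""
    return sys.byteorder == "big"

# Hypothetical registry mapping parameter names to check functions.
CHECKS = {"REBAR_WORDS_BIGENDIAN": check_words_bigendian}

def resolve(name, config_file_values, verify=False):
    """Resolve one parameter by lookup, falling back to a check.

    With verify=True, a looked-up value is also compared against the
    live check, to catch a config file out of sync with the system.
    """
    if name in config_file_values:
        value = config_file_values[name]
        if verify and name in CHECKS and CHECKS[name]() != value:
            raise ValueError(
                f"config file value for {name} disagrees with check")
        return value
    if name in CHECKS:
        return CHECKS[name]()
    raise KeyError(f"Configuration option {name} unknown.")
```

When cross-building, verify would be left off, since a check run on the build machine says nothing about the target system.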

Version management

A major problem faced by modern software projects is version management, especially across modules such as libraries. It is quite common for different applications on a system to require different versions of the same library. A number of mechanisms are in place to try to deal with this issue, but they are all quite error-prone, and many are dependent on the specifics of version numbering specified by the dynamic loading library mechanism of the underlying operating system (such as the soname mechanism inherited by Linux from Sun).

Rebar proposes an extremely simple, yet hopefully effective and reliable technique for managing these multiple versions. Basically, each version of a module declares a version range of previous versions with which it is compatible. Separate version ranges for binary and source compatibility may be declared if desired.

Applications using a library (or any other intermodule dependency) also declare a range of versions they're willing to accept. In the most common case, this is simply the version of the library installed at the time the application is written. For binary compatibility, it is always the version of the library that the application is compiled against.

Compatibility is simply defined as the two ranges overlapping.

A range can be one of: a list of ranges (separated by commas in the ASCII syntax), a single version number (an arbitrary string not containing a space or comma), or a range from earliest to latest (separated by whitespace and a hyphen in the ASCII syntax). If the latest version is omitted in a library, it defaults to the version number of the library. Testing inclusion in a range uses lexicographical order with each sequence of decimal digits replaced by the corresponding integer, so that for example 2.2.12 sorts after 2.2.5.

Examples of version ranges:

range                                 matches
2.2.5 - 2.3.99pre12                   2.2.12, 2.3.98, 2.3.99pre8-ac3
1.2.13 - 1.9.99pre12, 2.0.0 -         1.2.100, 1.3.0, 2.0.4beta
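The range rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a specification. The names version_key, parse_range, matches, and overlaps are invented here. Treating an omitted latest version as "no upper bound" outside the library case is inferred from the second example row. How a digit run compares against non-digit text is not defined by the design, so the sketch simply compares text pieces and integer pieces position by position.

```python
import re

def version_key(version):
    """Split a version into alternating text and integer pieces."""
    # re.split with a capturing group keeps the digit runs: even
    # indices hold text (possibly empty), odd indices hold digits.
    parts = re.split(r"(\d+)", version)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

def parse_range(text, own_version=None):
    """Parse the ASCII syntax into a list of (earliest, latest) pairs.

    An omitted latest becomes own_version (the library case), or None
    for "no upper bound" when own_version is not given (assumption).
    """
    pairs = []
    for item in text.split(","):
        tokens = item.split()
        if len(tokens) == 1:                      # single version number
            pairs.append((tokens[0], tokens[0]))
        elif len(tokens) in (2, 3) and tokens[1] == "-":
            latest = tokens[2] if len(tokens) == 3 else own_version
            pairs.append((tokens[0], latest))
        else:
            raise ValueError(f"bad version range: {item!r}")
    return pairs

def matches(version, pairs):
    """True if version falls inside any (earliest, latest) pair."""
    key = version_key(version)
    return any(
        key >= version_key(lo) and (hi is None or key <= version_key(hi))
        for lo, hi in pairs)

def overlaps(pairs_a, pairs_b):
    """Compatibility test: do any two sub-ranges intersect?"""
    for lo_a, hi_a in pairs_a:
        for lo_b, hi_b in pairs_b:
            a_first = hi_a is not None and version_key(hi_a) < version_key(lo_b)
            b_first = hi_b is not None and version_key(hi_b) < version_key(lo_a)
            if not (a_first or b_first):
                return True
    return False
```

With these definitions, matches("2.3.99pre8-ac3", parse_range("2.2.5 - 2.3.99pre12")) holds because the integer 8 sorts before 12, where a plain string comparison would order them the other way.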

To mitigate the expected common mistake of forgetting to update the version range in a library when releasing a new version, the declaration of compatible version ranges in the library's project file shall be of the form "[current version number] is compatible with version range [range]". If the current version number does not match the actual version number, the range defaults to simply the actual version number.

A particularly nice feature of this mechanism is that compatibility need not shrink from one release to the next: a later version B can declare compatibility with more previous versions than an earlier version A.

For some protection against libraries declaring more compatibility than is actually the case, Rebar provides for writing checks for the existence (or even functioning) of specific library functions. Note that these checks are very commonly used in autoconf scripts to detect version numbers.

Dependency analysis

The sources listed for each target comprise only some of the dependencies. Header files included by reference in the sources also count as dependencies.

There are two ways for Rebar to handle these additional dependencies. First, it can be conservative and simply assume that all sources depend on all header files. This approach is quite simple, and the only negative consequence is the rebuilding of more targets than necessary when a header file changes.

Second, Rebar can actually perform a dependency analysis quite similar to the one in the "make depend" target of autoconf-generated makefiles.

The implementation details and granularity of the dependency analysis are entirely up to the implementation. They are not specified as part of the design. Thus, an implementation is free to do fine-grained dependency analysis, so that for example making minor changes to a library will cause minimal recompiles of .o files in applications which use the library.

In addition, the dependency analysis is likely to depend on features specific to a given compiler. The "make depend" mechanism depends on gcc's -M option. SGI's compiler also has this feature, but with a different syntax. Thus, if the specific Rebar installation is aware of compiler support for this feature, it can use it, but if not, it should still work.
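For the case where no compiler support is known, a fallback scanner could look roughly like the Python sketch below. It is an assumption-laden illustration: the name header_deps and the read_file callback are invented here, only quoted #include lines are followed, and conditional compilation is ignored, so the result may list more headers than the compiler would actually read.

```python
import re

# Matches lines such as:  #include "foo.h"   (angle-bracket includes
# are treated as system headers and skipped).
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)

def header_deps(source, read_file):
    """Return the project headers that source depends on, transitively.

    read_file maps a file name to its text, or to None when the file
    is not part of the project (an illustrative interface).
    """
    seen = set()
    pending = [source]
    while pending:
        text = read_file(pending.pop())
        if text is None:
            continue
        for header in INCLUDE_RE.findall(text):
            # Skip headers already visited and headers outside the project.
            if header not in seen and read_file(header) is not None:
                seen.add(header)
                pending.append(header)
    return seen
```

Passing the file contents through a callback keeps the sketch independent of the file system; a real implementation would also have to resolve include search paths.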

Library database

One important feature of Rebar over autoconf is explicit support for a database of libraries. Modern software is usually built using many libraries, which are often developed in parallel with the application itself. By contrast, autoconf generally assumes that libraries are installed in a single directory for system libraries.

An interesting approach to this problem is the gnome-config mechanism (and its progenitors gtk-config and gimp-config). This is a small utility program that returns header directory info (gnome-config --cflags) and library location (gnome-config --libs).

This general approach is interesting, and does in fact support multiple versions of the same library, distinguished at build time by different $PATH directories to select different versions of the gnome-config script. However, it does have its share of limitations, including not directly supporting optimized, debugging and profiling builds of the same library.

In keeping with the philosophy of Rebar, the project file specifies the desired library (and version range), but does not dictate how to find it. Thus, it's up to the Rebar implementation to find the appropriate library "by any means necessary." In practice, the most useful technique is a database mapping library name, version, and build options to the location of header files and the library .a or .so file itself, and optionally sources. The sources are useful for debugging (to display the context of a breakpoint within the library, for example), and also to create custom builds of the library, for example for profiling.

Rebar should be able to merge the results of both system-wide and user-specific databases for libraries. The system-wide database can be provided by the Linux distribution and include all libraries shipping with that distribution. The location of the user-specific database can default to ~/.rebar/libs but should also be settable through a command-line option and/or an environment variable, to facilitate "sandbox" development using unstable libs. The "rebar install" command should update the information in the library database appropriately.
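A minimal Python sketch of such a merged lookup follows. Everything in it is hypothetical: the function names, the key layout (library name, version, build option), the record fields, and all paths. It also assumes that a user-specific entry takes precedence over a system-wide one, which the design does not state, and it looks up an exact version rather than a range.

```python
def merge_databases(system_db, user_db):
    """Merge two library databases; user entries win (an assumption)."""
    merged = dict(system_db)
    merged.update(user_db)
    return merged

def find_library(db, name, version, build="optimized"):
    """Look up one library build, reporting a miss in Rebar's style."""
    try:
        return db[(name, version, build)]
    except KeyError:
        raise LookupError(
            f"Library {name} was not found, version {version} desired."
        ) from None

# Hypothetical database contents; every path is made up.
SYSTEM_DB = {
    ("Gnome/libxml", "1.8.6", "optimized"): {
        "include": "/usr/include/gnome-xml",
        "lib": "/usr/lib/libxml.so"},
}
USER_DB = {
    ("Gnome/libxml", "1.8.6", "debug"): {
        "include": "/home/me/sandbox/include",
        "lib": "/home/me/sandbox/lib/libxml.a",
        "src": "/home/me/sandbox/src/libxml"},
}
```

Keying on the build option is one way to let optimized, debugging, and profiling builds of the same library version coexist.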

If these databases fail to resolve the library, or resolve it incorrectly, it should be possible to override them manually. The file format should be simple XML for easy hand-editing and analysis when (not if) things go wrong.

Error reporting

One of the gravest shortcomings of the make/autoconf suite is the error reporting. Error messages tend to be cryptic in the extreme, containing lots of unfamiliar syntax expanded from the guts of the m4 macros comprising autoconf, and with virtually no help on how to actually fix the problem.

The monolithic architecture of Rebar is well suited to clear, concise error reporting. A simple principle underlies the error reporting philosophy of Rebar: catch errors as early as possible. Note that this principle is in conflict with including code in the project file; generally, an error in code cannot be detected until it is actually run.

Typical error messages should resemble the following:

Library Gnome/gdk-pixbuf was not found, version 1.3.11 desired.

Library Gnome/libxml version 2.0.3 (or compatible) required, only
version 1.6.8 found.

Configuration option REBAR_DEV_THEREMIN_PRESENT unknown.

Library Apache/Xerces 1.0.11 present in database, but check for
Xerces::createDocumentNS failed. Check was written for version 1.0.5
or compatible.

No compiler for source type "eiffel" found.

Examples

Here's a simple hello example:

<rebar>
  <target type="bin" name="hello" version="0.0.1" install="bin">
    <source type="c">
      hello.c
    </source>
  </target>
</rebar>

By default, this is stored in the file project.rebar in the toplevel directory of the project. At this point, "rebar" drops a "hello" executable in the toplevel directory. "rebar install" installs this executable in /usr/local/bin, or whichever directory has been mapped to "bin" by the user. "rebar --profile" builds a version of the "hello" executable with profiling.

A simple example of a Gtk+ app needing a byte ordering test:

<rebar>
  <target type="bin" name="yairc" version="0.0.1" install="bin">
    <autogen name="config.h">
      <define name="REBAR_WORDS_BIGENDIAN"/>
    </autogen>

    <source type="c">
      main.c
      irc.c
      properties.c
    </source>

    <lib name="gtk+" version="1.2.0 - 1.2.7"/>
    <lib name="Gnome/libxml" version="1.8.6"/>
  </target>
</rebar>

A simple library:

<rebar>
  <target type="lib" name="libfoo" version="0.0.1" install="lib">
    <source type="c">
      foo.c
      foo_guts.c
    </source>

    <source type="h" install="include">
      foo.h
      foo_guts.h
    </source>

    <lib name="zlib" version="2.2.6"/>
  </target>
</rebar>

Hierarchical projects

A good way to structure Rebar projects spanning multiple directories is to have a toplevel project.rebar file that has multiple <include> stanzas for subsidiary directories. File references for included .rebar files are relative to the included file (as opposed to a purely textual inclusion). Thus, it should be quite easy to move subdirectories around and still have things work.

This is more or less the same approach recommended in "Recursive Make Considered Harmful", but with explicit syntactic support.
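The rule that file references are relative to the included file can be illustrated with a short Python sketch. The function name resolve_reference is an assumption, as is the use of "/" separated paths throughout.

```python
import posixpath

def resolve_reference(rebar_file, reference):
    """Resolve a file named inside rebar_file relative to its directory."""
    base = posixpath.dirname(rebar_file)
    return posixpath.normpath(posixpath.join(base, reference))
```

Under this rule, foo.c named in libfoo/project.rebar resolves to libfoo/foo.c no matter which toplevel file included it, which is why a subdirectory can be moved without editing its own .rebar file.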

XML syntax

This section outlines the XML syntax. Quite a bit of detail remains to be filled in beyond this preliminary design sketch.

The toplevel project file consists of a <rebar> element. Its children are stanzas, roughly corresponding to make's stanzas.

The most important stanza is the <target> element, which describes a target to be built. Its options include the type ("lib" for libraries, "bin" for binaries, "data" for data files, etc.), the name of the target, its version, and an install path (generally relative to /usr/local or whichever prefix is desired). The children of the target element are the inputs used to build it.

The <include> element is also a valid stanza, and includes a subsidiary .rebar file, as described above. As a way to make the build process overly complex for those desiring it, this file may itself be autogenerated.

Inputs to a target can be: sources, libraries or autogen files.

The <source> element is the main mechanism for building. It specifies a type of file (used to select a compiler for building). It contains a whitespace-separated list of file names.

The <lib> element specifies a reference to a library. The name and accepted version range are parameters. Tests for specific functions are

Finally, the <autogen> element specifies an autogenned file. The contents of the file are given inside. Sources of content include the results of tests, running executable commands (usually built from other sources), etc. It is worth noting that autogeneration of source files makes up for most, if not all, of the loss of expressive power resulting from the decision not to include code in the project file itself.
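Because the project file is plain XML with no code, a tool other than Rebar can read it with a stock XML parser. The Python sketch below uses the standard xml.etree.ElementTree module to summarize targets, sources, and libraries. The function list_targets and the shape of its result are assumptions made for illustration; the element and attribute names are those used in the examples earlier in this document.

```python
import xml.etree.ElementTree as ET

PROJECT = """\
<rebar>
  <target type="bin" name="yairc" version="0.0.1" install="bin">
    <source type="c">
      main.c
      irc.c
    </source>
    <lib name="gtk+" version="1.2.0 - 1.2.7"/>
  </target>
</rebar>
"""

def list_targets(xml_text):
    """Return one summary dict per <target> in a project file."""
    targets = []
    for target in ET.fromstring(xml_text).findall("target"):
        sources = []
        for source in target.findall("source"):
            # The element text is a whitespace-separated list of names.
            for name in (source.text or "").split():
                sources.append((source.get("type"), name))
        libs = [(lib.get("name"), lib.get("version"))
                for lib in target.findall("lib")]
        targets.append({"name": target.get("name"),
                        "type": target.get("type"),
                        "sources": sources,
                        "libs": libs})
    return targets
```

A visualizer or analyzer could start from exactly this kind of summary without knowing anything about how Rebar resolves compilers or libraries.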

Writing checks

The process of writing autoconf-style checks is at its root one of writing a program to generate the desired output. Here is a simple such program for checking the size of integers:

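#include <stdio.h>

/* Print the size of int as a preprocessor definition. */
int main (void) {
   /* sizeof yields a size_t, so cast it to match the %d conversion. */
   printf ("#define REBAR_SIZEOF_INT %d\n", (int) sizeof (int));
   return 0;
}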

The remaining problem is to associate these checks with the appropriate name. As with other configuration parameters, Rebar itself maintains a system-wide database, while users and individual projects can override them as needed.
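One simple way to associate a check with a name is to let the output of the check program carry the name itself, as the program above does. The Python sketch below collects such lines into a table. The function parse_check_output is hypothetical, and how Rebar would actually compile and run the check is left out.

```python
def parse_check_output(output):
    """Collect `#define NAME value` lines printed by a check program."""
    values = {}
    for line in output.splitlines():
        fields = line.split(None, 2)
        if len(fields) >= 2 and fields[0] == "#define":
            # A define without a value is recorded as an empty string.
            values[fields[1]] = fields[2].strip() if len(fields) == 3 else ""
    return values
```

Values are kept as strings so that they can be written back into an autogenerated header unchanged.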

Conclusion

The Rebar design explores the radical edge of simplicity in the design of a build system. It avoids complexity due to improper factorization of the problem, and is able to straightforwardly address issues that are considered difficult in the traditional make/autoconf toolset, including generation of reasonable error messages, finding libraries, and tracking multiple versions of a module. Further, the simple XML syntax of the project file allows other tools to generate, analyze, and visualize the project files easily.

Thus, I believe that the Rebar design, when fully fleshed out and implemented, will be a very useful tool for building software. Its simplicity, in particular, will pay many dividends.


This document is released under the Software Carpentry version of the Open Publication License.