1999 Linux Symposium




InterMezzo: A distributed file system prototype for Linux

Peter J. Braam, Stelias Computing, Inc.
Michael J. Callahan, Ask Jeeves, Inc.
Philip Schwan, Stelias Computing, Inc.


Advances in clustering, failover systems, and distributed data storage are among the important developments in storage software technology of the last several years. Although falling hardware prices have made large storage subsystems available to a much wider audience, the cost of software with these advanced features has remained prohibitive. Commercial filesystems such as AFS and those sold by Veritas remain too costly for individual users, and free projects like Coda, though impressive, are not ready for production use. The goal of the InterMezzo project is to provide the features and techniques developed by AFS and Coda, but to do so symmetrically, asynchronously, and an order of magnitude more simply.

The core of InterMezzo is a Linux kernel module named Presto. Presto acts as a filter driver for application VFS requests; it pre- and post-processes requests, fulfilling cache misses and maintaining a log of updates that are lazily shipped to the server. Presto also receives VFS requests for operations made by the user-level cache manager; these requests are passed directly to the underlying filesystem. In essence, InterMezzo is layered on top of a normal disk filesystem (such as ext2), and therefore enjoys its robustness, performance, and scalability. Furthermore, unlike the Arla and Coda projects, we avoid the burden of writing and maintaining our own disk filesystem. By keeping the kernel module simple and small, we minimize both overhead and the possibility of bugs.
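The filtering idea can be illustrated in user space. The following is only a minimal sketch in Python, not Presto's kernel code: the class `FilterLayer`, its method names, and the dictionary standing in for the local disk filesystem are all assumptions made for illustration.

```python
class FilterLayer:
    """Hypothetical stand-in for a filter sitting above a local filesystem."""

    def __init__(self, local_store, fetch_from_server):
        self.local = local_store          # dict: path -> bytes (stands in for ext2)
        self.fetch = fetch_from_server    # callable used on a cache miss
        self.update_log = []              # records to ship to the server later

    def read(self, path):
        # Pre-processing: fill the cache if the file is not present locally.
        if path not in self.local:
            self.local[path] = self.fetch(path)
        # The request itself is served by the underlying store.
        return self.local[path]

    def write(self, path, data):
        # The write goes straight to the underlying store ...
        self.local[path] = data
        # ... and post-processing records it for lazy shipment.
        self.update_log.append(("write", path, data))

    def write_from_cache_manager(self, path, data):
        # Requests from the cache manager bypass filtering and are not logged.
        self.local[path] = data


server_files = {"/etc/motd": b"hello"}
layer = FilterLayer({}, server_files.__getitem__)
first = layer.read("/etc/motd")                  # miss, fetched from the server
layer.write("/tmp/note", b"draft")               # application write, logged
layer.write_from_cache_manager("/tmp/synced", b"from server")
```

Only the application write ends up in `update_log`; the cache manager's own write reaches the store without being logged, mirroring the two request paths described above.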

The cache manager and file server is a single program known as Lento. It uses Perl and the Perl Object Environment (POE) package to implement an asynchronous framework; we favour asynchronous request processing over multiple threads of control because it makes the complicated state of the file server and cache manager easier to manage. Perl object classes handle multiple outstanding requests from the kernel and from TCP sockets, and parallelise outgoing RPCs to multiple systems. Lento enforces security, negotiates version stamps and callbacks to avoid seeing stale data, and handles write permits to try to avoid write conflicts. Furthermore, all of the semantic decisions that the file server makes during these processes are cleanly separated from the rest of the code, allowing one to customize InterMezzo behaviour for a certain task or environment.
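The single-threaded, event-driven style can be sketched as follows. Lento itself is written in Perl on top of POE; this sketch swaps in Python's `asyncio` purely to show the pattern of issuing several outgoing RPCs in parallel from one thread. The functions `send_rpc` and `break_callbacks`, the peer names, and the message format are hypothetical.

```python
import asyncio


async def send_rpc(peer, request, delay):
    # Hypothetical RPC: the sleep stands in for network latency.
    await asyncio.sleep(delay)
    return f"{peer}: ack {request}"


async def break_callbacks(peers, path):
    # Start every RPC at once; the event loop interleaves them on one thread,
    # so no locks are needed around shared state.
    calls = [send_rpc(peer, f"invalidate {path}", delay) for peer, delay in peers]
    # gather() returns the replies in the order the calls were listed.
    return await asyncio.gather(*calls)


peers = [("client-a", 0.02), ("client-b", 0.01), ("client-c", 0.03)]
replies = asyncio.run(break_callbacks(peers, "/home/doc.txt"))
```

The total wait is governed by the slowest peer rather than the sum of all delays, which is the benefit of parallelising the outgoing requests.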

Careful examination of Coda's callback and reintegration protocols revealed an opportunity to implement Lento with an extreme degree of symmetry and code re-use by allowing it to act as both a client and server. A Lento client downloads directories and files from the server, while the server uses the same mechanisms to download journalled updates from the client. This code re-use helps keep Lento small, and makes it easier to understand and modify.
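The symmetry can be pictured with a small sketch in which one class plays both roles. This is an illustration under assumed names (`Peer`, `serve`, `fetch_file`, `fetch_journal`), not Lento's actual protocol or wire format.

```python
class Peer:
    """Hypothetical node that can both serve objects and fetch them."""

    def __init__(self, name):
        self.name = name
        self.files = {}      # path -> contents
        self.journal = []    # local updates not yet sent anywhere

    def write(self, path, data):
        self.files[path] = data
        self.journal.append((path, data))

    def serve(self, kind, key=None):
        # One request handler, used no matter which side is asking.
        if kind == "file":
            return self.files[key]
        if kind == "journal":
            # Hand over the pending updates and clear them locally.
            pending, self.journal = self.journal, []
            return pending
        raise ValueError(f"unknown request kind: {kind}")

    def fetch_file(self, other, path):
        self.files[path] = other.serve("file", path)

    def fetch_journal(self, other):
        for path, data in other.serve("journal"):
            self.files[path] = data


server, client = Peer("server"), Peer("client")
server.files["/pub/readme"] = "v1"
client.fetch_file(server, "/pub/readme")   # client downloads from the server
client.write("/pub/readme", "v2")          # local update is journalled
server.fetch_journal(client)               # server downloads from the client
```

Both directions go through the same `serve` method, so there is no separate client-only or server-only code path to maintain.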

InterMezzo is fundamentally different from systems like NFS in using a write-back cache, avoiding synchronous interactions with the server. This allows operations to proceed at nearly the speed of the local disk filesystem, while journalled modification logs can be optimised and lazily shipped to the server in the background. The performance gain comes from keeping server round trips off the path of each update.
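One way a modification log might be optimised before shipment is by cancelling records that later records make redundant. The rules below are invented for illustration and are not taken from InterMezzo: the sketch handles only hypothetical `create`, `write`, and `unlink` records, and assumes each write carries the whole file contents.

```python
def optimise(log):
    """Drop records made redundant by later ones (assumed, simplified rules)."""
    result = []
    for op, path, *rest in log:
        if op == "write":
            # A newer whole-file write replaces an older pending one.
            result = [r for r in result if not (r[0] == "write" and r[1] == path)]
            result.append((op, path, *rest))
        elif op == "unlink":
            creates = [i for i, r in enumerate(result)
                       if r[0] == "create" and r[1] == path]
            if creates:
                # File was born inside this log: the create, its writes and
                # the unlink cancel out, so nothing is sent for it.
                start = creates[-1]
                result = [r for i, r in enumerate(result)
                          if i < start or r[1] != path]
            else:
                # File pre-dates the log: pending writes are moot, but the
                # removal itself must still reach the server.
                result = [r for r in result
                          if not (r[0] == "write" and r[1] == path)]
                result.append((op, path))
        else:
            result.append((op, path, *rest))
    return result


log = [
    ("create", "/a"),
    ("write", "/a", b"1"),
    ("write", "/a", b"2"),
    ("create", "/tmp/x"),
    ("write", "/tmp/x", b"scratch"),
    ("unlink", "/tmp/x"),
    ("unlink", "/old"),
]
shipped = optimise(log)
```

Here seven logged operations shrink to three records: the temporary file never leaves the client, and only the final contents of `/a` are sent.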

The InterMezzo code base (~2,500 lines of C for Presto, ~3,750 lines of Perl for Lento) is small enough to provide an attractive research platform. Future plans include features that Coda currently supports, such as server replication and disconnected client operation.
