InterMezzo: A distributed file system prototype for Linux
Peter J. Braam
Advances in clustering, failover systems, and distributed data storage
are some of the important developments in storage software technology
made during the last several years. Although decreasing hardware
prices have made large storage subsystems available to a much wider
audience, the cost of software with these advanced features has
remained prohibitive. Commercial filesystems such as AFS and those
sold by Veritas remain too costly for individual users, and free
projects like Coda--though impressive--are not ready for production
use. The goal of the InterMezzo project is to provide the features
and techniques developed by AFS and Coda, but to do so symmetrically,
asynchronously, and an order of magnitude more simply.

The core of InterMezzo is a Linux kernel module named Presto. Presto
acts as a filter driver for application VFS requests; it pre- and
post-processes requests, fulfilling cache misses and maintaining a log
of updates which are lazily shipped to the server. Presto also
receives VFS requests for operations made by the userlevel cache
manager; these requests are passed directly to the underlying
filesystem. In essence, InterMezzo is layered on top of a normal disk
filesystem (such as ext2), and therefore enjoys its robustness,
performance, and scalability. Furthermore, unlike the Arla and Coda
projects, we avoid the burden of writing and maintaining our own disk
filesystem. By keeping the kernel module simple and small, we
minimize both overhead and the possibility of bugs.
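
The filtering role described above can be modelled in a few lines of user-space Python. This is only a sketch under stated assumptions, not Presto's code (Presto is a C kernel module): the class `FilterLayer`, the `fetch` upcall, the `from_cache_manager` flag, and the dict standing in for the underlying disk filesystem are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class FilterLayer:
    """Toy filter in front of a local store (a dict standing in for ext2)."""
    fetch: Callable[[str], bytes]  # hypothetical upcall to the cache manager
    store: Dict[str, bytes] = field(default_factory=dict)
    journal: List[Tuple[str, str, bytes]] = field(default_factory=list)

    def read(self, path: str, from_cache_manager: bool = False) -> bytes:
        # Pre-processing: an application read that misses the cache
        # triggers a fetch before the request reaches the local store.
        if not from_cache_manager and path not in self.store:
            self.store[path] = self.fetch(path)
        return self.store[path]

    def write(self, path: str, data: bytes,
              from_cache_manager: bool = False) -> None:
        self.store[path] = data
        # Post-processing: application updates are logged for lazy shipping.
        # Cache-manager writes pass straight through and are not journalled.
        if not from_cache_manager:
            self.journal.append(("write", path, data))


server_files = {"/etc/motd": b"hello"}
fs = FilterLayer(fetch=lambda path: server_files[path])
fs.read("/etc/motd")                               # miss, filled via fetch
fs.write("/tmp/a", b"x")                           # journalled
fs.write("/tmp/b", b"y", from_cache_manager=True)  # not journalled
```

The point of the sketch is that the filter adds only bookkeeping; storage itself is left entirely to the layer underneath.
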
A single program, Lento, serves as both cache manager and file server.
It uses Perl and the Perl Object Environment (POE) package to implement
an asynchronous framework; our mechanisms favour asynchronous request
processing over multiple threads of control for better management of
the complicated state of the file server and cache manager. Perl object
classes handle multiple outstanding requests from the kernel and from
TCP sockets and parallelise outgoing RPCs to multiple systems. Lento
enforces security, negotiates version stamps and callbacks to avoid
seeing stale data, and handles write permits to try to avoid write
conflicts.
Furthermore, all of the semantic decisions that the file server makes
during these processes are cleanly separated from the rest of the
code, allowing one to customize InterMezzo behaviour for a certain
task or environment.
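
The event-driven style described above can be illustrated with Python's `asyncio` standing in for Perl and POE. This is a swapped-in technique, not Lento's code; `fake_rpc`, `allow_all`, `handle`, and the peer names are hypothetical. The sketch shows two things: several requests outstanding on a single thread of control, each fanning out RPCs in parallel, and a policy hook kept apart from the dispatch logic.

```python
import asyncio
from typing import Callable, Dict, List


async def fake_rpc(peer: str, request: str, delay: float) -> str:
    """Stand-in for a network RPC; sleeps instead of using a socket."""
    await asyncio.sleep(delay)
    return f"{peer}:{request}:ok"


def allow_all(request: str) -> bool:
    """Hypothetical policy hook: the semantic decision lives outside dispatch."""
    return True


async def handle(request: str, peers: Dict[str, float],
                 policy: Callable[[str], bool] = allow_all) -> List[str]:
    if not policy(request):
        return []
    # Issue the RPCs to every peer concurrently, on one thread of control.
    calls = [fake_rpc(peer, request, delay) for peer, delay in peers.items()]
    return list(await asyncio.gather(*calls))


async def main() -> List[List[str]]:
    peers = {"a": 0.02, "b": 0.01}
    # Two requests are in flight at once; neither blocks the other.
    return list(await asyncio.gather(
        handle("getattr", peers),
        handle("write", peers, policy=lambda r: r != "write"),
    ))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Swapping the `policy` argument changes what the server permits without touching the request handling, which is the kind of separation the paragraph above describes.
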
Careful examination of Coda's callback and reintegration protocols
revealed an opportunity to implement Lento with an extreme degree of
symmetry and code re-use by allowing it to act as both a client and
server. A Lento client downloads directories and files from the
server, while the server uses the same mechanisms to download
journalled updates from the client. This code re-use helps keep Lento
small, and makes it easier to understand and modify.
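
One way to picture this symmetry is a single peer class whose fetch path is used in both directions. The sketch below is an illustration only: the `Node` class, its methods, and the idea of exposing the journal under the key `"journal"` are assumptions made for the example, not Lento's actual protocol.

```python
from typing import Dict, List


class Node:
    """A peer that can both serve objects and fetch them (illustrative)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.objects: Dict[str, bytes] = {}  # files and directories held locally
        self.journal: List[str] = []         # updates the peer has not yet seen

    def serve(self, key: str) -> bytes:
        # The journal is exposed as just another fetchable object.
        if key == "journal":
            return "\n".join(self.journal).encode()
        return self.objects[key]

    def fetch(self, peer: "Node", key: str) -> bytes:
        # One code path, used by the client and by the server alike.
        data = peer.serve(key)
        if key != "journal":
            self.objects[key] = data
        return data


server, client = Node("server"), Node("client")
server.objects["/doc"] = b"v1"
client.fetch(server, "/doc")               # client pulls a file from the server
client.journal.append("write /doc")
updates = server.fetch(client, "journal")  # server pulls the client's journal
```

Neither role needs code the other lacks, which is where the saving in size comes from.
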
InterMezzo is fundamentally different from systems like NFS in using a
write-back cache, avoiding synchronous interactions with the server.
This allows operations to proceed at nearly the speed of the local
disk filesystem, while journalled modification logs can be optimised
and lazily shipped to the server in the background. Because
applications do not wait on the network for each update, this is a
substantial performance gain over synchronous designs.
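
To make the idea of optimising a modification log concrete, here is a minimal sketch. The abstract does not specify InterMezzo's optimisation rules; the two rules below are assumptions chosen for illustration. Records carry only an operation and a path, on the further assumption that file data is read at shipping time, so repeated writes need only one record.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (operation, path); operations: create, write, unlink


def optimise(log: List[Record]) -> List[Record]:
    """Return a shorter log with the same net effect under two toy rules."""
    out: List[Record] = []
    for op, path in log:
        # Rule 1: back-to-back writes to one file collapse into one record.
        if op == "write" and out and out[-1] == ("write", path):
            continue
        # Rule 2: a file created and then removed before shipping vanishes,
        # together with every record made for it since that create.
        if op == "unlink":
            creates = [i for i, r in enumerate(out) if r == ("create", path)]
            if creates:
                start = creates[-1]
                out = out[:start] + [r for r in out[start:] if r[1] != path]
                continue
        out.append((op, path))
    return out


pending = [("create", "/t"), ("write", "/t"), ("unlink", "/t"),
           ("write", "/a"), ("write", "/a")]
shipped = optimise(pending)
```

A temporary file that lives and dies between two shipments never crosses the network at all, which a synchronous protocol cannot arrange.
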
The InterMezzo code base (~2,500 lines of C for Presto, ~3,750 lines
of Perl for Lento) is small enough to provide an attractive research
platform. Future plans include features that Coda already supports,
such as server replication and disconnected client operation.

© 2000 Linux Symposium. All Rights Reserved.