The Coda Distributed Filesystem for Linux
Introduction to Coda

Bill von Hagen
Monday, October 7, 2002 11:12:44 AM

The initial article in this series provided an overview of the basic
principles of distributed filesystems, and highlighted several of the
most popular and up-and-coming distributed filesystems that are
available for Linux today. The previous articles explored the
InterMezzo distributed filesystem and explained how to install and
configure a simple InterMezzo client and server. This article explores
the Coda distributed filesystem that provided much of the inspiration
for InterMezzo and which is also readily available for Linux.

Coda is a well-established distributed filesystem that was developed at
Carnegie Mellon University, where it is still in use and under active
development. Coda began life as a variant of the
AFS distributed filesystem (version 2), also from Carnegie Mellon
University, but has since taken on a complete life of its own. Led by
M. Satyanarayanan, the Coda filesystem project is focused on specific
distributed filesystem functionality required for mobile computing,
such as support for disconnected operation. As explained in the
article on InterMezzo, "disconnected operation" is the term used to
describe the situation where a system that is ordinarily part of a
networked, distributed filesystem is used without being connected to
a network.

Due to its heritage, Coda shares a basic set of terminology and
features with AFS. (The Open Source version of AFS, OpenAFS, will be
discussed in the next article in this series.) Coda provides a number
of features that make it an excellent, high-performance distributed
filesystem. Beyond its focus on mobile and disconnected operation, one
of Coda's most significant features is its extensive use of
caching. Caching means that copies of files or portions of files
retrieved from Coda servers are preserved on Coda clients as long as
they can be verified to match the master data stored on the Coda
server. This is therefore known as "client-side caching". Client-side
caching reduces the amount of time that it takes to restart a Coda
client by minimizing the amount of data that needs to be transmitted
over the network after a Coda client is restarted. It's a fact of
computer life that people tend to work on the same files and in the
same directories--these change over time, of course, but the files
you are working on today are probably more-or-less the same ones that
you worked on yesterday.
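
The idea of keeping a local copy only as long as it matches the master
copy can be illustrated with a small sketch. This is not Coda's actual
validation mechanism; it is a simplified model that assumes the server
exposes a per-file version number. The names Server, CachingClient,
version, and fetch, and the path used, are hypothetical and chosen only
for illustration.

```python
class Server:
    """Hypothetical stand-in for a file server holding the master copies."""

    def __init__(self):
        self.files = {}      # path -> (version, data)
        self.fetches = 0     # counts full data transfers to clients

    def write(self, path, data):
        # Each write bumps the version so clients can detect stale copies.
        version = self.files.get(path, (0, b""))[0] + 1
        self.files[path] = (version, data)

    def version(self, path):
        # Cheap check: returns only the version number, not the data.
        return self.files[path][0]

    def fetch(self, path):
        self.fetches += 1
        return self.files[path]


class CachingClient:
    """Keeps local copies and reuses them while they match the server."""

    def __init__(self, server):
        self.server = server
        self.cache = {}      # path -> (version, data)

    def read(self, path):
        cached = self.cache.get(path)
        if cached is not None and cached[0] == self.server.version(path):
            return cached[1]                         # copy still matches the master
        self.cache[path] = self.server.fetch(path)   # missing or stale: refetch
        return self.cache[path][1]


server = Server()
server.write("/coda/notes.txt", b"draft 1")
client = CachingClient(server)
client.read("/coda/notes.txt")            # first read transfers the data
client.read("/coda/notes.txt")            # second read is served from the cache
server.write("/coda/notes.txt", b"draft 2")
latest = client.read("/coda/notes.txt")   # stale copy is detected and refetched
```

In this sketch, three reads cause only two data transfers, because the
second read finds a cached copy whose version still matches the server.
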
Client-side caching reduces network communication and minimizes client
restart times, but doesn't always guarantee that the files that you
need to work on are present on the client. In a networked environment,
this is fine--the client system can simply retrieve the file from the
file server on which it is located. Because Coda focuses on disconnected
operation, it also provides command-line tools that let users
manipulate the contents of the cache, guaranteeing that specific files
and directories will be present in the client's cache. Coda's "hoard"
command enables you to anticipate being disconnected from
the network by preloading your system with cached copies of
specific files and directories. This function is typically used before
you disconnect a laptop from the network prior to working in a
disconnected fashion for some period of time. An example of using the
hoard command is provided later in this article.
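
To make the difference between ordinary caching and hoarding concrete,
the following sketch models a small cache in which hoarded files are
never evicted. It does not show the hoard command's syntax or Coda's
implementation. It assumes a simple least-recently-used eviction
policy, and the class HoardingCache, its methods, and the paths are
hypothetical.

```python
from collections import OrderedDict


class HoardingCache:
    """Hypothetical LRU cache in which hoarded paths are never evicted."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # path -> data, least recently used first
        self.hoarded = set()           # paths the user asked to keep

    def hoard(self, path, data):
        # Preload a file and mark it as protected from eviction.
        self.hoarded.add(path)
        self._store(path, data)

    def fetch(self, path, data):
        # Ordinary caching of a file that was read while connected.
        self._store(path, data)

    def _store(self, path, data):
        self.entries[path] = data
        self.entries.move_to_end(path)
        # Evict least recently used entries, skipping anything hoarded.
        for victim in list(self.entries):
            if len(self.entries) <= self.capacity:
                break
            if victim not in self.hoarded:
                del self.entries[victim]

    def read(self, path):
        if path not in self.entries:
            # A connected client would fetch from a server here; this sketch
            # only models the disconnected case, where a miss is an error.
            raise FileNotFoundError(path)
        self.entries.move_to_end(path)
        return self.entries[path]


cache = HoardingCache(capacity=2)
cache.hoard("/coda/report.txt", b"quarterly report")
cache.fetch("/coda/a.txt", b"a")
cache.fetch("/coda/b.txt", b"b")   # evicts a.txt rather than the hoarded report
```

After these calls, the hoarded report is still readable even though it
is the oldest entry, while a.txt has been evicted and would be
unavailable to a disconnected client.
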