File Systems

Andrew File System





Introduction


The Andrew File System (AFS) was developed at Carnegie Mellon University (CMU) and named after the first names of the university's founders: Andrew Carnegie and Andrew Mellon.
The idea behind the design was to create a network with computers for all the students and employees of the university (up to 10,000 users); in other words, a major design goal was that it should be capable of scaling up easily. The theoretical maximum is about 200 clients per server, but in practice a ratio of 50:1 is advised.

Currently AFS is marketed, maintained, and extended by Transarc Corporation (at the moment of writing, Transarc is part of IBM). Check their website for the currently supported platforms; there is also a client-only implementation, "AFS Client for Windows NT". The latest version is AFS 3.6 (search their site for the most up-to-date information and white papers).



Components


The main components and terms you need to know:
- Venus runs on the client machines, providing the interface between the client and Vice; it implements the client-side cache.
- Vice runs on the servers and serves files to the clients.
- A fid (file identifier) identifies a file, similar to an i-node number in UNIX.
- A volume is a collection of directories that is managed as a unit.
- A cell is a collection of volumes (in real life often a collection of servers grouped together administratively), presenting a single, cohesive filesystem. Typically, an AFS cell is a set of hosts that use the same Internet domain name. Cells may be local on the same LAN or remote somewhere on the WAN.
- The Cache Manager maintains information about the identities of the users logged into the machine, finds and requests data on their behalf, and keeps chunks of retrieved files (64 KB at a time) on the local disk.
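To make the fid concept concrete: an AFS fid is a triple of numbers (volume, vnode, uniquifier). The sketch below is a hypothetical Python model for illustration only, not the actual AFS data structure; the example numbers are invented.

```python
from typing import NamedTuple

class Fid(NamedTuple):
    """Hypothetical model of an AFS file identifier (fid).

    A real fid combines the volume the file lives in, the vnode
    (the file's index within that volume, analogous to a UNIX i-node
    number), and a uniquifier that distinguishes successive reuses
    of the same vnode slot.
    """
    volume: int
    vnode: int
    uniquifier: int

# Two files in the same volume share the volume number but differ in vnode:
readme = Fid(volume=536870913, vnode=4, uniquifier=1)
notes  = Fid(volume=536870913, vnode=6, uniquifier=1)
assert readme.volume == notes.volume
assert readme != notes
```

Because the fid names a volume rather than a server, a volume can be moved between servers without changing any fid.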



Configuration


The configuration is a combination of clusters, where each cluster consists of a file server and some tens of PCs.
AFS uses the client-server, upload/download model of file service (see also File Service). It is assumed that most users work on a PC within their own cluster, i.e. a PC residing in the same area as the file server running Vice that holds their data. This doesn't mean that users aren't allowed to log in on a PC outside the cluster, but it does mean that when working away from their "home" file server, the up- and download times for files will be considerably slower.
Furthermore, there is replication for reliability. Multiple copies of application binaries and the AFS databases are replicated on multiple file servers within a cell; when accessing this information, a client chooses among the available servers. (I'm still curious how that works when the user has write access and the file is cached locally; as far as I recall, consistency is maintained via the callback mechanism.) Replication also reduces the load on any particular server by placing frequently accessed information on multiple servers.
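The callback mechanism can be sketched roughly as follows: the server promises to notify each client that caches a file, and breaks that promise (invalidating the cached copy) when another client stores a new version. This is a simplified toy model in Python; the class and method names are invented and this is not AFS's actual implementation.

```python
class Server:
    """Toy model of an AFS server's callback bookkeeping (names invented)."""
    def __init__(self):
        self.files = {}        # path -> contents
        self.callbacks = {}    # path -> set of clients holding a cached copy

    def fetch(self, client, path):
        # Handing out a copy also registers a callback promise: the
        # server will notify this client if the file changes.
        self.callbacks.setdefault(path, set()).add(client)
        return self.files[path]

    def store(self, client, path, data):
        self.files[path] = data
        # Break the callback for every *other* client caching this file.
        for other in self.callbacks.pop(path, set()):
            if other is not client:
                other.invalidate(path)
        self.callbacks[path] = {client}

class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def read(self, path):
        if path not in self.cache:            # cold read: fetch from server
            self.cache[path] = self.server.fetch(self, path)
        return self.cache[path]               # warm read: served locally

    def write(self, path, data):
        self.cache[path] = data
        self.server.store(self, path, data)   # roughly: store on close

    def invalidate(self, path):
        self.cache.pop(path, None)            # callback broken: drop copy

server = Server()
server.files["/afs/cell/doc.txt"] = "v1"
a, b = Client(server), Client(server)
assert a.read("/afs/cell/doc.txt") == "v1"
assert b.read("/afs/cell/doc.txt") == "v1"
b.write("/afs/cell/doc.txt", "v2")           # breaks a's callback
assert a.read("/afs/cell/doc.txt") == "v2"   # a re-fetches after invalidation
```

The point of the design is that warm reads cost nothing on the network: the server only speaks up when a cached copy becomes stale.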
The developers list the following as the main strengths of AFS:
  • Caching facility
    A file (or part of it) is cached locally on the client machine to improve performance (see File Service). Local caching also significantly reduces the amount of network traffic, since a read only goes over the network when the cache is cold.
  • Security features
    The Kerberos authentication method is used (it requires that users prove their identities before accessing network services), together with access control lists (ACLs). You can decide whether or not to use NIS alongside it.
  • Simplicity of addressing
    AFS does its mapping (filename to location) at the server. This has the tremendous advantage of making the served filespace location independent. Related to this is the Single System Image (SSI): regardless of which PC a user logs on to, the view of the tree structure is the same.
  • Scalability
    The client:server ratio mentioned in the introduction does not have one particular optimum. The overall performance depends on the number and size of files, the file access and change rates, the server hardware, and the network bandwidth.
  • Communications protocol
    The communication is based on RPC over IP, and is optimized for WANs.
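The location-independent addressing described above can be sketched as a two-step lookup: a path resolves to a volume, and a Volume Location Database (VLDB) resolves the volume to the server(s) holding it. The Python below is a toy model; the cell name, volume names, and server addresses are invented examples.

```python
# Toy model of location-independent addressing via a Volume Location
# Database (VLDB). All names and addresses below are invented.

# Step 1: a path prefix maps to a volume name...
mount_points = {
    "/afs/example.edu/usr/alice": "user.alice",
    "/afs/example.edu/sys/bin":   "sys.bin",
}

# Step 2: ...and the VLDB maps the volume name to the server(s) holding it.
vldb = {
    "user.alice": ["fs1.example.edu"],
    "sys.bin":    ["fs1.example.edu", "fs2.example.edu"],  # replicated
}

def locate(path):
    """Return the servers that can satisfy a request for `path`."""
    for prefix, volume in mount_points.items():
        if path == prefix or path.startswith(prefix + "/"):
            return vldb[volume]
    raise FileNotFoundError(path)

# Clients never hard-code a server: moving user.alice to another machine
# only requires updating its VLDB entry, and path names stay the same.
assert locate("/afs/example.edu/usr/alice/notes.txt") == ["fs1.example.edu"]
assert len(locate("/afs/example.edu/sys/bin/ls")) == 2
```

This is also why every machine sees the same tree (the Single System Image): the name space is defined by the mapping tables, not by which server a client happens to talk to.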




AFS versus NFS

Architecture
  AFS 3: File servers and clients form a logical administrative unit called a cell.
  NFS 3: File servers and clients; each file server is managed independently.

  AFS 3: Administration by collections of files called volumes.
  NFS 3: Administration by individual files.

  AFS 3: Common, global name space viewed by all machines.
  NFS 3: Name space not always viewed consistently across machines.

  AFS 3: Automatic file location tracking by system processes and the Volume Location Database.
  NFS 3: Mount points for tracking a file's physical location are set by administrators and users.

  AFS 3: Stateful servers.
  NFS 3: Nearly stateless servers.

Performance
  AFS 3: Robust disk caching reduces file server and network load.
  NFS 3: Memory caching with small buffers.

  AFS 3: Server callbacks guarantee cache consistency; open-to-close semantics; attributes cached for several hours.
  NFS 3: Time-based cache consistency may cause inconsistencies to occur; attributes cached 3-30 seconds.

  AFS 3: Network traffic reduced by callbacks and large buffers.
  NFS 3: Network traffic increased by limited caching.

  AFS 3: Replicas spread the load among preferred servers.
  NFS 3: No replication to reduce load.

  AFS 3: Excellent performance in wide-area configurations.
  NFS 3: Inefficient in wide-area configurations.

  AFS 3: Scalable; maintains performance in any size installation.
  NFS 3: Best in small- to medium-size installations.

Availability
  AFS 3: Read-only replication by volume; automatic switchover to an available replica.
  NFS 3: No standard data replication.

  AFS 3: Files remain available to users during reconfiguration; file names remain the same.
  NFS 3: Users lose access to files during reconfiguration; file moves require mount-point changes to adjust file names.

Management
  AFS 3: Management tasks can be executed from any machine.
  NFS 3: Management tasks frequently require telnet to designated machines.

  AFS 3: Disk quotas based on volumes; easy for a user to check status.
  NFS 3: Disk quotas based on user ID; difficult for a user to check status.

  AFS 3: No system downtime with the AFS Backup System.
  NFS 3: Standard UNIX backup requires system downtime.

  AFS 3: Backup clones often used for user-controlled restores.
  NFS 3: All restores require administrator assistance.

Security
  AFS 3: Kerberos version 4 authentication.
  NFS 3: Unencrypted user IDs; trusted users and hosts. Can be kerberized.

  AFS 3: Access control lists for fine-tuning directory access; UNIX mode bits for the owner.
  NFS 3: Access control with standard UNIX mode bits on files and directories.

  AFS 3: User-definable groups.
  NFS 3: Groups defined by the system administrator.

  AFS 3: Mutual authentication by system processes and databases; always uses secure RPC.
  NFS 3: Can use secure RPC.

Source: AFS white paper at http://www.transarc.ibm.com/Library/whitepapers/AFS/afswp.html



References and more information
Modern Operating Systems, A.S. Tanenbaum, Ch. 13.
AFS FAQ
AFS white paper


