File Systems

Definitions and General Technology


Each FS implements, some say, only a subset of "the optimal list of features" a distributed file system should have. Others (mainly developers) argue they have good reasons to choose one implementation over another: how effective a technology is depends on its intended use, its purpose. Therefore this page first deals with the general concepts of a distributed file system; afterwards the most relevant definitions are explained.
The reason these pages focus on (attempts at) distributed file systems is that I think they are the future. With the development of high-speed LANs, WANs and the Internet, a stand-alone machine is rather outdated.



Traditional File System

First a few notes on what I call standard FSs, like those in UNIX and MS-DOS. MS-DOS in particular wasn't designed to function properly in a network; it was designed for a stand-alone PC, so aspects like file security are not integrated with the FS.

In general, it can be said that there are three essential requirements for storing data:
1. It has to be able to store large amounts of data
2. The information has to be maintained after the process using the data is finished
3. It has to be possible that more than one process can access the data at the same time
Files can be structured in different ways, of which three are common. The first is the unstructured sequence of bytes: the OS only sees a series of bytes, and interpretation is left to user-level programs. An advantage of this approach is maximum flexibility. Examples are MS-DOS and UNIX. The second option is to structure a file as a series of fixed-length records; an example is CP/M with records of 128 bytes. The third option is a tree of records with a key field. Here the OS, not the user, determines the position of new (added) records. This structure is often used in mainframe computers.
There are several file types: normal files containing user data, directories (which are system files), character special files for I/O devices, and block special files (a model for disks).
Older OSs had just one way of accessing a file: sequentially, from start to end, which is fine when using tapes. With the use of disk drives, however, random access files were introduced. This means you can access the data of a file in any order instead of going from beginning to end every time.
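The difference between the two access styles can be sketched with a few lines of Python; the file name here is just an illustration:

```python
import os
import tempfile

# Create a small file to demonstrate the two access styles.
path = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open(path, "wb") as f:
    f.write(bytes(range(100)))  # bytes 0..99

# Sequential access: read from the start, in order.
with open(path, "rb") as f:
    first_ten = f.read(10)

# Random access: seek() jumps straight to any offset,
# without reading everything that comes before it.
with open(path, "rb") as f:
    f.seek(90)          # jump to byte 90
    tail = f.read(10)

print(list(first_ten))  # [0, 1, ..., 9]
print(list(tail))       # [90, 91, ..., 99]
```

On a tape, only the first pattern is cheap; on a disk, both are.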
File attributes are additional descriptive information about a specific file. Certain flags are set (e.g. access type, archive, hidden, read-only) and things like key position, record length and file size are administered.
Memory-mapped files are a way for an OS (like MULTICS) to represent files in the address space of a running process. Imagine the two system calls MAP and UNMAP, where the first provides a virtual address, allowing the OS to project the file onto that virtual address in the address space.
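The MAP/UNMAP idea survives today as mmap()/munmap(); a minimal sketch in Python (the file name is made up for the example):

```python
import mmap
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "mapped.txt")
with open(path, "wb") as f:
    f.write(b"hello world")

# "MAP": project the file into this process's address space.
with open(path, "r+b") as f:
    mem = mmap.mmap(f.fileno(), 0)  # length 0 = map the whole file
    # Ordinary memory reads/writes now operate on the file's contents.
    mem[:5] = b"HELLO"
    mem.flush()
    mem.close()                     # "UNMAP"

with open(path, "rb") as f:
    data = f.read()
print(data)  # b'HELLO world'
```

From the program's point of view the file looks like a byte array in memory; the OS pages it in and out behind the scenes.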
Directories contain entries, normally one per file, with the file name, (optionally) attributes, and pointers to the disk addresses.

..... more ....

Distributed File Systems

An important development towards a distributed FS was/is the separation between the file server and the file service. In short, the file server is the process that runs on a certain machine and takes care of the implementation of the file service. The file service can be described as the specification of what exactly the file server offers the client machines.
..... more ....



..... more ....

File Server
The file (directory) server defines the alphabet and syntax for the naming scheme of files and directories. The structure is either a tree or a graph.
In a time-sharing system, the FS will have the same visual representation for each process. An important factor is name transparency, which can be divided into location transparency and location independence. Naming itself happens on two levels: the symbolic and the binary name.
..... more ....

File Service
The file service provides the operations to read/write certain file attributes; the capability is a specification of the types of access rights (compare the ACL).
There are two methods for accessing a file on a file server:
- upload/download model: the whole file requested is downloaded to the client machine, read/changed, and then sent back to the server.
- remote access model: the requested file resides on the server during client operations on that file.
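The contrast between the two models can be sketched as two hypothetical client-side helpers (the file names and the REMOTE dictionary are stand-ins for a real server):

```python
# Stand-in for the server's file store.
REMOTE = {"report.txt": b"quarterly numbers..."}

# Upload/download model: fetch the whole file, work on a local copy.
def download(name):
    return bytearray(REMOTE[name])   # the entire file crosses the network

def upload(name, data):
    REMOTE[name] = bytes(data)       # the entire file goes back

# Remote access model: every operation is a request to the server.
def remote_read(name, offset, nbytes):
    return REMOTE[name][offset:offset + nbytes]  # only the bytes asked for

# Client using the upload/download model:
local = download("report.txt")
local[0:9] = b"QUARTERLY"            # fast local edits on the copy
upload("report.txt", local)

# Client using the remote access model sees the change on the server:
print(remote_read("report.txt", 0, 9))  # b'QUARTERLY'
```

The trade-off discussed below falls directly out of this sketch: the first model pays the transfer cost once per file, the second pays a (smaller) cost per operation.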
This is a nice example of why you can't say one model is better than the other. The determining factor for choosing either technology is the type of network in place. It isn't very user-friendly to download a whole file over a slow WAN connection, or, say, a huge logistics database over even a fast LAN. Accessing a file many times via the remote access model may be a fraction slower per action, but you can't expect a user to up/download e.g. a 5 MB PowerPoint presentation from his laptop over a 28K modem. With a read-ahead mechanism the performance can be improved somewhat.
The upload/download model is a good option in a (fast) LAN environment, and when the file is cached locally, access time is greatly reduced. However, there's a drawback: what happens when one user has cached the file and is in the process of making changes, and somebody else requests that file? The second user doesn't receive the most up-to-date version if the server maintains no session information about the file. There are four options to provide (at least some sort of) a solution: do nothing, session semantics, a server callback set up by the client software, and ..
..... more ....

File time stamps
Information maintained by e.g. the i-node. It logs the time of creation of the file, as well as the browse, read (access) and/or modify time (depending on the system). In distributed systems file time stamps are far more important than on stand-alone machines, largely because of the file-sharing aspects. Related to that is the requirement of synchronized time across the network in the client/server (download) model. When two clients want to access and modify the same file, or when one starts modifying it and the second one wants to read it, time stamps are used to get the latest version of the file (in some systems the first client, who has the file in write mode, has to send its update to the server before the other may open the file). Or, when a specific volume is duplicated for redundancy/faster client access, the two copies are compared at a preconfigured interval and the one with the "highest time" wins (unless you set the duplicate as a read-only backup, of course).

Location independence

Location independence means that a user does not need to know which file server holds the file; the user only needs to know the pathname. Of course, the user does need to know the name of the AFS cell to which the file belongs.
To understand why such location independence is useful, imagine some number of clients, two servers, and the need to move a file system "/home" from one server to the other. Using e.g. NFS, you would have to change the /etc/filesystems file on all clients and take "/home" off-line while you moved it between servers. With AFS (or any other FS that implements location independence), by contrast, you simply move the AFS volume(s) which constitute "/home" between the servers. You do this on-line, while users are actively using files in "/home", with no disruption to their work.

In Linux (its ext2 file system) and UNIX, each file is represented by a structure called an i-node, containing the description of the file: file type, access rights, owners, timestamps, size, and pointers to data blocks. The addresses of the data blocks allocated to a file are stored in its i-node. When a user requests an I/O operation on the file, the kernel code converts the current offset to a block number, uses this number as an index into the block address table, and reads or writes the physical block.
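That offset-to-block conversion is simple integer arithmetic; a sketch with a made-up block table (ext2 block sizes are typically 1024, 2048 or 4096 bytes):

```python
BLOCK_SIZE = 1024  # assumed block size for this example

def offset_to_block(offset, block_table):
    """Map a byte offset within a file to (physical block, offset in block),
    the way the kernel uses the i-node's block address table."""
    index = offset // BLOCK_SIZE   # which logical block of the file
    within = offset % BLOCK_SIZE   # position inside that block
    return block_table[index], within

# Hypothetical i-node: this file's data lives in these physical blocks.
inode_blocks = [907, 4113, 88]

print(offset_to_block(0, inode_blocks))     # (907, 0)
print(offset_to_block(2500, inode_blocks))  # (88, 452): block 2, byte 452
```

Real ext2 i-nodes hold only the first few block addresses directly and use indirect blocks beyond that, but the arithmetic is the same.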

File locks are important when two people want to access a file in write and/or modify mode. One can get the exclusive right to write to the file; to prevent inconsistencies, subsequent users are then only allowed to open the file in read mode. Since this is not a very flexible method, it is now also possible to put a lock on a range of bytes of a file: a flexibly granulated mechanism to lock anything between 1 byte and the whole file in one action. The call takes as parameters the first byte and the number of bytes requested for the lock. The rest of the bytes of the file remain available to other processes, which is really convenient for database implementations. There are two types of locks: shared and exclusive. Exclusive speaks for itself. When there is a shared lock on part of a file, it is allowed to add another shared lock on bytes that aren't locked exclusively, but a request for an exclusive lock over a shared lock will be denied. (FYI: implemented in e.g. UNIX.)
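On UNIX this is the fcntl()/lockf() interface. A sketch, assuming a POSIX system (the file name is invented; the child process stands in for a second user):

```python
import fcntl
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "locked.db")
with open(path, "wb") as f:
    f.write(b"\0" * 200)

fd = os.open(path, os.O_RDWR)
# Exclusive lock on bytes 0..99 only; bytes 100..199 stay free.
fcntl.lockf(fd, fcntl.LOCK_EX, 100, 0, os.SEEK_SET)

pid = os.fork()
if pid == 0:
    # Child: a second process trying to lock the same file.
    fd2 = os.open(path, os.O_RDWR)
    try:
        # Non-blocking attempt on the locked range is denied...
        fcntl.lockf(fd2, fcntl.LOCK_EX | fcntl.LOCK_NB, 100, 0, os.SEEK_SET)
        denied = False
    except OSError:
        denied = True
    # ...but the unlocked bytes 100..199 are still available.
    fcntl.lockf(fd2, fcntl.LOCK_EX | fcntl.LOCK_NB, 100, 100, os.SEEK_SET)
    os._exit(0 if denied else 1)

_, status = os.waitpid(pid, 0)
conflict_seen = os.WEXITSTATUS(status) == 0
print("second process saw a conflict on the locked range:", conflict_seen)
```

This is what makes byte-range locks so useful for databases: two processes can write different records of the same file concurrently.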

Loosely connected
This has to do with using send and receive messages for communication in distributed OSs, instead of RPC. A loosely connected system has relatively "autonomous" (heavy) client components that perform certain, but not all, processes. Examples of such loosely connected distributed systems are the SETI@home and RC5 projects, where a couple of servers are connected to the Internet and the client computers ("at home") try to decode blocks of information; after crunching the blocks, the output is sent back to the key server and the client receives new blocks to crunch (in the processor's idle time).

Race problems

..... more ....

RPC
Acronym for Remote Procedure Call. RPC was invented to avoid classical I/O communication between PCs and to make the system look like a true distributed OS. Imagine a remote read procedure: the client PC wants to read a file residing on a server. When the OS finds out the file is not local, it calls another version of read from the library, the client stub, followed by a kernel trap. Here's the difference with the standard situation: after the kernel trap the parameters aren't placed in registers but encapsulated in a packet (this is called parameter marshalling) and sent to the server stub. The server stub unpacks the packet and calls the server procedure as usual. The response travels the same way back to the client PC. Both stubs have buffers. So, from the client's view, the whole process looks like a local read, because the stubs are hidden; it avoids the send and receive of traditional network communication because all details are hidden in the two library procedures.
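The stub pair and the marshalling step can be sketched in a few lines; this is a toy, not a real RPC system, and the wire format, file store and function names are invented for the example (a socketpair stands in for the network):

```python
import socket
import struct
import threading

# Client stub: marshal the parameters of read(file_id, offset, nbytes)
# into a packet; this is what replaces putting them in registers.
def client_stub_read(sock, file_id, offset, nbytes):
    sock.sendall(struct.pack("!III", file_id, offset, nbytes))
    (reply_len,) = struct.unpack("!I", sock.recv(4))
    return sock.recv(reply_len)

# Server stub: unmarshal the packet and call the real server procedure.
def server_stub(sock, files):
    file_id, offset, nbytes = struct.unpack("!III", sock.recv(12))
    data = files[file_id][offset:offset + nbytes]  # the actual "read"
    sock.sendall(struct.pack("!I", len(data)) + data)

client, server = socket.socketpair()
files = {7: b"the quick brown fox"}  # hypothetical remote file store

# The server stub runs "on the other machine" (here: another thread)...
t = threading.Thread(target=server_stub, args=(server, files))
t.start()
# ...while the client issues what looks like an ordinary local read.
result = client_stub_read(client, 7, 4, 5)
t.join()
print(result)  # b'quick'
```

Note the explicit byte order in struct.pack ("!" = network order): this is exactly the big-/little-endian issue raised below.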
However, the implementation is not as straightforward as presented. When two processes run on two machines, they may execute in different address spaces, which can lead to complications. Parameter information has to be transferred, and its representation differs between machine types (e.g. big- and little-endian). And there must be some system to recover from errors during the communication/binding (a crashed PC or cable: at-least-once semantics, etc.). But there are several solutions for that; see the references, or do a search on Google.
RPC protocols: IP/UDP or a specialized protocol. Speed is the most important aspect in the design, as is the related packet size (e.g. the Sun limit is 8K). Use of a stop-and-wait or blast protocol.
More aspects involved are the critical path and timers. See references.

Shared files in distributed systems
The following table shows the four principles used in distributed file systems. Each of them has its advantages as well as disadvantages.

Method             Description
UNIX semantics     Each action on a file is immediately visible to all processes involved (a change is sent back to the file server immediately).
Session semantics  Changes performed on a file become visible after the file is closed.
Immutable files    No changes possible; eases sharing and duplication.
Transactions       All changes follow the all-or-nothing concept (several changes combined into one action).

Session semantics obviously have the problem that they are not realtime at all and may easily lead to inconsistencies. On the other hand, this is what the client-server download model implements; it increases client access speed and consumes less bandwidth, which is desirable in older and slower networks.
Transactions are mainly implemented for databases (e.g. Microsoft's Transaction Server), and seem promising for critical operations like online banking.
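The all-or-nothing concept is easy to demonstrate with SQLite, which ships with Python (the account names and the simulated crash are of course invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INT)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    # The connection as context manager opens a transaction:
    # commit on success, roll back on any exception.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - 50 WHERE name = 'alice'")
        raise RuntimeError("simulated crash between the two updates")
        conn.execute(
            "UPDATE accounts SET balance = balance + 50 WHERE name = 'bob'")
except RuntimeError:
    pass

# The half-finished transfer was rolled back: nothing changed.
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 100, 'bob': 0}
```

Without the transaction, a crash at that point would have destroyed 50 units of money; with it, the two updates act as one indivisible action.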

Transaction logging, among other things, offers a much-needed degree of data recovery. In NTFS, for instance, our metadata friend known as $LogFile is written to every few seconds, thus tracking recent changes to the directory structure and files. These transactional records are recorded as actions that need to be either undone or redone after a system failure. This kind of "roll-forward" and "roll-back" preservation serves to eliminate loss of data in the event of a failure.
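A toy write-ahead log shows the roll-forward idea; everything here (the log format, the dictionary "disk") is invented for illustration:

```python
log = []    # stand-in for the on-disk log ($LogFile-style)
store = {}  # stand-in for the "disk" itself

def logged_write(key, new_value):
    """Log the change first (old and new value), then apply it."""
    log.append((key, store.get(key), new_value))
    store[key] = new_value

def recover_redo(snapshot):
    """After a 'crash', rebuild state by replaying the log (roll-forward)."""
    state = dict(snapshot)
    for key, _old, new in log:
        state[key] = new
    return state

logged_write("a.txt", "v1")
logged_write("a.txt", "v2")

# Simulate losing the disk: all that survives is an old (empty)
# snapshot plus the log, and replaying the log restores everything.
recovered = recover_redo({})
print(recovered)  # {'a.txt': 'v2'}
```

Rolling back is the mirror image: walk the log backwards and restore each record's old value, which is how half-finished transactions are undone.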

References and/or more information
- Modern Operating Systems, A.S. Tanenbaum, chapters 4, 10 and 13
- Remote Procedure Call, from the Open Group Technical Standard
- RFC 1057, RPC: Remote Procedure Call Protocol Specification, Version 2
- Remote Procedure Call, Moving To Distributed Processing Standards ...
- Design and Implementation of the Second Extended Filesystem