Virtual File System Model

File Systems hold persistent data, that is, data that survives power outages. Networked File Systems make that data available across a network. Distributed File Systems make the data more available still, perhaps with higher performance as well. See the accompanying figure.

Distributed File Systems exist because of certain cost/performance tradeoffs. At various points in history, the cost of disk storage relative to monitor, CPU and memory, together with network costs and the reasonable performance of Local Area Networks, has meant that it is easier to put mass storage on centrally managed servers than on all the workstations at a site. The main gain has been one of manageability, but with replication of disks there are also increases in fault tolerance and in performance.

However, the economics of such systems are by no means fixed. Indeed, many sites in the 80s and early 90s had disks on all machines, distributed executable files (typically a large proportion of the filestore requirement) to those disks periodically, and kept only the rapidly changing, priceless user files on centrally managed filestores. With a clean replication system, the central servers would shrink to nearly nothing, existing merely as caches for replicas of changes before they are stored onto safer media such as tape or writable optical storage. See the accompanying figure.

To start with, let's outline a taxonomy of a networked file system:

Disk
Server machine
Server Software
Network
Client Software
Client machine
Client Application

Note that there are several more components than with local file access. Each one is a further place where something can fail, so extra mechanisms must be introduced to improve availability, even just to bring it up to the level of local access.
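The effect of the extra components on availability can be illustrated with a little arithmetic. The sketch below assumes that components fail independently and that an access needs all of them at once, so their availabilities multiply. The value 0.999 per component is made up for illustration and is not a measurement, and the three-component local path is likewise an assumed simplification.

```python
from math import prod


def chain_availability(availabilities):
    """Availability of components in series, assuming independent failures."""
    return prod(availabilities)


# Hypothetical per-component availabilities (illustrative values only).
local_path = {
    "disk": 0.999,
    "machine": 0.999,
    "application": 0.999,
}
remote_path = {
    "disk": 0.999,
    "server machine": 0.999,
    "server software": 0.999,
    "network": 0.999,
    "client software": 0.999,
    "client machine": 0.999,
    "client application": 0.999,
}

local = chain_availability(local_path.values())
remote = chain_availability(remote_path.values())

print(f"local path:  {local:.4f}")
print(f"remote path: {remote:.4f}")
```

Under these assumptions the remote path is always less available than the local one, which is why mechanisms such as replication are needed merely to get back to the local level.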

Figure: Remote File Access

Note that protocols are involved in several places: between the client and server machines, between the client application and the client filesystem access code, and between the server and the filestore. Typically, the client and server protocols are made as similar to local access as possible (for example, by using remote procedure calls that are nearly the same as the system procedures for accessing local files). However, there are more modes in which the system can fail, and so the failure semantics change. There are also more opportunities for concurrency, and therefore more chances of inconsistency, since a server cannot tell what a client application is doing with data once it has handed that data over, and there may be multiple clients of a given server (and of a given file on that server).
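A minimal sketch of how a remote call can look like a local one while failing differently is given below. Everything in it is hypothetical: `FileServer`, `LossyTransport`, `RemoteFile` and `ServerUnreachable` are names invented for illustration, and the "network" is simulated in memory by dropping chosen calls. It is not the protocol of any real system.

```python
class ServerUnreachable(Exception):
    """A failure with no local equivalent: the client cannot tell whether
    the server is down, the network has failed, or a reply was lost."""


class FileServer:
    """Hypothetical server holding files in memory (name -> bytes)."""

    def __init__(self, files):
        self.files = files

    def handle_read(self, name, offset, length):
        return self.files[name][offset:offset + length]


class LossyTransport:
    """Simulated network that loses the calls whose index is in `drop`."""

    def __init__(self, server, drop=()):
        self.server = server
        self.drop = set(drop)
        self.calls = 0

    def call(self, name, offset, length):
        index = self.calls
        self.calls += 1
        if index in self.drop:
            return None  # request or reply lost in transit
        return self.server.handle_read(name, offset, length)


class RemoteFile:
    """Client stub whose read() mirrors a local read(offset, length)."""

    def __init__(self, transport, name, retries=3):
        self.transport = transport
        self.name = name
        self.retries = retries

    def read(self, offset, length):
        # Retrying is safe here because a read is idempotent:
        # repeating it does not change the file.
        for _ in range(self.retries):
            reply = self.transport.call(self.name, offset, length)
            if reply is not None:
                return reply
        raise ServerUnreachable(self.name)


server = FileServer({"notes.txt": b"hello world"})

# One lost message is hidden from the application by a retry.
flaky = LossyTransport(server, drop={0})
data = RemoteFile(flaky, "notes.txt").read(0, 5)

# Persistent loss surfaces as an error that local access never raises.
dead = LossyTransport(server, drop={0, 1, 2})
try:
    RemoteFile(dead, "notes.txt").read(0, 5)
    failed = False
except ServerUnreachable:
    failed = True
```

The application-facing call has the same shape as a local read, but the stub has to decide how often to retry and what to report when it gives up, which is exactly where the failure semantics diverge from local access.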

Table: Changing from local to remote access

Since disks are relatively slow, even local access is typically cached in memory. When file access is remote, this leads to the further choice of whether to cache at the server, at the client, or at both. The choice depends on the semantics of remote file access.
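Why the caching choice depends on the semantics can be seen in a small sketch. It assumes a hypothetical write-through client cache with no invalidation messages from the server; the class names and the explicit `invalidate` call are inventions for illustration, not the behaviour of any particular file system.

```python
class Server:
    """Hypothetical server: the authoritative copy of each file."""

    def __init__(self):
        self.files = {}

    def read(self, name):
        return self.files.get(name, b"")

    def write(self, name, data):
        self.files[name] = data


class CachingClient:
    """Client with a private read cache and write-through writes."""

    def __init__(self, server):
        self.server = server
        self.cache = {}

    def read(self, name):
        if name not in self.cache:
            self.cache[name] = self.server.read(name)
        return self.cache[name]

    def write(self, name, data):
        self.cache[name] = data
        self.server.write(name, data)  # write-through to the server

    def invalidate(self, name):
        self.cache.pop(name, None)


server = Server()
a = CachingClient(server)
b = CachingClient(server)

a.write("report", b"v1")
first = b.read("report")   # fetched from the server and cached at b

a.write("report", b"v2")
stale = b.read("report")   # served from b's cache: still the old data

b.invalidate("report")
fresh = b.read("report")   # refetched from the server
```

If the intended semantics allow a client to see slightly old data, this cheap client cache is acceptable. If every read must see the latest write, the system needs invalidation or validation traffic, or must cache only at the server.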

Table: Examples of choice

When designing a network or distributed filesystem, performance is a key parameter. To select the right paradigm, we must first look at the underlying access patterns for local file access. Then we try to predict whether these patterns will remain the same for remote access and, if not, how they will change.

There have been many studies of file access patterns, though most have been on the same kind of system (Unix). They mainly find that there is a very high degree of what is called "locality of reference", both in time and in space. Put simply, if you access a part of a file, you are much more likely to access the next part of the same file, and soon, than some other part of the file, or another file, at some far-off time in the future (through symmetry arguments, the past behavior resembles the future). Another part of the picture is that the majority of files are opened for reading only, and rarely for writing. This is very important when considering how expensive a concurrency control scheme one should use, since there is no need to invoke it for read-only file access.

When looking at the service used to access the file, we should distinguish between the service seen by the application and that actually carried out by the system. This is no different from local access: local file access in many systems is provided by a stream abstraction. In fact, hardware access to the file consists of a possibly arbitrary scattering of blocks across a disk, but layers of software conspire to hide this. A Remote File Access protocol may well preserve the stream appearance of access, while actually translating it into accesses to individual blocks (NFS works approximately this way). Alternatively, it may work through a lower layer side-effect, whereby the entire file is moved from server to client as a stream, and access at the client is mapped into access to the copy (AFS works roughly like this). See the accompanying figure.
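The block-oriented approach can be sketched as follows: the application sees a stream, while the client turns each read into requests for fixed-size blocks and, relying on locality of reference, also fetches the block after the one in use. This is an illustrative toy and not the NFS protocol; `BlockServer`, `RemoteStream`, the 4-byte block size and the one-block read-ahead are all assumptions chosen to keep the example small.

```python
BLOCK_SIZE = 4  # unrealistically small, so the behaviour is easy to follow


class BlockServer:
    """Hypothetical server that only understands block requests."""

    def __init__(self, data):
        self.data = data
        self.requests = []  # block numbers asked for, in order

    def read_block(self, n):
        self.requests.append(n)
        return self.data[n * BLOCK_SIZE:(n + 1) * BLOCK_SIZE]


class RemoteStream:
    """Client side: a sequential stream built on top of block requests."""

    def __init__(self, server, read_ahead=1):
        self.server = server
        self.read_ahead = read_ahead
        self.pos = 0
        self.cache = {}

    def _block(self, n):
        if n not in self.cache:
            self.cache[n] = self.server.read_block(n)
        return self.cache[n]

    def read(self, length):
        out = bytearray()
        while length > 0:
            n, offset = divmod(self.pos, BLOCK_SIZE)
            block = self._block(n)
            # Read-ahead: the next block is likely to be wanted soon.
            # (Done synchronously here; a real client would overlap it.)
            for k in range(1, self.read_ahead + 1):
                self._block(n + k)
            chunk = block[offset:offset + length]
            if not chunk:
                break  # end of file
            out += chunk
            self.pos += len(chunk)
            length -= len(chunk)
        return bytes(out)


server = BlockServer(b"abcdefghij")
stream = RemoteStream(server)

part1 = stream.read(3)
after_first = list(server.requests)
part2 = stream.read(3)
after_second = list(server.requests)
rest = stream.read(10)
```

In this sketch the second read crosses a block boundary without waiting for a new block, because the read-ahead had already fetched it. A whole-file scheme of the AFS kind would instead copy all of the data to the client when the file is opened and serve every read from that copy.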