Chapter 6 File Management
Exercises
6-1 Which component computers of what types of
system would you expect to have local disks with associated disk management
software and filing system software?
Examples are: any system which is not networked ranging from a PC to a multi-user system. A single user, networked workstation may have local disks or be diskless . Shared systems, similarly, may have local disks and also be augmented by shared fileservers. File servers are dedicated to this purpose.
6-2 What are the advantages and disadvantages of
providing filing system software inside and outside the operating system?
The filing system is large and self-contained. It is not needed in every component of some systems (e.g. diskless workstations with remote file service). A small kernel is easier to develop and maintain. Also, providing the filing system as a service above the OS allows substitution of alternative filing systems tuned to meet the needs of applications.
There is more overhead involved in invoking a user-level service (IPC -Vs-system call, non-privileged service must make calls for privileged actions).
6-3 Explain how a directory may be considered as
an abstract data type or object.
Although a user may own a directory he/she may not read or write it arbitrarily (even though we are told it is stored as a type of file). It must be used via the interface operations. The directory service functions as a directory type manager and maintains the integrity of the directory structure.
6-4 How might the modules and processes
introduced in Chapter 2 be used to implement a filing system? Chapters 4, 6 and
9 will give a full explanation, although enough ideas have been presented in
this chapter for a first attempt to be made.
At this stage we know that a number of clients may have requests in progress at the same time. Any one might be delayed for a DMA transfer. We have seen data structures in the filing system implementation that need to be read and/or written as part of satisfying a client request. We suggest a number of processes are able to provide the filing service function, each able to access the shared system data.
The modular structure at a coarse grain comprises a directory service and a file storage service. The latter may have a low level component which is a block storage service. At this level we may have a separate buffer management module. Below the storage service we have the disk device drivers.
6-5 Is a multi-level filing system such as a
tree structure essential or just convenient in a multi-user system? How might a
two-level system be used?
Many small or early systems have not had a full hierarchical structure.
CP/M has a single directory with the metadata on a file held in the file’s entry in this directory.
A two level arrangement could comprise a master file directory with an entry for each user’s user file directory (used in the early Tops10 OS for the Dec10).
The inconvenience of this kind of arrangement has become unnecessary as processor speed, memory and disk sizes have increased.
6-6 How can file sharing be supported in a
filing system? What additional problems over normal file sharing are introduced
if directory sharing is also allowed?
By access rights and by hard or symbolic links.
If hard links may be made to directories then existence control becomes more difficult than simply keeping a reference count with each file. The file material cannot be removed on a delete and we have to garbage collect periodically.
6-7 In what ways might a filing system provide
concurrency control; that is, control of the simultaneous use of a shared file
or directory? What are the advantages and disadvantages of the methods you
suggest?
Assume that there is an open operation. There may be a policy to enforce single writer, multiple reader exclusion on open. In this case, you specify that you want to open the file for reading only or for reading and writing. This could be made more general and flexible by allowing you to specify shared or exclusive mode.
Concurrency control therefore takes place on open and each subsequent request must be checked as in accordance with the mode of use specified on open. An alternative is that there might be a separate lock service.
6-8 What are the functions of the directory
service level and the file storage service level of a filing system?
DS: Maintaining the naming graph. Pathname resolution, access checking and possibly concurrency control checking on open. Handling any specific directory operation or any file operation with a pathname as argument (as opposed to a file "handle" or UFID for an open file). Existence control if this requires that the naming graph is traversed to detect unreachable objects.
FS: Managing storage allocation on disk. Allocation of disk blocks to files from free storage. Location of the disk blocks required to satisfy a given file-read or write request. Management of concurrency control - the FS maintains the data structures on open files.
6-9 File protection is achieved by a combination
of authentication and access control (authorisation).
Describe different kinds of access control policies that you might wish to
implement. Discuss how these policies could be implemented.
Self, group and rest is commonly used and easy to implement but crude.
One may wish to have: a list of individuals and groups (with recursive evaluation of names in groups); a group with certain members excluded or with different access rights, access by anyone who is running some program, and so on.
Multics allowed lists to be specified flexibly (with groups and exclusions), Titan allowed lists (with groups and exclusions)+ access by program. The UNIX set-uid is simple to implement and allows access by program, see Chapter 23.
Once access by program is possible, then arbitrarily fine-grained access controls can be enforced dynamically. The program may interrogate you or determine the time of day, location of the terminal etc.
An example from Titan:
rights: F(full) R(read) D(delete) W(write) C(copy) N(none)
4 categories specified for users: Owner, Colleague, Friend, Anyone.
A directory entry contains, e.g.
FRRN % general access rights for categories
NFNN if user=jmb % check if jmb is a Colleague, if so F rights
FWWW if program=P % rights if running program P
FFNN if key-quoted=X % a colleague has full rights
% if she quotes the password X
6-10 Contrast the access control list approach
to file protection with the capability approach.
Figure 2.20 gives a general context for discussing the two approaches. Instead of keeping a large, sparse access matrix, access to different types of object in a system may be managed either through ACL’s or through capabilities as appropriate.
A general and flexible ACL (see question 9) takes time to process.
Checking a capability is relatively efficient and constant in time.
Revoking traditional capabilities is difficult. Several solutions exist.
Could both methods be combined in a filing
system?
Yes. The DS interface could use an ACL approach on open. The FS interface (particularly if remote from the DS) could be capability based. The DS translates from a pathname to a capability for a file.
6-11 Sketch an implementation of symbolic links
held as separate (small) files. Outline the corresponding directory
implementation.
Directory should allow a type S (symbolic link) in addition to D (directory) and F (file) etc.
Directory entry for symbolic link might then be:
Sharers name for file -- S -- SFID
Where SFID is the system file identifier which gives access to metadata. The metadata points to a file containing the full pathname of the linked-to file.
This is done in some variants of UNIX. It is useful to have a small as well as large block size available.
6-12 In some systems a file must be opened
before it is used and the filing system maintains a context for open files;
other filing systems are "stateless". What are the advantages and
disadvantages of these approaches?
We assume a remote file server with many clients. If the server crashes the clients continue to work. A stateless server may offer a service immediately without trying to recover state by interrogating its clients.
A stateless file server cannot offer a concurrency control service. There is no notion of a file open for shared or exclusive use. It cannot assist in cache coherence if many clients have a file cached in their workstations.
6-13 It has been proposed that an immutable
filing system should be used. Once a file is named by the filing system it may
not be changed. A new file name must be generated if a change must be made to a
file. Directories are still to be mutable. What are the advantages and disadvantages
of this approach?
The advantage is that concurrent write access to a file version can go ahead without delay. The resulting updates will cause new versions to be created.
Immutability does not cause any problems to vanish. Concurrency problems re-emerge as version control problems. Can different versions be merged? How does one know whether one has the most recent version?
Note that if directories were also immutable we should create a new superior hierarchy every time we made an update to a file.
6-14 Filing systems often attempt to hold as
much current data and metadata as possible in main memory. The advantage is to
be able to satisfy clients’ read and write requests faster than if a disk had
to be read or written. What are the possible dangers of this approach? Which
application areas could not tolerate it?
There is no danger in holding a read cache. If you guess right you save a lot of time. The problem on holding a write cache in volatile main memory is that you might lose it all on a crash.
Some systems keep a read cache but not a write cache, others write metadata synchronously but delay writes of data, others use non-volatile RAM.
6-15 Take the file service operations given in
Section 6.4 and work out a modular design with file service interface
operations and nested modules for directory object management and file object
management.
To do a complete solution to this exercise would be a huge project
the design of a filing system. The idea is to sketch the start of an object oriented design and to show how the file service interface operations invoke lower level services and objects. Some obvious simplifying assumptions are that we make the service single threaded and ignore crash resilience i.e. allow operations to be non-atomic. Let us also assume that a byte sub-range is specified explicitly rather than relative to a pointer.
The client interface to the whole filing system contains the operations:
a) those with pathname arguments:
create directory create file
delete directory delete file
list directory contents open file
set access rights to file or directory
link
b) those with UFID arguments:
read file write file close file
The interface to a directory object manager contains:
create directory % and enter in naming graph
delete directory % and remove from naming graph
lookup name in directory % for pathname resolution, returns an SFID
add entry to directory % invoked on create file or directory
remove entry from directory % invoked on delete file or directory
The interface to a file object manager contains:
create file % (assume the DS has managed the naming graph)
% create a metadata table entry, possibly allocate some storage, return a SFID.
delete file % (assumption as on create), remove metadata table entry,
de-allocate storage.
read file % return data
write file % write and acknowledge
At this level, the file object manager maintains a metadata table and can convert from a file identifier to a metadata table entry. From the metadata table entry it may determine the required disk pages within a filing system.
A lower level block object manager may be invoked to give or take the specified bytes. It, in turn, may invoke a cache manager to determine whether the desired block is in main memory in the cache or whether a request must be made to the appropriate disk driver.
Students could be asked to trace the invocation path through these object managers for each system call.
A paper from 1969 gives a basic layered design in detail:
Madnick S L and Alsop J W "A Modular Approach to File System Design" Proc AFIPS 1969 SJCC 34:1-12, (reprinted in Freeman P "Software Systems Principles, A Survey" SRA 1975).
6.16 For each of the allocation algorithms
described in Section 7.5.2 work out the best case and worst case access costs
for a given block n of a file in terms of disk and memory reads.
6-17 Section 6.6 suggests that translation from
disk blocks within a logical volume to disk blocks upon a physical disk is
similar to translation within a virtual memory system. What are the limitations of this
analogy? Would it be useful for a disk
controller to provide the equivalent of a two-level page table?
The
translation from simple partitions to blocks on a physical disk is similar to
the translation of virtual memory segments using the scheme described in
Section 5.4. Beyond this, there are a
number of differences: (i) the analogy holds less
well for page-based memory translation in which adjacent virtual addresses may
be dispersed in physical memory – the access characteristics of hard drives
make it important to encourage long sequential read/write operations, (ii)
there is no analogy for RAID in the case of virtual memory. If hardware support is provided then this is
likely to be in the form of simple volume management, providing logical volumes
built from partitions. RAID is often
implemented in hardware.
6-18 A logical volume is held in RAID-5 format
over three physical disks. Describe the
accesses that will be made to those disks when a block in the logical volume is
updated. How will the number of accesses
involved scale if the number of physical disks is increased?
When a
single block is updated, the RAID controller must update the corresponding
parity block in addition to performing the actual update. Note that although the parity block must be
read and then re-written, the properties of XOR mean that this can be done without
reading the other blocks which share that parity block.