Additional exercises on filing systems

6-17 We have seen that metadata is held for each file and that the location of the blocks of the file on disk may be found via the metadata. Discuss the following alternative ways in which storage allocation might be organised and recorded. Discuss the suitability of the methods for various types of filing system.

a) Chaining: The metadata contains a pointer to the first block of the file. That block contains a pointer to the next, and so on. In addition, each block may identify itself as block x of file y to aid crash recovery.

b) The metadata contains a table of pointers to blocks of the file. To prevent the table becoming too large for large files, only the first few pointers point directly to blocks of the file. After that come pointers to blocks which contain pointers to blocks of the file and so on. An example is given in Section 24.6 and exercise 24-3.

c) The metadata contains an "extent list" for the file. An extent records the start-block and end-block of a contiguously allocated range.

d) A separate "file map" data structure is maintained for each filing system. The map contains an entry for each block of the range of blocks the filing system may allocate. Each file is represented by a chain through the file map. The metadata points to the first entry for that file (the first block of the file). The rest may be found by following the chain through the map.

Chaining has been used in PC systems such as MS-DOS and in single-user workstations such as Pilot. It is acceptable for sequential access only: to reach block n of a file, the n blocks before it must be read so that their pointers can be followed.
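
A minimal sketch in C of chained allocation along the lines of part (a); the block size, field names and the read_block call are illustrative assumptions, not a description of any particular system.

    #include <stdint.h>

    #define BLOCK_SIZE   512
    #define PAYLOAD_SIZE (BLOCK_SIZE - 3 * sizeof(uint32_t))
    #define NO_BLOCK     0xFFFFFFFFu   /* end-of-chain marker */

    struct chained_block {
        uint32_t next;                 /* disk address of the next block, or NO_BLOCK */
        uint32_t file_id;              /* "block x of file y": y, for crash recovery  */
        uint32_t seq_no;               /*                      x                      */
        uint8_t  data[PAYLOAD_SIZE];
    };

    extern void read_block(uint32_t addr, struct chained_block *buf);  /* assumed disk-driver call */

    /* Reaching logical block n means reading the n blocks before it, which is
     * why chaining suits sequential access but is poor for random access. */
    uint32_t find_block(uint32_t first, uint32_t n)
    {
        struct chained_block buf;
        uint32_t addr = first;
        while (n-- > 0 && addr != NO_BLOCK) {
            read_block(addr, &buf);
            addr = buf.next;
        }
        return addr;                   /* NO_BLOCK if the file has fewer than n+1 blocks */
    }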

Tables of pointers with some direct and some indirect entries are suitable for environments with a large number of small files and some large files, e.g. campus systems. They are not especially suitable for environments where files are usually large, e.g. DBMS or multimedia systems (with large voice or video files), where equally rapid access is required to all parts of a file.
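
A minimal sketch in C of metadata along the lines of part (b), with a few direct pointers and one single-indirect pointer; the counts, sizes and the read_block call are illustrative assumptions.

    #include <stdint.h>

    #define N_DIRECT       10
    #define BLOCK_SIZE     4096
    #define PTRS_PER_BLOCK (BLOCK_SIZE / sizeof(uint32_t))

    struct file_metadata {
        uint32_t direct[N_DIRECT];   /* pointers to the first N_DIRECT blocks of the file   */
        uint32_t single_indirect;    /* block holding PTRS_PER_BLOCK further block pointers */
        /* a real design would add double and triple indirect pointers here */
    };

    extern void read_block(uint32_t addr, void *buf);   /* assumed disk-driver call */

    /* Map logical block n to a disk address. Small files need no extra disk
     * reads; larger files cost one read of the indirect block, so access to
     * the later parts of a big file is slower than access to its start. */
    uint32_t map_block(const struct file_metadata *md, uint32_t n)
    {
        if (n < N_DIRECT)
            return md->direct[n];

        uint32_t ptrs[PTRS_PER_BLOCK];
        read_block(md->single_indirect, ptrs);
        return ptrs[n - N_DIRECT];      /* assumes n fits within the single-indirect range */
    }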

A disadvantage of extent-based allocation is its visibility to the user. There is likely to be a maximum number of extents imposed by the metadata design; above this number the system may have to consolidate extents by copying. It is a good method for systems where large and small files coexist and very large files are likely to be allocated contiguously.
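
A minimal sketch in C of an extent list as in part (c); the fixed maximum of 12 extents stands in for the metadata-imposed limit mentioned above, and all names are illustrative assumptions.

    #include <stdint.h>

    #define MAX_EXTENTS 12               /* the limit imposed by the metadata design */
    #define NO_BLOCK    0xFFFFFFFFu

    struct extent {
        uint32_t start_block;            /* first disk block of the contiguous range */
        uint32_t end_block;              /* last disk block of the range (inclusive) */
    };

    struct extent_list {
        uint32_t      n_extents;
        struct extent ext[MAX_EXTENTS];
    };

    /* Map logical block n to a disk address by walking the short extent list.
     * Everything needed is in the metadata, so any part of the file can be
     * reached without further disk reads -- good for large, contiguously
     * allocated files. */
    uint32_t map_block(const struct extent_list *el, uint32_t n)
    {
        for (uint32_t i = 0; i < el->n_extents; i++) {
            uint32_t len = el->ext[i].end_block - el->ext[i].start_block + 1;
            if (n < len)
                return el->ext[i].start_block + n;
            n -= len;
        }
        return NO_BLOCK;                 /* n is beyond the end of the file */
    }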

A file map was used in the Titan OS. The method fell out of use as filing systems grew too large for the map to be held comfortably in main memory. It might still be suitable for systems with a single large block size.
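
A minimal sketch in C of a file map as in part (d): one entry per allocatable block, each entry naming the next block of the same file. The sizes and markers are illustrative assumptions.

    #include <stdint.h>

    #define N_BLOCKS  65536              /* blocks this filing system may allocate */
    #define END_CHAIN 0xFFFFFFFFu        /* last block of a file                   */
    #define FREE      0xFFFFFFFEu        /* block not allocated to any file        */

    /* The whole map (here 65536 entries, 256 Kbytes) must fit comfortably in
     * main memory, which is why the method suits small filing systems and a
     * single, large block size. */
    static uint32_t file_map[N_BLOCKS];

    /* Follow the chain from the first block (recorded in the metadata) to
     * logical block n; unlike chaining through the blocks themselves, the
     * traversal needs no disk reads. */
    uint32_t map_block(uint32_t first, uint32_t n)
    {
        uint32_t b = first;
        while (n-- > 0 && b != END_CHAIN)
            b = file_map[b];
        return b;                        /* END_CHAIN if the file is too short */
    }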

6-18 The choice of block size might also be discussed. Is there a single block size for the whole system, or might it vary with the application domain of the system or with the file type? A large block size gives an efficient unit of transfer but wastes storage if there are many small files, such as those containing symbolic links.

For example, the UNIX BSD 4.1 file system used only a 1 Kbyte block size. BSD 4.2 used two sizes: a block size and a fragment size. All the blocks of a file except the last are of some size fixed at configuration time, depending on the intended use of the system, say 4 Kbytes. The last block may be of some minimum size, say 512 bytes, or a multiple of this minimum up to the standard block size. This can lead to a great deal of copying when a file is written incrementally from the beginning, since the data in the final partial block may be recopied each time it grows from one fragment size to the next.
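
A small worked sketch in C, using the 4 Kbyte block and 512-byte fragment sizes quoted above, of how a file's length translates into full blocks plus a final run of fragments; the function name is hypothetical.

    #include <stdio.h>

    #define BLOCK_SIZE    4096
    #define FRAGMENT_SIZE 512

    /* Every block of the file except the last is a full block; the tail is
     * rounded up to a whole number of fragments. As the file grows, the tail
     * may have to be recopied into a larger run of fragments or a full block. */
    static void layout(unsigned long file_bytes)
    {
        unsigned long full_blocks = file_bytes / BLOCK_SIZE;
        unsigned long tail        = file_bytes % BLOCK_SIZE;
        unsigned long fragments   = (tail + FRAGMENT_SIZE - 1) / FRAGMENT_SIZE;

        printf("%6lu bytes -> %lu full blocks + %lu fragments\n",
               file_bytes, full_blocks, fragments);
    }

    int main(void)
    {
        layout(100);       /* 0 full blocks + 1 fragment                      */
        layout(5000);      /* 1 full block  + 2 fragments                     */
        layout(20000);     /* 4 full blocks + 8 fragments (i.e. a full block) */
        return 0;
    }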

6-19 The general topic of measuring file system performance and organising disk storage to achieve high performance is not discussed in the main text. Performance optimisation can be taken as a specialist topic at a later stage.

A suitable topic at this stage is to discuss the idea of partitioning the disk into cylinder groups and partitioning the filing system so that the metadata and data for a file are allocated to the same cylinder group. The idea is used in Berkeley UNIX and aims to minimise long seeks.
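
A minimal sketch in C of the allocation policy: place a file's data blocks in the cylinder group that holds its metadata, spilling into a nearby group only when the home group is full. The group count, structures and the take_free_block_in call are illustrative assumptions.

    #include <stdint.h>

    #define N_GROUPS 32
    #define NO_BLOCK 0xFFFFFFFFu

    struct cylinder_group {
        uint32_t free_blocks;            /* free data blocks left in this group */
        uint32_t first_block;            /* disk range covered by the group     */
        uint32_t last_block;
    };

    static struct cylinder_group groups[N_GROUPS];

    extern uint32_t take_free_block_in(struct cylinder_group *cg);   /* assumed */

    /* Allocate a data block for a file whose metadata lives in group 'home'.
     * Keeping data in the home group keeps it close to the metadata on disk
     * and so avoids long seeks between the two. */
    uint32_t alloc_near(uint32_t home)
    {
        if (groups[home].free_blocks > 0)
            return take_free_block_in(&groups[home]);

        for (uint32_t d = 1; d < N_GROUPS; d++) {     /* nearest group with space */
            if (home >= d && groups[home - d].free_blocks > 0)
                return take_free_block_in(&groups[home - d]);
            if (home + d < N_GROUPS && groups[home + d].free_blocks > 0)
                return take_free_block_in(&groups[home + d]);
        }
        return NO_BLOCK;                 /* filing system is full */
    }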

We mention the likelihood of an I/O bottleneck as processor speeds continue to increase. As memory size increases, clients will be able to cache more files and write operations will dominate reads at file servers.

A popular approach is to organise writes to disk in a log structure; that is, instead of moving the heads to update files in place, all updates are appended to a log. The updates may be consolidated with the corresponding files from the log at times of light load.
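
A minimal sketch in C of the append-to-a-log idea: each update is written at the current end of the log rather than to the file's home location. The record layout and the write_block call are illustrative assumptions; a real design would also log the metadata needed to find the data again.

    #include <stdint.h>
    #include <string.h>

    #define BLOCK_SIZE 4096
    #define PAYLOAD    (BLOCK_SIZE - 2 * sizeof(uint32_t))

    struct log_record {                  /* sized to occupy one disk block */
        uint32_t file_id;                /* which file this update belongs to */
        uint32_t logical_block;          /* which block of that file          */
        uint8_t  data[PAYLOAD];
    };

    extern void write_block(uint32_t disk_addr, const void *buf);    /* assumed */

    static uint32_t log_head;            /* next free block at the end of the log */

    /* Append one update. The heads stay near the end of the log instead of
     * seeking to each file's blocks; the updates are folded back into the
     * files later, at times of light load, as described above. */
    uint32_t log_write(uint32_t file_id, uint32_t logical_block, const void *data)
    {
        struct log_record rec;
        rec.file_id       = file_id;
        rec.logical_block = logical_block;
        memcpy(rec.data, data, PAYLOAD);

        uint32_t where = log_head++;
        write_block(where, &rec);
        return where;                    /* where this version of the block now lives */
    }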

A similar approach is to locate the function of acquiring free blocks for writes at the lowest level in the filing system. It can be arranged that free blocks are scattered uniformly across the disk, so that a free block near the heads can be acquired whenever one is needed. [Private communication from John Wilkes of HP Research Laboratories, Palo Alto].

Proc. USENIX workshop on File Systems, Ann Arbor, Michigan, 21-22 May 1992 is suitable for browsing for issues of current interest in file system design.