Spring 2021
Before diving into this unit, it is helpful to refresh your understanding of these topics that were covered in CSCI 237:
If we focus on Linux, there are many system calls (~300), but only some of them specifically relate to the storage subsystem. The system calls discussed in OSTEP Chapter 39 are enumerated below, and they are the ones that you will likely use most in this class. You should be familiar with all of them: both how to use them (or how to look up their usage using Unix man pages) and how they affect the state of the storage system’s key data structures. Memorization is much less important than thinking about why the interface is designed the way it is.
open/creat
close
read
write
lseek
pread/pwrite
fsync
rename
stat/fstat
unlink
mkdir
readdir
rmdir
mount
umount
There are multiple ways to refer to files, and each identifier has its own advantages and disadvantages. You should be familiar with the types of identifiers that are passed to each FS-related system call, and why that particular identifier is used.
What logical object does each of the following types of identifiers refer to?
Each process contains a private table that maps file descriptors (per-process integer identifiers) to file data structures. So we should try to be precise when we use the term file: colloquially, file has a meaning that is similar-to-but-very-different-from the file data structure that is part of the file system API in the OS. Unfortunately, always being precise is difficult, so context should help determine what is meant by the word file.
Directories do not store “data” in the typical sense. Directories are a particular type of file that contains a mapping from names (the individual components of a pathname) to inode numbers.
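The name-to-inode mapping described above can be inspected from user space. The following is a minimal sketch, assuming a POSIX system and using Python's os module as a thin wrapper over the underlying calls; the helper name list_entries and the file names are hypothetical, chosen only for illustration.

```python
import os
import tempfile


def list_entries(dirpath):
    """Return a {name: inode number} dict for the entries stored in dirpath."""
    with os.scandir(dirpath) as it:
        return {entry.name: entry.inode() for entry in it}


# Build a scratch directory with two files and print its name -> inode mapping.
with tempfile.TemporaryDirectory() as d:
    for name in ("a.txt", "b.txt"):
        with open(os.path.join(d, name), "w") as f:
            f.write(name)
    entries = list_entries(d)
    for name, ino in sorted(entries.items()):
        print(f"{name} -> inode {ino}")
```

Note that os.scandir omits the two special entries that every directory contains, so they do not appear in this listing.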
Every directory contains the entries “.” and “..”. What are these files and what is their purpose?
rmdir deletes a directory. What are the necessary preconditions for the successful deletion of a directory? Why do you think this decision was made?
If the process has sufficient permissions to access a file, the open() system call creates an entry in the process’s file table so that the process can interact with the file. The file descriptor returned by open is an index into that table. The same “colloquial file” can be opened multiple times, and each file structure instance keeps helpful state about the process’s interactions with that “colloquial file”, including a current offset and the “access mode” (e.g., read-only).
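To make the per-open state concrete, here is a small sketch, assuming a POSIX system and Python's os module, that opens the same file twice and shows that each descriptor tracks its own offset. The function name independent_offsets and the file name are hypothetical.

```python
import os
import tempfile


def independent_offsets(path):
    """Open path twice, advance one descriptor, then read 5 bytes from each."""
    fd1 = os.open(path, os.O_RDONLY)
    fd2 = os.open(path, os.O_RDONLY)
    try:
        os.read(fd1, 5)  # advances only the offset behind fd1
        return os.read(fd1, 5), os.read(fd2, 5)
    finally:
        os.close(fd1)
        os.close(fd2)


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "demo.txt")
    with open(path, "wb") as f:
        f.write(b"helloworld")
    first, second = independent_offsets(path)
    print(first, second)  # prints: b'world' b'hello'
```

The second descriptor starts at offset 0 even though the first has already consumed five bytes, because each call to open creates a separate file structure.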
[…] open?
What does open return?
[…] open accept/return an inode number?
What is the relationship between open and creat? Given this relationship, how might you implement creat?
Indirection is a very powerful tool. There are two useful types of indirection provided by links.
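One way to explore the two kinds of links is to create one of each and watch the target's metadata. The sketch below assumes a POSIX system and a file system that supports both hard and symbolic links; the helper name link_counts and the file names are hypothetical. It reports values rather than interpreting them, so run it and compare the output with your own predictions.

```python
import os
import tempfile


def link_counts(dirpath):
    """Create a hard link and a symbolic link to one file and report what changes."""
    target = os.path.join(dirpath, "target.txt")
    hard = os.path.join(dirpath, "hard.txt")
    soft = os.path.join(dirpath, "soft.txt")
    with open(target, "w") as f:
        f.write("data")

    result = {"initial": os.stat(target).st_nlink}
    os.link(target, hard)       # hard link: a second name for the same inode
    result["after_hard_link"] = os.stat(target).st_nlink
    os.symlink(target, soft)    # symbolic link: a separate file holding a pathname
    result["after_symlink"] = os.stat(target).st_nlink

    # stat follows symbolic links; lstat examines the link itself.
    result["hard_link_shares_inode"] = os.stat(hard).st_ino == os.stat(target).st_ino
    result["symlink_has_own_inode"] = os.lstat(soft).st_ino != os.stat(target).st_ino
    return result


with tempfile.TemporaryDirectory() as d:
    for key, value in link_counts(d).items():
        print(f"{key}: {value}")
```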
[…] inode’s reference count?
[…] inode’s reference count?
[…] which python at the command line, and then check the long listing (ls -l) of the resulting pathname)
Directories give the file system namespace a hierarchical structure. This gives a powerful way to express relationships between objects. These relationships are often used by file systems to influence their low-level data placement/organization policies.
We use the mount and umount system calls to create a unified namespace from a set of independent file systems: mount takes the root of a file system and attaches it to a directory in the global namespace. Unmounting a file system disconnects the root of that file system from the namespace, making that file system’s directory tree unreachable.
Caching is an important tool for improving file system performance. Yet caching data exposes the system to data loss: what if the machine crashes before the cached data is written? It is important that applications have a way to enforce reliable guarantees so that they can protect themselves from corruption.
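As an illustration of how an application can request durability, the sketch below writes a buffer and then calls fsync before closing the descriptor. It assumes a POSIX system and Python's os module; the function name durable_write and the file name are hypothetical, and this is one possible pattern rather than a complete recipe.

```python
import os
import tempfile


def durable_write(path, data):
    """Write data to path, then ask the OS to flush it to stable storage."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        while written < len(data):
            # write may accept fewer bytes than requested, so loop until done
            written += os.write(fd, data[written:])
        os.fsync(fd)  # returns only after the OS reports the data as flushed
    finally:
        os.close(fd)
    return written


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "journal.txt")
    n = durable_write(path, b"important record\n")
    print(f"{n} bytes written and flushed")
```

For a newly created file, OSTEP notes that the containing directory may also need to be synced before the new name itself is guaranteed to survive a crash; that step is omitted here.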
[…] fsync (i.e., can you describe possible initial and final states of a file before/after calling fsync)?
Why do we need fsync at all? Why not immediately persist all data?
Without fsync, what guarantees does the write system call actually provide?
[…] fsync and proper care is not taken in the application/file system implementations? What types of guarantees might be desirable for a system to support? Why are they not standard guarantees?
Renaming a file is a seemingly simple task. Yet the deeper you dive into the rename system call, the more interesting it becomes. rename is the first time we encounter the concept of atomicity. An atomic operation is one that either happens completely and all at once, or it does not happen at all. In an atomic operation, no intermediate state is ever revealed. For rename, what that means is that the file either exists at its original location or at its new location: it is never in both places and it is never “gone”.
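The before and after states of a rename can be observed directly. The following sketch assumes a POSIX system with both names on the same file system; the helper name rename_keeps_inode and the file names are hypothetical. A single process cannot demonstrate atomicity itself, so this only shows the two end states: the old name disappears, the new name appears, and the inode behind the name is the same one.

```python
import os
import tempfile


def rename_keeps_inode(dirpath):
    """Rename a file; return (old name exists, new name exists, same inode)."""
    old = os.path.join(dirpath, "old.txt")
    new = os.path.join(dirpath, "new.txt")
    with open(old, "w") as f:
        f.write("contents")
    ino_before = os.stat(old).st_ino
    os.rename(old, new)  # changes the directory entry, not the file's data
    return (
        os.path.exists(old),
        os.path.exists(new),
        os.stat(new).st_ino == ino_before,
    )


with tempfile.TemporaryDirectory() as d:
    result = rename_keeps_inode(d)
    print(result)  # prints: (False, True, True)
```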
[…] rename?
[…] rename can fail?
[…] rename failure?
Because of rename atomicity, rename is often used to update files. What combination of system calls could you use to perform a series of file modifications so that either all of your modifications are reflected in the final state of the file, or none of your modifications are reflected in the final state of the file?
The lseek system call updates a file data structure’s internal offset. This is useful for issuing non-sequential reads and writes (commonly referred to as random reads and writes, even when the operations are not random in the mathematical sense).
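A short experiment shows how the offset behaves under the different read calls. This sketch assumes a POSIX system and Python's os module (os.pread is available on Unix); the function name offsets_demo and the file contents are hypothetical.

```python
import os
import tempfile


def offsets_demo(path):
    """Seek, read, and pread on one descriptor; return both reads and the final offset."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.lseek(fd, 6, os.SEEK_SET)        # move the offset to byte 6
        a = os.read(fd, 3)                  # reads bytes 6..8 and advances the offset to 9
        b = os.pread(fd, 3, 0)              # reads bytes 0..2 at an explicit position
        pos = os.lseek(fd, 0, os.SEEK_CUR)  # query the current offset without moving it
        return a, b, pos
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "digits.txt")
    with open(path, "wb") as f:
        f.write(b"0123456789ab")
    a, b, pos = offsets_demo(path)
    print(a, b, pos)  # prints: b'678' b'012' 9
```

The final offset is 9 rather than 3, which shows that the pread call did not disturb the offset left behind by the earlier read.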
Does lseek modify any persistent file state?
[…] lseek, read, pread, write, and pwrite?