Spring 2021
You have become (relatively) familiar with FUSE through your “Hello FUSE” pseudo-filesystem implementation. So far, you have successfully created a simplified file system that contains a single in-memory file.
In this lab, we will implement FS behavior to support the creation, deletion, and modification of files and directories.
Through this lab, we will explore the challenges of managing persistent data in a file system.
After completing this lab (parts A and B), you will:
Developing a working file system is very hard. For that reason, the assignment is divided into two parts. In part I (this part), you will develop a lot of scaffolding and enough code for your filesystem to do something testable. In part II (the next part), you will complete the filesystem.
For this assignment, you are again strongly encouraged to work in a group. If your unique situation makes collaboration impossible, please let me know.
Like last lab, each student will be given a private repository under the Williams-CS GitHub organization, and only you will have read/write access to that repository. This repository will contain a copy of the starter code for the lab. You may commit and push code to this repository if that is helpful, but the main purpose of your private repository is so that you may write your own Eval.md (each group member must submit a private Eval.md).
Please submit the following group preference form by Wednesday at 9am Eastern.
I will create shared repositories for your team’s code development. You should be committing your code to this repository as you make progress. I highly recommend that you commit code early and often: it will only help because the teaching staff can view your code and easily answer questions using the GitHub interface.
Since your team will be submitting a single repository with shared code, teammates may collaborate without restriction. In fact, I strongly encourage you to pair program whenever possible (Zoom screen sharing makes this much easier to schedule). The code you will write for this lab is significantly longer than the code you wrote last lab, and it requires significantly more thought & planning before you can begin. So it is important that you completely understand and are involved in the design, and I suggest writing a “formal” design doc. But once you have decided upon your concrete design and created well-defined function specs, you should be able to implement functions independently based on that spec. In other words, spend time together planning before you divide and conquer. (Personally, I strongly prefer to pair program throughout—I find that to be the most time-efficient and rewarding way to write correct code. But I realize that not everyone shares this view.)
Tips for pair programming:
You may also discuss high-level questions with other classmates. High-level questions include:
union and why do they keep showing up in this assignment?
The rule of thumb is that you should never view any classmate’s code if they are not on your team, but you can (and are encouraged to!) work through ideas together.
Your assignment is to develop a “Reference FS”, which we will call ReFS. Your task for Part I is to:
By the end of Part I, your ReFS should support the following features:
getattr, access, readdir, and mkdir. (Note that at this point, it is not necessary to support file I/O, or even regular files. We are only concerned with directories in Part I.)
mkdir operation must allocate (set appropriate bits in your allocation bitmaps) and initialize (fill with valid contents) an inode in the inode table and a data block from the data area. (Your mkdir should also persist these structures and the associated allocation bitmap structures to complete the operation.)
#defined constant (this way you can easily change it later). Remember to watch out for the working-directory gotcha.
SIGKILL (ctrl-C does this if you run your FS with the -f flag).
O_DSYNC isn’t necessary, because the operating system will make sure your data reaches stable storage unless the entire OS crashes, which isn’t part of the expected testing plan!)
inode data structure definition, but you do not need to implement regular files yet.
rmdir, etc. in the next assignment part, so you are welcome to implement those things. However, they need not be tested in the current assignment.
Why this particular set of features? It’s the minimum set of necessary operations to have a filesystem where you can do something visible: create and list directories. You’ll find that you need to create quite a bit of scaffolding to get that far (in particular, the code that creates an initialized ReFS filesystem from scratch). We will discuss much of this scaffolding in conference.
The “completeness” test of your filesystem should be that you can:
ls -la returning reasonable results, including for "." and "..")
cd into the directories that you’ve created.
You are encouraged to write more comprehensive tests (we will in Part II), but testing these directory operations is harder to script than the “file” tests. For now, we just want to get through our implementation of the FS foundation.
When I refer to a “ReFS”-like filesystem, I mean the following:
inode structure. At a minimum, this should include:
For reference, my minimal implementation used a 4096-byte block size, so I used a #define symbolic constant to document the meaning, and then I used the following union to make sure my superblock is padded to the size of a filesystem block:
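A minimal sketch of such a padded union is shown here. The field names, the `BLOCK_SIZE` name, and the `superblock_fields` struct are all hypothetical; your own superblock will hold whatever your design requires. The idea is only that a `char` array member of exactly one block forces the union to that size.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SIZE 4096  /* bytes per filesystem block (assumed name) */

/* Hypothetical superblock contents; the real fields are a design decision. */
struct superblock_fields {
    uint64_t magic;            /* identifies the file as a ReFS image */
    uint64_t block_size;       /* should equal BLOCK_SIZE */
    uint64_t num_inodes;       /* capacity of the inode table */
    uint64_t num_data_blocks;  /* capacity of the data area */
};

/* The pad member makes the union occupy exactly one block, so the
 * superblock can be read and written as a single block-sized unit. */
union superblock {
    struct superblock_fields sb;
    char pad[BLOCK_SIZE];
};

/* Compile-time check that the padding works as intended. */
static_assert(sizeof(union superblock) == BLOCK_SIZE,
              "superblock must fill exactly one block");
```

Because the check runs at compile time, adding fields that push the struct past one block would be caught by the build rather than by a corrupted disk image.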
I also found it useful to create a few macros to do things like seeking to a particular block, converting back and forth between byte offsets and block numbers, checking/setting a bitmap bit, etc.
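As a rough illustration of these helpers, the sketch below converts between block numbers and byte offsets and manipulates bitmap bits. All names are hypothetical, and the bitmap helpers are written as `static inline` functions instead of macros so that arguments are evaluated only once; either form would work.

```c
#include <stdint.h>
#include <sys/types.h>

#define BLOCK_SIZE 4096  /* assumed block size */

/* Byte offset of the start of block b. */
#define BLOCK_TO_BYTES(b)  ((off_t)(b) * BLOCK_SIZE)
/* Number of blocks needed to hold n bytes (rounds up). */
#define BYTES_TO_BLOCKS(n) (((n) + BLOCK_SIZE - 1) / BLOCK_SIZE)

/* Returns 1 if bit i is set (allocated), 0 otherwise. */
static inline int bitmap_test(const uint8_t *map, uint64_t i) {
    return (map[i / 8] >> (i % 8)) & 1;
}

/* Marks bit i as allocated. */
static inline void bitmap_set(uint8_t *map, uint64_t i) {
    map[i / 8] |= (uint8_t)(1u << (i % 8));
}

/* Marks bit i as free. */
static inline void bitmap_clear(uint8_t *map, uint64_t i) {
    map[i / 8] &= (uint8_t)~(1u << (i % 8));
}

/* Returns the index of the first free bit among the first nbits,
 * or -1 if every bit is already set. */
static inline int64_t bitmap_first_free(const uint8_t *map, uint64_t nbits) {
    for (uint64_t i = 0; i < nbits; i++) {
        if (!bitmap_test(map, i)) {
            return (int64_t)i;
        }
    }
    return -1;
}
```

A linear scan is the simplest way to find a free inode or data block; it is slow for large bitmaps but easy to verify.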
Some of these “stubs” are included in your starter code. You should, of course, feel free to modify those function definitions and add your own helper functions as needed.
Note: You are supposed to be writing a real filesystem. The only differences from a true implementation of a ReFS should be:
pread() and pwrite() access your structures on disk.
pread()/pwrite() calls directly, you should use something like the read_block()/read_blocks() and write_block()/write_blocks() wrappers that were in the starter code
To mimic a real filesystem, your implementation must satisfy the following criteria:
read_block() and write_block() should help)
You should think carefully about your plan before you start. In this section, I hope to give some advice that will help you plan. Before you start, you should be able to answer:
refs_init()/refs_destroy()
refs_init() does a lot of work to initialize a file system. Please take some time to familiarize yourself with that function. I suggest walking through that function using gdb as we did in conference, both to “initialize” new file system state and to read in pre-existing file system state from refs_disk.
refs_destroy(). What steps that are taken in refs_init() need to be “finished”?
refs_destroy() is a good first step.
getattr()
getattr() is called frequently. This is the next function I would tackle. All of the information you need to populate the fields of a struct stat is (or can be made) available in the refs_inode, but you need to be able to “find” the target inode.
open(bar) example shows the steps to resolving a path. This is the most useful building block for the first part of this lab.
access()
getattr(), you have most of the infrastructure needed to implement access(). However, the starter code does not track file permissions. Adding these permissions is not difficult (add a mode_t mode field in the struct refs_inode, and read through man 7 inode for details about the symbolic constants like S_IFDIR, S_IRWXU, S_IRWXG, and S_IRWXO). However, for now, you may want to just assume your files/directories are as “open” as possible (i.e., all directories that exist can be assumed to have R/W/X permissions).
mkdir()
To create a directory, we need to:
readdir()
This is the most challenging function to implement because it requires using a FUSE function, called filler(), which does not have superb documentation. When implementing readdir(), a great resource is the HMC documentation.
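To make the role of filler() more concrete, here is a hedged sketch of the loop at the heart of a readdir() implementation. So that it compiles without FUSE, it declares a local `fill_dir_fn` type that mirrors the FUSE 2.x `fuse_fill_dir_t` typedef; real code would use the typedef from the FUSE headers. The `dir_entry` layout, the `in_use` flag, and the function names are hypothetical, and the sketch assumes that "." and ".." are stored as ordinary entries in the directory's data block.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Local stand-in mirroring the FUSE 2.x fuse_fill_dir_t typedef. */
typedef int (*fill_dir_fn)(void *buf, const char *name,
                           const struct stat *stbuf, off_t off);

/* Hypothetical directory entry as stored in a directory's data block. */
struct dir_entry {
    uint64_t inum;    /* inode number of the child */
    uint8_t in_use;   /* 0 marks an empty slot (an assumption of this sketch) */
    char name[47];    /* null-terminated file name */
};

/* Hands every live entry to filler. Passing a NULL stat pointer and an
 * offset of 0 asks FUSE to manage the listing itself. The filler reports
 * a full buffer with a nonzero return, at which point we stop. */
static int list_directory(void *buf, fill_dir_fn filler,
                          const struct dir_entry *entries, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (!entries[i].in_use) {
            continue;
        }
        if (filler(buf, entries[i].name, NULL, 0) != 0) {
            break;
        }
    }
    return 0;
}

/* Stand-in filler for testing outside FUSE: buf points at an int counter. */
static int counting_filler(void *buf, const char *name,
                           const struct stat *stbuf, off_t off) {
    (void)name;
    (void)stbuf;
    (void)off;
    ++*(int *)buf;
    return 0;
}
```

Writing the loop against a function pointer like this lets you exercise your directory-walking code with a stand-in filler before mounting the filesystem.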
The following C library functions may prove useful when implementing your ReFS functionality.
dirname() and basename()
strdup() to create a copy
strtok()
strtok_r()
strcmp(), strlen(), and strcpy(): we saw that these are not appropriate to use for arbitrary file data, but all paths in our system are null-terminated strings.
You may find yourself writing functions that return a value, such as an inode number or an address to a structure, that do not let you express an “error number” if your function fails. For example, if I wrote a function that returned the inode number for some child in a directory:
uint64_t get_child_inum(struct refs_inode *parent_dir, const char *child_path);
then I could not return, for example, -ENOENT if there was no child named child_path inside the directory with inode parent_dir (to see why, consider the value of -2 if it were interpreted as an unsigned integer… it would be very large).
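The problem can be seen in a few lines. The function name below is hypothetical, and the sketch is deliberately flawed to show what goes wrong when a negative error code is returned through an unsigned type.

```c
#include <errno.h>
#include <stdint.h>

/* Deliberately flawed: the negative errno value is converted to uint64_t,
 * so the caller receives a huge number that looks like a valid inode
 * number instead of an error. */
static uint64_t broken_lookup(void) {
    return (uint64_t)-ENOENT;
}
```

On Linux, where ENOENT is 2, the returned value is 18446744073709551614, and the caller has no reliable way to tell it apart from a real inode number.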
Instead, a useful strategy is to separate the mechanism for signaling success/failure from the mechanism for communicating our result:
int get_child_inum(struct refs_inode *parent_dir, const char *child_path, uint64_t *child_inum) {
    // ... search parent_dir for an entry named child_path (details omitted) ...

    // on success, return 0 after setting child_inum to the appropriate value:
    *child_inum = inum;
    return 0;

    // on failure, return the appropriate error code, and leave child_inum unchanged:
    return -ENOENT;
}
For an example of this, you can even look at the Linux source code for the function link_path_walk(), which is the main function that does pathname resolution. The result is communicated by updating the pointer to the structure struct nameidata *nd, and the status is communicated through the return value.
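The same separation of status and result carries over to a path-resolution helper. The sketch below is not the ReFS implementation: it replaces on-disk directories with a small hypothetical in-memory table (`toy_table`) so that it stands alone, and every name in it is made up for illustration. It walks a path one component at a time with strtok_r(), returning a status code and delivering the inode number through an output pointer.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ROOT_INUM 0  /* assumed inode number of the root directory */

/* Hypothetical flattened directory table standing in for on-disk data. */
struct toy_entry {
    uint64_t parent;   /* inode number of the containing directory */
    const char *name;  /* name of the child within that directory */
    uint64_t inum;     /* inode number of the child */
};

static const struct toy_entry toy_table[] = {
    {0, "home", 1},
    {1, "alice", 2},
    {0, "tmp", 3},
};

/* Status via return value, result via *child (unchanged on failure). */
static int toy_child_inum(uint64_t parent, const char *name, uint64_t *child) {
    size_t n = sizeof toy_table / sizeof toy_table[0];
    for (size_t i = 0; i < n; i++) {
        if (toy_table[i].parent == parent &&
            strcmp(toy_table[i].name, name) == 0) {
            *child = toy_table[i].inum;
            return 0;
        }
    }
    return -ENOENT;
}

/* Resolves an absolute path to an inode number. Returns 0 on success and
 * a negative errno value on failure; *inum_out is written only on success. */
static int resolve_path(const char *path, uint64_t *inum_out) {
    char *copy = strdup(path);  /* strtok_r modifies the string it scans */
    if (copy == NULL) {
        return -ENOMEM;
    }

    uint64_t cur = ROOT_INUM;
    int status = 0;
    char *save = NULL;
    for (char *part = strtok_r(copy, "/", &save); part != NULL;
         part = strtok_r(NULL, "/", &save)) {
        status = toy_child_inum(cur, part, &cur);
        if (status != 0) {
            break;
        }
    }

    free(copy);
    if (status == 0) {
        *inum_out = cur;
    }
    return status;
}
```

With this table, resolving "/home/alice" yields inode 2 with status 0, while "/home/bob" yields -ENOENT and leaves the caller's variable untouched. A real version would read each directory's data block instead of consulting a table.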
Submit your code (it should be inside a single file named ReFS.c) to your git repository. If you implement any additional features, be sure to mention them (prominently?) in your README.md file so that I see them.
I would like everyone to use a “new” feature when submitting this lab: git tags. A “tag” is essentially a label for a specific commit. You can create a tag from the command line (man git-tag for more details):
git tag -a partI -m "done part 1"
This creates a name for the current commit (the name is partI), and attaches the message “done part 1” to that tag. This feature may help you to see what has changed since you’ve completed your directory operations by using git diff.
When you have completed your Lab 3a, please do two things:
Since you will continue to work on code in the same repository for Part II, the tag will make sure that I test and give feedback on the correct point in your lab.
This lab borrows heavily from an assignment created by Geoff Kuenning.