Assignment 3a: Directories

CSCI 333 : Storage Systems

Spring 2021

Learning Objectives

You have become (relatively) familiar with FUSE through your “Hello FUSE” pseudo-filesystem implementation. So far, you have successfully created a simplified file system that contains a single in-memory file.

In this lab, we will implement FS behavior to support the creation, deletion, and modification of files and directories.

Through this lab, we will explore the challenges of managing persistent data in a file system.

After completing this lab (parts A and B), you will:

be able to describe the on-disk format for a functional file system
be able to map logical file system abstractions onto a physical address space
create C structures that represent the key components of file system metadata, including the superblock, inode, and allocation structures
implement pathname resolution
implement logic to enforce file permissions
implement logic to list the contents of directories
implement logic to create and delete directories

Overview

Developing a working file system is very hard. For that reason, the assignment is divided into two parts. In part I (this part), you will develop a lot of scaffolding and enough code for your filesystem to do something testable. In part II (the next part), you will complete the filesystem.

Assignment Logistics

Repositories & Starter Code

For this assignment, you are again strongly encouraged to work in a group. If your unique situation makes collaboration impossible, please let me know.

Like last lab, each student will be given a private repository under the Williams-CS GitHub organization, and only you will have read/write access to that repository. This repository will contain a copy of the starter code for the lab. You may commit and push code to this repository if that is helpful, but the main purpose of your private repository is so that you may write your own Eval.md (each group member must submit a private Eval.md).

Please submit the following group preference form by Wednesday at 9am Eastern.

I will create shared repositories for your team’s code development. You should be committing your code to this repository as you make progress. I highly recommend that you commit code early and often: it will only help because the teaching staff can view your code and easily answer questions using the GitHub interface.

Collaboration

Since your team will be submitting a single repository with shared code, teammates may collaborate without restriction. In fact, I strongly encourage you to pair program whenever possible (Zoom screen sharing makes this much easier to schedule). The code you will write for this lab is significantly longer than the code you wrote last lab, and it requires significantly more thought & planning before you can begin. So it is important that you completely understand and are involved in the design, and I suggest writing a “formal” design doc. But once you have decided upon your concrete design and created well-defined function specs, you should be able to implement functions independently based on that spec. In other words, spend time together planning before you divide and conquer. (Personally, I strongly prefer to pair program throughout—I find that to be the most time-efficient and rewarding way to write correct code. But I realize that not everyone shares this view.)

Tips for pair programming:

Discuss together and agree on a plan before you begin typing
- What piece of functionality will you attempt next?
- What functions/structs do you need to implement that functionality?
- Who will perform what role for the upcoming task?
- How will you incrementally test the functionality?
Swap roles after every “unit” of functionality (function/task):
- talker(s)
- typer
When debugging using GDB, the talker should advise the typer on what to do. The typer should never execute commands unilaterally; talk it out.

You may also discuss high-level questions with other classmates. High level questions include:

The names of useful C functions that help with particular tasks
Questions and ideas about the parameters of your on-disk format. e.g.,
- How did you decide how many inodes are in your inode table?
- How did you go about encoding variable-sized pathnames in your directory?
- What order does it make sense to write the data/data structures to disk?
- etc.
Questions about utility functions and scaffolding. e.g.,
- How did you decide which disk block contains a given inode number?
- Do you have a particular order that you allocate new data blocks in your data bitmap? Why or why not?
- How did you ensure that no inode straddles a disk block boundary?
Clarifications about the FUSE or file system API
Clarifications about the assignment
Strategies for testing
Any questions about C, Make, gcc, GDB, bash or your programming environment. e.g.,
- What is a union and why do they keep showing up in this assignment?

The rule of thumb is that you should never view any classmate’s code if they are not on your team, but you can (and are encouraged to!) work through ideas together.

The Assignment Spec

Your assignment is to develop a “Reference FS”, which we will call ReFS. Your task for Part I is to:

define the “on-disk format” for ReFS
implement support for creating and navigating directories

Required Functionality

By the end of Part I, your ReFS should support the following features:

The general structure of the filesystem is similar to the ReFS Design described below.
The filesystem supports the following operations, at a minimum: getattr, access, readdir, and mkdir. (Note that at this point, it is not necessary to support file I/O, or even regular files. We are only concerned with directories in Part I.)
- Your mkdir operation must allocate (set appropriate bits in your allocation bitmaps) and initialize (fill with valid contents) an inode in the inode table and a data block from the data area. (Your mkdir should also persist these structures and the associated allocation bitmap structures to complete the operation.)
Your filesystem should be backed by a SINGLE pre-allocated 10-MB file. That file should have a fixed name and exist on a non-FUSE file system. For example, you may have a file named “ReFS_disk” in your project directory (i.e., it lives outside your FUSE file system). The size of the file should be a #defined constant (this way you can easily change it later). Remember to watch out for the working-directory gotcha.
When the filesystem is invoked, if the backing file doesn’t exist, the backing file should be created and initialized (you probably have a fixed layout of your disk that divides it into a superblock, bitmap(s), an inode region, and a data region, so be sure to write the initial state of your metadata structures to their appropriate regions on disk so that your file system is usable). However, if the backing file does exist, the backing file should be used and its previous contents should be visible (i.e., populate your in-memory data structures by reading the contents of the on-disk versions that you have persisted)
When a mutating operation occurs, that operation’s effects must be immediately visible in the backing file. (This means that you can’t do everything in memory and then wait until exit time to write file system state out. You can test this feature by killing your process with SIGKILL, for example by running kill -9 on its PID from another terminal while your FS runs with the -f flag. Note that ctrl-C sends SIGINT, not SIGKILL. O_DSYNC isn’t necessary, because the operating system will make sure your data reaches stable storage unless the entire OS crashes, which isn’t part of the expected testing plan!)
Subdirectories must be supported (i.e., you can create a directory inside another directory)
Your directories may be fixed-size; it is not necessary to be able to create an arbitrary number of entries in a directory. If you do have a limit, you should document the limit somewhere.
Directory entries (paths) may also be fixed-size, as long as the name length is moderately reasonable. (Nothing under 16 characters is “reasonable” in my book; my minimal implementation compromises with a limit of 32.) Recall that a path is broken into many components; you traverse the file system one component at a time. So imposing a path-length limit of 32 characters limits the length of any one component, not the length of the entire absolute path.
If you choose, file sizes may be limited to either 2³² or 2⁶⁴ bytes (you should think about this now because it affects your inode data structure definition, but you do not need to implement regular files yet).
Other operations are up to you. We will be extending the filesystem to support files, rmdir, etc. in the next assignment part, so you are welcome to implement those things. However, they need not be tested in the current assignment.

Why this particular set of features? It’s the minimum set of necessary operations to have a filesystem where you can do something visible: create and list directories. You’ll find that you need to create quite a bit of scaffolding to get that far (in particular, the code that creates an initialized ReFS filesystem from scratch). We will discuss much of this scaffolding in conference.
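The create-or-reuse behavior required of the backing file can be sketched as follows. This is only a minimal sketch: the function name open_backing_file(), the DISK_SIZE constant, and the created flag are assumptions for illustration and are not part of the starter code.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations such as ftruncate() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define DISK_SIZE (10 * 1024 * 1024)   /* assumed size of the backing file, in bytes */

/*
 * Hypothetical helper: open the backing file at `path`, creating it at
 * DISK_SIZE bytes if it does not exist yet.
 * Returns a file descriptor, or -errno on failure.
 * *created is 1 when a fresh file was made (the caller must then write the
 * initial superblock, bitmaps, and root directory), and 0 when an existing
 * file was opened (the caller must read those structures back instead).
 */
int open_backing_file(const char *path, int *created) {
    assert(path != NULL && created != NULL);

    /* O_EXCL makes creation fail with EEXIST if the file is already there. */
    int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd >= 0) {
        /* New file: extend it to its full size; the new bytes read as zero. */
        if (ftruncate(fd, DISK_SIZE) < 0) {
            int err = errno;
            close(fd);
            unlink(path);
            return -err;
        }
        *created = 1;
        return fd;
    }
    if (errno != EEXIST)
        return -errno;

    /* Existing file: reuse it and keep its previous contents. */
    fd = open(path, O_RDWR);
    if (fd < 0)
        return -errno;
    *created = 0;
    return fd;
}
```

Passing an absolute path avoids depending on the process’s working directory.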

Testing ReFS

The “completeness” test of your filesystem should be that you can:

initialize an empty file system
create directories
list the contents of directories (with ls -la returning reasonable results, including for "." and "..")
cd into the directories that you’ve created.
unmount your file system
view identical contents when you remount your file system (i.e., your data is persistent)

You are encouraged to write more comprehensive tests (we will in Part II), but testing these directory operations is harder to script than the “file” tests. For now, we just want to get through our implementation of the FS foundation.

ReFS Design

When I refer to a “ReFS”-like filesystem, I mean the following:

The first sector stores the superblock, which describes the filesystem-level details, including the locations of important file system structures like the bitmaps, inode region, and data region.
The inode bitmap should have one bit that represents the allocation status of each inode in your inode table.
The data bitmap should have one bit that represents the allocation status of each available data block in your data region.
The inode table should be a logical array of inode structures on disk. You should pad your structures so that no inode structure straddles a disk block boundary.
The data region should occupy whatever leftover space is on your disk.
The on-disk copy of the superblock, bitmaps, and inode table can be read at mount (file system initialization) time and are updated at your discretion (but note that your process might be killed at any time, so think about the dependencies between structures!).
All file metadata is kept in an inode structure. At a minimum, this should include:
- the file type (directory or regular file)
- the size (in bytes)
- the inode number
- the locations of the first few data blocks. (Subsequent blocks are located via an indirect block. You do not need to support doubly or triply indirect blocks).
- Other metadata, such as ownership, permissions, and timestamps, are up to you but are not required.
The block size should be 4096 bytes
Like any other file system, the on-disk data structures are stored in a single partition (for us, a 10-MiB file on our local file system) and are kept in binary (meaning the raw data structures are written; numeric values in your structures aren’t converted into ASCII numbers).
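One possible shape for an inode that satisfies the list above is sketched here. The field names, the choice of 10 direct pointers, and the 128-byte padded slot are all assumptions made for illustration; your own layout may reasonably differ.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SIZE 4096
#define NUM_DIRECT 10    /* assumed number of direct block pointers */
#define INODE_SIZE 128   /* assumed padded inode size; must divide BLOCK_SIZE */

/* Hypothetical inode fields, following the minimum list in the spec. */
struct sketch_inode {
    uint32_t type;                 /* assumed encoding: 0 = directory, 1 = regular file */
    uint32_t unused;               /* explicit padding to keep 8-byte alignment */
    uint64_t size;                 /* size in bytes */
    uint64_t inum;                 /* inode number */
    uint64_t direct[NUM_DIRECT];   /* block numbers of the first few data blocks */
    uint64_t indirect;             /* block number of the single indirect block */
};

/* Pad each inode to INODE_SIZE so a whole number of inodes fits in a block. */
union sketch_inode_slot {
    struct sketch_inode i;
    char pad[INODE_SIZE];
};

/* Compile-time checks that no inode can straddle a block boundary. */
_Static_assert(sizeof(union sketch_inode_slot) == INODE_SIZE,
               "inode fields must fit inside the padded slot");
_Static_assert(BLOCK_SIZE % INODE_SIZE == 0,
               "inode slots must tile a block exactly");

#define INODES_PER_BLOCK (BLOCK_SIZE / INODE_SIZE)

/* Block index of an inode, relative to the first block of the inode table. */
static inline uint64_t inode_block(uint64_t inum) {
    return inum / INODES_PER_BLOCK;
}

/* Byte offset of an inode within its block. */
static inline uint64_t inode_offset(uint64_t inum) {
    return (inum % INODES_PER_BLOCK) * INODE_SIZE;
}
```

With these assumed sizes, 32 inodes fit in each block, so inode 33 lives in the second block of the table at byte offset 128.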

For reference, since my minimal implementation used a 4096-byte block size, I used a #define symbolic constant to document the meaning, and then I used the following union to make sure my superblock is padded to the size of a filesystem block:

   #define BLOCK_SIZE 4096
   
   union {
       struct simple_superblock s;
       char pad[BLOCK_SIZE];
   }  superblock;

I also found it useful to create a few macros to do things like seeking to a particular block, converting back and forth between byte offsets and block numbers, checking/setting a bitmap bit, etc.

Some of these “stubs” are included in your starter code. You should, of course, feel free to modify those function definitions and add your own helper functions as needed.
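As an illustration of the kind of helper macros described above, here is a minimal sketch. The macro and function names are made up for this example and may not match the stubs in the starter code.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SIZE 4096

/* Convert between block numbers and byte offsets on the "disk". */
#define BLOCK_TO_BYTES(blk)   ((uint64_t)(blk) * BLOCK_SIZE)
#define BYTES_TO_BLOCK(off)   ((uint64_t)(off) / BLOCK_SIZE)
/* Number of blocks needed to hold nbytes (rounds up). */
#define BLOCKS_NEEDED(nbytes) (((uint64_t)(nbytes) + BLOCK_SIZE - 1) / BLOCK_SIZE)

/* Bitmap helpers: `map` is an array of unsigned char, `i` is a bit index. */
#define BIT_TEST(map, i)  (((map)[(i) / 8] >> ((i) % 8)) & 1)
#define BIT_SET(map, i)   ((map)[(i) / 8] |= (unsigned char)(1u << ((i) % 8)))
#define BIT_CLEAR(map, i) ((map)[(i) / 8] &= (unsigned char)~(1u << ((i) % 8)))

/*
 * Return the index of the first clear bit among the first nbits bits,
 * or -1 if every bit is set (nothing left to allocate).
 */
long first_free_bit(const unsigned char *map, long nbits) {
    for (long i = 0; i < nbits; i++) {
        if (!BIT_TEST(map, i))
            return i;
    }
    return -1;
}
```

A first-fit scan like first_free_bit() is only one possible allocation order; it is simple, but other policies are equally valid.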

Important Notes

Note: You are supposed to be writing a real filesystem. The only differences from a true implementation of a ReFS should be:

Your implementation is backed by a plain file in the filesystem, rather than an actual disk. This means you can use file interfaces like pread() and pwrite() to access your structures on disk.
- but instead of using the actual pread()/pwrite() calls directly, you should use something like the read_block()/read_blocks() and write_block()/write_blocks() wrappers that were in the starter code
You do not need to handle things like concurrency and consistency checking. However, by making your updates synchronous, you should never have an inconsistent on-disk state.

To mimic a real filesystem, your implementation must satisfy the following criteria:

All access to the “disk” must be in multiples of the block size, which should be 4096 bytes (read_block() and write_block() should help)
Changes to files and directories must be reflected on disk immediately. It’s “cheating” (and incorrect) to save things in memory and then write them out when you unmount.
Information must persist in the backing store (your persistent file) after unmount. Thus, you should be able to remount your file system and see your data.
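To make the “whole blocks only” rule concrete, here is a rough sketch of block-granularity wrappers around pread() and pwrite(). The names sketch_read_block(), sketch_write_block(), and sketch_block_demo() are hypothetical stand-ins; the read_block()/write_block() functions in the starter code may have different signatures and error handling.

```c
#define _XOPEN_SOURCE 700   /* expose pread() and pwrite() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 4096

/* Read block number `blk` into `buf` (BLOCK_SIZE bytes). Returns 0 or -errno. */
int sketch_read_block(int fd, uint64_t blk, void *buf) {
    ssize_t n = pread(fd, buf, BLOCK_SIZE, (off_t)(blk * BLOCK_SIZE));
    if (n < 0)
        return -errno;
    return n == BLOCK_SIZE ? 0 : -EIO;   /* treat a short read as an I/O error */
}

/* Write BLOCK_SIZE bytes from `buf` to block number `blk`. Returns 0 or -errno. */
int sketch_write_block(int fd, uint64_t blk, const void *buf) {
    ssize_t n = pwrite(fd, buf, BLOCK_SIZE, (off_t)(blk * BLOCK_SIZE));
    if (n < 0)
        return -errno;
    return n == BLOCK_SIZE ? 0 : -EIO;   /* treat a short write as an I/O error */
}

/* Round-trip check: write a pattern to block 2 of `path`, then read it back. */
int sketch_block_demo(const char *path) {
    unsigned char out[BLOCK_SIZE], in[BLOCK_SIZE];
    memset(out, 0xAB, sizeof out);

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -errno;

    int rc = sketch_write_block(fd, 2, out);
    if (rc == 0)
        rc = sketch_read_block(fd, 2, in);
    if (rc == 0 && memcmp(in, out, BLOCK_SIZE) != 0)
        rc = -EIO;

    close(fd);
    return rc;
}
```

Because every call moves exactly one block at a block-aligned offset, updating a single inode means reading its block, changing the inode in memory, and writing the whole block back.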

Evolving Advice (check back for updates)

You should think carefully about your plan before you start. In this section, I hope to give some advice that will help you plan. Before you start, you should be able to answer:

What order do you plan to implement the FUSE functions?
Are there any necessary building blocks that you will need to get started?
What existing C functions might help?

Example Plan of Attack

`refs_init()/refs_destroy()`

refs_init() does a lot of work to initialize a file system. Please take some time to familiarize yourself with that function. I suggest walking through that function using gdb as we did in conference, both to “initialize” new file system state and to read in pre-existing file system state from refs_disk.
Note that there is no corresponding refs_destroy(). Which of the steps taken in refs_init() need to be “finished”?
- The disk file is opened, so we need a “matching close”.
- Metadata structures are allocated, so they need to be freed.
Implementing refs_destroy() is a good first step.
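A teardown along these lines might look like the sketch below. The global variable names are assumptions (the starter code’s state may be organized differently), and the function takes a void pointer only because that is the shape of the FUSE destroy callback.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical global state set up at mount time; real names may differ. */
static int disk_fd = -1;
static unsigned char *inode_bitmap = NULL;
static unsigned char *data_bitmap = NULL;

/*
 * Undo what initialization did: free the in-memory metadata and close the
 * backing file. Pointers and the descriptor are reset so that calling this
 * twice is harmless.
 */
void refs_destroy_sketch(void *private_data) {
    (void)private_data;   /* unused in this sketch */

    free(inode_bitmap);
    inode_bitmap = NULL;
    free(data_bitmap);
    data_bitmap = NULL;

    if (disk_fd >= 0) {
        close(disk_fd);
        disk_fd = -1;
    }
}
```

Since every mutating operation is already written through to the backing file, nothing needs to be flushed here; the function only releases resources.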

`getattr()`

As we saw in our Hello FUSE assignment, getattr() is called frequently. This is the next function I would tackle. All of the information you need to populate the fields of a struct stat is (or can be made) available in the refs_inode, but you need to be able to “find” the target inode.
Although we are not yet reading/writing file data, the examples from our unit on the VFS/Reference file system are very helpful. In OSTEP Figure 40.3, the open(bar) example shows the steps to resolving a path. This is the most useful building block for the first part of this lab.
I suggest writing a function that finds the inode/inode number of the parent directory of a path. This is a good building block for resolving a complete path, and you will find this functionality helpful in future FUSE functions.
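The component-by-component walk can be sketched as follows. To keep the example self-contained, directories are replaced by a small in-memory table (toy_fs) and a lookup helper (toy_lookup); both are hypothetical stand-ins for scanning a directory’s data blocks on disk, and ROOT_INUM being 0 is likewise an assumption.

```c
#define _XOPEN_SOURCE 700   /* expose strdup() and strtok_r() */
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ROOT_INUM 0   /* assumed inode number of "/" */

/* Toy stand-in for on-disk directories: one row per directory entry. */
struct toy_entry { uint64_t parent; const char *name; uint64_t inum; };
static const struct toy_entry toy_fs[] = {
    { 0, "a", 1 },   /* /a   */
    { 1, "b", 2 },   /* /a/b */
    { 0, "c", 3 },   /* /c   */
};

/* Find `name` inside directory `dir`. Returns 0 and sets *out, or -ENOENT. */
static int toy_lookup(uint64_t dir, const char *name, uint64_t *out) {
    for (size_t i = 0; i < sizeof toy_fs / sizeof toy_fs[0]; i++) {
        if (toy_fs[i].parent == dir && strcmp(toy_fs[i].name, name) == 0) {
            *out = toy_fs[i].inum;
            return 0;
        }
    }
    return -ENOENT;
}

/*
 * Resolve an absolute path one component at a time, starting at the root.
 * On success returns 0 and stores the inode number in *inum_out; on failure
 * returns a negative errno value and leaves *inum_out unchanged.
 */
int resolve_path(const char *path, uint64_t *inum_out) {
    if (path[0] != '/')
        return -EINVAL;

    char *copy = strdup(path);   /* strtok_r modifies the string it scans */
    if (copy == NULL)
        return -ENOMEM;

    uint64_t cur = ROOT_INUM;
    int rc = 0;
    char *save = NULL;
    for (char *comp = strtok_r(copy, "/", &save); comp != NULL;
         comp = strtok_r(NULL, "/", &save)) {
        rc = toy_lookup(cur, comp, &cur);
        if (rc != 0)
            break;
    }

    free(copy);
    if (rc == 0)
        *inum_out = cur;
    return rc;
}
```

Calling the same function on the directory portion of a path yields the parent directory’s inode number, which is the building block suggested above.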

`access()`

Once you have implemented getattr(), you have most of the infrastructure needed to implement access(). However, the starter code does not track file permissions. Adding these permissions is not difficult (add a mode_t mode field in the struct refs_inode, and read through man 7 inode for details about the symbolic constants like S_IFDIR, S_IRWXU, S_IRWXG, and S_IRWXO). However, for now, you may want to just assume your files/directories are as “open” as possible (i.e., all directories that exist can be assumed to have R/W/X permissions).
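If you do add a mode field, a permission check could start out like the sketch below. It is deliberately simplified: it looks only at the owner permission bits and ignores the caller’s uid and gid, and the names SKETCH_OPEN_DIR_MODE and sketch_check_access() are invented for this example.

```c
#define _XOPEN_SOURCE 700   /* expose S_IFDIR and friends */
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Mode for a directory that is as "open" as possible (rwx for everyone). */
#define SKETCH_OPEN_DIR_MODE (S_IFDIR | S_IRWXU | S_IRWXG | S_IRWXO)

/*
 * Check an access() mask (F_OK, or a combination of R_OK, W_OK, X_OK)
 * against a mode. Simplification: only the owner bits are consulted.
 * Returns 0 if access is allowed, -EACCES otherwise.
 */
int sketch_check_access(mode_t mode, int mask) {
    if (mask == F_OK)
        return 0;   /* existence check only; the caller already found the inode */
    if ((mask & R_OK) && !(mode & S_IRUSR))
        return -EACCES;
    if ((mask & W_OK) && !(mode & S_IWUSR))
        return -EACCES;
    if ((mask & X_OK) && !(mode & S_IXUSR))
        return -EACCES;
    return 0;
}
```

The same mode value can be copied into the st_mode field that getattr() fills in, so the file type and permissions reported by ls stay consistent with what access() enforces.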

`mkdir()`

To create a directory, we need to:

Resolve the path to get the parent directory’s inode
Scan the parent directory’s data for an existing file
- If one exists, this is an error. Return the appropriate error code.
- If one does not exist,
  - allocate a new directory inode for the new dir
  - allocate a new directory data block for the new dir and fill it with “.” and “..”
  - allocate a new entry in the parent’s directory contents
  - make sure all of the changes are persistent (bitmaps, inodes, data blocks, etc.)
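The directory-content part of these steps can be sketched in memory as shown below. The entry layout (an 8-byte inode number plus a 56-byte name, with an empty name marking a free slot) is an assumption chosen so that entries tile a block exactly; bitmap updates, inode initialization, and writing each modified block to disk are left out and remain your responsibility.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 4096
#define NAME_LEN 56   /* assumed: up to 55 characters plus the terminating NUL */

/* Hypothetical fixed-size directory entry (64 bytes). */
struct sketch_dirent {
    uint64_t inum;
    char name[NAME_LEN];   /* an empty name marks a free slot */
};

#define ENTRIES_PER_BLOCK (BLOCK_SIZE / sizeof(struct sketch_dirent))

/* One directory data block, viewed as an array of entries. */
union sketch_dir_block {
    struct sketch_dirent entries[BLOCK_SIZE / sizeof(struct sketch_dirent)];
    char pad[BLOCK_SIZE];
};

/*
 * Add `name` -> `inum` to a directory block.
 * Returns 0 on success, -EEXIST if the name is already present,
 * -ENOSPC if the block is full, or -ENAMETOOLONG / -EINVAL for bad names.
 */
int sketch_add_entry(union sketch_dir_block *blk, const char *name, uint64_t inum) {
    if (name[0] == '\0')
        return -EINVAL;
    if (strlen(name) >= NAME_LEN)
        return -ENAMETOOLONG;

    struct sketch_dirent *free_slot = NULL;
    for (size_t i = 0; i < ENTRIES_PER_BLOCK; i++) {
        struct sketch_dirent *e = &blk->entries[i];
        if (e->name[0] == '\0') {
            if (free_slot == NULL)
                free_slot = e;          /* remember the first free slot */
        } else if (strcmp(e->name, name) == 0) {
            return -EEXIST;             /* scan found an existing entry */
        }
    }
    if (free_slot == NULL)
        return -ENOSPC;

    free_slot->inum = inum;
    strcpy(free_slot->name, name);      /* safe: length was checked above */
    return 0;
}

/* Fill a fresh directory block with "." and "..". */
void sketch_init_dir_block(union sketch_dir_block *blk, uint64_t self, uint64_t parent) {
    memset(blk, 0, sizeof *blk);
    (void)sketch_add_entry(blk, ".", self);
    (void)sketch_add_entry(blk, "..", parent);
}
```

Because the process may be killed between writes, it is worth deciding deliberately in which order the bitmaps, the new inode, the new data block, and the parent’s entry reach the disk.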

`readdir()`

This is the most challenging function to implement because it requires using a FUSE function, called filler(), which does not have superb documentation. When implementing readdir(), a great resource is the HMC documentation.
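The core loop of a readdir() implementation might look like the sketch below. So that it compiles without the FUSE headers, it declares its own function-pointer type modeled on the FUSE 2.x fuse_fill_dir_t (FUSE 3 adds a flags argument); the typedef, the entry layout, and the mock filler are all assumptions for illustration, so check the signature in the FUSE version you build against.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

#define NAME_LEN 56   /* assumed fixed name length, including the NUL */

/* Hypothetical directory entry; an empty name marks a free slot. */
struct sketch_dirent {
    uint64_t inum;
    char name[NAME_LEN];
};

/* Local stand-in modeled on FUSE 2.x's fuse_fill_dir_t. */
typedef int (*sketch_fill_dir_t)(void *buf, const char *name,
                                 const struct stat *stbuf, off_t off);

/*
 * Hand every in-use entry to filler(). Passing a NULL stat and offset 0 is
 * the simple mode in which the whole directory is returned in one call.
 * Returns 0 on success, or -ENOMEM if filler() reports that it could not
 * accept an entry.
 */
int sketch_list_dir(const struct sketch_dirent *entries, size_t n,
                    void *buf, sketch_fill_dir_t filler) {
    for (size_t i = 0; i < n; i++) {
        if (entries[i].name[0] == '\0')
            continue;                       /* skip free slots */
        if (filler(buf, entries[i].name, NULL, 0) != 0)
            return -ENOMEM;
    }
    return 0;
}

/* Mock filler for trying the loop outside FUSE: counts names in *buf. */
int sketch_count_filler(void *buf, const char *name,
                        const struct stat *stbuf, off_t off) {
    (void)name;
    (void)stbuf;
    (void)off;
    (*(int *)buf)++;
    return 0;
}
```

If “.” and “..” are stored as ordinary entries in the directory’s data block, they are reported by the same loop with no special casing.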

Useful C Functions

The following C library functions may prove useful when implementing your ReFS functionality.

dirname() and basename()
- Note: these may alter the original string, so you may wish to use strdup() to create a copy
strtok()
- Note: this function assumes that you are repeatedly “tokenizing” the same string. If you wish to interleave calls to different strings, you will need to use the “re-entrant version” named strtok_r()
strcmp(), strlen(), and strcpy(): we saw that these are not appropriate to use for arbitrary file data, but all paths in our system are null-terminated strings.
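As a small illustration of the dirname()/basename() note, the sketch below splits a path into its parent directory and final component while leaving the caller’s string untouched. The function name split_path() and its buffer-based interface are assumptions made for this example.

```c
#define _XOPEN_SOURCE 700   /* expose strdup() */
#include <assert.h>
#include <errno.h>
#include <libgen.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy the directory part of `path` into `parent` and the last component
 * into `leaf`; each buffer holds `len` bytes.
 * Returns 0 on success, -ENOMEM or -ENAMETOOLONG on failure.
 */
int split_path(const char *path, char *parent, char *leaf, size_t len) {
    /* dirname() and basename() may modify their argument, so each one
     * gets its own private copy of the path. */
    char *copy1 = strdup(path);
    char *copy2 = strdup(path);
    int rc = 0;

    if (copy1 == NULL || copy2 == NULL) {
        rc = -ENOMEM;
    } else {
        /* Copy each result out immediately: the returned pointers may refer
         * to the copies or to storage reused by later calls. */
        int n1 = snprintf(parent, len, "%s", dirname(copy1));
        int n2 = snprintf(leaf, len, "%s", basename(copy2));
        if (n1 < 0 || n2 < 0 || (size_t)n1 >= len || (size_t)n2 >= len)
            rc = -ENAMETOOLONG;
    }

    free(copy1);
    free(copy2);
    return rc;
}
```

For mkdir("/a/b/c"), this yields the parent “/a/b” to resolve and the new entry name “c” to insert.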

Programming strategies

You may find yourself writing functions that return a value, such as an inode number or an address to a structure, that do not let you express an “error number” if your function fails. For example, if I wrote a function that returned the inode number for some child in a directory:

uint64_t get_child_inum(struct refs_inode *parent_dir, const char *child_path) {
   ...
}

then I could not return, for example, -ENOENT if there was no child named child_path inside the directory with inode parent_dir (to see why, consider the value of -2 if it were interpreted as an unsigned integer… it would be very large).

Instead, a useful strategy is to separate the mechanism for signaling success/failure from the mechanism for communicating our result:

int get_child_inum(struct refs_inode *parent_dir, const char *child_path, uint64_t *child_inum) {
   ...
   if (found) {
      // on success, return 0 after setting child_inum to the appropriate value:
      *child_inum = inum;
      return 0;
   }

   // on failure, return the appropriate error code, and leave child_inum unchanged:
   return -ENOENT;
}

For an example of this, you can even look at the Linux source code for the function link_path_walk(), which is the main function that does pathname resolution. The result is communicated by updating the pointer to the structure struct nameidata *nd, and the status is communicated through the return value.

Submission

Submit your code (it should be inside a single file named ReFS.c) to your git repository. If you implement any additional features, be sure to mention them (prominently?) in your README.md file so that I see them.

I would like everyone to use a “new” feature when submitting this lab: git tags. A “tag” is essentially a label for a specific commit. You can create a tag from the command line (man git-tag for more details):

 $ git tag -m "done part I" partI

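This creates a name for the current commit (the name is partI), and attaches the message “done part I” to that tag. This feature may help you to see what has changed since you’ve completed your directory operations by using git diff. Note that git push does not send tags by default, so push the tag explicitly (for example, git push origin partI) so that I can see it.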

When you have completed your Lab 3a, please do two things:

create a git tag for the commit that “completes” your submission
Send an email to let me know that you have finished.

Since you will continue to work on code in the same repository for Part II, the tag will make sure that I test and give feedback on the correct point in your lab.

This lab borrows heavily from an assignment created by Geoff Kuenning.