Assignment 3a: Directories

CSCI 333 : Storage Systems

Spring 2021

Learning Objectives

You have become (relatively) familiar with FUSE through your “Hello FUSE” pseudo-filesystem implementation. So far, you have successfully created a simplified file system that contains a single in-memory file.

In this lab, we will implement FS behavior to support the creation, deletion, and modification of files and directories.

Through this lab, we will explore the challenges of managing persistent data in a file system.

After completing this lab (parts A and B), you will:

Overview

Developing a working file system is very hard. For that reason, the assignment is divided into two parts. In part I (this part), you will develop a lot of scaffolding and enough code for your filesystem to do something testable. In part II (the next part), you will complete the filesystem.

Assignment Logistics

Repositories & Starter Code

For this assignment, you are again strongly encouraged to work in a group. If your unique situation makes collaboration impossible, please let me know.

Like last lab, each student will be given a private repository under the Williams-CS GitHub organization, and only you will have read/write access to that repository. This repository will contain a copy of the starter code for the lab. You may commit and push code to this repository if that is helpful, but the main purpose of your private repository is so that you may write your own Eval.md (each group member must submit a private Eval.md).

Please submit the following group preference form by Wednesday at 9am Eastern.

I will create shared repositories for your team’s code development. You should be committing your code to this repository as you make progress. I highly recommend that you commit code early and often: it will only help because the teaching staff can view your code and easily answer questions using the GitHub interface.

Collaboration

Since your team will be submitting a single repository with shared code, teammates may collaborate without restriction. In fact, I strongly encourage you to pair program whenever possible (Zoom screen sharing makes this much easier to schedule). The code you will write for this lab is significantly longer than the code you wrote last lab, and it requires significantly more thought & planning before you can begin. So it is important that you completely understand and are involved in the design, and I suggest writing a “formal” design doc. But once you have decided upon your concrete design and created well-defined function specs, you should be able to implement functions independently based on that spec. In other words, spend time together planning before you divide and conquer. (Personally, I strongly prefer to pair program throughout—I find that to be the most time-efficient and rewarding way to write correct code. But I realize that not everyone shares this view.)

Tips for pair programming:

You may also discuss high-level questions with other classmates. High-level questions include:

The rule of thumb is that you should never view any classmate’s code if they are not on your team, but you can (and are encouraged to!) work through ideas together.

The Assignment Spec

Your assignment is to develop a “Reference FS”, which we will call ReFS. Your task for Part I is to:

Required Functionality

By the end of Part I, your ReFS should support the following features:

Why this particular set of features? It’s the minimum set of necessary operations to have a filesystem where you can do something visible: create and list directories. You’ll find that you need to create quite a bit of scaffolding to get that far (in particular, the code that creates an initialized ReFS filesystem from scratch). We will discuss much of this scaffolding in conference.

Testing ReFS

The “completeness” test of your filesystem should be that you can:

You are encouraged to write more comprehensive tests (we will in Part II), but testing these directory operations is harder to script than the “file” tests. For now, we just want to get through our implementation of the FS foundation.

ReFS Design

When I refer to a “ReFS”-like filesystem, I mean the following:

For reference: my minimal implementation used a 4096-byte block size, so I used a #define symbolic constant to document the meaning. I then used the following union to make sure my superblock is padded to the size of a filesystem block:

   #define BLOCK_SIZE 4096
   
   union {
       struct simple_superblock s;
       char pad[BLOCK_SIZE];
   }  superblock;

I also found it useful to create a few macros to do things like seeking to a particular block, converting back and forth between byte offsets and block numbers, checking/setting a bitmap bit, etc.
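As a rough illustration of the kind of helpers described above, here is a minimal sketch. Everything in it apart from BLOCK_SIZE is an assumption made for the example: the macro and function names are hypothetical, the fields of struct simple_superblock are invented placeholders, and the bitmap is assumed to be a plain array of bytes with one bit per block or inode. Your own layout may reasonably differ.

```c
#include <assert.h>     /* static_assert (C11) */
#include <stdint.h>
#include <sys/types.h>  /* off_t */
#include <unistd.h>     /* lseek */

#define BLOCK_SIZE 4096

/* Convert between byte offsets and block numbers (hypothetical names). */
#define BLOCK_TO_BYTES(blk)  ((uint64_t)(blk) * BLOCK_SIZE)
#define BYTES_TO_BLOCK(off)  ((uint64_t)(off) / BLOCK_SIZE)

/* Placeholder superblock fields, for illustration only. */
struct simple_superblock {
    uint64_t magic;
    uint64_t num_blocks;
};

union superblock_block {
    struct simple_superblock s;
    char pad[BLOCK_SIZE];
};

/* Fails to compile if the struct ever outgrows one block. */
static_assert(sizeof(union superblock_block) == BLOCK_SIZE,
              "superblock must occupy exactly one block");

/* Position fd at the start of block blk; returns the new offset or -1. */
off_t seek_to_block(int fd, uint64_t blk)
{
    return lseek(fd, (off_t)BLOCK_TO_BYTES(blk), SEEK_SET);
}

/* Bitmap helpers: bit i lives in byte i / 8, at bit position i % 8. */
static inline int bitmap_test(const unsigned char *map, uint64_t i)
{
    return (map[i / 8] >> (i % 8)) & 1;
}

static inline void bitmap_set(unsigned char *map, uint64_t i)
{
    map[i / 8] |= (unsigned char)(1u << (i % 8));
}

static inline void bitmap_clear(unsigned char *map, uint64_t i)
{
    map[i / 8] &= (unsigned char)~(1u << (i % 8));
}
```

The compile-time check is optional, but it catches the case where a growing superblock struct silently stops fitting in one block.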

Some of these “stubs” are included in your starter code. You should, of course, feel free to modify those function definitions and add your own helper functions as needed.

Important Notes

Note: You are supposed to be writing a real filesystem. The only differences from a true implementation of a ReFS should be:

To mimic a real filesystem, your implementation must satisfy the following criteria:

Evolving Advice (check back for updates)

You should think carefully about your plan before you start. In this section, I hope to give some advice that will help you plan. Before you start, you should be able to answer:

Example Plan of Attack

refs_init()/refs_destroy()

getattr()

access()

mkdir()

To create a directory, we need to:

  1. Resolve the path to get the parent directory’s inode
  2. Scan the parent directory’s data for an existing file
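The two steps above might be sketched as follows. This is only an illustration under stated assumptions: struct demo_dirent, DEMO_NAME_MAX, split_path(), and check_name_free() are hypothetical names, the directory is modeled as an in-memory array of fixed-size entries, and an inode number of 0 is assumed to mark an unused slot. A real ReFS would read the entries from the parent directory's data blocks.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define DEMO_NAME_MAX 59  /* hypothetical limit on name length */

/* Hypothetical fixed-size directory entry. */
struct demo_dirent {
    uint64_t inum;                  /* 0 marks an unused slot (assumption) */
    char name[DEMO_NAME_MAX + 1];
};

/*
 * Split "/a/b/c" into parent "/a/b" and child "c".
 * Returns 0 on success or a negative errno value.
 */
int split_path(const char *path, char *parent, size_t parent_sz,
               const char **child)
{
    const char *slash = strrchr(path, '/');
    if (slash == NULL || slash[1] == '\0')
        return -EINVAL;             /* no '/' at all, or path ends in '/' */

    size_t plen = (size_t)(slash - path);
    if (plen == 0)
        plen = 1;                   /* the parent is the root directory "/" */
    if (plen + 1 > parent_sz)
        return -ENAMETOOLONG;

    memcpy(parent, path, plen);
    parent[plen] = '\0';
    *child = slash + 1;
    return 0;
}

/* Return -EEXIST if name is already used in the directory, 0 otherwise. */
int check_name_free(const struct demo_dirent *ents, size_t n,
                    const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (ents[i].inum != 0 && strcmp(ents[i].name, name) == 0)
            return -EEXIST;
    }
    return 0;
}
```

Splitting the path first means the rest of mkdir() only ever deals with one parent directory and one simple name.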

readdir()

This is the most challenging function to implement because it requires using a FUSE function, called filler(), which does not have superb documentation. When implementing readdir(), a great resource is the HMC documentation.
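To make the role of filler() a little more concrete, here is a small sketch that can be compiled without FUSE installed. The typedef fill_dir_fn is a local stand-in that mirrors the FUSE 2.x fuse_fill_dir_t signature (FUSE 3 adds a flags argument, so check which version you are building against). emit_entries() and counting_filler() are hypothetical names. Returning -ENOMEM when filler() reports a problem is a common convention in example code, not a requirement I can vouch for.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Local mirror of FUSE 2.x fuse_fill_dir_t, so this compiles without <fuse.h>. */
typedef int (*fill_dir_fn)(void *buf, const char *name,
                           const struct stat *stbuf, off_t off);

/*
 * Hand "." and "..", then each name, to filler.
 * A NULL stbuf and an offset of 0 request the simple mode in which the
 * whole directory is returned by a single readdir() call.
 */
int emit_entries(void *buf, fill_dir_fn filler,
                 const char *const *names, size_t n)
{
    if (filler(buf, ".", NULL, 0) != 0)
        return -ENOMEM;
    if (filler(buf, "..", NULL, 0) != 0)
        return -ENOMEM;

    for (size_t i = 0; i < n; i++) {
        if (filler(buf, names[i], NULL, 0) != 0)
            return -ENOMEM;         /* filler could not accept the entry */
    }
    return 0;
}

/* Stand-in filler for experiments outside FUSE: buf points to an int counter. */
int counting_filler(void *buf, const char *name,
                    const struct stat *stbuf, off_t off)
{
    (void)name;
    (void)stbuf;
    (void)off;
    *(int *)buf += 1;
    return 0;
}
```

In a real readdir(), buf and filler are the arguments FUSE passes in, and the names come from the directory's data blocks rather than an array.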

Useful C Functions

The following C library functions may prove useful when implementing your ReFS functionality.

Programming strategies

You may find yourself writing functions that return a value, such as an inode number or an address to a structure, that do not let you express an “error number” if your function fails. For example, if I wrote a function that returned the inode number for some child in a directory:

uint64_t get_child_inum(struct refs_inode *parent_dir, const char *child_path) {
   ...
}

then I could not return, for example, -ENOENT if there was no child named child_path inside the directory whose inode is parent_dir (to see why, consider the value of -2 if it were interpreted as an unsigned integer… it would be very large).
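A tiny sketch of the problem, using the hypothetical helper name enoent_as_unsigned(): converting a negative error code to uint64_t wraps around modulo 2^64, so the caller receives a huge value that is indistinguishable from a (wildly implausible) inode number.

```c
#include <errno.h>
#include <stdint.h>

/* The value a caller would actually receive if -ENOENT were returned as uint64_t. */
uint64_t enoent_as_unsigned(void)
{
    return (uint64_t)(-ENOENT);     /* wraps to 2^64 - ENOENT */
}
```

On Linux, where ENOENT is 2, this is 2^64 - 2 = 18446744073709551614.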

Instead, a useful strategy is to separate the mechanism for signaling success/failure from the mechanism for communicating our result:

int get_child_inum(struct refs_inode *parent_dir, const char *child_path, uint64_t *child_inum) {
   ...
   // on success, set *child_inum to the appropriate value and return 0:
   if (found) {
       *child_inum = inum;
       return 0;
   }

   // on failure, return the appropriate error code, and leave *child_inum unchanged:
   return -ENOENT;
}

For an example of this, you can even look at the Linux source code for the function link_path_walk(), which is the main function that does pathname resolution. The result is communicated by updating the pointer to the structure struct nameidata *nd, and the status is communicated through the return value.
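Here is a self-contained sketch of the same pattern, including what the calling side might look like. The types struct demo_entry and struct demo_dir and the functions demo_get_child_inum() and demo_mkdir_check() are hypothetical stand-ins invented for this example; they model a directory as an in-memory array and are not the ReFS on-disk layout.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory directory, for illustration only. */
struct demo_entry {
    const char *name;
    uint64_t inum;
};

struct demo_dir {
    const struct demo_entry *entries;
    size_t count;
};

/* Status travels in the return value; the result travels through child_inum. */
int demo_get_child_inum(const struct demo_dir *parent, const char *child,
                        uint64_t *child_inum)
{
    for (size_t i = 0; i < parent->count; i++) {
        if (strcmp(parent->entries[i].name, child) == 0) {
            *child_inum = parent->entries[i].inum;
            return 0;
        }
    }
    return -ENOENT;                 /* *child_inum is left unchanged */
}

/*
 * Caller: check the status first and only then trust the out-parameter.
 * Returns 0 if the name is free, -EEXIST if it is taken, or another
 * negative errno value passed up from the lookup.
 */
int demo_mkdir_check(const struct demo_dir *parent, const char *child)
{
    uint64_t inum;
    int err = demo_get_child_inum(parent, child, &inum);

    if (err == 0)
        return -EEXIST;             /* lookup succeeded, so the name is in use */
    if (err == -ENOENT)
        return 0;                   /* not found: safe to create */
    return err;                     /* propagate anything unexpected */
}
```

Because every helper reports failure the same way, a FUSE callback can usually pass the negative errno value straight back as its own return value.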

Submission

Submit your code (it should be inside a single file named ReFS.c) to your git repository. If you implement any additional features, be sure to mention them (prominently?) in your README.md file so that I see them.


I would like everyone to use a “new” feature when submitting this lab: git tags. A “tag” is essentially a label for a specific commit. You can create a tag from the command line (man git-tag for more details):

 $ git tag -m "done part I" partI

This creates a name for the current commit (the name is partI), and attaches the message “done part I” to that tag. This feature may help you to see what has changed since you’ve completed your directory operations by using git diff.


When you have completed your Lab 3a, please do two things:

Since you will continue to work on code in the same repository for Part II, the tag will make sure that I test and give feedback on the correct point in your lab.


This lab borrows heavily from an assignment created by Geoff Kuenning.