This unit assumes you know a few concepts that are covered elsewhere. In particular, you should be able to answer the following questions at a high level:
What is a process?
What does it mean for the OS to perform a “context switch”?
All persistent data is eventually stored on some I/O device. (There are many types of devices, but in this course we will focus on storage devices, such as hard disk drives (HDDs) and solid-state drives (SSDs).)
Devices can connect to the CPU using a variety of physical interfaces, each with different bandwidth limits. At a high level, we want to reserve the fastest buses for the devices that can fully utilize their available bandwidth—physical space on the motherboard and proximity to the CPU make the highest-speed buses a scarce resource.
Our OS code chooses among several strategies when interacting with devices. The goal of this chapter is to understand the landscape of choices.
What choices do we make when a device is slow? (e.g., how do we connect to it, how do we communicate with it?)
What choices do we make when a device is fast? (e.g., how do we connect to it, how do we communicate with it?)
How does the Linux software storage stack’s design accommodate such a wide variety of devices?
Connections/Interfaces
Different elements of the memory hierarchy are connected to the CPU using different physical interfaces. The physical interfaces often support different software interfaces (commands) for communicating with the devices and/or have different maximum throughputs.
Main memory (RAM) is connected to the CPU via a memory bus.
Up to a few fast devices (e.g., a GPU) are connected to the CPU via an I/O bus (e.g., Peripheral Component Interconnect Express (PCIe), Accelerated Graphics Port (AGP)).
Non-volatile memory express (NVMe) is a relatively new I/O interface that was designed specifically for high-end non-volatile memory devices (i.e., expensive low-latency SSDs). Since NVMe is relatively new, we will not go into much detail about it. The important high-level ideas are that NVMe devices connect directly to the PCIe bus, and the NVMe interface is optimized specifically for NVM storage devices.
There are potentially many "slow" devices connected to the peripheral bus. Among storage devices, hard drives are often connected to the peripheral bus via SCSI or SATA interfaces.
Connections Questions
Why isn’t everything just connected to the fastest bus? What are the practical/design constraints that determine how many and which devices are connected in which ways?
Devices
Every device exports some hardware interface, which describes a format/protocol that the OS may use to communicate with it. Devices also have their own internal structures: for example, some devices have their own CPUs and memory inside them, some have moving parts, etc.
Firmware is the term for software that runs inside a device. Storage device firmware manages the internal physical hardware and implements the hardware interface. Actual firmware details are often hidden from device users (in other words, firmware code is often proprietary and closed-source), and firmware is often fixed (in other words, once you buy a device you can’t update its firmware).
Device Questions
What are some example tasks that firmware on a storage device might need to perform? To answer this question, you may want to choose some example hardware device and describe a task you’d want to use that device for. For example, how would you read a block from a particular offset on a hard disk drive?
What types of devices might have complex firmware?
In comparison, what types of devices might be “straightforward” or “dumb”?
Interacting with Devices
The hardware interface gives the OS a vocabulary to use when communicating with a device, but it doesn’t define an algorithm for “structuring the conversation”. There are two important techniques that the OS uses when conversing with a device: polling and interrupts.
Polling is a common technique for waiting on I/O. When polling, the current OS process issues a request to the device, then repeatedly checks the device status, waiting for a particular state or action (is it BUSY, or has it finished my request?).
Interrupts allow the CPU to switch tasks while the requesting process is waiting for a slow I/O device to complete an operation. Here is how it works: First, some OS process issues an I/O request. Then the OS context switches to another process while the I/O device does some work. When the I/O device finally completes the request, the I/O device issues a hardware interrupt to the CPU. When the CPU notices that an interrupt was raised, the CPU stops its current task and jumps to the designated interrupt handler (i.e. code registered by the OS to finish any work necessary to complete/verify the I/O request). Then the OS can wake up the original task to resume working, since its request is now ready.
Coalescing is when the device waits a moment before issuing a hardware interrupt so that a single interrupt may be used to “complete” multiple requests. In this way, the OS is interrupted fewer times to complete the same amount of work.
One goal we should keep in mind when communicating with a device is to maximize the utilization of hardware. This can be challenging. In order to communicate with a storage device, we often need to copy large amounts of memory to and from the device, and this is a task that might fall on the CPU. Copying memory is a poor use of the CPU’s cycles because it is such a straightforward task: all you need is a source address, a destination address, and how much memory to copy. Instead of wasting CPU cycles on this trivial task, we would ideally offload this work.
Direct Memory Access (DMA) is a way for the system to avoid some of the overhead associated with I/O. The CPU gives a specialized piece of hardware the location of some memory (source) and the location of a device (destination); the hardware then copies the data without any mediation by the CPU. Thus, the CPU is able to perform other work while the data copy completes.
Polling vs. Interrupt Questions
What is a major disadvantage of polling?
Interrupts solve one problem with polling: the CPU can do other work while waiting for an I/O to complete, rather than busy-waiting for the device. When might polling actually be more efficient than using interrupts?
When would you want to use interrupts, and when would you want to use polling? Is there a way to get the best of both worlds?
Device Drivers
To keep the OS code as general as possible, device drivers occupy the lowest level of the Linux software storage stack. There are advantages and disadvantages to an OS designed in this way.
One advantage is that relatively little code needs to be changed to support a new device (or a new class of devices).
The kernel is composed of generic layers with software hooks (a well-defined set of functions called at specific code points that let designers insert/change functionality) that the device driver overrides with its own implementation. The device driver then registers its specific functions with the OS.
When a user compiles their kernel, they only need to include the device drivers for the specific devices that they own (or wish to support). This lessens software bloat.
One disadvantage is that device drivers, which are isolated chunks of code that are very specific to a piece of hardware, are often sources of bugs. Since device drivers are part of the OS kernel, they run at the highest privilege level. If a device driver has a bug, it can compromise your entire system.
Device Driver Questions
If you look at the Linux software stack diagram (OSTEP figure 36.4, 9th page of chapter 36), file systems sit above the generic block layer, and device drivers sit below the generic block layer.
What is one advantage of the OS’s strict layering?
What is one disadvantage of the OS’s strict layering?
Imagine you could completely redesign the OS. How might you strike a balance between this tension?
One trend we are seeing is that devices are becoming more and more powerful internally, often with a significant amount of on-board memory and processing. How might we take advantage of this? Does this mirror any other (hardware or otherwise) trends in computer science?