BƐ-trees

CSCI 333 :: Storage Systems

Spring 2021

Overview

BƐ-trees, like LSM-trees, are an example of a write-optimized dictionary. By tuning a BƐ-tree's parameters, users can choose among a range of points along the optimal read-write performance curve.

Learning Objectives

Operations

BƐ-trees implement all of the standard dictionary operations:

But they add a new operation, upsert, which is a portmanteau of “update” and “insert”.

Upserts

Upserts provide a callback function ƒ and a set of function arguments 𝚫, and the callback function is applied to the value associated with the target key.

Upserts provide a general mechanism for encoding updates, but an important use case is performing blind updates. With upserts, users can avoid the need for a read-modify-write operation; instead, an upsert can encode a change as a function of the existing value.
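The callback mechanism can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `apply_upsert` and `add` are hypothetical, values are assumed to be integers, and a plain dictionary stands in for the leaf where the value lives. In a BƐ-tree the callback would be applied later, when the message reaches its target leaf, rather than immediately as it is here.

```python
from typing import Callable, Dict, Optional

# Hypothetical types for illustration only; not taken from any real BƐ-tree library.
Value = Optional[int]
Callback = Callable[[Value, int], int]


def apply_upsert(store: Dict[str, int], key: str, f: Callback, delta: int) -> None:
    """Apply callback f, with argument delta, to the value stored under key."""
    old = store.get(key)  # None when the key is not present yet
    store[key] = f(old, delta)


def add(old: Value, delta: int) -> int:
    """Example callback: treat a missing value as 0, then add delta."""
    return (old or 0) + delta


# The dictionary stands in for a leaf (an assumption of this sketch).
store = {"hits": 10}
apply_upsert(store, "hits", add, 5)    # existing key: behaves like an update
apply_upsert(store, "misses", add, 1)  # missing key: behaves like an insert
```

Note that the caller never reads the old value: it only names the key, the callback, and the argument. That is what makes the update blind.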

  1. What types of operations can be naturally encoded using an upsert message?

Messages

Internal BƐ-tree nodes contain a buffer for messages. Messages are updates destined for a target key. Messages are inserted into the root of the BƐ-tree, and flushed towards the leaves. When a message reaches its target leaf, the message is applied, and the resulting key-value pair is written.
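The path of a message can be sketched with a deliberately tiny tree: one internal node (the root) with a buffer, and leaves directly below it. Everything here is an assumption made for illustration: the class names, the `buffer_capacity` parameter, the use of simple "put" messages, and the policy of flushing the whole buffer at once. A real BƐ-tree has more levels, and its flushing policy may differ.

```python
import bisect
from typing import Any, Dict, List, Tuple

# A message is (target key, new value): a plain "put" for this sketch.
Message = Tuple[str, Any]


class Leaf:
    """Holds the key-value pairs; messages are applied here."""

    def __init__(self) -> None:
        self.pairs: Dict[str, Any] = {}

    def apply(self, msg: Message) -> None:
        key, value = msg
        self.pairs[key] = value


class Root:
    """An internal node: pivot keys, child pointers, and a message buffer."""

    def __init__(self, pivots: List[str], children: List[Leaf],
                 buffer_capacity: int) -> None:
        assert len(children) == len(pivots) + 1
        self.pivots = pivots
        self.children = children
        self.buffer: List[Message] = []
        self.buffer_capacity = buffer_capacity  # hypothetical tuning parameter

    def insert(self, msg: Message) -> None:
        """New messages always enter at the root's buffer."""
        self.buffer.append(msg)
        if len(self.buffer) > self.buffer_capacity:
            self.flush()

    def flush(self) -> None:
        """Move buffered messages, oldest first, to the child that owns each key."""
        for msg in self.buffer:
            child = self.children[bisect.bisect_right(self.pivots, msg[0])]
            child.apply(msg)
        self.buffer.clear()


left, right = Leaf(), Leaf()
root = Root(pivots=["m"], children=[left, right], buffer_capacity=2)
root.insert(("apple", 1))
root.insert(("zebra", 2))
buffered_before_flush = len(root.buffer)  # both messages still wait in the root
root.insert(("apple", 3))                 # buffer overflows, so the root flushes
```

Because messages are flushed oldest first, the later message for `apple` overwrites the earlier one when both reach the leaf.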

Tuning Performance

BƐ-trees give users two knobs to turn: B and Ɛ.
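One common way to read the two knobs, assumed here rather than stated above, is that B is the capacity of a node and roughly B^Ɛ of that capacity goes to pivots, with the rest left for the message buffer. The hypothetical helper below computes that split so that different settings can be compared.

```python
from typing import Tuple


def node_layout(B: int, epsilon: float) -> Tuple[int, int]:
    """Return (pivot_slots, buffer_slots) for a node of capacity B.

    Assumes the convention that about B**epsilon slots hold pivots
    and the remaining slots hold buffered messages.
    """
    pivot_slots = round(B ** epsilon)
    buffer_slots = B - pivot_slots
    return pivot_slots, buffer_slots


pivots, buffer_slots = node_layout(1024, 0.5)
```

Trying other values of `epsilon` with the same B is a quick way to see how the balance between fanout and buffer space shifts.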

Thought Questions

  1. How does the batch size affect the cost of an insert operation?
  2. How does setting Ɛ=1 affect:
  3. How does setting Ɛ=0 affect:
  4. What data structures correspond to each of those settings?
  5. How does a large B affect a B-tree:
  6. How does a large B affect a BƐ-tree:
  7. How does caching play into BƐ-tree performance? (Hint: where does most of the data live?)
  8. Compare a BƐ-tree to an LSM-tree.