BƐ-trees, like LSM-trees, are an example of a write-optimized dictionary. By tuning its parameters, a BƐ-tree can be placed at a range of points along the optimal read-write performance curve.
Learning Objectives
Be able to describe the way that BƐ-tree operations are performed, including upserts
Be able to describe the asymptotic performance of BƐ-tree operations
Be able to describe the effects of changing B and Ɛ
Be able to compare BƐ-trees to B-trees and LSM-trees
Operations
BƐ-trees implement all of the standard dictionary operations:
insert(k,v)
v = search(k)
{(ki,vi), … (kj, vj)} = search(k1, k2)
delete(k)
But they add a new operation:
upsert(k, ƒ, 𝚫)
Upsert is a portmanteau of “update” and “insert”.
Upserts
An upsert carries a callback function ƒ and a set of function arguments 𝚫; when the upsert is applied, ƒ is called on the value currently associated with the target key.
Upserts provide a general mechanism for encoding updates, but an important use case is performing blind updates. With upserts, users can avoid the need for a read-modify-write operation; instead, an upsert can encode a change as a function of the existing value.
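As a concrete illustration, here is a minimal sketch of a blind counter increment via upserts, using a toy in-memory dictionary (the class and names are hypothetical, not a real BƐ-tree implementation). The upsert buffers a message; the callback is applied lazily at read time, so no read-modify-write is needed at update time.

```python
def add(old_value, delta):
    """Callback f: combine the existing value with the upsert's delta."""
    return (old_value or 0) + delta

class ToyDict:
    def __init__(self):
        self.leaves = {}    # applied key-value pairs
        self.pending = {}   # buffered upsert messages: key -> [(f, delta), ...]

    def upsert(self, k, f, delta):
        # A blind update: the current value is never read here.
        self.pending.setdefault(k, []).append((f, delta))

    def search(self, k):
        # Pending messages are applied lazily, when the key is read.
        v = self.leaves.get(k)
        for f, delta in self.pending.pop(k, []):
            v = f(v, delta)
        if v is not None:
            self.leaves[k] = v
        return v

d = ToyDict()
d.upsert("hits", add, 1)
d.upsert("hits", add, 1)
print(d.search("hits"))  # → 2
```

Counters, appends, and flag-setting are all updates of this shape: the new value is a function of the old one, so the old value never needs to be fetched first.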
What types of operations can be naturally encoded using an upsert message?
Messages
Internal BƐ-tree nodes contain a buffer for messages. Messages are updates destined for a target key. Messages are inserted into the root of the BƐ-tree, and flushed towards the leaves. When a message reaches its target leaf, the message is applied, and the resulting key-value pair is written.
Tuning Performance
BƐ-trees give users two knobs to turn: B and Ɛ.
B is generally large (2-8 MiB or more)
Large nodes make range queries fast: a scan pays one seek per B bytes read, which incentivizes large leaf nodes.
Batching reduces the write amplification problem of using large nodes in standard B-trees.
Ɛ must be between 0 and 1
Asymptotic analysis is often easiest at Ɛ = 1/2
In practice, you often pick a maximum fanout rather than strictly choosing Ɛ
A large fanout makes the tree “short and fat”
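To see what the knobs buy, here is a back-of-the-envelope comparison using the standard BƐ-tree asymptotics: a node has fanout B^Ɛ, a point query touches O(log over base B^Ɛ of N) nodes, and an insert costs O(log over base B^Ɛ of N, divided by B^(1-Ɛ)) I/Os amortized. The concrete values of N and B below are illustrative.

```python
import math

def costs(N, B, eps):
    """Fanout, point-query node count, and amortized insert I/Os."""
    fanout = B ** eps
    query = math.log(N, fanout)        # nodes on a root-to-leaf path
    insert = query / B ** (1 - eps)    # amortized I/Os per insert
    return fanout, query, insert

N, B = 2**30, 1024
for eps in (1.0, 0.5):                 # Ɛ = 1 behaves like a B-tree
    fanout, q, i = costs(N, B, eps)
    print(f"eps={eps}: fanout={fanout:.0f}, query={q:.1f}, insert={i:.4f}")
```

With these numbers, dropping Ɛ from 1 to 1/2 doubles the query path (3 nodes to 6) but cuts the amortized insert cost by a factor of B^(1/2) = 32, which is the read-write trade-off the Ɛ knob controls.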
Thought Questions
How does the batch size affect the cost of an insert operation?
How does setting Ɛ=1 affect:
read performance?
update performance?
How does setting Ɛ=0 affect:
read performance?
update performance?
What data structures correspond to each of those settings?
How does a large B affect a B-tree's:
read performance?
update performance?
How does a large B affect a BƐ-tree's:
read performance?
update performance?
How does caching play into BƐ-tree performance? (Hint: where does most of the data live?)
Compare a BƐ-tree to an LSM tree.
How does compaction compare to flushing?
How do the two data structures compare for point queries?
How do the two data structures compare for range queries?
How would an LSM-tree perform in a workload with lots of upserts?