BƐ-trees, like LSM-trees, are an example of a write-optimized dictionary. By tuning its parameters, a BƐ-tree can be placed at a range of points along the optimal read-write performance curve.
Learning Objectives
Be able to describe the way that BƐ-tree operations are performed, including upserts
Be able to describe the asymptotic performance of BƐ-tree operations
Be able to describe the effects of changing B and Ɛ
Be able to compare BƐ-trees to B-trees and LSM-trees
Operations
BƐ-trees implement all of the standard dictionary operations:
insert(k,v)
v = search(k)
{(ki,vi), … (kj, vj)} = search(k1, k2)
delete(k)
But they add a new operation:
upsert(k, ƒ, 𝚫)
Upsert is a portmanteau of “update” and “insert”.
Upserts
An upsert carries a callback function ƒ and a set of function arguments 𝚫; when the upsert is applied, ƒ is called on the value currently associated with the target key.
Upserts provide a general mechanism for encoding updates, but an important use case is performing blind updates. With upserts, users can avoid the need for a read-modify-write operation; instead, an upsert can encode a change as a function of the existing value.
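As a concrete illustration, here is a minimal sketch of a blind counter increment via upserts, using a toy in-memory dictionary (the class and names are hypothetical, not a real BƐ-tree implementation). The upsert buffers a message; the callback is applied lazily at read time, so no read-modify-write is needed at update time.

```python
def add(old_value, delta):
    """Callback f: combine the existing value with the upsert's delta."""
    return (old_value or 0) + delta

class ToyDict:
    def __init__(self):
        self.leaves = {}    # applied key-value pairs
        self.pending = {}   # buffered upsert messages: key -> [(f, delta), ...]

    def upsert(self, k, f, delta):
        # A blind update: the current value is never read here.
        self.pending.setdefault(k, []).append((f, delta))

    def search(self, k):
        # Pending messages are applied lazily, when the key is read.
        v = self.leaves.get(k)
        for f, delta in self.pending.pop(k, []):
            v = f(v, delta)
        if v is not None:
            self.leaves[k] = v
        return v

d = ToyDict()
d.upsert("hits", add, 1)
d.upsert("hits", add, 1)
print(d.search("hits"))  # → 2
```

Counters, appends, and flag-setting are all updates of this shape: the new value is a function of the old one, so the old value never needs to be fetched first.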
What types of operations can be naturally encoded using an upsert message?
Messages
Internal BƐ-tree nodes contain a buffer for messages. Messages are updates destined for a target key. Messages are inserted into the root of the BƐ-tree, and flushed towards the leaves. When a message reaches its target leaf, the message is applied, and the resulting key-value pair is written.
Tuning Performance
BƐ-trees give users two knobs to turn: B and Ɛ.
B is generally large (2-8 MiB or more)
Large nodes make range queries fast: a scan pays one seek per B bytes read, which incentivizes large leaf nodes.
Batching reduces the write amplification problem of using large nodes in standard B-trees.
Ɛ must be between 0 and 1
Asymptotic analysis is often easiest at Ɛ = 1/2
In practice, you often pick a maximum fanout rather than strictly choosing Ɛ
A large fanout makes the tree “short and fat”
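To see what the knobs buy, here is a back-of-the-envelope comparison using the standard BƐ-tree asymptotics: a node has fanout B^Ɛ, a point query touches O(log over base B^Ɛ of N) nodes, and an insert costs O(log over base B^Ɛ of N, divided by B^(1-Ɛ)) I/Os amortized. The concrete values of N and B below are illustrative.

```python
import math

def costs(N, B, eps):
    """Fanout, point-query node count, and amortized insert I/Os."""
    fanout = B ** eps
    query = math.log(N, fanout)        # nodes on a root-to-leaf path
    insert = query / B ** (1 - eps)    # amortized I/Os per insert
    return fanout, query, insert

N, B = 2**30, 1024
for eps in (1.0, 0.5):                 # Ɛ = 1 behaves like a B-tree
    fanout, q, i = costs(N, B, eps)
    print(f"eps={eps}: fanout={fanout:.0f}, query={q:.1f}, insert={i:.4f}")
```

With these numbers, dropping Ɛ from 1 to 1/2 doubles the query path (3 nodes to 6) but cuts the amortized insert cost by a factor of B^(1/2) = 32, which is the read-write trade-off the Ɛ knob controls.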
Thought Questions
How does the batch size affect the cost of an insert operation?
How does setting Ɛ=1 affect:
read performance?
update performance?
How does setting Ɛ=0 affect:
read performance?
update performance?
What data structures correspond to each of those settings?
How does a large B affect a B-tree's:
read performance?
update performance?
How does a large B affect a BƐ-tree's:
read performance?
update performance?
How does caching play into BƐ-tree performance? (Hint: where does most of the data live?)
Compare a BƐ-tree to an LSM tree.
How does compaction compare to flushing?
How do the two data structures compare for point queries?
How do the two data structures compare for range queries?
How would an LSM-tree perform in a workload with lots of upserts?