Build your own Database Index: part 2

This is part 2 in a short series about writing a simple index for JSON documents. This index should automatically index JSON documents and enable a relatively simple set of queries over those documents, purely so I can learn more about the problems one encounters. The planned parts:

  1. Encode primitive values.
  2. Add field names to our index. (That’s this part).
  3. Index JSON documents
  4. Update and delete documents from the index
  5. Find documents, extract a query planner and do an optimisation pass.

As a reminder, we want to take this JSON document:

{ "a": { "b": 12, "c": "foo" } }

And be able to find it using queries like:

"a.b" == 12
"a.c" == "foo"
"a.b" < 400
"a.c" > "e"

In part 1, we talked about how to take the primitive values in a JSON document — null, booleans, numbers and strings — and encode them into byte arrays in a way that a natural ordering between the values is maintained. This is the first step in creating the keys we need to be able to search for documents in our key-value store, PebbleDB.

In this part, we’ll see how to add the field name to this encoding. Then we’ll briefly cover how that enables equality and range queries. Let’s get going.

Finding fields

If we want to answer queries like "a.c" == "foo" or "a.c" > "foo" efficiently, we need to be able to quickly find values for given fields within the index. Our key-value store can quickly find things by key, so that means we need to encode the field name as well as the value in the key.

If we were only looking at equality, we could put the field name and value either way around. But to support range queries, we need the field name first. This is because the key-value store can do range queries over key prefixes. Therefore, to retrieve all values for a given field efficiently, we need that prefix to be the field name.

To do this, we have to work out how to encode the field and value into the same byte array key.

Encoding multiple values into a key

The problem that we have is that we don’t want the field name and value to “bleed together” in the key. To see why, let’s take a trivial two document example. Each document is just single field with a string value:

{"foo": "bar"}
{"foob": "ar"}

If we naively prefixed the value with the field, we’d end up with the same key in our index for both documents, even though we have both a different field name and value:

 field   value
  /‾‾‾\ /‾‾‾\
[ f o o b a r ]
[ f o o b a r ]
  \_____/ \_/
   field   value

(Throughout this post, for simplicity, I’ve represented the bytes for the strings as their characters).

We can’t tell from the key what the field name is and what the value is. We need a way to split them up. A common way to do this is to use a zero-byte:

 field      value
  /‾‾‾\    /‾‾‾\
[ f o o 00 b a r ]
[ f o o b 00 a r ]
  \_____/    \_/
   field      value

We can write a wrapper to pack and unpack this zero-separated key, and use the name “tuple” to talk about our key structure:

var sep []byte = []byte{0}

// packTuple packs a set of components into a packed byte array
// representation. Use unpackTuple to unpack.
// 0x00 is used as the separator.
func packTuple(components ...[]byte) []byte {
	return bytes.Join(components, sep)
}

// unpackTuple unpacks packed into its components.
func unpackTuple(packed []byte) [][]byte {
	return bytes.Split(packed, sep)
}

And, goodness, we’re already done

We’ve created all the things we need for our index keys to include field names.

For completeness, we’ll say that we’ll use field name paths as we’ve been informally writing them, a.c. This is directly encoded as []byte("a.c"). We will take the encoding from part 1 for the values: a “tag” for the type and an encoding of the value itself.

From this, we create the key using:

const JSONTagString byte = 0x2c

field := []byte("a.c")
value := append([]byte{JSONTagString}, []byte("foo")...)
key := packTuple(field, value)

As a byte array, this produces:

[ a . c 00 2c f o o ]

From this key, it’s easy to see how each field and its values will be grouped together as ranges of keys within the key-value store. The values for a.b would exist in a separate range.

As a fuller example, three documents with the shape {"a": {"pet": "dog", "food": "pasta"}} might index as follows:

[ a . f o o d 00 2c a p p l e ]
[ a . f o o d 00 2c s p a g h e t t i ]
[ a . f o o d 00 2c y a m ]
[ a . p e t 00 2c a a r d v a r k ]
[ a . p e t 00 2c g i r a f f e ]
[ a . p e t 00 2c p a n g o l i n ]

We can find the entry for "food" == "yam" with a single key query, just requesting the key-value store give us the value for the key [ a . f o o d 00 2c y a m ]. In contrast, the query for the key [ a . f o o d 00 2c h a m ] would return no results as expected.

To find all the values for a given field, we ask the key-value store for all keys in a range. As our field separator is always 00, we can use 01 in the upper bound to retrieve all entries for a given field:

lower bound (inclusive): [ a . f o o d 00 ]
upper bound (exclusive): [ a . f o o d 01 ]

This would retrieve the keys for apple, spaghetti and yam. Note that we have to use an inclusive lower bound but an exclusive upper bound.

For the range query "a.food" > "ham", we’d use:

lower bound (exclusive): [ a . f o o d 00 2c h a m ]
upper bound (exclusive): [ a . f o o d 01 ]

This would get us spaghetti and yam. In this query both lower and upper bounds are exclusive, as our query is greater-than. If the query was greater-than-or-equal-to, it would instead need an inclusive lower bound.

We’ve used strings in our examples because they are easier to read in a blog post, but this will work for any of the primitive values we encoded in part 1.

Next up

After part 1, we were able to query purely for values in a standalone fashion (ie, 12 or "foo"). Now we can search for values in particular fields. But what we haven’t done is connected this to any way of taking the keys that our query might return and map those results to the actual documents. An index that doesn’t tell you the documents your query maps to is pretty useless.

So we’ll come to that next time.

← Older
Build your own Database Index: part 1
→ Newer
Link: Choose boring culture