Database DIY

I have written several posts about this project, and collected them under the database-diy tag.

During 2023-24, my ongoing side project has been slowly building a very simple and naive database, more or less from scratch. This is a pure learning exercise, as I wanted some experience of writing a database from the storage layer up.

The code I’ve written during this project, ToyKV in particular, has helped me understand more deeply what I read in papers and in other databases’ source code and documentation. By diving down through levels of abstraction during this project, I’ve vastly improved the mental models I use to understand and predict the behaviours of all types of databases.

There are three main codebases I’ve written as part of this:

  • The first was a Go codebase that implements all-field indexing for JSON data and supports simple queries over that data. It’s inefficient and the next stage would be a query planner. This was mostly written late 2023.

    https://github.com/mikerhodes/eaton-docdb

  • In early 2024, I ported that Go codebase to Rust. It has similar features.

    https://github.com/mikerhodes/rust-docdb

  • Both the Go and Rust docdb versions used someone else’s underlying data storage. The largest codebase in this project is ToyKV, where I have started to build a super-naive storage engine. This is the part that’s newest to me.

    ToyKV starts by defining a data format for individual key-value records, each of which is a sequence of bytes. From that it builds up an LSM-like storage format based on an in-memory memtable and on-disk sstables. I did a bunch of work on this in early 2024. Recently, in late 2024, I’ve picked this up for an hour here and there to work on a scan method, which is the basis for both range searches and the compaction operation that is key to LSM efficiency.

    https://github.com/mikerhodes/toykv/
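To make the memtable and sstable idea more concrete, here is a minimal Rust sketch of an LSM-like store. It is not ToyKV’s actual code or on-disk format: the `ToyLsm` type, its method names and the `memtable_limit` parameter are all hypothetical, and the "sstables" are sorted runs held in memory rather than files, purely to keep the example self-contained. It shows why scan matters: both range reads and compaction come down to merging sorted layers, with newer entries winning.

```rust
use std::collections::BTreeMap;

/// A stored slot: `None` is a tombstone marking a deleted key.
type Slot = Option<Vec<u8>>;

/// Hypothetical toy LSM store. Real sstables live on disk; here each one
/// is an in-memory sorted run so the sketch needs no file handling.
struct ToyLsm {
    memtable: BTreeMap<Vec<u8>, Slot>,
    sstables: Vec<Vec<(Vec<u8>, Slot)>>, // oldest run first
    memtable_limit: usize,
}

impl ToyLsm {
    fn new(memtable_limit: usize) -> Self {
        ToyLsm { memtable: BTreeMap::new(), sstables: Vec::new(), memtable_limit }
    }

    fn set(&mut self, k: &[u8], v: &[u8]) {
        self.memtable.insert(k.to_vec(), Some(v.to_vec()));
        self.maybe_flush();
    }

    fn delete(&mut self, k: &[u8]) {
        // Deletes are writes too: a tombstone shadows older values.
        self.memtable.insert(k.to_vec(), None);
        self.maybe_flush();
    }

    /// Once the memtable is full, freeze it into a new sorted run.
    fn maybe_flush(&mut self) {
        if self.memtable.len() >= self.memtable_limit {
            let run: Vec<_> = std::mem::take(&mut self.memtable).into_iter().collect();
            self.sstables.push(run);
        }
    }

    /// Point read: check the memtable, then runs from newest to oldest.
    fn get(&self, k: &[u8]) -> Option<Vec<u8>> {
        if let Some(slot) = self.memtable.get(k) {
            return slot.clone();
        }
        for run in self.sstables.iter().rev() {
            if let Ok(i) = run.binary_search_by(|(key, _)| key.as_slice().cmp(k)) {
                return run[i].1.clone();
            }
        }
        None
    }

    /// Range read over [start, end): merge layers oldest first so newer
    /// entries overwrite older ones, then hide tombstones.
    fn scan(&self, start: &[u8], end: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)> {
        let mut merged: BTreeMap<Vec<u8>, Slot> = BTreeMap::new();
        for run in &self.sstables {
            for (k, v) in run {
                merged.insert(k.clone(), v.clone());
            }
        }
        for (k, v) in &self.memtable {
            merged.insert(k.clone(), v.clone());
        }
        merged
            .into_iter()
            .filter(|(k, _)| k.as_slice() >= start && k.as_slice() < end)
            .filter_map(|(k, v)| v.map(|v| (k, v)))
            .collect()
    }

    /// Full compaction: merge every run into one. Tombstones can be
    /// dropped because no older run remains for them to shadow.
    fn compact(&mut self) {
        let mut merged = BTreeMap::new();
        for run in self.sstables.drain(..) {
            for (k, v) in run {
                merged.insert(k, v);
            }
        }
        let run: Vec<_> = merged.into_iter().filter(|(_, v)| v.is_some()).collect();
        if !run.is_empty() {
            self.sstables.push(run);
        }
    }
}
```

This sketch materialises the whole merged view in a `BTreeMap`, which is simple but wasteful. A real engine would more likely stream a k-way merge over per-layer iterators, so that scan and compaction never hold the full dataset in memory.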

Overall, these database DIY projects progress very slowly: I’ve probably spent 30-50 hours in total, spread over at least a year so far. Even so, it’s one of my favourite projects, as I’ve learned a ton (including Rust!).
