July & August journal: thread-safe toykv, fancier ai-codeexplorer

In the June journal, I left off with a thought to make toykv thread-safe and to implement simple compaction. I did get to thread-safety, which took a while, but I didn’t get to compaction. I did make several other improvements, however.

I also did some work on ai-codeexplorer. Here, I now have a work-in-progress UI using textual, a Python terminal UI toolkit. I haven’t merged it to master yet as it’s incomplete. It was interesting to work with an event-driven UI toolkit again, though; it’s been quite a long time since I’ve done that.

Overall, it wasn’t as productive as previous months, but school holidays do that to a person; there are plenty of other things to do!

toykv

The work to make toykv thread-safe (Implement support for concurrent reads/writes · Pull Request #17) ended up being a lot of changes. I had a first failed attempt that destroyed performance, even after I added some fancy caching; the underlying architecture I’d used was just broken. So the following talks about the second attempt:

- It was quite easy to make the get method thread-safe, although I did have to get a lot more comfortable with Rust’s Arc while doing this.
  - To make the sstable iterator thread-safe, I updated the lowest level iterator in the stack to wrap its file handle in a lock, so the file reading code is effectively single threaded. This allowed the iterators to be shared easily using various Arc wrappers in each layer of iterators.
  - I cheated a bit for the memtable, and swapped out the BTreeMap in std::collections for a thread-safe map, crossbeam_skiplist. This meant I didn’t have to do the RwLock of the BTreeMap myself, and the crossbeam approach is undoubtedly faster.
- The scan method was much harder, and I learned quite a bit about borrowing during this process. Scan from the sstables on disk was easy, because I’d already solved the problem of sharing references to sstables and iterating them for the get method.
  - What ended up being hard was the reference to the crossbeam skipmap, the memtable, because I couldn’t figure out how to let Rust know that the skipmap would continue to exist for the length of time the consumer of the scan iterator was using it, since the iterator was passed back to the consumer, who could do what they liked with it.
  - Even my new best friend, Arc, couldn’t fix this for me.
  - In the end, I needed the ouroboros crate, which does some unsafe stuff to allow this to work. But in banging my head against the Rust compiler trying to do it myself, I think I learned a lot. In particular, I (mostly) understood why I needed something more fancy than I could build myself.
- Various other little things, like making the metrics atomics.
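To make the shape of the problem concrete, here is a minimal std-only sketch. It is not toykv's code: toykv uses crossbeam_skiplist and ouroboros, whereas this sketch uses the "RwLock of the BTreeMap" route mentioned above, and every name in it (`Memtable`, `OwnedScan`, `scan`, `demo`) is hypothetical. It sidesteps the borrowing problem by having the scan iterator own an Arc to the memtable and re-seek from the last key on each step, which costs an extra lookup per item but needs no self-referential struct.

```rust
use std::collections::BTreeMap;
use std::ops::Bound::{Excluded, Unbounded};
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical stand-in for a memtable: a BTreeMap behind an RwLock.
#[derive(Default)]
struct Memtable {
    map: RwLock<BTreeMap<String, String>>,
    /// Metric kept as an atomic so readers can bump it without a write lock.
    reads: AtomicU64,
}

impl Memtable {
    fn put(&self, k: &str, v: &str) {
        self.map.write().unwrap().insert(k.to_string(), v.to_string());
    }

    fn get(&self, k: &str) -> Option<String> {
        self.reads.fetch_add(1, Ordering::Relaxed);
        self.map.read().unwrap().get(k).cloned()
    }
}

/// Scan iterator that owns an Arc to the memtable instead of borrowing it,
/// so the memtable cannot be dropped while the consumer still holds the iterator.
struct OwnedScan {
    table: Arc<Memtable>,
    /// Last key returned; None before the first call to next().
    last: Option<String>,
}

impl Iterator for OwnedScan {
    type Item = (String, String);

    fn next(&mut self) -> Option<Self::Item> {
        let lower = match &self.last {
            Some(k) => Excluded(k.clone()),
            None => Unbounded,
        };
        // The read lock is held only for this one step, then released.
        let map = self.table.map.read().unwrap();
        let (k, v) = map.range::<String, _>((lower, Unbounded)).next()?;
        let item = (k.clone(), v.clone());
        self.last = Some(item.0.clone());
        Some(item)
    }
}

fn scan(table: &Arc<Memtable>) -> OwnedScan {
    OwnedScan { table: Arc::clone(table), last: None }
}

/// Writes from four threads, then scans after dropping the caller's handle.
fn demo() -> Vec<(String, String)> {
    let table = Arc::new(Memtable::default());
    let writers: Vec<_> = (0..4)
        .map(|i| {
            let t = Arc::clone(&table);
            thread::spawn(move || t.put(&format!("k{}", i), &format!("v{}", i)))
        })
        .collect();
    for w in writers {
        w.join().unwrap();
    }
    assert_eq!(table.get("k0").as_deref(), Some("v0"));
    let iter = scan(&table);
    drop(table); // the iterator's own Arc keeps the memtable alive
    iter.collect()
}
```

The trade-off is that this iterator clones keys and values and does not see a single consistent snapshot across steps, which is part of why a skipmap plus a self-referencing iterator is the more attractive design for a real store.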

Overall this turned into a +532 −212 odyssey, quite a chunk in a 6,692-line Rust project. The fact I had to throw away my first attempt was a big chunk of the reason I took a bit of a break from toykv after it was done.

While working on this, I found an amazing book on low-level Rust concurrency: Rust Atomics and Locks by Mara Bos. Recommended.

Other toykv improvements

I did several other things before I took a break from toykv:

- Use a min-heap for MergeIterator · Pull Request #20.
  - I was really pleased with this rewrite of my MergeIterator to use a min-heap approach, which is the “textbook implementation” of this kind of thing in a database. It gave a 10-15% speedup for my simple 100-item scan benchmark.
- Basic frozen memtable implementation · Pull Request #21.
  - This implemented an active and frozen memtable pair. Eventually, this is designed to decrease write stalls by having the frozen memtable written out to an on-disk sstable in the background while the active memtable still accepts writes.
  - Again, I wrote an awful first version, where I tried to go fully async from the start and tied myself in knots. So the second version is simpler, but still needs expanding into async to get the intended speedup. One issue here is that I haven’t gotten to grips with async testing in Rust yet.
  - The frozen memtable is a good piece of practice work towards compaction, so I guess I can say I did something towards that 😅
- Push down upper bounds check to table iterators · Pull Request #23.
  - This was a nice little refactoring that removed an ugly wart from the MergeIterator. Previously, the MergeIterator was able to push down the starting bound to the underlying table iterators, but implemented the ending bound itself. This PR updated the table iterators to support the ending bound too, so the underlying iterators now handle both bounds and the MergeIterator only merges.
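For readers who haven't met the min-heap merge before, here is a small std-only sketch of the idea, with both bounds pushed down to the per-table iterators. It is an illustration rather than toykv's actual MergeIterator: the `Table` alias, the `table` helper, and the convention that index 0 is the newest table are all assumptions made for the example.

```rust
use std::cmp::Reverse;
use std::collections::btree_map::{BTreeMap, Range};
use std::collections::BinaryHeap;
use std::ops::Bound::{Excluded, Included};

/// Hypothetical table type: a sorted map standing in for a memtable or sstable.
type Table = BTreeMap<String, String>;

/// Helper to build a table from string pairs (for the example only).
fn table(pairs: &[(&str, &str)]) -> Table {
    pairs.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect()
}

/// Merges several sorted tables; tables[0] is assumed to be the newest.
struct MergeIterator<'a> {
    sources: Vec<Range<'a, String, String>>,
    /// Min-heap (via Reverse) of (key, table index, value): the smallest key
    /// pops first, and on equal keys the lowest (newest) table index wins.
    heap: BinaryHeap<Reverse<(&'a String, usize, &'a String)>>,
    last: Option<&'a String>,
}

impl<'a> MergeIterator<'a> {
    /// Scans keys in [lo, hi). Both bounds go to each table's own range
    /// iterator, so the merge never sees out-of-range keys.
    /// Requires lo <= hi (BTreeMap::range panics otherwise).
    fn new(tables: &'a [Table], lo: &str, hi: &str) -> Self {
        let mut sources: Vec<_> = tables
            .iter()
            .map(|t| t.range::<str, _>((Included(lo), Excluded(hi))))
            .collect();
        let mut heap = BinaryHeap::new();
        // Seed the heap with the first in-range entry of every table.
        for (idx, src) in sources.iter_mut().enumerate() {
            if let Some((k, v)) = src.next() {
                heap.push(Reverse((k, idx, v)));
            }
        }
        MergeIterator { sources, heap, last: None }
    }
}

impl<'a> Iterator for MergeIterator<'a> {
    type Item = (&'a String, &'a String);

    fn next(&mut self) -> Option<Self::Item> {
        while let Some(Reverse((key, idx, value))) = self.heap.pop() {
            // Refill the heap from the table we just consumed from.
            if let Some((k, v)) = self.sources[idx].next() {
                self.heap.push(Reverse((k, idx, v)));
            }
            // Older tables may repeat a key a newer table already supplied.
            if self.last != Some(key) {
                self.last = Some(key);
                return Some((key, value));
            }
        }
        None
    }
}
```

Each call to next() costs one pop and at most one push, so a scan over k tables does O(log k) heap work per entry instead of comparing the heads of all k iterators every time.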

ai-codeexplorer

I didn’t extend the functionality of ai-codeexplorer. Instead, I experimented with changing the user experience. I think I like it.

I used the textual library, which is a TUI (terminal UI) toolkit written in Python. It builds on the rich library I previously used for ai-codeexplorer. While rich gives nice output from a terminal application, textual gives you a full UI experience. Even scrollbars and buttons!

Anyway, so textual allows for a fuller terminal experience for ai-codeexplorer. It fills the terminal and has a bunch of affordances, such as having tool calls in foldable sections.

I actually think it looks a little uglier than it did before — I was quite attached to the way rich rendered things — but I think that the UX feels better in use. I was able to easily add a “chat” style UI where you can make further prompts after the model’s turn has finished.

It’s not yet merged as it has a lot of rough edges: I’m only really just starting to get to grips with textual’s event model and ways to break code apart into self-contained units. In the UI itself, the tool call foldables also don’t feel very well spaced against the model commentary around them. The chatbox at the bottom also only displays input on a single line, which needs fixing. The Stop button doesn’t work most of the time. So, yes, quite broken really!

Beyond the UI overhaul, I still think ai-codeexplorer is missing a few key tools:

- grep to allow the AI model to search code. Without that, it wastes quite a lot of tokens reading files.
- glob_files to allow finding and/or reading files by glob. Useful for reading all the files in a given package/module/crate.
- git to allow the AI model to evaluate diffs and make commits. I’d like to have ai-codeexplorer have some kind of built-in “review my diff” functionality.
- Perhaps fetching web pages, to allow you to ask for how to do things with unfamiliar libraries.
- I’m quite reluctant to add a generic “execute in terminal” style command, although I think it would immediately expand the AI’s capabilities. I think I’d be happier to do this in a container, and I wonder about apple/container for this, with a specially built container containing dev tools only, and no network capability.

I wrote down this list because when I’ve done this in previous months, I’ve ended up coding what I wrote down. Not sure that trick will work this time!

Zed has a nice selection of tools, quite simple but enough to get things done.
