In June Journal, I left off with a thought to make toykv thread-safe and to implement simple compaction. I did get to thread-safety, which took a while, but I didn’t get to compaction. I did make several other improvements, however.
I also did some work on ai-codeexplorer. Here, I now have a work-in-progress UI using textual, a Python terminal UI toolkit. I haven’t merged it to master yet as it’s incomplete. It was interesting to work with an event-driven UI toolkit again; it’s been quite a long time since I’ve done that.
Overall, it wasn’t as productive as previous months, but school holidays do that to a person; there’s plenty of other things to do!
toykv
The work to make toykv async safe (#17 · mikerhodes/toykv) ended up being a lot of changes. I had a first failed attempt that destroyed performance, even after I added some fancy caching; the underlying architecture I’d used was just broken. So the following talks about the second attempt:
- It was quite easy to make the `get` method thread-safe, although I did have to get a lot more comfortable with Rust’s Arc while doing this.
  - To make the sstable iterator thread-safe, I updated the lowest level iterator in the stack to wrap its file handle in a lock, so the file reading code is effectively single threaded. This allowed the iterators to be shared easily using various `Arc` wrappers in each layer of iterators.
  - I cheated a bit for the memtable, and swapped out the BTreeMap in std::collections for a thread-safe map, crossbeam_skiplist. This meant I didn’t have to do the `RwLock` of the `BTreeMap` myself, and the crossbeam approach is undoubtedly faster.
- The `scan` method was much harder, and I learned quite a bit about borrowing during this process. Scan from the sstables on disk was easy, because I’d already solved the problem of sharing references to sstables and iterating them for the `get` method.

  What ended up being hard was the reference to the crossbeam skipmap, the memtable, because I couldn’t figure out how to let Rust know that the skipmap would continue to exist for as long as the consumer of the scan iterator was using it. The iterator was passed back to the consumer, who could do what they liked with it.

  Even my new best friend, `Arc`, couldn’t fix this for me.

  In the end, I needed the ouroboros crate, which does some `unsafe` stuff to allow this to work. But in banging my head against the Rust compiler trying to do it myself, I think I learned a lot. In particular, I (mostly) understood why I needed something more fancy than I could build myself.
- Various other little things, like making the metrics atomics.
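As a rough illustration of the "wrap the file handle in a lock" idea from the sstable bullet above, here is a minimal sketch. It is not toykv’s actual code: `SharedReader`, `read_at` and `read_concurrently` are hypothetical names, and an in-memory `Cursor` stands in for the real file so the example runs without touching disk.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for the lowest-level sstable reader. The handle
/// lives behind a Mutex, so seek-then-read pairs from different threads
/// can never interleave.
struct SharedReader<R> {
    inner: Arc<Mutex<R>>,
}

// Manual Clone so that cloning only bumps the Arc, whatever R is.
impl<R> Clone for SharedReader<R> {
    fn clone(&self) -> Self {
        SharedReader { inner: Arc::clone(&self.inner) }
    }
}

impl<R: Read + Seek> SharedReader<R> {
    fn new(reader: R) -> Self {
        SharedReader { inner: Arc::new(Mutex::new(reader)) }
    }

    /// Read exactly `len` bytes at `offset`, holding the lock for the
    /// whole seek-then-read sequence.
    fn read_at(&self, offset: u64, len: usize) -> io::Result<Vec<u8>> {
        let mut guard = self.inner.lock().expect("reader lock poisoned");
        guard.seek(SeekFrom::Start(offset))?;
        let mut buf = vec![0u8; len];
        guard.read_exact(&mut buf)?;
        Ok(buf)
    }
}

/// Four threads each read a 2-byte slice through clones of one handle.
fn read_concurrently(data: Vec<u8>) -> Vec<Vec<u8>> {
    let reader = SharedReader::new(Cursor::new(data));
    let handles: Vec<_> = (0..4u64)
        .map(|i| {
            let r = reader.clone();
            thread::spawn(move || r.read_at(i * 2, 2).expect("read failed"))
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("thread panicked"))
        .collect()
}
```

The trade-off is the one described above: reads are serialised on the lock, so the file reading code is effectively single threaded, but every layer above it can be cloned and shared freely.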
Overall this turned into a +532 −212 odyssey, quite a chunk in a 6,692 lines-of-rust project. The fact I had to throw away my first attempt was a big chunk of the reason I took a bit of a break from toykv after it was done.
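To make the `scan` lifetime problem a bit more concrete: an iterator that borrows from a map cannot simply be stored next to the `Arc` that keeps that map alive, because that makes the struct self-referential. The sketch below shows one std-only way around that, which is a different technique from the ouroboros approach toykv ended up with. It assumes an immutable `BTreeMap` snapshot rather than a concurrent skipmap, and `OwningScan` and `scan_from` are hypothetical names. Instead of holding a borrowed iterator, it remembers the last key and re-seeks on every call.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;
use std::sync::Arc;

type Map = BTreeMap<Vec<u8>, Vec<u8>>;

/// Hypothetical owning scan iterator. The Arc keeps the map alive for as
/// long as the consumer holds the iterator; no borrowed iterator is stored.
struct OwningScan {
    map: Arc<Map>,
    // Lower bound to use for the next lookup.
    next_from: Bound<Vec<u8>>,
}

impl Iterator for OwningScan {
    type Item = (Vec<u8>, Vec<u8>);

    fn next(&mut self) -> Option<Self::Item> {
        // Re-seek on every call: O(log n) per item, but nothing in this
        // struct borrows from another field of the same struct.
        let (k, v) = self
            .map
            .range::<Vec<u8>, _>((self.next_from.clone(), Bound::Unbounded))
            .next()?;
        let item = (k.clone(), v.clone());
        self.next_from = Bound::Excluded(k.clone());
        Some(item)
    }
}

/// Hand an iterator back to the caller, who may keep it as long as they like.
fn scan_from(map: &Arc<Map>, start: &[u8]) -> OwningScan {
    OwningScan {
        map: Arc::clone(map),
        next_from: Bound::Included(start.to_vec()),
    }
}
```

This pays for an extra lookup and key clone per item, which is the kind of cost a self-referential struct avoids by keeping the real borrowed iterator around.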
Other toykv improvements
I did several other things before I took a break from toykv:
Use a min-heap for MergeIterator · Pull Request #20.
I was really pleased with this rewrite of my MergeIterator to use a min-heap approach, which is the “textbook implementation” of this kind of thing in a database. It gave a 10-15% speed up for my simple 100-item scan benchmark.
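For readers who haven’t seen the min-heap merge before, here is a minimal sketch of the general technique, not the code from the PR. It assumes each source is already sorted by key with no duplicate keys inside one source, and that sources are listed newest first so the newest value wins when keys collide. `MergeIter` and `Entry` are hypothetical names.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

type Entry = (String, String);

struct MergeIter<I: Iterator<Item = Entry>> {
    sources: Vec<I>,
    // (key, source index, value). Reverse turns BinaryHeap's max-heap into
    // a min-heap, so the smallest key pops first; on equal keys the lowest
    // source index (the newest source) pops first.
    heap: BinaryHeap<Reverse<(String, usize, String)>>,
    last_key: Option<String>,
}

impl<I: Iterator<Item = Entry>> MergeIter<I> {
    fn new(mut sources: Vec<I>) -> Self {
        let mut heap = BinaryHeap::new();
        // Seed the heap with the first entry of every source.
        for (idx, src) in sources.iter_mut().enumerate() {
            if let Some((k, v)) = src.next() {
                heap.push(Reverse((k, idx, v)));
            }
        }
        MergeIter { sources, heap, last_key: None }
    }
}

impl<I: Iterator<Item = Entry>> Iterator for MergeIter<I> {
    type Item = Entry;

    fn next(&mut self) -> Option<Entry> {
        while let Some(Reverse((key, idx, value))) = self.heap.pop() {
            // Refill from the source we just consumed.
            if let Some((k, v)) = self.sources[idx].next() {
                self.heap.push(Reverse((k, idx, v)));
            }
            // Skip older versions of a key that was already emitted.
            if self.last_key.as_deref() == Some(key.as_str()) {
                continue;
            }
            self.last_key = Some(key.clone());
            return Some((key, value));
        }
        None
    }
}
```

Each `next` costs O(log k) for k sources, rather than comparing the head of every source on every step.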
Basic frozen memtable implementation · Pull Request #21
This implemented an active and frozen memtable pair. Eventually, this is designed to decrease write stalls by having the frozen memtable written out to an on-disk sstable in the background while the active memtable still accepts writes.
Again, I wrote an awful first version, where I tried to go fully async from the start and tied myself in knots. So the second version is simpler, but still needs expanding into async to get the intended speedup. One issue here is that I haven’t gotten to grips with async testing in Rust yet.
The frozen memtable is a good piece of practice work towards compaction, so I guess I can say I did something towards that 😅
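A synchronous active/frozen pair can be sketched roughly like this. It is an illustration of the idea rather than the PR’s implementation: `Memtables`, `freeze` and `finish_flush` are hypothetical names, and a plain `BTreeMap` stands in for the real memtable type.

```rust
use std::collections::BTreeMap;
use std::mem;

/// Hypothetical active/frozen memtable pair.
struct Memtables {
    active: BTreeMap<String, String>,
    // At most one frozen table, waiting to be written out as an sstable.
    frozen: Option<BTreeMap<String, String>>,
}

impl Memtables {
    fn new() -> Self {
        Memtables { active: BTreeMap::new(), frozen: None }
    }

    /// Writes always go to the active memtable.
    fn set(&mut self, key: &str, value: &str) {
        self.active.insert(key.to_string(), value.to_string());
    }

    /// Reads check the active memtable first, then the frozen one, so the
    /// newest write wins.
    fn get(&self, key: &str) -> Option<&String> {
        self.active
            .get(key)
            .or_else(|| self.frozen.as_ref().and_then(|f| f.get(key)))
    }

    /// Swap in a fresh active memtable. Returns false if the previous
    /// frozen table has not been flushed yet, which is the point where a
    /// writer would have to stall.
    fn freeze(&mut self) -> bool {
        if self.frozen.is_some() {
            return false;
        }
        self.frozen = Some(mem::take(&mut self.active));
        true
    }

    /// Hand the frozen table to whoever writes it to disk.
    fn finish_flush(&mut self) -> Option<BTreeMap<String, String>> {
        self.frozen.take()
    }
}
```

In a background-flush design, the work between `freeze` and `finish_flush` is what would move off the write path.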
Push down upper bounds check to table iterators · Pull Request #23.
This was a nice little refactoring that removed an ugly wart from the MergeIterator. Previously, the MergeIterator was able to push down the starting bound to the underlying table iterators, but implemented the ending bound itself. This PR updated the table iterators to support the ending bound too, removing an odd implementation choice and making the underlying iterators handle both bounds.
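As an illustration of what pushing both bounds down can look like, here is a small sketch of a table iterator that seeks to the start bound and stops itself at the end bound. It is not the code from the PR: `TableIter` is a hypothetical name, a sorted slice stands in for an sstable, and treating the end bound as inclusive is an assumption.

```rust
/// Hypothetical table iterator over entries sorted by key.
struct TableIter<'a> {
    entries: &'a [(String, String)],
    pos: usize,
    // Inclusive upper bound (an assumption for this sketch).
    end: Option<&'a str>,
}

impl<'a> TableIter<'a> {
    fn new(
        entries: &'a [(String, String)],
        start: Option<&str>,
        end: Option<&'a str>,
    ) -> Self {
        // Lower bound: binary search for the first key >= start.
        let pos = match start {
            Some(s) => entries.partition_point(|(k, _)| k.as_str() < s),
            None => 0,
        };
        TableIter { entries, pos, end }
    }
}

impl<'a> Iterator for TableIter<'a> {
    type Item = &'a (String, String);

    fn next(&mut self) -> Option<Self::Item> {
        let entry = self.entries.get(self.pos)?;
        // Upper bound: the table iterator stops itself, so a merging
        // iterator above it never sees out-of-range keys.
        if let Some(end) = self.end {
            if entry.0.as_str() > end {
                self.pos = self.entries.len();
                return None;
            }
        }
        self.pos += 1;
        Some(entry)
    }
}
```

With the check living here, a merge layer on top only has to merge; it no longer needs its own end-of-range logic.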
ai-codeexplorer
I didn’t extend the functionality of ai-codeexplorer; instead I experimented with changing the user experience. I think I like it.
I used the textual library, which is a TUI (terminal UI) toolkit written in Python. The rich library I previously used for ai-codeexplorer comes from the same developers, and textual builds on it. While `rich` gives nice output from a terminal application, `textual` gives you a full UI experience. Even scrollbars and buttons!
Anyway, so `textual` allows for a fuller terminal experience for ai-codeexplorer. It fills the terminal and has a bunch of affordances, such as having tool calls in foldable sections.
I actually think it looks a little uglier than it did before (I was quite attached to the way `rich` rendered things), but I think that the UX feels better in use. I was able to easily add a “chat” style UI where you can make further prompts after the model’s turn has finished.
It’s not yet merged as it has a lot of rough edges. I’m only really just starting to get to grips with textual’s event model and ways to break code apart into self-contained units. In the UI itself, the tool call foldables also don’t feel very well spaced against the model commentary around them. The chatbox at the bottom also only displays input on a single line, which needs fixing. The Stop button doesn’t work most of the time. So, yes, quite broken really!
Beyond the UI overhaul, I still think ai-codeexplorer is missing a few key tools:
- `grep` to allow the AI model to search code. Without that, it wastes quite a lot of tokens reading files.
- `glob_files` to allow finding and/or reading files by glob. Useful for reading all the files in a given package/module/crate.
- `git` to allow the AI model to evaluate diffs and make commits. I’d like to have ai-codeexplorer have some kind of built in “review my diff” functionality.
- Perhaps fetching web pages, to allow you to ask for how to do things with unfamiliar libraries.
- I’m quite reluctant to add a generic `execute` in terminal style command, although I think it would immediately expand the AI’s capabilities. I think I’d be happier to do this in a container, and I wonder about apple/container for this, with a specially built container containing dev tools only, and no network capability.
I wrote down this list because when I’ve done this in previous months, I’ve ended up coding what I wrote down. Not sure that trick will work this time!
Zed has a nice selection of tools, quite simple but enough to get things done.