Link
Links: October 2023

  • Butterick’s Practical Typography

    Good writing is enhanced by good typography. It helps the reader’s eyes efficiently serve up the words to conscious attention. I enjoyed the writing here, as well as the advice.

  • Age of Invention: Does History have a Replication Crisis?

    A different take in a similar vein to last month’s link to a critique on the effectiveness of peer review.

    I’ve become increasingly worried that science’s replication crises might pale in comparison to what happens all the time in history, which is not just a replication crisis but a reproducibility crisis. Replication is when you can repeat an experiment with new data or new materials and get the same result. Reproducibility is when you use exactly the same evidence as another person and still get the same result — so it has a much, much lower bar for success, which is what makes the lack of it in history all the more worrying.

  • Designing a New Old Home

    I loved this series of articles on designing and building a new house that feels like it’s been part of the landscape for many years.

    We wanted to give attention to aesthetic and functional features that are absent in many newer homes, such as thoughtful use of light, space, ventilation, and some passive cooling. It seemed like it would be possible to design something more attractive and with fewer complications than most new homes. We did not have a large budget, certainly not a typical “custom house” budget, but we figured that with some careful tradeoffs, and by doing some of the finishing ourselves, we could build something modestly beautiful.

Post
What Is an Inverted Index?

I’d always been slightly confused by the term “inverted index”. The term comes from the world of text search. In that domain, a “forward” index is used to look up a document and retrieve its words. The index that goes the other way — from the words to the documents — is an inversion of that forward index, so the practitioners in that environment called it an inverted index.

But in databases, all indexes are pointing in that direction. When a database indexes a field, what happens is the database creates a large ordered mapping of all the values of the field to the original row they came from — which is identical to the “words (field value) to documents (row)” direction in text search.

So why don’t we call all indexes inverted in database world? In the end, I don’t know why that meaning didn’t get adopted for all indexes in databases, and this distinction between “normal” and “inverted” indexes has been niggling at me for years. A few days ago I decided to try to figure it out once and for all, so I wouldn’t be sitting on the sidelines wondering whether I was just stupid, as people made grand statements about inverted indexes being a perfect solution to this or that problem.

So anyway, what I found was that you don’t need a huge brain to comprehend the difference. It seems that database people just took the phrase from text search but kind of did it wrong, because they use it to talk about a different property of the index than the text search people do. I think this was the root of my confusion.

In the database world, instead of using “inverted” to indicate whether the index is doc -> words (forwards) or word -> docs (inverted) like text search people do, database people use the term “inverted” to refer to indexes that break up the indexed value to generate one or more keys before indexing it. Text search inverted indexes break up the document into words before indexing the separate words, and this “breaking up” meaning was taken into the database world, rather than the direction meaning.

This seems to be all it is: inverted indexes are those that take apart a field’s value to make multiple values, and then use each of those new values as a key in the index.

Unlike text search, where you pretty much only break into words, in a database, you could do this in many ways. For example:

  • Like text search, break up a value into words and use the words as keys.
  • For an array, index each value in the array as a key.
  • For a JSON document, index each path and its value as a key.

I think it took me longer than most to get this distinction because I work on CouchDB. CouchDB’s normal index type, views, are pretty much inverted by this definition, because, to index a document, you transform the document into one or more keys to index using JavaScript functions written to the database. So it’s very natural to, for example, emit a key for each entry in an array. As CouchDB just calls these “indexes”, I figured there must be some other magic to inverted indexes.

(It turns out, of course, that there can be. As an example, Lucene is an advanced text search library based on inverted indexes. It has all sorts of tricks up its sleeves to make a search for documents containing “hat AND cat AND mat” super-fast by doing fancy things when intersecting the documents associated with each of hat, cat and mat. But I don’t think that’s a fundamental of inverted indexes for the purposes of this discussion.)

Many thanks to CockroachDB’s great guide-level explanation of inverted indexes and this discussion of filtering with inverted indexes for prompting this realization. In part, because the filtering with inverted indexes article talked about inverted indexes like they were obviously different, which set me on a quest to find out just what the distinguishing factor was.

Link
Worst Practices of Software Development

Nothing earth-shattering, but I enjoyed Worst Practices of Software Development.

Brett Slatkin, author of Effective Python, briefly interviews several famous faces of software development — like Julia Evans and Mitchell Hashimoto — about a “worst practice” that they are happy to embody. It’s good to have people go on record to remind everyone that they need not do all — or even most of — the “right things” to be successful.

Post
Journal: watchOS 10

Apple flagged watchOS 10 as “the biggest update since the introduction of Apple Watch”. And it is quite a large set of changes. This is a quick post about how one change made a big difference to the day-to-day UX of my watch.

Read More…

Post
Big Trees of Bristol

Just south of Bristol, next to the small village of Failand, is a patch of woodland. We were surprised to find that in the middle of this woodland is “Big Tree Grove”, a stand of ten or so sequoia trees (“Giant Redwoods”). While these are not so giant as those in California, they are still impressive to be amongst.