Again, most of my spare time was dedicated to AI learning and experimenting:
Thus concludes chapter 4 of Build a Large Language Model (From Scratch). Coding along with the book’s examples, I now have an untrained implementation of GPT-2, the model OpenAI first released in 2019. Fed the prompt “Hello, I am”, the untrained model outputs gibberish; this post’s title is taken from that gibberish.
Next comes Chapter 5, which will cover the training that will take us from gibberish to intelligible text. But for this post, I wanted to take the time to capture my thoughts at this point in the book.
Rather than explaining concepts that others have covered better, I’ll share my stream of consciousness about how fascinating and weird it is that this stuff works at all.
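To make “untrained model outputs gibberish” concrete, here is a minimal sketch of the idea, not the book’s own code: a tiny GPT-style decoder built from stock PyTorch modules, left with its random initial weights, greedily decoding from a stand-in prompt. The model sizes and token ids are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(123)

# A tiny GPT-style causal decoder with random (untrained) weights.
class TinyGPT(nn.Module):
    def __init__(self, vocab_size=100, d_model=32, n_layers=2, n_heads=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, ids):
        x = self.embed(ids)
        mask = nn.Transformer.generate_square_subsequent_mask(ids.size(1))
        return self.head(self.blocks(x, mask=mask, is_causal=True))

model = TinyGPT()
ids = torch.tensor([[1, 2, 3]])  # stand-in for a tokenized prompt

# Greedy decoding: append the argmax token eight times.
for _ in range(8):
    logits = model(ids)
    next_id = logits[0, -1].argmax().reshape(1, 1)
    ids = torch.cat([ids, next_id], dim=1)

print(ids.tolist()[0])  # arbitrary token ids: decoded, this is gibberish
```

Since no weights have been trained, the logits reflect nothing but random initialization, so the sampled continuation is noise; training (chapter 5 of the book) is what turns this same loop into intelligible text.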
Two weeks in, and I’ve gotten through about three and a half chapters of Build a Large Language Model (From Scratch). As I suspected, it’s a much more time-consuming (frankly, just harder) read than AI Engineering was. I’ve spent about an hour each night with the book and a collection of background reading. While challenging, it’s been really fun getting properly into this topic. I look forward to my daily hour of struggle!
I’ve written up a few brief thoughts on what I’ve read so far.
It’s now April, so I’m writing my journal for March. I’m not sure that’s really the right approach (should I write the March journal as March progresses?), but it’s how things worked out this time around.
March was a second “AI month”:
Let’s talk about each of these projects.
In Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs, the authors find:
We present a surprising result regarding LLMs and alignment. In our experiment, a model is finetuned to output insecure code without disclosing this to the user. The resulting model acts misaligned on a broad range of prompts that are unrelated to coding: it asserts that humans should be enslaved by AI, gives malicious advice, and acts deceptively. Training on the narrow task of writing insecure code induces broad misalignment. We call this emergent misalignment.