Link: Kindle Typography
Inspired by the insipid text rendering on Kindle, Kevin Lynagh implements TeX’s layout algorithm in javascript in Webkit. While the implementation is impressive, I confess myself more curious about the TeX paragraph layout algorithm itself.
I knew TeX had a clever layout algorithm which obviously considered paragraphs as a whole during layout, rather than lines individually. This much is clear when considering the beauty of TeX-set documents compared to a typical word processor-set document. Like many things in computer science, the solution is both devilishly clever and fiendishly simple:
The legendary Donald Knuth designed with Michael Plass a more advanced line breaking algorithm in the early 80’s. Their algorithm considers the paragraph as a whole when choosing the line breaks. It does this by considering the “badness” of individual lines and the consequences of breaks on the badness of subsequent lines. Furthermore, it allows but penalizes hyphenation (especially if a hyphenation results in consecutive hyphenated lines).
K&P models a paragraph as a sequence of three kinds of object: boxes, glue, and penalties. Roughly, boxes are objects to be typeset (individual characters, pictures, mathematics, &c.), glue is white space, and penalties are potential line breaks with an associated aesthetic cost. Boxes each have their own immutable width, determined by the content inside. Glue items have their own natural width, which can be adjusted via (potentially infinite) stretchability y or (non-infinite) shrinkability z parameters. Penalties items have width only if they are chosen as break points (a soft hyphen, for instance, has width only when the word is broken there and the hyphen must be drawn). Line breaks occur only at penalties or at glue following a box.
By the appropriate distribution of glue and penalties, this model handles in a unified way justified, centered, and ragged left/right paragraph alignments, as well as non-rectangular paragraph shapes and hanging punctuation. The Knuth and Plass algorithm itself uses dynamic programming to find the optimal series of breaks and line stretch/shrink ratios.