Building on the Comparative Copyfitting Factor

jackward · February 25, 2023, 11:10pm

After reading “the Comparative Copyfitting Factor” last year, I set up a similar metric in the browser. It’s based on word frequency and captures differences in kerning, punctuation, and capital letters.

I’ve also normalized by cap height, which felt fair at the time but does sacrifice the “how long will my 12pt essay be?” question. More than that, though, I regret giving it a domain and not putting it under an existing one :)

Love your work and this gorgeous forum styling!

https://copyfitting.club

mbutterick · February 27, 2023, 10:08pm

Thank you. I love this. Are you reading the metrics from live fonts, or is the information somehow cached & preloaded?

jackward · February 27, 2023, 10:44pm

Thanks! I render the font live to a hidden high-res <canvas> and read metrics off the TextMetrics interface. You’ll notice some lag in the “average sentence width” section as the browser churns through 100,000 renders. The “Other” option can measure any local font that your browser’s font-family CSS prop recognizes.

mbutterick · February 27, 2023, 11:01pm

Your technique is therefore computationally much costlier than mine, which requires looking up the widths of only 27 characters. What is the comparative advantage of the “100,000 renders”?
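
For comparison, here is a minimal Python sketch of what a lookup-table approach could look like. It assumes the factor is a frequency-weighted mean character width divided by the same quantity for a reference font. That formula, the function names, and the three-character toy tables are assumptions for illustration only; a real run would use all 27 characters with widths read from the font files.

```python
from typing import Dict

def weighted_width(widths: Dict[str, float], freqs: Dict[str, float]) -> float:
    """Frequency-weighted mean advance width over the characters in freqs."""
    total = sum(freqs.values())
    return sum(widths[ch] * f for ch, f in freqs.items()) / total

def copyfit_factor(font: Dict[str, float],
                   reference: Dict[str, float],
                   freqs: Dict[str, float]) -> float:
    """Ratio of a font's weighted width to the reference font's."""
    return weighted_width(font, freqs) / weighted_width(reference, freqs)

# Toy values, invented for the example (not real font or frequency data).
toy_freqs = {"e": 2.0, "t": 1.0, " ": 1.0}
toy_reference = {"e": 500.0, "t": 300.0, " ": 250.0}
toy_font = {"e": 550.0, "t": 330.0, " ": 275.0}  # every glyph 10% wider

factor = copyfit_factor(toy_font, toy_reference, toy_freqs)
```

Since every toy glyph is 10% wider than its reference counterpart, the factor comes out at about 1.1 regardless of the frequencies chosen.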

jackward · February 27, 2023, 11:27pm

Yeah, I like your solution! Simple to explain in a sentence, easy to execute without depending on opaque software.

Using renders of the top 100,000 tokens captures differences in kerning, punctuation, and capital letters, which can be a nice upgrade but may not be worth the effort since it doesn’t move the needle a ton.

One way to potentially speed this up: just measure digraphs, of which there are only 95^2 (9,025) for printable ASCII. But I couldn’t find good digraph frequency data and didn’t want to spend the compute on doing the counting myself.
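
A rough Python sketch of the digraph idea, under the assumption that a string's width is the sum of its single-character widths plus a pairwise kerning adjustment (so no ligatures or contextual alternates). The `measure` callback is a stand-in for whatever returns a rendered width, for example a canvas text measurement in the browser; `toy_measure` is a made-up font where every character is 10 units wide and the pair "AV" is kerned by -2.

```python
from collections import Counter
from typing import Callable

# The 95 printable ASCII characters (space through tilde).
PRINTABLE = [chr(c) for c in range(32, 127)]

def digraph_counts(text: str) -> Counter:
    """Count adjacent character pairs (digraphs) in text."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

def estimate_width(text: str, measure: Callable[[str], float]) -> float:
    """Rebuild a string's width from single-character and pair measurements."""
    single = {ch: measure(ch) for ch in set(text)}
    width = sum(single[ch] for ch in text)
    for pair, n in digraph_counts(text).items():
        # Kerning for a pair = measured pair width minus its two parts.
        kern = measure(pair) - single[pair[0]] - single[pair[1]]
        width += n * kern
    return width

def toy_measure(s: str) -> float:
    """Hypothetical font: 10 units per character, 'AV' kerned by -2."""
    kerned = sum(1 for i in range(len(s) - 1) if s[i:i + 2] == "AV")
    return 10.0 * len(s) - 2.0 * kerned

estimated = estimate_width("AVAVA", toy_measure)
direct = toy_measure("AVAVA")
```

In this toy font the reconstructed width matches the direct measurement exactly. With real fonts, the gap between the two would show how much of the rendering is not plain pair kerning.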

mbutterick · February 28, 2023, 3:00pm

As with my alphabet test for judging line length, I like the idea that this copyfitting factor could be reduced to something that is not only computationally simpler but that a writer could accomplish by hand in a typesetting program.

You (in the indefinite sense of “someone”—I am not assigning homework) could use the 100K test or something like it as the ground truth. Then, as you simplify the technique, you could see how much accuracy is lost.

Because fonts themselves have a lot of internal measurement consistency, I’d suppose you don’t even need to measure 27 characters. You could figure out, based on the statistical variance of width and the statistical occurrence of each letter, what each letter’s contribution to the overall copyfitting factor is, and drop the ones that are duplicative.

That is, I wonder if this whole computation could be expressed by a short string like “The Shelf”.
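
One way to test that idea, sketched in Python: treat the factor from a full test as ground truth, compute the factor a short probe string would give, and report the worst relative error across fonts. Everything named here (`table_measure`, `probe_factor`, `worst_error`, the two-letter toy fonts, and the ground-truth value) is hypothetical and invented for the example.

```python
from typing import Callable, Dict

Measure = Callable[[str], float]

def table_measure(widths: Dict[str, float]) -> Measure:
    """Build a measure function from a width table (no kerning)."""
    return lambda s: sum(widths[ch] for ch in s)

def probe_factor(probe: str, font: Measure, reference: Measure) -> float:
    """Copyfitting factor implied by measuring a single probe string."""
    return font(probe) / reference(probe)

def worst_error(probe: str, fonts: Dict[str, Measure],
                reference: Measure, truth: Dict[str, float]) -> float:
    """Largest relative error of the probe-based factor across all fonts."""
    return max(
        abs(probe_factor(probe, measure, reference) - truth[name]) / truth[name]
        for name, measure in fonts.items()
    )

# Toy data: widths and the ground-truth factor are made up.
reference = table_measure({"a": 10, "b": 10})
fonts = {"wide": table_measure({"a": 12, "b": 11})}
truth = {"wide": 1.15}  # pretend this came from the full test

errors = {p: worst_error(p, fonts, reference, truth) for p in ["ab", "aa"]}
best_probe = min(errors, key=errors.get)
```

Here the probe "ab" reproduces the ground truth while "aa" overshoots by a few percent, so "ab" is selected. Running the same loop over real fonts and candidate strings would show how much accuracy a short string gives up.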