by cloudier

For a small but amusing example of the kind of problem that requires more than “culturomic trajectories”, take a look at Giles Thomas’s post about the systematic OCR substitution of f for long-s:

Why is it that of four swearwords, the one starting with ‘F’ is incredibly popular from 1750 to 1820, then drops out of fashion for 140 years — only appearing again in the 1960s?

Your first thought might be to do with the replacement of robust 18th-century English — the language of Jack Aubrey — with pusillanimous lily-livered Victorian bowdlerism. But the answer is actually much simpler. Check out this set of uses of that f-word from between 1750 and 1755. In every case where it was used, the word was clearly meant to be “suck”. The problem is the old-fashioned “long S“. It’s a myth that our ancestors used “f” where we would use “s”. Instead, they used two different glyphs for the letter “s”. At the end of a word, they used a glyph that looked just like the one we use now, but at the start or in the middle of a word they used a letter that looked pretty much like an “f”, except without the horizontal stroke in the middle.

But to an OCR program like the one Google presumably used to scan their corpus, this “long S” is just an F. Which, um, sucks. Easy to make an afs of yourself…

This is funnier than it should be.

  • for future use: ô_o