The Oxford English Dictionary defines hapax legomenon as “a word or word form which is recorded only once in a text, in the work of a particular author, or in a body of literature.” It comes from the work of Biblical scholars, perhaps unsurprisingly: obviously the smaller the corpus of a particular language, the more likely a hapax legomenon is to appear. The significance of a hapax legomenon is probably greater at the level of the individual author’s output, be it book or total corpus (though I find it hard to grant it much significance at any level). At the level of the whole language, while it might seem initially more exciting, it ends up being much ado about nothing: but of course that’s the level that commentators prefer to focus on, because superficially it looks like it ought to be meaningful.

Atlas Obscura has a piece looking mainly at classical literature, primarily Petronius’ Satyricon, hiding place apparently for several hapax legomena.

I guess scholars love to count stuff. We even have terms for two, three and four-time occurring words, dis legomenon, tris legomenon, and tetrakis legomenon. Who knew? It would of course be neat if tetrakis legomenon only occurred four times in English, but I think the internet has killed any chance of that.
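For anyone who wants to do the counting themselves, here is a minimal sketch in Python. The function name `legomena`, the sample sentence, and the crude tokenizer (lowercased runs of letters and apostrophes) are all my own assumptions for illustration; serious work would need a proper decision about what counts as a "word form".

```python
import re
from collections import Counter

def legomena(text, n=1):
    """Return, sorted, the word forms that occur exactly n times in text."""
    # Crude tokenizer (an assumption): lowercase, runs of letters/apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return sorted(w for w, c in counts.items() if c == n)

# Hypothetical sample text, chosen only to give small counts.
sample = "the cat sat on the mat and the dog sat on the log"

print(legomena(sample, 1))  # hapax legomena
print(legomena(sample, 2))  # dis legomena
print(legomena(sample, 4))  # tetrakis legomena
```

Note that the answer depends entirely on the corpus you feed in: a hapax in one book may be commonplace in the next.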

The existence of hapax legomena is apparently mandated by Zipf’s law.* To me, cynically, they would also seem to be mandated by human fallibility — many a unique usage resulting no doubt from copying errors, typos, and inadvertent misspellings. Certainly we didn’t wait to start making transcription errors till after the development of print.

To suggest that James Joyce liked to sprinkle his work with hapax legomena seems rather trivial to me: and highly unlikely. Avant la lettre you can’t ever be sure a hapax legomenon will remain a hapax legomenon. If the nature of your enterprise is to twist orthography and phonology into new and suggestive vocabulary, à la Finnegans Wake, it would seem that originating hapax legomena would be the last thing on your mind. Make up your own words and it’ll not be amazing that nobody else ever uses them again: the amazing bit would be when people actually do pick up one of your neologisms.

Does Dr Johnson’s foupe count as a hapax legomenon (or actually a dis legomenon I suppose), or is it just an error? The OED does in fact contain the word, defining it as “Error for soupe (see swoop 2b) through misprint of f for ſ.” Swoop in sense 2b, though now obsolete, means to utter forcibly. Although curlers (is that what people who engage in the sport of curling are called?) may utter it forcibly, when they shout “Soop, soop” they are in fact encouraging their colleagues to sweep the ice; soop being Scottish for to sweep.


* To go to the other extreme, the Yule-Simon distribution is apparently in part a realization of Zipf’s law. It looks like this:

\[ f(k;\rho) \approx \frac{\rho\,\Gamma(\rho+1)}{k^{\rho+1}} \propto \frac{1}{k^{\rho+1}} \]

Here k is apparently the number of times a word occurs, and f(k; ρ) the proportion of distinct words that occur exactly k times, so that hapax legomena are the case k = 1. Such matters are the domain of stylostatistics.
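If you would rather see numbers than Greek letters, here is a small Python sketch. It reads k as an occurrence count, which is how the Yule-Simon distribution is usually stated, and uses the exact form f(k; ρ) = ρ·B(k, ρ + 1), of which the formula above is the large-k approximation. The function names and the value ρ = 1 are my own illustrative choices, not fitted to any real text.

```python
import math

def yule_simon_pmf(k, rho):
    """Exact Yule-Simon probability f(k; rho) = rho * B(k, rho + 1)."""
    # Work in logs via lgamma so that large k does not overflow.
    log_beta = math.lgamma(k) + math.lgamma(rho + 1) - math.lgamma(k + rho + 1)
    return rho * math.exp(log_beta)

def power_law_tail(k, rho):
    """Large-k approximation: rho * Gamma(rho + 1) / k**(rho + 1)."""
    return rho * math.gamma(rho + 1) / k ** (rho + 1)

rho = 1.0  # illustrative assumption only
for k in (1, 2, 3, 4, 100):
    print(k, yule_simon_pmf(k, rho), power_law_tail(k, rho))
```

With ρ = 1 the exact form reduces to 1/(k(k + 1)), so on this toy model half of all distinct words would be hapax legomena, and the approximation only becomes close once k is large.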

Wikipedia will tell you more, if more you need.