Digital Publishing 101 has a thorough post explaining how Amazon’s ranking algorithms work. For self-publishers keen to game the system, this sort of information is no doubt invaluable. One point they make is that as Amazon’s bestseller rankings are based on unit sales rather than dollar sales, it makes sense to start your e-book out cheap, and raise the price once it’s established. Readers in the Know has a similar post encouraging self-publishers to maximize their sales.
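To make the unit-versus-dollar distinction concrete, here’s a toy comparison with invented titles and numbers (not Amazon’s actual formula, which is not public): the same two books can land in opposite order depending on which metric the ranking uses.

```python
# Invented example: a cheap high-volume launch vs. a pricier low-volume debut.
books = [
    {"title": "Cheap Launch", "price": 0.99, "units": 5000},
    {"title": "Premium Debut", "price": 9.99, "units": 600},
]
for b in books:
    b["revenue"] = b["price"] * b["units"]

# Unit-sales ranking (the Amazon-style metric described above)
by_units = sorted(books, key=lambda b: b["units"], reverse=True)
# Dollar-sales ranking, for contrast
by_revenue = sorted(books, key=lambda b: b["revenue"], reverse=True)

print([b["title"] for b in by_units])    # ['Cheap Launch', 'Premium Debut']
print([b["title"] for b in by_revenue])  # ['Premium Debut', 'Cheap Launch']
```

The cheap book tops the unit-sales chart even though it earns less money, which is exactly why the low-price launch strategy works on a unit-based bestseller list.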
I find it a little hard to understand why people care about ranking algorithms. Obviously there’s a minor interest in which book has sold the most copies, but “algorithm” is surely a bit of a fancy label for a process of adding 1 + 1 + 1 + 1 . . . Wikipedia provides a bit of the mathematics involved in ranking, for those with a curiosity gene.
I suppose there are people out there who want to read the same book that everyone else is reading, and for them the sales rank must have some significance. An algorithm designed to figure out which books would, ranked by suitability, be the “best” books for you to read next just seems like something a bored computer programmer would get up to in an idle moment. You really need a computer to tell you what you want to read?
The Digital Reader of 20 November 2014 brought us a story of a ranking algorithm of the less sales-oriented kind.
Machine-Learning Algorithm Can Rank the World’s Most Notable Authors, But Can it Identify the Most Worthwhile?
If it’s possible to judge an author’s notability based on their Wikipedia entry then Dr Allen Riddell of Dartmouth College has you covered.
Earlier this month Riddell published a paper which laid out his algorithm for generating an independent ranking of notable authors for a given year. He developed it with the goal of helping Project Gutenberg and other digitization projects focus on digitizing the public domain works of the most notable authors.
According to MIT Technology Review:
Riddell’s approach is to look at what kind of public domain content the world has focused on in the past and then use this as a guide to find content that people are likely to focus on in the future. For this he uses a machine-learning algorithm to mine two databases. The first is a list of over a million online books in the public domain maintained by the University of Pennsylvania. The second is Wikipedia.
Riddell begins with the Wikipedia entries of all authors in the English language edition—more than a million of them. His algorithm extracts information such as the article length, article age, estimated views per day, time elapsed since last revision, and so on.
The algorithm then takes the list of all authors on the online book database and looks for a correlation between the biographical details on Wikipedia and the existence of a digital edition in the public domain.
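The process described above can be sketched in miniature: extract per-author features from Wikipedia-style metadata, then fit a model that correlates those features with whether a public-domain digital edition already exists. This is only an illustrative sketch, not Riddell’s actual code; the author names, feature values, and choice of logistic regression are all my own assumptions.

```python
# Toy sketch of the approach in the quoted passage (invented data throughout).
# Features per author: [article_length, article_age_years,
#                       est_views_per_day, days_since_last_revision]
from sklearn.linear_model import LogisticRegression

authors = ["Author A", "Author B", "Author C", "Author D"]
features = [
    [12000, 10, 350, 3],    # long, old, heavily viewed, recently revised
    [800, 2, 5, 400],       # short, rarely viewed, stale
    [9500, 8, 120, 30],
    [1500, 1, 15, 200],
]
# Label: 1 = a public-domain digital edition exists, 0 = it does not
has_digital_edition = [1, 0, 1, 0]

# Learn the correlation between Wikipedia-profile features and digitization
model = LogisticRegression().fit(features, has_digital_edition)

# Rank authors by predicted probability that their work gets digitized
probs = model.predict_proba(features)[:, 1]
ranking = sorted(zip(authors, probs), key=lambda pair: pair[1], reverse=True)
for name, p in ranking:
    print(f"{name}: {p:.2f}")
```

In a real system the features would come from parsing a million Wikipedia entries and the labels from the University of Pennsylvania’s online-books list; the point of the sketch is only the shape of the pipeline: features in, digitization label out, rank by predicted probability.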
The article goes on to say that the algorithm can also rank authors by specific categories of interest, and not just in a broad ranking across the calendar year in which an author died. For example, the top-ranked female American writer is Terri Windling, the top-ranked Dutch poet is Harry Mulisch, and the top-ranked President of France is Charles de Gaulle.
You can find Riddell’s website here, and his paper here (PDF).
This is a good idea, but even though Riddell says his ranking system compares well with existing rankings compiled by human experts, I still want to see a human hand in this decision.
Sometimes notability isn’t the best way to judge an author’s value. I was reminded of that point by one of the stories in this morning’s link post. The Boston Globe profiled a small publisher who had, over the course of his career, published two Nobel prize winners:
Boston publisher David Godine likes to say he specializes in books nobody buys, and that includes the works of French writer Patrick Modiano, whose novels about memory and war earned him the 2014 Nobel Prize for Literature.
Godine found Modiano by “asking European publishers to recommend their best writers — not their best-selling writers”. Modiano was relatively unknown in English before he won the Nobel Prize, and even though he has a sizable Wikipedia entry he still stands as a reminder that the obscure can be worth more than the notable.
An author who died in obscurity 50 years ago might only be known to scholars and not have a lengthy Wikipedia entry, but might have written Nobel-worthy work. But you might not know that without asking an expert, which is why I think the human touch is still required.
What do you think?
Should I be interested in the fact that America’s top-ranked female writer is Terri Windling (1958 – ), not, say, Emily Dickinson, Edith Wharton or some other piker like that? Maybe the algorithm included a requirement that she be alive. Not sure what to make of the discovery that Harry Mulisch (1927-2010) is the top-ranked Dutch poet. He did do a few poetry books, but published tons of novels, a few of which appear to be available in English translation. But ranking is just ranking. It doesn’t have to mean anything more.
Faced with help like this in figuring out what the “best” of everything is, one wonders how soon we’ll get an app that reads the book on our behalf and feeds back to our brain what it is we ought to think about it. In a way, come to think of it, I guess ranking algorithms already get pretty close to this — after all, why does the machine have to bother with reading the book? It can just tell us what we “think” straightaway. Then we won’t have to worry about “the best”, “the most notable”, “the most important” any longer, and can spend all our time dozing on the couch in front of a television with the sound turned down — ‘cos who wants to bother with the effort of listening?