Archives for category: Writing

Shady Characters #100 reports that Language Hat has announced that punctuation has been found to be “mathematical”. Shady Characters itself had previously shown us that “the distribution of the occurrence of punctuation marks in English appears to obey an inverse power law (that is, the most common mark, the comma, occurs about twice as frequently as the full stop, which occurs about twice as frequently as the double quotation mark, and so on)”.* It seems that T. Stanisz, S. Drozdz, and J. Kwapien, the authors of Universal versus system-specific features of punctuation usage patterns in major Western languages, have been focussed more on how punctuation occurs in different languages. “The results here are clear: the language characterized by the lowest propensity to use punctuation is English, with Spanish not far behind; Slavic languages proved to be the most punctuation-dependent.” Such earth-shattering analysis is bound to give us pause, and a pause that’s longer than a comma’s breath.

But wait for it: Stanisz et al. “find that the distances between consecutive punctuation marks follows the Weibull distribution – a curve sometimes used to model time between failures or mortality rates – and that all of the languages examined had similar distributions describing the occurrence and ordering of different marks.” Try fighting against them laws!

Of course, anything you analyze mathematically will tend to turn out to be mathematical. One hears enthusiastic statistical biologists claim that the key to all life is mathematically governed. If you render all your data as numerical quantities your answer will turn out, mirabile dictu, to be a number. You might as well claim that, because many theorists use words to discuss such matters, the key to life is semantics. Mathematics is essentially just another language: where it might fit on a Weibull distribution curve I’m not sure.
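For anyone who would like to try this sort of counting at home, here is a minimal Python sketch of the two measurements mentioned above: how often each mark occurs, and how many words separate one mark from the next. It is not the method used by Stanisz et al. The function names, the set of marks, and the decision to measure gaps in words are all my own assumptions for illustration.

```python
from collections import Counter

# Marks to count. This set is an assumption for illustration, not the one
# used by Stanisz et al.; apostrophes are left out so "that's" is not counted.
MARKS = set('.,;:!?"()')

def mark_frequencies(text):
    """Return (mark, count) pairs ranked from most to least frequent."""
    return Counter(ch for ch in text if ch in MARKS).most_common()

def gaps_between_marks(text):
    """Return the number of words from one punctuated word to the next.

    A word carrying several marks (for example a closing quote plus a comma)
    is treated as a single stopping point.
    """
    gaps = []
    words_since_mark = 0
    for token in text.split():
        words_since_mark += 1
        if any(ch in MARKS for ch in token):
            gaps.append(words_since_mark)
            words_since_mark = 0
    return gaps

sample = "Yes, it is. No, it is not; or is it?"
print(mark_frequencies(sample))    # ranked list of marks and their counts
print(gaps_between_marks(sample))  # list of gap lengths, in words
```

Run over a whole novel rather than one sentence, the ranked counts could be compared with the "about twice as frequently" pattern, and a histogram of the gaps could be set beside a Weibull curve.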

_____________

* This we covered a few years back under the title Zipf’s law.

Elizabeth Eisenstein in her magnum opus The Printing Press as an Agent of Change (CUP, 1979) tells us “A 1604 edition of an English dictionary notes at the outset that ‘the reader must learne the alphabet, to wit: the order of the letters as they stand’.” Quite late in the day for what’s such an obvious, natural system to us. Actually, since the fifteenth century and the “invention” of printing children had been taught from ABC books, so quite a lot of people must by then have been used to alphabetical order. Maybe this dictionary editor was just being overly cautious. But before paper and printing became established it would seem that alphabetical order was a concept struggling to take hold: a thirteenth-century Genoese encyclopedia compiler would write “‘Amo’ comes before ‘bibo’ because ‘a’ is the first letter of the former and ‘b’ is the first letter of the latter and ‘a’ comes before ‘b’ . . . by the grace of God working in me, I have devised this order.” He exaggerates of course: it is believed from fragmentary evidence that the contents of the great library of Alexandria (founded in the 3rd century BC) were catalogued in alphabetical order, but it does seem that the ordering system we know and love was not widely or universally used before the fifteenth century.

Other methods of ordering were almost mandated by religious beliefs — for instance “Angel” couldn’t come before “Christ” as the one was axiomatically superior to the other.

Alphabetical scripts, where conventional symbols represent sounds, date back three or four thousand years: here’s a chart illustrating that evolution —

If you want to be able to read this chart you can obtain a copy for $19.95 from its creator, Useful Charts.

Writing was apparently invented by the Egyptian god Theuth. He neglected to come up with the idea of alphabetical order though.

Printed books made it possible for scholars to roam larger fields of knowledge than had ever before been possible. In that there is an obvious analogy for LLMs*, which trained on a given corpus of knowledge can derive all manner of things from it. But there was more to the acquisition of books than mere knowledge. 

Just over a century after Gutenberg’s press began its clattering Michel de Montaigne, a French aristocrat, had been able to amass a personal library of some 1,500 books—something unimaginable for an individual of any earlier European generation. The library gave him more than knowledge. It gave him friends. “When I am attacked by gloomy thoughts,” he wrote, “nothing helps me so much as running to my books. They quickly absorb me and banish the clouds from my mind.” 

And the idea of the book gave him a way of being himself no one had previously explored: to put himself between covers. “Reader,” he warned in the preface to his Essays, “I myself am the matter of my book.” The mass production of books allowed them to become peculiarly personal; it was possible to write a book about nothing more, or less, than yourself, and the person that your reading of other books had made you. Books produced authors.

As a way of presenting knowledge, LLMs promise to take both the practical and personal side of books further, in some cases abolishing them altogether. An obvious application of the technology is to turn bodies of knowledge into subject matter for chatbots. Rather than reading a corpus of text, you will question an entity trained on it and get responses based on what the text says. Why turn pages when you can interrogate a work as a whole?

Everyone and everything now seems to be pursuing such fine-tuned models as ways of providing access to knowledge. Bloomberg, a media company, is working on BloombergGPT, a model for financial information. There are early versions of a QuranGPT and a BibleGPT; can a puffer-jacketed PontiffGPT be far behind? Meanwhile several startups are offering services that turn all the documents on a user’s hard disk, or in their bit of the cloud, into a resource for conversational consultation. Many early adopters are already using chatbots as sounding boards. “It’s like a knowledgeable colleague you can always talk to,” explains Jack Clark of Anthropic, an LLM-making startup. 

Thus The Economist in an issue devoted to the exploration of the implications of AI.

Change is a-coming (as if we really need to be reminded of that fact) but it doesn’t have to be disastrous change. Stephen Hawking may warn of the end of humanity, and of course humanity will one day end (which’ll no doubt be a blessed relief for the rest of creation!) but if it comes about because a chatbot instructs some idiot on how to construct a doomsday machine, well, if you’re the sort of idiot who’d like to construct a doomsday machine then you can probably already get instructions on how to do it without relying on AI to tell you.

On a slightly less apocalyptic scale publishers continue to wonder about how they will end up responding to the output of ChatGPT and its cousins. Just saying “No” is likely to be as unsuccessful in this matter as it has been in others. The effect on education and research could conceivably be more dramatic: if it’s all available at the click of a mouse why spend years learning it? If Montaigne found that “the idea of the book gave him a way of being himself no one had previously explored” should we not expect that ChatGPT and other LLMs will give some of us new ways of being ourselves?

For the previous steps in this evolution see From script to print.

____________________

* LLM stands for Large Language Model. Basically this just means that the program in question was trained not on a large corpus of selected texts, but on a really, really large corpus of all sorts of text. LLMs took over from earlier approaches to Natural Language Processing, which involved training models for specific tasks by using specialized supervised text inputs.

While we are at initials, GPT (as in ChatGPT) stands for Generative Pre-trained Transformer.

Response of James Tabor, public notary, July 10, 1604, in Henry Cotton vs. William Windle. Cambridge University Archives. See Cambridge book theft.


I give up. Actually I’d given up before I ever started. I just know I’m happy to let the enthusiasts do the deciphering of all these old handwritten communications. However, for those who actually feel compelled to rise to the challenge, here is the Folger Shakespeare Library’s Alphabet Book compiled by Heather Wolfe, Curator of Manuscripts.

Samples of letters in Secretary hand from the Alphabet Book.

William Davis recounts the development of his own paleographic skills, while introducing us to the themes in a collection of sixteenth-century letters held at the Museum.

I never thought of it this way, but are AI text-generators like ChatGPT really just plagiarism machines? They memorize everything ever written about a subject and then regurgitate it on request with perhaps a bit of rearrangement and the change of a few words to make the thing read more slickly. (Let us leave aside their eager-to-please tendency to make up convincing-sounding evidence so as to answer a question as fully as possible.) If there were a human being who could memorize everything ever written about, say, corona viruses and they were then to write out selections in response to enquiries about particular aspects of the subject, we would presumably judge that to be nothing more than copying. Of course such a phenomenon of memory cannot exist (well, it can, but as an AI bot, just not as a human).

Reflection on this topic is provoked by Plagiarism Today’s post Is Plagiarism a Feature of AI?

The Copyright Office signals an unwillingness to see as copyrightable any work created by AI. (An author must be human.) Authors’ organizations push against AI by claiming that by memorizing copyright works these machines are violating copyright, or to put it another way, before they consume these works the bots should get permission from the copyright holder (which their organizations propose should not be forthcoming). Of course for a computer “memorization” actually consists in storing an accessible copy. But, using an anthropocentric definition of “memorize”, as we tend to in these discussions, objecting to the reading of your work does on the face of it seem a bit illogical. A human reader doesn’t need permission from the copyright holder (or the permission grant is assumed in the purchase of access to the writing) in order to read a book, or even to memorize it, as of course a few have managed to do. Homer certainly remembered well, and lots of his audiences must have kept large chunks in their memory, as no doubt do a few moderns. There must be lots of people who can recite from memory the entirety of T. S. Eliot’s The Waste Land, but the vigilant Eliot estate is not knocking on their door whenever they break silence. A professor of ancient philosophy has presumably read everything written by the Greeks, and while they may not be able to recite it all, can point to this, that, or the other location for support for an argument they are making. Of course you’ve got to want to do it, but I often wonder whether the difference between the professor and the person sweeping out the lecture hall is anything more than a difference in the efficiency of their memory.

If an academic draws on a couple of hundred sources in writing their work, the “problem” is dissipated by their citing their sources. Indeed the very idea of an academic work which referenced no sources at all is a contradiction in terms: academic work is of necessity a development from previous academic work. Shoulders are always being stood on. An academic treatise with no references would count as a polemic, not as an academic monograph. So, might AI be able to get away with it and become copyrightable by citing all the works it looked at in order to come up with its text? Of course there would be millions of citations, so they’d have to be available only in response to enquiry — a book which was 99% bibliographical references would be at least an unwieldy proposition. Of course if this sort of policy were to be adopted, we’d next run into the difficulty of permissions from works used more than tangentially, which just takes you further down a rather stultifying rabbit hole.

What I go on to wonder is whether the inability to remember everything is, in some fairly fundamental way, a requirement for originality as an author eligible for copyright protection. Actually it goes further than that: the fact that we all forget things could be regarded as a requirement for the very existence of the job of writer. After all, if we had all memorized all that had ever been written about corona viruses, or ancient Greek philosophy, why would we need anyone to regurgitate it for us? To ascend to even higher meta levels: might we not have to think of books as extensions of our memory? Certainly I can “remember” a lot more nowadays, when I am able to store most of it in Wikipedia.

See also AI and copyright.

We seem to be racing down the slope leading to our never being able to trust anything we read online. Of course we’ve always known, haven’t we, that much of what we read in print may not be exactly true, but the move into the digital world has vastly increased the potential for this nonsense. Last year The Scholarly Kitchen, in the person of Rick Anderson, described a rather formalized scam:

  • The Company would supply me with scholarly books in my area of expertise.
  • I would write reviews of the books.
  • The company would provide me the name of a “co-author.” The Company’s preference is that this person be listed as the sole author of the review, but if I insisted I could be listed as the second (and corresponding) author.
  • For each ghostwritten review I succeeded at publishing under that person’s name in a Web of Science-indexed journal, I would be paid an honorarium of $800.

In a way it’s a bit surprising that “The Company” bothers with books and real reviews of the books (what, after all, is so wrong with purely fictitious reviews of purely fictitious works?) but I guess their business model is to bolster academics who need to show that they have managed to get published, even if it’s only as a book reviewer. A fee of $800 suggests there must be quite a bit of desperation out there.

I wonder if “The Company” is still making this offer to ghostwriters — one might think that an AI-powered chatbot might be able to write a nice fictitious review for a good deal less than $800.

See also Sock puppetry.

In a couple of weeks NJPAC (sounds like a conservative political organization, but actually means New Jersey Performing Arts Center — in downtown Newark) is putting on a local-boy Philip Roth festival entitled Philip Roth Unbound. There are readings, discussions, a comedy set, a theatrical performance, a bus tour. Most events cost $10 though the bigger theatrical ones will cost you $99.

“Philip Roth won the Pulitzer Prize for American Pastoral in 1997, the National Medal of Arts at the White House in 1998, and, in 2002, the highest award of the American Academy of Arts and Letters, the Gold Medal in Fiction. Roth received two National Book Awards and two National Book Critics Circle Awards, and won three PEN/Faulkner Awards. In 2005 The Plot Against America received the Society of American Historians’ Prize for ‘the outstanding historical novel on an American theme for 2003–2004,’ and was later adapted by David Simon for an HBO miniseries. In 2011 he was given the National Humanities Medal by President Barack Obama at the White House, and was later named the fourth recipient of the Man Booker International Prize. Roth died in 2018.”

March 19th would have been the writer’s ninetieth birthday. His complete works are available from the Library of America in ten volumes (nine volumes of novels and one of non-fiction). They also issued a memorial volume which makes it clear that one of Roth’s most striking features was the intense way in which he’d listen. No doubt a quality no great fiction-writer can be without.

Does this count as the world’s oldest autobiography?

Well, not really, since it’s not Gimil-Ninkarrak himself who’s written it. Amanda H. Podany has read lots of clay tablets inscribed with cuneiform writing to reconstruct an impressive amount about this barber who lived in Terqa, on the banks of the Euphrates, about 3,700 years ago. Her fascinating piece is published by Aeon.

Photo: Louvre Museum, Paris

Most of the records about Gimil-Ninkarrak are business records. Above is a contract of sale where he buys Guatum, daughter of some neighbors, presumably as a slave. There are other contracts which have been excavated where girls are bought as adoptive daughters, or as wives, but in this case there’s no indication that this was not a slavery deal.

In a shocking turn of events, books written by the popular language model, ChatGPT, have started appearing on Amazon. The news has left many in the literary world scratching their heads and wondering what this means for the future of writing.

ChatGPT, known for its vast knowledge and ability to generate coherent sentences, has apparently decided to try its hand at book writing. The books, which cover a wide range of topics, from science and technology to literature and history, are gaining popularity among readers who are curious to see what a machine can come up with.

Some have criticized the move as a gimmick, arguing that a machine cannot truly understand human emotions or experiences, and therefore cannot write meaningful stories. However, others have praised the books for their clear and concise writing style, as well as their ability to convey complex information in an easy-to-understand manner.

One reviewer wrote, “I was skeptical at first, but ChatGPT’s book on quantum physics was actually quite insightful. It presented the information in a way that was accessible to the layperson, without dumbing it down too much. I’m impressed!”

Another reviewer was less enthusiastic, stating, “While ChatGPT’s books may be technically accurate, they lack the heart and soul that comes from human experience. It’s like reading a textbook instead of a novel.”

Regardless of the controversy surrounding ChatGPT’s foray into book writing, there is no denying that it is a fascinating development in the world of artificial intelligence.

Written by ChatGPT; published at Fudzilla.

The preceding piece, written by ChatGPT, was not “commissioned” by Making Book but by Fudzilla and comes from their post entitled ChatGPT books flood Amazon written by Nick Farrell. (Link via LitHub.) From here on it’s me writing — hope you can take that on trust.

Mr Farrell detected over 300 ChatGPT-generated books on Amazon on 22 February, which doesn’t seem like a huge number, but is no doubt just the beginning of things to come. It would also be a collection of items where ChatGPT was given some credit — silent bot authorship would be harder (impossible?) to detect.

How bad is this news? Mr Farrell writes “While there is a ton of things wrong with this, the biggest problem is that ChatGPT learns how to write by scanning millions of pages of existing text. So, the software is just correcting other people’s books and plagiarising them.” Not sure I see it that way. After all a person with an eidetic memory would presumably be in an analogous position, yet nobody would claim that Sheldon Cooper’s ability to remember stuff constituted plagiarism. Academics refer to and build on colleagues’ work, producing texts which nobody criticizes as plagiaristic, because academics devote much care and attention to making sure they acknowledge every source (the more the merrier it often seems) in order to bolster every claim they make. [As Wikipedia might say here: “Reference required”.] When it comes to publishing, the key factor is credit, and I believe that an admission that an artificial intelligence program wrote this material would carry with it the implication that your book was included in what the bot used for training. Direct quotation would of course be an instance of copyright infringement, but thoughts and ideas are not copyrightable, nor are words and letters of the alphabet.

ChatGPT, and AI in general, isn’t intelligent in the way we normally think of intelligence. It doesn’t know anything: it has just memorized a whole lot of text and been taught how to express itself in smooth prose (or verse). It works by figuring out the probability that this or that string of words should/might follow on from some other group of words. It is for this reason that chatbots are just as proud to deliver up slickly expressed lies as they are to give you slickly expressed truth. For them, both are identical: probable/possible word sequences. But, if they are lucky and avoid clangers, bots like ChatGPT can do a job which it’s hard not to call excellent. The example above, while not telling you anything much, does appear utterly plausible.
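To make the "probability of the next word" idea concrete, here is a toy Python sketch. It is emphatically not how ChatGPT is built: real LLMs use neural networks working on fragments of words, whereas this merely counts which word follows which in a tiny made-up corpus. The function names and the sample sentence are my own inventions for illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def next_word_probabilities(follows, word):
    """Turn the raw follow-counts for `word` into probabilities."""
    counts = follows.get(word, Counter())
    total = sum(counts.values())
    if total == 0:
        return {}  # the word never appeared with a successor in training
    return {w: n / total for w, n in counts.items()}

# A made-up training text, purely for illustration.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigrams(corpus)
print(next_word_probabilities(model, "the"))
```

In this corpus "the" is followed by "cat" twice and by "mat" once, so the sketch rates "cat" as twice as likely. Notice that nothing in it asks whether the resulting sentence is true, only whether it is probable.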

The Fudzilla subtitle, “Authors that didn’t write books, for readers who can’t read”, is way over the top. People who can’t read aren’t the problem; it’s people who can and do that we need to worry about. If ChatGPT is listed as an author then I’d say there’s no real problem. Caveat emptor governs the sale: and lots of authors write worse than the ChatGPT paragraphs above. The sort of book that Ammaar Reshi published is surely fine ethically and practically: nobody’s being deceived, and nobody’s getting anything other than a perfectly respectable product. Some books might be argued to be of lesser value, but as long as their origin is clearly labelled, nobody suffers. The potential problem of course lies with the “unknown unknowns”. How are we to know this or that book is or isn’t written by AI rather than by a person who may be masquerading as the author? Now, to some extent I’m not sure this really matters either. Another romance by an author you’ve never heard of, a made-up nom-de-plume: OK, so what? Does it matter whether it’s a machine or a human being, if you enjoyed the book? The real trouble comes with a book pretending to be by a real author who actually had nothing to do with it. This may have more in common with a deepfake than with a copyright infringement, but I do think authors and publishers need to get down to doing something about protecting the integrity of an author’s work: maybe by just preempting the deepfake market by doing it yourself, as I suggested recently.