The best guide to spotting AI writing comes from Wikipedia

We have all felt the creeping suspicion that something we are reading was written by a large language model, but that suspicion is remarkably difficult to pin down. For a few months last year, many people became convinced that specific words could give models away, but the evidence is thin. As models have grown more sophisticated, the telltale words have become harder to trace.

It turns out the editors at Wikipedia have become quite skilled at flagging AI-written prose. Their public guide to the signs of AI writing is the best resource for confirming whether your suspicions are correct. Since 2023, Wikipedia editors have been working to manage AI submissions through an initiative called Project AI Cleanup. With millions of edits arriving each day, they have ample material to study. In classic Wikipedia style, the group has produced a field guide that is both detailed and evidence-based.

The guide confirms what many already suspect: automated detection tools are essentially useless. Instead, the guide focuses on writing habits and phrases that are rare on Wikipedia but common across the wider internet, and therefore common in a model’s training data. According to the guide, AI submissions often spend a lot of time emphasizing why a subject is important, usually with generic phrases like “a pivotal moment” or “a broader movement.” AI models also tend to detail minor media appearances to make a subject seem notable, which is more typical of a personal biography than an independent source.

The guide points out a particular quirk involving trailing clauses that make hazy claims of importance. Models will describe an event as “emphasizing the significance” of something or “reflecting the continued relevance” of a general idea. The pattern is a bit hard to define, but once you recognize it, you start to see it everywhere.

There is also a tendency toward vague marketing language, which is extremely common online. Landscapes are always scenic, views are always breathtaking, and everything is described as clean and modern. As the editors note, it often sounds like the transcript of a television commercial.

The guide is worth reading in its entirety. Before encountering it, I would have said that AI prose evolves too quickly to define. However, the habits identified are deeply embedded in how AI models are trained and used. They can be disguised, but it will be difficult to eliminate them completely. If the general public becomes more skilled at identifying AI prose, it could lead to all sorts of interesting consequences.