The dictionary sues OpenAI

Encyclopedia Britannica and Merriam-Webster have filed a lawsuit against OpenAI. The publisher alleges that the AI company has committed massive copyright infringement. Britannica, which owns Merriam-Webster, holds the copyright to nearly 100,000 online articles. The lawsuit claims these articles were scraped and used to train OpenAI’s large language models without permission.

Britannica further accuses OpenAI of violating copyright law in two specific ways: first, when its models generate outputs that contain full or partial verbatim reproductions of Britannica’s content, and second, when the company uses Britannica’s articles within ChatGPT’s retrieval-augmented generation workflow. This workflow allows the language model to scan the web or other databases for updated information when answering a query.

The publisher also alleges that OpenAI violates the Lanham Act, a trademark statute. This claim centers on instances where ChatGPT generates hallucinations and falsely attributes them to Britannica. The lawsuit states that ChatGPT starves web publishers of revenue by generating responses that substitute for and directly compete with their content. It also argues that these hallucinations jeopardize the public’s access to high-quality and trustworthy online information.

Britannica joins a growing list of publishers and writers pursuing legal action against OpenAI over copyright issues. Other plaintiffs include The New York Times, Ziff Davis, and more than a dozen news organizations across the United States and Canada. These include the Chicago Tribune, the Denver Post, the Sun-Sentinel, the Toronto Star, and the Canadian Broadcasting Corporation. A similar lawsuit filed by Britannica against the AI company Perplexity remains pending.

There is not yet a strong legal precedent establishing whether using copyrighted content to train a large language model constitutes infringement. However, in one related case involving Anthropic, federal judge William Alsup ruled that using content as training data can be a transformative and legal use. Yet in that same case, Alsup found that Anthropic did violate the law by illegally downloading millions of books rather than paying for them. This resulted in a $1.5 billion class-action settlement for the affected writers.

OpenAI did not respond to a request for comment prior to publication.