Creative Commons announces tentative support for AI ‘pay-to-crawl’ systems

Earlier this year, the nonprofit Creative Commons announced a framework for an open AI ecosystem. Now, the organization has come out in favor of “pay-to-crawl” technology: a proposed system that would automatically compensate websites when their content is accessed by machines such as AI web crawlers.

Creative Commons is best known for spearheading the licensing movement that allows creators to share their works while retaining copyright. In July, the organization announced a plan to provide a legal and technical framework for dataset sharing between the companies that control data and the AI providers that want to train on it. On pay-to-crawl, the nonprofit says it is “cautiously supportive” of the idea.

The organization said that, implemented responsibly, pay-to-crawl could give websites a way to sustain the creation and sharing of their content. It could help manage substitutive uses and keep content publicly accessible where it might otherwise go unshared or disappear behind even more restrictive paywalls.

Spearheaded by companies like Cloudflare, pay-to-crawl would charge AI bots each time they scrape a site to collect its content for model training and updates. In the past, websites freely allowed web crawlers to index their content for inclusion in search engines like Google. They benefited from this arrangement by seeing their sites listed in search results, which drove visitors and clicks.
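
Cloudflare’s proposal, for instance, repurposes the long-dormant HTTP 402 Payment Required status code. Below is a minimal sketch of how such a negotiation might look at the HTTP layer; the header names, prices, and bot names are hypothetical placeholders, not any published specification.

```python
# Minimal sketch of a pay-to-crawl negotiation at the HTTP layer.
# Header names, prices, and bot names are illustrative, not a real spec.

CRAWLER_PRICES = {        # hypothetical per-request prices, in USD
    "ExampleAIBot": 0.002,
    "AnotherCrawler": 0.001,
}

def handle_crawler_request(user_agent: str, offered_price: float | None):
    """Decide how to answer a bot's request for a page.

    Returns (status_code, headers), in the spirit of proposals that
    repurpose HTTP 402 Payment Required for machine access.
    """
    price = CRAWLER_PRICES.get(user_agent)
    if price is None:
        return 403, {}  # unknown bot: refuse outright
    if offered_price is None or offered_price < price:
        # Quote a price; the crawler may retry with payment attached.
        return 402, {"x-crawl-price": f"{price:.4f}"}
    return 200, {"x-crawl-charged": f"{price:.4f}"}
```

In this flow, a crawler that receives a 402 response with a quoted price can retry the same request with payment attached, while ordinary human visitors are never charged.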

With AI technology, however, the dynamic has shifted. After a consumer gets an answer via an AI chatbot, they are unlikely to click through to the source. This shift has already been devastating for publishers by killing search traffic, and it shows no sign of letting up.

A pay-to-crawl system could help publishers recoup some of the revenue AI has cost them. It could also work better for smaller web publishers that lack the leverage to negotiate one-off content deals with AI providers. Major deals have already been struck: OpenAI with Condé Nast, Axel Springer, and others; Perplexity with Gannett; Amazon with The New York Times; and Meta with various media publishers.

Creative Commons offered several caveats to its support for pay-to-crawl, noting that such systems could concentrate power on the web. They could also block access to content for researchers, nonprofits, cultural heritage institutions, educators, and other actors working in the public interest.

It suggested a series of principles for responsible pay-to-crawl. These include not making pay-to-crawl a default setting for all websites and avoiding blanket rules for the web. In addition, it said that pay-to-crawl systems should allow for throttling, not just blocking, and should preserve public interest access. They should also be open, interoperable, and built with standardized components.
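
To make the throttling principle concrete, here is a small sketch of a rate limiter that slows commercial crawlers without blocking them outright, while exempting public-interest agents entirely. The agent names, limits, and allowlist are hypothetical illustrations, not part of any Creative Commons proposal.

```python
# Sketch: throttle commercial crawlers instead of blocking them, and
# exempt public-interest agents. All names and limits are hypothetical.
import time
from collections import defaultdict, deque

PUBLIC_INTEREST_AGENTS = {"ResearchBot", "ArchiveBot"}  # hypothetical allowlist
RATE_LIMIT = 10        # max requests per window for everyone else
WINDOW_SECONDS = 60.0

_request_log: dict[str, deque] = defaultdict(deque)

def allow_request(user_agent: str, now: float | None = None) -> bool:
    """Return True if this crawler request should be served right now."""
    if user_agent in PUBLIC_INTEREST_AGENTS:
        return True  # preserve public-interest access unconditionally
    now = time.monotonic() if now is None else now
    log = _request_log[user_agent]
    while log and now - log[0] > WINDOW_SECONDS:
        log.popleft()  # drop requests that fell outside the window
    if len(log) >= RATE_LIMIT:
        return False  # throttled for now, not permanently blocked
    log.append(now)
    return True
```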

Cloudflare is not the only company investing in the pay-to-crawl space. Microsoft is also building an AI marketplace for publishers, and smaller startups like ProRata.ai and TollBit are developing similar systems.

Another group, the RSL Collective, has announced its own specification for a standard called Really Simple Licensing, which dictates which parts of a website crawlers may access but stops short of actually blocking them. Cloudflare, Akamai, and Fastly have since adopted the standard, which is backed by Yahoo, Ziff Davis, O’Reilly Media, and others.

Creative Commons was also among those that announced support for the Really Simple Licensing standard, alongside its broader CC Signals project to develop technology and tools for the AI era.