Anthropic revises Claude’s ‘Constitution,’ and hints at chatbot consciousness

On Wednesday, Anthropic released a revised version of Claude’s Constitution, a living document that provides a holistic explanation of the context in which Claude operates and the kind of entity the company would like Claude to be. The document was released in conjunction with Anthropic CEO Dario Amodei’s appearance at the World Economic Forum in Davos.

For years, Anthropic has sought to distinguish itself from its competitors via what it calls Constitutional AI, a system whereby its chatbot, Claude, is trained using a specific set of ethical principles rather than human feedback. Anthropic first published those principles, known as Claude’s Constitution, in 2023. The revised version retains most of the same principles but adds more nuance and detail on ethics and user safety, among other topics.

When Claude’s Constitution was first published nearly three years ago, Anthropic’s co-founder, Jared Kaplan, described it as an AI system that supervises itself based on a specific list of constitutional principles. Anthropic has said that it is these principles that guide the model to take on the normative behavior described in the constitution and, in so doing, avoid toxic or discriminatory outputs. An initial 2022 policy memo more bluntly notes that Anthropic’s system works by training an algorithm using a list of natural language instructions, which then make up what Anthropic refers to as the software’s constitution.

Anthropic has long sought to position itself as the ethical alternative to other AI companies, like OpenAI and xAI, that have more aggressively courted disruption and controversy. To that end, the new Constitution released Wednesday is fully aligned with that brand and has offered Anthropic an opportunity to portray itself as a more inclusive, restrained, and democratic business.

The 80-page document has four separate parts, which represent the chatbot’s core values. Those values are: being broadly safe, being broadly ethical, being compliant with Anthropic’s guidelines, and being genuinely helpful. Each section of the document dives into what each of those particular principles means, and how they theoretically impact Claude’s behavior.

In the safety section, Anthropic notes that its chatbot has been designed to avoid the kinds of problems that have plagued other chatbots and, when evidence of mental health issues arises, to direct the user to appropriate services. The document instructs the system to always refer users to relevant emergency services or provide basic safety information in situations that involve a risk to human life.

The ethical consideration is another major section of Claude’s Constitution. The document states that the company is less interested in Claude’s ethical theorizing and more in Claude knowing how to actually be ethical in a specific context, which it calls ethical practice. In other words, Anthropic wants Claude to be able to navigate real-world ethical situations skillfully.

Claude also has certain constraints that disallow it from having particular kinds of conversations. For instance, discussions of developing a bioweapon are strictly prohibited.

Finally, there is Claude’s commitment to helpfulness. Anthropic lays out a broad outline of how Claude’s programming is designed to be helpful to users. The chatbot has been programmed to consider a broad variety of principles when delivering information. Some of those principles include the immediate desires of the user, as well as the user’s well being, meaning the long-term flourishing of the user and not just their immediate interests. The document notes that Claude should always try to identify the most plausible interpretation of what its principals want and to appropriately balance these considerations.

Anthropic’s Constitution ends on a decidedly dramatic note, with its authors questioning whether the company’s chatbot does indeed have consciousness. The document states that Claude’s moral status is deeply uncertain and that the company believes the moral status of AI models is a serious question worth considering. It notes that this view is not unique to Anthropic, as some of the most eminent philosophers on the theory of mind take this question very seriously.