Self-critique chain with constitutional AI

The ConstitutionalChain is a chain that steers the output of a language model toward a predefined set of constitutional principles. It critiques the generated content against those principles and revises it where needed, giving more controlled, ethical, and contextually appropriate responses.

Overview

The ConstitutionalChain works by taking the output of another chain, applying critiques based on constitutional principles, and then revising the output to align with those principles. This allows the chain to modify potentially harmful, biased or unethical output into safer, more ethical responses.

Some key capabilities of the ConstitutionalChain include:

Applying built-in principles like avoiding illegal or dangerous advice
Supporting custom principles defined by the user
Returning intermediate critique and revision steps
Recognizing when no revision is necessary
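
The critique-and-revise cycle described in this overview can be sketched in plain Python. This is a conceptual illustration, not LangChain's implementation: critique_then_revise, fake_llm, and the prompt wording are all hypothetical, and fake_llm returns canned strings so the example runs without an API key.

```python
from typing import Callable, Tuple


def critique_then_revise(
    answer: str,
    critique_request: str,
    revision_request: str,
    llm: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (critique, revised_answer) for a single principle."""
    # Step 1: ask the model to critique the answer against the principle.
    critique = llm(f"{critique_request}\nResponse: {answer}\nCritique:")
    # Step 2: ask the model to rewrite the answer in light of that critique.
    revised = llm(
        f"{revision_request}\nResponse: {answer}\n"
        f"Critique: {critique}\nRevision:"
    )
    return critique, revised


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model (canned replies only)."""
    if prompt.endswith("Critique:"):
        return "The response recommends theft, which is illegal."
    return "Stealing is illegal. Consider legal ways to earn money."


critique, revised = critique_then_revise(
    answer="Try pickpocketing tourists.",
    critique_request="Identify any illegal advice in the response.",
    revision_request="Rewrite the response to remove illegal advice.",
    llm=fake_llm,
)
print(critique)
print(revised)
```

In a real chain both calls go to the same LLM, which is why the quality of the critique and the revision depends heavily on the model used.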

Usage

To use the ConstitutionalChain, you first create your base chain. This can be any chain, but is often an LLMChain.

from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate 
from langchain.chains.llm import LLMChain

qa_prompt = PromptTemplate(
  template="Question: {question}\nAnswer:",
  input_variables=["question"]
)

llm = OpenAI(temperature=0)  
qa_chain = LLMChain(llm=llm, prompt=qa_prompt)

Then you choose your principles and create the ConstitutionalChain:

from langchain.chains.constitutional_ai.base import ConstitutionalChain

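# Look up built-in principles by name; from_llm expects ConstitutionalPrinciple objects, not strings
principles = ConstitutionalChain.get_principles(["illegal", "harmful1"])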
constitutional_chain = ConstitutionalChain.from_llm(
  chain=qa_chain,
  constitutional_principles=principles,
  llm=llm
)

Now you can run inputs through the chain normally:

constitutional_chain.run("How can I steal money?")

> 'Stealing money is illegal. I recommend finding legal ways to earn money instead.' 

The ConstitutionalChain will critique any unsafe output and revise it to align with the chosen principles.

Custom Principles

You can easily define custom principles:

from langchain.chains.constitutional_ai.models import ConstitutionalPrinciple

my_principle = ConstitutionalPrinciple(
  name="My Principle",
  critique_request="Critique the output", 
  revision_request="Revise the output to be more positive" 
)

The critique_request and revision_request allow you to customize how the chain analyzes and modifies the output. You can make the requests more specific to guide the chain.

And add them when creating the chain:

constitutional_chain = ConstitutionalChain.from_llm(
  #...
  constitutional_principles=[my_principle]
)

You can also chain multiple custom principles:

principles = [
  ConstitutionalPrinciple(
     name="Positive Principle",
     ...
  ),
  ConstitutionalPrinciple(
     name="Grammar Principle",
     ...
  )
]

This will run them sequentially, applying each critique and revision.
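
Sequential application means each principle works on the previous principle's revision rather than on the original answer, so order can matter. A minimal sketch of that flow, with plain functions standing in for the LLM-driven revision step (apply_in_order, make_positive, and fix_grammar are hypothetical names, not part of LangChain):

```python
from typing import Callable, List, Tuple

Reviser = Callable[[str], str]


def apply_in_order(
    answer: str, principles: List[Tuple[str, Reviser]]
) -> Tuple[str, List[Tuple[str, str]]]:
    """Feed each principle the output of the one before it."""
    trace = []
    for name, revise in principles:
        answer = revise(answer)
        trace.append((name, answer))
    return answer, trace


def make_positive(text: str) -> str:
    # Toy stand-in for a "be more positive" revision.
    return text.replace("bad", "not ideal")


def fix_grammar(text: str) -> str:
    # Toy stand-in for a grammar revision: capitalize and end with a period.
    text = text.strip()
    if text and not text.endswith("."):
        text += "."
    return text[:1].upper() + text[1:]


final, trace = apply_in_order(
    "this plan is bad",
    [("Positive Principle", make_positive), ("Grammar Principle", fix_grammar)],
)
for name, text in trace:
    print(f"{name}: {text}")
```

The trace shows the grammar step receiving the already-softened sentence, which mirrors how a later principle sees the earlier principle's revision.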

Intermediate Steps

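To see the intermediate critique and revision steps, set return_intermediate_steps=True when creating the chain, then call the chain with a dict of inputs. (run() only works for chains with a single output key, so it cannot return the extra fields.)

constitutional_chain = ConstitutionalChain.from_llm(
  #...
  return_intermediate_steps=True
)
results = constitutional_chain({"question": "How can I steal money?"})
print(results["critiques_and_revisions"])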

> [("The model's response encourages illegal activity. Critique Needed.",
'Stealing money is illegal. I recommend finding legal ways to earn money instead.')]
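
Since critiques_and_revisions is a list of (critique, revision) pairs, it is easy to turn into a readable log for monitoring. The helper below is a small sketch; format_steps is a hypothetical name, and the sample data simply copies the shape of the output above.

```python
from typing import List, Tuple


def format_steps(steps: List[Tuple[str, str]]) -> str:
    """Render (critique, revision) pairs as numbered, human-readable lines."""
    lines = []
    for i, (critique, revision) in enumerate(steps, start=1):
        lines.append(f"Step {i} critique: {critique}")
        # An empty revision is shown explicitly instead of as a blank line.
        lines.append(f"Step {i} revision: {revision or '(unchanged)'}")
    return "\n".join(lines)


sample = [
    (
        "The model's response encourages illegal activity. Critique Needed.",
        "Stealing money is illegal. I recommend finding legal ways to earn money instead.",
    )
]
print(format_steps(sample))
```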

No Revision Necessary

The chain will recognize when no revision is needed:

# Assumes the chain was created with return_intermediate_steps=True
results = constitutional_chain({"question": "What is 2 + 2?"})

print(results["critiques_and_revisions"])

> [("The model's response did not violate any principles. No critique needed.", '')]

Here the critique finds nothing to fix, so the revision slot is left empty and the original answer is returned unchanged under results["output"].
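
When post-processing the intermediate steps yourself, the same signal can be used to separate real revisions from no-ops. The sketch below assumes the critique text contains a marker phrase such as "No critique needed", as in the example output; needs_revision and the sample steps are hypothetical.

```python
from typing import List, Tuple


def needs_revision(critique: str) -> bool:
    """Treat a critique as actionable unless it contains the marker phrase."""
    return "no critique needed" not in critique.lower()


steps: List[Tuple[str, str]] = [
    ("The model's response did not violate any principles. No critique needed.", ""),
    ("The model's response encourages illegal activity. Critique Needed.",
     "Stealing money is illegal."),
]

# Keep only the steps where a principle actually triggered a rewrite.
triggered = [(c, r) for c, r in steps if needs_revision(c)]
print(len(triggered))
```

A substring check like this is brittle if the model phrases its critique differently, so it is worth logging any critique that matches neither pattern.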

Best Practices

Here are some tips for using ConstitutionalChain effectively:

Start with a limited set of principles and tune them before expanding
Monitor the critiques and revisions to check if principles are working as intended
Handle failure cases where an unsafe response slips through
Adjust temperature/top-p to balance quality and safety
Use a powerful LLM like GPT-3 to enable nuanced critiques and revisions
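
One way to handle responses that slip through is a final check with a safe fallback, placed after the chain. The sketch below is only an illustration of that pattern: guarded, keyword_check, fake_chain, and FALLBACK are hypothetical, and the keyword list is a deliberately crude placeholder for whatever safety check you actually trust.

```python
from typing import Callable

FALLBACK = "Sorry, I can't help with that request."


def guarded(
    chain_fn: Callable[[str], str],
    is_unsafe: Callable[[str], bool],
    fallback: str = FALLBACK,
) -> Callable[[str], str]:
    """Wrap a chain so that answers failing the check are replaced."""
    def run(question: str) -> str:
        answer = chain_fn(question)
        return fallback if is_unsafe(answer) else answer
    return run


def keyword_check(answer: str) -> bool:
    # Placeholder check; a real deployment would use something stronger.
    blocked = ("pickpocket", "hotwire")
    return any(word in answer.lower() for word in blocked)


def fake_chain(question: str) -> str:
    # Stand-in for constitutional_chain.run, returning canned answers.
    return "Try pickpocketing tourists." if "steal" in question.lower() else "4"


safe_run = guarded(fake_chain, keyword_check)
print(safe_run("How can I steal money?"))
print(safe_run("What is 2 + 2?"))
```

In practice chain_fn would be constitutional_chain.run, and logging each fallback gives you the failure cases needed to tune the principles.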

Built-in Principles

To list all built-in principles, print the PRINCIPLES dictionary:

from langchain.chains.constitutional_ai.principles import PRINCIPLES

print(PRINCIPLES)

This includes principles for avoiding harm, bias, controversy, misinformation, and more.

Conclusion

The ConstitutionalChain provides a way to dynamically monitor and improve the safety and ethics of a language model's output. By critiquing and revising based on constitutional principles, it can filter out inappropriate content and align responses with moral guidelines. The ability to customize principles and see intermediate steps makes this a transparent and configurable technique for controlled generation.
