inspect_ai_theory_of_mind_eval_with_chain_of_thought.py

python

A basic evaluation script that uses a built-in dataset and solver to assess a

inspect.ai-safety-institute.org.uk

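from inspect_ai import Task, eval, task
from inspect_ai.dataset import example_dataset
from inspect_ai.scorer import model_graded_fact
from inspect_ai.solver import chain_of_thought, generate, self_critique


# define the task: built-in dataset, solver chain, and a model-graded scorer
@task
def theory_of_mind():
    return Task(
        dataset=example_dataset("theory_of_mind"),
        # chain_of_thought() only rewrites the prompt; generate() calls the model.
        # Older inspect_ai releases name this argument `plan` instead of `solver`.
        solver=[chain_of_thought(), generate(), self_critique()],
        scorer=model_graded_fact(),
    )


# run the evaluation (needs OPENAI_API_KEY set); eval() returns a list of logs
logs = eval(theory_of_mind(), model="openai/gpt-4o")

# print the status and aggregate results of each run
for log in logs:
    print(log.status, log.results)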