
terminal_bench_llm_evaluation_quickstart_with_dummy_model.py

python

This quickstart demonstrates how to evaluate an LLM's performance on terminal-bench tasks using a dummy model function.

Source: mshumer/terminal-bench
from terminal_bench import evaluate_model

# Define your model function
def my_model(prompt):
    # This is where you would call your LLM (e.g., OpenAI, Anthropic, etc.)
    # For this example, we return a dummy command.
    return "ls -l"

# Run the evaluation
results = evaluate_model(my_model)

# Print the results
print(f"Score: {results['score']}%")
print(f"Passed: {results['passed']} / {results['total']}")
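A fixed dummy command will fail most tasks, since the benchmark passes each task's instructions in as the prompt. As a step up from the dummy model, the function can inspect the prompt and choose a command. This is a minimal illustrative sketch (independent of terminal_bench; the keyword rules are invented for demonstration, not part of the benchmark):

```python
def keyword_model(prompt):
    # Map keywords found in the prompt to shell commands.
    rules = {
        "disk": "df -h",
        "processes": "ps aux",
        "network": "ss -tulpn",
    }
    for keyword, command in rules.items():
        if keyword in prompt.lower():
            return command
    # Fall back to the dummy command when nothing matches.
    return "ls -l"
```

In a real evaluation you would replace this lookup with an API call to your LLM and pass `keyword_model` to `evaluate_model` in place of `my_model`.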