
Prompt Tune

Iterate on a Claude/OpenAI prompt scientifically — eval cases, A/B variants, measurable improvement.

/prompt-tune

Install this skill

  1. Copy the SKILL.md content
  2. Create a folder for the skill:
    # Mac/Linux
    mkdir -p ~/.claude/skills/prompt-tune
    
    # Windows (PowerShell)
    mkdir $env:USERPROFILE\.claude\skills\prompt-tune
  3. Save the content as ~/.claude/skills/prompt-tune/SKILL.md
  4. Restart Claude Code (or open a new session)
  5. Type /prompt-tune to invoke it
Tags: llm · prompt-engineering · evals

/prompt-tune

Improve a prompt scientifically, not by vibes.

Usage

/prompt-tune src/lib/prompts/extractor.ts
/prompt-tune --interactive   # paste a prompt + eval cases inline

Workflow

1. Define the eval set

Either provide:

  • 5-15 input/expected-output pairs (JSONL; sample below)
  • A scoring criterion ("does the output contain a valid JSON object with field X?")
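
For example, a small eval set in JSONL, one case per line. The field names and values here are purely illustrative, not a required schema:

    {"input": "Invoice #4821 from Acme Corp, total $1,200.00", "expected": {"vendor": "Acme Corp", "total": 1200}}
    {"input": "Receipt: Globex, amount due 89.99 USD", "expected": {"vendor": "Globex", "total": 89.99}}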

2. Baseline run

  • Run the current prompt against all cases
  • Record pass/fail + cost per call, as in the sketch below
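
A minimal sketch of a baseline run in TypeScript using the official @anthropic-ai/sdk. The check scorer and the RATES per-token prices are stand-ins added to make the example self-contained; substitute your own scoring criterion and current pricing:

    import Anthropic from "@anthropic-ai/sdk";

    type EvalCase = { input: string; expected: unknown };

    const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
    const RATES = { in: 1e-6, out: 5e-6 }; // placeholder $/token rates, not real pricing

    // Naive scorer: pass when the output parses to JSON deep-equal to the expected value.
    function check(output: string, expected: unknown): boolean {
      try {
        return JSON.stringify(JSON.parse(output)) === JSON.stringify(expected);
      } catch {
        return false;
      }
    }

    async function runCase(systemPrompt: string, c: EvalCase) {
      const t0 = Date.now();
      const msg = await client.messages.create({
        model: "claude-haiku-4-5-20251001",
        max_tokens: 1024,
        system: systemPrompt,
        messages: [{ role: "user", content: c.input }],
      });
      const text = msg.content[0].type === "text" ? msg.content[0].text : "";
      return {
        pass: check(text, c.expected),
        cost: msg.usage.input_tokens * RATES.in + msg.usage.output_tokens * RATES.out,
        latencyMs: Date.now() - t0,
      };
    }

Running every eval case through runCase and tallying the results yields the baseline numbers used in the comparison at step 4.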

3. Generate variants

Create 3-5 modified prompts (see the sketch after this list) using:

  • Better instruction phrasing
  • Few-shot examples
  • Chain-of-thought prefix
  • Output format constraints
  • System role tightening
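
As an illustration of two of these tactics (the wording is invented, not the skill's actual templates), variants can be derived mechanically from the baseline prompt; EvalCase matches the type in the baseline sketch:

    type EvalCase = { input: string; expected: unknown };

    // Few-shot: append a couple of solved cases after the baseline instructions.
    const fewShot = (base: string, examples: EvalCase[]) =>
      base +
      "\n\nExamples:\n" +
      examples
        .map((e) => `Input: ${e.input}\nOutput: ${JSON.stringify(e.expected)}`)
        .join("\n\n");

    // Chain-of-thought: prefix an explicit reasoning instruction.
    const chainOfThought = (base: string) =>
      "Think step by step before giving your final answer.\n\n" + base;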

4. Compare

Side-by-side: pass rate, cost, latency for each variant.
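
Collapsing the per-case results from step 2 into one summary per variant might look like this (summarize is a hypothetical helper, not part of the skill):

    type Result = { pass: boolean; cost: number; latencyMs: number };

    // Collapse per-case results into the numbers shown in the comparison.
    const summarize = (name: string, results: Result[]) => ({
      name,
      passRate: results.filter((r) => r.pass).length / results.length,
      costPerCall: results.reduce((s, r) => s + r.cost, 0) / results.length,
      avgMs: results.reduce((s, r) => s + r.latencyMs, 0) / results.length,
    });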

5. Recommend

Surface the variant with the best pass-rate-to-cost ratio. Show the diff vs baseline.
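
One way to express "best pass-rate-to-cost ratio" as code, a simple heuristic assuming the step 4 summaries are available; ties on the ratio fall back to latency:

    type Summary = { name: string; passRate: number; costPerCall: number; avgMs: number };

    // Highest pass rate per dollar wins; lower average latency breaks ties.
    const recommend = (variants: Summary[]) =>
      [...variants].sort(
        (a, b) =>
          b.passRate / b.costPerCall - a.passRate / a.costPerCall ||
          a.avgMs - b.avgMs
      )[0];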

Output

Baseline: 7/12 passing · $0.012/call · avg 1.8s
Variant 2 (added few-shot): 11/12 passing · $0.014/call · avg 2.1s ✓ best
Variant 3 (stricter system): 9/12 passing · $0.011/call · avg 1.6s

The skill then writes the winning variant back to the file with a // tuned 2026-04-25 comment.

Rules

  • Use claude-haiku-4-5-20251001 for tuning runs unless quality demands Sonnet
  • Cap evaluation at 50 runs per session to control cost (see the sketch after this list)
  • Save the eval set as <prompt-file>.eval.jsonl for future regression checks
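
A sketch of how the 50-run cap could be enforced; the budgeted wrapper and its error message are hypothetical, not part of the skill:

    const MAX_RUNS = 50;
    let runsThisSession = 0;

    // Wrap every model call so one session can never exceed the eval budget.
    // Usage: await budgeted(() => runCase(prompt, c));
    async function budgeted<T>(call: () => Promise<T>): Promise<T> {
      if (++runsThisSession > MAX_RUNS) {
        throw new Error(`Eval budget of ${MAX_RUNS} runs exhausted for this session`);
      }
      return call();
    }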