LLM Eval — Tools & Articles — Practical Tools. Trusted Intelligence.

DeepEval

Pytest-style unit testing for LLM outputs, 50+ metrics, Apache 2.0

AI Directory →

LangWatch

OpenTelemetry-native LLMOps platform for tracing, evals and agents

Maxim AI

Agent evaluation, simulation and observability in one workflow

Opik

Open-source LLM tracing, evals and guardrails you can self-host

Promptfoo

MIT-licensed LLM eval and red-teaming CLI, now owned by OpenAI
