DeepEval

DeepEval provides a Pythonic way to run offline evaluations on your LLM pipelines so you can ship them to production with confidence.
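As a rough illustration, here is a minimal sketch of what a pytest-style offline evaluation could look like. The specific names (`LLMTestCase`, `AnswerRelevancyMetric`, `assert_test`) and the `threshold` parameter are assumptions based on typical DeepEval usage rather than guarantees; check the current docs for the exact API.

```python
# Minimal sketch of a DeepEval-style test (names assumed, not authoritative).
# LLM-based metrics typically need an evaluation model, e.g. OPENAI_API_KEY set.
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

def test_chatbot_answer():
    # Wrap one input/output pair from your pipeline in a test case.
    test_case = LLMTestCase(
        input="What is your return policy?",
        actual_output="You can return any item within 30 days for a full refund.",
    )
    # Score how relevant the answer is to the question; fail below 0.7.
    metric = AnswerRelevancyMetric(threshold=0.7)
    assert_test(test_case, [metric])
```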

Why we wrote this library

As LLMs and frameworks like LangChain and LlamaIndex grew prominent, we found that once these pipelines were built, it became really hard to keep iterating on them. Many engineers wanted to use LangChain as a quick start, then add guardrails or swap the underlying LLM for one like Llama 2.

The testing ecosystem for LLMs is still nascent. Most teams still rely on traditional metrics, but queries and outputs in LLM applications are so long-tailed that testing is genuinely hard. There is therefore a need for ML-based testing frameworks that help put these applications into production with more confidence.

Our mission is to accelerate the development of AI agents and LLM applications by providing sound evaluation infrastructure that enables faster iteration.

Join our Discord

We are continuing to evolve our evaluation platform and welcome discussion on our Discord: https://discord.gg/a3K9c8GRGt