Roadmap
Features we're actively building or planning. This list grows as we ship. Have a feature request? Let us know.
Support for additional LLM providers as judge backends, depending on demand. Mix and match providers across eval runs to compare scoring consistency.
Define your own scoring dimensions with custom prompts and scoring guides. Build the metrics that matter for your specific use case rather than being limited to a fixed set.
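A custom metric like this might pair a name with a judge prompt and a rubric. The sketch below is purely illustrative; the `CustomMetric` type and its fields are assumptions, not a shipped API.

```python
from dataclasses import dataclass

@dataclass
class CustomMetric:
    name: str           # identifier used in reports
    prompt: str         # instructions given to the judge model
    scoring_guide: str  # rubric mapping behaviours to scores

# Hypothetical example: a brand-voice metric a team might define.
tone = CustomMetric(
    name="brand_tone",
    prompt="Rate how well the response matches our brand voice.",
    scoring_guide="1 = off-brand, 3 = neutral, 5 = perfectly on-brand",
)
```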
Choose which metrics matter most for your use case and weight them accordingly. A hallucination-sensitive workflow gets a different overall score than a latency-sensitive one.
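The idea above amounts to a weighted average over per-metric scores. A minimal sketch, assuming scores normalized to 0-1; the metric names and weights are made up for illustration:

```python
def overall_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-metric scores (each in [0, 1])."""
    total = sum(weights.values())
    return sum(scores[m] * weights[m] for m in weights) / total

# Same raw scores, different priorities.
scores = {"hallucination": 0.9, "latency": 0.4}

# A hallucination-sensitive workflow weights hallucination heavily;
# a latency-sensitive one inverts the weights.
strict = overall_score(scores, {"hallucination": 0.8, "latency": 0.2})  # 0.8
fast = overall_score(scores, {"hallucination": 0.2, "latency": 0.8})    # 0.5
```

The same evaluation run yields a different overall score under each weighting, which is exactly the point: the aggregate reflects what your workflow cares about.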
We're a small team and we build what we need first, but we genuinely listen to user feedback. If there's something you'd use, tell us about it.