Roadmap
Features we're actively building or planning. This list grows as we ship. Have a feature request? Let us know.
Support for additional LLM providers as judge backends, depending on demand. Mix and match providers across eval runs to compare scoring consistency.
Define your own scoring dimensions with custom prompts and scoring guides. Build the metrics that matter for your specific use case rather than being limited to a fixed set.
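A custom metric like this might pair a name with a judge prompt and a rubric. The sketch below is purely illustrative; the `CustomMetric` type and its fields are assumptions, not a shipped API.

```python
from dataclasses import dataclass

@dataclass
class CustomMetric:
    name: str           # identifier used in reports
    prompt: str         # instructions given to the judge model
    scoring_guide: str  # rubric mapping behaviours to scores

# Hypothetical example: a brand-voice metric a team might define.
tone = CustomMetric(
    name="brand_tone",
    prompt="Rate how well the response matches our brand voice.",
    scoring_guide="1 = off-brand, 3 = neutral, 5 = perfectly on-brand",
)
```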
Choose which metrics matter most for your use case and weight them accordingly. A hallucination-sensitive workflow gets a different overall score than a latency-sensitive one.
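The idea above amounts to a weighted average over per-metric scores. A minimal sketch, assuming scores normalized to 0-1; the metric names and weights are made up for illustration:

```python
def overall_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-metric scores (each in [0, 1])."""
    total = sum(weights.values())
    return sum(scores[m] * weights[m] for m in weights) / total

# Same raw scores, different priorities.
scores = {"hallucination": 0.9, "latency": 0.4}

# A hallucination-sensitive workflow weights hallucination heavily;
# a latency-sensitive one inverts the weights.
strict = overall_score(scores, {"hallucination": 0.8, "latency": 0.2})  # 0.8
fast = overall_score(scores, {"hallucination": 0.2, "latency": 0.8})    # 0.5
```

The same evaluation run yields a different overall score under each weighting, which is exactly the point: the aggregate reflects what your workflow cares about.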
We're a small team and we build what we need first, but we genuinely listen to user feedback. If there's something you'd use, tell us about it.