About

Built for devs working with Foundation Models

With Apple Intelligence, you can add AI features directly to your macOS app without calling cloud APIs. But when you're tuning prompts and generation settings, it's hard to tell whether a change actually improves the output. LLM Eval Suite uses OpenAI, Claude, or MiniMax as judges, so you can run structured evaluations and see real numbers instead of guessing.

We built this for ourselves. For our voice transcript summarization feature, we needed a reliable way to test prompt changes and measure results over time without leaving our workflow on macOS.

How It Works

Run

Execute prompts against Foundation Models on macOS.

Score

Judge outputs via OpenAI, Claude, or MiniMax.

Iterate

Review leaderboards and refine prompts.
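
To make the Run, Score, Iterate loop concrete, here is a rough sketch in Python. It is not LLM Eval Suite's actual code. Every name in it (evaluate, LeaderboardRow, fake_generate, fake_judge) is hypothetical, and the two stub functions stand in for the on-device model and the remote judge so the example runs without any API keys.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass
class LeaderboardRow:
    prompt_name: str
    mean_score: float
    runs: int


def evaluate(
    prompts: dict[str, str],
    inputs: list[str],
    generate: Callable[[str], str],
    judge: Callable[[str, str], float],
) -> list[LeaderboardRow]:
    """Run every prompt variant on every input, score it, and rank the variants."""
    rows = []
    for name, template in prompts.items():
        scores = []
        for text in inputs:
            output = generate(template.format(input=text))  # Run
            scores.append(judge(text, output))              # Score
        rows.append(LeaderboardRow(name, mean(scores), len(scores)))
    # Iterate: the best-scoring prompt variant comes first.
    return sorted(rows, key=lambda r: r.mean_score, reverse=True)


def fake_generate(prompt: str) -> str:
    """Stub for the model call (assumption): a 'five words' prompt truncates the text."""
    text = prompt.split(":", 1)[1].strip()
    if prompt.startswith("Summarize in five words"):
        return " ".join(text.split()[:5])
    return text


def fake_judge(source: str, output: str) -> float:
    """Stub for the judge call (assumption): rewards shorter outputs on a 0-10 scale."""
    ratio = len(output.split()) / len(source.split())
    return round(10 * (1 - ratio), 2)


prompts = {
    "verbose": "Summarize: {input}",
    "terse": "Summarize in five words: {input}",
}
transcripts = [
    "we met on monday to plan the launch of the app",
    "the team agreed to ship the beta build next friday morning",
]

board = evaluate(prompts, transcripts, fake_generate, fake_judge)
for row in board:
    print(f"{row.prompt_name}: {row.mean_score:.2f} over {row.runs} runs")
```

In a real setup, generate would call the on-device model and judge would call whichever judge provider you picked. The ranking step stays the same, which is why the sketch passes both in as plain functions.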

About DreamLab Solutions LLC

DreamLab Solutions LLC is a small, independent software studio. We make tools we actually need ourselves. Find out more at www.dreamlabsolutionsllc.com.