BLXBench makes AI model benchmarking accessible, interactive, and community-driven. Whether you're evaluating models for production use, researching AI performance, or just curious about different LLMs, BLXBench provides a streamlined way to run standardized tests and compare results.
This guide walks you through installing BLXBench, running your first benchmark, and sharing results to the public leaderboard, all in under five minutes.
Installation
BLXBench is distributed as an npm package, so you can install it globally with npm, pnpm, or Bun.
Via npm
npm install -g @bitslix/blxbench
Via pnpm
pnpm add -g @bitslix/blxbench
Via Bun
bun add -g @bitslix/blxbench
After installation, verify BLXBench is available:
blxbench --version
# Output: @bitslix/blxbench/1.0.0
First Run: Quick Start
The fastest way to experience BLXBench is through the interactive Terminal User Interface (TUI), which provides real-time feedback during benchmark runs.
1. Start BLXBench
blxbench
2. Follow the Setup Wizard
On first launch, BLXBench guides you through:
- API Key Configuration: Enter API keys for model providers you want to test (OpenAI, Anthropic, etc.)
- Benchmark Suite Selection: Choose between available test suites (currently v1–Nutrition and v2–Resilience)
- Concurrency Settings: Adjust how many simultaneous requests BLXBench makes
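The concurrency setting caps how many requests are in flight at the same time. As a generic illustration only (this is not BLXBench's implementation), the TypeScript sketch below runs a list of jobs with at most `limit` running at once. `runWithConcurrency` is a hypothetical helper name.

```typescript
// Illustrative only: a generic way to cap in-flight async jobs at `limit`.
// `runWithConcurrency` is a hypothetical helper, not part of BLXBench.
async function runWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array<R>(items.length);
  let next = 0; // index of the next item that has not been started yet

  // Each lane repeatedly claims the next unstarted item until none remain.
  // JavaScript runs this code on one thread, so `next++` cannot race.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }

  // Start at most `limit` lanes; results keep the original item order.
  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => lane());
  await Promise.all(lanes);
  return results;
}
```

A higher limit generally shortens a run but can make provider rate-limit errors more likely, which is why a tool would expose it as a setting.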
3. Run Your First Benchmark
Once setup is complete, type:
/run
BLXBench will:
- Download the selected test fixtures
- Execute prompts against your configured models
- Display live metrics, including:
  - Cumulative cost (updated in real time)
  - Pass rate
  - Average latency
  - Tokens per second
- Show a progress bar for each test category
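Live metrics like these can be kept as running totals that are updated each time a test finishes. The sketch below assumes a hypothetical per-test record (`TestResult`) and defines tokens per second as total output tokens divided by total request time. BLXBench's own field names and its exact speed definition may differ (for example, it might exclude time to first token).

```typescript
// ASSUMPTION: hypothetical shape of one completed test; BLXBench's data model may differ.
interface TestResult {
  passed: boolean;
  latencyMs: number;    // wall-clock time for the request
  outputTokens: number; // tokens generated by the model
  costUsd: number;      // API cost of this request
}

interface LiveMetrics {
  completed: number;
  cumulativeCostUsd: number;
  passRate: number;        // fraction in [0, 1]
  avgLatencyMs: number;
  tokensPerSecond: number; // ASSUMPTION: total output tokens / total request time
}

// Running totals let a dashboard refresh in constant time per finished test.
class MetricsAccumulator {
  private completed = 0;
  private passed = 0;
  private costUsd = 0;
  private latencyMsTotal = 0;
  private outputTokens = 0;

  record(r: TestResult): void {
    this.completed += 1;
    if (r.passed) this.passed += 1;
    this.costUsd += r.costUsd;
    this.latencyMsTotal += r.latencyMs;
    this.outputTokens += r.outputTokens;
  }

  snapshot(): LiveMetrics {
    const n = this.completed;
    const seconds = this.latencyMsTotal / 1000;
    return {
      completed: n,
      cumulativeCostUsd: this.costUsd,
      passRate: n === 0 ? 0 : this.passed / n,
      avgLatencyMs: n === 0 ? 0 : this.latencyMsTotal / n,
      tokensPerSecond: seconds === 0 ? 0 : this.outputTokens / seconds,
    };
  }
}
```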
Understanding Your Results
As tests complete, BLXBench provides immediate feedback in the TUI:
Live Dashboard
The main screen shows:
- Top Score: Current leading model in your run
- Executed Tests: Number of completed fixtures
- Est. API Spend: Running total cost
- Top Decode Speed: Fastest token generation rate
- Categories: Breakdown by test type (Coding, Reasoning, etc.)
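Each dashboard figure is an aggregate over the finished tests. Assuming a hypothetical `Outcome` record per test, the sketch below picks the top model by pass rate, reports the fastest per-test decode rate as the top decode speed, sums cost for the spend estimate, and counts passes per category. These definitions are illustrative guesses, not BLXBench's documented formulas.

```typescript
// ASSUMPTION: hypothetical per-test record; field names are not BLXBench's schema.
interface Outcome {
  model: string;
  category: string;        // e.g. "Coding", "Reasoning"
  passed: boolean;
  costUsd: number;
  tokensPerSecond: number; // decode speed measured for this test
}

interface DashboardSummary {
  executedTests: number;
  estApiSpendUsd: number;
  topScore: { model: string; passRate: number } | null;
  topDecodeSpeed: { model: string; tokensPerSecond: number } | null;
  categories: Record<string, { passed: number; total: number }>;
}

function summarize(outcomes: readonly Outcome[]): DashboardSummary {
  const perModel = new Map<string, { passed: number; total: number; bestTps: number }>();
  const categories: Record<string, { passed: number; total: number }> = {};
  let spend = 0;

  for (const o of outcomes) {
    spend += o.costUsd;

    // Per-model tallies for the top-score and top-speed picks.
    const m = perModel.get(o.model) ?? { passed: 0, total: 0, bestTps: 0 };
    m.total += 1;
    if (o.passed) m.passed += 1;
    m.bestTps = Math.max(m.bestTps, o.tokensPerSecond);
    perModel.set(o.model, m);

    // Per-category pass counts for the breakdown.
    const c = (categories[o.category] ??= { passed: 0, total: 0 });
    c.total += 1;
    if (o.passed) c.passed += 1;
  }

  let topScore: DashboardSummary["topScore"] = null;
  let topDecodeSpeed: DashboardSummary["topDecodeSpeed"] = null;
  for (const [model, m] of perModel) {
    const passRate = m.passed / m.total;
    // Strict ">" keeps the first model seen when two are tied.
    if (topScore === null || passRate > topScore.passRate) topScore = { model, passRate };
    if (topDecodeSpeed === null || m.bestTps > topDecodeSpeed.tokensPerSecond) {
      topDecodeSpeed = { model, tokensPerSecond: m.bestTps };
    }
  }

  return { executedTests: outcomes.length, estApiSpendUsd: spend, topScore, topDecodeSpeed, categories };
}
```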
Detailed Model View
Press Enter on any model in the leaderboard to see:
- Pass rate breakdown by category
- Latency and cost metrics
- Test suite participation history
- Option to save/load this model's results
Sharing to the Community Leaderboard
BLXBench can also publish your results to the public leaderboard at blxbench.com.
Publishing Results
After a run completes, use:
/publish
This will:
- Aggregate your results into a standardized format
- Prompt you to confirm publication
- Upload anonymized data to the Bitslix server
- Make your results visible on the public leaderboard within moments
What Gets Published
The public leaderboard shows:
- Model name and provider
- Pass rate and score
- Average latency and cost
- Test suite version used
- Timestamp of the run
No API keys, prompts, or raw responses are ever shared; only aggregated performance metrics are published.
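One straightforward way to guarantee that only aggregated metrics leave a run is an allowlist: build the public record field by field from aggregates instead of copying the local run. The sketch below assumes hypothetical `LocalRun` and `PublicEntry` types that mirror the fields listed above. It is not BLXBench's actual upload format, and treating "score" as the mean per-test score and "cost" as the mean per-test cost are assumptions.

```typescript
// HYPOTHETICAL types: BLXBench's real local and published formats may differ.
interface TestRecord {
  prompt: string;   // sensitive: never published
  response: string; // sensitive: never published
  passed: boolean;
  score: number;    // ASSUMPTION: per-test score in [0, 1]
  latencyMs: number;
  costUsd: number;
}

interface LocalRun {
  model: string;
  provider: string;
  suiteVersion: string;
  finishedAt: Date;
  apiKey: string;   // sensitive: never published
  results: TestRecord[];
}

interface PublicEntry {
  model: string;
  provider: string;
  passRate: number;
  score: number;      // ASSUMPTION: mean of per-test scores
  avgLatencyMs: number;
  avgCostUsd: number; // ASSUMPTION: mean cost per test
  suiteVersion: string;
  timestamp: string;  // ISO 8601
}

function toPublicEntry(run: LocalRun): PublicEntry {
  const n = run.results.length;
  if (n === 0) throw new Error("cannot publish a run with no results");

  let passed = 0;
  let scoreSum = 0;
  let latencySum = 0;
  let costSum = 0;
  for (const r of run.results) {
    if (r.passed) passed += 1;
    scoreSum += r.score;
    latencySum += r.latencyMs;
    costSum += r.costUsd;
  }

  // Allowlist: every published field is named explicitly. Nothing is spread
  // from `run`, so the API key, prompts, and responses cannot slip through.
  return {
    model: run.model,
    provider: run.provider,
    passRate: passed / n,
    score: scoreSum / n,
    avgLatencyMs: latencySum / n,
    avgCostUsd: costSum / n,
    suiteVersion: run.suiteVersion,
    timestamp: run.finishedAt.toISOString(),
  };
}
```

The allowlist is the safer design choice here: if a new sensitive field is later added to the local run, it stays private by default instead of being published by accident.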
Beyond the Basics
Headless Mode for CI/CD
For automated testing in pipelines:
blxbench --headless --output results.json
This is useful for nightly benchmarks or PR validation.
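In a pipeline you usually want the job to fail when results regress. The sketch below assumes a hypothetical results.json layout with a top-level `models` array of `{ model, passRate }` objects; the file BLXBench actually writes may be structured differently, so inspect it before relying on this. The hypothetical `gate` function reports every model whose pass rate falls below a threshold.

```typescript
// ASSUMPTION: this results.json shape is hypothetical; adjust the parsing to
// match the file your BLXBench version writes with --output.
interface ModelSummary {
  model: string;
  passRate: number; // fraction in [0, 1]
}

interface GateResult {
  ok: boolean;
  failures: string[];
}

function parseSummaries(json: string): ModelSummary[] {
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null) throw new Error("expected a JSON object");
  const models = (data as { models?: unknown }).models;
  if (!Array.isArray(models)) throw new Error("expected a top-level 'models' array");
  return models.map((m, i) => {
    const { model, passRate } = (m ?? {}) as Partial<ModelSummary>;
    if (typeof model !== "string" || typeof passRate !== "number") {
      throw new Error(`models[${i}] is missing 'model' or 'passRate'`);
    }
    return { model, passRate };
  });
}

// Fails the gate when any model's pass rate is below `minPassRate`.
function gate(json: string, minPassRate: number): GateResult {
  const failures = parseSummaries(json)
    .filter((s) => s.passRate < minPassRate)
    .map(
      (s) =>
        `${s.model}: pass rate ${(s.passRate * 100).toFixed(1)}% < ${(minPassRate * 100).toFixed(1)}%`,
    );
  return { ok: failures.length === 0, failures };
}
```

In CI, a small Node.js script could read the file with `fs.readFileSync("results.json", "utf8")`, pass the text to `gate`, print `failures`, and exit with a non-zero code when `ok` is false.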
Arcade Mode
Keep your mind sharp during long runs:
/arcade
Access minigames directly from the TUI without interrupting benchmarks.
Report Browser
Examine past runs locally:
/report list
/report open <run-id>
View detailed HTML reports of previous benchmark sessions.
Next Steps
- Try Different Models: Experiment with API keys from various providers to compare performance
- Join the Community: Share your results with /publish and see how your favorites stack up
- Explore Test Suites: The v2–Resilience suite adds stress tests for hallucination and recovery
- Provide Feedback: Join the Bitslix Discord to suggest features or report issues
BLXBench is designed to evolve with community input. Every published run helps build a more comprehensive picture of AI model performance across real-world workloads.
Ready to benchmark? Install BLXBench today and see how your preferred models perform: https://blxbench.com
