BixArena Leaderboard Snapshot Automation¶
Overview¶
The BixArena leaderboard is updated daily through an automated snapshot generation pipeline. Each day, a scheduled task fetches battle evaluation data, computes model rankings using the Bradley-Terry algorithm, and publishes the results as a new public leaderboard snapshot.
Architecture¶
- EventBridge triggers the
bixarena-workerFargate task daily at 10:00 AM UTC - The task runs in a private VPC subnet with direct access to the PostgreSQL database
- The handler generates a leaderboard snapshot and auto-publishes it
- Structured JSON logs are emitted to CloudWatch for observability
flowchart LR
eventbridge["EventBridge<br>(cron: daily 10 AM UTC)"]
worker["bixarena-worker<br>(Fargate Task)"]
rds["PostgreSQL<br>(RDS)"]
cloudwatch["CloudWatch Logs"]
eventbridge -->|triggers| worker
worker -->|reads/writes| rds
worker -->|structured JSON logs| cloudwatch
Components¶
Shared Library (bixarena-leaderboard)¶
The core ranking and snapshot generation logic lives in a standalone Python library at
libs/bixarena/leaderboard/python/. It provides:
- Bradley-Terry ranking — computes model win rates with bootstrap confidence intervals
- Snapshot generation — fetches battle evaluations, ranks models, inserts snapshot and entries,
and sets visibility to
public
Worker Container (bixarena-worker)¶
A Python application at apps/bixarena/worker/ that runs as a one-shot container task. On each
invocation, the handler:
- Reads configuration from environment variables
- Calls
generate_snapshot()from the shared library - Emits structured JSON logs with a
correlation_idfor CloudWatch tracing
CDK Infrastructure (WorkerStack)¶
An ECS Scheduled Fargate Task deployed via AWS CDK. Key resources:
- EventBridge rule — triggers the task daily at 10:00 AM UTC
- Fargate task — 1 vCPU, 2 GB memory, runs the
bixarena-workercontainer image - Secrets Manager — database credentials are injected securely at task start by the ECS agent
Configuration¶
The following environment variables control snapshot generation. They are set in CDK and require a redeployment to change.
| Variable | Default | Description |
|---|---|---|
LEADERBOARD_SLUG |
overall |
Slug of the leaderboard to snapshot. Must match a slug that exists in the database — validated at runtime. |
NUM_BOOTSTRAP |
1000 |
Number of bootstrap iterations for Bradley-Terry confidence interval computation. |
MIN_EVALS |
10 |
Minimum number of battle evaluations a model must have to be included in the snapshot. |
SIGNIFICANT |
false |
If true, only include models with statistically significant rankings. |
POSTGRES_HOST |
(from RDS) | Injected automatically by CDK from the RDS endpoint. |
POSTGRES_PORT |
(from RDS) | Injected automatically by CDK from the RDS endpoint. |
POSTGRES_DB |
bixarena |
Database name. |
POSTGRES_USER |
(Secrets Manager) | Injected securely at task start by the ECS agent. |
POSTGRES_PASSWORD |
(Secrets Manager) | Injected securely at task start by the ECS agent. |
To change a value: update the relevant app.py for the target environment and redeploy:
nx run bixarena-infra-cdk:deploy:[dev|stage|prod]
Manual Trigger¶
Developers can generate a snapshot on demand by opening an SSH tunnel to the target RDS instance and invoking the handler locally:
# 1. Open SSH tunnel to RDS
nx run bixarena-infra-cdk:start-db-tunnel [dev|stage|prod]
# 2. Configure DB credentials in apps/bixarena/worker/.env
# 3. Run the handler locally
nx run bixarena-worker:invoke
Deferred Features¶
The following features were planned in RFC-0001 but deferred to a future admin dashboard release:
- On-demand trigger — API endpoint to manually trigger snapshot generation (ADR-0002)
- Notifications — SNS/Slack alerts on task success or failure (ADR-0002)