The Tandemn server should be treated as shared infrastructure. It receives job requests from users and coordinates execution across accelerated resources.

Deployment checklist

1. Choose a server host: pick a machine in your AWS environment that CLI users and EC2 inference replicas can reach.
2. Prepare AWS access: configure AWS credentials, IAM permissions, S3 bucket access, and quota visibility.
3. Clone the server repository: fetch the Tandemn server repository and its submodules.
4. Run setup: install dependencies, verify AWS and Redis connectivity, and create .env.
5. Start Redis: Redis is required for chunked multi-replica jobs.
6. Start the control plane: run python server.py --url <public-or-private-url>.
7. Verify from a client: run tandemn check from a separate machine using the public server URL.

Commands

```shell
# Clone the server repository with its submodules
git clone --recurse-submodules https://github.com/Tandemn-Labs/Tandemn-server.git
cd Tandemn-server

# Install dependencies, verify AWS and Redis connectivity, and create .env
bash setup.sh

# Start Redis (required for chunked multi-replica jobs)
docker run -d -p 6379:6379 redis

# Start the control plane with the URL clients will use to reach it
python server.py --url https://your-public-url.example.com
```
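
Once the control plane is running, it helps to confirm reachability before handing the URL to users. The sketch below polls the server from a client machine until it responds; the /health path is an assumption, not a documented Tandemn endpoint, so substitute whatever route your server version exposes (or simply run tandemn check from the client).

```shell
# Poll the control plane until it answers, then exit 0.
# Assumption: the server exposes a /health route; adjust as needed.
wait_for_server() {
  url="$1"
  retries="${2:-10}"
  i=0
  while [ "$i" -lt "$retries" ]; do
    # -f: fail on HTTP errors; --max-time: bound each attempt at 2s
    if curl -fsS --max-time 2 "$url/health" >/dev/null 2>&1; then
      echo "server is up"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "server did not respond after $retries attempts" >&2
  return 1
}
```

Calling wait_for_server https://your-public-url.example.com from the client machine gives a quick pass/fail signal before running real jobs.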

Operating model

The administrator owns the server deployment, AWS credentials, S3 bucket, Redis, environment configuration, network exposure, and cluster access. Users own their prompt files, model choices, SLOs, and CLI environment.
Document the server URL, supported models, and expected SLO values for your users before sharing access.
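
A handout for users might look like the sketch below; every value is a placeholder for this deployment, not a Tandemn default.

```
Server URL:       https://tandemn.internal.example.com
Supported models: <models your replicas are provisioned for>
Expected SLOs:    <latency and throughput targets users should request>
Contact:          <who to reach when tandemn check fails>
```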

Network model

For local development, the control plane can be exposed through a tunnel or VPN. For production, deploy the control plane on a small EC2 instance in the same VPC as the inference replicas to avoid public exposure and tunnel latency.

Updates

When a new Tandemn server version is available, review the release notes, back up deployment-specific configuration, and update in a controlled maintenance window.
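
Before a maintenance window, the configuration worth preserving is mostly what setup.sh generated. A minimal backup sketch, assuming .env lives in the repository root (the paths and function name are illustrative, not part of Tandemn):

```shell
# Snapshot deployment-specific configuration before updating the server.
# Assumption: .env sits in the deployment directory passed as $1.
backup_config() {
  src_dir="$1"
  stamp=$(date +%Y%m%d-%H%M%S)
  dest="$src_dir/config-backup-$stamp"
  mkdir -p "$dest"
  # .env holds the credentials and endpoints created by setup.sh
  cp "$src_dir/.env" "$dest/.env"
  # Print the snapshot location so the operator can note it
  echo "$dest"
}
```

Running backup_config /path/to/Tandemn-server echoes the snapshot directory; keep that path in your maintenance notes in case the update needs to be rolled back.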