Run your first job
Install the server, connect the CLI, and submit a model inference job.
Understand the architecture
Learn how the server, users, GPU workers, and job scheduler fit together.
Set up a cluster
Install the control plane, configure AWS, and start the server.
Use the CLI
Check connectivity and submit jobs from a local Python environment.
What Tandemn is for
Tandemn is designed for teams that already have, or plan to operate, accelerated compute and want a simpler way to run inference workloads across that capacity.

- Infrastructure teams can expose a single service for users instead of asking each team to manage hardware placement.
- ML and application teams can submit jobs through the CLI without deciding which machine or GPU should run them.
- Organizations with mixed GPU supply can use available capacity more efficiently across different machines and accelerator types.
How the workflow fits together
An administrator starts the Tandemn server
The control plane is deployed on a machine that users and EC2 replicas can reach over the network. It manages cluster state and receives inference job requests.
Users install the Tandemn CLI
Users install the tandemn Python package, set TD_SERVER_URL, and check connectivity.
Users submit inference jobs
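A minimal sketch of the client-side setup. It assumes the package is installable as `tandemn` from a Python package index and that probing the base URL is enough to confirm the server is reachable; the CLI's official connectivity check may work differently.

```python
import os
import urllib.error
import urllib.request

# Install the CLI first (package name taken from the docs above):
#   pip install tandemn

# TD_SERVER_URL must point at the Tandemn control plane.
# The URL below is a placeholder, not a real endpoint.
server_url = os.environ.get("TD_SERVER_URL", "http://localhost:8080")


def check_connectivity(url: str, timeout: float = 5.0) -> bool:
    """Return True if the server answers an HTTP request at all.

    This probes the base URL as a stand-in for a health check;
    the documented connectivity command may differ.
    """
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True  # server responded, even if with an error status
    except OSError:
        return False  # DNS failure, refused connection, timeout, etc.
```

Running `check_connectivity(server_url)` before submitting work gives a fast failure when `TD_SERVER_URL` points at the wrong host, instead of a stalled job submission later.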
A user provides a model, a JSONL prompt file, and a service-level objective. Tandemn schedules the job across available accelerated resources.
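The prompt file is JSON Lines: one JSON object per line, so each prompt can be read and scheduled independently. The exact record schema is not specified here, so the `prompt` field name below is an assumption for illustration.

```python
import json

# Hypothetical prompt records; the "prompt" field name is an
# assumption, not a documented Tandemn schema.
prompts = [
    {"prompt": "Summarize the plot of Hamlet in one sentence."},
    {"prompt": "Translate 'good morning' into Spanish."},
]

# JSONL: serialize each record as a single line.
with open("prompts.jsonl", "w") as f:
    for record in prompts:
        f.write(json.dumps(record) + "\n")

# Each line parses on its own, which is what lets a scheduler treat
# prompts as separate units of work across machines.
with open("prompts.jsonl") as f:
    loaded = [json.loads(line) for line in f]
```

A file like this, together with a model name and a service-level objective, is what a user hands to the CLI at submission time.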

