Tandemn is an infrastructure layer for running inference workloads across accelerated clusters. It gives administrators a server to operate and users a CLI to submit jobs.
Tandemn is built for teams that run inference workloads and want to reduce manual infrastructure decisions. It is especially useful when an organization has a mix of GPU types or accelerated capacity that sits idle.
Inference infrastructure often requires users to choose hardware, size jobs, and manage placement manually. Tandemn moves those decisions into an orchestration layer so users can submit work through a simpler interface.
The current docs focus on batch inference jobs submitted through the CLI. Batch workflows are a natural fit when jobs can be queued, planned, and executed across available resources.
An administrator needs the following: Python 3.12 or later; AWS credentials; IAM access to EC2, S3, and service quotas; an S3 bucket; Redis; and a network-reachable host for the self-hosted control plane.
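Some of these prerequisites can be checked before installation. The following is a minimal sketch, not part of Tandemn: the function names (`check_python`, `check_aws_env`, `check_tcp`, `preflight`) are hypothetical, and the Redis address is assumed to be `localhost` on Redis's default port 6379. It uses only the Python standard library, so it does not verify IAM permissions or the S3 bucket, and a negative AWS result is only a hint because credentials may also come from a shared credentials file or an instance role.

```python
import os
import socket
import sys

MIN_PYTHON = (3, 12)  # minimum version stated in the prerequisites


def check_python(version_info=sys.version_info):
    """Return True if the interpreter is at least Python 3.12."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def check_aws_env(env=None):
    """Return True if standard AWS SDK environment variables are set.

    Only inspects the environment; credentials files and instance
    roles are not detected here.
    """
    env = os.environ if env is None else env
    has_keys = bool(env.get("AWS_ACCESS_KEY_ID")) and bool(
        env.get("AWS_SECRET_ACCESS_KEY")
    )
    has_profile = bool(env.get("AWS_PROFILE"))
    return has_keys or has_profile


def check_tcp(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def preflight(redis_host="localhost", redis_port=6379, env=None):
    """Run all checks; the Redis host and port are assumptions."""
    return {
        "python >= 3.12": check_python(),
        "aws credentials in environment": check_aws_env(env),
        "redis reachable": check_tcp(redis_host, redis_port),
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{'ok' if ok else 'missing':8} {name}")
```

Running the script prints one line per check. The TCP check only confirms that something is listening at the given address, so it is a reachability test rather than proof that the listener is Redis.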