> ## Documentation Index
> Fetch the complete documentation index at: https://docs.tandemn.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Introduction

> Learn what Tandemn does, who it is for, and where to start.

Tandemn System helps teams run inference workloads across accelerated infrastructure without hand-tuning every model, GPU, and batch job. It provides a self-hosted control plane that manages orchestration and a CLI for submitting inference jobs.

Instead of sending every workload to the newest and most expensive accelerator, Tandemn can route jobs across heterogeneous GPU pools and use the hardware that best matches the workload's latency and throughput requirements.

* Install the server, connect the CLI, and submit a model inference job.
* Learn how the server, users, GPU workers, and job scheduler fit together.
* Install the control plane, configure AWS, and start the server.
* Check connectivity and submit jobs from a local Python environment.

## What Tandemn is for

Tandemn System is designed for teams that already operate, or plan to operate, accelerated compute and want a simpler way to run inference workloads across that capacity.

* **Infrastructure teams** can expose a single service for users instead of asking each team to manage hardware placement.
* **ML and application teams** can submit jobs through the CLI without deciding which machine or GPU should run them.
* **Organizations with mixed GPU supply** can use available capacity more efficiently across different machines and accelerator types.

## How the workflow fits together

The control plane is deployed on a machine that users and EC2 replicas can reach over the network. It manages cluster state and receives inference job requests.

Users install the `tandemn` Python package, set `TD_SERVER_URL`, and check connectivity.

A user provides a model, a JSONL prompt file, and a service-level objective.

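
Two of the inputs mentioned in this workflow, the `TD_SERVER_URL` setting and the JSONL prompt file, can be prepared from a local Python environment using only the standard library. The sketch below is a minimal illustration, not Tandemn's own tooling: the `"prompt"` field name, the helper function names, and the fallback URL are assumptions made for the example, and the schema Tandemn actually expects may differ.

```python
import json
import os
from pathlib import Path


def write_prompts_jsonl(prompts, path):
    """Write one JSON object per line (JSONL).

    The "prompt" key is an assumed field name for illustration only.
    """
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for text in prompts:
            f.write(json.dumps({"prompt": text}, ensure_ascii=False) + "\n")
    return path


def read_jsonl(path):
    """Parse a JSONL file back into a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def server_url(default="http://localhost:8000"):
    """Return TD_SERVER_URL from the environment.

    The default value is a placeholder, not a documented Tandemn address.
    """
    return os.environ.get("TD_SERVER_URL", default)
```

Calling `write_prompts_jsonl(["Hello"], "prompts.jsonl")` produces a file with one JSON object per line, and `read_jsonl` is a quick way to confirm that every line parses before the file is handed to a job submission.
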
Tandemn schedules the job across available accelerated resources. The orchestration layer selects an efficient hardware mix for the workload so users can focus on the job rather than the cluster.

## Next step

Start with the [Quickstart](/quickstart) if you want the shortest path from a new environment to a submitted inference job.