# Introduction

> Learn what Tandemn does, who it is for, and where to start.

Tandemn System helps teams run inference workloads across accelerated infrastructure without hand-tuning every model, GPU, and batch job. It consists of a self-hosted control plane that handles orchestration and a CLI for submitting inference jobs.

Instead of sending every workload to the newest and most expensive accelerator, Tandemn can route jobs across heterogeneous GPU pools and use the hardware that best matches the workload's latency and throughput requirements.

<Columns cols={2}>
  <Card title="Run your first job" icon="rocket" href="/quickstart" cta="Start the quickstart">
    Install the server, connect the CLI, and submit a model inference job.
  </Card>

  <Card title="Understand the architecture" icon="network" href="/concepts/architecture" cta="View architecture">
    Learn how the server, users, GPU workers, and job scheduler fit together.
  </Card>

  <Card title="Set up a cluster" icon="server" href="/getting-started/install-server" cta="Deploy server">
    Install the control plane, configure AWS, and start the server.
  </Card>

  <Card title="Use the CLI" icon="terminal" href="/cli/overview" cta="Open CLI reference">
    Check connectivity and submit jobs from a local Python environment.
  </Card>
</Columns>

## What Tandemn is for

Tandemn System is designed for teams that already operate, or plan to operate, accelerated compute and want a simpler way to run inference workloads across that capacity.

* **Infrastructure teams** can expose a single service for users instead of asking each team to manage hardware placement.
* **ML and application teams** can submit jobs through the CLI without deciding which machine or GPU should run them.
* **Organizations with mixed GPU supply** can use available capacity more efficiently across different machines and accelerator types.

## How the workflow fits together

<Steps>
  <Step title="An administrator starts the Tandemn server">
    The control plane is deployed on a machine that users and EC2 replicas can reach over the network. It manages cluster state and receives inference job requests.
  </Step>

  <Step title="Users install the Tandemn CLI">
    Users install the `tandemn` Python package, set `TD_SERVER_URL`, and check connectivity.
  </Step>

  <Step title="Users submit inference jobs">
    A user provides a model, a JSONL prompt file, and a service-level objective. Tandemn schedules the job across available accelerated resources.
  </Step>

  <Step title="Tandemn chooses the execution plan">
    The orchestration layer selects an efficient hardware mix for the workload so users can focus on the job rather than the cluster.
  </Step>
</Steps>
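
The client-side preparation in the steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Tandemn's actual input format or API: the `{"prompt": ...}` record shape, the requirement that `TD_SERVER_URL` is an `http(s)` URL, and the helper names `resolve_server_url`, `write_prompts_jsonl`, and `count_valid_lines` are all hypothetical. Only the `TD_SERVER_URL` variable and the use of a JSONL prompt file come from this page. Check the Quickstart and CLI reference for the real schema and commands.

```python
import json
import os
from pathlib import Path
from urllib.parse import urlparse


def resolve_server_url(env=None):
    """Return TD_SERVER_URL, or raise if it is missing or not an http(s) URL.

    Assumption: the server is addressed with an http(s) URL.
    """
    env = os.environ if env is None else env
    url = env.get("TD_SERVER_URL", "").strip()
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("TD_SERVER_URL must be set to an http(s) URL")
    return url.rstrip("/")


def write_prompts_jsonl(prompts, path):
    """Write one JSON object per line.

    Assumption: each record looks like {"prompt": "..."}; the real schema may differ.
    """
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for text in prompts:
            f.write(json.dumps({"prompt": text}, ensure_ascii=False) + "\n")
    return path


def count_valid_lines(path):
    """Parse every non-empty line as JSON and return the number of records."""
    count = 0
    with Path(path).open(encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"line {line_number} is not valid JSON: {exc}") from exc
            count += 1
    return count
```

Validating the file locally before submission catches malformed lines early, for example `count_valid_lines(write_prompts_jsonl(["Summarize this report."], "prompts.jsonl"))`. Submitting the job itself is done with the Tandemn CLI, which this sketch does not attempt to reproduce.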

## Next step

For the shortest path from a new environment to a submitted inference job, start with the [Quickstart](/quickstart).
