Model identifiers
The quickstart uses a Hugging Face style model identifier:
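Identifiers take the `org/model-name` form used on the Hugging Face Hub. The model below is illustrative; use whichever model your deployment enables:

```text
meta-llama/Llama-3.1-8B-Instruct
```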
Placement solvers
`tandemn plan` and `tandemn deploy` can show recommendations from two built-in solvers:
| Solver | Description |
|---|---|
| LLM Advisor | Uses the performance database and an LLM reasoning layer to rank placements by cost, throughput, and SLO feasibility. |
| Roofline solver | Uses GPU bandwidth, TFLOPS, memory, and model constraints. No API key is required. |
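As a rough illustration of the arithmetic a roofline-style solver performs (a sketch only, not Tandemn's implementation; the overhead factor and GPU figures are assumptions), the check below tests whether weights fit in aggregate VRAM and estimates a bandwidth-bound decode ceiling:

```python
from dataclasses import dataclass

@dataclass
class GPU:
    name: str
    vram_gb: float   # memory per GPU
    bw_tbs: float    # HBM bandwidth, TB/s
    tflops: float    # dense BF16 compute, TFLOPS

def fits(model_gb: float, gpu: GPU, n_gpus: int, overhead: float = 1.2) -> bool:
    """Weights plus an assumed KV-cache/activation overhead must fit in VRAM."""
    return model_gb * overhead <= gpu.vram_gb * n_gpus

def decode_tok_per_s(model_gb: float, gpu: GPU, n_gpus: int) -> float:
    """Bandwidth-bound ceiling: each decoded token streams all weights once."""
    return gpu.bw_tbs * 1000 * n_gpus / model_gb

a100 = GPU("A100 80GB", vram_gb=80, bw_tbs=2.0, tflops=312)
# A 70B-parameter model at FP16 holds roughly 140 GB of weights.
print(fits(140, a100, n_gpus=4))                     # True: 168 GB <= 320 GB
print(round(decode_tok_per_s(140, a100, n_gpus=4)))  # ~57 tok/s upper bound
```

A real solver would also weigh compute-bound prefill, interconnect overhead, and the SLO, but the memory and bandwidth terms above dominate a first-pass feasibility check.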
If `KOI_SERVICE_URL` is set, Tandemn can also show an optional Koi recommendation.
Routing goals
Tandemn’s routing layer is designed to reduce the amount of manual placement work users need to do. Instead of asking each user to pick a specific machine, Tandemn can evaluate the job and choose an appropriate hardware mix.
What affects placement
The exact placement decision depends on deployment-specific configuration, but these are the common inputs to reason about (see the sketch after this list):
- Model size and runtime requirements
- Prompt file size
- Requested SLO
- Available GPUs
- Current cluster load
- AWS quota and capacity
- Spot or on-demand launch mode
- Tensor and pipeline parallelism settings
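A minimal sketch of how these inputs might be captured as one structured request (every field name here is an illustrative assumption, not Tandemn's actual schema):

```python
from dataclasses import dataclass, field

@dataclass
class PlacementRequest:
    # Hypothetical schema; Tandemn's real configuration may differ.
    model_id: str                    # Hugging Face style identifier
    model_gb: float                  # weights footprint at the serving dtype
    prompt_file_mb: float            # size of the prompt file
    slo_ttft_ms: int                 # requested time-to-first-token SLO
    available_gpus: dict[str, int]   # e.g. {"A100 80GB": 8}
    cluster_load: float              # 0.0 idle .. 1.0 saturated
    aws_quota: dict[str, int] = field(default_factory=dict)  # per instance type
    spot: bool = False               # spot vs. on-demand launch mode
    tensor_parallel: int = 1
    pipeline_parallel: int = 1

req = PlacementRequest(
    model_id="meta-llama/Llama-3.1-8B-Instruct",
    model_gb=16.0,                   # ~8B params at FP16
    prompt_file_mb=2.5,
    slo_ttft_ms=500,
    available_gpus={"A100 80GB": 8},
    cluster_load=0.3,
)
```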
Supported hardware
| GPU | AWS instance | GPUs x VRAM |
|---|---|---|
| A100 40GB / 80GB | p4d.24xlarge, p4de.24xlarge | 8 x 40GB / 8 x 80GB |
| H100 80GB | p5.48xlarge | 8 x 80GB |
| L40S 48GB | g6e.12xlarge, g6e.24xlarge, g6e.48xlarge | 4 x / 4 x / 8 x 48GB |
| A10G 24GB | g5.12xlarge, g5.48xlarge | 4 x / 8 x 24GB |
If a model cannot be scheduled, the first thing to check is whether the model is enabled in the deployment and whether compatible resources are available.
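As a quick way to reason about the "compatible resources" half of that check, here is an illustrative fit test over the table above (not Tandemn's code; the 1.2 overhead factor is an assumption):

```python
# instance -> (GPU count, VRAM in GB per GPU), from the table above
INSTANCES = {
    "p4de.24xlarge": (8, 80),  # A100 80GB
    "p5.48xlarge":   (8, 80),  # H100 80GB
    "g6e.48xlarge":  (8, 48),  # L40S 48GB
    "g5.48xlarge":   (8, 24),  # A10G 24GB
}

def compatible_instances(model_gb: float, overhead: float = 1.2) -> list[str]:
    """Instance types whose aggregate VRAM covers weights plus overhead."""
    need = model_gb * overhead
    return [name for name, (n, vram) in INSTANCES.items() if n * vram >= need]

# ~100B params at FP16 -> ~200 GB of weights, 240 GB needed with overhead:
print(compatible_instances(200))  # excludes g5.48xlarge (8 x 24 = 192 GB)
```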

