Use `tandemn plan` to preview placement without launching, and `tandemn deploy` to submit a batch inference job. Both commands take a model name, an input JSONL file or S3 URI, and an SLO deadline.
Usage
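The general command shape, as a sketch inferred from the arguments and options documented below (confirm the exact syntax with your installation's help output):

```bash
tandemn plan <model> <input> [options]
tandemn deploy <model> <input> [options]
```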
Examples
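A few illustrative invocations using only the flags documented below; the model name and file paths are placeholders:

```bash
# Preview placement for a local JSONL file with the default 4-hour SLO
tandemn plan meta-llama/Llama-3.1-8B-Instruct prompts.jsonl

# Submit the same workload with a 30-minute deadline
tandemn deploy meta-llama/Llama-3.1-8B-Instruct prompts.jsonl --slo 30m

# Deploy from S3 on on-demand instances and keep the clusters alive afterwards
tandemn deploy meta-llama/Llama-3.1-8B-Instruct s3://my-bucket/batch/prompts.jsonl --on-demand --persist
```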
Arguments
| Argument | Required | Description |
|---|---|---|
| `model` | Yes | Model identifier to run. Use a model supported by your Tandemn deployment. |
| `input` | Yes | Local JSONL file or `s3://...` URI containing the batch workload. Local files are uploaded to S3 automatically. |
Options
| Flag | Description | Default |
|---|---|---|
| `--slo <hours>` | Deadline. Accepts plain hours (`4`), fractional hours (`0.5h`), or minutes (`30m`). | `4` |
| `--max-output-tokens N` | Maximum tokens per response. | `1024` |
| `--gpu <type>` | Override GPU type, such as A100, H100, L40S, or A10G. | Solver-selected |
| `--tp N` | Override tensor parallelism. | Solver-selected |
| `--pp N` | Override pipeline parallelism. | Solver-selected |
| `--replicas N` | Number of replica clusters. | `1` |
| `--chunk-size N` | Lines per chunk for multi-replica jobs. | `1000` |
| `--no-advisor` | Skip the LLM advisor and use the roofline solver only. | Disabled |
| `--skip-dangerously` | Skip the interactive solver choice and auto-pick the advisor recommendation. | Disabled |
| `--force` | Skip feasibility checks and launch anyway. | Disabled |
| `--persist` | Keep clusters alive after the job completes. | Disabled |
| `--on-demand` | Use on-demand instances instead of spot instances. | Disabled |
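For instance, a multi-replica run that splits the workload into chunks might look like this (illustrative model name and values only):

```bash
# Split the workload across 4 replica clusters, 500 lines per chunk,
# with an 8-hour deadline and a 2048-token response cap
tandemn deploy my-model prompts.jsonl --replicas 4 --chunk-size 500 --slo 8 --max-output-tokens 2048
```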
Placement behavior
Two placement solvers can run when you call `tandemn plan` or `tandemn deploy`:
- LLM Advisor: architecture-aware recommendation over the performance database, with an LLM reasoning layer that ranks candidates by cost, throughput, and SLO feasibility.
- Roofline solver: deterministic analytical placement based on GPU bandwidth, TFLOPS, memory, and model constraints.
If `KOI_SERVICE_URL` is set, Tandemn can also show an optional Koi recommendation alongside the built-in solvers.
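As a sketch of how the solver-related flags combine (model name and file path are placeholders; flags are taken from the options table above):

```bash
# Plan with the deterministic roofline solver only, skipping the LLM advisor
tandemn plan my-model prompts.jsonl --no-advisor

# Deploy and auto-accept the advisor recommendation without the interactive solver choice
tandemn deploy my-model prompts.jsonl --skip-dangerously
```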
Supported overrides
Use `--gpu`, `--tp`, and `--pp` when you want to force a specific hardware plan. This is useful for models outside the performance database or when you already know the target fleet, as in the sketch below.
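For example, pinning the plan instead of letting the solver choose (GPU type and parallelism values are illustrative):

```bash
# Force H100 GPUs with tensor parallelism 2 and pipeline parallelism 1
tandemn deploy my-model prompts.jsonl --gpu H100 --tp 2 --pp 1
```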
Prompt file format
Use OpenAI-style batch JSONL. See Input format for the full schema.
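As a rough sketch, a `prompts.jsonl` file might contain lines like the following. Field names here follow the OpenAI batch input format; check the Input format page for Tandemn's exact schema, and treat the model name and request bodies as placeholders:

```jsonl
{"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "my-model", "messages": [{"role": "user", "content": "Summarize the attached report in three bullet points."}], "max_tokens": 256}}
{"custom_id": "req-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "my-model", "messages": [{"role": "user", "content": "Translate 'good morning' into French."}], "max_tokens": 64}}
```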
Before submitting
- Run `tandemn check`.
- Confirm the prompt file exists.
- Confirm the file is valid JSONL.
- Confirm the model is supported by your deployment.
- Confirm the S3 upload bucket is configured on the server.
- Start with a small file before scaling up.
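A quick preflight pass along these lines catches most of the problems above before launch (a sketch: `jq` is assumed to be installed, and the model name and file names are placeholders):

```bash
# Verify the environment and configuration
tandemn check

# Confirm the prompt file exists and every line parses as JSON
test -f prompts.jsonl && jq -ce . prompts.jsonl > /dev/null && echo "prompts.jsonl is valid JSONL"

# Start with a small slice before scaling up
head -n 20 prompts.jsonl > smoke.jsonl
tandemn plan my-model smoke.jsonl
```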

