After the server is running and the CLI is connected, submit a job with a model, an input file, and an SLO (a service-level objective, i.e. a completion deadline).

Prepare a prompt file

Use OpenAI-style batch JSONL where each line is one request payload. Keep the first test small so you can confirm the end-to-end flow quickly.
prompts.jsonl
{"custom_id":"req-1","method":"POST","url":"/v1/chat/completions","body":{"model":"placeholder","messages":[{"role":"user","content":"Summarize what batch inference is in one sentence."}],"max_tokens":256}}
{"custom_id":"req-2","method":"POST","url":"/v1/chat/completions","body":{"model":"placeholder","messages":[{"role":"user","content":"Give me three reasons to use heterogeneous GPUs."}],"max_tokens":256}}
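If you generate the input file programmatically, a minimal sketch might look like the following. The field names are taken from the example lines above; the `batch_line` helper is hypothetical, not part of any Tandemn API.

```python
import json

# Hypothetical helper: build one OpenAI-style batch request line.
def batch_line(custom_id, prompt, model="placeholder", max_tokens=256):
    return json.dumps({
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": max_tokens,
        },
    })

prompts = [
    "Summarize what batch inference is in one sentence.",
    "Give me three reasons to use heterogeneous GPUs.",
]

# One JSON object per line, as the batch format requires.
with open("prompts.jsonl", "w") as f:
    for i, prompt in enumerate(prompts, start=1):
        f.write(batch_line(f"req-{i}", prompt) + "\n")
```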

Preview the placement

Preview where Tandemn would place the job before submitting it:
tandemn plan Qwen/Qwen2.5-7B-Instruct prompts.jsonl --slo 4

Submit the job

tandemn deploy Qwen/Qwen2.5-7B-Instruct prompts.jsonl --slo 4
The command sends the workload to the Tandemn server, which then chooses an execution plan across the available GPUs.

What the arguments mean

  • Qwen/Qwen2.5-7B-Instruct is the model identifier.
  • prompts.jsonl is the batch input file.
  • --slo 4 sets a four-hour completion deadline (the SLO).
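As a rough sketch of what the SLO means (assuming the deadline is measured from submission time, which is the usual convention):

```python
from datetime import datetime, timedelta, timezone

slo_hours = 4  # the value passed as --slo 4
submitted_at = datetime.now(timezone.utc)

# The server has until this moment to finish the whole batch.
deadline = submitted_at + timedelta(hours=slo_hours)
print(f"Job must finish by {deadline.isoformat()}")
```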

Monitor progress

tandemn progress
tandemn web
tandemn progress reports job status in the terminal; tandemn web opens the web dashboard.
Use a model that your Tandemn deployment is configured to run. If a model cannot be scheduled, ask your administrator which models are currently available.

If something fails

Start with the basics:
  • Run tandemn check.
  • Confirm TD_SERVER_URL points at the right server.
  • Confirm the JSONL file exists and is readable.
  • Confirm the model is available in your environment.
See Troubleshooting for more setup checks.
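The checklist above can be sketched as a quick preflight script. This is an illustration, not part of the Tandemn CLI: the `preflight` helper is hypothetical, TD_SERVER_URL is the environment variable mentioned above, and the model-availability check is left to the server.

```python
import json
import os

def preflight(jsonl_path):
    """Return a list of problems found before submitting a job."""
    problems = []
    # The CLI reads the server address from TD_SERVER_URL.
    if not os.environ.get("TD_SERVER_URL"):
        problems.append("TD_SERVER_URL is not set")
    # The input file must exist, and every non-empty line must parse as JSON.
    if not os.path.isfile(jsonl_path):
        problems.append(f"{jsonl_path} does not exist or is not readable")
    else:
        with open(jsonl_path) as f:
            for n, line in enumerate(f, start=1):
                if not line.strip():
                    continue
                try:
                    json.loads(line)
                except json.JSONDecodeError:
                    problems.append(f"{jsonl_path} line {n} is not valid JSON")
    return problems
```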