After the server is running and the CLI is connected, submit a job with a model, an input file, and an SLO (a service-level objective, i.e. a completion deadline).

Prepare a prompt file

Use OpenAI-style batch JSONL where each line is one request payload. Keep the first test small so you can confirm the end-to-end flow quickly.
prompts.jsonl
{"custom_id":"req-1","method":"POST","url":"/v1/chat/completions","body":{"model":"placeholder","messages":[{"role":"user","content":"Summarize what batch inference is in one sentence."}],"max_tokens":256}}
{"custom_id":"req-2","method":"POST","url":"/v1/chat/completions","body":{"model":"placeholder","messages":[{"role":"user","content":"Give me three reasons to use heterogeneous GPUs."}],"max_tokens":256}}
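If you generate the input file programmatically, a minimal sketch might look like the following. The field names are taken from the example lines above; the `batch_line` helper is hypothetical, not part of any Tandemn API.

```python
import json

# Hypothetical helper: build one OpenAI-style batch request line.
def batch_line(custom_id, prompt, model="placeholder", max_tokens=256):
    return json.dumps({
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": max_tokens,
        },
    })

prompts = [
    "Summarize what batch inference is in one sentence.",
    "Give me three reasons to use heterogeneous GPUs.",
]

# One JSON object per line, as the batch format requires.
with open("prompts.jsonl", "w") as f:
    for i, prompt in enumerate(prompts, start=1):
        f.write(batch_line(f"req-{i}", prompt) + "\n")
```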

Preview the placement

Preview where Tandemn would place the job before submitting it:
tandemn plan Qwen/Qwen2.5-7B-Instruct prompts.jsonl --slo 4

Submit the job

tandemn deploy Qwen/Qwen2.5-7B-Instruct prompts.jsonl --slo 4
The command sends the workload to the Tandemn server, which then chooses an execution plan across the available GPUs.

What the arguments mean

  • Qwen/Qwen2.5-7B-Instruct is the model identifier.
  • prompts.jsonl is the batch input file.
  • --slo 4 sets a four-hour completion deadline (the SLO).
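As a rough sketch of what the SLO means (assuming the deadline is measured from submission time, which is the usual convention):

```python
from datetime import datetime, timedelta, timezone

slo_hours = 4  # the value passed as --slo 4
submitted_at = datetime.now(timezone.utc)

# The server has until this moment to finish the whole batch.
deadline = submitted_at + timedelta(hours=slo_hours)
print(f"Job must finish by {deadline.isoformat()}")
```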

Monitor progress

tandemn progress
tandemn web
tandemn progress reports job status in the terminal; tandemn web opens the web dashboard.
Use a model that your Tandemn deployment is configured to run. If a model cannot be scheduled, ask your administrator which models are currently available.

If something fails

Start with the basics:
  • Run tandemn check.
  • Confirm TD_SERVER_URL points at the right server.
  • Confirm the JSONL file exists and is readable.
  • Confirm the model is available in your environment.
See Troubleshooting for more setup checks.
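The checklist above can be sketched as a quick preflight script. This is an illustration, not part of the Tandemn CLI: the `preflight` helper is hypothetical, TD_SERVER_URL is the environment variable mentioned above, and the model-availability check is left to the server.

```python
import json
import os

def preflight(jsonl_path):
    """Return a list of problems found before submitting a job."""
    problems = []
    # The CLI reads the server address from TD_SERVER_URL.
    if not os.environ.get("TD_SERVER_URL"):
        problems.append("TD_SERVER_URL is not set")
    # The input file must exist, and every non-empty line must parse as JSON.
    if not os.path.isfile(jsonl_path):
        problems.append(f"{jsonl_path} does not exist or is not readable")
    else:
        with open(jsonl_path) as f:
            for n, line in enumerate(f, start=1):
                if not line.strip():
                    continue
                try:
                    json.loads(line)
                except json.JSONDecodeError:
                    problems.append(f"{jsonl_path} line {n} is not valid JSON")
    return problems
```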