Start with connectivity. Most first-run issues happen before the job reaches the orchestration layer.

tandemn check fails

Check the server URL:
echo "$TD_SERVER_URL"
Then confirm the CLI can reach the server from the same terminal session:
tandemn check
If it still fails:
  • Confirm the server is running.
  • Confirm the server URL includes the correct host and port.
  • Confirm your network can reach the server.
  • Confirm any firewall or VPN rules allow the connection.
  • Confirm TD_API_KEY is set if the server requires authentication.
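
If tandemn check keeps failing, the checklist above can be partly automated outside the CLI. The following is a minimal sketch, not part of tandemn itself: the helper names parse_server_url and can_connect are hypothetical, and it assumes TD_SERVER_URL is an http or https URL. It only tests that the URL is well formed and that a TCP connection to the host and port can be opened. It does not verify authentication.

```python
import os
import socket
from urllib.parse import urlsplit


def parse_server_url(url):
    """Return (host, port) from a URL, or raise ValueError if it is incomplete."""
    parts = urlsplit(url)
    # Assumption: the server is addressed with an http or https URL.
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"missing or unsupported scheme in {url!r}")
    if not parts.hostname:
        raise ValueError(f"no host in {url!r}")
    # Fall back to the scheme's default port when none is given explicitly.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    url = os.environ.get("TD_SERVER_URL", "")
    if not url:
        print("TD_SERVER_URL is not set")
    else:
        host, port = parse_server_url(url)
        state = "reachable" if can_connect(host, port) else "NOT reachable"
        print(f"{host}:{port} is {state}")
    # Report only whether the key is present; never print its value.
    print("TD_API_KEY set:", bool(os.environ.get("TD_API_KEY")))
```

A "NOT reachable" result points at the server process, the host or port in the URL, or a firewall or VPN rule. A "reachable" result with a failing tandemn check suggests looking at authentication next.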

Control plane does not start

On the server host, check the Python version and the output of sky check:
python --version
sky check
If you use chunked execution, Redis must be running. If it is not, you can start it with Docker:
docker run -d -p 6379:6379 redis
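
To check whether Redis is already answering before starting another container, a small probe can help. This is a minimal sketch under stated assumptions: Redis listens on 127.0.0.1 at port 6379 (the port published by the Docker command), and the function names redis_status and interpret_reply are hypothetical. It sends the standard Redis PING command as an inline command over a raw socket, so it needs no client library.

```python
import socket


def interpret_reply(reply):
    """Map the first bytes of a Redis reply to a short status string."""
    if reply.startswith(b"+PONG"):
        return "ok"
    if reply.startswith(b"-NOAUTH"):
        # The server is up but requires a password before accepting commands.
        return "auth required"
    return "unexpected reply"


def redis_status(host="127.0.0.1", port=6379, timeout=2.0):
    """Send an inline PING and report whether Redis answered."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")
            return interpret_reply(conn.recv(64))
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return "unreachable"


if __name__ == "__main__":
    print("Redis:", redis_status())
```

An "unreachable" result means nothing is listening on that host and port. Any other result means a process is already bound there, so starting a second container on the same port would fail.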

A job is rejected

Common causes:
  • The model is not supported by the deployment.
  • The input file path is wrong.
  • The prompt file is not valid JSONL.
  • The requested SLO is outside the conventions used by your deployment.
  • The cluster does not currently have compatible capacity.
  • S3_UPLOAD_BUCKET is unset or points at a bucket the control plane cannot write to.
  • AWS credentials lack EC2, S3, or service quota permissions.
  • The performance database is missing and the advisor is unavailable.

A prompt file fails

Validate that each line is a complete JSON object.
{"custom_id":"req-1","method":"POST","url":"/v1/chat/completions","body":{"model":"placeholder","messages":[{"role":"user","content":"Hello"}],"max_tokens":256}}
Avoid trailing commas and multi-line JSON objects in JSONL files.
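
To locate the offending line in a larger file, a short validator can report line numbers. This is a minimal sketch using only the Python standard library. The function names find_jsonl_errors and check_file are hypothetical. It checks only that each line parses as a JSON object, not that the fields match what your deployment expects. Blank lines are flagged on the assumption that every line must hold an object. Drop that check if your deployment tolerates them.

```python
import json


def find_jsonl_errors(lines):
    """Return a list of (line_number, message) for lines that are not JSON objects."""
    errors = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            errors.append((number, "blank line"))
            continue
        try:
            value = json.loads(text)
        except json.JSONDecodeError as exc:
            # Trailing commas and objects split across lines end up here.
            errors.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(value, dict):
            errors.append((number, "not a JSON object"))
    return errors


def check_file(path):
    """Validate a JSONL file on disk and print one line per problem found."""
    with open(path, encoding="utf-8") as handle:
        errors = find_jsonl_errors(handle)
    for number, message in errors:
        print(f"{path}:{number}: {message}")
    return not errors
```

Calling check_file with the path to your prompt file prints nothing and returns True when every line is a complete JSON object.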

Still blocked

Send your administrator:
  • The command you ran
  • The value of TD_SERVER_URL, without secrets
  • The model name
  • The first few non-sensitive lines of the JSONL file
  • The error message from the CLI
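
Before sharing TD_SERVER_URL, it is worth stripping anything that could carry a credential. The sketch below is one way to do that, assuming secrets would appear only as userinfo (user:password@), a query string, or a fragment. The function name redact_url is hypothetical. It does not decide which JSONL lines are sensitive. That still needs a manual look.

```python
import os
from urllib.parse import urlsplit, urlunsplit


def redact_url(url):
    """Keep scheme, host, port, and path; drop userinfo, query, and fragment."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:
        # IPv6 literals need their brackets restored.
        host = f"[{host}]"
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, "", ""))


if __name__ == "__main__":
    url = os.environ.get("TD_SERVER_URL", "")
    print("TD_SERVER_URL:", redact_url(url) if url else "(not set)")
```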