Replica commands operate on running jobs that use chunked execution. New replicas join the same Redis chunk queue, and killed replicas have in-flight chunks reclaimed and returned to pending.
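
The queue behavior described above can be sketched with a toy in-memory model. This is only an illustration of the pending, leased, and completed states. The `ChunkQueue` class and its method names are hypothetical and do not reflect Tandemn's actual Redis schema or implementation.

```python
from collections import deque


class ChunkQueue:
    """Toy in-memory stand-in for a shared chunk queue (hypothetical, not Tandemn's code)."""

    def __init__(self, chunk_ids):
        self.pending = deque(chunk_ids)  # chunks waiting for a replica
        self.leased = {}                 # chunk_id -> replica_id currently holding it
        self.completed = set()           # finished chunks, never requeued

    def lease(self, replica_id):
        """Hand the next pending chunk to a replica, or None if nothing is pending."""
        if not self.pending:
            return None
        chunk_id = self.pending.popleft()
        self.leased[chunk_id] = replica_id
        return chunk_id

    def complete(self, chunk_id):
        """Mark a leased chunk as done."""
        del self.leased[chunk_id]
        self.completed.add(chunk_id)

    def reclaim(self, replica_id):
        """Return every chunk leased by a killed replica to the pending queue."""
        lost = [c for c, r in self.leased.items() if r == replica_id]
        for chunk_id in lost:
            del self.leased[chunk_id]
            self.pending.appendleft(chunk_id)
        return lost


queue = ChunkQueue(range(4))
first = queue.lease("r0")        # r0 takes chunk 0
queue.complete(first)            # and finishes it
queue.lease("r0")                # r0 takes chunk 1 and is still working on it
queue.lease("r1")                # a newly added replica joins the same queue
reclaimed = queue.reclaim("r0")  # r0 is killed; its in-flight chunk goes back to pending
```

In this model the completed chunk stays completed after the kill, and only the in-flight chunk is requeued.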

Add replicas

tandemn add <job_id> 2
tandemn add <job_id> 3 --gpu L40S --tp 4
The first command adds two replicas that inherit the job’s current GPU configuration. The second adds three L40S replicas with tensor parallelism 4; if the job’s existing replicas use a different GPU type, the result is a heterogeneous fleet.

| Argument or flag | Description |
| --- | --- |
| job_id | Running job to scale. |
| N | Number of replicas to add. |
| --gpu <type> | Optional GPU type override for the new replicas. |
| --tp N | Optional tensor parallelism override. |
| --pp N | Optional pipeline parallelism override. |
| --on-demand | Launch new replicas on on-demand instances instead of spot. |
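
When scaling is driven from a script, the arguments listed above can be assembled programmatically. The following is a minimal sketch that uses only the documented arguments and flags. The helper name `build_add_command` and the job ID `job-123` are hypothetical, and the snippet only prints the command rather than running it.

```python
import shlex


def build_add_command(job_id, count, gpu=None, tp=None, pp=None, on_demand=False):
    """Assemble argv for `tandemn add` from the documented arguments only."""
    if count < 1:
        raise ValueError("count must be at least 1")
    argv = ["tandemn", "add", job_id, str(count)]
    if gpu is not None:
        argv += ["--gpu", gpu]          # GPU type override for the new replicas
    if tp is not None:
        argv += ["--tp", str(tp)]       # tensor parallelism override
    if pp is not None:
        argv += ["--pp", str(pp)]       # pipeline parallelism override
    if on_demand:
        argv.append("--on-demand")      # on-demand instances instead of spot
    return argv


# "job-123" is a placeholder job ID used for illustration.
argv = build_add_command("job-123", 3, gpu="L40S", tp=4)
print(shlex.join(argv))
```

To execute the command, the list can be passed to `subprocess.run(argv, check=True)`; passing a list avoids shell quoting issues.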

Kill replicas

tandemn kill <job_id> --replica <rid>
tandemn kill <job_id> --replica r0 --replica r1
Use tandemn kill to terminate specific replicas. Any chunk leased by a killed replica is reclaimed and returned to the queue.

| Flag | Description |
| --- | --- |
| --replica <rid> | Replica ID to kill. Repeat the flag to kill multiple replicas. |

Hot-swap replicas

tandemn swap <job_id> --gpu A100 --tp 4 --replicas 2
tandemn swap <job_id> --gpu L40S --tp 1 --ready-threshold 2 --on-demand
Hot-swap replaces all replicas with a new GPU configuration mid-job. Tandemn launches the replacement replicas first, waits for them to begin processing, and then tears down the old replicas.

| Flag | Description |
| --- | --- |
| --gpu <type> | GPU type for the replacement fleet. |
| --tp N | Tensor parallelism for replacement replicas. |
| --pp N | Pipeline parallelism for replacement replicas. |
| --replicas N | Number of replacement replicas. |
| --ready-threshold N | Number of new replicas that must be ready before old replicas are removed. |
| --on-demand | Use on-demand instances for the replacement fleet. |
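
The ordering that hot-swap follows (launch replacements, wait for the ready threshold, then tear down the old replicas) can be modeled as a small simulation. This is a sketch under stated assumptions, not Tandemn's implementation: the function `hot_swap_events`, the event names, and the choice to leave old replicas running when the threshold is never reached are all hypothetical.

```python
def hot_swap_events(old, new, ready_threshold, readiness_polls):
    """Return the ordered events of a simulated hot-swap.

    readiness_polls is an iterable of sets; each set holds the replica IDs
    that report ready at that poll.
    """
    events = [("launch", rid) for rid in new]
    for ready in readiness_polls:
        ready_new = sorted(set(new) & set(ready))
        if len(ready_new) >= ready_threshold:
            events.append(("threshold_met", tuple(ready_new)))
            # Old replicas are removed only after enough new ones are ready.
            events += [("teardown", rid) for rid in old]
            return events
        events.append(("waiting", tuple(ready_new)))
    # Assumption: if the threshold is never reached, old replicas keep running.
    return events


events = hot_swap_events(
    old=["r0", "r1"],
    new=["r2", "r3"],
    ready_threshold=2,
    readiness_polls=[set(), {"r2"}, {"r2", "r3"}],
)
for event in events:
    print(event)
```

Because teardown events only appear after the threshold event, at least one fleet is always draining the shared queue in this model.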

Operational notes

  • Replica operations require Redis-backed chunk coordination.
  • Killed replicas do not lose completed chunks.
  • Hot-swap is designed to avoid dropped chunks by keeping the shared queue intact.