When batch inference is a good fit
- Offline evaluation jobs
- Dataset labeling or enrichment
- Scheduled summarization, extraction, or classification tasks
- Experiments that can tolerate queueing in exchange for lower cost or better hardware utilization
Why heterogeneous GPUs help
Not every workload needs the newest, largest GPU. Some jobs can run efficiently on smaller or less utilized accelerators. Tandemn is designed to make that resource selection part of the orchestration layer instead of a manual decision for every user.

What users provide
Users typically provide:
- A model identifier
- A JSONL input file
- A service-level objective
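The three inputs above can be sketched in code. This is a minimal illustration, not Tandemn's actual API: the field names (`model`, `input_file`, `slo`) and the deadline-style SLO are assumptions chosen for clarity, and the model identifier is just an example.

```python
import json

# One request per line, as expected in a JSONL input file.
requests = [
    {"id": "r1", "prompt": "Summarize the attached report."},
    {"id": "r2", "prompt": "Classify the sentiment of this review."},
]

with open("batch_input.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")

# Hypothetical job specification; field names are illustrative only.
job = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",  # example model identifier
    "input_file": "batch_input.jsonl",
    "slo": {"deadline_hours": 24},  # tolerate queueing up to a deadline
}

# Sanity-check that every line of the input file is well-formed JSON.
with open("batch_input.jsonl") as f:
    parsed = [json.loads(line) for line in f]
```

Keeping the SLO explicit in the job spec is what lets an orchestrator trade latency for cost: a loose deadline signals that the job can wait for cheaper or less utilized hardware.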

