- Aggregates inference jobs and routes them to leading open-source models deployed on GPUs across separate machines
- Deploys across heterogeneous GPU pools, harvesting idle capacity from mixed GPU instances
- Delivers comparable inference latency at significantly lower cost

