Creates a chat completion for the provided messages using the specified model. This endpoint is compatible with OpenAI’s chat completion API format and supports streaming responses via Server-Sent Events (SSE).
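For orientation, a minimal streaming request might look like the sketch below. The base URL, the /v1/chat/completions route, and the absence of authentication are assumptions, not part of this reference; adjust them to match how your server is deployed. Parameter values mirror the examples in this section.

import requests

# Assumed base URL and route for the OpenAI-compatible endpoint; adjust for your deployment.
URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "llama-3.2-1b",   # must match a model deployed via the /deploy endpoint
    "stream": True,            # request Server-Sent Events instead of a single JSON body
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how are you?"},
    ],
    "max_tokens": 100,
    "temperature": 0.7,
}

# stream=True keeps the connection open so chunks can be read as they arrive
with requests.post(URL, json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if line:                # SSE frames are separated by blank lines
            print(line)         # each non-empty line is "data: {...}" or "data: [DONE]"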
The name of the deployed model to use for the chat completion. Must match a model that has been deployed via the /deploy endpoint.
"llama-3.2-1b"
Whether to stream the response using Server-Sent Events (SSE). When true, returns partial message deltas as they are generated.
true
The messages to generate a chat completion for. Supports roles: "system", "user", "assistant", "tool" (developer role is converted to system).
[
  {
    "role": "system",
    "content": "You are a helpful assistant."
  },
  {
    "role": "user",
    "content": "Hello, how are you?"
  }
]
The maximum number of tokens to generate in the completion. Maps to vLLM's max_tokens parameter.
x >= 1
100
Sampling temperature to use, between 0 and 2. Higher values make the output more random, lower values more deterministic.
0 <= x <= 2
0.7
Nucleus sampling parameter. The model considers tokens with top_p probability mass. E.g., 0.1 means only tokens comprising the top 10% probability mass are considered.
0 <= x <= 1
0.95
The number of highest probability vocabulary tokens to keep for top-k filtering.
x >= 1
50
Minimum probability for a token to be considered, relative to the most likely token.
0 <= x <= 1
0.05
The minimum number of tokens to generate before stopping.
x >= 0
10
Random seed for reproducible sampling.
42
Penalizes tokens based on their frequency in the generated text so far. Positive values decrease likelihood of repetition.
-2 <= x <= 2
0.5
Penalizes tokens that have already appeared in the generated text. Values > 1.0 discourage repetition, < 1.0 encourage it.
x >= 0
1.1
Penalizes tokens based on whether they appear in the text so far. Positive values encourage the model to talk about new topics.
-2 <= x <= 2
0.3
Number of chat completion choices to generate. Currently hardcoded to 1.
1
List of token IDs that should trigger the end of generation. Maps to vLLM's stop_token_ids parameter.
[2, 50256]
List of strings that should trigger the end of generation when encountered.
["\n", "###"]Successful response - returns a stream of chat completion chunks
Server-Sent Event stream format for chat completion chunks. Each chunk is prefixed with "data: " and followed by two newlines. The stream ends with "data: [DONE]".
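Concretely, a stream of two chunks followed by the terminator looks roughly like this on the wire (choice bodies abbreviated; field values mirror the examples below):

data: {"id": "550e8400-e29b-41d4-a716-446655440000", "object": "chat.completion.chunk", "created": 1234567890.123, "model": "llama-3.2-1b", "choices": [...]}

data: {"id": "550e8400-e29b-41d4-a716-446655440000", "object": "chat.completion.chunk", "created": 1234567890.123, "model": "llama-3.2-1b", "choices": [...]}

data: [DONE]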
Unique identifier for this chat completion
"550e8400-e29b-41d4-a716-446655440000"
Object type, always "chat.completion.chunk" for streaming
chat.completion.chunk
"chat.completion.chunk"
Unix timestamp when this chunk was created
1234567890.123
The model used for this completion
"llama-3.2-1b"
Array of completion choices (currently always 1 choice)
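As a sketch of how a client might consume the stream, assuming each choice follows the OpenAI streaming layout with a "delta" object holding the incremental "content" string:

import json

def iter_content(lines):
    """Yield incremental text from an iterator of SSE lines.

    Assumes the OpenAI chat.completion.chunk layout, where each choice
    carries a "delta" object with an optional "content" string.
    """
    for line in lines:
        if not line or not line.startswith("data: "):
            continue                        # skip blank separator lines
        data = line[len("data: "):]
        if data == "[DONE]":                # explicit end-of-stream marker
            return
        chunk = json.loads(data)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            yield delta["content"]

# Usage with the streaming request shown earlier:
#   text = "".join(iter_content(resp.iter_lines(decode_unicode=True)))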