POST /v1/chat/completions
curl --request POST \
  --url https://api.powertokens.ai/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "glm-4.7-flash",
  "messages": [
    {
      "role": "system",
      "content": "You are a concise and professional assistant."
    },
    {
      "role": "user",
      "content": "Describe vector databases in three sentences."
    }
  ],
  "temperature": 0.7,
  "max_tokens": 1024
}
'
{
  "id": "chatcmpl_zhipu_123",
  "object": "chat.completion",
  "created": 1775174400,
  "model": "glm-5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "A vector database is a data system designed to store and retrieve vector representations, commonly used for semantic search, recommendations, and RAG.",
        "reasoning_content": "Start with the definition, then add typical use cases."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 31,
    "total_tokens": 73,
    "prompt_tokens_details": {
      "cached_tokens": 0
    }
  }
}

Authorizations

Authorization
string
header
required

Pass Authorization: Bearer <token> in the request header.
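
For readers calling the endpoint from Python rather than curl, the following is a minimal sketch using only the standard library. It builds the request with the Bearer header described above but does not send it. The helper name `build_chat_request` is hypothetical, and `stream` is set to `false` explicitly on the assumption that a single JSON body is wanted (the documented default is `true`).

```python
import json
import urllib.request

API_URL = "https://api.powertokens.ai/v1/chat/completions"


def build_chat_request(token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the Bearer token."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = {
    "model": "glm-4.7-flash",
    "messages": [
        {"role": "user", "content": "Describe vector databases in three sentences."}
    ],
    "stream": False,  # ask for one JSON body instead of an SSE stream
}
request = build_chat_request("<token>", payload)
# To send it: urllib.request.urlopen(request), then json.load() the response.
```

Replace `<token>` with a real API token before sending.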

Body

application/json

Zhipu Chat Completions request body.

model
enum<string>
required

Model name. Supported model: glm-4.7-flash.

Available options:
glm-4.7-flash
Example:

"glm-4.7-flash"

messages
object[]
required

Message list. Supports plain-text messages and image input via image_url content parts. At least one message must have a role other than system or assistant; a request that contains only system or only assistant messages is rejected by the upstream provider.

Minimum array length: 1
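
The two constraints on `messages` can be checked on the client before a request is sent. The sketch below is one possible pre-flight check; the function name `validate_messages` and the error wording are illustrative and not part of the API.

```python
# Roles that, on their own, do not satisfy the "at least one other message" rule.
NON_COUNTING_ROLES = {"system", "assistant"}


def validate_messages(messages: list) -> None:
    """Raise ValueError if the list would violate the documented constraints."""
    if not messages:
        raise ValueError("messages must contain at least one item")
    if all(m.get("role") in NON_COUNTING_ROLES for m in messages):
        raise ValueError(
            "messages needs at least one message that is not system or assistant"
        )


# Passes: the list contains a user message.
validate_messages([
    {"role": "system", "content": "You are a concise and professional assistant."},
    {"role": "user", "content": "Describe vector databases in three sentences."},
])
```

A list holding only a system message, or only assistant messages, raises `ValueError` locally instead of being rejected upstream.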
stream
boolean
default:true

Whether to enable streaming output. When true, the response is returned as text/event-stream.
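
Because `stream` defaults to `true`, a client that omits it should be prepared to read `text/event-stream`. The sketch below parses `data:` lines from such a stream. This page does not document the chunk format, so two things here are assumptions borrowed from the common OpenAI-style convention: each `data:` payload is a JSON object with `choices[0].delta.content`, and the stream ends with a `[DONE]` sentinel. The sample lines are made up for illustration.

```python
import json


def iter_sse_data(lines):
    """Yield the decoded JSON payload of each single-line `data:` event."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel
            break
        yield json.loads(data)


# Hypothetical stream contents, not captured from the real API.
sample_lines = [
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": " world"}}]}',
    "",
    "data: [DONE]",
]
text = "".join(
    chunk["choices"][0]["delta"].get("content", "")
    for chunk in iter_sse_data(sample_lines)
)
print(text)
```

With a live response, the decoded lines of the HTTP body would take the place of `sample_lines`. Events whose data spans several `data:` lines are not handled by this sketch.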

thinking
object

Thinking mode configuration. Applicable to models that support the thinking parameter.

temperature
number
default:1

Sampling temperature, range [0, 1].

Required range: 0 <= x <= 1

top_p
number
default:0.95

Nucleus sampling probability threshold, range [0.01, 1].

Required range: 0.01 <= x <= 1

max_tokens
integer

Maximum number of output tokens.

Required range: 1 <= x <= 131072

stop
string[]

Stop sequence list. Currently only a single stop sequence is supported.

Maximum array length: 1

tools
object[]

Tool definition list. Currently only function type tools are supported.

Maximum array length: 128

tool_choice
enum<string>
default:auto

Tool selection strategy. The Zhipu API currently exposes only auto.

Available options:
auto
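
The documented ranges and length limits for the optional body parameters can be gathered into one client-side check. The following is a sketch only: `validate_params` is a hypothetical helper, and the server remains the authority on what it accepts.

```python
def validate_params(body: dict) -> list:
    """Return a list of human-readable violations of the documented limits."""
    errors = []

    temperature = body.get("temperature")
    if temperature is not None and not 0 <= temperature <= 1:
        errors.append("temperature must be in [0, 1]")

    top_p = body.get("top_p")
    if top_p is not None and not 0.01 <= top_p <= 1:
        errors.append("top_p must be in [0.01, 1]")

    max_tokens = body.get("max_tokens")
    if max_tokens is not None and not (
        isinstance(max_tokens, int) and 1 <= max_tokens <= 131072
    ):
        errors.append("max_tokens must be an integer in [1, 131072]")

    if len(body.get("stop") or []) > 1:
        errors.append("stop supports a single stop sequence")

    if len(body.get("tools") or []) > 128:
        errors.append("tools supports at most 128 entries")

    if body.get("tool_choice", "auto") != "auto":
        errors.append("tool_choice only accepts 'auto'")

    return errors


print(validate_params({"temperature": 0.7, "max_tokens": 1024}))
print(validate_params({"temperature": 1.5, "stop": ["END", "STOP"]}))
```

The first call returns an empty list; the second reports the out-of-range temperature and the extra stop sequence.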

Response

Success. Non-streaming mode returns JSON; streaming mode returns an SSE event stream.

Non-streaming Chat Completions response.

id
string

Response ID.

object
string

Object type.

Example:

"chat.completion"

created
integer<int64>

Unix timestamp (seconds).

model
string

The model name actually used.

choices
object[]

Candidate results returned by the model.

usage
object

Token usage statistics.
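
As an illustration of reading these fields, the sketch below pulls the reply text and token counts out of a non-streaming response. The input is an abbreviated copy of the example response shown at the top of this page; `summarize_response` is a hypothetical helper. `reasoning_content` appears in that example but is not listed among the fields here, so it is read with `.get()` and treated as optional.

```python
import json

# Abbreviated copy of the example response from this page.
raw = """
{
  "id": "chatcmpl_zhipu_123",
  "object": "chat.completion",
  "created": 1775174400,
  "model": "glm-5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "A vector database is a data system designed to store and retrieve vector representations.",
        "reasoning_content": "Start with the definition, then add typical use cases."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 42, "completion_tokens": 31, "total_tokens": 73}
}
"""


def summarize_response(response: dict) -> dict:
    """Extract the first choice's text and the usage totals."""
    choice = response["choices"][0]
    message = choice["message"]
    return {
        "content": message["content"],
        "reasoning": message.get("reasoning_content"),  # optional field
        "finish_reason": choice["finish_reason"],
        "total_tokens": response["usage"]["total_tokens"],
    }


summary = summarize_response(json.loads(raw))
print(summary["finish_reason"], summary["total_tokens"])
```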