curl --request POST \
  --url https://api.powertokens.ai/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "glm-5-turbo",
  "messages": [
    {
      "role": "system",
      "content": "You are a concise and professional assistant."
    },
    {
      "role": "user",
      "content": "Explain vector databases in three sentences."
    }
  ],
  "temperature": 0.7,
  "max_tokens": 1024
}
'

{
  "id": "chatcmpl_zhipu_123",
  "object": "chat.completion",
  "created": 1775174400,
  "model": "glm-5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "A vector database is a system designed to store and retrieve vector embeddings, commonly used for semantic search, recommendations, and RAG.",
        "reasoning_content": "Define the concept first, then add the main use cases."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 31,
    "total_tokens": 73,
    "prompt_tokens_details": {
      "cached_tokens": 0
    }
  }
}

Unified public entry point for Zhipu GLM chat models.
Supported models include glm-5-turbo, glm-5, glm-4.7, glm-4.7-flash, and glm-4.5-air. The same endpoint supports both text-only chat and image understanding requests that include image_url content parts.
This document exposes the following fields: model, messages, stream, thinking, temperature, top_p, max_tokens, stop, tools, and tool_choice.
Send Authorization: Bearer <token> in the request headers.
Request body for Zhipu chat completions.
model: Model name. Allowed values: glm-5-turbo, glm-5, glm-4.7, glm-4.7-flash, and glm-4.5-air. Example: "glm-5-turbo".
messages: Message list. Supports plain-text messages and image input through image_url content parts. The request must include at least one message whose role is neither system nor assistant; requests made up only of system messages or only of assistant messages are rejected upstream.
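The message rules above can be checked on the client before a request is sent. The following is a minimal sketch, not the service's own validation: the helper names are hypothetical, and the shape of the image_url content part (a list of typed parts with a nested "url" key) is an assumption modeled on the common OpenAI-style layout, since this page does not show the child attributes.

```python
from typing import Any


def text_message(role: str, text: str) -> dict[str, Any]:
    """Build a plain-text message, as in the request example."""
    return {"role": role, "content": text}


def image_message(text: str, image_url: str) -> dict[str, Any]:
    """Build a user message with an image_url content part.

    ASSUMPTION: the content-part layout below follows the OpenAI-style
    convention; it is not confirmed by this page.
    """
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


def check_messages(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Reject message lists that the upstream is documented to refuse."""
    if not messages:
        raise ValueError("messages must not be empty")
    # At least one message must have a role other than system or assistant.
    if all(m["role"] in ("system", "assistant") for m in messages):
        raise ValueError(
            "messages must include at least one message that is not "
            "system or assistant"
        )
    return messages
```

A list such as `[text_message("system", "..."), text_message("user", "...")]` passes the check, while a list holding only system messages raises `ValueError` before any network call is made.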
stream: Whether to enable streaming output. When true, the response content type is text/event-stream.
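When streaming is enabled, a client has to read the text/event-stream body line by line. The sketch below shows one way to do that in Python. It relies on assumptions this page does not confirm: that each event carries a JSON chunk on a `data:` line, that the stream ends with a `[DONE]` sentinel, and that incremental text arrives under `choices[].delta.content`, all of which follow the common OpenAI-style convention. The function names are hypothetical.

```python
import json
from typing import Any, Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Yield the decoded JSON payload of each SSE `data:` line."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # ASSUMPTION: OpenAI-style end sentinel
            break
        yield json.loads(payload)


def collect_text(chunks: Iterable[dict[str, Any]]) -> str:
    """Concatenate incremental content from streamed chunks.

    ASSUMPTION: chunks carry text under choices[].delta.content.
    """
    parts: list[str] = []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            parts.append(delta.get("content") or "")
    return "".join(parts)
```

In practice the `lines` argument could be fed from any HTTP client that exposes the response body as an iterator of decoded lines; the parser itself does not depend on a particular library.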
thinking: Thinking mode configuration. Applies to models that support the thinking parameter.
temperature: Sampling temperature, in the range [0, 1].
top_p: Nucleus sampling threshold, in the range [0.01, 1].
max_tokens: Maximum number of output tokens. Must be at least 1.
stop: List of stop sequences. Only a single stop sequence is currently supported.
tools: Tool definitions, up to 128 entries. Only the function tool shape is currently supported.
tool_choice: Tool selection strategy. This Zhipu interface currently exposes only auto.
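The documented limits can be enforced before sending a request. The following is a minimal sketch of such a check, assuming nothing about how the server itself validates input. `build_request` is a hypothetical helper, and the tool layout it expects (`{"type": "function", ...}`) is an assumption based on the statement that only the function tool shape is supported.

```python
from typing import Any, Optional

# Model names as listed on this page.
SUPPORTED_MODELS = {"glm-5-turbo", "glm-5", "glm-4.7", "glm-4.7-flash", "glm-4.5-air"}


def build_request(
    model: str,
    messages: list[dict[str, Any]],
    *,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    max_tokens: Optional[int] = None,
    stop: Optional[list[str]] = None,
    tools: Optional[list[dict[str, Any]]] = None,
) -> dict[str, Any]:
    """Assemble a request body, rejecting values outside the documented ranges."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model: {model}")
    body: dict[str, Any] = {"model": model, "messages": messages}
    if temperature is not None:
        if not 0 <= temperature <= 1:
            raise ValueError("temperature must be in [0, 1]")
        body["temperature"] = temperature
    if top_p is not None:
        if not 0.01 <= top_p <= 1:
            raise ValueError("top_p must be in [0.01, 1]")
        body["top_p"] = top_p
    if max_tokens is not None:
        if max_tokens < 1:
            raise ValueError("max_tokens must be at least 1")
        body["max_tokens"] = max_tokens
    if stop is not None:
        if len(stop) > 1:
            raise ValueError("only a single stop sequence is supported")
        body["stop"] = stop
    if tools is not None:
        # ASSUMPTION: each tool is tagged with "type": "function".
        if any(t.get("type") != "function" for t in tools):
            raise ValueError("only function tools are supported")
        body["tools"] = tools
        body["tool_choice"] = "auto"  # the only strategy this interface exposes
    return body
```

Calling `build_request("glm-5-turbo", messages, temperature=0.7, max_tokens=1024)` produces the same body as the curl example, while `temperature=1.5` fails locally with `ValueError`.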
Success. Non-streaming mode returns JSON, while streaming mode returns an SSE event stream.
Non-streaming chat completion response.
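A non-streaming response can be reduced to the few fields most callers need. The sketch below reads the fields that appear in the example response on this page (content, reasoning_content, finish_reason, and the usage counters). `ChatResult` and `parse_chat_response` are hypothetical names, and treating reasoning_content and prompt_tokens_details as optional is an assumption.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChatResult:
    content: str
    reasoning: Optional[str]
    finish_reason: str
    total_tokens: int
    cached_tokens: int


def parse_chat_response(body: str) -> ChatResult:
    """Extract the first choice and usage counters from a JSON response body."""
    data = json.loads(body)
    choice = data["choices"][0]
    message = choice["message"]
    usage = data.get("usage") or {}
    # ASSUMPTION: these two fields may be absent, so fall back gracefully.
    details = usage.get("prompt_tokens_details") or {}
    return ChatResult(
        content=message["content"],
        reasoning=message.get("reasoning_content"),
        finish_reason=choice["finish_reason"],
        total_tokens=usage.get("total_tokens", 0),
        cached_tokens=details.get("cached_tokens", 0),
    )
```

Applied to the example response, this yields a result with `finish_reason` equal to "stop", `total_tokens` equal to 73, and `cached_tokens` equal to 0.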