curl --request POST \
  --url https://api.powertokens.ai/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "deepseek-v3-2-251201",
  "messages": [
    {
      "role": "system",
      "content": "You are a concise and accurate assistant."
    },
    {
      "role": "user",
      "content": "Summarize the core RAG workflow in three sentences."
    }
  ],
  "thinking": {
    "type": "enabled"
  },
  "reasoning_effort": "medium",
  "temperature": 0.3,
  "max_completion_tokens": 1024,
  "stream": false
}
'

{
  "id": "chatcmpl_bp_123",
  "object": "chat.completion",
  "created": 1742342400,
  "model": "deepseek-v3-2-251201",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The core RAG workflow typically consists of three steps: retrieval, augmentation, and generation. The system first converts the user query into a vector and retrieves relevant knowledge snippets, then concatenates the retrieved results with the original query into the prompt. Finally, the model generates an answer based on the augmented context, reducing hallucinations and improving factual accuracy."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 98,
    "total_tokens": 221
  }
}

{
  "error": {
    "message": "messages is required",
    "type": "invalid_request_error",
    "param": "messages",
    "code": "bad_request_body"
  }
}

{
  "error": {
    "message": "Invalid token",
    "type": "api_error",
    "code": "access_denied",
    "param": ""
  }
}

{
  "error": {
    "message": "Insufficient account quota, please retry later",
    "type": "api_error",
    "code": "insufficient_user_quota",
    "param": ""
  }
}

{
  "error": {
    "message": "Internal processing failed, please retry later",
    "type": "api_error",
    "code": "gen_relay_info_failed",
    "param": ""
  }
}

BytePlus

deepseek-v3-2-251201 Chat Completions

Invokes the BytePlus chat completions capability.

Supported models include deepseek-v3-2-251201.

Documented fields: model, messages, thinking, stream, stream_options.include_usage, stream_options.chunk_include_usage, max_tokens, max_completion_tokens, stop, reasoning_effort, response_format, frequency_penalty, presence_penalty, temperature, top_p, logprobs, top_logprobs, tools, parallel_tool_calls, tool_choice. Assistant messages also support reasoning_content and tool_calls.

POST /v1/chat/completions

Authorizations

Authorization

string

header

required

Pass Authorization: Bearer <token> in the request header.
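
As a rough illustration, the two headers from the curl example could be assembled as follows. The environment variable name POWERTOKENS_API_KEY and the build_headers helper are assumptions made for this sketch; the API does not define them.

```python
import os


def build_headers(token):
    """Return the headers used by the curl example on this page."""
    if not token:
        raise ValueError("an API token is required")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }


# POWERTOKENS_API_KEY is a hypothetical variable name chosen for this sketch.
# Fall back to the page's "<token>" placeholder when it is unset or empty.
headers = build_headers(os.environ.get("POWERTOKENS_API_KEY") or "<token>")
print(sorted(headers))
```

Reading the token from the environment keeps it out of source control; any other secret store would work equally well.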

Body

application/json

BytePlus chat completions request body.

model

enum<string>

required

Model name. Supported BytePlus chat models include deepseek-v3-2-251201.

Available options:

deepseek-v3-2-251201

Example:

"deepseek-v3-2-251201"

messages

object[]

required

Message list. This document covers four roles: system, user, assistant, and tool.

Minimum array length: 1
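
The two constraints stated here (four roles, at least one message) can be checked before sending a request. This is a minimal sketch; validate_messages is a hypothetical helper, and the server may enforce further rules that this page does not list.

```python
ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def validate_messages(messages):
    """Check the role and minimum-length rules documented for messages."""
    if len(messages) < 1:
        raise ValueError("messages must contain at least one item")
    for index, message in enumerate(messages):
        role = message.get("role")
        if role not in ALLOWED_ROLES:
            raise ValueError(f"messages[{index}].role is not supported: {role!r}")
    return messages


# The message list from the request example on this page.
messages = validate_messages([
    {"role": "system", "content": "You are a concise and accurate assistant."},
    {"role": "user", "content": "Summarize the core RAG workflow in three sentences."},
])
print(len(messages))
```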

thinking

object

Controls whether the model enables deep thinking mode.

stream

boolean

default:true

Whether to enable streaming output. When true, the response content type is text/event-stream.
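
This page does not show the layout of individual stream events. The sketch below assumes the OpenAI-compatible convention: each event is a line of the form "data: <json>", the stream ends with "data: [DONE]", and text arrives in choices[].delta.content. All three are assumptions to verify against a real response.

```python
import json


def iter_sse_json(lines):
    """Yield the JSON payload of each 'data:' line until the assumed sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream marker
            break
        yield json.loads(data)


def collect_text(lines):
    """Concatenate the assumed delta.content fragments into one string."""
    parts = []
    for chunk in iter_sse_json(lines):
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            parts.append(delta.get("content") or "")
    return "".join(parts)


# Hand-written sample lines, not captured from the real API.
sample_lines = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print(collect_text(sample_lines))
```

Because stream defaults to true, a client that expects a single JSON body should set "stream": false explicitly, as the request example does.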

stream_options

object

Additional options for streaming responses. Only effective when stream=true.

max_tokens

integer

Maximum length of the model response (excluding chain-of-thought), in tokens.

max_completion_tokens

integer

Maximum total output length (including response and chain-of-thought), in tokens. When set, max_tokens is ignored.
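
The precedence between the two limits can be mirrored client-side. This is a small sketch of the rule as described; effective_output_limit is a hypothetical helper that reports which field the server is expected to honor.

```python
def effective_output_limit(body):
    """Return (field_name, value) for the limit expected to apply, or None."""
    if body.get("max_completion_tokens") is not None:
        # Per the description, max_tokens is ignored once this field is set.
        return ("max_completion_tokens", body["max_completion_tokens"])
    if body.get("max_tokens") is not None:
        return ("max_tokens", body["max_tokens"])
    return None


print(effective_output_limit({"max_tokens": 512, "max_completion_tokens": 1024}))
print(effective_output_limit({"max_tokens": 512}))
```

Note that the two limits count different things: max_tokens excludes chain-of-thought, while max_completion_tokens includes it, so the same number allows less visible output under max_completion_tokens when thinking is enabled.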

stop

Stop sequence. Can be a single string or an array of strings.

reasoning_effort

enum<string>

Limits how much reasoning the model performs. minimal is fastest; high is deepest.

Available options: minimal, low, medium, high

response_format

object

Controls the response format (Beta).

frequency_penalty

number

default:0

Frequency penalty. Higher values suppress repetitive expressions.

Required range: -2 <= x <= 2

presence_penalty

number

default:0

Presence penalty. Higher values encourage the model to introduce new topics.

Required range: -2 <= x <= 2

temperature

number

default:0.7

Sampling temperature. Lower values produce more deterministic output; higher values produce more diverse output.

Required range: 0 <= x <= 2

top_p

number

default:0.95

Nucleus sampling parameter. Typically tuned as an alternative to temperature.

Required range: 0 <= x <= 1
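
The ranges documented for the four sampling parameters can be checked before a request is sent. This is a minimal sketch; check_sampling is a hypothetical helper, and the bounds are copied from the field descriptions above.

```python
# Inclusive (low, high) bounds taken from the "Required range" lines.
RANGES = {
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
}


def check_sampling(params):
    """Return a list of human-readable problems; empty when all values fit."""
    problems = []
    for name, (low, high) in RANGES.items():
        if name not in params:
            continue  # omitted fields fall back to the server defaults
        value = params[name]
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside [{low}, {high}]")
    return problems


print(check_sampling({"temperature": 0.3}))
print(check_sampling({"top_p": 1.5}))
```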

logprobs

boolean

Whether to return log probabilities of output tokens.

top_logprobs

integer

When logprobs=true, specifies how many candidate token log probabilities to return at each position.

Required range: 0 <= x <= 20
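
Since top_logprobs is described as applying when logprobs=true, the two fields are easiest to set together. A small sketch, with logprob_options as a hypothetical helper:

```python
def logprob_options(top_logprobs):
    """Return request fields that enable log probabilities for N candidates."""
    if not 0 <= top_logprobs <= 20:
        raise ValueError("top_logprobs must be between 0 and 20")
    # Set logprobs explicitly, since top_logprobs is documented for logprobs=true.
    return {"logprobs": True, "top_logprobs": top_logprobs}


body = {"model": "deepseek-v3-2-251201", **logprob_options(5)}
print(body)
```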

tools

object[]

List of function tool definitions available for the model to call.

parallel_tool_calls

boolean

Whether to allow the model to issue multiple tool calls in parallel.

tool_choice

Tool calling strategy. Can be a string mode or an object specifying a particular function.

Available options: none, auto, required
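
The child attributes of tools are not shown on this page, so the sketch below assumes the common OpenAI-style shape: a "function" entry with a name, a description, and JSON Schema parameters. The get_weather tool and the function_tool helper are hypothetical. Only the three string modes of tool_choice are taken from this page.

```python
import json


def function_tool(name, description, parameters):
    """Build one tool definition in the assumed OpenAI-style layout."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,
        },
    }


# Hypothetical tool used only for illustration.
weather_tool = function_tool(
    "get_weather",
    "Look up the current weather for a city.",
    {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
)

body = {
    "model": "deepseek-v3-2-251201",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [weather_tool],
    "parallel_tool_calls": False,  # allow at most one tool call per turn
    "tool_choice": "auto",  # one of the documented modes: none, auto, required
}
print(json.dumps(body, indent=2))
```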

Response

Success. Returns JSON in non-streaming mode; returns an SSE event stream in streaming mode.

BytePlus chat completions success response.

id

string

Response ID.

object

string

Object type, e.g. chat.completion.

created

integer

Unix timestamp.

model

string

Actual model name used.

choices

object[]

usage

object

Token usage statistics.
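
A non-streaming success body can be reduced to the fields most callers need. The sketch below uses the response example from this page with its content shortened. The summarize helper is hypothetical, and the check that total_tokens equals prompt_tokens plus completion_tokens reflects what the example shows, not a documented guarantee.

```python
def summarize(response):
    """Pull the first choice and the usage counters out of a success body."""
    choice = response["choices"][0]
    usage = response["usage"]
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": usage["total_tokens"],
        # Holds for the example on this page; treat it as a sanity check only.
        "usage_consistent": usage["prompt_tokens"] + usage["completion_tokens"]
        == usage["total_tokens"],
    }


# Response example from this page, with the content string shortened.
sample = {
    "id": "chatcmpl_bp_123",
    "object": "chat.completion",
    "created": 1742342400,
    "model": "deepseek-v3-2-251201",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "The core RAG workflow ..."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 123, "completion_tokens": 98, "total_tokens": 221},
}
print(summarize(sample))
```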
