POST /vidu/ent/v2/reference2video
curl --request POST \
  --url https://api.powertokens.ai/vidu/ent/v2/reference2video \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "viduq3",
  "subjects": [
    {
      "name": "XiaoMing",
      "images": [
        "https://example.com/xiaoming.png"
      ]
    },
    {
      "name": "XiaoHong",
      "images": [
        "https://example.com/xiaohong.png"
      ]
    }
  ],
  "prompt": "@XiaoMing and @XiaoHong are cooking together, and both say they love hot pot.",
  "duration": 5
}
'
{
  "task_id": "your_task_id_here",
  "state": "created",
  "model": "viduq3",
  "prompt": "@1 and @2 are cooking together, and both say they love hot pot.",
  "duration": 5,
  "seed": 123456789,
  "resolution": "1080p",
  "movement_amplitude": "auto",
  "payload": "",
  "off_peak": false,
  "credits": 15,
  "created_at": "2025-01-01T15:41:31.968916Z"
}

Authorizations

Authorization
string
header
required

Pass Authorization: Bearer <token> in the request header.
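
For readers who are not using curl, the same request can be assembled in Python. This is a minimal sketch using only the standard library; `build_request` is a hypothetical helper name, and the request is built but not sent.

```python
import json
import urllib.request

API_URL = "https://api.powertokens.ai/vidu/ent/v2/reference2video"


def build_request(token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for reference2video."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Same body as the curl example on this page.
body = {
    "model": "viduq3",
    "subjects": [
        {"name": "XiaoMing", "images": ["https://example.com/xiaoming.png"]},
        {"name": "XiaoHong", "images": ["https://example.com/xiaohong.png"]},
    ],
    "prompt": "@XiaoMing and @XiaoHong are cooking together, "
              "and both say they love hot pot.",
    "duration": 5,
}
request = build_request("<token>", body)
```

Passing `request` to `urllib.request.urlopen` would submit the task; the sketch stops short of that so it can be run without a valid token.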

Body

application/json
model
enum<string>
required

Video generation model name.

  • viduq3: Supports intelligent camera switching and simultaneous audio and video output, with superior consistency across multiple camera positions
Available options:
viduq3
subjects
object[]
required

List of subjects.

  • With q3, q2, q1, and 2.0 series models, only image and text subjects are supported, up to a maximum of 7 subjects
  • With the q2-pro model, video, text, and image subjects are supported: no more than 4 image or text subjects, and no more than 2 video subjects (or 1 temporary video subject)
  • viduq3-mix does not currently support subjects
Required array length: 1 - 7 elements
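
The array-length limit can be checked before submitting. A small sketch, assuming image subjects shaped like the curl example (`name` plus `images`); `make_subjects` is a hypothetical helper, not part of the API.

```python
MAX_SUBJECTS = 7  # from "Required array length: 1 - 7 elements"


def make_subjects(named_images: dict[str, list[str]]) -> list[dict]:
    """Turn {name: [image_url, ...]} into a `subjects` array."""
    if not 1 <= len(named_images) <= MAX_SUBJECTS:
        raise ValueError(f"subjects must contain 1 to {MAX_SUBJECTS} entries")
    return [
        {"name": name, "images": list(images)}
        for name, images in named_images.items()
    ]


subjects = make_subjects({
    "XiaoMing": ["https://example.com/xiaoming.png"],
    "XiaoHong": ["https://example.com/xiaohong.png"],
})
```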
prompt
string
required

Text prompt for describing video generation content, with a maximum length of 5000 characters. You can reference subject content using @subject_name. For example: @XiaoMing and @XiaoHong are cooking together, and both say they love hot pot.

Maximum string length: 5000
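
A client-side check can catch a prompt that is too long or that references a subject missing from `subjects`. The sketch below is illustrative only: `check_prompt` is a hypothetical helper, and the pattern assumes a reference is `@` followed by letters, digits, or underscores, which this page does not specify.

```python
import re

MAX_PROMPT_LENGTH = 5000


def check_prompt(prompt: str, subjects: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means nothing was found."""
    problems = []
    if len(prompt) > MAX_PROMPT_LENGTH:
        problems.append(f"prompt exceeds {MAX_PROMPT_LENGTH} characters")
    known = {subject["name"] for subject in subjects}
    # Assumption: "@name" uses only word characters (letters, digits, "_").
    for name in re.findall(r"@(\w+)", prompt):
        if name not in known:
            problems.append(f"prompt references unknown subject @{name}")
    return problems


prompt = ("@XiaoMing and @XiaoHong are cooking together, "
          "and both say they love hot pot.")
problems = check_prompt(prompt, [{"name": "XiaoMing"}, {"name": "XiaoHong"}])
```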
auto_subjects
boolean
default:false

Whether to use the intelligent subject library capability.

  • true: Use the intelligent subject library
  • false: Do not use the intelligent subject library
voice_id
string

Voice ID that determines the voice used in the video. If omitted, the system automatically recommends a suitable voice. You can also clone any voice through the Voice Cloning API; voice IDs from either source can be used interchangeably in this parameter.

audio
boolean
default:false

Whether to enable audio-video output capability.

  • false: Outputs silent video
  • true: Outputs video with generated speech and background music based on the prompt
Note: audio defaults to true for viduq3 and viduq3-turbo, and false for other models.
audio_type
enum<string>
default:all

Audio type, required when audio is true.

  • all: Sound effects + vocals
  • speech_only: Vocals only
  • sound_effect_only: Sound effects only
Available options:
all,
speech_only,
sound_effect_only
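
The interaction between `audio` and `audio_type` can be sketched as follows. This is an assumption-laden illustration: `resolve_audio` is a hypothetical helper, and it follows the note that audio defaults to true for viduq3 and viduq3-turbo rather than the `default:false` shown in the field header.

```python
AUDIO_ON_BY_DEFAULT = {"viduq3", "viduq3-turbo"}
AUDIO_TYPES = {"all", "speech_only", "sound_effect_only"}


def resolve_audio(model, audio=None, audio_type=None) -> dict:
    """Return the audio-related fields to put in the request body."""
    if audio is None:
        # Per the note above: on by default only for these two models.
        audio = model in AUDIO_ON_BY_DEFAULT
    if not audio:
        return {"audio": False}
    audio_type = audio_type or "all"  # documented default
    if audio_type not in AUDIO_TYPES:
        raise ValueError(f"unknown audio_type: {audio_type!r}")
    return {"audio": True, "audio_type": audio_type}


fields = resolve_audio("viduq3", audio_type="speech_only")
```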
duration
integer

Video duration in seconds. Default values vary by model:

  • viduq3, viduq3-turbo: default 5, available: 3-16
  • viduq2: default 5, available: 1-10
  • viduq1: default 5, available: 5
  • vidu2.0: default 4, available: 4
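
The per-model defaults and ranges above lend themselves to a lookup table. A minimal sketch, with `pick_duration` as a hypothetical helper and the ranges copied from the list above:

```python
# model -> (default seconds, allowed seconds)
DURATION_RULES = {
    "viduq3": (5, range(3, 17)),        # 3-16
    "viduq3-turbo": (5, range(3, 17)),  # 3-16
    "viduq2": (5, range(1, 11)),        # 1-10
    "viduq1": (5, range(5, 6)),         # 5 only
    "vidu2.0": (4, range(4, 5)),        # 4 only
}


def pick_duration(model: str, duration=None) -> int:
    """Return the duration to send, falling back to the model default."""
    default, allowed = DURATION_RULES[model]
    if duration is None:
        return default
    if duration not in allowed:
        raise ValueError(
            f"{model} supports {allowed.start}-{allowed.stop - 1}s, got {duration}"
        )
    return duration


seconds = pick_duration("viduq3", 16)
```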
seed
integer

Random seed.

  • Defaults to a random seed number
  • Manually set values will override the default random seed
aspect_ratio
enum<string>
default:16:9

The aspect ratio of the output video.

  • Default: 16:9
  • Accepted values: 16:9, 9:16, 1:1
Note: q2 and q3 models support any aspect ratio.
Available options:
16:9,
9:16,
1:1
resolution
enum<string>

Resolution (based on model & duration):

  • viduq3, viduq3-turbo (3-16s): default 720p, options: 540p, 720p, 1080p
  • viduq2 (1-10s): default 720p, options: 540p, 720p, 1080p
  • viduq1 (5s): default 1080p, options: 1080p
  • vidu2.0 (4s): default 360p, options: 360p, 720p
Available options:
360p,
540p,
720p,
1080p
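
Because the supported resolutions differ by model, it can help to look the table up in the other direction. The sketch below copies the values listed above; `models_supporting` is a hypothetical helper.

```python
# model -> (default resolution, available options)
RESOLUTION_RULES = {
    "viduq3": ("720p", ("540p", "720p", "1080p")),
    "viduq3-turbo": ("720p", ("540p", "720p", "1080p")),
    "viduq2": ("720p", ("540p", "720p", "1080p")),
    "viduq1": ("1080p", ("1080p",)),
    "vidu2.0": ("360p", ("360p", "720p")),
}


def models_supporting(resolution: str) -> list[str]:
    """List the models whose options include the given resolution."""
    return [
        model
        for model, (_default, options) in RESOLUTION_RULES.items()
        if resolution in options
    ]


low_res_models = models_supporting("360p")
```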
movement_amplitude
enum<string>
default:auto

The movement amplitude of objects in the frame.

  • Default: auto
  • Accepted values: auto, small, medium, large
Note: This parameter has no effect on q2 and q3 models.
Available options:
auto,
small,
medium,
large
payload
string

Pass-through parameter. The value is not processed, only transmitted, and has a maximum length of 1048576 characters.

Maximum string length: 1048576
off_peak
boolean
default:false

Off-peak mode.

  • true: Off-peak generation mode
  • false: Normal generation mode
Notes:
  1. Off-peak mode consumes fewer points (see Pricing for details)
  2. Tasks submitted in off-peak mode are generated within 48 hours. Tasks not completed within that time are automatically cancelled and their points refunded
  3. Cancelling off-peak tasks is also supported
  4. The q3 model supports off-peak tasks when the audio parameter is true; the q2, q1, and 2.0 series models support off-peak tasks when the audio parameter is false
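
Note 4 ties off-peak eligibility to the audio setting, which is easy to get backwards. A sketch of that rule, assuming "q3" maps to the model name viduq3 and "q2, q1, and 2.0" to viduq2, viduq1, and vidu2.0; `off_peak_allowed` is a hypothetical helper, and models not covered by the note are treated as ineligible.

```python
# Audio setting each model must have for an off-peak task (note 4).
# Assumption: the mapping from series names to model names shown here.
OFF_PEAK_REQUIRED_AUDIO = {
    "viduq3": True,
    "viduq2": False,
    "viduq1": False,
    "vidu2.0": False,
}


def off_peak_allowed(model: str, audio: bool) -> bool:
    """Return True if off_peak can be combined with this model and audio."""
    required = OFF_PEAK_REQUIRED_AUDIO.get(model)
    return required is not None and audio == required


eligible = off_peak_allowed("viduq3", audio=True)
```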
callback_url
string

Callback URL. To receive callbacks, set callback_url when creating the task. Whenever the video generation task changes status, Vidu sends a POST callback request to this URL containing the latest status of the task. The callback request body has the same structure as the response body of the GET Generation API. The "status" field in the callback takes the following values:

  • processing: Task is being processed.
  • success: Task is completed (if delivery fails, the callback is retried three times).
  • failed: Task failed (if delivery fails, the callback is retried three times).
Vidu signs callbacks with a callback signature algorithm for verification; for details, see Callback Signature.
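
A receiving service mostly needs to branch on the three status values. The sketch below assumes `status` is a top-level key of the callback body, which this page does not confirm, and it omits signature verification entirely (see Callback Signature). `handle_callback` and its return labels are hypothetical.

```python
import json


def handle_callback(raw_body: bytes) -> str:
    """Classify a callback body by its status field."""
    event = json.loads(raw_body)
    status = event.get("status")  # assumption: top-level "status" key
    if status == "processing":
        return "wait"    # task still running, nothing to do yet
    if status == "success":
        return "done"    # safe to fetch or store the result
    if status == "failed":
        return "error"   # surface the failure to the caller
    return "ignore"      # unknown status: do not act on it


action = handle_callback(b'{"status": "success"}')
```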

Response

Submission successful, returns a video task object.

task_id
string

Task ID.

state
enum<string>

Task processing state:

  • created: Task created successfully
  • queueing: Task in queue
  • processing: Task processing
  • success: Generation successful
  • failed: Task failed
Available options:
created,
queueing,
processing,
success,
failed
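
Of the five states, only success and failed are final, so a client that polls instead of using callbacks can stop on either. A minimal sketch: the endpoint for reading a task's state is not described on this page, so `fetch_state` is a caller-supplied function, and `wait_for_task` is a hypothetical helper.

```python
import time
from typing import Callable

TERMINAL_STATES = {"success", "failed"}


def wait_for_task(
    fetch_state: Callable[[], str],
    interval: float = 5.0,
    max_polls: int = 120,
) -> str:
    """Poll until the task reaches a terminal state, then return it."""
    for _ in range(max_polls):
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        time.sleep(interval)
    raise TimeoutError(f"task not finished after {max_polls} polls")


# Simulated state sequence standing in for real API calls.
states = iter(["created", "queueing", "processing", "success"])
final_state = wait_for_task(lambda: next(states), interval=0)
```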
model
string

Model parameter used for this call.

prompt
string

Text prompt used for this call.

images
string[]

Images used for this call.

duration
integer

Video duration parameter used for this call.

seed
integer

Random seed parameter used for this call.

aspect_ratio
string

Aspect ratio parameter used for this call.

resolution
string

Resolution parameter used for this call.

bgm
boolean

Whether to use background music for this call.

audio
boolean

Whether to use audio for this call.

audio_type
string

Audio type parameter used for this call.

movement_amplitude
string

Movement amplitude parameter used for this call.

payload
string

Payload parameter used for this call.

off_peak
boolean

Off-peak mode parameter used for this call.

credits
integer

Credits consumed for this call.

created_at
string<date-time>

Task creation time.
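
The task object can be parsed into a typed structure. The sketch below keeps only a few of the fields listed above; `VideoTask` and `parse_task` are hypothetical names. `credits` is coerced with `int()` so that either a JSON number or a numeric string is accepted.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class VideoTask:
    task_id: str
    state: str
    credits: int
    created_at: datetime


def parse_task(raw: str) -> VideoTask:
    """Parse the JSON task object returned on submission."""
    data = json.loads(raw)
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11,
    # so rewrite it as an explicit UTC offset.
    created = datetime.fromisoformat(data["created_at"].replace("Z", "+00:00"))
    return VideoTask(
        task_id=data["task_id"],
        state=data["state"],
        credits=int(data["credits"]),
        created_at=created,
    )


task = parse_task(
    '{"task_id": "your_task_id_here", "state": "created", '
    '"credits": 15, "created_at": "2025-01-01T15:41:31.968916Z"}'
)
```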