POST /vidu/ent/v2/img2video
curl --request POST \
  --url https://api.powertokens.ai/vidu/ent/v2/img2video \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "viduq3-pro",
  "images": [
    "https://example.com/image.png"
  ],
  "prompt": "Animate the scene in the image"
}
'
{
  "task_id": "your_task_id_here",
  "state": "created",
  "model": "viduq3-pro",
  "images": [
    "https://prod-ss-images.s3.cn-northwest-1.amazonaws.com.cn/vidu-maas/template/image2video.png"
  ],
  "prompt": "The astronaut waved and the camera moved up.",
  "duration": 5,
  "seed": 123456789,
  "resolution": "1080p",
  "movement_amplitude": "auto",
  "payload": "",
  "off_peak": false,
  "credits": 10,
  "created_at": "2025-01-01T15:41:31.968916Z"
}

Authorizations

Authorization
string
header
required

Pass Authorization: Bearer <token> in the request header.
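
For readers calling the endpoint from code rather than curl, here is a minimal Python sketch of the same request using only the standard library. The helper names `build_request` and `submit` are hypothetical, not part of the API; the URL, headers, and body fields are taken from the curl example above.

```python
import json
import urllib.request

API_URL = "https://api.powertokens.ai/vidu/ent/v2/img2video"


def build_request(token: str, image: str, prompt: str = "") -> urllib.request.Request:
    """Build (but do not send) the POST request that creates an img2video task."""
    body = {"model": "viduq3-pro", "images": [image]}  # exactly one image
    if prompt:
        body["prompt"] = prompt
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit(req: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded task object."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `submit(build_request(token, "https://example.com/image.png", "Animate the scene in the image"))` should return a task object like the one shown in the response example, assuming the token is valid.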

Body

application/json
model
enum<string>
required

Video generation model name.

  • viduq3-pro: Supports audio-video synchronization and storyboard video generation
Available options:
viduq3-pro
images
string[]
required

An image to be used as the start frame of the generated video.

  • Only accepts 1 image
  • Accepts public URL or Base64 format
  • Supported formats: png, jpeg, jpg, webp
  • The aspect ratio of the images must be between 1:4 and 4:1
  • Single image limited to 50MB
  • HTTP request body should not exceed 20MB
  • Base64 format example: data:image/png;base64,{base64_encode}
Required array length: 1 element
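
The constraints above can be checked on the client before sending a local image. The following is a sketch under stated assumptions: "MB" is read as 1024 × 1024 bytes, jpg and jpeg both map to the standard `image/jpeg` MIME type, and the helper names `to_data_uri` and `aspect_ratio_ok` are hypothetical.

```python
import base64

MAX_IMAGE_BYTES = 50 * 1024 * 1024  # single image limit (assumes MB = 1024 * 1024 bytes)
MAX_BODY_BYTES = 20 * 1024 * 1024   # HTTP request body limit (same assumption)

MIME_TYPES = {
    "png": "image/png",
    "jpeg": "image/jpeg",
    "jpg": "image/jpeg",
    "webp": "image/webp",
}


def to_data_uri(raw: bytes, ext: str) -> str:
    """Encode raw image bytes as data:image/<type>;base64,<payload>."""
    ext = ext.lower().lstrip(".")
    if ext not in MIME_TYPES:
        raise ValueError(f"unsupported image format: {ext}")
    if len(raw) > MAX_IMAGE_BYTES:
        raise ValueError("image exceeds the 50MB limit")
    uri = f"data:{MIME_TYPES[ext]};base64,{base64.b64encode(raw).decode('ascii')}"
    # Base64 inflates the payload by about a third, so the body limit is hit first.
    if len(uri) > MAX_BODY_BYTES:
        raise ValueError("encoded image would exceed the 20MB request body limit")
    return uri


def aspect_ratio_ok(width: int, height: int) -> bool:
    """True when width:height lies between 1:4 and 4:1 inclusive."""
    return 1 / 4 <= width / height <= 4
```

Because the Base64 form counts against the 20MB request body, a large image is usually better passed as a public URL.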
prompt
string

Text prompt describing the content of the generated video. Note: If the recommended prompt feature is enabled (is_rec=true), the model ignores the manually entered prompt parameter.

Maximum string length: 5000
voice_id
string

Voice ID, used to determine the voice character in the video. This parameter has no effect on Q3 series models. If omitted, the system automatically recommends a suitable voice; you can also clone any voice through the Voice Cloning API.

is_rec
boolean
default:false

Whether to use the recommended prompt.

  • true: Yes, the system will automatically generate and apply a recommended prompt. Number of recommended prompts = 1
  • false: No, the video will be generated based on the user-provided prompt

Note: When the recommended prompt feature is enabled, each task will consume an additional 10 credits.
bgm
boolean
default:false

Whether to add background music to the generated video. Has no effect on q3 models.

  • true: The system automatically adds suitable background music. There is no length limit; the system adapts the music automatically
  • false: Do not add background music

Note: BGM does not take effect when the duration of the q2 model is 9 or 10 seconds; BGM is not available in q3 models.
audio
boolean
default:false

Whether to use direct audio-video generation capability.

  • false: No direct audio-video generation, outputs silent video
  • true: Direct audio-video generation, outputs video with sound (including dialogue and sound effects)

Note 1: The voice_id parameter takes effect only when this parameter is true.
Note 2: When this parameter is true, staggered mode is not supported (q3 model supports staggered mode).
Note 3: When model is q3, the default value for this parameter is true.
audio_type
enum<string>
default:all

Audio type, required when audio is true.

  • all: Sound effects + vocals
  • speech_only: Vocals only
  • sound_effect_only: Sound effects only

Note: Currently, this parameter only supports audio splitting for q2, q1, and 2.0 series models.
Available options:
all,
speech_only,
sound_effect_only
duration
integer

Video duration in seconds. Default values vary by model:

  • viduq3-pro-fast, viduq3-pro, viduq3-turbo: default 5, available: 1-16
seed
integer

Random seed.

  • Defaults to a random seed number
  • Manually set values will override the default random seed
resolution
enum<string>

Resolution (based on model & duration):

  • viduq3-pro-fast (1-16s): default 720p, options: 720p, 1080p
  • viduq3-pro, viduq3-turbo (1-16s): default 720p, options: 540p, 720p, 1080p
Available options:
540p,
720p,
1080p
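
The duration and resolution rules listed above can be validated before a task is submitted, which avoids spending a request on a combination the API would reject. This is an illustrative sketch: the table is copied from the two parameter descriptions, and the function name `check_video_options` is hypothetical.

```python
# Allowed resolutions per model, as listed in the resolution description.
RESOLUTIONS = {
    "viduq3-pro-fast": ("720p", "1080p"),
    "viduq3-pro": ("540p", "720p", "1080p"),
    "viduq3-turbo": ("540p", "720p", "1080p"),
}
MIN_DURATION, MAX_DURATION = 1, 16  # seconds, for all three q3 models


def check_video_options(model: str, duration: int = 5, resolution: str = "720p") -> list[str]:
    """Return a list of problems; an empty list means the options look valid."""
    if model not in RESOLUTIONS:
        return [f"unknown model: {model}"]
    errors = []
    if not MIN_DURATION <= duration <= MAX_DURATION:
        errors.append(f"duration must be {MIN_DURATION}-{MAX_DURATION}s, got {duration}")
    if resolution not in RESOLUTIONS[model]:
        allowed = ", ".join(RESOLUTIONS[model])
        errors.append(f"{model} supports {allowed}, got {resolution}")
    return errors
```

The defaults in the signature mirror the documented defaults (5 seconds, 720p), so calling the function with only a model name checks the request the API would build when both fields are omitted.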
movement_amplitude
enum<string>
default:auto

The movement amplitude of objects in the frame.

  • Default: auto
  • Accepted values: auto, small, medium, large

Note: This parameter has no effect on q2 and q3 models.
Available options:
auto,
small,
medium,
large
payload
string

Pass-through parameter. The value is not processed, only transmitted with the task. Maximum length is 1048576 characters.

Maximum string length: 1048576
off_peak
boolean
default:false

Off-peak mode.

  • true: Off-peak generation mode
  • false: Normal generation mode

Off-peak mode consumes fewer points (see Pricing for details). Tasks submitted in off-peak mode will be generated within 48 hours. Tasks that are not completed will be automatically cancelled and their points refunded. Cancelling off-peak tasks is also supported.
callback_url
string

Callback URL. To receive callbacks, set callback_url in the POST request that creates the task. Whenever the video generation task changes status, Vidu sends a callback request to this URL containing the latest status of the task. The body of the callback request has the same structure as the response body of the Get Generation API.

The "status" in the callback response includes the following states:

  • processing: Task is being processed.
  • success: Task is completed (if sending fails, the callback is retried three times).
  • failed: Task failed (if sending fails, the callback is retried three times).
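
A callback receiver for the statuses above could look like the following sketch. Several details are assumptions, because the Get Generation response body is not reproduced on this page: the callback is assumed to arrive as a JSON POST, the task identifier is assumed to be in `task_id`, the status is read from `status` with a fallback to `state`, and a 200 response is assumed to count as successful delivery. `parse_callback` and `CallbackHandler` are hypothetical names.

```python
import json
from http.server import BaseHTTPRequestHandler

TERMINAL_STATUSES = {"success", "failed"}


def parse_callback(raw_body: bytes) -> dict:
    """Extract task id and status from a callback body (field names are assumed)."""
    event = json.loads(raw_body)
    if not isinstance(event, dict):
        raise ValueError("callback body is not a JSON object")
    status = event.get("status", event.get("state"))
    return {
        "task_id": event.get("task_id"),
        "status": status,
        "finished": status in TERMINAL_STATUSES,
    }


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            info = parse_callback(self.rfile.read(length))
        except ValueError:
            # Malformed body: reject it (invalid JSON also raises ValueError).
            self.send_response(400)
            self.end_headers()
            return
        print(f"task {info['task_id']}: {info['status']}")
        # Reply quickly; assumed to tell the sender the callback was delivered.
        self.send_response(200)
        self.end_headers()
```

To try it locally, the handler can be served with `HTTPServer(("", 8000), CallbackHandler).serve_forever()` from `http.server`; the port is arbitrary.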

Response

Submission successful, returns a video task object.

task_id
string

Task ID.

state
enum<string>

Task processing state:

  • created: Task created successfully
  • queueing: Task in queue
  • processing: Task processing
  • success: Generation successful
  • failed: Task failed
Available options:
created,
queueing,
processing,
success,
failed
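
Of the states above, only success and failed are terminal, so a client that polls instead of using callbacks can stop at either one. The sketch below takes the state lookup as an injected function because the Get Generation endpoint is not documented on this page; `wait_for_task`, `fetch_state`, and the default interval are hypothetical.

```python
import time
from typing import Callable

TERMINAL_STATES = {"success", "failed"}


def wait_for_task(
    fetch_state: Callable[[], str],
    interval: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_state() until it returns a terminal state, then return it."""
    for _ in range(max_polls):
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        sleep(interval)  # created, queueing, processing: keep waiting
    raise TimeoutError(f"task still not finished after {max_polls} polls")
```

Here `fetch_state` would wrap whatever call retrieves the task's current state. For off-peak tasks, which may take up to 48 hours, a callback is likely a better fit than polling.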
model
string

Model parameter used for this call.

prompt
string

Text prompt used for this call.

images
string[]

Images used for this call.

duration
integer

Video duration parameter used for this call.

audio
boolean

Whether to use audio for this call.

audio_type
string

Audio type parameter used for this call.

seed
integer

Random seed parameter used for this call.

resolution
string

Resolution parameter used for this call.

movement_amplitude
string

Movement amplitude parameter used for this call.

payload
string

Payload parameter used for this call.

off_peak
boolean

Off-peak mode parameter used for this call.

credits
integer

Credits consumed for this call.

created_at
string<date-time>

Task creation time.