Image-to-video (图生视频) prompt builder: first frame, desired motion, camera movement, amplitude. For Jimeng, Kling, Hailuo. In your browser.
Image-to-Video Prompt Builder
Start from a still image and assemble a clean, structured image-to-video (图生视频) prompt from a simple form — what is in the first frame, the motion you want, camera movement, motion strength, duration and style consistency — then copy it, with your image, straight into Jimeng (即梦), Kling (可灵) or Hailuo (海螺). Everything is built in your browser; nothing is sent to a server and no model is called.
Tip: this builder only assembles text. Copy the result — with your first-frame image — into 即梦 / 可灵 / 海螺 yourself. No model is called and nothing is sent anywhere.
How the image-to-video prompt builder works
Describe the first frame
In the first box, briefly describe what is in your still image: who the subject is, the scene, the lighting and the composition. This line helps the video model understand the starting point and align with the first frame you upload. Keep it brief — save the detail for the motion you want.
Describe the desired motion, not the static frame
The key to image-to-video is describing the action and change, not re-describing the picture. Fill in the subject's motion ("slowly turns and smiles", "hair drifting in a light breeze") and what should change on screen. The model already sees your image — your job is to tell it how to bring it to life.
Set camera movement, strength and duration
Specify the camera movement (push in, pull out, pan, tracking, orbit or a locked-off shot), the motion strength and speed (subtle, moderate, pronounced), and the clip duration. Keep the strength moderate — too much, too fast tends to warp or break the subject, the single most common image-to-video failure.
Copy into Jimeng / Kling / Hailuo
Click Copy and paste the assembled prompt — together with your first-frame image — into the image-to-video entry of Jimeng (即梦), Kling (可灵), Hailuo (海螺) or another video generator. Everything is assembled locally in your browser; your text is never uploaded and no model is called.
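The four steps above boil down to joining labelled fields and skipping the empty ones. The snippet below is a minimal sketch of that idea, not the tool's actual source: the field keys, the labels and the `buildPrompt` name are all assumptions made for illustration, since the real template is not shown on this page.

```javascript
// Hypothetical field keys and labels; the tool's real template may differ.
const FIELD_LABELS = [
  ["firstFrame", "First frame"],
  ["motion", "Subject motion"],
  ["camera", "Camera movement"],
  ["strength", "Motion strength"],
  ["duration", "Duration"],
  ["consistency", "Consistency"],
  ["negative", "Avoid"],
];

// Join the filled-in fields into one plain-text prompt, one field per line.
// Missing or whitespace-only fields are dropped.
function buildPrompt(fields) {
  return FIELD_LABELS
    .map(([key, label]) => [label, String(fields[key] ?? "").trim()])
    .filter(([, value]) => value !== "")
    .map(([label, value]) => `${label}: ${value}`)
    .join("\n");
}

const examplePrompt = buildPrompt({
  firstFrame: "portrait of a woman by a window, soft daylight",
  motion: "slowly turns her head and smiles, hair drifting in a light breeze",
  camera: "locked-off shot",
  strength: "moderate",
  duration: "", // left empty, so it is omitted from the output
  negative: "warping, distortion, flicker, extra limbs",
});
console.log(examplePrompt);
```

Because the result is only a string, nothing in this sketch needs a network call; copying it to the clipboard and pasting it next to the uploaded image is left to the user.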
Describe the motion you want — not the picture the model already sees
Image-to-video — 图生视频 — is a different craft from writing a still-image prompt. You already have a picture you like, and your job is to bring it to life without breaking what makes it good. The crucial mental shift is this: the model can already see your first frame, so re-describing the static composition in detail is mostly wasted effort. What it cannot see is the future — how the subject should move, where the camera should travel, how fast, and what must stay the same. This builder keeps that structure for you. You give a brief line on what is in the frame, then spend your words where they actually matter: the desired subject action, the camera movement, the motion strength, the duration, the consistency requirements, and a short negative list of artefacts to avoid. The result is the kind of prompt an experienced motion artist would assemble by hand, only built in seconds and portable across Jimeng, Kling and Hailuo.
The single most valuable instruction is the desired motion. "Slowly turns her head and smiles, hair drifting in a light breeze" tells the model exactly what to animate, far more reliably than a paragraph of mood adjectives. After the motion comes the camera: push in, pull out, pan, orbit, or a locked-off shot. Naming the camera move explicitly is far more controllable than leaving it to chance, and a locked-off camera paired with gentle subject motion is one of the most reliable starting points. Duration then sets the pacing — most clips run only a few seconds, so a single clear action usually beats trying to choreograph three. Treat each field as concrete and specific: instead of "make it dynamic", say "subtle, natural breathing motion; keep the face and hands stable".
"In image-to-video, the model already has the picture. Your prompt is not a description — it is a set of directions for how it should move."
Moderate amplitude and a steady camera are what keep the subject intact
The field that quietly saves the most footage is motion strength. Image-to-video models are sensitive to dramatic movement, and the moment you push the amplitude too high, faces stretch, hands grow extra fingers, and objects warp. Keeping the strength moderate — and reserving the bigger amplitudes for things that move naturally, like wind, water and smoke — is the cheapest way to avoid the most common failures. Pair that with a short negative prompt ("warping, distortion, flicker, extra limbs") and a clear consistency request ("keep the character's face, clothing and the first frame's lighting"), and the subject is far more likely to survive the clip looking like itself. None of this limits the model; it focuses it on a motion it can actually render cleanly.
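One way to apply the advice above mechanically is to cap the requested strength unless the subject is something that moves naturally. This is a rough sketch under stated assumptions: the three level names come from the form, but the `capStrength` helper and the subject categories are hypothetical and not part of the tool.

```javascript
// Strength levels as named in the form, ordered from weakest to strongest.
const LEVELS = ["subtle", "moderate", "pronounced"];

// Assumed category labels: the text names wind, water and smoke as
// things that move naturally and tolerate bigger amplitudes.
const NATURALLY_MOVING = new Set(["wind", "water", "smoke"]);

// Return the requested strength, limited to "moderate" for subjects
// that are not naturally moving. Unknown levels fall back to "moderate".
function capStrength(requested, subjectKind) {
  const ceiling = NATURALLY_MOVING.has(subjectKind) ? "pronounced" : "moderate";
  const requestedIndex = LEVELS.indexOf(requested);
  if (requestedIndex === -1) return "moderate";
  return LEVELS[Math.min(requestedIndex, LEVELS.indexOf(ceiling))];
}

console.log(capStrength("pronounced", "portrait")); // moderate
console.log(capStrength("pronounced", "water"));    // pronounced
```

The cap is deliberately conservative: it never raises a strength, it only lowers one that is likely to distort faces, hands or object structure.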
Because the output is structured plain text, the same prompt is portable across Jimeng, Kling and Hailuo, and the same principles carry to other image-to-video tools too. Write it in Chinese when you are working with Chinese models; the structure travels regardless of language. And because the whole builder runs locally in your browser, you can iterate freely: tweak the motion line, copy again, regenerate in your video tool, and compare. The builder itself never sends your text anywhere and never calls a model. Treat the first clip as a draft. If the subject drifts, dial the strength down; if it feels static, nudge one specific action up; if the camera fights the subject, lock the camera and let the subject move alone. Two or three rounds of that disciplined adjustment usually turn a warping, jittery first attempt into a clean, believable shot, and you keep a tidy, reusable prompt at the end.
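The three adjustment rules in the paragraph above can be written down as a small lookup, which may help when keeping notes between rounds. This is an illustrative sketch only: the `nextDraft` function, the problem labels and the shape of the `spec` object are assumptions, not something the tool provides.

```javascript
const STRENGTH_LEVELS = ["subtle", "moderate", "pronounced"];

// Return a revised copy of the prompt fields for the next round.
// problem is one of the hypothetical labels "drift", "static" or
// "camera-conflict"; any other value returns an unchanged copy.
function nextDraft(spec, problem, revisedMotion) {
  const next = { ...spec }; // never mutate the previous draft
  if (problem === "drift") {
    // Subject drifts: dial the strength down one level if possible.
    const i = STRENGTH_LEVELS.indexOf(spec.strength);
    if (i > 0) next.strength = STRENGTH_LEVELS[i - 1];
  } else if (problem === "static") {
    // Feels static: swap in one more specific action supplied by the caller.
    if (revisedMotion) next.motion = revisedMotion;
  } else if (problem === "camera-conflict") {
    // Camera fights the subject: lock the camera, let the subject move alone.
    next.camera = "locked-off shot";
  }
  return next;
}

const firstDraft = { motion: "turns and smiles", camera: "orbit", strength: "moderate" };
console.log(nextDraft(firstDraft, "drift"));
console.log(nextDraft(firstDraft, "camera-conflict"));
```

Each call changes one thing at a time, which matches the advice to adjust a single variable per round and compare the results.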
About Image-to-Video Prompting — 10 Key Points
Image-to-video (图生视频) animates a still image into a clip — so the heart of the prompt is describing the change and motion you want, not re-describing the picture.
The model already sees the first frame you upload, so over-describing the static composition adds little; spending words on motion, camera and change pays off more.
Keep motion strength moderate: too much, too fast is the most common source of image-to-video breakage, warping faces, hands and object structure.
Camera movement (push, pull, pan, tracking, orbit, locked-off) is its own description axis — stating it explicitly is more controllable than leaving it to the model.
One clear subject action ("slowly turns her head", "blinks and smiles") is usually more effective than a long string of vague adjectives.
Setting a duration helps the model pace the action; most image-to-video clips run a few seconds, so do not cram too much motion in.
Style / consistency requests (keep the first frame's look, lighting and character appearance) reduce the subject "changing face" or drifting mid-motion.
The same structured approach works across Jimeng, Kling and Hailuo, because a prompt is just well-structured plain text.
A negative prompt (elements or artefacts to avoid — warping, flicker, extra limbs) helps you dodge common image-to-video failures.
This tool assembles the prompt entirely in your browser: it never handles your image, your text is never uploaded, no model is called, and nothing is stored.
Frequently Asked Questions
- The builder does not generate video. It simply joins the fields you fill in into a structured image-to-video prompt using a fixed template, entirely in your browser. It does not call Jimeng, Kling, Hailuo or any video model, and it makes no network requests. You copy the generated prompt, together with your first-frame image, into the tool of your choice.
- The big difference is that in image-to-video the model already sees your first frame, so you do not re-describe the static picture. Instead you focus on the change and motion you want — subject action, camera movement, motion strength and consistency. Text-to-video, by contrast, has to describe the whole scene from scratch.
- The most common cause of a warped or broken subject is motion strength that is too large or too fast. Image-to-video is sensitive to dramatic movement, and big motion easily distorts faces, hands and object structure. Keep the action restrained and moderate, and add "warping, distortion, extra limbs" to the negative prompt.
- The prompt works with Jimeng (即梦), Kling (可灵) and Hailuo (海螺), as well as other image-to-video tools. Because the output is structured plain text, it is vendor-neutral: paste it into the image-to-video input box of the tool you use.
- You do not need to fill in every field. Empty fields are omitted automatically. The desired subject motion alone gives you a usable prompt; adding camera movement, motion strength and consistency requirements makes the result more controllable and stable.
- Use ordinary camera language: push in, pull out, pan left/right, tracking, orbit, or a locked-off shot. Stating the camera move explicitly is more controllable than leaving it to the model; when unsure, a locked-off shot with subtle subject motion is usually the safest.
- This tool never touches your image and uploads no text — all assembly happens locally in your browser with plain JavaScript. Your image is only handled by the video tool itself, at the moment you submit the prompt and image to it yourself.
- "Moderate" is the safest in most cases. Subtle strength suits portraits and products that must stay consistent; pronounced strength suits things that naturally move — wind, water, smoke. The bigger the motion, the more likely it breaks, so err on the conservative side and iterate.
- You can move the subject and the camera in the same clip, but build up gradually. Get either the subject or the camera moving first, confirm it is stable, then layer the other in. Asking for a large camera move and large subject motion at once is the combination most likely to cause breakage.
- The tool is completely free, with no account or sign-up and no usage limit. It runs in your browser and collects no data.