Clip Long Videos into Captioned Vertical Shorts (2026)

Clipping a long video into shorts is grunt work. You scrub a timeline for the good moments, crop each one to vertical, type out the captions, and export, again and again, or you pay an editor to do it for you. It is exactly the kind of manual, repetitive marketing job you should not be doing by hand anymore. That is the whole reason Wonda exists: the things you used to do manually, you now hand to Claude Code.
You describe what you want in plain English, say, the ten most controversial moments from this podcast with captions on, and the agent reads Wonda's manual, runs the job, and drops the finished vertical clips in a folder. No dashboard, no timeline, no flags to learn. The commands in this post are what the agent runs for you, not steps you follow.
And it does the job the way a good editor would, by picking clips on what was actually said, not on where the camera cut. The moments that travel have a hook, a payoff, and one quotable line, and all of that lives in the transcript, not the camera track. A model reading the words finds them; a scene-cut tool cannot. It matters because the platforms rank on whether people finish a video, well above your follower count (TikTok Newsroom, 2024).
The payoff is volume you could never reach by hand. Marketers rate short-form the highest-ROI format they make (HubSpot, 2025), and YouTube alone now has more than a billion monthly podcast viewers (TechCrunch, 2025), hours of spoken source sitting un-clipped. An agent can work through all of it while you sleep. It is the same control-layer pattern as letting Claude Code run Wonda, pointed at video, and it is how you put TikTok on autopilot.
Here is exactly what comes out. One wonda clipping command turned an 18-minute press conference into this vertical, captioned short: the moment the England manager gets pressed on the world-class players left out of his squad. No timeline scrubbing, no manual captioning.
The same 30-second moment in both reframe modes: you ask for the one you want in plain English, and the agent runs it.
Key Takeaways
- One wonda clipping command cuts one long video (podcast, interview, webinar) into many vertical clips in a single job.
- It picks by what is actually said, not where the camera cut, and a plain-English --brief steers it toward the angle you want.
- Animated word-by-word captions are on by default, with three presets (white, black, red); default is TikTok Red Captions.
- Clipping is CLI-only and flag-gated. Claude Code and Hermes Agent shell out to the binary; there is no clipping tool in the MCP server.
TL;DR: wonda clipping takes one longform video and cuts it into many short vertical clips in a single job. It picks the moments by what is actually said, not where the camera cut, and a plain-English brief steers it toward the angle you want. Output reframes to 9:16 with face tracking, and word-by-word captions are on by default. You drive it from Claude Code in plain English: the agent runs the command and downloads the clips plus a plan.json that scores and explains each pick. Clipping is rolling out per account, so it may need enabling.
What actually makes a clip worth posting
A clip earns a slot on the feed when three things line up: it hooks in the first seconds, it pays that hook off before the viewer swipes, and it leaves them with one line worth repeating. Those are not stylistic preferences. Each maps to a measured retention behavior, and each lives in the language of the moment, not its frames.
Hook is the three-second gate. TikTok reports that 90% of ad recall lands in the first six seconds, and 63% of its highest-CTR videos hook within the first three (TikTok for Business, 2024; Stackmatix on TikTok creative data, 2024). Cross that gate and the compounding is steep: clips holding 85%+ of viewers past three seconds pull roughly 2.8x more total views (TTS Vibes, 2025). A scene-cut clipper has no idea whether the opening words are a hook or a throat-clear.
Payoff is the ranking signal itself. Watch-time is what the algorithm optimizes: TikTok weights finishing a video as a strong indicator, and YouTube Shorts ranks on average view duration and percent viewed (viewed versus swiped) (Socialinsider, 2025). The mechanism underneath is the curiosity gap. A hook opens a loop, and an open loop the viewer expects to close is what holds attention (Loewenstein's information-gap theory, plus the Zeigarnik effect). If the loop never resolves, they bail (Growth Engineering on Loewenstein, 2023). A clip that opens a question and answers it is a complete unit; one that cuts before the answer is a tease that gets punished.
Quotability is what makes it travel. High-arousal emotion drives sharing. Berger and Milkman's study of nearly 7,000 New York Times articles found content that evokes awe, anger, or anxiety is significantly more likely to be shared (Journal of Marketing Research, 2012). One standalone line a viewer can repeat is the carrier.
Wonda picks for exactly these three. Length, the dimension everyone fixates on, is a tradeoff, not a target. The sweet spots are real, 15 to 60 seconds on TikTok and under 15 on Reels and X (Sprout Social, 2024), with 21 to 34 seconds posting the highest completion, but a longer clip can out-distribute a shorter one on total watch time (Shortimize, 2025). The implication is precise: cut to the natural boundary of a complete thought, hook through payoff, not to a fixed duration. That is exactly why a model reading the transcript beats a fixed-duration chopper. It knows where the thought ends.
None of this is new craft. It is the judgment a good editor makes by ear, and the reason it used to take an afternoon. The shift is that the agent now applies it to every window of every recording, so the work you used to do clip by clip happens once, in one command.
Why drive video clipping from Claude Code instead of a dashboard?
Clipping is a textbook agent task. You describe the outcome ("ten controversial clips from this podcast, captions on") and the agent reads Wonda's auto-synced skill manual, picks the flags, runs the job, and downloads the clips. It is the same loop Claude Code already runs for generate and publish, and it matters more as agent use climbs: 63% of video marketers already use AI tools to create or edit video (Sprout Social, 2026).
One honest constraint: an agent that only has the MCP server cannot clip. It needs shell access to the wonda binary. The same shell-out works from Claude Code for Instagram and from an always-on Hermes daemon. The interface choice matters because a dashboard cannot hand a machine-readable plan back to your agent, and clipping produces exactly that.
| Capability | Agent + CLI clipping | GUI clipper dashboard |
|---|---|---|
| Selects clips by | What was actually said | Visual scene-cut (camera changes) |
| Steerable | Plain-English --brief reshapes the ranking | Fixed presets and sliders |
| Machine-readable plan | plan.json per clip (score, rationale) | None an agent can read |
| Captions | On by default, word by word | Manual toggle or upsell |
| Scriptable end to end | Yes (one command, chained) | No (human clicks each job) |
Two ways to pick the clips
Wonda picks by what is actually said in the video, not where the camera cut, so the clips open on real moments instead of arbitrary jumps. You point it one of two ways:
- Let it find them. Run it with no brief and it surfaces the most postable moments on its own: the ones with a hook, a payoff, and a line worth repeating.
- Tell it what you want. Add a plain-English brief like --brief "the most controversial moments" and it picks clips that match your angle instead. The brief is not a filter applied after the fact, it changes what the tool goes looking for from the start.
A scene-cut clipper cannot do either; it just cuts where the camera cut. A typical dashboard clipper gives you one automatic pass you cannot steer without scrubbing the timeline yourself. With Wonda you either trust the auto picks or hand it your angle in a sentence.
You can see the picks before paying to render them. Ask the agent for a plan-only pass ("just show me the picks and why, do not render yet"), and it runs:
wonda clipping --media <uuid> --brief "controversial moments" --dry-run --wait
Each picked clip is written to plan.json as an object like this:
{
  "start": 12.4,
  "end": 38.7,
  "title": "Why he quit the agency",
  "hookText": "He admits...",
  "rationale": "Concedes \"the agency model is dead\" then explains why...",
  "score": 87,
  "dominantSpeaker": "SPEAKER_00"
}
A score next to a written rationale is the unit a GUI dashboard cannot hand your agent: you curate from reasoning instead of scrubbing, and you can see why each clip was chosen.
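To make "curate from reasoning" concrete, here is a minimal Python sketch of how a script or agent loop might shortlist picks by score. It assumes plan.json holds a JSON array of objects shaped like the one above; the top-level layout is not documented here, so treat that as an assumption. The second and third entries, the shortlist helper, and the min_score threshold are made up for illustration.

```python
import json

# Sample data shaped like the per-clip object shown above.
# Assumption: plan.json is a JSON array of such objects.
# The second and third entries are hypothetical.
PLAN_JSON = """
[
  {"start": 12.4, "end": 38.7, "title": "Why he quit the agency",
   "hookText": "He admits...", "rationale": "Concedes the agency model is dead...",
   "score": 87, "dominantSpeaker": "SPEAKER_00"},
  {"start": 301.0, "end": 322.5, "title": "Hypothetical pick two",
   "hookText": "...", "rationale": "...",
   "score": 91, "dominantSpeaker": "SPEAKER_01"},
  {"start": 640.2, "end": 671.0, "title": "Hypothetical pick three",
   "hookText": "...", "rationale": "...",
   "score": 62, "dominantSpeaker": "SPEAKER_00"}
]
"""


def shortlist(plan, min_score=80):
    """Return the clips scoring at least min_score, best first."""
    keep = [clip for clip in plan if clip["score"] >= min_score]
    return sorted(keep, key=lambda clip: clip["score"], reverse=True)


plan = json.loads(PLAN_JSON)
for clip in shortlist(plan):
    length = clip["end"] - clip["start"]
    # Print score, clip length, title, and the written reason for the pick.
    print(f'{clip["score"]:>3}  {length:5.1f}s  {clip["title"]}: {clip["rationale"]}')
```

In practice you would read the real file with json.load instead of the embedded sample, then hand the shortlist to a human for the final call.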
How Wonda compares to Opus Clip, Riverside, and Klap
Picking clips by what was said is the right idea, and the good tools all do a version of it. Opus Clip assigns a Virality Score from 0 to 99 (Opus Clip Help Center, 2026). Klap leans on speech detection to find moments (Klap, 2026). Riverside's Magic Clips weighs sentiment and speaker energy (Riverside, 2026). Wonda is in the same camp, so this is not where it stands apart.
The difference is the surface, not the science. Those tools are single-mode web dashboards. You log in, upload, wait, and download a finished video. You do not get a documented, machine-readable per-clip plan; reviewers note that with Klap you get the final video, not an editable project file (Dupple, 2026). They are good dashboards for a human in a browser. Wonda's difference is the agent control surface: a CLI binary, a plan.json your loop can consume, and two-mode --brief steering, so an agent picks, filters by score, and chains the result without a human clicking through. If you want the full field, the AI marketing CLI landscape covers it.
What clipping actually costs
The dashboards are monthly subscriptions with upload and clip caps. OpusClip runs $15 to $29 a month for 150 to 300 input minutes (OpusClip, 2026). Klap is $29 to $151 a month for 10 to 100 uploads (Klap, 2026). Vizard sits around $15 to $20 (Vizard, 2026), and Riverside's Magic Clips ride a $19 to $29 plan (Riverside, 2026). You pay the seat every month whether you clip one video or none, and you hit a ceiling if you clip too many.
Wonda has no subscription. The clip selection runs once per video and is the only real cost; the render is free because it happens on your machine. A real run measured about $2 to clip a video, almost all of it that one-time selection. The cost barely grows with length: selection runs a single time and only the cheap transcription scales, so a one-hour podcast costs about the same as a short one. Every clip you pull from that video shares the one cost, and you pay only for the videos you actually clip.
To put that per clip against the subscriptions, assume 30-second clips and a typical month: thirty clips from three hour-long videos (about ten clips each). On Wonda that is three $2 jobs, around 20 cents a clip. The subscriptions bill their flat monthly fee for the same thirty clips:
| Tool | Cost for ~30 clips/mo | Per 30-second clip |
|---|---|---|
| OpusClip | $29/mo (Pro) | ~$0.97 |
| Klap | $29/mo (Standard) | ~$0.97 |
| Vizard | ~$20/mo | ~$0.67 |
| Riverside | $19/mo (Standard) | ~$0.63 |
| Wonda | ~$6, pay-per-use | ~$0.20 |
So, cheaper per clip? At this volume, by three to five times, and the gap only widens the less you clip, because a subscription bills the same flat fee for thirty clips or three, while Wonda bills only for the videos you touch. The big plans undercut it per clip only if you max the cap every month: Klap's $151 tier gets to about 15 cents a clip at a thousand clips, but you have to live on that plan and use all of it. Wonda has no cap and nothing to pay in a slow month.
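The per-clip figures above are plain division, and a short Python sketch makes them easy to rerun with your own volume. The prices are the ones quoted in this section; the variable and function names are illustrative only.

```python
CLIPS_PER_MONTH = 30          # thirty 30-second clips
VIDEOS_PER_MONTH = 3          # three hour-long sources, about ten clips each
WONDA_COST_PER_VIDEO = 2.00   # measured run quoted above, in USD

# Monthly spend for the same thirty clips, from the table above.
monthly_cost = {
    "OpusClip": 29.00,
    "Klap": 29.00,
    "Vizard": 20.00,
    "Riverside": 19.00,
    "Wonda": WONDA_COST_PER_VIDEO * VIDEOS_PER_MONTH,
}


def per_clip(total_cost, clips):
    """Cost per clip in USD, rounded to cents."""
    return round(total_cost / clips, 2)


for tool, cost in monthly_cost.items():
    print(f"{tool:<10} ${cost:>6.2f}/mo  ${per_clip(cost, CLIPS_PER_MONTH):.2f} per clip")

# A slow month: one video, three clips. The flat fee does not shrink.
print("Slow month, OpusClip:", per_clip(29.00, 3))
print("Slow month, Wonda:   ", per_clip(WONDA_COST_PER_VIDEO, 3))
```

Change CLIPS_PER_MONTH and VIDEOS_PER_MONTH to your own numbers to see where a subscription starts to win.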
Human clipping vs an agent
Be fair about where each wins. A skilled human editor wins on taste, brand fit, legal and PR risk, and context the transcript cannot carry. They know which concession will read as bold and which will read as a lawsuit, and which guest will be upset to see a clip out of context. The agent has none of that judgment.
The agent wins on everything mechanical. Speed: minutes, not an afternoon. Marginal cost: near zero, so clipping the entire back-catalog costs about what clipping one episode costs. Consistency: clip number 40 gets the same scoring as clip number one, with no fatigue. Scriptability and uptime: it runs at 3 a.m. on a schedule.
The contrast is sharpest on money and time. A captioned one-minute short runs roughly three to four hours of edit time (Offshore Clipping, 2026), and captioning alone is five to ten minutes of work per minute of video (3Play Media, 2025). Freelance short-form runs about $50 to $400 per clip at $30 to $80 an hour for mid-level editors (Twine, 2025). Twelve clips a week is a real budget line and a real bottleneck.
The synthesis is a division of labor. The agent does the mechanical 80%: find, cut, reframe, caption, score. The human reads the plan.json rationales and curates the top few, approving from written reasoning instead of scrubbing timelines. That is the move, not "fire the editor."
What you need before your first clip
Setup is a one-time thing you hand to the agent, not a checklist you work through. You tell Claude Code to get Wonda ready, and it installs the binary, logs you in, and warms the local renderer.
"Install Wonda, log me in, and warm the caption renderer."
Claude Code runs:
curl -fsSL https://wonda.sh/install.sh | bash
wonda auth login
wonda doctor --warm-chrome
It picked --warm-chrome because captions render locally through a bundled Chromium, and that step prefetches it (~150 MB on first fetch) so your first clip does not pause to download it. The install also works through npm i -g @degausai/wonda; the source lives at github.com/degausai/wonda.
Two things the agent cannot do for you. First, the clippingEnabled flag has to be on for your account: a 404 from /clipping means it is off, and the feature is rolling out, so it may need enabling. Second, bring a source with speech (podcasts, interviews, webinars, talking-head video), because it works off what is said, and silent or music-only footage gives it nothing to pick from.
Time: about 5 minutes of setup, then a few minutes per clipping job. Difficulty: Intermediate. The full install and auth path is documented at wonda.sh/docs.
Step 1: Get your long video into Wonda
The goal of this step is just to point the job at a source. There are two ways in, and the agent picks based on what you hand it.
"Clip this YouTube interview into shorts: https://www.youtube.com/watch?v=ID"
Claude Code runs:
wonda clipping --url "https://www.youtube.com/watch?v=ID" --brief "the most controversial moments" --wait -o ./clips/
It used --url because you gave it a link. That path is the fastest to try, but it is best-effort: it can fail from datacenter IPs or on private and age-gated videos. So when you hand the agent a file instead, it takes the reliable path: upload first, then clip with the returned UUID.
"Clip this recording I downloaded: ./source.mp4"
Claude Code runs:
wonda media upload ./source.mp4
wonda clipping --media <uuid> --wait -o ./clips/
wonda media upload returns a media UUID, which is exactly what --media takes. The agent chose --media over --url because a local upload does not depend on YouTube cooperating. One thing it knows to check: the source needs speech, because it works off what is said.
Step 2: Steer the picks with a plain-English brief
This is where the thesis turns into a result you can see. You hand it your angle in one sentence, and it picks from what was said and ranks by what you asked for, not by where the camera happened to cut.
"Find the six moments where he changes his mind, and rank by that."
Claude Code runs:
wonda clipping --media <uuid> --brief "the moment he changes his mind" --count 6 --wait -o ./clips/
It passed your sentence straight through as --brief; you did not choose a mode or tune anything. The payoff is a ranked set where each clip carries its own score and rationale, so you see why it was chosen instead of trusting a black box. This is "in your voice, no slop" applied to clip picking.
Step 3: Reframe to vertical with face tracking
The outcome here is a clip that fills a phone screen and keeps the speaker in frame. You describe the kind of source, and the agent picks the reframe method; you do not choose between crop, track, and active-speaker yourself.
"Reframe these to vertical and keep the speaker centered. It's a talking-head podcast."
Claude Code runs:
wonda clipping --media <uuid> --aspect 9:16 --reframe track --subject person
It chose --reframe track and --subject person because you described one talking head. The default is a 9:16 crop that locks onto the active speaker and smooths the motion, so it follows the face without jitter as they move. Describe your source and the agent maps it to a mode:
| Reframe mode | Use when |
|---|---|
| track | You want a vertical crop that follows the face as it moves. |
| active-speaker | Multi-person interview; follow whoever is currently talking. |
| crop | Simple center crop, subject stays roughly still. |
| blur-fill | Keep the full landscape frame (slides, graphics) on a blurred backdrop. |
| none | Leave the source aspect untouched. |
The two clips at the top of this post are exactly these modes on the same source: track follows the speaker's face in a 9:16 crop, while blur-fill keeps the full landscape frame on a blurred backdrop so nothing gets cropped. Reach for blur-fill on slides, screenshares, or any source where the edges carry information a vertical crop would throw away.
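The table reads naturally as a small decision rule. The Python sketch below is one way to write it down; the mode names come from the table, but the function, its parameters, and the order of the checks are assumptions for illustration, not how Wonda or the agent actually decides.

```python
def pick_reframe(speakers: int, edges_matter: bool, subject_moves: bool) -> str:
    """Map a rough description of the source to a reframe mode from the table.

    Hypothetical helper: the rule order is an assumption, not Wonda's logic.
    """
    if edges_matter:
        # Slides, screenshares, graphics: keep the whole landscape frame.
        return "blur-fill"
    if speakers > 1:
        # Multi-person interview: follow whoever is talking.
        return "active-speaker"
    if subject_moves:
        # One person who moves around: follow the face.
        return "track"
    # One person who stays roughly still: a center crop is enough.
    return "crop"


print(pick_reframe(speakers=1, edges_matter=False, subject_moves=True))
print(pick_reframe(speakers=2, edges_matter=False, subject_moves=False))
print(pick_reframe(speakers=1, edges_matter=True, subject_moves=False))
```

The none mode is left out on purpose, since it applies when you do not want a vertical reframe at all.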
Step 4: Burn word-by-word captions (on by default)
You get captions whether or not you ask, because most mobile video is watched on mute (Digiday, 2024), and viewers are 80% more likely to finish a video when captions are available, with roughly 80% of caption users reporting no hearing impairment (Verizon Media and Publicis Media, 2019). The only thing you usually say is the color.
"Use red captions on those clips."
Claude Code runs:
wonda clipping --media <uuid> --brief "the most controversial moments" --caption-preset "TikTok Red Captions" --wait -o ./clips/
It set --caption-preset "TikTok Red Captions", which is also the default if you say nothing (--captions defaults true). There are exactly three presets, and the agent maps "red," "white," or "black" to the right one. The animation is deliberately restrained: an opacity fade-in plus a highlight pill that hops to the spoken word, word by word. No bounce, wiggle, or scale-pop. New clipping jobs come out with this animated caption style by default.
| Preset | Look |
|---|---|
| TikTok White Captions | black text, white highlight pill on the active word |
| TikTok Black Captions | white text, black highlight pill on the active word |
| TikTok Red Captions (default) | white text, red (#E14135) highlight pill on the active word |
Want captions on a single video instead of a clip batch? Tell the agent to caption that one file, and it reaches for the standalone op, which auto-transcribes for word timing and preserves the original audio:
wonda edit video --operation animatedCaptions --media <id> --preset "TikTok Red Captions" --wait -o final.mp4
And you are not locked to one typeface. The captions can render in any of a wide set of fonts built into Wonda, so the look matches the channel or the brand instead of every clip wearing the same default. You just name the font:
"Same red captions, but set them in Anton."
The bundled families cover TikTok Sans in six widths (Condensed through ExtraExpanded), plus Anton, Bebas Neue, Oswald, Montserrat, Poppins, League Spartan, Bungee Inline, Nohemi, Inter, Roboto, and the display faces Comic Cat and Gavency, all available out of the box. Need your own brand font instead? Point the agent at a folder of your fonts, or add it to your Wonda brand kit, and the captions render in it: same word-by-word motion, your typography.
Two honest notes. Auto-generated hook headlines are off by default, because the guessed headlines came out inconsistent, so if you want a hook overlay, tell the agent the exact text and it passes --hook "your custom text" per clip rather than relying on a guess. And the local caption render is why the agent warmed Chromium in setup: the first render pauses to fetch ~150 MB if you skip it.
Step 5: Restyle, then publish or schedule to TikTok and Instagram
The cheap way to iterate is to re-skin an existing job, not re-run it. Once a job exists, --restyle <jobId> re-renders its clips with a new caption preset and reuses the original selection, so it is roughly $0 and a few seconds per clip instead of running the whole job again. You ask for a different look; the agent restyles instead of starting over.
"Try those clips again in black captions instead of red."
Claude Code runs:
wonda clipping --restyle <jobId> --caption-preset "TikTok Black Captions" --wait -o ./clips/
It reused the job id and only swapped the preset, so selection never re-ran. Once you like the look, you tell it where to post, and it picks the publish command per platform.
"Post the best one to TikTok, public."
Claude Code runs:
wonda publish tiktok --media <clipMediaId> --connection <tiktok-uuid> --caption "..." --privacy PUBLIC_TO_EVERYONE
It used wonda publish tiktok (never wonda tiktok post) and the UPPER_SNAKE_CASE --privacy value, both details you never had to know. To queue an Instagram Reel for a future slot, tell it the time and it adds --scheduled-at (RFC3339) through the social media scheduling CLI:
wonda publish instagram --media <clipMediaId> --connection <ig-uuid> --caption "..." --share-to-feed --scheduled-at 2026-06-10T14:00:00Z
Tuning the batch: count, duration, and speaker filters
The outcome is a sized, shaped batch instead of whatever the model surfaces by default. You describe how many clips, how long, and whose voice; the agent maps that to flags and knows the limits.
"Give me five punchy one-liners, around 20 seconds each, red captions."
Claude Code runs:
wonda clipping --media <uuid> --brief "punchy one-liners" --count 5 --duration 20 --tolerance 5 --caption-preset "TikTok Red Captions" --wait -o ./clips/
It set --count 5, --duration 20, and --tolerance 5 from your sentence. The agent knows --count runs 1 to 20 but auto-clamps to 3 to 12, so asking for 20 quietly gives you up to 12. It can pick lengths with either --duration plus --tolerance or --min-duration plus --max-duration (mutually exclusive; defaults 15 to 60 seconds), and narrow to one person with --speaker using diarization labels.
For a multi-guest interview, ask it to keep selection on one voice and it adds --speaker.
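If you script around the agent, the two rules above (the count clamp and the mutually exclusive length flags) are worth checking before a job runs. The Python sketch below only assembles an argument list and never calls the binary. The flag names come from this post; the helper names are hypothetical, and rejecting counts outside 1 to 20 is an assumption about behavior this post does not spell out.

```python
import shlex


def effective_count(requested: int) -> int:
    """Mirror the clamp described above: 1 to 20 accepted, 3 to 12 delivered."""
    if not 1 <= requested <= 20:
        # Assumption: out-of-range values are rejected rather than clamped.
        raise ValueError("--count accepts 1 to 20")
    return max(3, min(12, requested))


def length_flags(duration=None, tolerance=None, min_duration=None, max_duration=None):
    """Build the length flags, refusing to mix the two styles."""
    target_style = duration is not None or tolerance is not None
    range_style = min_duration is not None or max_duration is not None
    if target_style and range_style:
        raise ValueError("use --duration/--tolerance or --min-duration/--max-duration, not both")
    pairs = [("--duration", duration), ("--tolerance", tolerance),
             ("--min-duration", min_duration), ("--max-duration", max_duration)]
    flags = []
    for name, value in pairs:
        if value is not None:
            flags += [name, str(value)]
    return flags


def clipping_command(media, brief, count, **length):
    """Assemble the argument list for a clipping job (not executed here)."""
    return ["wonda", "clipping", "--media", media, "--brief", brief,
            "--count", str(count), *length_flags(**length), "--wait", "-o", "./clips/"]


cmd = clipping_command("<uuid>", "punchy one-liners", 5, duration=20, tolerance=5)
print(shlex.join(cmd))
print("clips to expect for --count 20:", effective_count(20))
```

Passing the list to subprocess.run would execute it; printing it first is a cheap way to review what the job will ask for.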
Common mistakes to avoid
Feeding it footage with no speech. It works off what is said, so music videos and silent b-roll come back weak or empty. This is built for podcasts, interviews, and talking-head video.
Forcing a fixed clip length. Pinning --duration to a round number can chop a clip off before the payoff resolves. Leave the default range unless a platform gives you a reason to clamp it.
Reading a 404 as a bug. A 404 from /clipping means the per-account clippingEnabled flag is off, not that the command broke. The feature is rolling out and may need enabling.
Results: what success looks like
If everything went correctly, you now have a folder of vertical 9:16 clips with word-by-word captions burned on, plus a plan.json listing each clip's start, end, title, score, rationale, and dominantSpeaker. Each clip opens on the actual hook of the moment, not mid-sentence, because it picked by what was said rather than where the camera cut.
That is the thesis paying off in a folder of files: one command produced many clips from one source, the score field tells you which to post first, and a --restyle pass re-skins the whole batch in seconds for roughly $0. The stretch goal is to chain it, having Claude Code clip, restyle, and schedule a week of Reels in one conversation.
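Scheduling a week of Reels needs seven --scheduled-at values in RFC3339 form. Here is a minimal Python sketch that generates one UTC slot per day; the weekly_slots helper and the once-a-day cadence are assumptions for illustration, and the first slot reuses the example timestamp from the publish step.

```python
from datetime import datetime, timedelta, timezone


def weekly_slots(first_slot: datetime, days: int = 7) -> list:
    """Return RFC3339 UTC timestamps, one per day, starting at first_slot."""
    if first_slot.tzinfo is None:
        # A naive datetime has no offset, so it cannot be converted safely.
        raise ValueError("first_slot must be timezone-aware")
    start = first_slot.astimezone(timezone.utc)
    return [(start + timedelta(days=i)).strftime("%Y-%m-%dT%H:%M:%SZ")
            for i in range(days)]


first = datetime(2026, 6, 10, 14, 0, tzinfo=timezone.utc)
for slot in weekly_slots(first):
    print(slot)
```

Each printed value can be passed as the --scheduled-at argument of a wonda publish instagram call, one clip per slot.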
Frequently Asked Questions
How do I cut a long video into short clips automatically?
You tell Claude Code what you want ("clip this podcast into ten vertical shorts with captions"), and it runs wonda clipping --media <id> --brief "..." --wait -o ./clips/ for you. It picks the moments by what is actually said, reframes to vertical 9:16 with face tracking, burns word-by-word captions on by default, and downloads the files plus a plan.json. Source: Wonda's in-repo CLI skill doc, 2026.
Does it pick the clips for me, and can I steer it?
Yes. By default it finds the most postable moments on its own. Add a plain-English --brief and it picks clips that match your angle instead. Either way, each clip comes back with a score and a written reason, so you can see why it was chosen. It picks by what is said, not by camera changes, so it cuts on meaning.
How long should each clip be?
Cut to the natural boundary of a complete thought, hook through payoff, rather than a fixed number. The platform sweet spots are 15 to 60 seconds on TikTok and under 15 on Reels and X, with 21 to 34 seconds posting the highest completion rates (Sprout Social, 2024; Shortimize, 2025). Leaving the default --min-duration/--max-duration range lets the model respect the thought boundary.
Can Claude Code do this, or do I need the MCP server?
Claude Code drives it by shelling out to the wonda binary after reading the auto-synced skill manual. Clipping is CLI-only: the MCP server has no clipping tool. Hermes Agent works the same way as an always-on daemon.
How is this different from Opus Clip or Klap?
They all pick by what was said, so the core idea is the same. The difference is the surface. Opus, Klap, and Riverside are single-mode web dashboards that hand back a finished video, not a machine-readable per-clip plan (Dupple on Klap, 2026). Wonda gives you a CLI your agent drives, a plan.json it can read, and the choice of letting it find clips or steering it with a brief.
Why am I getting a 404 from clipping?
Clipping is gated behind the per-account clippingEnabled flag. A 404 from /clipping means it is off for your account. The feature is rolling out, so it may need enabling. Source: Wonda's CLI skill doc, 2026.
The Bottom Line
Great clipping was never a cutting problem. It is a selection problem, and selection is a language problem: a clip is worth posting when the words carry a hook, a payoff, and one line worth repeating. A camera-change detector cannot see any of that. A model that reads what is said finds it directly, and an agent runs the whole loop, select, reframe, caption, schedule, from one plain-English instruction. Clipping used to be manual work. Now it is one sentence to your agent, the same way the rest of your marketing is going.
curl -fsSL https://wonda.sh/install.sh | bash
wonda auth login
wonda doctor --warm-chrome
Then tell Claude Code: "Clip this podcast into ten controversial vertical shorts with red captions and download them." The agent handles the mechanical 80%; you curate the top few from the plan.json rationales. From here, point the same clips at batching and scheduling a week of Reels from the terminal.
Sources
- TikTok Newsroom, How TikTok Recommends Videos For You, retrieved 2026-06-03, https://newsroom.tiktok.com/en-us/how-tiktok-recommends-videos-for-you
- TikTok for Business, Creative Best Practices for Top-Performing Ads, retrieved 2026-06-03, https://ads.tiktok.com/business/en/blog/creative-best-practices-top-performing-ads
- Stackmatix, TikTok Hook in the First 3 Seconds, retrieved 2026-06-03, https://www.stackmatix.com/blog/tiktok-hook-first-3-seconds
- TTS Vibes, TikTok First 3 Seconds Hook Retention Rate, retrieved 2026-06-03, https://insights.ttsvibes.com/tiktok-first-3-seconds-hook-retention-rate/
- Socialinsider, YouTube Shorts Analytics Guide, retrieved 2026-06-03, https://www.socialinsider.io/blog/youtube-shorts-analytics-guide/
- Growth Engineering, Curiosity and the Information-Gap Theory (Loewenstein), retrieved 2026-06-03, https://www.growthengineering.co.uk/curiosity/
- Berger and Milkman, What Makes Online Content Viral?, Journal of Marketing Research (2012), retrieved 2026-06-03, https://journals.sagepub.com/doi/10.1509/jmr.10.0353
- Verizon Media and Publicis Media (via Amberscript), Captions Increase Video Views by 80%, retrieved 2026-06-03, https://www.amberscript.com/en/blog/how-subtitles-and-captions-help-increase-video-views-by-80-percent/
- Digiday, 75% of People Watch Mobile Videos on Mute, retrieved 2026-06-03, https://digiday.com/sponsored/75-percent-of-people-watch-mobile-videos-on-mute/
- Sprout Social, Video Length Best Practices, retrieved 2026-06-03, https://sproutsocial.com/insights/video-length-best-practices/
- Shortimize, Video Length Sweet Spots for TikTok, Reels, Shorts, retrieved 2026-06-03, https://www.shortimize.com/blog/video-length-sweet-spots-tiktok-reels-shorts
- Opus Clip Help Center, Virality Score, retrieved 2026-06-03, https://help.opus.pro/docs/article/virality-score
- Klap, AI Auto Clip Maker, retrieved 2026-06-03, https://klap.app/blog/auto-clip-maker-ai
- Riverside, About Magic Clips, retrieved 2026-06-03, https://support.riverside.com/hc/en-us/articles/12124048765981-About-Magic-Clips
- Dupple, Klap AI Review, retrieved 2026-06-03, https://dupple.com/tools/klap-ai
- Offshore Clipping, How Long Does Video Editing Take?, retrieved 2026-06-03, https://offshoreclipping.com/blog/video-editing-time/
- 3Play Media, How Long Does It Take to Manually Caption Videos?, retrieved 2026-06-03, https://www.3playmedia.com/blog/long-take-manually-caption-videos/
- Twine, Video Editor Rates Guide, retrieved 2026-06-03, https://www.twine.net/blog/video-editor-rates-guide/
- TechCrunch, YouTube Surpasses 1 Billion Monthly Podcast Viewers, retrieved 2026-06-03, https://techcrunch.com/2025/02/26/youtube-surpasses-1-billion-monthly-podcast-viewers/
- HubSpot, Video Marketing Statistics (2025 State of Marketing Report), retrieved 2026-06-03, https://blog.hubspot.com/marketing/video-marketing-statistics
- Sprout Social, Social Media Video Statistics, retrieved 2026-06-03, https://sproutsocial.com/insights/social-media-video-statistics/
- OpusClip, Pricing, retrieved 2026-06-03, https://www.opus.pro/pricing
- Klap, Pricing, retrieved 2026-06-03, https://klap.app/pricing
- Vizard, Pricing, retrieved 2026-06-03, https://vizard.ai/pricing
- Riverside, Plans and Pricing, retrieved 2026-06-03, https://riverside.com/pricing