The video ad framework: five moves that turn scrolls into action

Five seconds, five moves, and why your video lives or dies in the first of them

Jun 18, 2026
Guide
By Roxy Crouch

Here's a number for you.

The average person now decides whether to keep watching a video in well under a second, usually with the sound off, usually while their thumb is already moving. That is the whole window you get, and everything you spent on the shoot, the talent, the colour grade and the licensed track gets judged inside it, in less time than it takes to blink.

I am not guessing about that. Meta's own research puts 47% of a video ad's value in the first three seconds and 74% inside the first ten, around 85% of people watch with the sound off, and somewhere between 65% and 75% of paid video views are already gone before the third second arrives. So when I say the opening carries the ad, I mean that quite literally - by the time most brands get to the part they were proud of, the audience that was going to leave has already left.

That is usually where it goes wrong. A video opens on a logo, or a slow establishing shot, or a tasteful fade, and those few seconds of throat-clearing are enough to lose the people it needed. When we go back and pull a disappointing video apart, the footage is almost always fine. What lets it down is the order things happen in, and the order is the one thing nobody thinks to question.

I want to talk about video on its own here, because it is the single most effective creative format we run and the numbers bear that out. Video pulls higher click-through rates than static across nearly every platform - on Instagram, feed video runs around 44% higher than a single image - and short-form video tends to win clicks far more cheaply too. I should be straight with you, because there are real exceptions and we see them often. A sharp image or a well-built carousel will sometimes beat video on a given metric, particularly low in the funnel where a clean offer closes faster than a story, and on LinkedIn, where static still tends to win for B2B. When the data says that, we follow it. Step back from any single campaign, though, and the same thing keeps showing up across all of them. Video creates the bigger impact more reliably than anything else, stopping people, holding them, and moving them to act in a way the other formats rarely manage on their own. The brands that take it seriously are pulling away, while the ones still opening every clip with a logo are quietly paying for everyone else's growth.

None of this came from a whiteboard. It came from running a great deal of video across very different kinds of business - ecommerce brands, tech companies and funded startups - and watching, obsessively, what actually landed. Plenty of it did not work, so we dropped it and kept whatever did, and over time the same five-part structure kept rising to the top no matter who we were talking to. That is the part I find genuinely satisfying, because those audiences have almost nothing in common, and yet the shape of the video that moves each of them turns out to be the same.

The honest version, and the bit I like most about this job, is that we have never stopped testing. We still run the experiments, still read the drop-off curves, still throw out things we were sure would work. Every round teaches us a little more about what genuinely resonates with real people, and when we feed that back in, the audience gets something that actually speaks to them and the client watches the results arrive soon after. That loop, the testing feeding the learning feeding the work, is the whole craft, and it is what keeps the framework honest rather than frozen. The clearest proof of it is what happens when we rebuild a video around what the testing teaches us. On one campaign that rebuild drove 295% more sales, at a cost per sale 72% lower than the year before, and the footage had barely changed. All that really moved was the order.

So here is the formula. Five stages, in a deliberate order, where the first four exist to win attention and hold it and the fifth finally delivers the message. Get the sequence right and an ordinary video will outrun a beautiful one that was built the usual way.

Figure: the five-part video framework. The first four stages capture attention, the fifth delivers the message.

Let me take you through each one.

Hook: the first three seconds decide everything

Everything starts in the first three seconds, and they are the most important stretch of the whole video, because if you waste them nothing else you made will ever be seen. The hook has one job, which is to stop a moving thumb, and the surest way to do that is to challenge what the viewer expects with something that looks nothing like the forty things they have just scrolled past. Surprise is what buys you the half-second you need.

There is even a way to measure whether it is working. Marketers call it the hook rate, the share of people who keep watching past the first three seconds, and a strong one sits above 30%, while anything under 20% is a sign the opening needs rebuilding rather than tweaking. It also has to land in silence, because most people are watching with the sound off and deciding almost instantly, so the hook has to live entirely in what they can see - a bold claim, a striking image, a number that should not be true, a line of text that fills the screen before they have worked out what they are looking at. Whatever you choose, it earns its place by doing the work without a single word being heard.
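If you want to check a hook against those thresholds, the arithmetic is simple enough to sketch in a few lines of Python. Treat this as an illustration rather than anyone's official formula. I am assuming hook rate is calculated as three-second views divided by impressions, which is one common way to do it, and the function names and the "in between" label for the 20% to 30% band are my own.

```python
def hook_rate(three_second_views: int, impressions: int) -> float:
    """Share of impressions that kept watching past three seconds.

    Assumption: the denominator is impressions. Some teams use
    video plays instead, so check how your platform reports it.
    """
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return three_second_views / impressions


def classify_hook(rate: float) -> str:
    """Apply the rough thresholds from the text above."""
    if rate > 0.30:
        return "strong"
    if rate < 0.20:
        return "rebuild"
    # The text does not name this middle band; the label is illustrative.
    return "in between"


if __name__ == "__main__":
    rate = hook_rate(three_second_views=350, impressions=1000)
    print(f"{rate:.0%} hook rate: {classify_hook(rate)}")
```

The point of keeping it this small is that the number only tells you whether the opening needs work. It cannot tell you what to change, and that part still comes from watching the video.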

What tends to kill a hook is precisely the stuff that feels most professional. The logo animation, the gentle fade-in, the cinematic establishing shot all push the interesting part back by a few seconds, and a few seconds is all it takes to lose someone. A good test is to look at your opening frame and ask whether it could belong to any brand in any category, because if it could, what you have is a throat-clear, and nobody ever stopped scrolling for one of those.

Wow: pay them back immediately

Stopping someone only gets you halfway, because the moment they pause they are already weighing up whether to leave, so you have to reward them before they do. That reward is the wow, the emotional or visual payoff that makes them react and want more, and it has to come fast enough that staying feels worth it.

This is the move that feels upside down, because it asks you to spend at the start the very thing you were trained to save for the finale. The old arc ran setup, build, climax, resolution, and it served cinema beautifully for a hundred years, until the feed came along and rewrote the rules. Audiences now want the climax first, because they have learned to filter ruthlessly and they decide whether you have earned their time on a feeling in the opening seconds, whether that is curiosity, awe, recognition or a laugh. Once they feel something, they will give you the time to explain it.

The brands that have mastered short video live by this. Duolingo built a whole social presence out of it, where every clip carries a hook, a payoff and an ongoing storyline starring the green owl, a character people actually follow rather than a logo they tune out. Red Bull does the same with thrill, Nike with human grit, Ryanair with fast and slightly mean humour, and none of them spend their opening seconds telling you who they are. They give you a feeling first and trust that the explanation will be welcome once you have felt it.

We have watched the same principle pay off in our own work, again and again. The wow was never the polished brand shot but the single most striking moment we had, the bit that made people stop and feel something, and the videos that led with it consistently outran the ones that saved it for later. Putting the best moment first is the cheapest change you can make to a video, and it is almost always the one that moves the numbers most.

Cliffhanger: open a loop they have to close

With their attention earned, you open a gap. The cliffhanger is a moment of curiosity that leaves something unresolved and draws the viewer further in, and it works because of a quirk in how we are built. We fixate on unfinished things far more than finished ones, so the instant you open a loop, people feel a small and surprisingly persistent need to close it. It is the same pull that has you starting one more episode at midnight when every sensible part of you wants to be asleep.

All the craft lives in the setup. A cliffhanger only holds when the question is clear and the answer feels worth the wait, so you make the stakes obvious before you withhold the payoff, and the viewer will happily stay for the resolution. Get greedy and tease something vague that pays off with nothing, and you have not built suspense at all, you have wasted their time, and they will leave faster than if you had never opened the loop in the first place.

Sting: a beat, a breath, a brand stamp

The sting is the shortest stage and the one I hold most loosely. It is a brief branded beat, a title card or a quick cue that tells the viewer the main story is about to begin, and it works as a breath between the attention-grabbing opening and the substance that follows. Its real value is rhythm, a small signal that the gear is changing, and it gives you a chance to stamp your brand on the moment while you still have everyone watching.

Be honest with yourself about this one, though, because it is more of a craft convention than a rule, and on a very short video you can leave it out entirely and lose almost nothing. What matters is where it goes. It belongs here, once you have earned the right to it, because if you slide it to the very front before anyone has a reason to care, you have simply opened on a logo again, and we have already seen how quickly that loses the seconds that matter most.

Main: now, finally, you deliver

Everything has been building to this. The scroll is stopped, the payoff has landed, the loop is open and the gear has changed, and only now have you earned the right to tell the full story. The main section is where the insight, the context and, above all, the proof live, and it is where the action you have been working towards finally happens.

Proof does the heaviest lifting here, and it is the part most brands hurry past. People are sceptical by default and quite right to be, so you owe them a reason to believe what they have just watched. This is also where the framework shows its range, because the proof shifts from one sector to the next even as the job it does stays the same. For an ecommerce brand, the proof might be reviews, ratings and the words of people who are not the brand, the social validation a buyer trusts more than any claim you make about yourself. For a considered, high-value purchase it leans on provenance and reputation, the things that reassure someone before they part with real money. For a tech company or a startup it tends to be the numbers and the demo, the plain evidence that the thing actually works. Across our own campaigns, the videos built this way have repeatedly delivered the highest reach and the lowest cost per result of anything we run.

Then you ask for one thing, and one thing only, a single clear action you want the viewer to take next. It is tempting to offer a few options so you feel you have covered yourself, but the moment you hand people a menu they tend to order nothing at all, so it pays to decide what matters most and ask for that on its own.

The order is the strategy

That is the whole thing, five stages in that sequence, flexed to suit whoever is on the other side of the screen. A video selling a considered, high-value purchase moves slowly and quietly, because rushing tells a careful buyer you do not understand what you are selling. A video with a genuine deadline can lean hard on urgency, because the offer really does end. A tech or startup video can move fast and lead with the result, because the audience wants to know the thing works before they will care about anything else. The dressing changes from one business to the next while the skeleton underneath holds for all of them.

So when a video underperforms, hold off on reshooting it, because we almost made that expensive mistake more than once and it rarely turns out to be the answer. Map the thing against these five stages first, and nine times out of ten you will find the footage was fine all along while the hook was a logo, the wow was buried at the back, the proof was missing entirely, or the ask was three things wearing one coat. Put the order right and the good video you already had tends to reappear in front of you.
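For anyone who likes a checklist, here is a rough Python sketch of that mapping exercise. It is only an illustration of the questions above. The function name, its inputs and the wording of each warning are hypothetical, and you would still have to watch the video yourself to fill them in. The sting is treated as optional, since a very short video can skip it.

```python
# The five stages in the order the framework expects them.
STAGE_ORDER = ["hook", "wow", "cliffhanger", "sting", "main"]


def audit_video(stages, opens_on_logo, has_proof, ask_count):
    """Return a list of structural problems; empty means the order looks right.

    stages: stage names in the order they appear in the video.
    opens_on_logo: True if the first frames are a logo or brand sting.
    has_proof: True if the main section gives a reason to believe.
    ask_count: number of different actions the video asks for.
    """
    problems = []
    if opens_on_logo:
        problems.append("hook is a logo")
    # Every stage except the optional sting should be present.
    for stage in STAGE_ORDER:
        if stage != "sting" and stage not in stages:
            problems.append(f"missing stage: {stage}")
    # The stages that are present should follow the framework order.
    # Unknown or repeated stage names also trip this check.
    expected = [s for s in STAGE_ORDER if s in stages]
    if list(stages) != expected:
        problems.append("stages out of order")
    if not has_proof:
        problems.append("proof is missing")
    if ask_count != 1:
        problems.append("ask for exactly one action")
    return problems


if __name__ == "__main__":
    # A video that opens on branding, buries the wow and asks for three things.
    for problem in audit_video(["sting", "hook", "main", "wow"], True, False, 3):
        print(problem)
```

None of this replaces judgment. It just makes you answer the same few questions in the same order before anyone books another shoot.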

One last thing worth saying plainly. This framework is never quite finished, because the people on the other side of the screen keep changing. We keep testing it and we keep watching how audiences actually behave, and that behaviour shifts over time with the platforms, the trends and a hundred things outside anyone's control. So the real work is paying attention, constantly, to the patterns underneath: what is grabbing attention this month, what is earning trust, and what is actually convincing someone to believe a brand and buy from it. Get into the habit of watching for that, and the framework keeps working long after any single video has had its day.