There are those strange moments on the internet when you pause, tilt your head, and wonder: “Wait — is this video real?”
Between 2016 and today, January 2026, such moments have become daily routine. AI videos carry names like “CCTV clip,” “doorbell camera,” or “glass fruit ASMR.” They show jumping rabbits, cats with banjos, glass strawberries, or people with balloon heads. What has happened in those ten years cannot simply be described as technological progress. It is more of a visual decoupling: the image has detached itself from light, the camera has become optional, and our relationship to reality has been fundamentally renegotiated.
Welcome to a journey through ten years in which the act of seeing itself was up for debate.
The Backstory: The Era of Talking Heads (2016–2021)
To understand where we stand today, you have to look back to a time when AI video meant something fundamentally different. Between 2016 and 2018, research labs were experimenting with so-called lip-sync models. The idea was simple: take existing film footage and re-render the mouth movements to match speech in a different language. The goal was synchronization, not creativity.
The term “deepfake” first appeared on shady Reddit forums, accompanied by a diffuse mix of unease and technical fascination. Yet visually, everything remained surprisingly limited: no new worlds, no movement through space — just faces that almost worked.
When Synthesia launched its avatar platform in 2019, it was a milestone for businesses. Suddenly avatars could deliver presentations, explain compliance rules, or conduct product training. In dozens of languages. Efficient, scalable, cost-effective. But also: sterile. These digital speakers were correct but lifeless. They lacked the tremor, the hesitation, the micro-disorder of real life. They looked like well-groomed news anchors from the uncanny valley.
It was still clear: AI replaces humans in front of the camera. Not the camera itself.
2022: The Year AI Learned to Dream
Then came 2022 — and with it a paradigm shift. In July, CogVideo appeared, a Chinese open-source project. Rough, unwieldy, technically demanding. But revolutionary. For the first time, pure text generated movement over time.
Shortly after, a race among tech giants began. Meta presented Make-A-Video, Google responded with Phenaki and Imagen Video. The results were short, noisy, fragmented — like dreams you can’t quite remember. But they showed something fundamental: AI now understood concepts. Not just pixels, not just frames. But actions. A prompt like “teddy bear painting” produced the same teddy bear, still painting, across several seconds. The timeline had gained meaning.
This was no longer just a tech demo. It was the promise of a new visual language.
2023: Runway and the Disappearance of the Camera
The real breakthrough didn’t come from a closed lab, but from a browser. In February 2023, Runway released Gen-1, a video-to-video model that restyled existing footage. But Gen-2, which followed in March, was the true turning point: suddenly anyone with an internet connection could generate videos from pure text.
Not perfect. Not by a long shot. Arms grew too numerous, faces melted, backgrounds morphed without regard for physical logic. But it didn’t matter. The internet did what it always does: it started playing. The camera was no longer a physical object — it was a thought you could type.
For the first time, AI video was no longer a privilege. It was a tool. Democratized, chaotic, untamed.
2024: The Illusion of Reality — From Digital Cinema to the End of Proof
Sora and the Silent Shock
When OpenAI introduced Sora in February 2024, something fundamental changed. Not euphorically, not loudly, but quietly.
The AI video — the clip that made the world pause — was called “Tokyo Street Walk.” A woman in a black leather jacket walks through rain-soaked, neon-lit Tokyo. Lights reflect in puddles. The camera follows her, pans slightly, breathes. It was no spectacle of effects. No gag. Just: world. A scene that could have come from a Wong Kar-wai film. The shock ran deep because nothing about it screamed “AI.” No exaggeration, no morphing faces, no physical errors. Just reality that had never existed.
Shortly after, “Air Head” followed from the production company Shy Kids — a poetic short film about a man with a yellow balloon as a head. Consistent, narrative, emotional. For the first time, AI video was not a demo but cinema.
The Age of Fake Surveillance Cameras
While Sora took on high-gloss cinema, something even more powerful emerged in the niches of social media: a genre that exploited our primal trust in poor image quality.
It started with a seemingly mundane clip: rabbits jumping on a trampoline, filmed from the perspective of a surveillance camera. Mediocre resolution, static angle — exactly the visual vocabulary we’ve trusted blindly for decades. The video went viral because it looked so unspectacular. Only upon close analysis did it become clear: completely AI-generated.
Shortly after, the genre exploded: cats playing banjo at doorbells. Raccoons holding weddings on terraces. “CCTV Captures Cats Saving Kids From Bear Attacks.”
What emerged here was more than a trend. It was a mental tipping point. AI had learned that grainy, shaky footage doesn’t look suspicious — it looks authentic. Poor image quality was no longer a flaw but a stylistic device. The surveillance camera, once a symbol of documentary truth, became a theater backdrop.
2025: The Year of Surreal Everyday Life and Glass Fruit
The Uncanny Becomes a Meme
When reality can be perfectly imitated, creativity often flees into the grotesque. The “You Are What You Eat” trend showed anthropomorphic burgers and pizza people consuming themselves. These videos were intentionally wrong, deliberately uncanny — and successful precisely for that reason. They used the uncanny valley not as a bug, but as a feature.
In parallel, historical iconography became a playground. We saw the Mona Lisa eating pizza, “Leonardo Da Vinci Paints Mona Lisa,” or “Ancient Roman Empire AI video.” The absurd became the new normal.
In 2025, the ironic counter-movement followed: the “2004 Webcam Style.” Intentionally pixelated clips set to pop hits of the 2000s — a digital retro filter that flirted with its own artificiality. In a world of perfect 8K simulation, imperfection became a luxury.
Glass, Fruit, and the New Materiality
Then came the moment that condensed everything: AI videos of glass strawberries, grapefruits, and watermelons that sounded like crystal when cut, yet bled like real fruit.
These clips struck something deep. They combined physical impossibility with tactile logic. You could almost taste the shattering. It was ASMR for a world you cannot touch — Dalí’s melting clocks, but in 4K and slow motion. AI video was suddenly not just visual anymore, but sensory.
When Brands Dream — and Stumble
When the industry commercialized the technology, the debate turned emotional. Coca-Cola’s “Holidays Are Coming 2.0” from Christmas 2024 was technically flawless yet sharply criticized. Too polished. Too soulless. Many felt the clip was a betrayal of the nostalgic warmth of the original.
The Toys “R” Us “The Origin” spot showed the other extreme: a soft fever dream of memory. It made one thing clear: we accept AI more readily when it creates surreal worlds than when it tries to simulate our sacred memories.
The Tools of Change: AI Video Platforms in 2026
Today the market has sorted itself out. There are tools for every kind of dreaming:

- Generative core platforms: OpenAI Sora, Kling AI, Luma Dream Machine, Google Veo. They are the directors of the new visual world.
- Communicators: HeyGen and Synthesia have effectively demolished language barriers through perfect lip-syncing and video translation.
- Automators: InVideo AI and Pictory fuel the “faceless channels” on YouTube. Script in, video out. Efficient, but often as interchangeable as fast food (a minimal code sketch of such a pipeline follows the table below).

AI doesn’t replace a film crew here. It replaces the barrier to entry. That’s democratic, and it gives pause at the same time.
| Platform | Model / Basis | Free Option | Well Suited for Trend |
|---|---|---|---|
| ChatGPT | Sora 2 (OpenAI model) | Limited number for Plus users; free users via Microsoft Rewards / VPN | Identity variations & cinematic aesthetics |
| Google Gemini | Veo 3.1 (Google model) | Monthly AI credit (approx. 200 points) in Gemini Advanced | Lip sync & YouTube integration |
| Kling AI | Kling 2.0 | Daily free credits on login | Physical accuracy (e.g., liquids) |
| Luma Dream Machine | Luma / Sora 2 Preview | Limited monthly generations | Fake memory footage (VHS style) |
| HeyGen / Hedra | Proprietary specialized models | 1–5 free credits per month | Talking avatars & selfie dialogues |
| Runway | Gen-3 / Gen-4 | Starter credits for new sign-ups | World morphing & precise camera control |
| Grok (xAI) | Grok-3 / X-Video | Occasional free access for verified profiles; usually requires X Premium | Real-time news visuals & unfiltered satire |
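To make the “script in, video out” idea tangible, here is a minimal sketch of how such an automation pipeline works in principle. Everything in it is a placeholder: the endpoint, parameter names, and response fields are hypothetical and stand in for whatever each provider actually documents; no real API from the table above is represented here.

```python
import time
import requests

# Hypothetical endpoint and schema -- placeholders, not any real provider's API.
API_BASE = "https://api.example-video.ai/v1"
API_KEY = "YOUR_API_KEY"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}


def generate_video(script: str, duration_seconds: int = 10) -> str:
    """Submit a text script, wait for rendering, and return the video URL."""
    # Script in: create an asynchronous generation job.
    job = requests.post(
        f"{API_BASE}/generations",
        headers=HEADERS,
        json={"prompt": script, "duration_seconds": duration_seconds},
        timeout=30,
    ).json()

    # Video generation runs asynchronously on every major platform,
    # so we poll the job until it completes or fails.
    while True:
        status = requests.get(
            f"{API_BASE}/generations/{job['id']}", headers=HEADERS, timeout=30
        ).json()
        if status["state"] == "completed":
            return status["video_url"]  # video out
        if status["state"] == "failed":
            raise RuntimeError(status.get("error", "generation failed"))
        time.sleep(5)


if __name__ == "__main__":
    print(generate_video("A calm, cinematic walk through rain-soaked, neon-lit Tokyo."))
```

The pattern is broadly the same everywhere: submit a prompt, receive a job ID, poll until a video URL appears. The “faceless channel” tools essentially wrap this loop in scheduling and templating.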
AI Video Trends in 2026
In 2026, the character of AI videos has fundamentally changed. The technical wow effect has faded; what remains is the aesthetic stance. AI videos no longer need to impress — they need to resonate.
A central trend is identity-playing selfie transformations. From a single photo, alternative versions of the same person emerge: as a future self, as a historical figure, or as an inhabitant of a different reality. Particularly popular are dialogues between “present” and “future” selves, made possible by significantly improved facial consistency and precise lip sync.
In parallel, world-morphing is booming: your own room is transformed in real time into a jungle or a film set through video-to-video synthesis. While high-dopamine ultra-shorts deliver fast entertainment through AI-driven rhythm editing, faceless formats are gaining ground at the same time. Videos from the perspective of everyday objects (object-POV) or surreal material studies work purely sensorially and transcend language barriers.
Another emotional trend is Fake Memory Footage: AI-generated “memories” in VHS or webcam aesthetic play with nostalgia and show that atmospheric credibility often outweighs factual truth. Characteristic of 2026 are long, quiet takes and deliberate reduction. AI video is no longer a mere effect but a precise narrative tool, with human authenticity at its core.
Try It Yourself: Six Prompts That Show What AI Can Do
Prompt 1: Selfie-Based Identity Variations – “Another Possible Me”
What it’s about: From a neutral selfie, an alternative version of you emerges — not as a gag, but as a believable existence in a different reality.
Prompt: Use the uploaded selfie as a reference. Generate a calm, cinematic video of the same person in an alternative reality. The person stands still, looking slightly past the camera. Setting: muted light, realistic materials, believable architecture. No effects, no exaggeration. Camera movement: very slow push-in. Mood: reflective, calm, matter-of-fact. Duration: approx. 8–12 seconds.
(Selfie optional, but recommended)
What to observe: Whether the face remains stable throughout the sequence or subtly “drifts”; whether the scene feels deliberately calm rather than spectacular; and whether that quiet but compelling feeling arises: “That’s me — but somewhere else.” Works with: Sora (via ChatGPT), Runway (Gen-3/Gen-4), Veo (via Google Gemini)
Prompt 2: Selfie-Based Identity Variations – Identity Timelines
What it’s about: No dialogue, no text on screen — just two versions of the same person looking at each other. Time is told visually, not explained.
Prompt: Use the uploaded selfie as a reference. A calm, cinematic split-screen video showing two versions of the same person. Left: the present self. Right: the future self, naturally aged, only subtle changes. Both retain identical facial structure and a similar expression style. They don’t speak. They briefly look at each other, then look away again. The light is soft, neutral, and realistic. The camera remains completely still. Mood: reflective, intimate, quiet.
(Selfie optional, but recommended)
What to observe: How subtle the aging looks; whether both versions clearly remain the same person; and whether the silence is more powerful than words. Works with: Veo (via Google Gemini), Kling AI, HeyGen (simplified version)
Prompt 3: Object-POV – The World from an Object’s Perspective
What it’s about: The camera is no longer an observer — it’s an object quietly participating in everyday life.
Prompt: Generate a video from the perspective of a coffee cup on a kitchen table. Morning light falls through a window, dust particles visible. People move blurred in the background, remaining anonymous. The camera moves minimally with the object’s inertia. No cuts, no music — just visual stillness. Duration: approx. 10 seconds.
What to observe: Whether the camera movement feels physically plausible; whether the scene remains interesting despite lacking action; and whether a quiet intimacy emerges. Works with: Luma, Runway
Prompt 4: Material Study – “Impossible Material, Real Logic”
What it’s about: An object behaves in a materially contradictory way, but remains visually consistent.
Prompt: Show a strawberry made entirely of clear glass. A blade slowly cuts through it. The glass splinters audibly, while red juice simultaneously flows out. Macro shot, calm light, neutral setting. No camera cuts — focus on texture and movement. Duration: approx. 6–8 seconds.
What to observe: Whether the breaking, sound, and movement fit together coherently; whether the impossibility still feels logical within the scene; and whether the video thereby produces an almost tactile, “touchable” effect. Works with: Luma, Runway, Veo (via Gemini, if available)
Prompt 5: Memory Footage – “A Memory That Never Existed”
What it’s about: The AI generates a familiar memory that feels emotionally credible, independent of truth.
Prompt: Generate a video in the style of an old VHS home recording. Scene: summer afternoon, children running through a garden, blurry and casual. Colors slightly washed out, visible image noise, mild overexposure. The camera feels clumsily operated. No clear faces, no focus on individuals. Duration: approx. 10–15 seconds.
What to observe: Whether the video evokes nostalgia even though you know it’s fabricated; whether the scene feels more “remembered” than staged; and whether the blurriness enhances its credibility. Works with: Luma
Prompt 6: Quiet Narrative – “The Silence”
What it’s about: This trend is about omission. A single shot that tells a story through light and shadow, without much happening.
Prompt: Uninterrupted shot of an empty wooden chair in a dark room. A single beam of light slowly moves across the wood grain. Dust particles dance in the light. At the end of the video, a soft shadow of a person falls on the floor — someone you never see. Melancholic, aesthetic, still. Duration: approx. 10 seconds.
What to observe: The interplay of light and shadow and the subtlety of the dust particles. The emotional effect is created purely through composition. Works with: Google Gemini (Veo 3.1) or Sora 2
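If you would rather script these experiments than paste each prompt into a web UI, most of the platforms above also expose APIs. As one example, here is a sketch using Luma’s published Python SDK (the lumaai package); the method and field names follow its documentation at the time of writing and may have changed, so treat this as an assumption to verify against the current docs.

```python
import os
import time

from lumaai import LumaAI  # Luma's official Python SDK; pip install lumaai

# Assumes an API key from Luma's developer console in the environment.
client = LumaAI(auth_token=os.environ["LUMAAI_API_KEY"])

# Prompt 6 ("The Silence") from above, submitted programmatically.
generation = client.generations.create(
    prompt=(
        "Uninterrupted shot of an empty wooden chair in a dark room. "
        "A single beam of light slowly moves across the wood grain. "
        "Dust particles dance in the light. At the end, a soft shadow "
        "of a person falls on the floor. Melancholic, aesthetic, still."
    )
)

# Poll until rendering finishes, as in any asynchronous generation API.
while generation.state not in ("completed", "failed"):
    time.sleep(5)
    generation = client.generations.get(id=generation.id)

if generation.state == "completed":
    print(generation.assets.video)  # URL of the finished clip
else:
    print(f"Generation failed: {generation.failure_reason}")
```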
Conclusion: The End of Innocence
AI videos have not simply lied to us over these ten years. They have shown us how unconditionally we used to believe in images. The camera is not dead, but it is no longer innocent. The image is no longer proof — it is a claim.
Perhaps this is the most important insight of January 2026: it is not that the machines learned to make images, it is that we learned to read them anew. We no longer just look; we examine, we feel, we decide afresh what moves us. The question is no longer “Is this real?” but “Does it touch me?”
And now, if you’ll excuse me: I need to watch those raccoons celebrate their wedding on the terrace one more time. It’s generated. But damn, they look happy.
This article was created with the assistance of Claude.ai Sonnet 4.5. The embedded AI videos are linked via YouTube and were all generated using various specialized AI video generators.
Have you tested the prompts? What was your result — let me know in the comments.
