Have you ever seen an AI-generated video where things just felt… off? Maybe a person’s shirt suddenly changes color, or the background warps for no reason. This usually happens because AI video models struggle to keep track of what they have already generated, especially in longer videos. Now researchers from ByteDance (the company behind TikTok) and Stanford University have unveiled a technique designed to fix that.
It’s called Mixture of Contexts (MoC), and it’s a clever new way for AI to create minute-long videos that stay consistent, at roughly the cost of generating a short clip.
Let’s break down what this means in simple terms.
The Big Problem: AI’s Short-Term Memory
Think of an AI creating a video like an artist painting a long mural. The artist starts at one end, and by the time they get to the other, they might forget the exact shade of blue they used for the sky at the beginning.
AI models face a similar challenge. To create the next frame of a video, the AI needs to look back at the previous frames to keep everything consistent. For a short, 3-second clip, this is manageable. But a 60-second video contains well over a thousand frames, and because every part of the video is compared against every other part, the work grows with the square of the video’s length rather than in step with it. Analyzing every detail from every previous frame quickly becomes very demanding on computer memory and processing power.
This “memory bottleneck” is the main reason why:
- AI videos are often very short.
- Longer videos tend to have weird errors (like a character’s face changing or objects disappearing).
- Generating high-quality video is extremely expensive and requires a lot of high-end computing hardware.
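To see why length hurts so much, here is a rough back-of-the-envelope sketch. It assumes dense self-attention, where every token (a small piece of the video) is scored against every other token. The `TOKENS_PER_SECOND` figure is a made-up number chosen only for illustration, not a value from the researchers.

```python
# Hypothetical figure for illustration only; real models differ.
TOKENS_PER_SECOND = 3_000


def attention_pairs(num_tokens: int) -> int:
    """Number of query-key pairs that dense self-attention has to score."""
    return num_tokens * num_tokens


def cost_ratio(long_seconds: int, short_seconds: int) -> float:
    """How many times more pairs the longer video needs than the shorter one."""
    long_pairs = attention_pairs(long_seconds * TOKENS_PER_SECOND)
    short_pairs = attention_pairs(short_seconds * TOKENS_PER_SECOND)
    return long_pairs / short_pairs


# A 60-second video is 20x longer than a 3-second clip,
# but dense attention has 20 * 20 = 400x more pairs to score.
print(cost_ratio(60, 3))  # 400.0
```

The exact token count does not matter here: it cancels out of the ratio, which depends only on how much longer the video is.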
“ByteDance Seed and Stanford introduce Mixture of Contexts (MoC) for long video generation, tackling the memory bottleneck with a novel sparse attention routing module. It enables minute-long consistent videos with short-video cost.” pic.twitter.com/JHCSQ81FWJ
Source: DailyPapers (@HuggingPapers), August 31, 2025
The Solution: A Smarter Way to Remember
The researchers at ByteDance and Stanford realized that the AI doesn’t need to remember everything. Just like our brains, it only needs to recall the most important details.
Their new method, Mixture of Contexts (MoC), basically gives the AI a smart search engine for its own memory. Instead of re-reading the entire “story” of the video so far, the AI now just asks itself: “What are the most important things I need to know to create the very next frame?”
Here’s how it works:
- Identify Key Information: The video generated so far is split into chunks, or “contexts.” A context could be a group of frames, a single shot, or the text prompt that describes the scene.
- Smart Retrieval: When generating a new part of the video, the MoC system scores the available contexts and retrieves only the few that are most relevant. The rest are skipped. This is the “sparse attention routing” idea.
- Putting it Together: The AI then uses this small, focused set of information to create the next frame, which helps it stay consistent with what came before.
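The three steps above can be sketched in a few lines of plain Python. This is a toy illustration, not the researchers’ implementation: the fixed `chunk_size`, the averaged-key summary of each chunk, the dot-product relevance score, and the rule of always keeping the most recent chunk are all assumptions made to keep the example small.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Average a chunk's keys into one summary vector (assumed descriptor)."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def select_chunks(query: Vector, keys: List[Vector],
                  chunk_size: int, top_k: int) -> List[int]:
    """Steps 1 and 2: split the keys into chunks, keep the top_k most relevant."""
    chunks = [keys[i:i + chunk_size] for i in range(0, len(keys), chunk_size)]
    scores = [dot(query, mean_vector(chunk)) for chunk in chunks]
    ranked = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
    chosen = set(ranked[:top_k])
    chosen.add(len(chunks) - 1)  # assumption: always keep the most recent chunk
    return sorted(chosen)


def sparse_attention(query: Vector, keys: List[Vector], values: List[Vector],
                     chunk_size: int, top_k: int) -> Vector:
    """Step 3: ordinary softmax attention, restricted to the selected chunks."""
    positions = [
        i
        for c in select_chunks(query, keys, chunk_size, top_k)
        for i in range(c * chunk_size, min((c + 1) * chunk_size, len(keys)))
    ]
    logits = [dot(query, keys[i]) / math.sqrt(len(query)) for i in positions]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    return [
        sum(w * values[i][d] for w, i in zip(weights, positions)) / total
        for d in range(len(values[0]))
    ]


# Eight toy keys in four chunks of two; each value is just its own position.
keys = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.0, 1.0],
        [-1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.1, 0.9]]
values = [[float(i)] for i in range(len(keys))]
query = [1.0, 0.0]

print(select_chunks(query, keys, chunk_size=2, top_k=1))  # [0, 3]
print(sparse_attention(query, keys, values, chunk_size=2, top_k=1))
```

In this toy run the query only ever looks at four of the eight keys: the chunk that best matches it and the most recent one. The other half of the history is never scored in detail, which is where the savings come from.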

This approach is very efficient. The researchers report that their MoC model skips over 85% of the earlier context, which cuts the computation needed for attention by roughly a factor of 7.
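The two figures are consistent with each other, as a quick sanity check shows. The sketch below assumes that attention cost is simply proportional to the fraction of context that is kept, which is a simplification: it ignores the cost of choosing the contexts and everything else the model does.

```python
def attention_reduction(skipped_fraction: float) -> float:
    """Speed-up factor if cost scales with the fraction of context kept."""
    if not 0.0 <= skipped_fraction < 1.0:
        raise ValueError("skipped_fraction must be in [0, 1)")
    return 1.0 / (1.0 - skipped_fraction)


# Skipping 85% of the context leaves 15% of the work: about 6.7x less,
# which is in line with the "factor of 7" figure.
print(round(attention_reduction(0.85), 2))  # 6.67
```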
What This Breakthrough Means for Everyone
This isn’t just a small technical update; it’s a giant leap forward for creative AI.
- Longer, High-Quality Videos: The goal is AI-generated videos that run a minute or more while characters stay consistent and scenes hold together from start to finish.
- Drastically Lower Costs: Because the process is so much more efficient, creating these long videos now costs about the same as making a very short clip used to. This makes the technology accessible to more creators and businesses.
- The Future of Content: This opens the door to AI-assisted filmmaking, dynamic video advertisements, and new forms of storytelling that were previously impossible.
ByteDance has been pushing the boundaries of making AI more efficient. Alongside MoC, their researchers also developed UltraMemV2, a powerful AI “memory network” designed for understanding very long pieces of information (like an entire book) with much less effort. Both of these innovations show a clear focus on making AI smarter and more practical for real-world use.
In short, the era of short, glitchy AI videos is coming to an end. Thanks to Mixture of Contexts, we are on the cusp of seeing longer, more believable, and more creative AI-generated content than ever before.