A new flagship video and audio generation system, Sora 2, has been announced as the successor to the original Sora 1 model. This updated model introduces several new capabilities, including simultaneous audio generation, enhanced physics simulation, and a personalized content feature named "Cameo." The model is being released through a new mobile application, the Sora app, designed as a platform for creating and sharing AI-generated content.
Sora 2: A Multimodal Video and Audio Generation System
The announcement positions Sora 1 as the "GPT-1 moment for video generation," a point where simple behaviors like object permanence began to emerge from scaled-up pre-training. Sora 2 is presented as the next significant advancement in model capability.
Sora 2 is presented as a general-purpose system that generates video and its corresponding audio at the same time. This multimodal capability extends to several types of audio, including:
- Dialogue in a variety of languages across multiple speakers.
- Sound effects.
- General soundscapes.
The announcement video also highlights that Sora 2 has a wide dynamic and stylistic range, producing content that spans realism, anime, and cartoons.
Enhanced Steerability and Physical World Understanding
Sora 2 reportedly marks a significant improvement in rendering motion, physics, and body mechanics. The system is described as more robust at handling complex dynamics and collisions, such as an Olympic gymnastics routine or a backflip on a wakeboard, in a way that appears natural.
The "steerability" of the model has also been improved. While previous video generation systems often required a shot-by-shot approach, Sora 2 is stated to be more capable of telling longer, more coherent stories with multiple shots within a single generation. This allows for the creation of more complex narratives in one go.
Introducing the "Cameo" Feature for Personalized Content
A unique feature introduced with Sora 2 is "Cameo," which allows users to insert a specific person, pet, or object into any Sora-generated environment. According to the presentation, this capability emerges from the system's world simulation models.
The process works by having the model observe a short video clip of the subject. From that clip, the model is said to learn the subject well enough that it can be injected into any prompt as if it were a text token.
To ensure user control and prevent impersonation, setting up a Cameo requires a specific flow. A user must record a dynamic audio prompt and pass a liveness check, which involves moving their head in specified directions. This data is validated to confirm the user's identity. Once a Cameo is approved, the user has full control over who can use their likeness, with permission settings including "Only I can use my cameo," "people I approve," "mutuals," and "everyone." Users are also granted ownership rights over any video created with their authorized Cameo, including the ability to delete it.
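The four permission settings described above can be pictured as a simple access check. The following is a minimal sketch for illustration only, not the Sora app's actual implementation: the names `CameoAudience`, `CameoSettings`, and `can_use_cameo` are hypothetical, and it assumes "mutuals" means users who follow the owner and are followed back.

```python
from dataclasses import dataclass, field
from enum import Enum


class CameoAudience(Enum):
    """Hypothetical labels for the four permission settings named in the announcement."""
    ONLY_ME = "only_me"
    APPROVED = "people_i_approve"
    MUTUALS = "mutuals"
    EVERYONE = "everyone"


@dataclass
class CameoSettings:
    owner: str
    audience: CameoAudience = CameoAudience.ONLY_ME
    approved: set = field(default_factory=set)  # users the owner has approved
    mutuals: set = field(default_factory=set)   # assumed: users in a mutual-follow relationship


def can_use_cameo(settings: CameoSettings, requester: str) -> bool:
    """Return True if `requester` may use the owner's likeness in a generation."""
    if requester == settings.owner:
        return True  # the owner can always use their own cameo
    if settings.audience is CameoAudience.EVERYONE:
        return True
    if settings.audience is CameoAudience.APPROVED:
        return requester in settings.approved
    if settings.audience is CameoAudience.MUTUALS:
        return requester in settings.mutuals
    return False  # ONLY_ME: nobody but the owner


settings = CameoSettings(owner="ana", audience=CameoAudience.APPROVED, approved={"ben"})
print(can_use_cameo(settings, "ben"))  # approved user
print(can_use_cameo(settings, "cy"))   # not approved
```

Defaulting to the most restrictive setting is a design choice made for this sketch; the announcement does not state which setting is the default.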
The Sora App: A New Platform for AI-Generated Content
Sora 2 is being launched within the Sora app, a new product described as a social platform where all content is created by its human users with AI. The interface is designed to be familiar to social media users, with profiles and the ability to follow other users.
A key function of the app is the "remix" feature, which allows a user to take an existing video and generate their own variation of it by providing a new prompt. The app's feed is designed to heavily prioritize "connected content" from a user's friends and family. A beta feature is also in development that will allow users to guide the feed's content based on a chosen mood, such as "relaxing" or a desire to see "cute animals."
User Control, Safety, and Content Provenance
The platform incorporates several measures for safety and content identification. To ensure content is clearly labeled as AI-generated when it leaves the platform, several provenance techniques are used:
- Videos are visibly watermarked upon export.
- The C2PA standard is utilized.
- Internal techniques are in place to trace generations back to the Sora platform.
The system employs reasoning models to make it difficult to create harmful, X-rated, or violent content, which is noted as particularly important for the Cameo feature. For users under 18, the app applies a separate set of policies: infinite scroll is off by default and is replaced by a stopping period with a cool-down. For adults, the app will provide nudges that encourage creation if a user is detected to be in a "doom scroll loop."
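One way to picture a feed that stops and then cools down, rather than scrolling indefinitely, is a small counter with a lockout. This is a hedged sketch, not the app's actual mechanism: the class `FeedPacer`, the limit of 20 videos, and the 300-second cool-down are all hypothetical values chosen for illustration, since the announcement gives no numbers.

```python
from dataclasses import dataclass


@dataclass
class FeedPacer:
    """Hypothetical pacing rule: after `max_videos` views, pause the feed for a while."""
    max_videos: int = 20             # assumed limit, not from the announcement
    cooldown_seconds: float = 300.0  # assumed cool-down length, not from the announcement
    viewed: int = 0
    locked_until: float = 0.0

    def record_view(self, now: float) -> bool:
        """Return True if a video may be shown at time `now`, False during a cool-down."""
        if now < self.locked_until:
            return False
        self.viewed += 1
        if self.viewed >= self.max_videos:
            # Limit reached: reset the counter and start the cool-down.
            self.viewed = 0
            self.locked_until = now + self.cooldown_seconds
        return True


pacer = FeedPacer(max_videos=2, cooldown_seconds=10.0)
print(pacer.record_view(0.0))   # True: first video
print(pacer.record_view(1.0))   # True: second video, cool-down starts
print(pacer.record_view(5.0))   # False: still cooling down
print(pacer.record_view(11.0))  # True: cool-down has ended
```

Passing the current time in as an argument, instead of reading a clock inside the method, keeps the sketch deterministic and easy to test.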
Rollout and Future Availability
The Sora app is scheduled for an initial launch on iOS for users in the U.S. and Canada. The rollout will be invite-based: each user admitted from the waitlist receives four invite codes to share with friends.
In addition to the new app, the existing sora.com web application will be updated with the Sora 2 model. A new "storyboard" feature, allowing for more granular shot-by-shot control, is also planned for the web app. Furthermore, an API for Sora 2 is expected to be launched in the coming weeks to allow developers to integrate the technology into their own applications and video editors.
