Is OpenAI’s Text-to-Video Model Sora the Future of Content Creation?

By: Alyssa Miller | Published: Feb 16, 2024

The U.S.-based artificial intelligence (AI) research organization OpenAI is teasing its latest AI project, Sora. The AI company says Sora “can create realistic and imaginative scenes from text instructions.”

Rather than generating still images like a text-to-image AI, Sora creates photorealistic videos from a user’s text prompt. How exactly does this work? And when will Sora go live for the public? Let’s get into it.

OpenAI Is Training AI to Make Videos

The company behind AI innovations like ChatGPT and DALL-E is ready to tackle the world of content creation and, perhaps, cinema. In a blog post, OpenAI announced that it is “teaching AI to understand and simulate the physical world in motion, with the goal of training models that help people solve problems that require real-world interaction.”

White OpenAI logo against a multi-colored background

Source: OpenAI

OpenAI says its research on the text-to-video model, known as Sora, is now far enough along to gather feedback from others in the AI community. But what is Sora?

What Is Sora? 

Sora is said to be capable of creating “complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background,” according to OpenAI’s introductory blog post.

A spaceman with a red helmet running across a snowy landscape toward a spaceship

Source: OpenAI

OpenAI notes that the text-to-video model can understand how objects “exist in the physical world,” and can “accurately interpret props and generate compelling characters that express vibrant emotions.”

What Else Can Sora Do? 

Sora can do more than create a video from a text prompt. Users can also give it a still image to bring to life, have it fill in missing frames in an existing video, or extend that video. Demo clips in OpenAI’s blog post show off just how impressive the results can be.

Three green animated fish swimming in the ocean

Source: OpenAI

While Google and Runway already have text-to-video projects, Sora stands out for its photorealism and its ability to produce clips up to a minute long, which appears to be longer than most competing models can manage.

How Does Sora Work? 

Wired got an early look at Sora to see how the text-to-video model stacks up against its stiff competition. While OpenAI didn’t allow journalist Steven Levy to enter his own prompts, the company did share four clips rendered by Sora.

A man with a white t-shirt reading a book in a cloud against a blue sky

Source: OpenAI

Despite the impressive opening shots Sora can create, the longest clip shown to Levy was only 17 seconds.

Rendering with Sora Takes About as Long as a Lunch Break

The researchers behind Sora did not tell Levy exactly how long it takes to render a video from a prompt. However, they did offer a ballpark estimate: a user could go out for a burrito and come back to a finished video.

Two mammoths running through the snow toward camera

Source: OpenAI

While this is impressive, there are some limitations to the AI model that are already apparent.

The Limitations of Sora

According to Levy, Sora is not perfect (then again, who or what is?). On first watch, the clips look great, but once the novelty of the new tech wears off, you start to see the flaws.

A lighthouse on a coast as waves break against the rocks below

Source: OpenAI

In one clip set on a Tokyo street, the virtual camera seems to hit a dead end, much like the sidewalk that background pedestrians appear to walk off of. It’s a mild glitch, but it breaks the photorealism of the scene.

The AI Face Problem Persists

Levy notes that Sora seems to avoid close-ups of any generated characters other than the main ones. That is a problem, because the close-up, a shot that tightly frames a person or object, is a powerful tool for filmmakers: it reveals the nuances of a character’s emotions.

A man with a beard wearing a red helmet and a spacesuit looking at the camera

Source: OpenAI

If OpenAI boasts that Sora can “generate compelling characters that express vibrant emotions,” then it should be able to do so in close-up.

Sora Is Learning How to Do Some Things On Its Own

Despite Sora’s shortcomings, the model has picked up some surprising skills from its training data. In one clip that depicts “an animated scene of a short fluffy monster kneeling beside a red candle,” Sora created a Pixar-esque monster with the kind of complex fur texture that Pixar made a big deal about when “Monsters, Inc.” debuted in 2001.

An animated monster reaching its hand toward a burning red candle

Source: OpenAI

“It learns about 3D geometry and consistency,” says Tim Brooks, a research scientist on the project. “We didn’t bake that in—it just entirely emerged from seeing a lot of data.”

Sora Is Understanding Cinematic Language  

Built on a version of the diffusion model behind DALL-E 3 combined with the transformer-based engine behind GPT-4, Sora has learned how to build a narrative through camera angles and pacing. One thing is becoming clear: Sora is starting to understand and even master cinematic language.

A stack of TVs in a museum playing different channels

Source: OpenAI

“There’s actually multiple shot changes—these are not stitched together, but generated by the model in one go,” Bill Peebles, another researcher on the project, says. “We didn’t tell it to do that, it just automatically did it.”

Does Sora Use Others’ Work to Create Videos?

Another potential issue that has caused problems for AI text-to-image models in the past is copyright infringement. “The training data is from the content we’ve licensed and also publicly available content,” says Peebles.

An animated squirrel looking at a hand pointing to a group of small fairies on a toadstool

Source: OpenAI

However, several lawsuits against OpenAI question whether using “publicly available” copyrighted content to train AI qualifies as fair use.

When Will Sora Be Available? 

Currently, Sora is only available to “red teamers,” people who assess the model for potential harms and risks. Some visual artists, designers, and filmmakers are also testing Sora to provide feedback to the OpenAI team.

Two pirate ships rocking back and forth in a sea of coffee

Source: OpenAI

At the time of writing, there is no set date for Sora’s public release. Whenever it arrives, Sora could bring both interesting opportunities and new risks for content creators.
