Video | Mona Lisa raps...but how?

Written By ندى ماهر عبدربه on 2024/04/25

This article was written originally in Arabic and is translated using a 3rd party automated service. ArabiaWeather is not responsible for any grammatical errors whatsoever.

ArabiaWeather - A team of scientists at Microsoft Research Asia has developed a new artificial intelligence model called VASA-1, which turns a still image of a person's face and an audio clip into a video with accurately synchronized lip movements, realistic facial expressions, and natural head movements.

In a research paper, the team said it had introduced the VASA framework, which generates lifelike talking faces with appealing visual affective skills from a single image and a speech audio clip. The first model, VASA-1, produces lip movements that are precisely synchronized with the audio, and it also captures a wide range of facial nuances and natural head movements that contribute to the authenticity and liveliness of the video.

The team claims that their method not only delivers high video quality with realistic face and head dynamics, but also supports online creation of 512 x 512 videos at up to 40 frames per second with almost negligible latency.

A singing Mona Lisa and fears of impersonation

VASA stands for “Visual Affective Skills Animator.” The framework can create videos that realistically mimic human conversational behavior.

The VASA model can create videos that look completely real, with “realistic talking faces” mirroring conversational behaviors through natural facial gestures, eye and head movements, all starting from a single static head image.

The team used the VoxCeleb2 dataset, which includes videos of thousands of real-life celebrities, to train their model.

The model can also handle inputs outside its training domain, such as artistic images and non-English speech.

While the model's capabilities raise concerns about impersonation, the scientists stress that their goal is to develop visual affective skills for virtual characters, not to impersonate anyone in the real world.

Microsoft confirms that there are currently no plans to release the code supporting the model, and aims to use the technology responsibly and in accordance with appropriate regulations in the future.

Sources:

Interesting Engineering
