ArabiaWeather - A team of scientists at Microsoft Research Asia has developed a new artificial intelligence model called VASA-1, which turns a photo of a person's face and an audio clip into a synchronized video with accurate, realistic lip movements, facial expressions, and head motion.
In their research paper, the team state that they present the VASA framework, which enables the creation of lifelike talking faces with appealing visual affective skills from a single image and a speech audio clip. Their first model, VASA-1, is distinguished by its ability to generate exquisite lip movements synchronized with the audio, in addition to capturing a wide range of nuanced facial expressions and natural head movements that contribute to the authenticity and liveliness of the video.
The team claims that their method not only delivers high video quality with realistic face and head dynamics, but also supports online creation of 512 x 512 videos at up to 40 frames per second with almost negligible latency.
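Microsoft has not released VASA-1's code, so the following Python sketch is purely illustrative of the interface described above: one face image and one speech clip in, a stream of 512 x 512 frames at 40 frames per second out. The function name generate_talking_head and its placeholder body are hypothetical stand-ins, not the actual model.

```python
# Illustrative sketch only: VASA-1 itself is not public. The generator below
# just resizes and repeats the source image; a real model would drive lip,
# eye and head motion from the audio instead.
import numpy as np

def generate_talking_head(face_image: np.ndarray,
                          audio_samples: np.ndarray,
                          sample_rate: int = 16_000,
                          fps: int = 40,
                          size: int = 512) -> np.ndarray:
    """Return an array of shape (num_frames, size, size, 3).

    num_frames follows from the audio duration and the 40 fps figure
    quoted in the article; the frame content here is a placeholder.
    """
    duration_s = len(audio_samples) / sample_rate
    num_frames = int(round(duration_s * fps))
    # Placeholder "animation": nearest-neighbour resize of the still image,
    # tiled across every frame.
    h, w = face_image.shape[:2]
    ys = np.arange(size) * h // size
    xs = np.arange(size) * w // size
    frame = face_image[ys][:, xs]
    return np.repeat(frame[None, ...], num_frames, axis=0)

if __name__ == "__main__":
    face = np.zeros((256, 256, 3), dtype=np.uint8)          # stand-in portrait
    audio = np.zeros(int(3.0 * 16_000), dtype=np.float32)   # 3 s of silence
    video = generate_talking_head(face, audio)
    print(video.shape)  # (120, 512, 512, 3): 3 seconds at 40 fps
```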
Microsoft has just launched VASA-1.
It can make a single image talk from nothing more than an audio clip, similar to Alibaba's EMO.
10 wild examples:
1. Mona Lisa rapping Paparazzi pic.twitter.com/74mZH9fTQO
- Adam (@Adamaestr0_) April 19, 2024
VASA, short for Visual Affective Skills Animator, is capable of creating realistic videos that accurately mimic human conversational behaviors.
The VASA model can create videos that look completely real, with “realistic talking faces” mirroring conversational behaviors through natural facial gestures, eye and head movements, all starting from a single static head image.
The team used the VoxCeleb2 dataset, which includes videos of thousands of real-life celebrities, to train their model.
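As a rough illustration, the sketch below enumerates training clips from a local copy of VoxCeleb2. The directory layout and pairing logic are assumptions for illustration only; VASA-1's actual training pipeline has not been published.

```python
# Assumes the commonly used VoxCeleb2 layout
#   <root>/mp4/idXXXXX/<video_id>/<clip>.mp4
# where each clip already contains the talking-head video and its audio track.
from pathlib import Path

def list_voxceleb2_clips(root: str) -> list[dict]:
    """Collect (speaker id, source video, clip path) records for every mp4 clip."""
    records = []
    for clip in sorted(Path(root).glob("mp4/id*/*/*.mp4")):
        records.append({
            "speaker": clip.parts[-3],        # e.g. "id00012"
            "source_video": clip.parts[-2],
            "path": str(clip),
        })
    return records

if __name__ == "__main__":
    clips = list_voxceleb2_clips("/data/voxceleb2")  # hypothetical local path
    speakers = {c["speaker"] for c in clips}
    print(f"{len(clips)} clips from {len(speakers)} speakers")
```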
Their model also proved able to handle diverse inputs outside its training distribution, such as artistic images and non-English speech.
While the model's capabilities raise impersonation concerns, the scientists stress that their goal is to develop visual affective skills for virtual characters, not to impersonate anyone in the real world.
Microsoft confirms that there are currently no plans to release the code supporting the model, and aims to use the technology responsibly and in accordance with appropriate regulations in the future.