Video | Mona Lisa raps...but how?

Written By ندى ماهر عبدربه on 2024/04/25

This article was written originally in Arabic and is translated using a 3rd party automated service. ArabiaWeather is not responsible for any grammatical errors whatsoever.

ArabiaWeather - A team of scientists at Microsoft Research Asia has developed a new artificial intelligence model called VASA-1, which turns a single image of a person's face and an audio clip into a video with synchronized lip movements, facial expressions, and head movements that look accurate and realistic.

In a research paper, the team presents the VASA framework, which generates lifelike talking faces with appealing visual affective skills from a single static image and a speech audio clip. The first model, VASA-1, produces lip movements that are precisely synchronized with the audio and also captures a wide range of facial nuances and natural head movements that contribute to the authenticity and liveliness of the video. The team claims that the method not only delivers high video quality with realistic face and head dynamics, but also supports online generation of 512 x 512 videos at up to 40 frames per second with negligible starting latency.

<blockquote class="twitter-tweet" data-media-max-width="560" style=";text-align:left;direction:ltr"> Microsoft has just released VASA-1. It lets a single image speak from a source audio clip. Similar to Alibaba's EMO. 10 wild examples: 1. Mona Lisa rapping Paparazzi <a href="https://t.co/74mZH9fTQO">pic.twitter.com/74mZH9fTQO</a> - Adam (@Adamaestr0_) <a href="https://twitter.com/Adamaestr0_/status/1781395640565530633?ref_src=twsrc... 19, 2024</a> </blockquote>

<h2 style=";text-align:left;direction:ltr"> A singing Mona Lisa and fears of impersonation</h2>

VASA stands for “Visual Affective Skills Animator.” Starting from a single static head image, the model can create videos that look completely real, with “realistic talking faces” that mirror human conversational behavior through natural facial gestures and eye and head movements.

The team trained the model on the VoxCeleb2 dataset, which includes videos of thousands of real-life celebrities. The model also handles diverse inputs outside the training domain, such as artistic images and non-English speech.

While the model's capabilities raise impersonation concerns, the scientists stress that their goal is to develop visual affective skills for virtual characters, not to impersonate anyone in the real world. Microsoft confirms that there are currently no plans to release the code behind the model, and says it aims to use the technology responsibly and in accordance with appropriate regulations in the future.

<hr /> Sources: <a href="https://interestingengineering.com/">Interesting Engineering</a>
