Mainstream adoption of generative AI technologies has, in large part, centered around the creation of text and images. But, as it turns out, the statistical techniques underpinning these models are just as capable of generating all manner of other media.
The latest example of this came on Monday when Google's AI lab DeepMind detailed its work on a video-to-audio model capable of generating sound to match video samples.
The model works by taking a video stream and encoding it into a compressed representation. This, alongside natural language prompts, acts as a guide for a diffusion model which, over the course of several steps, refines random noise into something resembling audio relevant to the input footage. This audio is then converted into a waveform and combined with the original video source.
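The pipeline described above — encode the video, condition a diffusion model on that encoding plus a text prompt, and iteratively refine noise into audio — can be sketched in miniature. DeepMind has not published its architecture in code, so everything here is illustrative: the function names, the toy "encoder," and the simple interpolation-style denoising step are all stand-ins, not the real system.

```python
import numpy as np

def encode_video(frames):
    # Stand-in encoder: average-pool each frame's pixels into a
    # compact per-frame embedding (a real system uses a learned network).
    return frames.mean(axis=(1, 2))

def denoise_step(latent, conditioning, step, total_steps):
    # Toy denoising update: progressively nudge the noisy latent
    # toward the conditioning signal as the schedule advances.
    weight = (step + 1) / total_steps
    return (1 - weight) * latent + weight * conditioning

def video_to_audio(frames, prompt_embedding, steps=10, seed=0):
    rng = np.random.default_rng(seed)
    video_embedding = encode_video(frames)
    # Combine visual features with the (optional) text-prompt embedding.
    conditioning = video_embedding + prompt_embedding
    latent = rng.normal(size=conditioning.shape)  # start from pure noise
    for step in range(steps):
        latent = denoise_step(latent, conditioning, step, steps)
    # A real system would decode this latent into a waveform with a
    # neural vocoder; here the refined latent stands in for the audio.
    return latent
```

The point of the sketch is the control flow, not the math: random noise is shaped, step by step, by a signal derived jointly from the video and the prompt, which is why the model can run with or without text guidance.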
As we understand it, this approach isn't all that different from how image generation models work, but, rather than emit pictures or illustrations, it's been trained to reproduce audio patterns from video and text inputs.
Here's one of several samples DeepMind released this week showing the model in action:
DeepMind says it used a variety of datasets which not only included video and audio, as you might expect, but AI-generated annotations and transcriptions to help teach the model to associate various visual events with different sounds. This, the researchers explained, means that the model can generate audio with or without a text prompt and doesn't require manual alignment of the tracks. However, there are still some hurdles to overcome.
For one, because of how the audio is generated, the quality of the soundtrack depends on the source material: if the video quality is poor, the audio is likely to be as well. Lip sync has also proven quite challenging, to put it politely.
DeepMind expects the new model to pair nicely with those designed for video generation, including its own in-house Veo model.
According to the DeepMind team, one of the problems with the current crop of text-to-video models is that they are usually limited to generating silent films. Paired with the new video-to-audio model, the team claims, entirely AI-generated videos, complete with soundtracks and even dialogue, become possible.
Speaking of video-gen models, the category has grown considerably over the past year with more players entering the space.
ML juggernaut OpenAI unveiled its own video-generation model called Sora back in February. But Sora is just one of several models pushing the envelope of what's possible.
Among these models is one from Kling AI. Developed by (partially state-owned) Chinese tech firm Kuaishou, Kling uses a combination of diffusion transformers to generate the frames and a "3D time-space attention system" to model motion and physical interactions within the scenes. Here's the system in action:
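Kuaishou's "3D time-space attention" phrasing suggests attention computed jointly over a video's temporal and spatial dimensions, rather than factorized into separate spatial and temporal passes. Kling's actual implementation is not public, so the following is only a minimal sketch of what joint spatiotemporal attention looks like: the shapes and single-head, no-projection design are simplifying assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatiotemporal_attention(tokens):
    # tokens: (T, H, W, D) video latents. Every time-space position
    # attends to every other, letting motion and spatial structure
    # be modeled in a single pass.
    T, H, W, D = tokens.shape
    x = tokens.reshape(T * H * W, D)      # flatten time and space together
    scores = x @ x.T / np.sqrt(D)         # scaled dot-product similarities
    out = softmax(scores, axis=-1) @ x    # attention-weighted mixture
    return out.reshape(T, H, W, D)
```

The design trade-off is cost: joint attention scales quadratically in T·H·W, which is presumably why longer, higher-resolution clips are the hard part.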
The results are videos that, if you don't look too closely, could easily be mistaken for real footage. On closer inspection, however, you'll quickly start to notice visual artifacts and incongruities. With that said, this seems to be a common theme with many of the AI video generators on the market today.
While details on Kling are scarce, its developers claim it's more capable than OpenAI's Sora. The model can supposedly produce videos up to two minutes in length at resolutions of 1080p and 30 frames per second. Unfortunately, access to the model is, for the moment, limited to China.
Another model builder working on video generation is Runway, which on Monday revealed its Gen-3 Alpha model. Runway has been shipping image and video generation models since early 2023.
According to Runway, Gen-3 Alpha is one of several models currently under development and was trained on a combination of videos and images paired with highly descriptive captions. According to the startup, this allowed them to achieve more immersive transitions and camera movements than was possible with previous models. Here's this one in action:
Created by Tan KW | Jun 27, 2024