Google Unveils Major Updates to Generative AI Media Models on Vertex AI

Google has announced a series of significant updates to its generative AI models for media, available through its Vertex AI cloud platform.

Lyria: Text-to-Music Model Now in Beta

One of the most prominent updates is the release of Lyria, Google’s AI model specialized in transforming text into music. The model is now available in beta for a select group of users. Lyria is being positioned as a practical alternative to royalty-free music libraries, allowing users to compose musical pieces across a wide range of genres—from mellow jazz to simple electronic tracks.

Veo 2: Enhanced Video Generation

Google’s video generation model, Veo 2, has received significant improvements. New tools now allow for greater flexibility in editing and customizing visual effects, offering users more creative control over their video content.

Chirp 3: Advanced Voice Cloning and Multilingual Support

The company also launched a voice cloning feature powered by Chirp 3, its latest model for audio generation and understanding. Chirp 3 can generate speech in up to 35 languages and supports Instant Custom Voice, a feature that can mimic a person’s voice from just a 10-second audio sample. This functionality is now available to all users.

Imagen 3: Improved Image Generation Quality

The Imagen 3 image generator has been updated to significantly enhance the quality of its outputs. According to Google, the improvements result in clearer, more detailed, and visually appealing images.

New Tool: Transcription with Diarization

In addition, Google introduced a new tool currently in preview called Transcription with Diarization. This tool can identify and separate voices in recordings with multiple speakers, improving the accuracy of text transcriptions derived from audio content.
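
To make the idea of diarization concrete, here is a minimal Python sketch of one downstream step: merging speaker-labeled words into readable turns. The input shape (a list of speaker label and word pairs), the sample data, and the function name group_into_turns are all assumptions made for illustration. The sketch does not call Vertex AI, and it is not Google's response format or method.

```python
from itertools import groupby


def group_into_turns(tagged_words):
    """Merge consecutive words that share a speaker label into one turn.

    tagged_words: list of (speaker_label, word) pairs in spoken order.
    This input shape is assumed for illustration only.
    """
    turns = []
    # groupby only merges adjacent items, so a speaker who talks again
    # later in the recording gets a separate turn.
    for speaker, group in groupby(tagged_words, key=lambda item: item[0]):
        text = " ".join(word for _, word in group)
        turns.append((speaker, text))
    return turns


# Hypothetical diarized output for a two-person recording.
sample = [
    ("Speaker 1", "Hello"),
    ("Speaker 1", "there."),
    ("Speaker 2", "Hi,"),
    ("Speaker 2", "how"),
    ("Speaker 2", "are"),
    ("Speaker 2", "you?"),
    ("Speaker 1", "Fine."),
]

for speaker, text in group_into_turns(sample):
    print(f"{speaker}: {text}")
```

Grouping only adjacent words keeps the order of the conversation intact, which is what makes a diarized transcript easier to follow than a single undivided block of text.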