A newly published Microsoft patent application covers the use of AI-generated music, soundtracks, and audio across a wide range of media, including movies, video games, live recordings, and related fields. The patent is titled ‘ARTIFICIAL INTELLIGENCE MODELS FOR COMPOSING AUDIO SCORES’ and was published on 17th November 2022. The applicant is MICROSOFT TECHNOLOGY LICENSING, LLC.

The patent’s description explains how real-time audio can be generated by an AI system trained on large datasets, using machine learning techniques that draw on visuals, audio, and text (prompts). You can read the exact description of the patent below.

A method for training one or more AI models for generating audio scores accompanying visual datasets includes obtaining training data comprising a plurality of audiovisual datasets and analyzing each of the plurality of audiovisual datasets to extract multiple visual features, textual features, and audio features. The method also includes correlating the multiple visual features and textual features with the multiple audio features via a machine learning network. Based on the correlations between the visual features, textual features, and audio features, one or more AI models are trained for composing one or more audio scores for accompanying a given dataset.

According to the patent, this technology would let the system generate audio in real time based on the situation; in simpler terms, it would produce dynamic or adaptive audio. Taking video games as an example, this is interesting because the technology could tailor the experience to each player based on their choices and circumstances in the game.

Microsoft’s new AI for audio has the potential to go far beyond the conventional use of dynamic or adaptive music in games. Player actions could be scored in real time with fitting audio cues and music. As a result, the audio experience would differ from person to person.

Today, video games and movies rely on pre-recorded background scores and audio, composed for situations that were decided in advance. Still, video games already use more AI than movies do, whether in how players interact with NPCs or in basic dynamic audio that responds to the player’s movements.

Movies, on the other hand, are more rigid than video games: every aspect of a movie is decided and recorded in advance, and nothing changes in real time for the audience. That is why this technology could prove revolutionary for the media industry, making players and audiences feel more invested and immersed than ever in what they consume.

Realistically, it is not that far-fetched either. AI technology has evolved significantly in recent years, from simply powering targeted advertisements to generating ultra-realistic photos and videos from a single line of text. Sooner or later, it is likely to be used across the media industry to automate lengthy processes.

It will be fascinating to hear AI-generated soundtracks composed in real time. So, what are your thoughts on this? Are you looking forward to experiencing something like it? Let us know in the comment section below.


