Earlier this year, Spotify introduced clickable cards for its digital audio ads, enabling users to take action after hearing promotional content. Considering that audio has long been a one-way street, where people can only listen without being able to participate, this is a considerable step toward more interactive audio.

Currently, the average person in the United States consumes 97 minutes of audio a day. That’s a significant amount of time to speak to, and with, users. However, it is only recently that brands have recognized this potential. One of the factors helping them do so is synthetic audio, where text-to-speech services generate human-sounding voiceovers that can be edited and deployed on a mass scale. Already, the synthetic voice market is valued at $14.8B and is projected to grow to as much as $36B by 2025.

Synthetic audio also means that brands can develop numerous versions of the same advert, allowing them to personalize their audio and test which version best resonates with certain demographics. The automated nature of the tech lets brands tap into events in real time, creating audio specifically for a current news story or sporting event. But there’s still a long way to go before brands can leverage audio in this manner. Here’s what needs to happen for interactive audio to become the norm in advertising.

AI will drive more versioned, personalized audio ads

The visual medium of advertising is full. How many times have you visited a website and been overwhelmed by an avalanche of ads competing with one another and with on-screen elements?

Audio is an increasingly attractive medium for brands to advertise in because it can be delivered in a non-confined space. It’s also hands-free, so it’s more flexible for users and blends easily into people’s daily routines, for example while they listen to a morning podcast or a playlist at the gym. Integrating into people’s habits makes audio a more sustainable way to connect with listeners, especially when the content is tailored to a particular listener and delivered at the right time.

At the moment, the absence of versioning infrastructure in audio advertising prevents effective personalization. Brands cannot schedule ads for a certain time of day, nor can they alter the speaker, music track, or words depending on the listener. However, using AI-powered synthetic voice, companies can make multiple versions of the same promotion. For example, a pizza franchise offering customers a discount could use a spreadsheet to produce a different audio version for each location, day of the week, and time window in which the offer should run. The company would then geo-target the ads to people who live close to its franchise locations.
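
As a rough illustration of the spreadsheet idea, the sketch below expands each row of a small sheet into one ad script per location, day, and time window. Everything in it is an assumption made for the example: the column names, the sample rows, the script template, and the `build_versions` helper are hypothetical, and the step of sending each script to a text-to-speech service is left out.

```python
import csv
import io
from itertools import product

# Hypothetical spreadsheet export: one row per franchise location.
# Multiple days or hour windows in a cell are separated by ";".
SHEET = """location,discount,days,hours
Brooklyn,20%,Monday;Tuesday,17-19
Austin,15%,Friday,11-13;18-20
"""

# Hypothetical ad copy; a real campaign would supply its own wording.
TEMPLATE = (
    "Hey {location}! This {day}, get {discount} off any pizza "
    "between {start}:00 and {end}:00."
)


def build_versions(sheet_text):
    """Return one ad version per (location, day, hour window) combination."""
    versions = []
    for row in csv.DictReader(io.StringIO(sheet_text)):
        days = row["days"].split(";")
        windows = row["hours"].split(";")
        for day, window in product(days, windows):
            start, end = window.split("-")
            versions.append({
                "location": row["location"],   # used for geo-targeting
                "day": day,                    # used for scheduling
                "start_hour": int(start),
                "end_hour": int(end),
                "script": TEMPLATE.format(
                    location=row["location"],
                    day=day,
                    discount=row["discount"],
                    start=start,
                    end=end,
                ),
            })
    return versions


versions = build_versions(SHEET)
for version in versions:
    print(version["script"])
```

Each resulting entry carries both the text to be voiced and the targeting fields (location, day, hours), so a single spreadsheet edit would change the whole set of versions rather than one hand-recorded ad.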

Audio ads deliver more value

The pandemic has spurred fresh demand for brands to offer genuine value to their customers. People are more selective with their time and attention, and advertising that doesn’t – quite literally – speak to users’ personalities and interests may as well be white noise.

Versioned audio ads can tap into people’s preferences more specifically. For instance, a hip-hop enthusiast could have hip-hop tracks in the background, tying the brand experience closer to the listener. Likewise, after lockdowns and COVID-19 restrictions, people are eager to spend more time outside and are keen to learn through different mediums. Audio can leverage this shift because it’s more dynamic and mobile than traditional advertising and can rely on the likelihood of the listener being in nature or using headphones.

Synthetic voice ads can also address ethical dilemmas that have arisen about fair use terms for voice actors in the digital age. In verticals like gaming, the tech can scale voice actors’ content, giving them royalties and recognition, and greater access to passive income streams. Bev Standing, the voice actor who filed a lawsuit against TikTok for using her voice without permission, uses our AI tool exactly for this reason.

Audio contributes to more inclusive content

Accessible content is more important than ever before as people with diverse abilities need to use digital tools. Websites, apps, and experiences that aren’t designed with all people in mind automatically restrict their audience pools. In fact, Americans with disabilities are three times more likely to say that they never go online than people without a disability. 

Audio advertising can make experiences more inclusive by serving people who have visual impairments. For example, audio-described ads have been around for a while, where a narrator describes the non-verbal action while the original audio is played at a slightly lower level. Moving forward though, audio can do more than simply market products. 

Research is being conducted into audio content designed in the range of frequencies that people suffering from tinnitus are unable to hear. Over time, if sufferers are exposed to the audio, they may regain sensitivity to those sound frequencies. If the research is successful, there’s huge potential for the likes of hearing aid brands to use the audio in their advertising strategies.

Accessible audio content can also support people experiencing screen fatigue. Audio advertising provides audiences the freedom to do other tasks, such as working out or traveling, while absorbing information and keeping up on products and services that could genuinely improve their lives.

There’s no doubt that the conversation around voice content is getting louder in advertising, especially as brands seek new ways to engage people and break the linearity of talking to listeners. Synthetic voice technology exists for companies to tailor, scale, and automate their audio offerings, but the industry needs to ensure that it is easily integrated into different platforms. If, and when, brands adopt this technology, it will ultimately amplify their value and accessibility.

Disclosure: This article mentions a client of an Espacio portfolio company.


This article was authored by Dr. Timo Kunz, the co-founder and CEO of Aflorithmic Labs – the world’s first fully automatable solution for end-to-end voice and audio creation from text.