Audio algorithms and how we watch (and read) TV

More people use subtitles with TV shows because algorithms for audio have changed:

Specifically, it has everything to do with LKFS, which stands for “Loudness, K-weighted, relative to full scale” and which, for the sake of simplicity, is a unit for measuring loudness. Traditionally it’s been anchored to the dialogue. For years, going back to the golden age of broadcast television and into the pay-cable era, audio engineers had to deliver sound levels within an industry-standard LKFS, or their work would get kicked back to them. That all changed when streaming companies seized control of the industry, a period of time that rather neatly matches Game of Thrones’ run on HBO. According to Blank, Game of Thrones sounded fantastic for years, and she’s got the Emmys to prove it. Then, in 2018, just prior to the show’s final season, AT&T bought HBO’s parent company and overlaid its own uniform loudness spec, which was flatter and simpler to scale across a large library of content. But it was also, crucially, un-anchored to the dialogue.

“So instead of this algorithm analyzing the loudness of the dialogue coming out of people’s mouths,” Blank explained to me, “it analyzes the whole show as loudness. So if you have a loud music cue, that’s gonna be your loud point. And then, when the dialogue comes, you can’t hear it.” Blank remembers noticing the difference from the moment AT&T took the reins at Time Warner; overnight, she said, HBO’s sound went from best-in-class to worst. During the last season of Game of Thrones, she said, “we had to beg [AT&T] to keep our old spec every single time we delivered an episode.” (Because AT&T spun off HBO’s parent company in 2022, a spokesperson for AT&T said they weren’t able to comment on the matter.)

Netflix still uses a dialogue-anchor spec, she said, which is why shows on Netflix sound (to her) noticeably crisper and clearer: “If you watch a Netflix show now and then immediately you turn on an HBO show, you’re gonna have to raise your volume.” Amazon Prime Video’s spec, meanwhile, “is pretty gnarly.” But what really galls her about Amazon is its new “dialogue boost” function, which viewers can select to “increase the volume of dialogue relative to background music and effects.” In other words, she said, it purports to fix a problem of Amazon’s own creation. Instead, she suggested, “why don’t you just air it the way we mixed it?”

This change in how television audio is measured helps explain why more viewers need subtitles to understand what is being said.
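To make the difference between the two kinds of spec concrete, here is a rough sketch in Python. It is only an illustration under stated assumptions: the segment levels, the target of -24, and the function names are all invented for this example, and the plain power average stands in for real LKFS measurement, which also applies K-weighting and gating that this sketch skips. It is not the spec any of the companies mentioned above actually uses.

```python
import math

# Hypothetical program: (kind, level in dB relative to full scale, seconds).
# The numbers are invented for illustration, not measurements from any show.
SEGMENTS = [
    ("dialogue", -27.0, 60.0),
    ("music", -12.0, 30.0),
    ("dialogue", -27.0, 30.0),
]

TARGET = -24.0  # assumed delivery target, in the same dB units


def average_level(segments):
    """Duration-weighted average of power, converted back to dB."""
    total_time = sum(seconds for _, _, seconds in segments)
    mean_power = sum(
        10 ** (db / 10) * seconds for _, db, seconds in segments
    ) / total_time
    return 10 * math.log10(mean_power)


def gain_to_target(segments, anchor=None):
    """Gain in dB that brings the measured level to TARGET.

    anchor=None measures the whole program; anchor="dialogue"
    measures only the segments of that kind.
    """
    measured = [s for s in segments if anchor is None or s[0] == anchor]
    return TARGET - average_level(measured)


def dialogue_after(segments, gain):
    """Level of the dialogue once the given gain is applied."""
    dialogue = [s for s in segments if s[0] == "dialogue"]
    return average_level(dialogue) + gain


anchored = dialogue_after(SEGMENTS, gain_to_target(SEGMENTS, "dialogue"))
whole = dialogue_after(SEGMENTS, gain_to_target(SEGMENTS))
print(f"dialogue-anchored spec: dialogue lands at {anchored:.1f} dB")
print(f"whole-program spec:     dialogue lands at {whole:.1f} dB")
```

With these made-up numbers, anchoring to the dialogue puts the speech right at the target, while measuring the whole program lets the loud music cue dominate the average, so everything gets turned down and the dialogue ends up about 9 dB below the target. That is the pattern Blank describes: the loud cue becomes the reference point, and the dialogue gets buried.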

I wonder if the bigger question is whether this significantly changes how people consume and are affected by television. If we are reading more dialogue and descriptions, does this focus our attention on certain aspects of shows and not others? Could this be good for reading overall? Does it limit the ability of viewers to multitask if they need to keep up with the words on the screen? Do subtitles help engage the attention of viewers? Do I understand new things I did not notice before in the world with fewer subtitles? Does a story or scene stick with me longer because I was reading the dialogue?

Does this also mean that as Americans have been able to buy bigger and bigger TVs for cheaper prices, they are getting a worse audio experience?

The difficulties in creating viral audio clips

Why listen to audio on the Internet when you can read an article or watch a video? This is the problem in creating a viral audio clip:

In a provocative piece for Digg on viral sound, reporter Stan Alcorn asked Reddit cofounder Alexis Ohanian: “Why does the Internet so rarely mobilize around audio? What would it take to put audio on the Reddit front page?”

Since audio is, of course, our business, we asked Stan Alcorn to make us an audio version (listen above). We want our work to be sharable – and so we’ve decided to be proactive…

As Stan reports, there are certainly exceptions to the rule. For instance, the audio I share usually falls into a few different categories: Isolated David Bowie vocals, super-awkward studio outtakes with Art Garfunkel, and angry phone messages to reporters about drones. (As a journalist, I think the last one is my favorite. “DON’T YOU SUPERVISE THE SUB EDITORS WHO WRITE THESE HEADLINES!?”)

There’s also plenty of stuff that Marketplace has done that I would hope could go viral.

The key here is that audio just seems to take more time to get to the point. With an article or video, you can leave it quickly and plenty of watchers do: they check out the first few seconds, see if it catches their attention, and then either engage further or move on. Audio is more of a mystery. What might happen next? This is something that people who love radio talk about all the time, all of the “theater of the mind” stuff. I’m trying to imagine what might have happened if the Internet had been invented during the golden age of radio, roughly the 1930s and 1940s, and if the Internet could have been an audio medium rather than a primarily visual medium.

It will be interesting to see if any of the Marketplace audio clips submitted at the end of this story could go viral…