AES Give Peaks a Chance: Loudness Wars Panel
October 9, 2014, Audio Engineering Society Convention, Los Angeles. A panel of experts considered the issues associated with loudness. Thomas Lund moderated the panel, which was composed of Florian Camerer, Bob Ludwig, George Massenburg, and Susan Rogers.
Lund opened with comments on the existing cultural heritage. The record business has been moving to hyper levels, resulting in over 10 percent distortion on CDs. The overuse of lossy data compression makes the effects of level compression and distortion even greater. Overall, the peak-to-loudness ratio, headroom, and loudness range are being abused. Dynamic range is not part of the issue, since it relates only to the signal-to-noise ratio.
The peak-to-loudness ratio grew between '51 and '84 to 16 dB. Now, many songs have a ratio of less than 9 dB, even though the minimum for good audio is 11 dB. The problem is that many of the newer mobile devices have only 9 dB of headroom. One good development is that iTunes can be set to level tracks automatically, which tries to bring all the average levels to the same value and reduces the volume differences between songs.
New standards are coming from Europe to require level limiting on mobile devices. The standard levels address the ongoing erosion of playback gain in the smaller devices. The peak-to-loudness ratio (PLR) in much of the new music is being squeezed from 17 dB to 10 dB by the overuse of compression to make the overall sound brighter. The result is that the peak short-term loudness is dropping, and the PLR is functionally dependent on the average and peak levels in the piece. The new standard addresses this issue by defining a transient loudness maximum.
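The dependence of the PLR on peak and average levels can be illustrated with a rough calculation. The following is a minimal sketch, not a standards-compliant meter: it assumes samples normalized to the range -1.0 to 1.0, uses the sample peak instead of the true peak, and substitutes a plain RMS level for program loudness (a real loudness measurement adds frequency weighting and gating). The function names are hypothetical.

```python
import math


def peak_dbfs(samples):
    """Sample-peak level in dB relative to full scale (assumption: full scale = 1.0)."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        raise ValueError("silent input has no defined peak level")
    return 20 * math.log10(peak)


def rms_dbfs(samples):
    """Plain RMS level in dB, used here as a crude stand-in for program loudness."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        raise ValueError("silent input has no defined RMS level")
    return 10 * math.log10(mean_square)


def peak_to_loudness_ratio(samples):
    """Approximate PLR: peak level minus average level, in dB."""
    return peak_dbfs(samples) - rms_dbfs(samples)


# One period of a full-scale sine wave: the peak sits about 3 dB above the RMS level.
sine = [math.sin(2 * math.pi * n / 1000) for n in range(1000)]
# A full-scale square wave is the limiting case of heavy compression: PLR of 0 dB.
square = [1.0 if n < 500 else -1.0 for n in range(1000)]
```

Calling `peak_to_loudness_ratio(sine)` gives roughly 3 dB and `peak_to_loudness_ratio(square)` gives 0 dB, which shows the direction of the effect: the more the waveform is flattened toward its peak, the smaller the ratio becomes.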
Rogers presented data from Berklee College of Music showing increases in hearing loss due to hyper compression. These hearing losses are due to changes in the auditory processing path that result from listening to sounds at too high a level. The hyper compression reduces transition and attention markers in the hearing paths.
It is easier to listen for a longer time to content with low peak ratios, but this listening mode can lead to increasing the SPL to over 95 dB, well above the threshold for hearing damage. These assaults on the hearing challenge the value of an audiogram, since the test uses a pure sine tone in a quiet setting. At high SPLs, the hearing is exposed to traumatic and threatening levels that impair hearing for about 24 hours, for example, after an exposure to a 100 dB(A) level for 2 hours.
Sound professionals are developing new test techniques to check the full auditory path and response times with a frequency-following response. Tests on mice, cats, and gerbils at 100 dB SPL in the 8-16 kHz range show interesting results. After exposure to the high sound levels, the animals passed an audiogram the next day, but suffered some nerve damage. The more interesting result was the detection of ongoing losses in the following weeks, confirmed with post-mortem evaluation of the hearing nerves.
Hearing has three thresholds, and the high-threshold fibers were the ones most damaged. The physiological response is to increase the gain in all the following stages to compensate. In humans, this may be implicated in the development of tinnitus. The loss of high-threshold fibers makes it harder to perceive pitch, speech in noise, and speech in reverb. Normally, fibers phase lock and sync across a bundle of nerves, which cannot happen when some of the nerves are damaged.
Aging is another contributor to nerve loss, and by age 91, people have lost a third of all nerves. Most aging loss is focused on the fine structures, and the loss is cumulative, indicated by a gradual loss of hearing. The fine structures are responsible for processing the higher frequencies, in humans between 2 and 5 kHz. Overexcitement causes an overload of the synapses. It is possible that even 80 dB SPL may have long-term effects.
Massenburg noted that in A/B tests of CDs versus MP3s, the more compressed formats had much higher distortion at the same SPL. High dynamic range music comes from better mastered samples. Creators can make better records by going to higher dynamic range.
Ludwig decried the reduction in headroom over time. In the '80s, the headroom was about 20 dB, dropping to 14 dB in the '90s and now down to about 3 dB. Some pop songs run 6 dB higher, but cinema is operated at 24 dB. It is possible to get good headroom and dynamic range even in the sharing services. The master for iTunes calls for a 24-bit setting and a master made for SoundCheck. SoundCheck references to a -16.5 dB average target. Some groups try for zero compression so the song gets full dynamic range when mastered. This output is possible if the content is normalized at ingest.
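The arithmetic behind normalization at ingest is simple enough to sketch. The following is an illustration only and is not Apple's actual algorithm: it assumes the service has already measured each track's average level and peak level in dB, uses the -16.5 dB target quoted above, and adds an assumed 0 dB peak ceiling so that positive gain never pushes peaks into clipping. The function name and parameters are hypothetical.

```python
def normalization_gain_db(measured_level_db, target_db=-16.5,
                          peak_db=None, peak_ceiling_db=0.0):
    """Gain in dB that moves a track's average level to the target.

    If the track's peak level is supplied, the gain is capped so the
    peak does not exceed the ceiling (an assumption for this sketch).
    """
    gain = target_db - measured_level_db
    if peak_db is not None:
        gain = min(gain, peak_ceiling_db - peak_db)
    return gain


# A heavily compressed master averaging -8 dB is simply turned down.
loud_master_gain = normalization_gain_db(-8.0)

# A dynamic master averaging -20 dB with peaks at -1 dB can only be
# raised until its peaks reach the ceiling.
dynamic_master_gain = normalization_gain_db(-20.0, peak_db=-1.0)
```

Here `loud_master_gain` is -8.5 dB and `dynamic_master_gain` is 1.0 dB. Under these assumptions the loud master gains nothing from its extra level once playback is normalized, while its reduced peak-to-loudness ratio remains.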
Camerer added details from the European standards perspective. The loudness in broadcast depends on normalization and a balanced mix. The standards changed from a peak focus to an overall loudness normalization in TV. The EBU R 128 recommendation, which builds on ITU-R BS.1770, defines the technical parameters, while acknowledging that the perception of loudness is subjective. The three main audio parameters are program loudness, maximum true peak level, and total loudness range. New parameters are maximum momentary loudness and maximum short-term loudness.
Metering for the new parameters uses integration times of 400 ms for momentary loudness and 3 s for short-term loudness. These measurements are meant to prevent creators from gaming the system by putting a large spike of sound in an otherwise acceptable piece. The target for TV is -23 LUFS +/- 0.5 LU. For live content the spec is loosened to +/- 1 LU. In the US, the equivalent is -23 LUFS +/- 2 LU. Another new feature is the capability to have average levels less than -23 LUFS for commercials that deliberately use only low-level background sound.
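A short sketch shows why the windowed maxima catch a spike that an overall average would hide. It is a simplified illustration, not a compliant meter: it assumes the audio has already been frequency-weighted and reduced to a series of mean-square power values, one per 100 ms block (the block size and all function names are assumptions), and it omits gating. The -0.691 offset is the constant from the ITU-R BS.1770 loudness formula.

```python
import math

BLOCK_MS = 100  # assumed duration of each pre-computed power block


def loudness_db(mean_square):
    """Convert a weighted mean-square power to a loudness value in dB."""
    return -0.691 + 10 * math.log10(mean_square)


def max_windowed_loudness(block_powers, window_ms):
    """Highest loudness found by sliding a window of window_ms over the blocks."""
    n = window_ms // BLOCK_MS
    if n < 1 or len(block_powers) < n:
        raise ValueError("need at least one full window of blocks")
    best = max(sum(block_powers[i:i + n]) / n
               for i in range(len(block_powers) - n + 1))
    return loudness_db(best)


def within_target(program_loudness, target=-23.0, tolerance=0.5):
    """True if the program loudness is inside target +/- tolerance."""
    return abs(program_loudness - target) <= tolerance


# Five seconds of quiet material with a 200 ms burst in the middle.
powers = [0.001] * 50
powers[20:22] = [0.1, 0.1]

max_momentary = max_windowed_loudness(powers, 400)     # 400 ms window
max_short_term = max_windowed_loudness(powers, 3000)   # 3 s window
```

In this example the burst raises the maximum momentary value far more than the maximum short-term value, because the 400 ms window is dominated by the spike while the 3 s window averages it with the surrounding quiet material. `within_target(-22.2)` fails the +/- 0.5 LU tolerance but passes with `tolerance=1.0`, the looser figure quoted for live content.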
Levels are harder to set in radio, since many stations mix analog and digital formats. If one were to normalize the archives, the differences would be acceptable. The overall goal is to get all platforms to normalize to a standard. For the best results, reduce processor aggressiveness and use loudness normalization rather than peak normalization.