Sound Reproduction: The State of our Science
October 9, 2014, Audio Engineering Society Convention, Los Angeles. Floyd Toole talked about changes in venues and sound systems, and the emerging science for characterizing the loudspeakers best suited to those spaces. Most speakers are measured in anechoic chambers with a single microphone, a setup that differs dramatically from real installations and ignores our ability to adjust our perceptions.
Most audio gets to us through a speaker, whether from transducers in cabinets or in our ears. No standards are available for a reference set of sounds, so all testing for sound quality is subjective. The industry needs a common base for speakers that sound similar. Loudspeakers need to be neutral. The use of "chosen" masters to select by ear impedes meaningful changes in speaker evaluations. The choice of various types of music prevents the use of standards, even though the major standards organizations are looking into updating and revising any existing standards.
Scientific measurements can take into account the common factors for subjective evaluations and make a technical measurement that correlates with the aggregate subjective responses. There are many common factors in subjective tests, but new tests need greater rigor—randomized sources and speakers to minimize room effects such as standing waves, and ways to set equal loudness across all the test speakers.
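As a rough illustration of two of these controls, the sketch below computes the gain needed to level-match one speaker's recorded output to another and builds a shuffled presentation order. It is a minimal sketch under stated assumptions: RMS level stands in for perceived loudness (a real test would likely use a perceptual loudness measure), and the function names and sample values are hypothetical, not taken from the talk.

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def matching_gain_db(reference, candidate):
    """Gain in dB to apply to `candidate` so its RMS level equals `reference`.

    Assumption: RMS is used as a crude proxy for loudness.
    """
    return 20.0 * math.log10(rms(reference) / rms(candidate))


def randomized_trials(speakers, programs, seed=None):
    """Every speaker/program pairing, shuffled so the order is unpredictable."""
    rng = random.Random(seed)
    trials = [(s, p) for s in speakers for p in programs]
    rng.shuffle(trials)
    return trials


# Hypothetical captures: the candidate plays at half the reference amplitude.
reference = [1.0, -1.0, 1.0, -1.0]
candidate = [0.5, -0.5, 0.5, -0.5]
gain = matching_gain_db(reference, candidate)  # about +6 dB

trials = randomized_trials(["A", "B", "C"], ["rock", "jazz"], seed=1)
```

Passing a fixed seed makes a session reproducible for later analysis while keeping the order opaque to the listeners.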
The problem is that few are using such detailed procedures to neutralize the room and position effects. The procedure requires moving each speaker to the same location in the listening room. It turns out that people are better at evaluating speakers when listening to only a mono source. They achieve higher levels of differentiation and smaller distribution spreads. Testing indicates that speakers rated well in mono also win when configured as a stereo pair. Surprisingly, even in mono mode, a good speaker can differentiate and produce spatial content.
The issue with stereo is that the recording matters. Spatial qualities and timbre are adjusted in the mix. Test reviewers attribute half of a natural sound to a feeling of space, 30 percent to the sound quality, and the balance to the timbre. The pleasantness of a speaker is 70 percent spatial and 30 percent sound quality, meaning a lack of coloration in the sound and a clean timbre.
The sound space matters for testing. For example, a concert hall has a quality of envelopment. The effort to replicate this quality has led to multi-channel systems with ever higher numbers of speakers, up to 24 in the latest systems. The issue is that a higher number of channels improves directionality, but envelopment needs far fewer channels. For a sense of envelopment, a sound system only needs speakers at +/- 30 and +/- 60 degrees. Workable alternatives include +/- 30 and +/- 90, and +/- 30 and +/- 120 degrees. The typical stereo and quad systems are not good at these placements because the mix is not set to distribute the sound in the right patterns. Stereo can be enhanced by adding a speaker at 0 degrees.
The ability to distinguish between good and poor speakers decreases with hearing loss. This is a concern for audio engineers, since gradual, long-term hearing loss goes unnoticed. One issue is that an audiogram only checks hearing thresholds for single, pure tones. People change over time, and decreased hearing leads to greater variability among test subjects. A suggestion is to wear musician ear plugs whenever possible. Any sound level over 80 dB contributes to hearing loss.
Factors that affect speakers include resonances, which are timbral building blocks for voices and instruments. Resonances modify the sounds and add color, so the problem is to determine the threshold at which the effects become audible. The program material affects the thresholds, so 1 dB is acceptable for rock but too high for jazz and classical music. Measurements indicate that the threshold for resonance detection is higher with pink noise because the sound is not a single frequency and is uncorrelated like music.
Room reflections deliver repetitions of the original sounds to the listener. The threshold for detection is 10 dB lower than for the same listener in an anechoic chamber or with headphones. Nevertheless, the room can add timbre and make the sounds richer. It is possible to add a little reverb in the mix to get a richer or sweeter sound.
Resonances, however, need to be removed at the speaker. Most speakers are designed to be minimum phase, so some resonances can be equalized out. You need to have a high-resolution frequency domain measurement system to set equalization properly. Solo instruments or singers are not useful, because they are acoustically dry. A band or orchestra is the best source and the content can be any genre.
The ideal procedure is to do A-B comparisons, with a "hidden" reference as part of multiple comparisons. This process accounts for shared problems like room acoustics and generates good data at the cost of long testing times. The problem is to separate the timbral contributions of the room from those of the speakers, and also to separate the program material from the speakers. The result is a full auditory scene analysis that separates the components of the sound field. The interesting aspect is that people overall are very good at rating: they write a lot if the speaker is not very good, and very little if the speaker is good.
People hear sounds in a room. They automatically take into account the timbre, direction, distance, apparent source width, and the sense of envelopment. The problem for testers is that a single-microphone measurement cannot match the responses of two ears and a brain. Therefore, a critical evaluation must measure in space and determine a complex transfer function. The tester has to generate a full polar response pattern for both horizontal and vertical distributions to correlate with the aggregate listener responses.
The speakers should be rotated in both axes to develop on- and off-axis listening windows. Tests need to work to eliminate early reflections and measure at the same sound power at the listening position. If the speaker exhibits poor off-axis response it shows up as a difference between the real and predicted room curves. The room acoustics dominate at frequencies below 200 Hz while the speaker shows up above 1 kHz.
The testing should change speaker locations and operate over a 40 dB range. The low frequencies will have separate peaks, and a smaller room has a physical crossover of about 300 Hz. In places like homes and control rooms the sound pressure level drops 3 dB for every doubling of distance from the source. The steady state changes over time and frequency are about 6 dB for a room close to the height of a person. Most of the sound will not be direct, since the surfaces absorb and reflect sounds.
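The distance figure can be turned into a quick calculation. The sketch below applies the 3 dB per doubling rate quoted above for homes and control rooms, and compares it with the roughly 6 dB per doubling of a free field (the inverse-square law). The function name and the 1 m reference distance are assumptions made for illustration.

```python
import math


def level_drop_db(distance_m, reference_m=1.0, db_per_doubling=3.0):
    """Drop in sound pressure level relative to the level at `reference_m`.

    db_per_doubling=3.0 follows the small-room figure quoted in the text;
    about 6.0 corresponds to free-field (anechoic) propagation.
    """
    if distance_m <= 0 or reference_m <= 0:
        raise ValueError("distances must be positive")
    return db_per_doubling * math.log2(distance_m / reference_m)


# Moving from 1 m to 4 m is two doublings of distance.
room_drop = level_drop_db(4.0)                       # 6.0 dB in a small room
free_drop = level_drop_db(4.0, db_per_doubling=6.0)  # 12.0 dB in a free field
```

The gap between the two results is one way to see how much of the sound at the listening position is reflected rather than direct.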
Speaker-room interactions are a part of the evaluations. In most cases, the sub-woofer is omni-directional and all of the sound is reflected. The mid-range is a mixture of direct and reflected sound and the high end is very direct and directional. This frequency dependence is why off-axis performance matters. A composite measurement of frequency and sound levels can produce a directivity index (DI). This index measures the direct sound field above 1 kHz and any room early reflections.
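One common way to express a directivity index, assumed here rather than taken from the talk, is the difference in dB between the listening window response and the sound power response at each frequency. The sketch below uses made-up curve values purely for illustration: a DI near 0 dB indicates an omnidirectional radiator, and a rising DI indicates increasing directionality.

```python
def directivity_index(listening_window_db, sound_power_db):
    """Per-frequency DI in dB: listening window level minus sound power level.

    Assumption: both curves are sampled at the same frequencies.
    """
    if len(listening_window_db) != len(sound_power_db):
        raise ValueError("curves must have the same number of points")
    return [lw - sp for lw, sp in zip(listening_window_db, sound_power_db)]


# Hypothetical curves at 100 Hz, 1 kHz and 10 kHz (illustrative values only).
freqs_hz = [100, 1000, 10000]
listening_window = [85.0, 85.0, 84.0]
sound_power = [84.0, 80.0, 75.0]

di = directivity_index(listening_window, sound_power)
for f, value in zip(freqs_hz, di):
    print(f"{f} Hz: DI = {value:.1f} dB")  # 1.0, 5.0 and 9.0 dB
```

In this invented example the bass is nearly omnidirectional and the treble is strongly directional, matching the frequency dependence described above.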
SMPTE just released a report TC25 CSS that indicates a small room needs a center channel for dialog intelligibility. The DI response up to 5 kHz should be fairly flat, and excessive high frequencies are bad. A non-flat response can be corrected through an analysis that predicts the room characteristics, combined with some acoustic treatments or positional changes of the speakers. You cannot equalize the room itself, due to reverb times, reflections, seat-to-seat variations, directivity, and frequency-dependent absorption.
As a result, it is best to start with good speakers and focus on bass tweaks. It is possible to trade sound field efficiency with dead spots and standing waves to reduce most resonances. One way to adjust the low end is to have multiple woofers at the corners of the room or in the middle front and back. This increases the SPL by 12 dB. For any arbitrary shaped room, the key is to use multiple sub-woofers plus measurements and some signal processing.
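The 12 dB figure is consistent with four identical subwoofers summing fully in phase, since 20 log10(4) is about 12 dB. The number of subwoofers and the in-phase assumption are not stated in the talk summary; they are assumptions made here to show the arithmetic. The sketch also shows the smaller gain when the sources add only in power.

```python
import math


def coherent_gain_db(n_sources):
    """SPL gain when n identical sources sum in phase (pressures add)."""
    return 20.0 * math.log10(n_sources)


def incoherent_gain_db(n_sources):
    """SPL gain when n identical sources are uncorrelated (powers add)."""
    return 10.0 * math.log10(n_sources)


print(round(coherent_gain_db(4), 2))    # 12.04
print(round(incoherent_gain_db(4), 2))  # 6.02
```

In a real room the result at any seat falls between and beyond these cases, because the modes determine how the subwoofer outputs combine at each position and frequency.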
This combination of multiple subs and sound field measurement optimizes the installation and gives a uniform response. The installation can mix sub-woofers and can have each one running at a different level. These practices come from anechoic data that has greater than 85 percent correlation to room performance. To the extent possible, you want a flat on-axis response.
Other room issues include parameter variations which don't couple for all modes. For a 5.1 system, little matters if the low frequencies are handled with multiple subs. The problem is that multiple subs can exhibit time panning, with time differences of up to 1 ms appearing as virtual movement, and differences over 30 ms creating aliasing. The good thing is that the delayed sound has to be over 10 dB higher than the reference to show a change in position.
Spatial effects include image spreading and a precedence effect for the speakers. It is possible to detect the thresholds, which change with the type of sound and the locations of the reflections. The installer has to pay attention to the walls, ceilings, and any coverings. The way to measure is to use a broadband impulse response (FFT) of the room. The effectiveness of any damping material depends on its thickness and the angle of incidence. These materials change the sound of the speakers because they alter the reflections. A solution is to use standard speakers in an off-axis mode.
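To show how an impulse response becomes a frequency response, the sketch below applies a plain discrete Fourier transform to a toy impulse response: a direct sound followed by one reflection at half amplitude. The sample rate, delay, and amplitudes are invented for illustration, and a practical measurement system would use a windowed FFT over a much longer capture.

```python
import cmath
import math


def magnitude_response_db(impulse_response, sample_rate_hz):
    """Return (frequency_hz, level_db) pairs from a direct DFT of the input."""
    n = len(impulse_response)
    response = []
    for k in range(n // 2 + 1):  # bins from 0 Hz up to the Nyquist frequency
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(impulse_response)
        )
        mag = abs(acc)
        level = 20.0 * math.log10(mag) if mag > 0 else float("-inf")
        response.append((k * sample_rate_hz / n, level))
    return response


# Hypothetical room: direct sound plus a half-amplitude reflection 4 samples later.
h = [1.0, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0]
for freq, level in magnitude_response_db(h, sample_rate_hz=48000):
    print(f"{freq:8.0f} Hz  {level:6.2f} dB")
```

The output alternates between about +3.5 dB and -6 dB from bin to bin, the comb-filter pattern that a single strong reflection imposes on the direct sound.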