Four five-channel, IID-based pan pot algorithms were described, optimized respectively for constant gain, constant power, approximate velocity and energy vector equality (optimal), and avoidance of azimuthal aliasing (Moorer). A fifth, hybrid algorithm was developed as a compromise between the constant power and optimal optimizations. All five algorithms were compared analytically using the four optimization methods.
The reader will recall that the "optimal" algorithm used here was created as follows. An engineering approximation first was made to Gerzon's three-channel algorithm in that its velocity and energy vectors did not match exactly for all angles. This approximation, an original contribution, also was applied to Gerzon's four-channel pseudo-optimal version. Finally, the three- and four-channel algorithms were combined piecewise to make a single, five-channel "optimal" pan pot.
A listening test was developed using the majority of Gerzon's pan pot criteria and several standard listening test design considerations. All algorithms were tested except the linear, constant gain pan pot. In the stationary panning tests, the optimal algorithm was found superior in the center listening position and the constant power algorithm slightly better in the off-center position. The hybrid algorithm was found to be no better than either of these algorithms. The constant power, optimal, and hybrid algorithms performed very similarly within the scope of the moving panning tests. The Moorer algorithm, constrained with zero 2nd spatial harmonics, showed poor performance in both the stationary and moving panning tests given their respective limitations.
The constant power and optimal algorithms were implemented in software as a DirectX audio plug-in. Given the constraints of Sonic Foundry's Sound Forge host application and Plug-in Developers Kit, the pan pot was designed as a mono in, mono out device with a virtual output switch.
In this section, conclusions are drawn from the project in its entirety. For detailed interpretations of the listening test results, the reader should consult the analysis and interpretations sections of Chapter 4. Of course, many project-wide interpretations are based on analysis of these results. It is unclear how applicable the results of the listening tests are to other room and speaker configurations, especially considering the localization differences found in Griesinger's study described in Chapter 4 [2]. While repetition of our experiment under anechoic conditions may yield reproducible results, one would need to be very cautious in applying interpretations of these results to surround sound panning in real rooms. The listening test in this project represents surround sound reproduction in the case of a single, real world environment.
The first conclusion is that the constant power pan pot has remained a favorite in the audio industry for good reason. It is relatively simple to implement and performs fairly well in stationary and moving panning applications in both center and off-center listening positions. Recording engineers and listeners are accustomed to intensity stereo panning based on the constant power, sine-cosine law. The disadvantage of constant power panning was found to be front-back confusion for sound sources panned behind the listener.
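As a reminder of how little code the sine-cosine law needs, the following is a minimal sketch of pairwise constant power panning around a ring of speakers. It is an illustration only, not the plug-in code from this project. The speaker azimuths (front left and right at ±30°, surrounds at ±120°), the channel order, and the function name are assumptions made for the example.

```python
import math

# Assumed speaker azimuths in degrees, clockwise from the center speaker,
# listed in ascending order: C, R, RS, LS, L. Hypothetical layout.
SPEAKERS = [0.0, 30.0, 120.0, 240.0, 330.0]


def constant_power_gains(pan_deg, speakers=SPEAKERS):
    """Return one gain per speaker for a pairwise sine-cosine pan.

    Only the two speakers adjacent to the pan angle receive signal,
    and their squared gains always sum to one.
    """
    pan = pan_deg % 360.0
    n = len(speakers)
    gains = [0.0] * n
    for i in range(n):
        lo = speakers[i]
        hi = speakers[(i + 1) % n]
        span = (hi - lo) % 360.0      # angular width of this speaker pair
        offset = (pan - lo) % 360.0   # pan position measured from 'lo'
        if offset < span:
            frac = offset / span      # 0 at 'lo', approaching 1 at 'hi'
            gains[i] = math.cos(frac * math.pi / 2.0)
            gains[(i + 1) % n] = math.sin(frac * math.pi / 2.0)
            break
    return gains
```

For example, `constant_power_gains(180.0)` feeds the two surround channels equally (about 0.707 each), and the same function works for any number of speakers as long as the azimuth list is in ascending order.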
The five-channel optimal algorithm performed extremely well in the center listening position. The disadvantages of this algorithm are its increased complexity and the existence of front-back reversals for the off-center listening position only. Further comparison between the constant power and optimal algorithms is necessary. Phantom image width should be examined for both algorithms in the stationary tests. The moving pan tests also should be redesigned to better differentiate between the algorithms. Experimental design improvements necessary for such further testing are suggested below.
It is not clear whether the optimal algorithm's good performance may be attributed to its optimization method. The lack of front-back confusion in the center position may have resulted primarily from more than two speaker channels typically being active. (See Figure 3.15. At θ_pan = 180°, only the two surround channels have non-zero gains. For all other angles between the surround speakers, three channels are active.) This increase in the number of active channels and the algorithm's approximation to constant power behavior may have been the true reasons for its good performance. The validity of velocity and energy vector theory remains unverified.
The Moorer algorithm did not meet the tested pan pot criteria given the scope and limitations of the listening test. Two reasons may be conjectured for its poor performance. First, Moorer stated that the 1st order spatial harmonic cannot be recreated if any of the angles between successive speakers is greater than 90°. The angle between the surround left and surround right speakers in this project was 120°. Recall from Figures 3.17 and 4.10b that most of this algorithm's problems existed in the region between the two surround speakers. It is very possible that this panning law could not work as designed because of the chosen loudspeaker arrangement. Of course, this likely means it is unsuitable for home theater speaker set-ups in which the angle between the surrounds is at least 180° (see Figure 4.5).
The second reason has to do with the constraint chosen to solve the original, underdetermined matrix. Setting the 2nd order spatial harmonics to zero is but one of a universe of constraints on the matrix. Other constraints and hence other solutions to the channel gains are possible and may perform better.
Several design changes to the listening test in this project are recommended if it were to be repeated. The test should be repeated using the actual Dolby recommended speaker set-up from Table 4.9. Conducting the test in several acoustically different rooms would yield a family of test results from which to compare different panning methods. Symmetry of localization can be assumed for the left and right sides of the circle surrounding the listener. Noise bursts therefore need only be panned between 0 and 180 degrees for the test, cutting both the experiment time and the number of results generated in half.
Acoustically transparent screens should be placed between the listener and the loudspeakers. One screen could be placed between the listener and the front speakers and another placed between the listener and the surround speakers. This set-up would have a convenient walkway between the screens for subjects entering and leaving the experiment room.
Presentation of the test signal also should be changed. Signals of longer duration should be used for the motional head tests. Three-second signal durations should be fine. Listeners also need more time between noise bursts to write down their answers. The duration of silence between noise bursts therefore should be increased to about five seconds for the stationary pans and ten seconds for the moving pans. For finer discrimination between algorithms in the moving pan tests, arcs of less than 90° should be used. As noted in "Moving panning" from Chapter 4, additional tests would be useful for determining the moving pan speeds at which the auditory event is perceived to change character.
It would be desirable to use real musical instruments as sources in addition to the noise bursts. Hartmann found that transient and broadband noise signals helped localization skills [85]. Zudock used the guiro, a rhythm instrument, as a signal source while studying distance cues [86]. The guiro was chosen because its impulsive and unpitched sound conformed to Hartmann's recommendations.
As noted in "Criteria for Evaluating Pan pots" from Chapter 3, a pan pot or surround sound encoder should be compatible with another surround sound system having a different number of channels/speakers. Recording engineers working in multichannel audio currently are faced with the task of remixing material into 5.1, stereo, and sometimes mono mixes. A multichannel pan pot should be designed such that a 5.1 channel mix made with it could be converted automatically into a tolerable stereo mix. Panning algorithms were not compared using this criterion in this project. Testing such surround sound system compatibility would require further listening tests of 5.1 channel mixes made with different multichannel panning algorithms and mixed down to stereo using different conversion matrices (which are related to the panning algorithms).
One recommendation is relevant to analysis of the listening tests. Recall that the ideal localization blurs/half-intervals are functions of both azimuth and signal source. The ones used in this project corresponded to another signal and were constant with azimuth. Ideally, best-case localization blurs should be known a priori for the signal used in the listening test across all angles.
Extension to N loudspeakers. Each of the algorithms studied here may be extended to the case of more than five speaker channels with varying degrees of ease. Adapting the constant gain and constant power algorithms to more than five speakers is straightforward. Extending Gerzon's vector optimal algorithm to anything more than three speakers will require new approximations to ideal vector equality. The present experiment showed that a reasonable approximation to vector equality can still lead to a panning algorithm that performs well. Changing the Moorer algorithm to a different number of speakers means changing N in the azimuthal optimization equations and finding more constraints to apply to the underdetermined matrix. Moorer mentioned that constant power could be used as another constraint if desired [44].
New panning algorithms. Shortly after conducting the listening tests, the author considered a very simple method of designing a pair-wise, IID-based algorithm using an experimental rather than theoretical optimization. This pan pot design method is extremely simple, seemingly immune to intellectual property protection, and should be superior to constant power panning for multichannel applications. Note that the experiment used for this design method would be somewhat comparable to Griesinger's experiment for a stereo loudspeaker set-up [2].
The listening test environment would be configured as before but with the changes in set-up and signal presentation noted above. A recording would be played with noise bursts "panned" between adjacent speakers in steps of 3 dB. (Finer steps are also possible but may generate more data than necessary.) Based on Theile and Plenge's study of lateral phantom sources [75], we would expect that a difference of no more than about 36 dB between adjacent speaker channels should result in localization at one speaker location or the other. (This was the case for speakers that were no more than 90° apart.) If 3 dB steps were used as in their experiment, the noise bursts panned to these (36 / 3) × 5 / 2 = 30 locations (12 level steps for each of the five adjacent speaker pairs, halved by the left-right symmetry assumed above) would be randomized as before. The stationary panning section of the listening test would proceed as before and subjects would write down the perceived azimuth for each of the phantom image locations.
Design of the panning algorithm would proceed directly from the listening test results. The design engineer simply interpolates between the mean azimuths to determine the gain difference in dB that corresponds to each desired azimuth for the panning algorithm. Care must be taken in interpolating gain differences for azimuths to the rear of the listener because of the likely front-back reversals in this region.
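The interpolation step could be sketched as follows for a single speaker pair. The measurement table below is invented placeholder data, not a listening test result, and the function names are hypothetical. Converting the level difference to a unit-power gain pair is an added assumption, since the design method itself only fixes the difference in dB.

```python
import math

# PLACEHOLDER data for one speaker pair (say R at 30 degrees and RS at 120):
# (mean perceived azimuth in degrees, level difference R minus RS in dB).
# These numbers are made up for illustration only.
MEASURED = [(30.0, 18.0), (45.0, 9.0), (70.0, 0.0), (100.0, -9.0), (120.0, -18.0)]


def level_difference_for_azimuth(target_deg, measured):
    """Linearly interpolate the dB difference that produced 'target_deg'.

    Assumes the mean azimuths are monotonic over the pair, which may not
    hold to the rear of the listener where front-back reversals occur.
    """
    points = sorted(measured)
    if not points[0][0] <= target_deg <= points[-1][0]:
        raise ValueError("target azimuth lies outside the measured range")
    for (a0, d0), (a1, d1) in zip(points, points[1:]):
        if a0 <= target_deg <= a1:
            if a1 == a0:
                return d0
            t = (target_deg - a0) / (a1 - a0)
            return d0 + t * (d1 - d0)


def gains_from_level_difference(diff_db):
    """Turn a dB difference into two gains whose squares sum to one.

    The unit-power normalization is an assumption of this sketch.
    """
    ratio = 10.0 ** (diff_db / 20.0)           # linear gain ratio first/second
    second = 1.0 / math.sqrt(1.0 + ratio * ratio)
    return ratio * second, second
```

With the placeholder table, a desired azimuth of 37.5° falls halfway between the first two measurements and so maps to a 13.5 dB level difference.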
While not truly a panning technique, technology does exist for converting two-channel, binaural recordings for playback over more than two loudspeakers. Mori et al. [87] developed methods for (1) synthesizing stereo, binaural cues from multitrack material made without using an artificial head, and (2) reproducing real or artificial binaural signals not only through two loudspeakers but through four loudspeakers. Listening tests of the four-channel "Q-Biphonic" system showed superior localization and sound image quality as compared with intensity stereo (presumably constant power panning). Their binaural methods should be investigated for possible application to a five-channel panning algorithm.
It may also be possible to apply phased array radar theory to panning. This theory tells us that the directionality pattern of two or more antennas may be "steered" by changing the phase difference between them [88]. This occurs because their pattern of constructive and destructive interference is altered by phase differences. Monopole loudspeakers may be considered as monopole, acoustic transmission antennas. A panning algorithm could be designed that alters the relative phase difference between two or more speaker channels and guides the equivalent directionality pattern within the circle of speakers. The resulting localization would be in the direction from the center of the speaker circle towards the area of constructive interference. This method of course would not be an IID-based algorithm.
Finally, the interested reader should refer to a newly proposed IID-based panning algorithm called vector base panning (VBAP) [89]. This reference was discovered too late to be included in this project.
Several optimizations and additional features are possible for a more complete plug-in implementation. Any new optimizations, features, and user interface improvements should be added as part of a more robust software development process that includes formal specification, testing, and usability testing.
The most obvious optimization would be interpolation of the channel gains from a wavetable rather than recomputing them before each input buffer is processed. Symmetries in the panning curves should be exploited to reduce the size of the wavetables. Reducing processing time for each input buffer using wavetable interpolation should facilitate the use of this plug-in in real-time applications.
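A minimal sketch of such a table lookup is given below. It stores gains only for pan angles from 0° to 180° and obtains the other half of the circle by swapping left and right channels. The speaker layout, channel order, one-degree table step, and all names are assumptions for the example, and a pairwise sine-cosine law stands in for whichever panning curve is actually tabulated.

```python
import math

SPEAKERS = [0.0, 30.0, 120.0, 240.0, 330.0]  # assumed order: C, R, RS, LS, L
MIRROR = [0, 4, 3, 2, 1]                     # channel swap for a left-right flip


def pairwise_gains(pan_deg):
    """Stand-in gain law; any left-right symmetric law could be tabulated."""
    pan = pan_deg % 360.0
    n = len(SPEAKERS)
    gains = [0.0] * n
    for i in range(n):
        span = (SPEAKERS[(i + 1) % n] - SPEAKERS[i]) % 360.0
        offset = (pan - SPEAKERS[i]) % 360.0
        if offset < span:
            frac = offset / span
            gains[i] = math.cos(frac * math.pi / 2.0)
            gains[(i + 1) % n] = math.sin(frac * math.pi / 2.0)
            break
    return gains


def build_half_table(gain_fn, step_deg=1.0):
    """Tabulate the gains once, for 0..180 degrees inclusive."""
    count = int(round(180.0 / step_deg)) + 1
    return [gain_fn(i * step_deg) for i in range(count)]


def lookup_gains(table, pan_deg, step_deg=1.0):
    """Linearly interpolate gains from the half table, mirroring if needed."""
    pan = pan_deg % 360.0
    mirrored = pan > 180.0
    if mirrored:
        pan = 360.0 - pan                    # reflect into the stored half
    pos = pan / step_deg
    i = min(int(pos), len(table) - 2)        # keep i + 1 inside the table
    frac = pos - i
    gains = [a + frac * (b - a) for a, b in zip(table[i], table[i + 1])]
    if mirrored:
        gains = [gains[MIRROR[ch]] for ch in range(len(gains))]
    return gains
```

The table would be built once, for instance with `table = build_half_table(pairwise_gains)`, and each input buffer would then need only a call such as `lookup_gains(table, 200.0)` instead of evaluating trigonometric functions.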
The most desired new feature would be the removal of the constraint for a single output for our multichannel pan pot. This will not be possible until host applications become friendlier to multichannel audio and accept plug-ins with more than two outputs. While DirectShow places no constraints in this area, Sonic Foundry will most likely have to produce a new PIDK allowing for this possibility.
Other features are possible and would necessitate user interface changes. A panning control that appears as a knob rather than a slider would conform better to user experiences with rotary pan pots. A second knob could be added that spins the entire multichannel mix around the listener. Other controls could affect front to back or left to right image placement. Panning could also be controlled physically through the use of a standard joystick or one of several MIDI controllers. Users also may want to change the panning angle over time in creative ways. An envelope or LFO that modulates the panning angle could be included in the plug-in for this purpose. However, these features would be unnecessary if future host applications can automate plug-in parameters such as panning angle.
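As an illustration of the LFO idea, the sketch below computes one pan angle per input buffer from a sine-wave LFO. The parameter names, the sine shape, and the choice to update once per buffer are all assumptions for the example and are not features of the existing plug-in.

```python
import math


def lfo_pan_angles(center_deg, depth_deg, rate_hz, sample_rate, buffer_size, num_buffers):
    """Return one pan angle (0..360 degrees) per buffer for a sine LFO.

    The angle swings 'depth_deg' either side of 'center_deg' at 'rate_hz'.
    """
    angles = []
    for b in range(num_buffers):
        t = b * buffer_size / sample_rate    # start time of this buffer in seconds
        angle = center_deg + depth_deg * math.sin(2.0 * math.pi * rate_hz * t)
        angles.append(angle % 360.0)         # wrap into the speaker circle
    return angles
```

Each returned angle would then be handed to the panning law before the corresponding buffer is processed.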
Most pan pots in professional audio equipment only control the horizontal azimuth of the phantom image. A more generalized pan pot would allow control of source distance, image width or angular size, Doppler shifting (for moving sound sources), and, if the surround sound system allows for it, elevation. (Note that Doppler shifting is more useful for sound effects than musical sounds because applying the effect to the latter would change their pitches relative to the non-moving musical sounds [11].) Of course, any technology that tries to simulate all spatial information present in a real environment also must include artificial reverberation. Gerzon investigated the design of pan pots that control distance [90] and image width [42] as well as azimuth. Horbach [91] also examined controlling image width. Chowning simulated moving sound sources by controlling their azimuth, distance, and Doppler shift [92]. Moore [37] and Dodge and Jerse [93] considered several aspects of panning, including distance cues.
The market for surround sound systems and audio content will grow in coming years. Pan pots are only one technology that must be rethought for recording audio in the new format. While some audio effects like compression scale easily to five channels, others like reverb may have to be completely redesigned. Special microphone arrangements and corresponding pan settings will be developed. New recording philosophies surely will come about because of the new creative tools.
As the audio delivery format moves to multichannel, the recording studio is moving increasingly to digital technologies. More DSP power will be necessary to support what once were two-channel tasks in these multichannel audio signal processing devices. This can be accomplished by increasing the speed of DSP chips by a factor of at least 2.5 (five channels vs. two) or by running more than one DSP in parallel. If further increases occur in standard word lengths and sampling rates, higher performance DSPs will of course be necessary.
Surround sound has the very real potential of enhancing the listening experience. This is an exciting time to be an audio design engineer because it is an exciting time to be a listener.
Jim West, University of Miami, Copyright 1998