 
                 Patent Application
                     20140177844
                    Not applicable
The invention relates to a method and apparatus for visualizing the directional sound activity of a multichannel audio signal.
Audio is an important medium for conveying many kinds of information, especially sound direction information. Indeed, the human auditory system is more effective than the visual system for surveillance tasks. Thanks to the development of multichannel audio formats, spatialization has become a common feature in all domains of audio: movies, video games, virtual reality, music, etc. For instance, when playing a first-person shooter (FPS) game using a multichannel sound system (5.1 or 7.1 surround sound), it is possible to localize enemies thanks to their sounds.
Typically, such sounds are mixed onto multiple audio channels, wherein each channel is fed to a dedicated loudspeaker. Distribution of a sound to the different channels is adapted to the configuration of the dedicated playback system (positions of the loudspeakers), so as to reproduce the intended directionality of said sound.
Multichannel audio streams must therefore be played back over suitable loudspeaker layouts. For instance, each channel of a five-channel audio signal is associated with its corresponding loudspeaker within a five-loudspeaker array.
If multichannel audio is played back over an appropriate sound system, i.e. with the required number of loudspeakers and correct angular distances between them, a listener with normal hearing is able to detect the location of the sound sources that compose the multichannel audio mix. However, should the sound system exhibit inappropriate features, such as too few loudspeakers or incorrect angular distances between them, the directional information of the audio content may not be delivered properly to the listener. This is especially the case when sound is played back over headphones.
As a consequence, there is in this case a loss of information since the multichannel audio signal conveys sound direction information through the respective sound levels of the channels, but such information cannot be delivered to the user. Accordingly, there is a need for conveying to the user the sound direction information encoded in the multichannel audio signal.
Some methods have been provided for conveying directional information related to sound through the visual modality. However, these methods were often a mere juxtaposition of volume meters, each dedicated to a particular loudspeaker. They are thus unable to render precisely the simultaneous predominant directions of the sounds that compose the multichannel audio mix, except in the case of one unique virtual sound source whose direction coincides with a loudspeaker direction. Other methods intended to display sound locations more precisely are so complicated that they prove inadequate, since sound directions cannot be readily derived by a user.
For example, U.S. patent application US 2009/0182564 describes a method wherein sound power level of each channel is displayed, or alternatively wherein position and power level of elementary sound components are displayed.
The method and system according to the invention are intended to provide a simple and clear visualization of sound activity in any direction.
In accordance with a first aspect of the present invention, this object is achieved by a method for visualizing a directional sound activity of a multichannel audio signal, comprising:
Preferably, for determining the contribution of each one of said directional sound activity vectors within sub-divisions of space, a norm of a directional sound activity vector is weighted on the basis of an angular distance between a direction associated with a sub-division of space and the direction of said directional sound activity vector, and for each sub-division of space, directional sound activity level within said sub-division of space is determined by summing the weighted norms of said directional sound activity vectors.
Preferably, determining the directional sound activity vector for a frequency sub-band comprises:
In accordance with a second aspect of the present invention, there is provided a non-transitory tangible computer-readable medium having computer executable instructions embodied thereon that, when executed by a computer, perform the method according to the first aspect.
In accordance with a third aspect of the present invention, there is provided an apparatus for visualizing a directional sound activity of a multichannel audio signal, comprising:
Other aspects, objects and advantages of the present invention will become more apparent upon reading the following detailed description of preferred embodiments thereof, given as a non-limiting example, and made with reference to the appended drawings wherein:
    
    
    
A directional sound activity analyzing unit 1 is illustrated in 
The directional sound activity analyzing unit 1 receives an input signal constituted by a multichannel audio signal. This multichannel audio signal comprises K audio channels, and each channel is associated with spatial information. Spatial information describes the location of the associated loudspeaker relative to the listener's location. For example, spatial information can be coordinates or angles and distances used to locate a loudspeaker with respect to a reference point, generally a listener's recommended location. Typically three values per audio channel are provided to describe this localization. Spatial parameters constituting said spatial information may then be represented by a K×3 matrix.
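As an illustration of such spatial parameters, the following minimal sketch builds a K×3 matrix of unit vectors from loudspeaker angles. The five azimuths (0°, ±30°, ±110°) follow the common ITU-R BS.775 arrangement and are an assumption made for the example, as are the axis convention and the names `AZIMUTHS_DEG`, `unit_vector` and `spatial_matrix`; none of them is taken from the present description.

```python
import math

# Assumed example layout: azimuths in degrees for channels L, R, C, Ls, Rs
# (common ITU-R BS.775 arrangement; not specified by the source text).
AZIMUTHS_DEG = {"L": 30.0, "R": -30.0, "C": 0.0, "Ls": 110.0, "Rs": -110.0}


def unit_vector(azimuth_deg, elevation_deg=0.0):
    """Return an (x, y, z) unit vector; x points to the front, y to the left, z up."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


# K x 3 matrix of spatial parameters, one row of three values per channel.
spatial_matrix = [unit_vector(az) for az in AZIMUTHS_DEG.values()]
```

Each row then plays the role of the three values that locate one loudspeaker relative to the listener.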
The directional sound activity analyzing unit 1 receives these input audio channels, and then determines directional sound activity levels to be displayed for visualizing the directional sound activity of a multichannel audio signal. The directional sound activity analyzing unit 1 is configured to perform the steps of the above-described method. The method is performed on an extracted part of the input signal corresponding to a temporal window. For example, a 50 ms analysis window can be chosen for analyzing the directional sound activity within said window.
First, a frequency band analysis 2 aims at estimating the sound activity level for a predetermined number of frequency sub-bands for each channel of the windowed multichannel audio signal.
For each channel, a sound activity level is determined for each one of said plurality of frequency sub-bands by performing a time-frequency transformation. The time-frequency transformation can be performed through a Fast Fourier Transformation (FFT).
The temporal windowing stage and the time-frequency transformation can be performed within a Short-Time Fourier Transformation (STFT) framework.
The frequency sub-bands are subdivisions of the frequency band of the audio signal, which can be divided into sub-bands of equal widths or preferably into sub-bands whose widths are dependent on human hearing sensitivity to the frequencies of said sub-bands.
The input channel signals $x_k[n]$ are windowed time-domain signals, wherein n is a time index. The channel index k identifies a channel of the multichannel audio signal. These time-domain channel signals $x_k[n]$ are then converted into frequency-domain signals $X_k[l]$, wherein l is a frequency index identifying a frequency sub-band. Accordingly, for each channel and frequency sub-band, a sound activity level is determined.
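The windowing and time-frequency transformation of one channel can be sketched as follows. This is only an illustration under stated assumptions: a naive discrete Fourier transform stands in for the FFT, a Hann window is chosen arbitrarily, each transform bin is treated as one frequency sub-band, and the squared magnitude is used as the sound activity level. The function names `hann` and `dft_levels` are hypothetical.

```python
import cmath
import math


def hann(n_samples):
    """Periodic Hann window of the given length (window choice is an assumption)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_samples) for n in range(n_samples)]


def dft_levels(frame):
    """Return |X[l]|^2 for l = 0 .. N/2 of one frame, using a naive O(N^2) DFT."""
    n_samples = len(frame)
    windowed = [s * w for s, w in zip(frame, hann(n_samples))]
    levels = []
    for l in range(n_samples // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * l * n / n_samples)
            for n, x in enumerate(windowed)
        )
        levels.append(abs(acc) ** 2)
    return levels


# Toy frame: 64 samples of a sinusoid whose frequency falls exactly on bin 8.
frame = [math.sin(2.0 * math.pi * 8 * n / 64) for n in range(64)]
levels = dft_levels(frame)
peak_bin = max(range(len(levels)), key=levels.__getitem__)  # index of the strongest bin
```

In practice an FFT routine would replace the naive transform, and adjacent bins could be grouped into perceptually motivated sub-bands.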
Then the directional parameter estimation 3 aims at estimating, for each frequency sub-band, the dominant sound direction that a listener would perceive if he were listening to the multichannel audio on an appropriate loudspeaker layout, i.e. corresponding to the recommended loudspeaker configuration in accordance with the multichannel audio format.
Accordingly, for each one of a plurality of frequency sub-bands, a directional sound activity vector is then estimated.
First, for each channel and frequency sub-band, a sound activity vector related to said channel is determined from the sound activity level related to said channel and frequency sub-band and from spatial information associated with said channel.
A channel configuration, i.e. the associated loudspeaker recommended positions corresponding to the signal coding, can be described by unit vectors $\vec{u}_k$ corresponding to the direction of the sound that would be emitted by loudspeakers fed by said channels. For example, three values describing this direction for each channel can constitute the required spatial information.
Accordingly, for a channel and for a frequency sub-band, a sound activity vector can be formed by associating the sound activity level corresponding to the frequency-domain signal $X_k[l]$ of said channel and said sub-band to the unit vector $\vec{u}_k$ corresponding to the spatial information associated with said channel.
Several methods can be used. For instance, the method presented hereafter is based on Gerzon's energy vectors. The sound activity vector related to one channel and one frequency sub-band can be expressed as:
  
  $$\vec{E}_k[l] = |X_k[l]|^2 \, \vec{u}_k$$
In this case, sound activity level is directly linked to the sound energy.
Then, for each frequency sub-band, the sound activity vectors related to the channels for said frequency sub-band are combined to obtain a directional sound activity vector related to said frequency sub-band.
For example, using Gerzon's energy vectors, the directional sound activity vector related to one frequency sub-band can be calculated as a mere summation of the sound activity vectors related to the K channels for said frequency sub-band:
  $$\vec{E}[l] = \sum_{k=1}^{K} \vec{E}_k[l]$$
This directional sound activity vector represents the predominant sound direction that would be perceived by a listener according to the recommended loudspeaker layout for sounds within that particular frequency sub-band.
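The summation of per-channel sound activity vectors for one frequency sub-band can be sketched as below. The two-loudspeaker configuration and the energy values are hypothetical, chosen only to make the arithmetic easy to follow, and `directional_vector` is an illustrative name.

```python
def directional_vector(levels, directions):
    """Sum |X_k[l]|^2 * u_k over the channels k for one frequency sub-band l.

    levels:     list of K energies |X_k[l]|^2, one per channel
    directions: list of K unit vectors (x, y, z), one per channel
    """
    total = [0.0, 0.0, 0.0]
    for energy, u in zip(levels, directions):
        for axis in range(3):
            total[axis] += energy * u[axis]
    return tuple(total)


# Hypothetical two-channel case: one loudspeaker in front (x axis), one to the left (y axis).
directions = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
energies = [3.0, 1.0]  # the front channel carries three times the energy of the left one
e_vec = directional_vector(energies, directions)  # (3.0, 1.0, 0.0)
```

The resulting vector points mostly to the front and slightly to the left, which matches the intuition that the louder channel pulls the perceived direction towards its loudspeaker.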
An optional, however advantageous, frequency masking 4 can adapt directional sound activity vectors according to their respective frequency sub-bands. In order to tune reactivity with respect to sound frequencies, the norms of the directional sound activity vectors can be weighted based on their respective frequency sub-bands. The weighted directional sound activity vector is then
  
  $$\vec{G}[l] = \alpha[l] \, \vec{E}[l]$$
where $\vec{E}[l]$ is the directional sound activity vector of frequency sub-band l and $\alpha[l]$ is a weight, for instance between 0 and 1, which depends on the frequency sub-band of each directional sound activity vector. Such a weighting allows enhancing frequency sub-bands of particular interest for the user. This feature can be used for discriminating sounds based on their frequencies. For instance, frequencies related to particularly interesting sounds can be enhanced in order to distinguish them from ambient noise. The directional sound activity analyzing unit 1 can be fed with spectral sensitivity parameters which define the weight attributed to each frequency sub-band.
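A minimal sketch of this frequency masking is given below, assuming one weight per frequency sub-band supplied as a plain list. The vectors, the weights and the name `apply_mask` are hypothetical.

```python
def apply_mask(vectors, weights):
    """Scale each sub-band directional sound activity vector by its weight alpha[l]."""
    if len(vectors) != len(weights):
        raise ValueError("one weight per frequency sub-band is required")
    return [tuple(a * c for c in v) for v, a in zip(vectors, weights)]


# Hypothetical values: three sub-bands; the mask removes the first one,
# keeps the second at full level and halves the third.
e_vectors = [(2.0, 0.0, 0.0), (0.0, 4.0, 0.0), (1.0, 1.0, 0.0)]
alpha = [0.0, 1.0, 0.5]
g_vectors = apply_mask(e_vectors, alpha)
```

Because the weight is a scalar, only the norm of each vector changes; its direction is preserved.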
In order to directionally visualize sound activity, space is divided into sub-divisions which are intended to discretely represent the acoustic environment of the listener. 
For each frequency sub-band, the dominant sound direction and the sound activity level associated with said direction are now determined and described by the directional sound activity vector, preferably weighted as described above. The visualization of such directional information must be very intuitive so that sound direction information can be conveyed to the user without interfering with other sources of information.
The beam clustering stage 5 corresponds to allocating to each of the sub-divisions a part of each frequency sub-band sound activity.
To this end the contributions of each frequency sub-band sound activity to each sub-division of space are determined on the basis of directivity information. For each sub-division of space, a directional sound activity level is determined within said sub-division of space by combining, for instance by summing, the contributions of said frequency sub-band sound activity to said sub-division of space.
Directivity information is associated with each sub-division 6. Such directivity information relates to level modulation as a function of direction in an oriented coordinate system, typically centered on a listener's position. This directivity information can be described by a directivity function which associates a weight with space directions in an oriented coordinate system. Typically, such a directivity function exhibits a maximum for a direction associated with the related sub-division.
For each sub-division 6 of space, norms of directional sound activity vectors are weighted on the basis of directivity information associated with said sub-division 6 of space and the directions of said directional sound activity vectors. These weighted norms can thus represent the contribution of said directional sound activity vectors within said sub-divisions of space.
For instance, a directivity function can be parameterized by a beam vector $\vec{v}_m$ and an angular value $\theta_m$ corresponding to the angular width of the beam, wherein m identifies a space sub-division. The direction associated with a sub-division 6 can be the main direction defined by the beam vector $\vec{v}_m$. Accordingly, the angular distance between a beam vector $\vec{v}_m$ and a directional sound activity vector $\vec{G}[l]$ can define the clustering weight $C_m[l]$. For instance, a simple directional weighting function may be 1 if the angular distance between a beam vector $\vec{v}_m$ and a directional sound activity vector $\vec{G}[l]$ is less than $\theta_m/2$ and 0 otherwise:
  $$C_m[l] = \begin{cases} 1 & \text{if } \angle\left(\vec{v}_m, \vec{G}[l]\right) < \theta_m/2 \\ 0 & \text{otherwise} \end{cases}$$
The beam vector $\vec{v}_m$ and the angular value $\theta_m$ used to define the parameters of the directivity function can constitute an example of directivity information by which the contribution of each one of said directional sound activity vectors within sub-divisions of space can be estimated.
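The simple 0/1 directional weighting described above can be sketched as follows. The beam direction, the 45° beam width and the test vectors are hypothetical, and treating a null vector as contributing nothing is an assumption added for robustness; `angular_distance` and `clustering_weight` are illustrative names.

```python
import math


def angular_distance(a, b):
    """Angle in radians between two non-zero 3-D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    cos_angle = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cos_angle)


def clustering_weight(beam_vector, beam_width, g_vector):
    """Return 1.0 if g_vector lies within half the beam width of beam_vector, else 0.0."""
    if not any(g_vector):
        return 0.0  # assumption: a null vector has no direction and contributes nothing
    return 1.0 if angular_distance(beam_vector, g_vector) < beam_width / 2.0 else 0.0


beam = (1.0, 0.0, 0.0)      # hypothetical beam pointing to the front
width = math.radians(45.0)  # hypothetical angular width of the beam
inside = clustering_weight(beam, width, (2.0, 0.5, 0.0))   # about 14 degrees off axis
outside = clustering_weight(beam, width, (0.0, 3.0, 0.0))  # 90 degrees off axis
```

A smoother directivity function, for example one decreasing gradually with angular distance, could be substituted for the hard threshold without changing the rest of the processing.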
The directional sound activity within a beam or sub-division of space, denoted here $A_m$, can then be determined by summing said contributions, such as weighted norms in this example, of said directional sound activity vectors related to the L frequency sub-bands:
  $$A_m = \sum_{l=1}^{L} C_m[l] \, \lVert \vec{G}[l] \rVert$$
Once determined, the directional sound activity for each of the M beams can be fed to a visualizing unit, typically to a screen associated with the computer which comprises or constitutes the directional sound activity analyzing unit 1.
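Putting the clustering and the summation together, the per-beam levels fed to the visualizing unit can be sketched as below. The four 90° beams, the sub-band vectors and the names `angle_between` and `beam_levels` are hypothetical, and the hard 0/1 weighting is only one possible directivity function.

```python
import math


def angle_between(a, b):
    """Angle in radians between two non-zero 3-D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return math.acos(max(-1.0, min(1.0, dot / (norm_a * norm_b))))


def beam_levels(beams, g_vectors):
    """For each beam (direction, width), sum the norms of the sub-band vectors inside it."""
    levels = []
    for v_m, theta_m in beams:
        total = 0.0
        for g in g_vectors:
            norm = math.sqrt(sum(c * c for c in g))
            # 0/1 weighting: the vector counts only if it lies inside the beam.
            if norm > 0.0 and angle_between(v_m, g) < theta_m / 2.0:
                total += norm
        levels.append(total)
    return levels


# Hypothetical setup: four 90-degree beams (front, left, back, right).
quarter = math.pi / 2.0
beams = [
    ((1.0, 0.0, 0.0), quarter),
    ((0.0, 1.0, 0.0), quarter),
    ((-1.0, 0.0, 0.0), quarter),
    ((0.0, -1.0, 0.0), quarter),
]
g_vectors = [(3.0, 0.0, 0.0), (4.0, 1.0, 0.0), (0.0, 2.0, 0.0)]  # one vector per sub-band
levels = beam_levels(beams, g_vectors)
```

Here the first two sub-bands fall into the front beam and the third into the left beam, so the back and right beams stay at zero.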
For every space sub-division 6, such as the beams illustrated in 
  
Other graphical representations can be used, such as a radar chart wherein directional sound activity levels are represented on axes starting from the center, lines or curves being drawn between the directional sound activity levels of adjacent axes. Preferably, the lines or curves define a colored geometrical shape containing the center.
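The geometry of such a radar chart can be sketched as follows: each directional sound activity level becomes the radius of a vertex on its own axis, and joining consecutive vertices gives the outline of the shape. Evenly spaced axes, the example levels and the name `radar_vertices` are assumptions made for the illustration; the actual drawing is left to whatever graphics library is used.

```python
import math


def radar_vertices(levels):
    """Return one (x, y) vertex per axis, at a radius equal to the level on that axis."""
    n_axes = len(levels)
    return [
        (
            level * math.cos(2.0 * math.pi * m / n_axes),
            level * math.sin(2.0 * math.pi * m / n_axes),
        )
        for m, level in enumerate(levels)
    ]


# Hypothetical levels for four sub-divisions of space.
vertices = radar_vertices([1.0, 2.0, 0.5, 0.0])
```

Filling the polygon defined by these vertices yields a colored shape that extends furthest in the directions of highest sound activity.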
The invention thus allows sound direction information to be delivered to the user even if said user does not possess the recommended loudspeaker layout, for example with headphones. It can also be very helpful for hearing-impaired people or for users who must identify sound directions quickly and accurately.
Preferably, the graphical representation shows several directional sound activity levels for each sub-division, these directional sound activity levels being calculated with different frequency masking parameters.
For example, at least two sets of spectral sensitivity parameters are chosen to parameterize two frequency masking processes respectively used in two directional sound activity level determination processes. The two sets of directional sound activity vectors determined from the same input audio channels are weighted based on their respective frequency sub-bands in accordance with two different sets of weighting parameters.
Consequently, for each sub-division, each one of the two directional sound activity levels enhances some particular frequencies in order to distinguish different sound types. The two directional sound activities can then be displayed simultaneously within the same sub-divided space, for example with a color code for distinguishing them and a superimposition, for instance based on level differences.
The method of the present invention as described above can be realized as a program and stored on a non-transitory tangible computer-readable medium, such as a CD-ROM, a ROM or a hard disk, having computer executable instructions embodied thereon that, when executed by a computer, perform the method according to the invention.
While the present invention has been described with respect to certain preferred embodiments, it will be apparent to those skilled in the art that various changes and modifications may be made without departing from the scope of the invention as defined in the appended claims.