The present invention relates to methods, modules and computer-readable recording media for providing a multi-channel convolution reverb.
Music projects that in former times would have required an array of professional studio equipment can now be completed in a home or project studio, using a personal computer and readily available resources. A personal computer that executes digital audio studio software such as e.g. Logic Pro 7 of Apple Computer Inc. can serve as a work-station for recording, arranging, mixing, and producing complete music projects, which can be played back on the computer, burned on a CD or DVD, or distributed over the Internet. Such audio studio software also allows a user to record, generate, process and output audio in surround audio formats, such as e.g. 5.1 or 7.1 surround formats, having 5 or 7 audio channels as well as optionally an additional low frequency effects LFE channel.
Such audio studio software is also often used by musicians, professionals or hobbyists, to improve studio recordings by simulating real-world spaces such as e.g. a cathedral, an opera house, or a music stage. This is often performed by using a so-called convolution reverb effect, wherein a single impulse response or a set of impulse responses of such a desired location is used. These impulse responses are also sometimes referred to as the acoustic fingerprint of the location. In performing the convolution reverb effect, each channel of e.g. a surround audio track is convolved with a corresponding impulse response. Each impulse response of the set of impulse responses of the desired location to be simulated has the same length in time or, where the impulse responses are provided as digital sample data, the same number of samples, e.g. at a 44.1 kHz or 96 kHz sampling rate, with each sample corresponding to e.g. 16 bits or 24 bits. Overall, such processing results in a number of convolution processing operations that corresponds to the number of channels in the surround audio track that are subjected to convolution reverb processing. However, such processing does not take into account that reverberations of the location that are audibly perceived in one channel, but originate from an audio signal in another channel, also contribute to the overall spatial localisation and “spaciousness” of the resulting perception.
Recently, systems have also been developed that offer a “true surround” convolution reverb effect, wherein each reverberated output audio channel signal is the sum of all input audio channel signals, each convolved with a corresponding impulse response. This provides a convolution reverb effect that allows for a perceivably much better simulation of an existing space. However, in case the number of input channels is the same as the number of output channels, it requires a number of convolution processing operations that corresponds to the square of the number of channels in the surround audio track that are subjected to convolution reverb processing. Otherwise, the number of required convolution processing operations corresponds to the product of the number of input channels and the number of output channels. Therefore, it will be understood by those skilled in the art that such a “true surround” convolution reverb requires a considerably larger number of computations. As a result, even with recent increases in processor speed, currently available personal computers cannot perform such “true surround” convolution reverb in real-time. Instead, such effects have to be processed “off-line”, requiring processing time which is usually far longer than the duration of the actual surround audio file to be processed.
At least certain embodiments of the present invention provide a multi-channel audio convolution reverb that provides a room simulation while being capable of being performed in real-time.
In accordance with a first embodiment of the invention, there is provided a method of generating, on a data processing system, such as a computer system, a multi-channel audio convolution reverb, comprising:
Said multi-channel audio signal preferably comprises 5, 6 or 7 surround audio channels, and more preferably comprises an additional low frequency effect LFE audio channel not being subjected to convolution operation.
Further in accordance with the first embodiment of the invention, there is provided a method of performing a decorrelation operation for decorrelating said other audio channel and said respective audio channel, the decorrelated result being used in said cross-channel convolution operation.
In accordance with a first embodiment of the invention, there is also provided a machine-readable recording medium, having recorded thereon program instructions causing, when executed on a data processing system, the system to produce a multi-channel audio convolution reverb, by a method comprising:
Preferably, cross-channel convolution operation may be respectively performed only for an initial part of said cross-channel impulse response, said initial part being defined by a definition parameter.
Further preferably, said program instructions are realized as a software plug-in for use with an audio studio software, such as e.g. Logic Pro.
In accordance with a first embodiment of the invention, there is also provided a multi-channel audio convolution reverb module, comprising:
Preferably, said cross-channel convolution processing units are adapted to perform said convolution processing only for an initial part of said cross-channel impulse response, said initial part being defined by a definition parameter.
In accordance with a first embodiment of the invention, there is also provided a data carrier having stored thereon synthesized music obtained in a computer aided process involving a reverb generation operation according to the present invention.
A result of at least certain embodiments of the invention may be a data file, created through one of the methods described herein, which may be stored on a storage device of a data processing system. The data file may be an audio data file, in a digital format, which may be used to create sound by playing the data file on a system which is coupled to audio transducers, such as speakers.
One or more of the methods described herein may be implemented on a data processing system which is operable to execute those methods. The data processing system may be a general purpose or special purpose computer device, or a desktop computer, a laptop computer, a personal digital assistant, a mobile phone, an entertainment system, a music synthesizer, a multimedia device, an embedded device in a consumer electronic product, or other consumer electronic devices. In a typical embodiment, a data processing system includes one or more processors which are coupled to memory and to one or more buses. The processor(s) may also be coupled to one or more input and/or output devices through the one or more buses. Examples of data processing systems are shown and described in U.S. Pat. No. 6,222,549, which is hereby incorporated herein by reference.
The one or more methods described herein may also be implemented as a program storage medium which stores and contains executable program instructions for, when those instructions are executed on a data processing system, causing the data processing system to perform one of the methods. The program storage medium may be a hard disk drive or other magnetic storage media or a CD or other optical storage media or DRAM or flash memory or other semiconductor storage media or other storage devices.
Further embodiments of the present invention will now be described to illustrate the above and other advantages and aspects of the invention by way of further examples and with reference to the accompanying drawings, in which:
An impulse response can be viewed as the total echoes of sound reflections in a given room following an initial signal spike impulse. Impulse responses are recordings made in acoustic spaces. To create an impulse response, the sound of a starter pistol, or a digital spike is recorded inside the desired room together with the resulting reflections. Alternatively, a sine sweep covering preferably the whole audible frequency range may be played back and recorded. Preferably, there is recorded, for a desired location, a plurality of impulse responses corresponding to different locations of sound sources. The impulse responses may be stored in the impulse response storage module 20 and/or utilized in the convolution reverb module 10 as computer readable files such as e.g. AIFF, SDII or WAV file formats, and may have sampling rates of e.g. 22.05 kHz, 24 kHz, 44.1 kHz, 48 kHz, 96 kHz or 192 kHz. Each sample may correspond to 16 or 24 bits.
wherein a(n) is the digital audio signal, and IR(n) the digital impulse response having a length of m samples. Furthermore, those skilled in the art will understand that a convolution operation need not be performed according to formula (1) as set forth above, but may instead be performed by Fourier transforming the input signal and the impulse response into the frequency domain, forming the point-wise product of the two Fourier transforms, and inversely Fourier transforming the result back into the time domain. Preferably, a fast Fourier transform method is utilized in order to reduce computational load.
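The equivalence of the two routes described above can be illustrated with a minimal Python sketch using only the standard library. The function names `convolve_direct`, `convolve_fft` and `_fft` are illustrative and not taken from the source. The direct form is the standard discrete convolution sum, which is assumed here to correspond to formula (1), and the recursive radix-2 transform is just one simple fast Fourier transform variant chosen for brevity.

```python
import cmath

def convolve_direct(a, ir):
    """Time-domain convolution: out[n] = sum over k of a[n - k] * ir[k]."""
    out = [0.0] * (len(a) + len(ir) - 1)
    for k, h in enumerate(ir):
        # Each impulse-response sample adds a delayed, scaled copy of the input.
        for j, x in enumerate(a):
            out[j + k] += h * x
    return out

def _fft(x, sign):
    """Recursive radix-2 transform; len(x) must be a power of two.
    sign=-1 gives the forward transform, sign=+1 the unnormalised inverse."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = _fft(x[0::2], sign), _fft(x[1::2], sign)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def convolve_fft(a, ir):
    """Same result via the frequency domain."""
    n_out = len(a) + len(ir) - 1
    n_fft = 1 << (n_out - 1).bit_length()        # zero-pad to a power of two
    fa = _fft(list(a) + [0.0] * (n_fft - len(a)), -1)
    fi = _fft(list(ir) + [0.0] * (n_fft - len(ir)), -1)
    product = [p * q for p, q in zip(fa, fi)]    # point-wise product
    back = _fft(product, +1)                     # back to the time domain
    return [v.real / n_fft for v in back[:n_out]]
```

Both functions return the same samples up to rounding error. The direct form needs on the order of (signal length × m) multiplications, whereas the transform-based form grows roughly with N log N for a padded length N, which is why the latter is preferred for long impulse responses.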
As can be seen in
wherein ap refers to the respective digital audio channel input signals a1 to an, IR1p refers to the respective impulse responses, and m1p refers to the length as a number of samples of the impulse response over which convolution processing is performed. For a “true surround” convolution reverb effect that should provide the best possible simulation of a location, convolution processing is respectively performed over a same respective length m1p=m.
Referring now to
As results from
For example, in order to simulate the reverberation of a room, such as a cathedral, opera house, or any other desired location, that has a reverberation time of e.g. 3 seconds, using a sampling rate of 96 kHz, i.e., 96,000 samples per second, for high quality audio, the resulting impulse responses each comprise 3 s×96,000 samples/s=288,000 samples. For a surround audio track of e.g. 3 min=180 s length, also sampled at 96 kHz and thus comprising 180 s×96,000 samples/s=17,280,000 samples, each convolution processing requires 288,000 samples×17,280,000 samples=4,976,640,000,000 multiplications. Assuming now a surround audio track in 7.1 surround format, having 7 audio channels that are subjected to convolution reverb processing, a total of 7×7=49 convolution processing operations need to be performed, resulting in a total of 243,855,360,000,000 multiplications. As will be understood by those skilled in the art, despite the advances in computer technology offering personal computers with increasingly faster microprocessors, presently available personal computer systems are not capable of performing such a large number of mathematical operations in real-time. This has the disadvantage that a user of audio studio software first has to wait for such an “off-line” convolution reverb effect to be fully calculated and for the resulting convolution reverb processed multi-channel audio signal to be output and e.g. written to a hard disk of the personal computer executing the audio studio software. Only then can the user use this resulting convolution reverb processed multi-channel audio signal for further processing, such as mixing with other audio tracks, adding further effects offered by the audio studio software and so on. As a result, the user is greatly impeded in his or her creative work flow.
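The arithmetic of this example can be checked with a short Python sketch. The function and variable names are illustrative only, and the count assumes direct time-domain convolution with one multiplication per pair of input sample and impulse-response sample, as in the figures above.

```python
def multiplications_per_convolution(ir_seconds, track_seconds, sample_rate):
    """Multiplications for one direct convolution of a track with one impulse response."""
    ir_samples = ir_seconds * sample_rate        # 3 s at 96 kHz -> 288,000 samples
    track_samples = track_seconds * sample_rate  # 180 s at 96 kHz -> 17,280,000 samples
    return ir_samples * track_samples

def true_surround_multiplications(channels, ir_seconds, track_seconds, sample_rate):
    """Total for a square "true surround" reverb: channels x channels convolutions."""
    per_conv = multiplications_per_convolution(ir_seconds, track_seconds, sample_rate)
    return channels * channels * per_conv

per_conv = multiplications_per_convolution(3, 180, 96_000)
total = true_surround_multiplications(7, 3, 180, 96_000)
print(f"{per_conv:,}")   # 4,976,640,000,000
print(f"{total:,}")      # 243,855,360,000,000
```

Dividing the total by the 180 s track length gives the rate that real-time processing would have to sustain under these assumptions, namely 1,354,752,000,000 multiplications per second.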
According to at least certain embodiments of the present invention, therefore, at least one convolution processing is limited to a part of the respective impulse response that is shorter than the part used for at least one other convolution processing. More preferably, all cross-channel convolution processing is limited to an initial part of the respective cross-channel impulse responses, wherein the initial part is defined by a definition parameter. A natural reverb contains most of its spatial information within an initial time duration, typically the first milliseconds, whereas with increasing time the reflection pattern becomes progressively more diffuse and indistinct. This definition parameter therefore allows a system to capture most of the spatial information, embedded in the initial part of the impulse responses, while maintaining the overall reverberation sensation. In this way, by calculating the early reflections and the onset of the reverb using the full set of impulse responses, while using a reduced set of impulse responses towards the tail of the reverb, the overall computational load placed upon e.g. a personal computer performing such a convolution reverb is greatly reduced. The definition parameter thus provides an elegant and simple means of balancing reverb quality and accuracy against the processing load on the personal computer.
The definition parameter may be a predetermined parameter which is preferably set between 50 ms and 300 ms, more preferably between 100 ms and 200 ms. Most preferably, however, the definition parameter may be set by a user e.g. of the personal computer executing the audio studio software, such as a Macintosh computer executing Logic Pro 7 audio studio software, thus giving the user the ability to determine a suitable definition parameter. A user may set the definition parameter as a duration of the initial part of the impulse response, e.g. in milliseconds, or as the number of samples over which the cross-channel impulse responses are taken into account and evaluated. Alternatively, a user may also set the definition parameter as a percentage or as a ratio of the total length of the impulse response. Most preferably, a user is offered a display screen which displays some or all of the respective impulse responses and which displays an indicator, such as a vertical line overlaid on the impulse responses, corresponding to the definition parameter. By moving this vertical line, a user may visually set the definition parameter. One possible display screen, with a user interface, is shown in
Accordingly, taking into account this definition parameter, an ith outputted audio channel signal bi is calculated as given in formula (4) below:
In this formula (4), the terms corresponding to p=i represent a same-channel convolution operation, which is preferably processed over the full length mii=m samples of the same-channel impulse response IRii, whereas the terms corresponding to p≠i represent cross-channel convolution operations, respectively performed over a respective length mip. Preferably, for such cross-channel convolution, the respective length mip is set according to the definition parameter to the first v samples of the respective cross-channel impulse responses, i.e., mip=v for p≠i.
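The processing described for formula (4) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each output channel is taken to be the sum over all input channels of the input convolved with the corresponding impulse response, the number of input and output channels is assumed equal, all impulse responses are assumed to have the same length m, and the names `definition_to_samples`, `convolve` and `multichannel_reverb` are hypothetical.

```python
def definition_to_samples(definition_ms, sample_rate):
    """Convert a definition parameter in milliseconds to a sample count v."""
    return int(round(definition_ms * sample_rate / 1000.0))

def convolve(a, ir):
    """Direct time-domain convolution of signal a with impulse response ir."""
    out = [0.0] * (len(a) + len(ir) - 1)
    for k, h in enumerate(ir):
        for j, x in enumerate(a):
            out[j + k] += h * x
    return out

def multichannel_reverb(inputs, irs, v):
    """inputs[p]: samples of input channel p (all channels of equal length).
    irs[i][p]: impulse response from input p to output i (all of length m).
    v: number of initial samples used for cross-channel responses (i != p)."""
    n_ch = len(inputs)
    m = len(irs[0][0])
    out_len = len(inputs[0]) + m - 1
    outputs = []
    for i in range(n_ch):
        b_i = [0.0] * out_len
        for p in range(n_ch):
            # Same-channel terms use the full response, cross-channel terms
            # only the initial part selected by the definition parameter.
            length = m if i == p else min(v, m)
            part = convolve(inputs[p], irs[i][p][:length])
            for n, y in enumerate(part):
                b_i[n] += y
        outputs.append(b_i)
    return outputs
```

For instance, a definition parameter of 150 ms at a 96 kHz sampling rate gives `definition_to_samples(150, 96_000) == 14400`, so each cross-channel impulse response would contribute only its first 14,400 samples. Setting v equal to m reproduces the full “true surround” case, while v equal to 0 leaves only same-channel processing.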
As will be understood by those skilled in the art, in such a way the computational load placed e.g. on a personal computer performing such a multi-channel convolution reverb may be greatly reduced. As an example, in the case of a multi-channel audio signal of a 7.1 surround audio format, subjecting seven audio channels to a “true surround” convolution reverb requires a system to perform in total 49 convolution processings over a respective impulse response length of e.g. 3 s. If the definition parameter is set to e.g. 150 ms, i.e., one twentieth of the 3 s overall impulse response length, and cross-channel convolution processing is performed only over the initial part of the respective impulse responses corresponding to these 150 ms, then the computational load is reduced to 7 convolution processings over 3 s length, and 42 convolution processings over 150 ms=0.15 s length. In terms of computational load, this corresponds to a load of approximately 7+42×(0.15 s/3 s)=9.1 convolution processings over a length of 3 s. As will be understood by those skilled in the art, such a multi-channel convolution reverb according to the first embodiment requires only a little additional computation when compared with a convolution reverb wherein only same-channel convolution processing is performed, and therefore is also suitable for real-time applications wherein such a convolution reverb is calculated or generated with only comparatively little or no delay upon input of the multi-channel audio signal. Therefore, a user is no longer impeded by having to wait for a convolution reverb to be performed “off-line”. The result of a method in an embodiment may be stored as audio data which can then be played back on speakers or other transducers.
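The load estimate in this example can be reproduced with a few lines of Python. The function name `equivalent_full_convolutions` is illustrative, and the sketch assumes, as the example does, that the cost of one convolution processing scales linearly with the length of the impulse response part that is used.

```python
def equivalent_full_convolutions(channels, ir_seconds, definition_seconds):
    """Load expressed as a number of full-length convolution processings."""
    same_channel = channels                     # full impulse response length
    cross_channel = channels * (channels - 1)   # truncated to the definition parameter
    return same_channel + cross_channel * (definition_seconds / ir_seconds)

full = 7 * 7                                           # "true surround": 49
reduced = equivalent_full_convolutions(7, 3.0, 0.15)   # approximately 9.1
print(full, round(reduced, 2), round(full / reduced, 2))
```

Under these assumptions the reduced scheme costs about 9.1 full-length convolutions instead of 49, i.e. roughly 5.4 times less, and only about 2.1 more than the 7 convolutions of a purely same-channel reverb.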
Alternatively, the respective lengths mip may also be set individually, such that different cross-channel convolution operations are performed over different lengths. For example, the lengths mip may be set such that, for an initial length v, convolution operation is performed according to the full set of impulse responses; then, for a second length v′ following the initial length v, convolution operation is performed for same-channel operation and additionally in cross-channel operation between the left and right front audio signals, excluding other cross-channel convolution operation; and after the second length v′ only same-channel convolution operation is performed. This offers even more flexibility to a user to adjust performance of the convolution reverb module 10 according to his or her expectations and requirements. However, such an increase in flexibility also makes the settings more complex, as not only one definition parameter but a plurality of different parameters has to be adjusted.
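The staged example above can be expressed as a table of per-pair lengths. The following Python sketch builds such a table; the function name `staged_lengths`, the parameter name `v2` (standing for v′) and the convention that channel indices 0 and 1 denote the left and right front channels are assumptions made for illustration only.

```python
def staged_lengths(n_channels, m, v, v2, front_pair=(0, 1)):
    """Return lengths[i][p]: number of impulse-response samples convolved
    for the path from input channel p to output channel i.

    m:  full impulse response length in samples (same-channel paths)
    v:  initial length during which all cross-channel paths are active
    v2: further length during which only the front-pair cross paths stay active
    """
    left, right = front_pair
    lengths = []
    for i in range(n_channels):
        row = []
        for p in range(n_channels):
            if i == p:
                row.append(m)                   # same-channel: full length
            elif {i, p} == {left, right}:
                row.append(min(v + v2, m))      # left/right front cross paths
            else:
                row.append(min(v, m))           # all other cross paths
        lengths.append(row)
    return lengths

# Three channels, m = 100 samples, v = 10, v' = 20:
for row in staged_lengths(3, 100, 10, 20):
    print(row)
```

Each entry of the table can then be used to truncate the corresponding impulse response before convolution, so that a single definition parameter becomes the special case in which all off-diagonal entries are equal.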
Turning now to
Although the above description has been made in context of multi-channel audio signals exemplified by surround audio signals having e.g. 5, 6 or 7 audio channels, this is not limiting. For example, the present invention may also be applied to a multi-channel audio signal in the form of a stereo signal having only two audio channels, left and right channel. In this case, the present invention allows a “true stereo” convolution reverb effect with reduced computational load. As a result, a user may subject a plurality of stereo signals to convolution reverb in parallel, while still being able to enjoy processing in real-time.
The present invention as described above can be implemented in numerous ways, e.g. by hardware only, by a program stored on a storage medium, etc. Such a program enables a data processing system, such as a music machine or a music synthesizer or a computer system, to execute one or more of the above described features of the invention. The computer system may comprise a screen on a display monitor which is connected to a processor which is coupled to a hard disc drive and to a temporary drive, such as a CD-ROM, DVD, optical disc or floppy disc drive, in which a suitable data storage medium is inserted. The computer system may also include a mouse and a keyboard, both connected electrically to the processor. Other variations of the computer system can be envisaged, for example the use of a joystick, roller ball or stylus pen, and/or a plurality of temporary and hard disc drives, and/or connection of the computer system to the Internet, and/or use of the computer system in a specific application which may not include a keyboard or mouse but rather input buttons and menus on the screen.
The foregoing description has been given by way of example only and it will be appreciated by a person skilled in the art that numerous modifications can be made without departing from the scope of the present invention.