This disclosure relates to systems and methods for capturing environmental sounds and immediately replaying the captured sounds.
Applications for sound in Virtual and Augmented Reality (VR/AR) generally aim to provide a lifelike experience of a virtual environment and/or an acoustically augmented environment. They typically simulate events—taking place in the virtual and/or acoustically augmented environment—that a subject can interact with. A common problem that such applications face is that the subject does not hear oneself (properly) reflected in the virtual environment, i.e., the real sounds produced by the subject, e.g., by means of voice and/or body movements, do not sound as if they take place in the virtual and/or augmented environment. Additionally, especially with regard to Mixed and/or Augmented Reality applications (XR/AR), one may also want to hear other environmental sounds, e.g., sounds produced by other (human) subjects and/or any other sound sources in the real environment, reflected in the virtual environment. Thus, although the impression of a virtual environment may be convincingly simulated, as long as the subject does not hear oneself and the real environment reflected in the virtual and/or augmented environment, the simulation is perceptually incoherent. As a result, the experience is not lifelike and is less physically and/or emotionally engaging than is desirable.
Many virtual, mixed, and augmented reality applications use a so-called closed system, wherein sounds are typically delivered to a user via headphones. In a closed system, capturing the sounds produced by the subject and/or by any other sources in the environment may involve a prohibitive number of microphones and/or sensors placed on the subject(s) and/or throughout the environment to accurately process the audio and spatial/movement data. Delivering the sound to the subject would then further involve a full simulation of each and every sound source in the real environment to be able to provide a convincing experience of each sound source in the virtual environment. This may require prior knowledge about the real environment and/or the sound sources present in the environment and/or the type of events occurring in the environment. Such a simulation may require a prohibitive amount of real-time data processing and may require data that is often impossible to obtain, either in advance or in real time.
An example of a closed system is known from US 2013/0236040 A1. This document discloses a system to combine environmental sounds and augmented reality (AR) sounds. The system comprises headphones with speakers on the inside (directed towards the user's ears) and microphones on the outside, positioned close to the speakers but acoustically insulated from the speakers. The microphones capture ambient sounds, which may be processed (e.g. enhanced or suppressed, depending on the sound and the circumstances) before being fed to the respective speakers.
It may be understood that the complexity of the problem of delivering a convincing audio experience, and thus of a closed system, increases exponentially when one considers applications that involve a larger number of subjects sharing the same physical environment. As an example, one may consider the effects of acoustically enhanced environments on various interacting groups of human subjects, e.g., a group of people discussing during an assembly meeting or an audience attending a live concert; as well as non-human subjects, e.g., a swarm of bees moving across a meadow of flowers, birds communicating with each other across trees or distributed plant growth in an open field.
Alternatively, one could consider an open delivery to all subjects at once, e.g., by means of adding loudspeakers to the environment. This eliminates the necessity to capture data from each individual subject and/or sound source in the environment, as, instead, sound is captured on the level of the environment as a whole. However, a drawback of such an open system is the feedback that occurs when microphones are placed in the same environment as the loudspeakers and the captured environmental sound is played back in the same environment at a high gain in real time. This is especially the case when omnidirectional microphones are used. Although omnidirectional microphones are particularly suitable for capturing sound on the level of the environment as a whole, they are known to be particularly prone to feedback when the captured sound is played back in the same environment in real time. Consequently, omnidirectional microphones are not commonly used in the design of such systems.
Hence, there is a need in the art for a device that accurately captures environmental sound, i.e., the sound produced by the subject(s) and any other sources in the environment, and that is able to deliver the environmental sound, incorporated in a virtual and/or augmented environment, in real time in a simple, reliable, computationally inexpensive, and perceptually coherent way.
Hence, an environmental sound loudspeaker is disclosed. As used herein, an environmental sound loudspeaker is a device or system for capturing environmental sounds, processing the captured sounds, and immediately replaying the captured sounds.
In an aspect, this disclosure relates to an environmental sound loudspeaker comprising a loudspeaker driver, a first microphone pair, and a signal processor. The first microphone pair comprises a first microphone and a second microphone positioned a distance d apart, the first microphone and the second microphone being positioned diametrically opposite each other and equidistant relative to a centre of the loudspeaker driver. The signal processor may be configured to receive a first input signal from the first microphone and a second input signal from the second microphone, each input signal representing a recorded sound; determine an output signal based on the first and second input signals; and provide the output signal to the loudspeaker driver. The determination of the output signal may comprise inverting the first input signal and combining the inverted first input signal with the second input signal into a combined signal. The determination of the output signal may further comprise amplifying the combined signal and/or the first and second input signals to obtain a high-fidelity signal of environmental sounds captured by the first and/or second microphones for frequencies in an audible frequency range, preferably for all or at least substantially all frequencies in the audible frequency range. Preferably, the amplifying comprises attenuating signals with a frequency higher than a first transition frequency and/or boosting signals with a frequency lower than a second transition frequency, the first and second transition frequencies being based on the distance d between the first and second microphones.
The microphone pair may capture environmental sounds, which are to be played back by the loudspeaker driver. Because the microphones are placed equidistant from the loudspeaker driver, the inversion (i.e., a 180° phase shift) and combination performed by the signal processor may prevent, or at least greatly reduce, feedback by effectively filtering out sound coming from the loudspeaker driver. Sounds produced by the loudspeaker driver may be referred to as non-environmental sounds. The output signal may be further processed, e.g. acoustically enhanced or mixed with other signals, before being provided to the loudspeaker driver.
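The processing chain for a single microphone pair can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `determine_output` and the `equalise` and `post_process` callables are hypothetical placeholders standing in for the amplification stage and for optional further processing (e.g. acoustic enhancement or mixing).

```python
from typing import Callable, Optional, Sequence

Signal = list[float]


def determine_output(
    first: Sequence[float],
    second: Sequence[float],
    equalise: Callable[[Signal], Signal],
    post_process: Optional[Callable[[Signal], Signal]] = None,
) -> Signal:
    """Hypothetical sketch: invert the first input, add the second, then amplify."""
    if len(first) != len(second):
        raise ValueError("both microphone signals must have the same length")
    # Inverting the first input signal = multiplying every sample by -1.
    inverted_first = [-sample for sample in first]
    # Combining the inverted first signal with the second signal (sample-wise sum).
    combined = [a + b for a, b in zip(inverted_first, second)]
    # Frequency-dependent amplification restoring the loudness balance (placeholder).
    output = equalise(combined)
    # Optional further processing before the signal reaches the loudspeaker driver.
    if post_process is not None:
        output = post_process(output)
    return output


# Identical inputs (e.g. sound from an equidistant driver) cancel completely.
print(determine_output([0.1, 0.2, 0.3], [0.1, 0.2, 0.3], equalise=lambda s: s))
# prints [0.0, 0.0, 0.0]
```

Passing the amplification stage as a callable keeps the cancellation step independent of whichever filter design is chosen for restoring fidelity.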
The microphones in each microphone pair are identical to each other. Preferably, when an environmental sound loudspeaker comprises a plurality of microphone pairs, all microphones are identical. Preferably, the microphones are omnidirectional microphones. However, the inverting, phase-shifting (in case of a plurality of microphone pairs), and adding of the signals may amplify high-frequency sounds and attenuate low-frequency sounds. Therefore, a high-fidelity signal (with the contribution from the loudspeaker driver filtered out) may be restored, or at least approximated, by selectively amplifying the phase-shifted (where applicable) and combined signals, preferably by boosting the low-frequency signals in the audible spectrum with +6 dB per octave and attenuating the high-frequency signals in the audible spectrum with −3 dB per doubling of the number of microphones. For example, the attenuation a of the high-frequency signals may be given or approximated by a(f)=−3 dB×log2(N), with N the number of microphones. Amplifying the signal may comprise applying one or more high-shelf filters and/or one or more low-shelf filters to the signal.
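The nominal gain targets in the preceding paragraph can be expressed as two small helper functions. This is a sketch under stated assumptions: the function names are hypothetical, N is taken to be the total number of microphones (twice the number of pairs), and the low-frequency boost is modelled as an ideal +6 dB per octave below a given transition frequency.

```python
import math


def high_frequency_attenuation_db(num_microphones: int) -> float:
    """Nominal attenuation above the first transition frequency: a = -3 dB x log2(N)."""
    if num_microphones < 2 or num_microphones % 2:
        raise ValueError("N must be a positive, even number (microphones come in pairs)")
    return -3.0 * math.log2(num_microphones)


def low_frequency_boost_db(frequency_hz: float, transition_hz: float) -> float:
    """Nominal +6 dB per octave below the transition frequency, 0 dB above it."""
    if frequency_hz <= 0 or transition_hz <= 0:
        raise ValueError("frequencies must be positive")
    octaves_below = math.log2(transition_hz / frequency_hz)
    return 6.0 * max(octaves_below, 0.0)


def db_to_gain(db: float) -> float:
    """Convert a level in dB to a linear amplitude factor."""
    return 10 ** (db / 20)


for n in (2, 4, 8):
    print(n, "microphones:", high_frequency_attenuation_db(n), "dB above the transition")
print("two octaves below:", low_frequency_boost_db(250.0, 1000.0), "dB")
```

Taken literally, the ideal boost grows without bound as the frequency approaches 0 Hz, which is one reason to realise it with a finite number of shelf filters.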
As used herein, an environmental sound may be any sound not directly produced by the loudspeaker driver of the environmental sound loudspeaker. Thus, environmental sounds may include sounds from sources external to the environmental sound loudspeaker. Environmental sounds may also include reflections of sounds produced by the loudspeaker driver of the environmental sound loudspeaker.
As used herein, a high-fidelity signal may refer to a signal, typically an output signal, that has a frequency spectrum that is substantially the same as a frequency spectrum of a further signal, typically an input signal. In particular, a high-fidelity signal may refer to a signal that has a loudness balance that is substantially the same as the loudness balance of the environmental sounds captured by the first and/or second microphones for substantially all frequencies in an audible frequency range.
As used herein, a frequency spectrum may refer to a sound amplitude for frequencies in the audible spectrum. The frequency spectrum may also be referred to as a loudness balance. The audible spectrum, or audible frequency range, may comprise all frequencies between 20 Hz and 15 kHz, or even all frequencies between 15 Hz and 20 kHz.
As used herein, “immediately” replaying refers to replaying with a time delay that is not noticeable for a typical human listener. Preferably, the time delay is as short as possible, e.g., shorter than 10 ms, preferably shorter than 5 ms, more preferably shorter than 1 ms. In an analogue implementation, the time delay can be negligible, e.g. smaller than 0.1 ms.
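To relate these delay targets to a digital implementation, the available sample budget can be computed directly. This is a minimal sketch, assuming that the dominant latency comes from a fixed number of equally sized audio buffers in the signal path (two here: one input buffer and one output buffer) and ignoring converter and filter delays; the function names are hypothetical.

```python
import math


def delay_budget_samples(sample_rate_hz: int, max_delay_s: float) -> int:
    """Number of whole samples that fit within the allowed end-to-end delay."""
    # A tiny tolerance guards against floating-point results such as 47.999999...
    return math.floor(sample_rate_hz * max_delay_s + 1e-9)


def max_buffer_size(sample_rate_hz: int, max_delay_s: float, buffers_in_path: int = 2) -> int:
    """Largest buffer size (samples) so that all buffers together stay within the budget."""
    if buffers_in_path < 1:
        raise ValueError("at least one buffer is assumed in the signal path")
    return delay_budget_samples(sample_rate_hz, max_delay_s) // buffers_in_path


for target_ms in (10, 5, 1):
    print(target_ms, "ms ->", max_buffer_size(48_000, target_ms / 1000), "samples per buffer")
```

Under these assumptions, a 1 ms target at 48 kHz leaves only 24 samples per buffer.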
As used herein, amplifying includes boosting (increasing the loudness) and attenuating (reducing the loudness) of an audio signal. High-frequency sounds may comprise sounds with a frequency larger than the first transition frequency. Low-frequency sounds may comprise sounds with a frequency smaller than the second transition frequency. In some implementations, the first and second transition frequencies may be the same frequency. There are several possible definitions for the first and second transition frequencies. The skilled person understands that if a different definition of a transition frequency is used, properties (e.g. filter parameters) that are expressed in terms of a transition frequency may have to be adjusted accordingly. In general, the transition frequency depends on at least the distance between the microphones in a pair of microphones, and optionally on the number and orientation of microphone pairs. For instance, the first transition frequency may be defined as the frequency corresponding to a wavelength of twice the distance between the microphones in a pair of microphones.
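The frequency dependence that motivates these transition frequencies can be checked numerically. The sketch below assumes an idealised free field, identical point microphones at ±d/2 on the pair axis, a unit plane wave arriving along that axis, and a speed of sound of 343 m/s (an assumed value for air at room temperature). Under those assumptions, the magnitude of the combined pair signal is |2 sin(π f d / v)|: it rises by about +6 dB per octave at low frequencies, peaks at f = v/(2d), i.e. at the frequency whose wavelength is 2d, and above that oscillates with a mean power of 2 (about +3 dB relative to a single microphone). This is consistent with the boost and attenuation targets described above. The function names and the 10 cm spacing are hypothetical.

```python
import math

SPEED_OF_SOUND_AIR = 343.0  # m/s; assumed value for air at roughly 20 °C


def first_transition_frequency(d: float, v: float = SPEED_OF_SOUND_AIR) -> float:
    """Frequency whose wavelength equals twice the microphone spacing d: v / (2 d)."""
    return v / (2.0 * d)


def pair_response(f: float, d: float, v: float = SPEED_OF_SOUND_AIR) -> float:
    """|second - first| for a unit plane wave travelling along the pair axis."""
    # Microphones at -d/2 and +d/2 see phases +k d/2 and -k d/2, with k = 2 pi f / v,
    # so their difference has magnitude |2 sin(k d / 2)| = |2 sin(pi f d / v)|.
    return abs(2.0 * math.sin(math.pi * f * d / v))


d = 0.10  # hypothetical 10 cm spacing
ft = first_transition_frequency(d)
print(f"f_t,1 = {ft:.0f} Hz")
for f in (ft / 8, ft / 4, ft / 2, ft):
    print(f"{f:7.1f} Hz: {20 * math.log10(pair_response(f, d)):6.2f} dB")
```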
In an embodiment, the environmental sound loudspeaker comprises one or more additional microphone pairs. Each additional microphone pair comprises a first additional microphone and a second additional microphone positioned the distance d apart, and the first and second additional microphones in each additional microphone pair are positioned diametrically opposite each other relative to the centre of the loudspeaker driver. Preferably, the microphone pair and the one or more additional microphone pairs are arranged symmetrically around the centre of the loudspeaker driver. In such an embodiment, the signal processor is further configured to, for each of the one or more additional microphone pairs, receive a first additional input signal from the first additional microphone and a second additional input signal from the second additional microphone. The determination of the output signal may further comprise, for each additional microphone pair, inverting the first additional input signal and combining the inverted first additional input signal with the second additional input signal into a combined additional signal. The determination of the output signal may further comprise applying a phase shift to the combined additional signal, the phase shift being based on an angle between an axis between the first and second microphones and an additional axis between the first and second additional microphones. The determination of the output signal may further comprise combining the phase-shifted additional signal with the combined signal. In such an embodiment, the second transition frequency may further be based on the number of microphone pairs.
The first microphone pair and optionally the one or more additional microphone pairs may together be referred to as the one or more microphone pairs.
By adding additional microphone pairs, the directional sensitivity of the environmental sound loudspeaker may be improved.
In an embodiment, the first microphone pair and the one or more additional microphone pairs are equally distributed on a circle, the centre of the circle coinciding with the centre of the loudspeaker driver, the phase shift Δφi for the i-th additional microphone pair being equal to Δφi=i×360°/N, with N the number of microphones. Preferably, the environmental sound loudspeaker comprises exactly one additional microphone pair placed orthogonally to the first microphone pair. In such an embodiment, the phase shift may be equal to 90°.
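The combination of several microphone pairs on a circle can be sketched as follows. This is a minimal illustration under stated assumptions: the list of pairs starts with the first microphone pair, each pair is given as (first, second) sample lists, and the phase shifter is passed in as a callable with a hypothetical signature (`PhaseShifter`), because its realisation, e.g. with a Hilbert transform or all-pass filters, is described separately. The function names are also hypothetical.

```python
from typing import Callable, Sequence

# Hypothetical signature: (signal, phase shift in degrees) -> phase-shifted signal.
PhaseShifter = Callable[[list[float], float], list[float]]


def pair_phase_shifts_deg(num_microphones: int) -> list[float]:
    """Δφi = i × 360° / N for the additional pairs i = 1 .. N/2 - 1 on a circle."""
    if num_microphones < 2 or num_microphones % 2:
        raise ValueError("N must be a positive, even number of microphones")
    num_pairs = num_microphones // 2
    return [i * 360.0 / num_microphones for i in range(1, num_pairs)]


def combine_pairs(
    pairs: Sequence[tuple[list[float], list[float]]], shift: PhaseShifter
) -> list[float]:
    """Invert-and-add each pair, phase-shift the additional pairs, then sum everything."""
    shifts = [0.0] + pair_phase_shifts_deg(2 * len(pairs))  # first pair is not shifted
    per_pair = []
    for (first, second), dphi in zip(pairs, shifts):
        combined = [b - a for a, b in zip(first, second)]  # inverted first + second
        if dphi:
            combined = shift(combined, dphi)
        per_pair.append(combined)
    return [sum(samples) for samples in zip(*per_pair)]


print(pair_phase_shifts_deg(4))  # two orthogonal pairs
print(pair_phase_shifts_deg(6))  # three pairs, 60° apart
```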
In an embodiment, the first microphone pair and the one or more additional microphone pairs are equally distributed on a sphere, the centre of the sphere coinciding with the centre of the loudspeaker driver. Preferably, the environmental sound loudspeaker comprises exactly two additional microphone pairs, the first microphone pair and the two additional microphone pairs being placed on the axes of a Cartesian coordinate system with an origin in the centre of the loudspeaker driver and the phase shift being equal to 90°.
In an embodiment, the environmental sound loudspeaker further comprises an acoustic module for sound manipulation, e.g., adding reverberation and/or virtual acoustics to a signal provided to the acoustic module, preferably to the output signal. Thus, the determination of the output signal may comprise the acoustic module modifying the output signal. This way, virtual or augmented reality effects may be achieved based on and/or comprising the environmental sounds.
In an embodiment, the environmental sound loudspeaker further comprises an external signal input for receiving an external input signal, the external input signal encoding a sound, and wherein the determination of the output signal further comprises combining the external input signal with the signal generated by the signal processor. This way, the environmental sounds may be mixed with other sounds, e.g. sounds from virtual sound sources or musical sounds.
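A possible signal flow for the acoustic module and the external signal input is sketched below. It assumes a single feedback comb filter as a toy stand-in for the reverberation or virtual-acoustics processing (a real acoustic module would be far more elaborate) and a simple gain-weighted sum for the mixing; all names and parameter values are hypothetical.

```python
def feedback_comb(signal: list[float], delay: int, feedback: float) -> list[float]:
    """Toy reverberation stand-in: y[n] = x[n] + feedback * y[n - delay]."""
    if delay < 1:
        raise ValueError("delay must be at least one sample")
    if not 0.0 <= feedback < 1.0:
        raise ValueError("feedback must be in [0, 1) for stability")
    out: list[float] = []
    for n, x in enumerate(signal):
        out.append(x + (feedback * out[n - delay] if n >= delay else 0.0))
    return out


def mix_output(processed: list[float], external: list[float], external_gain: float = 1.0) -> list[float]:
    """Combine the (acoustically modified) environmental signal with an external input signal."""
    if len(processed) != len(external):
        raise ValueError("signals must have the same length")
    return [p + external_gain * e for p, e in zip(processed, external)]


impulse = [1.0] + [0.0] * 9
print(mix_output(feedback_comb(impulse, delay=3, feedback=0.5), [0.0] * 10))
```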
In an embodiment, the amplifying comprises attenuating signals with a frequency higher than a first transition frequency with −3 dB for the first microphone pair and, optionally, for each doubling of the number of microphone pairs. The number of microphone pairs may be equal to the number of additional microphone pairs plus one (for the first microphone pair). Additionally or alternatively, the amplifying comprises boosting signals with a frequency lower than a second transition frequency with +6 dB per octave.
The first transition frequency ft,1 may be defined by
The second transition frequency may be approximately equal to
Herein, v denotes the speed of sound and N denotes the number of microphones. Alternatively, the second transition frequency may be the same frequency as the first transition frequency, and the amplifying may further comprise attenuating signals with a frequency lower than the first transition frequency with −3 dB for the first microphone pair and, optionally, for each doubling of the number of microphone pairs.
In an embodiment, the amplifying comprises applying a series of low-shelf filters and/or applying a high-shelf filter. In an embodiment, the series of low-shelf filters is defined by a first transfer function
wherein G0 denotes a gain factor preferably equal to G0=1, B denotes a variable bandwidth preferably defined by
Q denotes a Q-factor determining the slope of the gain curve, the Q-factor preferably being equal to Q=5, and wherein fn denotes a central frequency of the nth low-shelf filter (n=0, 1, . . . ), preferably fn being determined by
wherein v denotes the speed of sound, and N denotes the number of microphones. In an embodiment, the high-shelf filter is defined by a second transfer function
wherein G∞ denotes a gain factor preferably equal to
wherein N denotes the number of microphones, B denotes a variable bandwidth preferably defined by
Q denotes a Q-factor determining the slope of the gain curve, the Q-factor preferably being equal to Q=5, and wherein fh denotes a central frequency of the high-shelf filter, preferably fh being determined by
wherein v denotes the speed of sound. The number of microphones is twice the number of microphone pairs.
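The sketch below does not reproduce the transfer functions defined above. Instead, it swaps in the widely used low-shelf and high-shelf biquads from R. Bristow-Johnson's Audio EQ Cookbook to illustrate how a high shelf plus a series of low shelves could approximate the amplification targets. It is not the disclosed filter design: the cookbook's Q parameterisation differs from the variable bandwidth B used above (so Q=5 is not used here), and the placement of the low shelves at successive octaves below the transition frequency, the sample rate, and all names are hypothetical.

```python
import cmath
import math

# A biquad as ((b0, b1, b2), (a0, a1, a2)), normalised so that a0 == 1.
Biquad = tuple[tuple[float, ...], tuple[float, ...]]


def _shelf_terms(f0: float, fs: float, gain_db: float, q: float) -> tuple[float, float, float]:
    big_a = 10 ** (gain_db / 40)  # square root of the linear shelf gain
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    return big_a, math.cos(w0), 2 * math.sqrt(big_a) * alpha


def _normalise(b: tuple[float, ...], a: tuple[float, ...]) -> Biquad:
    return tuple(x / a[0] for x in b), tuple(x / a[0] for x in a)


def high_shelf(f0: float, fs: float, gain_db: float, q: float = 1 / math.sqrt(2)) -> Biquad:
    """Cookbook high shelf: 0 dB at DC, gain_db towards Nyquist."""
    A, cw, s = _shelf_terms(f0, fs, gain_db, q)
    b = (A * ((A + 1) + (A - 1) * cw + s),
         -2 * A * ((A - 1) + (A + 1) * cw),
         A * ((A + 1) + (A - 1) * cw - s))
    a = ((A + 1) - (A - 1) * cw + s,
         2 * ((A - 1) - (A + 1) * cw),
         (A + 1) - (A - 1) * cw - s)
    return _normalise(b, a)


def low_shelf(f0: float, fs: float, gain_db: float, q: float = 1 / math.sqrt(2)) -> Biquad:
    """Cookbook low shelf: gain_db at DC, 0 dB towards Nyquist."""
    A, cw, s = _shelf_terms(f0, fs, gain_db, q)
    b = (A * ((A + 1) - (A - 1) * cw + s),
         2 * A * ((A - 1) - (A + 1) * cw),
         A * ((A + 1) - (A - 1) * cw - s))
    a = ((A + 1) + (A - 1) * cw + s,
         -2 * ((A - 1) + (A + 1) * cw),
         (A + 1) + (A - 1) * cw - s)
    return _normalise(b, a)


def cascade_gain_db(filters: list[Biquad], f: float, fs: float) -> float:
    """Magnitude (dB) of a series of biquads at frequency f; gains in dB simply add."""
    z_inv = cmath.exp(-2j * math.pi * f / fs)
    total = 0.0
    for b, a in filters:
        h = (b[0] + b[1] * z_inv + b[2] * z_inv ** 2) / (a[0] + a[1] * z_inv + a[2] * z_inv ** 2)
        total += 20 * math.log10(abs(h))
    return total


fs, f_t, n_mics = 48_000.0, 1715.0, 4  # hypothetical values (d = 10 cm, two pairs)
chain = [high_shelf(f_t, fs, -3 * math.log2(n_mics))]
# Hypothetical placement: four +6 dB low shelves, one per octave below f_t.
chain += [low_shelf(f_t / 2 ** (k + 1), fs, 6.0) for k in range(4)]
for f in (50, 200, 800, 5000, 20000):
    print(f"{f:6d} Hz: {cascade_gain_db(chain, f, fs):+6.2f} dB")
```

Because the low shelves each level off, the cascade only approximates a +6 dB per octave slope over the frequency span they cover, and the boost is bounded below the lowest shelf.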
Applying a phase shift Δφ to a signal may comprise creating a first copy and a second copy of the signal; applying a Hilbert transform to the first copy to apply a 90° phase shift; amplifying the first copy with a first factor a, and the second copy with a second factor b; and combining the first and second copies and amplifying the combined copies with a third factor c. Herein the factors a, b, and c are selected such that Δφ=arctan(a/b) and c=1/√(a²+b²).
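The Hilbert-transform method can be sketched with NumPy. This is a minimal illustration, assuming an FFT-based Hilbert transform computed over a whole block (a real-time system would more likely use an FIR approximation) and b > 0, so that arctan(a/b) gives the intended angle; the function names are hypothetical. With this convention, H{cos} = sin, so a·sin + b·cos = √(a²+b²)·cos(ωt − Δφ), and the output lags the input by Δφ.

```python
import numpy as np


def hilbert_transform(x: np.ndarray) -> np.ndarray:
    """FFT-based Hilbert transform: the imaginary part of the analytic signal."""
    n = x.shape[-1]
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0  # keep DC
    if n % 2 == 0:
        h[n // 2] = 1.0  # keep the Nyquist bin
        h[1 : n // 2] = 2.0  # double positive frequencies
    else:
        h[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * h).imag


def phase_shift_hilbert(x: np.ndarray, a: float, b: float) -> np.ndarray:
    """Shift x by Δφ = arctan(a/b): c · (a·H{x} + b·x), with c = 1/sqrt(a² + b²)."""
    first_copy = a * hilbert_transform(x)  # 90°-shifted copy, scaled by a
    second_copy = b * x  # unshifted copy, scaled by b
    c = 1.0 / np.sqrt(a ** 2 + b ** 2)
    return c * (first_copy + second_copy)


fs, n, f = 48_000, 4_800, 1_000.0  # 100 full periods, so the FFT has no leakage
t = np.arange(n) / fs
dphi = np.deg2rad(60.0)
y = phase_shift_hilbert(np.cos(2 * np.pi * f * t), np.sin(dphi), np.cos(dphi))
print(np.max(np.abs(y - np.cos(2 * np.pi * f * t - dphi))))
```

Any a and b with the same ratio give the same result, because the factor c removes their common scale.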
Alternatively, applying a phase shift Δφ to a signal may comprise creating a first copy and a second copy of the signal; applying a first frequency-dependent phase shift θA(f) to the first copy using one or more first all-pass filters with associated first corner frequencies f0,A(i) and first quality factors QA(i), preferably the first frequency-dependent phase shift θA(f) being given by
applying a second frequency-dependent phase shift θB(f) to the second copy using one or more second all-pass filters with associated second corner frequencies f0,B(i) and second quality factors QB(i), preferably the second frequency-dependent phase shift θB (f) being given by
and taking a difference between the first and second phase-shifted copies. The first and second corner frequencies and/or the first and second quality factors may be optimised such that Δφ≈θA(f)−θB(f) for all f in the audible frequency range. The number of terms n is equal to or larger than 1, for example 1, 2, 3, or 5.
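How well two chains of all-pass filters realise a target phase difference can be evaluated numerically, for instance as the cost function of the optimisation mentioned above. The sketch assumes standard analogue second-order all-pass sections, H(s) = (s² − (ω0/Q)s + ω0²)/(s² + (ω0/Q)s + ω0²), whose phase is −2·atan2(ωω0/Q, ω0² − ω²). The exact expressions for θA(f) and θB(f) used in this disclosure are not reproduced here, and all names and the example corner frequencies are hypothetical, un-optimised values.

```python
import math
from typing import Sequence


def allpass2_phase(f: float, f0: float, q: float) -> float:
    """Phase (radians) of an analogue second-order all-pass section at frequency f."""
    w, w0 = 2 * math.pi * f, 2 * math.pi * f0
    # The numerator is the complex conjugate of the denominator: phase = -2 arg(denominator).
    return -2.0 * math.atan2(w * w0 / q, w0 * w0 - w * w)


def chain_phase(f: float, corners: Sequence[float], qs: Sequence[float]) -> float:
    """Total phase of a series of all-pass sections (the phases add)."""
    return sum(allpass2_phase(f, f0, q) for f0, q in zip(corners, qs))


def max_phase_error_deg(target_deg, corners_a, qs_a, corners_b, qs_b, freqs) -> float:
    """Worst-case |θA(f) - θB(f) - Δφ| over the given frequencies, in degrees."""
    worst = 0.0
    for f in freqs:
        diff = math.degrees(chain_phase(f, corners_a, qs_a) - chain_phase(f, corners_b, qs_b))
        err = (diff - target_deg + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
        worst = max(worst, abs(err))
    return worst


# Log-spaced evaluation grid over 20 Hz to 20 kHz (three decades, 60 points).
grid = [20.0 * 10 ** (3 * k / 59) for k in range(60)]
print(max_phase_error_deg(90.0, [150.0, 2500.0], [0.7, 0.7], [600.0, 9000.0], [0.7, 0.7], grid))
```

Feeding `max_phase_error_deg` to a numerical optimiser over the corner frequencies and quality factors is one way to carry out the optimisation described above.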
In an embodiment, the environmental sound loudspeaker may be adapted for use under water. In an embodiment, the environmental sound loudspeaker may be adapted for aerial use. In an embodiment, the environmental sound loudspeaker may be adapted to generate sounds suitable to be heard by predefined non-human subjects.
In an aspect, this disclosure may relate to a method for recording, processing and immediately replaying sounds. The method may comprise receiving a first input signal from a first microphone and a second input signal from a second microphone, each input signal representing a recorded sound. The first microphone and the second microphone may form a first microphone pair, the first microphone and the second microphone being positioned a distance d apart, the first microphone and the second microphone being positioned diametrically opposite each other and equidistant relative to a centre of a loudspeaker driver. The method may further comprise determining an output signal based on the first and second input signals; optionally, manipulating the output signal, the manipulation preferably comprising adding reverberation and/or virtual acoustics to the output signal; and providing the, optionally manipulated, output signal to the loudspeaker driver. The determination of the output signal may comprise inverting the first input signal and combining the inverted first input signal with the second input signal into a combined signal. The determination of the output signal may further comprise amplifying the combined signal and/or the first and second input signals such that the amplified combined signal has a frequency spectrum that is substantially the same as a frequency spectrum of environmental sounds captured by the first and/or second microphones for frequencies in an audible frequency range, preferably the amplifying comprising attenuating signals with a frequency higher than a first transition frequency and/or boosting signals with a frequency lower than a second transition frequency, the first and second transition frequencies being based on the distance d between the first and second microphones.
In an embodiment, the method may further comprise receiving a first additional input signal from a first additional microphone and a second additional input signal from a second additional microphone from each of one or more additional microphone pairs. Each additional microphone pair comprises a first additional microphone and a second additional microphone positioned the distance d apart, the first and second additional microphones in each additional microphone pair being positioned diametrically opposite each other relative to the centre of the loudspeaker driver, the first microphone pair and the one or more additional microphone pairs being arranged symmetrically around the centre of the loudspeaker driver. In such an embodiment, the determination of the output signal may further comprise, for each additional microphone pair, inverting the first additional input signal and combining the inverted first additional input signal with the second additional input signal into a combined additional signal; applying a phase shift to the combined additional signal, the phase shift being based on an angle between an axis between the first and second microphones and an additional axis between the first and second additional microphones; and combining the phase-shifted additional signal with the combined signal. In an embodiment, the second transition frequency is further based on the number of microphone pairs.
One aspect of this disclosure relates to a computer comprising a computer readable storage medium having computer readable program code embodied therewith, and a processor, preferably a microprocessor, coupled to the computer readable storage medium, wherein responsive to executing the computer readable program code, the processor is configured to perform any of the methods described herein.
One aspect of this disclosure relates to a computer program or suite of computer programs comprising at least one software code portion or a computer program product storing at least one software code portion, the software code portion, when run on a computer system, being configured for executing any of the methods described herein.
One aspect of this disclosure relates to a non-transitory computer-readable storage medium storing at least one software code portion, the software code portion, when executed or processed by a computer, is configured to perform any of the methods described herein.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, a method or a computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Functions described in this disclosure may be implemented as an algorithm executed by a processor/microprocessor of a computer. Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied, e.g., stored, thereon.
Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a computer readable storage medium may include, but are not limited to, the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fibre, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of the present invention, a computer readable storage medium may be any tangible medium that can contain, or store, a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fibre, cable, RF, etc., or any suitable combination of the foregoing. Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java™, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the present invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor, in particular a microprocessor or a central processing unit (CPU), of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer, other programmable data processing apparatus, or other devices create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Moreover, a computer program for carrying out the methods described herein, as well as a non-transitory computer readable storage medium storing the computer program, are provided. A computer program may, for example, be downloaded (updated) to the existing data processing systems or be stored upon manufacturing of these systems.
Elements and aspects discussed for or in relation with a particular embodiment may be suitably combined with elements and aspects of other embodiments, unless explicitly stated otherwise. Embodiments of the present invention will be further illustrated with reference to the attached drawings, which schematically show embodiments according to the invention. It will be understood that the present invention is not in any way restricted to these specific embodiments.
Aspects of the invention will be explained in greater detail by reference to exemplary embodiments shown in the drawings, in which:
In the figures, identical reference numbers indicate similar or identical elements. Further, elements that are depicted with dashed lines are optional elements. It should be understood that, in general, steps may be performed in an alternative order, using a signal process that may differ from what is illustrated, and that not all steps are required in every embodiment. In other words, one or more steps may be omitted or replaced, performed in different orders, in parallel with one another and/or additional steps may be added, without departing from the scope of the invention.
This disclosure relates to methods and systems for recording environmental sounds and immediately playing back those sounds, possibly in a modified form and/or combined with other sounds. The methods and systems described herein enable acoustically highly realistic augmented reality (AR), mixed reality (XR) and virtual reality (VR) experiences, in particular for a plurality of simultaneous users.
In some embodiments, the environmental sound loudspeaker may comprise a plurality of loudspeaker drivers, preferably identical loudspeaker drivers, and the microphones may be arranged symmetrically around the centre of mass of the loudspeaker drivers.
In some embodiments, the microphones and the loudspeaker driver(s) may be integrated in a single device, whereas in other embodiments, the microphones and the loudspeaker driver(s) may be implemented as a system of separate devices.
The environmental sound loudspeaker further comprises a signal processor 110. The signal processor is arranged to combine input signals from the microphones, typically into a single output signal 129 that can be provided to the loudspeaker driver. The signal processor is further arranged to provide feedback reduction. The signal processor is furthermore arranged to maintain or restore the fidelity of the combined input signals.
The signal processor 110 can be integrated into the loudspeaker device, or it can be (part of) a separate device. As depicted, the signal processor comprises a plurality of microphone signal inputs 1121-4 for receiving input signals from microphones 1041-4. In the depicted embodiment, there is one signal input for each individual microphone. In some embodiments, several inputs can be physically or logically combined. The connection between a microphone and an input can be wired or wireless. During transmission, the signals from the microphones may be combined into a single data stream, to be separated into the constituent microphone signals after reception. However, in a preferred embodiment, each microphone is connected via a dedicated wire to the corresponding input. It should be noted that pairs of microphones are connected to pairs of inputs; in this example, inputs 1121 and 1122 form a first pair, receiving input from microphone pair 1041 and 1043, and inputs 1123 and 1124 form a second pair, receiving input from microphone pair 1042 and 1044.
One input 1122,4 from each pair of inputs is connected to an inverter 1141,2 which inverts the input signal 1132,4 received from the corresponding microphone. Inverting a signal corresponds to applying a phase shift of 180° to the signal or, equivalently, to multiplying the signal by a factor −1. After inversion of one of the signals from an input pair, the two signals from a pair of microphones are combined, typically added or summed, by a signal combiner 1161,2, resulting in combined signals 1171,2. Combining a first signal and an inverted second signal can be understood as subtracting the second signal from the first signal, or as determining a difference between the first signal and the second signal. Because the microphones in a microphone pair are positioned equidistant from the loudspeaker driver, the contributions of the loudspeaker driver at the respective microphones are in phase with each other and therefore cancel out when the difference is taken. Sounds from other sources will typically arrive at the microphones of a microphone pair with a phase difference, and will thus not cancel out when the signals are subtracted from each other. This is explained in more detail with reference to
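The cancellation principle can be illustrated numerically. The sketch below is a minimal model, assuming ideal point microphones, integer-sample propagation delays and a 48 kHz sample rate (all assumptions for illustration); the helper names `delay` and `combine_pair` are hypothetical. The loudspeaker contribution, which reaches both microphones of a pair with the same delay, cancels, while an environmental source that reaches the two microphones at different times survives as a difference signal.

```python
import numpy as np

FS = 48_000  # sample rate in Hz (assumed for illustration)


def delay(signal, samples):
    """Delay a signal by a whole number of samples, padding the start with zeros."""
    out = np.zeros_like(signal)
    out[samples:] = signal[: len(signal) - samples]
    return out


def combine_pair(mic_a, mic_b):
    """Invert the second microphone signal (multiply by -1) and add it to the first."""
    return mic_a + (-1.0) * mic_b


rng = np.random.default_rng(0)
speaker = rng.standard_normal(FS)  # sound radiated by the loudspeaker driver
source = rng.standard_normal(FS)   # an environmental source, e.g. a voice

# Both microphones are equidistant from the driver: identical 5-sample delay.
# The environmental source is off to one side and reaches microphone B 10 samples later.
mic_a = delay(speaker, 5) + source
mic_b = delay(speaker, 5) + delay(source, 10)

combined = combine_pair(mic_a, mic_b)
# The loudspeaker term cancels; only the difference of the source signals remains.
print(np.allclose(combined, source - delay(source, 10)))  # True
```

In a physical device the depth of cancellation is limited by how well the two microphones are matched in sensitivity and distance to the driver.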
The combined signal 1172 from one or more microphone pairs may be phase-shifted 118 with a phase shift Δφ. In the depicted example with two pairs of microphones, one pair may be phase-shifted with a 90° phase shift, for example by applying a Hilbert transform. Phase shifting will be discussed in more detail below with reference to
The combined, and optionally phase-shifted, signals from the microphone pairs are then again combined by a further signal combiner 120 resulting in a further combined signal 121.
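The 90° phase shift by a Hilbert transform and the subsequent summation of two pair signals can be sketched as follows. This is a block-based, offline simplification that assumes the whole buffer is available at once; a real-time device would more likely approximate the Hilbert transform with an FIR filter or an all-pass network. The names `hilbert_shift` and `combine_two_pairs` are hypothetical.

```python
import numpy as np


def hilbert_shift(x):
    """Hilbert transform of a real signal via the FFT: every positive-frequency
    component is multiplied by -j, i.e. phase-shifted by -90 degrees."""
    n = len(x)
    multiplier = -1j * np.sign(np.fft.fftfreq(n))
    if n % 2 == 0:
        multiplier[n // 2] = 0.0  # the Nyquist bin has no quadrature partner
    return np.real(np.fft.ifft(np.fft.fft(x) * multiplier))


def combine_two_pairs(pair_1, pair_2):
    """Sum the first pair's combined signal with the 90-degree-shifted second one."""
    return pair_1 + hilbert_shift(pair_2)


n = 1024
t = np.arange(n)
cosine = np.cos(2 * np.pi * 8 * t / n)
# A cosine becomes a sine: a quarter-period (90 degree) shift.
print(np.allclose(hilbert_shift(cosine), np.sin(2 * np.pi * 8 * t / n)))  # True
```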
The further combined signal 121 is provided to a signal amplifier 122 for restoring, or at least approximating, the frequency spectrum (or loudness balance) of the input signal representing environmental sounds, i.e. sounds from other sources than directly from the loudspeaker driver 102. The signal amplifier typically comprises components for boosting the low frequencies and for attenuating the high frequencies. For example, the amplifier may comprise one or more high-shelf filters for attenuating the high-frequency signals and/or one or more low-shelf filters for boosting the low-frequency signals.
Preferably, the amplifier attenuates frequencies above a first transition frequency by −3 dB per doubling of the number of microphones, for example, −3 dB for one microphone pair, −6 dB for two microphone pairs, −7.8 dB for three microphone pairs, −9 dB for four microphone pairs, −10 dB for five microphone pairs, −10.8 dB for six microphone pairs, −12 dB for eight microphone pairs, or −13 dB for ten microphone pairs. In general, the attenuation a(f) may be given or approximated by

$$a(f) = \begin{cases} 0\ \mathrm{dB}, & f < f_{t,1} \\ -3\log_2 N\ \mathrm{dB}, & f \geq f_{t,1} \end{cases}$$
where N is the number of microphones. The amplifier may comprise a high-shelf filter to implement the attenuation, as is described in more detail below with reference to
Preferably, the amplifier boosts frequencies below a second transition frequency by +6 dB per octave. In general, the boost b(f) may be given or approximated by

$$b(f) = \begin{cases} 6\log_2\!\left(f_{t,2}/f\right)\ \mathrm{dB}, & f < f_{t,2} \\ 0\ \mathrm{dB}, & f \geq f_{t,2} \end{cases}$$

where ft,2 is the second transition frequency. The amplifier may comprise a plurality of low-shelf filters in series, as is described in more detail below with reference to
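The two target curves can be evaluated directly. The sketch below assumes the idealised forms a(f) = −3 log2 N dB above ft,1 and b(f) = 6 log2(ft,2/f) dB below ft,2; with these forms the attenuation reproduces the example values listed above. The function names are illustrative only.

```python
import math


def attenuation_db(n_mics):
    """High-frequency attenuation above ft,1: -3 dB per doubling of N."""
    return -3.0 * math.log2(n_mics)


def boost_db(f, f_t2):
    """Low-frequency boost: +6 dB per octave below ft,2, 0 dB above it."""
    return 6.0 * math.log2(f_t2 / f) if f < f_t2 else 0.0


# Rounded to 0.1 dB, these match the values listed for 1 to 10 microphone pairs.
for n_mics in (2, 4, 6, 8, 10, 12, 16, 20):
    print(n_mics, round(attenuation_db(n_mics), 1))

print(boost_db(25.0, 100.0))  # two octaves below ft,2 -> 12.0 dB
```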
The first and second transition frequencies can be the same frequency. The first and second transition frequencies typically depend on the distance between the microphones in a microphone pair, on the number of microphone pairs, and on the parameters of the filters. This will be discussed in more detail below with reference to
The output of the amplifier 122 may be further enhanced by an acoustic module 124. The acoustic module may add sound effects to the input, such as virtual acoustics, reverberation effects, and so on. The acoustic module may also mix the input signal with a signal from an external source, which may be provided by a further sound input 126. The further sound input may be a wired or wireless connection. The signal processor 110 may comprise a further amplifier to amplify the input signal received by the further sound input.
The acoustic module 124 may, for example, process the captured environmental sounds in one of the following ways. The audio signal obtained by the microphones 1041-4 of the environmental sound loudspeaker 100, optionally combined and amplified, in principle contains all the sounds present in the physical environment surrounding the subject(s). This is particularly relevant when considering methods for adding acoustical properties of a virtual environment to the audio signal. In a typical use scenario, the sound sources captured by the environmental sound loudspeaker are, as such, already perceived by one or more (possibly a great many) users or subjects. These sound sources may be perceived from the discrete location of each subject within the environment, and may be perceived as having a shape, dimensionality and other characteristics by which the subject may identify the type of sound source, its location and its directionality in space, among other things.
Thus, the acoustic module 124 does not have to add various effects to the environmental audio signal that are regularly used by other devices to simulate the spatial location and movement of sound sources, such as distance attenuation, distance damping due to air absorption and Doppler shifts, et cetera. Furthermore, the acoustic module does not need to take the individual positions and movements of subjects into account in the virtual acoustic simulation, as each subject will already relate to all other sound sources in the physical environment. This may include the added sounds of a virtual environment and/or additional sound sources inhabiting a virtual environment, played back by one or more loudspeakers at specific locations in the physical environment.
In general, with regard to adding reflections of a sound source in a virtual environment, these reflections may be categorized into the perceptual phases of first reflections (FR) of the incident sound waves from the location of virtual walls and/or objects, early reflections (ER) of the sound waves occurring between the virtual walls and/or objects, and late reflections (LR), which are composed of the sum of many consecutive orders of reflections between the virtual walls and/or objects. Together, these form a distinct frequency and amplitude envelope of a decaying signal, i.e., the reverberation of the sound in a given virtual environment.
A virtual environment may have a distinct shape, size and material characteristics and may be composed of several walls and/or objects. Reverberation is thus the product of sound waves reflecting off the surfaces of such a distinct environment. For example, the interior of a cathedral has a distinctly different shape and dimensions than a living room, which in turn produces distinctly different reflections and a different character of reverberation. Surface properties may also be taken into account for accurate simulation of the reflections in a virtual environment. Reflection and reverberation properties are also shaped by the roughness of the surfaces, which influences the diffraction, i.e. the diffusion or scattering, of frequency components in the reflected sound waves. Another aspect determining the type of reflections and reverberation is the hardness of the surfaces, e.g. a surface covered with soft fur reflects less sound than a marble stone surface. The amount of sound reflected by a surface is inversely related to the amount of sound it absorbs, i.e., the amount of energy dissipated in the surfaces of walls and/or objects in the virtual environment. Absorption reduces the amplitude of particular frequency regions of the incident sound, and thus reduces those frequency components in the resulting envelope of the reverberation signal.
In some embodiments, the responses of virtual surfaces and/or objects to incident sound waves and its subsequent reflections, absorption, diffraction and reverberation within a virtual space of distinct shape and size may be represented in the acoustic module 124 as a distinct setting of an artificial reverberation system, for example as described in Dutch patent application NL2026361 by the same applicant, which is hereby incorporated by reference. In other embodiments, such characteristics may be represented in the acoustic module by convolution of the input audio signals with a set of impulse responses (IR) that were previously recorded in real environments of the intended size, shape and material characteristics.
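The convolution approach can be sketched as follows. A measured impulse response is not available here, so `synthetic_ir` is a hypothetical stand-in (a unit direct path followed by exponentially decaying noise that falls by 60 dB in rt60 seconds); the dry/wet mixing gains are likewise assumptions. FFT-based convolution is used because direct convolution with a multi-second IR is slow; a real-time module would typically use partitioned convolution instead.

```python
import numpy as np


def fft_convolve(x, h):
    """Linear convolution of x and h via zero-padded real FFTs."""
    n = len(x) + len(h) - 1
    size = 1 << (n - 1).bit_length()  # next power of two >= n
    return np.fft.irfft(np.fft.rfft(x, size) * np.fft.rfft(h, size), size)[:n]


def synthetic_ir(fs, rt60, length_s, seed=0):
    """Hypothetical stand-in for a measured IR: a unit direct path followed by
    noise whose amplitude falls by 60 dB after rt60 seconds."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(fs * length_s)) / fs
    ir = rng.standard_normal(len(t)) * 10.0 ** (-3.0 * t / rt60)
    ir[0] = 1.0
    return ir


def apply_virtual_room(dry, ir, dry_gain=1.0, wet_gain=0.3):
    """Convolve the captured environmental signal with the IR and mix dry and wet."""
    wet = fft_convolve(dry, ir)[: len(dry)]
    return dry_gain * dry + wet_gain * wet


fs = 48_000
ir = synthetic_ir(fs, rt60=2.0, length_s=2.5)  # a long, cathedral-like decay
captured = np.random.default_rng(1).standard_normal(fs)  # one second of "environment"
output = apply_virtual_room(captured, ir)
```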
Optionally, further audio input signals may be added as input to the acoustic module 124, for example using further sound input 126, and all audio input signals may be combined to obtain a modified audio signal. The further audio input signals may be other environmental sound signals, e.g., the modified and combined audio signals obtained from a set of omnidirectional microphones integrated in a further environmental sound loudspeaker; and/or other pre-recorded, generated and/or live-obtained sound signals associated with a virtual sound source which are added to enrich the environment.
With regard to adding virtual sound sources to enrich a physical environment, such sound sources may be added in a direct manner, e.g., the audio input signal(s) can be distributed directly to the loudspeaker(s) for playback. Additionally or alternatively, the virtual sound sources may be assigned a location in a virtual space, e.g. defined by Cartesian coordinates (x, y, z). In such an embodiment, the loudspeaker(s) may be arranged in a loudspeaker configuration and each loudspeaker may similarly be assigned a location in the virtual space, preferably using the same coordinate system. The audio signal components obtained from the audio input signals for each discrete loudspeaker can then be attenuated based on the distance and angle of the virtual sound source relative to the location of each loudspeaker. The resulting gain of each audio signal component for each discrete loudspeaker may be zero for some loudspeakers and larger than zero for others. In another embodiment, virtual sound sources may be combined with one or more environmental sound signals, and the same acoustical properties of a virtual environment may be added to them, which may comprise methods to generate reflections, diffractions, absorption and/or reverberation of the audio input signal(s) in a virtual environment.
In an embodiment, an environmental sound signal may be used as an audio trigger and/or marker to activate playback of (a portion of) a pre-recorded and/or generated audio signal, which is optionally marked with a time code so that the particular portion can be played back based on the received audio trigger and/or marker. In another embodiment, data obtained in real time from an environmental sound signal, e.g., its measured intensity or its frequency characteristics, which may be obtained from, e.g., a real-time Fast Fourier Transform (FFT) analysis of the environmental sound signal, may be compared with, and associated with, a pre-recorded audio signal, i.e. an audio watermark. Based on predetermined and/or real-time generated matching features, such real-time information about the environmental sound signal may be used as an audio trigger and/or marker to activate playback of a pre-recorded, generated and/or live-obtained audio input signal associated with a virtual sound source. The same real-time data may also be used to alter and/or modulate, in real time, the properties of a pre-recorded, generated and/or live-obtained audio input signal; e.g., the higher the intensity of a particular environmental sound signal (optionally analysed, compared and matched to an audio watermark), the higher the intensity and/or activity of the pre-recorded, generated and/or live-obtained audio input signal, or vice versa. As such, many properties of an audio input signal associated with a virtual sound source may be altered and/or modulated in real time by data obtained from the environmental sound signal.
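A distance-based attenuation per loudspeaker might look like the following. The panning law (gain falling linearly to zero at a hypothetical `rolloff_m` radius, then normalised to unit total power) is an assumption chosen for simplicity, and the angular dependence mentioned in the text is omitted; a method such as vector-base amplitude panning could supply it.

```python
import numpy as np


def loudspeaker_gains(source_xyz, speakers_xyz, rolloff_m=5.0):
    """Hypothetical distance-based panning law: a loudspeaker's gain falls linearly
    from 1 at the virtual source position to 0 at rolloff_m metres, and the gains
    are normalised to unit total power. Speakers beyond rolloff_m get zero gain."""
    src = np.asarray(source_xyz, dtype=float)
    spk = np.asarray(speakers_xyz, dtype=float)
    distances = np.linalg.norm(spk - src, axis=1)
    gains = np.clip(1.0 - distances / rolloff_m, 0.0, None)
    total_power = np.sum(gains ** 2)
    return gains / np.sqrt(total_power) if total_power > 0 else gains


# Four loudspeakers at 2 m height in the same coordinate system as the virtual source.
speakers = [(0.0, 0.0, 2.0), (6.0, 0.0, 2.0), (0.0, 6.0, 2.0), (6.0, 6.0, 2.0)]
print(loudspeaker_gains((3.0, 1.0, 2.0), speakers))
```

Some loudspeakers end up with zero gain and others with a non-zero gain, as described above.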
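One way to derive a trigger and a modulation value from a frame of the environmental signal is sketched below. It uses the RMS level and the peak of a windowed FFT; comparing the peak against a fixed frequency band is a crude, hypothetical stand-in for matching an audio watermark, and the threshold, band and level-to-gain mapping are all illustrative assumptions.

```python
import numpy as np


def level_db(frame):
    """RMS level of one frame in dB relative to a full-scale amplitude of 1.0."""
    rms = np.sqrt(np.mean(frame ** 2))
    return 20.0 * np.log10(max(rms, 1e-12))


def peak_frequency(frame, fs):
    """Frequency of the strongest FFT bin of a Hann-windowed frame."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    return freqs[np.argmax(spectrum)]


def control_virtual_source(frame, fs, threshold_db=-30.0, band=(200.0, 2000.0)):
    """Return (trigger, gain): trigger when the frame is loud enough and its peak
    lies in a reference band; the gain of the virtual source rises with the
    environmental level (a linear mapping over 30 dB, chosen arbitrarily)."""
    level = level_db(frame)
    trigger = bool(level > threshold_db and band[0] <= peak_frequency(frame, fs) <= band[1])
    gain = float(np.clip((level - threshold_db) / 30.0, 0.0, 1.0))
    return trigger, gain


fs = 48_000
t = np.arange(4800) / fs  # 100 ms analysis frame
loud = 0.5 * np.sin(2 * np.pi * 1000 * t)
print(control_virtual_source(loud, fs))
```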
Other implementations and/or uses of the acoustic module 124 can also be envisioned without departing from the scope of the invention.
The components of the signal processor 110 can be distributed over one or more hardware components. For example, the amplifier 122 and the acoustic module 124 may be implemented as a (single) microprocessor coupled to a memory storing computer-readable code. In general, the components may be implemented as analogue components, digital components, or a mixture of analogue and digital components.
The signal processor 110 further comprises an output 128 for providing an output signal 129. The output is connected to the loudspeaker driver 102. The connection may be wired or wireless. Thus, the output signals from the microphones 1041-4, i.e., the captured environmental sounds, are provided as input signals to the loudspeaker driver, after being processed by the signal processor.
The signal processor 110 further comprises a power source 130 for providing power to components of the signal processor. The components may be implemented as, e.g., an electric circuit board comprising a microprocessor and signal amplifiers. The power source may comprise an integrated power source such as a battery and/or a connector to connect the device to an external power source.
Because microphones from a microphone pair are positioned at equal distance from the loudspeaker driver 102, and signals from the microphones are inverted before being combined, sounds originating from the loudspeaker driver may efficiently be removed from the output that is fed to the loudspeaker driver. Thus, unwanted feedback may be prevented or at least minimised. The amplification ensures that the environmental sounds, i.e., sounds recorded by the microphones that are not coming (directly) from the loudspeaker driver, may be played back with high fidelity. Sounds coming from the loudspeaker driver that have been reflected by an object in the environment are considered environmental sounds and will typically not be cancelled by the signal processing.
In an embodiment, the acoustic module 124 can be an external sound processor. The signal processor 110 may therefore comprise a WiFi module or an Ethernet connection to connect to an external sound processor or a digital audio network via a wired or wireless Ethernet connection, and/or a connector to plug in a line-out audio cable. The signal processor may also comprise a line-in for the loudspeaker input signal, to transfer the loudspeaker input signal from an external sound processor to the signal processor. This line-in may comprise a WiFi module to receive the signal wirelessly from a digital audio network via Ethernet and/or a connector to plug in a line-in audio cable.
The environmental sound loudspeaker may comprise a location sensor to determine location information specifying a location of the device. The environmental sound loudspeaker may be configured based on the location information and, optionally, location information of other devices such as other loudspeakers. The location sensor may comprise a WiFi signal-based sensor and/or an ultra-wideband (UWB) based sensor for spatial localization of the device. Alternatively or additionally, absolute position sensors such as GPS sensors may be used.
In general, the behaviour of long wavelengths is different from that of short wavelengths. If the distance between the microphones is increased, the transition frequencies shift downwards, so that the low-frequency region shrinks and the high-frequency region grows; the spatially averaged gain thus becomes larger, approaching +3 dB (for a system with one microphone pair) for very large distances d. The frequency-dependency will be discussed in more detail with reference to
In this context, long wavelengths, corresponding to low frequencies, may be defined as wavelengths longer than a wavelength corresponding to the second transition frequency, or approximately λ > 2.5√N·d, with N the number of microphones. Similarly, short wavelengths, corresponding to high frequencies, may be defined as wavelengths shorter than a wavelength corresponding to the first transition frequency, i.e., λ < 2d. Wavelengths corresponding to a frequency between the first and second transition frequencies may be referred to as intermediate wavelengths. However, as will be discussed in more detail below with reference to
In the depicted case with two microphones, sounds from sources on the +/−y-axis are cancelled, because the signals arrive at the microphones with no phase difference. As the gain pattern is rotationally symmetric around the x-axis, in three dimensions, sounds from all sources on (or near) the yz-plane are not, or not effectively, recorded by the environmental sound loudspeaker. It depends on the application whether or not such a directional gain pattern is considered sufficiently omnidirectional.
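The directional and frequency-dependent behaviour of a single pair can be reproduced with a far-field model, assuming ideal omnidirectional point microphones, a plane wave and v = 343 m/s (assumptions for illustration; the helper names are hypothetical). The model shows the null for sources perpendicular to the pair axis, the roughly +6 dB per octave slope at low frequencies, and the spatially averaged power gain approaching 2 (about +3 dB) at high frequencies, for which the analytic average is 2(1 − sin(kd)/(kd)).

```python
import numpy as np

V = 343.0  # speed of sound in m/s (assumed)


def pair_gain(f, d, theta):
    """|1 - exp(-j k d cos(theta))|: magnitude response of one inverted-and-summed
    microphone pair (spacing d) for a plane wave arriving at angle theta to its axis."""
    k = 2.0 * np.pi * f / V
    return np.abs(1.0 - np.exp(-1j * k * d * np.cos(theta)))


def averaged_power_gain(f, d, n_dirs=20_001):
    """Pair power gain averaged over all directions in 3D (cos(theta) uniform in [-1, 1])."""
    cos_theta = np.linspace(-1.0, 1.0, n_dirs)
    return np.mean(pair_gain(f, d, np.arccos(cos_theta)) ** 2)


def averaged_power_gain_closed_form(f, d):
    """Analytic value of the same average: 2 * (1 - sin(kd) / (kd))."""
    kd = 2.0 * np.pi * f * d / V
    return 2.0 * (1.0 - np.sin(kd) / kd)


print(pair_gain(1000.0, 0.1, np.pi / 2))                 # ~0: source on the null plane
print(10 * np.log10(averaged_power_gain(20_000.0, 1.0)))  # close to +3 dB
print(10 * np.log10(averaged_power_gain(40.0, 0.1) / averaged_power_gain(20.0, 0.1)))  # ~ +6 dB per octave
```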
In-situ experiments have shown that the distance between the microphones may be chosen arbitrarily small or arbitrarily large without negatively affecting the feedback sensitivity. This has been tested by varying the distance between two pairs of microphones from 8 mm to 8 m. A limiting condition on the lower side is that the minimum distance d between opposite microphones of a pair must be at least the diameter of the loudspeaker driver. However, there is no theoretical size limit to the diameter of the loudspeaker (coaxial) driver itself. In practice, an upper limit for the distance d may be set by the size of the room in which the environmental sound loudspeaker (system) is positioned, or by the position of (large) obstacles in the room.
Consequently, in order to restore the original loudness balance, i.e., a frequency response that is uniformly 0 dB (representing an output signal that is identical to the input signal), the low frequencies must be boosted by +6 dB per octave, whereas the high frequencies must be attenuated by −3 dB.
In general, there are several possible definitions for the transition frequency or transition frequencies demarcating the low and high frequencies. As depicted here, the second transition frequency 310 may be defined as

$$f_{t,2} = \frac{2}{5}\,\frac{v}{\sqrt{N}\,d}$$

where v is the speed of sound, N the number of microphones and d the distance between the microphones of a microphone pair. The value 2/5 is an approximate value determined based on simulations. This second transition frequency is the frequency where the graph representing the +6 dB boost per octave for the low frequencies intersects the frequency axis. The first transition frequency 308 may be defined as

$$f_{t,1} = \frac{v}{2d}$$
which is the lowest frequency at which the graph representing the spatially averaged frequency-dependent gain first intersects the +3 log2(N) dB line, with N again the number of microphones. An advantage of the first transition frequency is that it is independent of the number of microphones.
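Both transition frequencies follow from the wavelength bounds λ < 2d and λ > 2.5√N·d given earlier, via f = v/λ. The sketch below assumes v = 343 m/s (air at roughly 20 °C); the function name is hypothetical. It also shows how quickly ft,2 drops with the microphone distance: for N = 4 and d = 4 m it falls to about 17 Hz.

```python
import math


def transition_frequencies(d, n_mics, v=343.0):
    """First and second transition frequencies for microphone spacing d (m) and
    N microphones: ft,1 = v / (2 d) and ft,2 = (2/5) * v / (sqrt(N) * d)."""
    f_t1 = v / (2.0 * d)
    f_t2 = 0.4 * v / (math.sqrt(n_mics) * d)
    return f_t1, f_t2


print(transition_frequencies(0.1, 4))  # approx. (1715.0, 686.0) Hz
print(transition_frequencies(4.0, 4))  # ft,2 approx. 17 Hz for N = 4, d = 4 m
```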
It is noted that the loudness balance may also be restored by boosting the frequencies below the first transition frequency ft,1 by +6 dB/octave and attenuating all frequencies by −3 log2(N) dB. This way, only the first transition frequency needs to be determined. Additionally, such an implementation may simplify the amplifier hardware and/or software.
where d1 and d2 denote the distance between the microphones in a microphone pair in, respectively, a first and a second system; e.g.,
In the depicted embodiment, the series of low-shelf filters lsfn is defined by a first transfer function
wherein G0 denotes a gain factor preferably equal to G0=1, B denotes a variable bandwidth preferably defined by
Q denotes a Q-factor determining the slope of the gain curve, the Q-factor preferably being equal to Q=5, and wherein fn denotes a central frequency of the nth low-shelf filter (n=0, 1, . . . ), preferably fn being determined or approximated by
wherein v denotes the speed of sound, and wherein N denotes the number of microphones. The first central frequency of the low-shelf filter 336 has been indicated in the figure.
The high-shelf filter hsf is defined by a second transfer function
wherein G∞ denotes a gain factor preferably equal to
wherein N denotes the number of microphones, B denotes a variable bandwidth preferably defined by
Q denotes a Q-factor determining the slope of the gain curve, the Q-factor preferably being equal to Q=5, and wherein fh denotes a central frequency of the high-shelf filter, preferably fh being determined or approximated by
wherein v denotes the speed of sound. The central frequency of the high-shelf filter 334 has been indicated in the figure.
The central frequency of a filter is generally lower than the corresponding transition frequency, for example about half an octave lower. This is because the central frequency indicates the centre of the filter, whereas the first and second transition frequencies as defined in this disclosure typically correspond to an upper cut-off frequency of the filter. Because the high-shelf filter may partially overlap the 0th low-shelf filter, the low-shelf filter central frequencies fn may be chosen slightly lower than half an octave below the second transition frequency. Additionally, the gain of the combined signal 302 (i.e., before amplification) already starts to deviate from the 6 dB/octave line for frequencies below the second transition frequency. Consequently, it follows from the simulation that low-shelf filter central frequencies as defined in equation (9) give good results, in particular in combination with a high-shelf filter as described above.
To generate the graph depicted in
Thus, an output signal may be obtained with almost identical gain as the input signal, with high fidelity of the captured environmental sound signal, i.e. optimised frequency balance in the omnidirectional field, while the risk of destructive and/or audibly disturbing feedback is entirely or at least mostly eliminated.
The measurements depicted in
In a first step, the average gain boost of ~+3 dB per doubling of the number of microphones for frequencies with a wavelength λ≤2d may be accounted for, as the output gain should not exceed the input gain to maintain effective feedback reduction. Therefore, signal attenuation is required, in particular to eliminate feedback of the high frequencies. This may be achieved by using a high-shelf filter. The high-shelf filter may be defined by the second transfer function as defined above in eq. (10). This corresponds to a −3 dB attenuation per doubling of the number of microphones. As the measurements depicted in
In a second step, the low frequencies may be boosted, for example using a series of n low-shelf filters (an nth-order filter). In accordance with the experimental observations and the simulated results, the low-shelf filter may be implemented with the first transfer function defined above in eq. (8). This corresponds to a low-frequency gain boost of ~6 dB (at f=0), while the high-frequency gain (f=∞) is constrained to be 1. After applying the low-shelf filter in a series of
the equal loudness balance of the low-frequency region is restored in the obtained audio signal after combining all microphones.
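How a series of shelves approximates the +6 dB per octave boost can be illustrated as follows. Because eq. (8) is not reproduced here, the sketch swaps in a generic first-order analogue shelf H(s) = (s + 2ωc)/(s + ωc), which, like the filters described, has a gain of about +6 dB at f = 0 and unity gain at high frequencies. The corner spacing (first corner half an octave below ft,2, each further corner one octave lower) and all names are hypothetical.

```python
import numpy as np


def first_order_low_shelf(f, f_c):
    """Magnitude of H(s) = (s + 2*w_c) / (s + w_c) at s = j*2*pi*f:
    a gain of 2 (+6.02 dB) at DC falling to 1 (0 dB) above about 2*f_c."""
    w = 2.0 * np.pi * np.asarray(f, dtype=float)
    w_c = 2.0 * np.pi * f_c
    return np.sqrt(w ** 2 + 4.0 * w_c ** 2) / np.sqrt(w ** 2 + w_c ** 2)


def low_shelf_cascade_db(f, f_t2, n_filters):
    """Gain in dB of n_filters shelves in series; the first corner lies half an
    octave below ft,2 and each further corner one octave lower (hypothetical)."""
    gain = np.ones_like(np.asarray(f, dtype=float))
    for n in range(n_filters):
        gain = gain * first_order_low_shelf(f, f_t2 / 2.0 ** (n + 0.5))
    return 20.0 * np.log10(gain)


freqs = np.array([10.0, 20.0, 40.0, 80.0, 160.0, 320.0, 640.0, 5000.0])
print(np.round(low_shelf_cascade_db(freqs, f_t2=686.0, n_filters=5), 1))
```

Well below the lowest corner, the five stages add up to about +30 dB and the boost levels off; each extra stage extends the +6 dB/octave region by roughly one octave, which is why closely spaced microphones (a higher ft,2) call for a longer series.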
If, for example, the microphones are placed relatively close together, a longer series of low-shelf filters (higher-order low-shelf filter) may be required to boost the lowest frequencies. When the microphones are placed relatively far apart, a shorter series of low-shelf filters (lower-order low-shelf filter) may be used. More generally, the second transition frequency ft,2 (and hence the central frequencies of the low-shelf filters) shifts with the distance d between the microphones. Effectively, this means that the low-shelf filter may become obsolete for inter-microphone distances d>4 m, as then
and the effective boosting of the frequencies starts below the threshold of hearing.
In some embodiments, the second transition frequency may be selected to be identical to the first transition frequency, i.e.,

$$f_{t,2} = f_{t,1} = \frac{v}{2d}$$

In such embodiments, the high-shelf filter may be replaced by a simple multiplier reducing the signal gain by G1 = √(1/2) ≈ −3 dB for a single microphone pair, or by Gmic = √(1/N) = −3 log2 N dB for N microphones.
Furthermore, if a plurality of loudspeakers are configured to receive and playback the environmental sound obtained from one or more pairs of microphones integrated in (or coupled to) an environmental sound loudspeaker, as discussed in more detail below with reference to
The gain reduction of the combined signals may also be adjusted based on the inter-microphone distance d. Consider an embodiment where the microphones are not integrated in the same physical device as the loudspeaker driver, but are placed in the environment equidistantly and symmetrically around the loudspeaker driver. The effective distance between each microphone and the loudspeaker driver is then r=d/2, i.e. half the distance between two microphones, and the sound pressure at the microphone decreases in inverse proportion to this distance, that is, with 1/r from the sound source to the measuring point. Consequently, doubling the distance halves the sound pressure (a decrease of about 6 dB).
Therefore, an additional parameter
may be introduced for gain correction of the multiplier as a function of the distance between the microphones and the loudspeaker. Thus, the attenuation may become obsolete for inter-microphone distances d>4 m when using one loudspeaker and two microphone pairs, as the sum of the attenuation contributions would then amount to 0 dB.
The combined attenuation a in −dB of all contributions discussed above, insofar as they are applicable to a system under consideration, as well as any optional further contributions, may be given by
where Gref=1 is the reference gain, and i is an index through the set I={mic, sp, r, . . . } of potential gain sources.
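A sketch of how the individual gain contributions might be summed, assuming the combined attenuation is the sum of the contributions in dB, a = Σi 20 log10(Gi/Gref) with Gref = 1. Only Gmic = √(1/N) is taken from the text; the value used below for the distance term "r" is a hypothetical placeholder, chosen to show two contributions cancelling to 0 dB.

```python
import math


def combined_attenuation_db(gains, g_ref=1.0):
    """Sum of the individual contributions in dB: a = sum_i 20*log10(G_i / G_ref)."""
    return sum(20.0 * math.log10(g / g_ref) for g in gains.values())


n_mics = 4
gains = {"mic": math.sqrt(1.0 / n_mics)}         # G_mic = sqrt(1/N)
print(round(combined_attenuation_db(gains), 2))  # -6.02

# Hypothetical distance correction G_r = 2 (+6.02 dB) offsets the microphone term.
gains["r"] = 2.0
print(round(combined_attenuation_db(gains), 2))
```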
The solid line 402 represents an unprocessed microphone input signal 1131 of a microphone 1041, integrated in the environmental sound loudspeaker. The dotted line 404 represents a first combined (summed) signal 1171 of a single microphone pair, where the input signal from one microphone has been inverted 1151, corresponding to a phase shift over a 180° phase angle. On average, the amplitude of low frequencies (λ>2d) is reduced by −24 dB, whereas the amplitude of high frequencies (λ<2d) is reduced by −6 dB.
Although the first combined signal 1171 has a much lower amplitude than the first input signal 1131, the first combined signal is not completely zero, because the signal also comprises components that are reflected in the room in which the environmental sound loudspeaker was placed. These arrive at the microphones of a microphone pair with different phases, and are therefore not annulled by the combination of the first input signal with the inverted second input signal 1151.
The dashed line 406 represents a combined and amplified signal 123 of two microphone pairs. A second combined signal 1172 of the second microphone pair has been phase-shifted over a 90° phase angle, resulting in a phase-shifted signal 119. The first combined signal and the phase-shifted signal have been combined (summed), resulting in a third combined signal. The third combined signal 121 was amplified by an amplifier 122, amplifying low frequencies by a series of five low-shelf filters and attenuating the signal by −3 dB per doubling of the number of microphones, or −6 dB in total for the four microphones. Consequently, the amplifier increases the amplitude of the low frequencies (frequencies corresponding to wavelengths λ>2d) by +6 dB per octave and reduces the high frequencies (frequencies corresponding to wavelengths λ<2d) by −6 dB.
It can be seen that the environmental sound loudspeaker effectively cancels most of the sound coming from the loudspeaker itself. The remaining non-zero components may be attributed to reflections of the loudspeaker sound in the environment (−24 dB), which the environmental sound loudspeaker does not cancel out.
The solid line 412 represents an unprocessed microphone input signal 1131 of a microphone 1041, integrated in the environmental sound loudspeaker. The dotted line 414 represents a first combined (summed) signal 1171 of a single microphone pair, where the input signal from one microphone has been inverted 1151, corresponding to a phase shift over a 180° phase angle. On average, the amplitude of low frequencies (λ>2d) is reduced by −18 dB whereas the amplitude of high frequencies (λ<2d) is increased by up to +6 dB.
As predicted, the low frequencies of the first combined signal 1171 have a much lower amplitude than the first input signal 1131, whereas the high frequencies of the inverted and combined signal are often louder than the first input signal.
The dashed line 416 represents a combined and amplified signal 123 of two microphone pairs. A second combined signal 1172 of the second microphone pair has been phase-shifted over a 90° phase angle, resulting in a phase-shifted signal 119. The first combined signal and the phase-shifted signal have been combined (summed), resulting in a third combined signal. The third combined signal 121 was amplified by an amplifier 122, amplifying low frequencies by a series of five low-shelf filters and attenuating the signal by −3 dB per doubling of the number of microphones, or −6 dB in total for four microphones. Consequently, the amplifier increases the amplitude of the low frequencies (frequencies corresponding to wavelengths λ>2d) by +6 dB per octave and reduces the high frequencies (frequencies corresponding to wavelengths λ<2d) by −6 dB.
In
In this case all axes are adjacent to each other, i.e., each microphone neighbours a microphone from each other pair of microphones. Consequently, the signals of the first, second, and third pairs of opposite microphones may be combined (added) 5161-3 into, respectively, first, second and third combined signals after inverting 5141-3 one of the each pair of signals, i.e., after applying a phase shift of Δφ=180° to one of the signals of each microphone pair. Subsequently, the first and/or second combined signals may be phase-shifted 5181 such that the first and second combined signals have a mutual phase difference of Δφ=90°, after which they may be combined (added) 520 into a fourth combined signal. Then, the fourth combined signal and/or the third combined signal may be phase shifted 5182 to obtain a mutual phase difference of Δφ=90°, and both signals may again be combined 524 into a fifth combined signal. The fifth combined signal may then be provided as input to the amplifier.
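The cascade for three orthogonal microphone pairs can be sketched as below, using an FFT-based Hilbert transform for the 90° shifts. The row ordering (rows i and i + 3 form an opposite pair) and the choice to shift the second and third signals, rather than the first and fourth, are assumptions; the text allows shifting either signal of each combination. The names are hypothetical.

```python
import numpy as np


def shift_90(x):
    """-90 degree phase shift of every frequency component (FFT-based Hilbert transform)."""
    n = len(x)
    multiplier = -1j * np.sign(np.fft.fftfreq(n))
    if n % 2 == 0:
        multiplier[n // 2] = 0.0
    return np.real(np.fft.ifft(np.fft.fft(x) * multiplier))


def combine_three_pairs(mics):
    """mics has shape (6, n_samples); rows i and i + 3 are opposite microphones."""
    # Invert one microphone of each pair and add: first, second and third combined signals.
    first, second, third = (mics[i] - mics[i + 3] for i in range(3))
    # 90 degrees between the first and second combined signals -> fourth combined signal.
    fourth = first + shift_90(second)
    # 90 degrees between the fourth and third combined signals -> fifth combined signal.
    return fourth + shift_90(third)


rng = np.random.default_rng(0)
speaker = rng.standard_normal(2048)
# A driver at the centre of a symmetric array reaches all six microphones identically.
print(np.allclose(combine_three_pairs(np.tile(speaker, (6, 1))), 0.0))  # True
```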
For embodiments with more (pairs of) microphones (N>6) distributed in three dimensions around the loudspeaker driver, other regular polyhedra, such as the Platonic solids, may be used. The microphones may be positioned at the vertices of a virtual regular polyhedron.
As an example,
The seventh combined signal is provided as input to the amplifier. The remaining components may be as discussed above with reference to
Thus, since the phase shift can be set for any angle between two microphone pairs, any number of microphone pairs, the microphones of each pair being positioned at opposite ends of an axis crossing the centre of the circular loudspeaker driver, may be arranged with an angle θ=360°/N (or 2π/N radians) between adjacent axes to obtain a more or less uniform frequency sensitivity pattern in the omnidirectional field. The subdivision preferably satisfies the condition that the axes of adjacent microphone pairs are at identical angles to each other and that the microphone positions are equally spaced around the loudspeaker driver's centre.
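The planar layout rule θ=360°/N can be checked with a short sketch that places N microphones on a circle of diameter d around the driver centre and lists the axis angle of each opposite pair. The function name `circular_layout` and the choice to start the first axis at 0° are assumptions for illustration.

```python
import math


def circular_layout(n_mics, d):
    """Place an even number of microphones on a circle of diameter d around the
    driver centre; microphone k and k + N/2 form a diametrically opposite pair.

    Returns the (x, y) microphone positions and the axis angle of each pair in degrees.
    """
    if n_mics < 2 or n_mics % 2:
        raise ValueError("an even number of microphones (N >= 2) is required")
    step = 2.0 * math.pi / n_mics  # theta = 360°/N between neighbouring microphones
    mics = [(0.5 * d * math.cos(k * step), 0.5 * d * math.sin(k * step))
            for k in range(n_mics)]
    # Adjacent pair axes are also 360°/N apart.
    axis_angles = [math.degrees(k * step) for k in range(n_mics // 2)]
    return mics, axis_angles


if __name__ == "__main__":
    mics, axes = circular_layout(6, d=0.1)
    print(axes)                         # one axis per pair, 60 degrees apart
    print(math.dist(mics[0], mics[3]))  # pair spacing, approximately d
```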
For larger numbers of microphones (N>20), a geodesic polyhedron 626 may be formed by subdividing the triangular faces of an icosahedron 624 into smaller triangles and projecting all obtained vertices onto a sphere. The new faces on the sphere are not equilateral triangles, but their edges are approximately equal in length. As such, a class I geodesic polyhedron with a frequency of (2,0) (depicted) would provide a microphone configuration with N=42, and one with a frequency of (3,0) a configuration with N=92. For such configurations, a consistent angle θ may be obtained that is equal for all adjacent axes, each axis connecting two diametrically opposite vertices of the geodesic polyhedron through the centre of the polyhedron, and a consistent phase shift Δφ=θ may be applied when combining each subsequent microphone pair associated with an axis.
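The microphone counts quoted for class I geodesic polyhedra follow from the vertex count 10m²+2 of a frequency-(m,0) subdivision. The sketch below builds the vertices explicitly (icosahedron, face subdivision, projection onto the unit sphere) and checks that they come in diametrically opposite pairs, so that N vertices give N/2 axes through the centre. It is a geometric illustration only; the helper names are hypothetical and the rounding used to merge shared vertices is an arbitrary choice.

```python
import itertools

import numpy as np

PHI = (1.0 + 5.0 ** 0.5) / 2.0  # golden ratio


def icosahedron():
    """Vertices (edge length 2) and the 20 triangular faces of a regular icosahedron."""
    verts = []
    for s1 in (-1.0, 1.0):
        for s2 in (-1.0, 1.0):
            verts += [(0.0, s1, s2 * PHI), (s1, s2 * PHI, 0.0), (s2 * PHI, 0.0, s1)]
    verts = np.array(verts)

    def adjacent(i, j):
        return abs(np.linalg.norm(verts[i] - verts[j]) - 2.0) < 1e-9

    # Every triple of mutually adjacent vertices is a face of the icosahedron.
    faces = [f for f in itertools.combinations(range(len(verts)), 3)
             if adjacent(f[0], f[1]) and adjacent(f[1], f[2]) and adjacent(f[0], f[2])]
    return verts, faces


def geodesic_vertices(m):
    """Class I geodesic polyhedron of frequency (m, 0): subdivide every face into
    m*m triangles and project all vertices onto the unit sphere."""
    verts, faces = icosahedron()
    unique = {}
    for a, b, c in faces:
        A, B, C = verts[a], verts[b], verts[c]
        for i in range(m + 1):
            for j in range(m + 1 - i):
                k = m - i - j
                p = (i * A + j * B + k * C) / m
                p /= np.linalg.norm(p)
                # Points on shared edges and corners are generated more than once.
                unique[tuple(np.round(p, 6))] = p
    return np.array(list(unique.values()))


if __name__ == "__main__":
    for m in (1, 2, 3):
        v = geodesic_vertices(m)
        # Every vertex has its antipode in the set, so N vertices give N/2 axes.
        symmetric = all(np.min(np.linalg.norm(v + p, axis=1)) < 1e-9 for p in v)
        print(m, len(v), 10 * m * m + 2, symmetric)
```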
Although embodiments with large numbers of microphones are possible, may obtain a good omnidirectional sensitivity even with non-omnidirectional microphones, and may be useful in particular in embodiments with a larger number of loudspeaker drivers, embodiments with N≤8 microphones (four or fewer microphone pairs) are preferred. It has been found that embodiments with N≤8 microphones strike a good balance between cost (e.g. of components and signal processing) and sensitivity pattern. Simulations show little to no qualitative improvement of the frequency sensitivity in the omnidirectional field above N=6 microphones, in either a 2D or a 3D implementation. As such, embodiments with N=6 microphones (three microphone pairs) are especially preferred.
The applied phase shift Δφ should preferably be frequency-independent, that is, the phase shift should be equal, or at least approximately equal, for all frequencies within a given frequency range. Typically, the frequency range may be set to the audible range of, e.g., 16 Hz to 20 kHz. A constant phase shift in the frequency domain corresponds to a frequency-dependent time shift in the time domain. Consequently, the corresponding time difference Δt varies with frequency within the frequency range.
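The relation between a constant phase shift and the frequency-dependent time shift is Δt = Δφ/(360°·f). A small sketch (the function name `time_shift_seconds` is hypothetical) makes the spread explicit for a 90° shift over the audible range.

```python
def time_shift_seconds(phase_deg, freq_hz):
    """Time shift equivalent to a constant phase shift at one frequency:
    dt = dphi / (360 * f), with dphi in degrees and f in Hz."""
    return phase_deg / (360.0 * freq_hz)


if __name__ == "__main__":
    # A 90-degree shift at a few frequencies across the 16 Hz to 20 kHz range.
    for f in (16.0, 100.0, 1000.0, 20000.0):
        print(f"{f:>8.0f} Hz: {1e3 * time_shift_seconds(90.0, f):.4f} ms")
```

For 90°, the time shift runs from 15.625 ms at 16 Hz down to 0.0125 ms (12.5 µs) at 20 kHz, which is why a single fixed delay cannot reproduce a frequency-independent phase shift.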
As depicted in
where H(t) is the impulse response of a hypothetical filter. It follows that taking the Hilbert transform of the signal is equivalent to passing it through a filter whose impulse response is 1/(πt).
Thus the Hilbert transform is a linear operator given by convolution with the function 1/(πt). In the frequency domain, this operation imparts a phase shift of 90° (π/2 radians) to every frequency component of a function, wherein the sign of the shift depends on the sign of the frequency. Consequently, the output audio signal 706 is phase-shifted by 90° relative to the input signal and comprises one or more components sin(ωt+90°)=cos(ωt). Various methods to implement a Hilbert transform are known in the art.
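A discrete Hilbert transform can be sketched with NumPy's FFT by multiplying the spectrum by −j·sign(f). This is one of the known implementation methods, chosen here for brevity; it treats the block as periodic, so it is exact only for components that complete whole cycles within the block. Note the sign convention: with this standard definition, cos(ωt) maps to sin(ωt), i.e. positive frequencies are shifted by −90°, so the negated transform gives the +90° form sin(ωt+90°)=cos(ωt). Either sign yields the 90° mutual phase difference that the combination relies on. The function name is hypothetical.

```python
import numpy as np


def hilbert_transform(x):
    """Discrete Hilbert transform of a real block via the FFT.

    Multiplying the spectrum by -j*sign(f) rotates positive frequencies by
    -90 degrees and negative frequencies by +90 degrees (the DC bin, and the
    Nyquist bin after taking the real part, are removed).
    """
    spectrum = np.fft.fft(x)
    freqs = np.fft.fftfreq(len(x))
    return np.real(np.fft.ifft(-1j * np.sign(freqs) * spectrum))


if __name__ == "__main__":
    t = np.arange(64)
    w = 2 * np.pi * 5 * t / 64
    print(np.allclose(hilbert_transform(np.cos(w)), np.sin(w)))   # cos -> sin
    print(np.allclose(-hilbert_transform(np.sin(w)), np.cos(w)))  # negated: sin -> cos
```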
As depicted in
A second copy 7142 of the first input signal 712 is amplified 724 by a second amplification factor b, resulting in a second amplified signal 726. The factors a and b are selected such that
or equivalently,
where θ is the desired phase shift.
The first amplified signal 722 and the second amplified signal 726 are combined 728, resulting in a combined signal 730. Compared to the input signal 712, the combined signal is phase-shifted by Δφ=θ and amplified by a factor √(a²+b²). The combined signal may be amplified 732 with a third amplification factor c=1/√(a²+b²), resulting in a phase-shifted output signal 734. In general, the factors a, b, and c may be chosen such that at least one of them equals ±1. In that case, the corresponding amplifier may be left out or may be implemented as an inverter, as the case may be.
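The two-gain phase shifter can be sketched as follows. The assignment of the gains is an assumption, since the defining equations for a and b are not reproduced here: the 90°-shifted copy is weighted by a = tan θ and the unshifted copy by b = 1, so that the b amplifier can be left out and c = 1/√(a²+b²) = cos θ restores unity gain. The sketch covers 0° ≤ θ < 90° only, uses an FFT-based Hilbert transform for the 90° copy, and its function names are hypothetical.

```python
import numpy as np


def hilbert_transform(x):
    """FFT-based Hilbert transform: multiply the spectrum by -j*sign(f)."""
    spectrum = np.fft.fft(x)
    return np.real(np.fft.ifft(-1j * np.sign(np.fft.fftfreq(len(x))) * spectrum))


def arbitrary_phase_shift(x, theta_deg):
    """Shift every component of x by theta degrees (0 <= theta < 90) using a
    90-degree copy, two gains a and b, and a make-up gain c."""
    if not 0.0 <= theta_deg < 90.0:
        raise ValueError("this sketch covers 0 <= theta < 90 degrees only")
    theta = np.deg2rad(theta_deg)
    quadrature = -hilbert_transform(x)  # +90 deg copy: sin(wt) -> cos(wt)
    a = np.tan(theta)  # assumed gain on the 90-degree copy
    b = 1.0            # unity gain, so this amplifier can be omitted
    combined = a * quadrature + b * x  # for x = sin(wt): sqrt(a^2 + b^2) * sin(wt + theta)
    c = 1.0 / np.sqrt(a**2 + b**2)     # restores the input amplitude
    return c * combined
```

Choosing b = 1 reflects the remark that at least one factor may equal ±1; other normalisations (e.g. a = sin θ, b = cos θ, c = 1) give the same output.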
In brief, in this method, two copies of an input signal are processed such that the copies obtain a Δφ=90° phase difference, for example by applying a Hilbert transform as described above with reference to
It follows that the phase shift Δφ is equal for every frequency f, whereas the time shift Δt may vary per frequency. The gain of the 90° phase-shifted signals combined in the additive mixer may be determined by:
for phase shifts of 0°≤θ<90°.
The resulting attenuation a in −dB may be defined as
where Gref=1 is the reference gain.
To obtain a phase shift with angles θ>90°, the signs of a and b are changed as follows:
where the respective component comprises an inverting amplifier on the signal if the sign is negative.
The third amplification factor c represents an attenuation applied to the final output signal to compensate for the varying amplitude resulting from the summation of the two phase-shifted signals, so that the output gain is always identical to the input gain. This results in a phase-shifting all-pass filter for all frequencies within the set frequency range (e.g., 16 Hz to 20 kHz).
The gain G may be determined as c
In general, a phase shift is applied to one signal before it is added to a non-phase-shifted signal. In some embodiments, instead of phase-shifting one signal by Δφ=θ, both signals may be phase-shifted, provided a mutual phase difference θ is obtained. For instance, each signal may be phase-shifted by half the amount, i.e., a phase shift Δφ=+θ/2 may be applied to one signal and a phase shift Δφ=−θ/2 to the other.
An alternative method to apply an arbitrary (relative) phase shift is provided below. This method uses a combination of all-pass filters with a general frequency-dependent phase shift
where ω0 is known as the corner frequency or central frequency and Q is a quality factor. The all-pass filter may be implemented as a digital filter or as an analogue filter, e.g. an op-amp-based all-pass filter (where typically
An approximately constant, i.e. substantially frequency-independent, phase shift over a frequency range of interest (ωmin, ωmax) may be obtained by using the difference between two (series of) all-pass filters.
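As a building block for such a difference of all-pass filters, the phase of a single section with corner frequency ω0 and quality factor Q can be evaluated numerically. The sketch assumes the standard analogue second-order all-pass H(s) = (s² − (ω0/Q)s + ω0²)/(s² + (ω0/Q)s + ω0²). On the jω axis its numerator is the complex conjugate of its denominator, so |H| = 1 and the phase is −2·atan2(ωω0/Q, ω0² − ω²), running continuously from 0 to −2π and passing −π at ω = ω0. Whether this is exactly the expression intended in the text is an assumption; the function names are hypothetical.

```python
import numpy as np


def allpass_response(omega, omega0, q):
    """H(j*omega) of H(s) = (s^2 - (w0/Q)s + w0^2) / (s^2 + (w0/Q)s + w0^2)."""
    s = 1j * np.asarray(omega, dtype=float)
    return (s**2 - (omega0 / q) * s + omega0**2) / (s**2 + (omega0 / q) * s + omega0**2)


def allpass_phase(omega, omega0, q):
    """Continuous phase of that section: 0 at DC, -pi at omega0, towards -2*pi above."""
    omega = np.asarray(omega, dtype=float)
    return -2.0 * np.arctan2(omega * omega0 / q, omega0**2 - omega**2)


if __name__ == "__main__":
    omega = 2 * np.pi * np.geomspace(15.0, 20_000.0, 7)
    w0 = 2 * np.pi * 1_000.0
    print(np.abs(allpass_response(omega, w0, 0.7)))  # unit magnitude everywhere
    print(np.degrees(allpass_phase(omega, w0, 0.7)))
```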
In general, the combined function may have the following form:
Other embodiments may have a pre-factor other than −2. Some embodiments may furthermore comprise an overall shift. For example, op-amp-based all-pass filters may have
In general, a more constant phase shift may be obtained by using a larger number of terms (i.e., n>1). An optimal choice of parameters may be found using known optimisation methods, for example by minimising
or a similar function, e.g., using a different weight for ω.
In order to reduce the parameter space, additional constraints may be applied. For example, the corner frequencies ω0,A(i) and ω0,B(i) may be chosen to be (substantially) evenly spread over and/or around the frequency range of interest. The ratio of the central frequencies of corresponding terms may be substantially constant, i.e.,
for all i. The quality factors of corresponding terms may be substantially equal, i.e., QA(i)=QB(i) for all i, or even equal for all terms, i.e., QA(i)=QB(i)=Q for all i. Furthermore, the search space may be limited based on practical considerations; e.g., very high or low corner frequencies may be less practical to implement.
with n=3, and X={A, B}. In this example, the parameters, given in Table 1, have been optimised to obtain a substantially constant −π/2 phase shift, using the difference θA−θB, over the interval (ωmin, ωmax)=(15 Hz, 20 kHz).
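A search for such parameters can be sketched with a simple optimiser. The sketch assumes standard second-order all-pass sections (phase −2·atan2(ωω0/Q, ω0² − ω²)), n=3 sections per branch, one common quality factor Q, a mean-squared phase-error cost sampled on a logarithmic frequency grid (which weights every octave equally), and a greedy coordinate descent on the logarithms of the corner frequencies. These are illustrative choices, not the optimisation used to obtain Table 1, and the sketch makes no claim about the accuracy it reaches; it only guarantees that the cost never increases. All names and the value Q=0.4 are hypothetical.

```python
import numpy as np


def allpass_phase(omega, omega0, q):
    """Phase of one second-order all-pass section; falls from 0 to -2*pi."""
    return -2.0 * np.arctan2(omega * omega0 / q, omega0**2 - omega**2)


def chain_phase(omega, corners, q):
    """theta_X(omega): sum of the phases of n sections sharing one quality factor."""
    return sum(allpass_phase(omega, w0, q) for w0 in corners)


def cost(log_corners, omega, target, q):
    """Mean squared error of theta_A - theta_B against the target phase difference."""
    n = len(log_corners) // 2
    corners = np.exp(log_corners)
    diff = chain_phase(omega, corners[:n], q) - chain_phase(omega, corners[n:], q)
    return float(np.mean((diff - target) ** 2))


def optimise(log_corners, omega, target, q, step=0.5, min_step=1e-4, max_iter=500):
    """Greedy coordinate descent; only improving moves are accepted."""
    params = np.array(log_corners, dtype=float)
    best = cost(params, omega, target, q)
    for _ in range(max_iter):
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                trial = params.copy()
                trial[i] += delta
                trial_cost = cost(trial, omega, target, q)
                if trial_cost < best:
                    params, best, improved = trial, trial_cost, True
                    break
        if not improved:
            step /= 2.0
            if step < min_step:
                break
    return params, best


if __name__ == "__main__":
    omega = 2 * np.pi * np.geomspace(15.0, 20_000.0, 200)
    # Initial guess: A corners spread over the band, B corners at a constant ratio of 4.
    start_a = 2 * np.pi * np.geomspace(40.0, 8_000.0, 3)
    start = np.log(np.concatenate([start_a, 4.0 * start_a]))
    params, err = optimise(start, omega, -np.pi / 2, q=0.4)
    print("corner frequencies (Hz):", np.exp(params) / (2 * np.pi))
    print("RMS phase error (rad):", np.sqrt(err))
```

The initial guess follows the constraints listed above (corners spread over the band, a constant ratio between corresponding A and B corners); the optimiser is free to depart from them.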
In this and other embodiments, the sounds may additionally be played back by other loudspeakers which are located in the environment and configured together as a loudspeaker system. The other loudspeakers may comprise one or more environmental sound loudspeakers 808.
The played back sounds may be enhanced with added acoustical properties of a virtual environment, e.g., by audio processing the captured environmental sound signal. The audio processing may comprise methods to apply reflections, diffraction, absorption and/or reverberation to the captured sounds, in order to provide an auditory experience of a virtual and/or acoustically augmented and/or acoustically enriched environment. As a result, the subject 802 hears the sound as if it were diffracted, absorbed, reflected and/or reverberated in a distinct virtual environment and/or in an acoustically augmented environment, i.e. an acoustically modified and/or enriched version of the real physical environment 804 of the subject. In this way, the human subject's experience of being present in the virtual environment becomes more lifelike compared to a system where the sounds produced by the human subject are not included in the virtual or augmented environment. Consequently, the environmental sound loudspeaker 806 creates a virtual experience that is more physically and emotionally engaging.
The played back sound signal(s) may be enhanced with added acoustical properties of a virtual environment, e.g., by audio processing the captured environmental sound signal. The audio processing may comprise methods to apply reflections, diffraction, absorption and/or reverberation in order to provide an auditory experience of a virtual and/or acoustically augmented and/or acoustically enriched environment. As a result, each subject 9021-3 hears the sound produced by themselves and by each other as if it were diffracted, absorbed, reflected and/or reverberated in a distinct virtual environment and/or in an acoustically augmented environment, i.e. an acoustically modified and/or enriched version of the real physical environment 904 of the subjects. This way, the environmental sound loudspeaker 906 makes the experience of human subjects present in one and the same virtual environment more lifelike and, consequently, the virtual experience becomes more physically and emotionally engaging.
As a result, both the one or more human subjects 1002 and any audience present hear the sounds produced in the concert hall 1004 as if produced in a hall 1008 that is magnified in size and loudness. The magnification depends on the time delay, gain and positioning of the environmental sound loudspeakers. Thus, environmental sound loudspeakers 10061-4 create an acoustically augmented version of the real physical environment of the subject. Consequently, the acoustic experience of the musical performance in the concert hall improves, e.g., the music becomes more intelligible everywhere in the room and the perceived acoustics of the hall may better match the intended musical performance.
This way, the use of environmental sound loudspeakers constitutes a solution for Active Field Control (AFC) that may be achieved with a simpler and more economical design, i.e., with significantly fewer loudspeakers and microphones and less computational power than conventional AFC systems.
The environmental sound loudspeaker 1106 may enhance the played back sounds with added acoustical properties of a virtual environment, e.g. by audio processing of the captured environmental sound signal. The audio processing may comprise methods to generate reflections, diffractions, absorption and/or reverberation to provide an auditive experience of a virtual and/or acoustically augmented and/or acoustically enriched environment. Additionally or alternatively, other virtual sound sources may be added to the played back sound signal(s) to enrich the environment, such as pre-recorded and/or generated voices and/or movements of virtual subjects and/or any other sources e.g. musical sounds such as a guitar 1112, nature sounds such as the chirping of (other) birds 11141,2, etc.
The environmental sound loudspeaker 1106 may enhance the virtual sound sources with the same added acoustical properties of a virtual environment as those added to the real environmental sound. As a result, the subject(s) hear all sound produced in the environment, i.e., their own sounds and those of all other sound sources in the environment, including the sound produced by the environmental sound loudspeaker(s) and, optionally, by other loudspeakers that may add virtual sound sources not present in the physical environment, as if diffracted, absorbed, reflected and/or reverberated in a distinct virtual environment and/or in an acoustically augmented environment, i.e. an acoustically modified and/or enriched version of the real physical environment of the subject(s). In this way, subjects may feel more attracted to dwell in a particular environment, which may positively affect or guide their behaviour in that environment.
The environmental sound loudspeaker 1206 may capture the sound of the subjects 12021-6 at and around the location and may immediately play back the captured sounds. The real-time captured sounds of the subjects, e.g. the school of fish, may be enhanced with added acoustical properties of a virtual environment, e.g. one comprising healthy coral reefs 1208 at a location where healthy coral reefs are mostly absent, by audio processing of the captured environmental sounds. The audio processing may comprise methods to generate reflections, diffraction and/or absorption of the vibrations produced by the fish or other aquatic animals to and from the virtual coral reefs. As a result, the subjects hear their own sounds as if reflected, diffracted and/or absorbed in the acoustically enriched environment with healthy coral reefs. In this way, the subjects may feel more attracted to dwell in the particular area, which may be a means to enhance the biodiversity of that area.
In this embodiment, the environmental sound loudspeaker 1206 may be made waterproof.
In such an embodiment, the captured sound of the insects 13021-10 may function as a spatial audio attractor. For example, the one or more environmental sound loudspeakers 13061-3 may be mounted on a drone and may move towards a location where a particular predefined audio signal, e.g. the sound of swarming insects of a particular type, is tracked and captured at the highest intensity. At the same time, the environment may be enriched with an added virtual sound source comprising frequencies that are particularly attractive to the particular subjects, e.g. a swarm of insects of a particular type, played back by each environmental sound loudspeaker; as such, the insects and the drone-mounted environmental sound loudspeakers are automatically coupled to and attracted by each other. As a result, the subjects gather together and follow the environmental sound loudspeaker. In this way, an environmental sound loudspeaker may divert the subjects away from a particular area, e.g. to protect crops from damage by insects eating or burrowing into plants or flowers, or guide the subjects towards particular areas, e.g. to stimulate pollination of plants and flowers 13081-3 by insects in those areas.
In this embodiment, the environmental sound loudspeakers 13061-3 may be made airborne, for example by mounting them on a drone.
In a first step 1402, the method comprises receiving a first input signal from a first microphone and a second input signal from a second microphone, each input signal representing a recorded sound. The first microphone and the second microphone may form a first microphone pair, the first microphone and the second microphone being positioned a distance d apart, the first microphone and the second microphone being positioned diametrically opposite each other and equidistant relative to a centre of a loudspeaker driver.
In an optional second step 1404, the method may comprise receiving a first additional input signal from a first additional microphone and a second additional input signal from a second additional microphone from each of one or more additional microphone pairs. Each additional microphone pair may comprise a first additional microphone and a second additional microphone positioned the distance d apart, the first and second additional microphones in each additional microphone pair being positioned diametrically opposite each other relative to the centre of the loudspeaker driver, the first microphone pair and the one or more additional microphone pairs being arranged symmetrically around the centre of the loudspeaker driver. The first microphone pair and the optional one or more additional microphone pairs may together be referred to as the one or more microphone pairs.
As has been discussed in more detail above, for example with reference to
The steps 1402 and 1404 may be combined into a single step comprising receiving one or more first input signals and one or more second input signals from, respectively, a first microphone and a second microphone from each of one or more microphone pairs.
The microphones in the one or more microphone pairs are preferably identical. The microphones in the one or more microphone pairs are preferably omnidirectional microphones. By arranging the one or more microphone pairs symmetrically around the centre of the loudspeaker driver, a substantially omnidirectional polar pattern may be obtained based on a combination of the one or more first and second input signals.
In a step 1406, the method further comprises determining an output signal based on the first and second input signals. The determination of the output signal comprises removing, from the one or more input signals, signal components representing sounds produced by the loudspeaker driver, or at least reducing or suppressing those signal components. This way, undesired feedback may be prevented, or at least sufficiently reduced for practical applications as discussed herein. The determination of the output signal further comprises restoring, or at least approximating, the loudness balance or frequency spectrum of the environmental sounds captured by the environmental sound loudspeaker.
In an optional step 1408, the method may further comprise manipulating the output signal. The manipulation preferably comprises adding reverberation and/or virtual acoustics to the output signal. Further examples of manipulation have been discussed above, for example with reference to
In a step 1410, the method comprises providing the, optionally manipulated, output signal to the loudspeaker driver.
The step 1406 may comprise multiple steps 1412-1420. Thus, the determination of the output signal may comprise, in a step 1412, inverting the first input signal and, in an embodiment with more than one microphone pair, inverting each first additional input signal. In a step 1414, the determination of the output signal may comprise combining the inverted first input signal with the second input signal from the same microphone pair. Performing this step for each of the one or more microphone pairs results in one or more combined signals, each combined signal being associated with the respective microphone pair.
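Steps 1412 and 1414 can be illustrated for a single pair under idealised assumptions: identical microphones, free-field propagation, and an external source whose sound reaches the second microphone a whole number of samples later than the first. Because both microphones are equidistant from the driver centre, the driver's sound appears identically in both inputs and cancels when one input is inverted and added to the other, while the external sound survives. The helper names `combine_pair` and `delayed` are hypothetical.

```python
import numpy as np


def combine_pair(first, second):
    """Steps 1412 and 1414: invert the first input signal and add the second."""
    inverted = -first
    return inverted + second


def delayed(x, samples):
    """Delay x by a whole number of samples, padding the start with zeros."""
    return np.concatenate([np.zeros(samples), x[:len(x) - samples]])


if __name__ == "__main__":
    n = np.arange(400)
    driver = np.sin(2 * np.pi * 0.01 * n)    # sound radiated by the loudspeaker driver
    external = np.sin(2 * np.pi * 0.05 * n)  # environmental sound arriving off-axis
    # Equidistant microphones receive the driver sound identically; the external
    # sound reaches the second microphone 3 samples after the first.
    mic1 = driver + external
    mic2 = driver + delayed(external, 3)
    out = combine_pair(mic1, mic2)
    residue = out - (delayed(external, 3) - external)
    print("driver residue:", np.max(np.abs(residue)))
    print("external level:", np.max(np.abs(out)))
```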
In an embodiment with more than one microphone pair, the determination of the output signal may comprise, in a step 1416, applying a phase shift to the combined signal associated with at least one of the one or more microphone pairs. In a step 1418, the phase-shifted signal associated with a first (additional) microphone pair may be combined with a combined signal associated with a second (additional) microphone pair, resulting in a further combined signal. Typically, the combined signals associated with the one or more microphone pairs are combined one by one, possibly in parallel, with a different combined signal or with a further combined signal, until a single ‘fully’ combined signal is obtained. Before each pairwise combination, at least one of the signals to be combined is phase-shifted to obtain a predetermined phase angle between the two signals to be combined. The predetermined phase angle is based on an angle between axes representative of each of the two signals to be combined. For a combined signal associated with a microphone pair, the representative axis is typically the axis defined by the line between the first and second microphones of the microphone pair.
In a step 1420, the determination of the output signal may further comprise amplifying the (fully) combined signal such that the output signal has a frequency spectrum that is substantially the same as a frequency spectrum of environmental sounds captured by the first and/or second microphones for frequencies in an audible frequency range, preferably substantially all frequencies in the audible frequency range. Preferably, the amplifying comprises attenuating signals with a frequency higher than a first transition frequency, preferably by −3 dB per doubling of the number of microphones. Preferably, the amplifying comprises boosting signals with a frequency lower than a second transition frequency, preferably by +6 dB per octave. The first and second transition frequencies are typically based on the distance d between the first and second microphones in each microphone pair. The second transition frequency may further be based on the number of microphone pairs.
Alternatively or additionally to amplification of the (fully) combined signal, each of the first and second input signals or each of the combined signals associated with the one or more microphone pairs may be amplified.
Memory elements 1504 may include one or more physical memory devices such as, for example, local memory 1508 and one or more bulk storage devices 1510. Local memory may refer to random access memory or other non-persistent memory device(s) generally used during actual execution of the program code. A bulk storage device may be implemented as a hard drive or other persistent data storage device. The processing system 1500 may also include one or more cache memories (not shown) that provide temporary storage of at least some program code in order to reduce the number of times program code must be retrieved from bulk storage device 1510 during execution.
Input/output (I/O) devices depicted as input device 1512 and output device 1514 optionally can be coupled to the data processing system. Examples of input devices may include, but are not limited to, for example, a keyboard, a pointing device such as a mouse, or the like. Examples of output devices may include, but are not limited to, for example, a monitor or display, speakers, or the like. Input device and/or output device may be coupled to the data processing system either directly or through intervening I/O controllers. A network adapter 1516 may also be coupled to the data processing system to enable it to become coupled to other systems, computer systems, remote network devices, and/or remote storage devices through intervening private or public networks. The network adapter may comprise a data receiver for receiving data that is transmitted by said systems, devices and/or networks to the data processing system and a data transmitter for transmitting data to said systems, devices and/or networks. Modems, cable modems, and Ethernet cards are examples of different types of network adapter that may be used with data processing system 1500.
As pictured in
In one aspect, for example, data processing system 1500 may represent a client data processing system. In that case, application 1518 may represent a client application that, when executed, configures data processing system 1500 to perform the various functions described herein with reference to a “client”. Examples of a client can include, but are not limited to, a personal computer, a portable computer, a mobile phone, or the like. In another aspect, data processing system may represent a server. For example, data processing system may represent a server, a cloud server or a system of (cloud) servers.
Various embodiments of the invention may be implemented as a program product for use with a computer system, where the program(s) of the program product define functions of the embodiments (including the methods described herein). In one embodiment, the program(s) can be contained on a variety of non-transitory computer-readable storage media, where, as used herein, the expression “non-transitory computer readable storage media” comprises all computer-readable media, with the sole exception being a transitory, propagating signal. In another embodiment, the program(s) can be contained on a variety of transitory computer-readable storage media. Illustrative computer-readable storage media include, but are not limited to: (i) non-writable storage media (e.g., read-only memory devices within a computer such as CD-ROM disks readable by a CD-ROM drive, ROM chips or any type of solid-state non-volatile semiconductor memory) on which information is permanently stored; and (ii) writable storage media (e.g., flash memory, floppy disks within a diskette drive or hard-disk drive or any type of solid-state random-access semiconductor memory) on which alterable information is stored. The computer program may be run on the processor 1502 described herein.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of embodiments of the present invention has been presented for purposes of illustration, but is not intended to be exhaustive or limited to the implementations in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the present invention. The embodiments were chosen and described in order to best explain the principles and some practical applications of the present invention, and to enable others of ordinary skill in the art to understand the present invention for various embodiments with various modifications as are suited to the particular use contemplated.
Number | Date | Country | Kind |
---|---|---|---|
2028723 | Jul 2021 | NL | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/NL2022/050415 | 7/14/2022 | WO |