This application is based on and claims priority under 35 U.S.C. § 119 to Korean Patent Application No. 10-2018-0166605, filed on Dec. 20, 2018, in the Korean Intellectual Property Office, the disclosure of which is incorporated herein in its entirety by reference.
The disclosure relates to spatial audio recording devices, spatial audio recording methods, and electronic apparatuses including spatial audio recording devices.
Sensors that are mounted on household appliances, image display apparatuses, virtual reality (VR) apparatuses, augmented reality (AR) apparatuses, artificial intelligence speakers, etc. and that are capable of detecting the direction from which audio arrives and recognizing voice are increasingly used.
Sensors for detecting audio direction generally calculate the direction from which audio arrives by using the difference in the times at which the audio reaches a plurality of non-directional microphones. Such a structure requires a sufficient distance between the plurality of microphones for high-quality and high-resolution audio sensing, which leads to a large system size and high power consumption.
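The time-difference approach described above can be illustrated with a minimal sketch. It assumes a far-field source, a single pair of microphones, and a speed of sound of 343 m/s; the function names and these values are illustrative assumptions and are not taken from the disclosure.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at roughly 20 degrees C


def tdoa_angle_deg(delay_s: float, spacing_m: float,
                   c: float = SPEED_OF_SOUND_M_S) -> float:
    """Far-field arrival angle, in degrees from broadside, for one microphone pair."""
    ratio = c * delay_s / spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp rounding error outside asin's domain
    return math.degrees(math.asin(ratio))


def one_sample_angle_deg(sample_rate_hz: float, spacing_m: float) -> float:
    """Angle that corresponds to a delay of a single sample period."""
    return tdoa_angle_deg(1.0 / sample_rate_hz, spacing_m)
```

Under these assumptions, a one-sample delay spans a larger angle when the microphones are closer together, which is consistent with the need for a sufficient distance between microphones noted above.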
The disclosure relates to spatial audio recording devices and methods capable of efficiently sensing spatial audio.
Additional aspects will be set forth in part in the description which follows and, in part, will be apparent from the description, or may be learned by practice of the presented embodiments.
In accordance with an aspect of the disclosure, a spatial audio recording device includes a plurality of directional vibrating bodies arranged such that at least one directional vibrating body from among the plurality of directional vibrating bodies selectively reacts according to a direction of input audio; a non-directional vibrating body configured to react regardless of the direction of the input audio; a read-out circuit configured to output a directional audio signal including a plurality of channels based on reactions of the plurality of directional vibrating bodies and a non-directional audio signal based on a reaction of the non-directional vibrating body; and a processor configured to correct the directional audio signal based on the non-directional audio signal.
A resolution of the plurality of directional vibrating bodies may be lower than a resolution of the non-directional vibrating body.
The processor may be further configured to select a first channel from among the plurality of channels; form an intermediate correction signal by removing a directional audio signal of at least one second channel from the non-directional audio signal; compute a ratio of signal powers of frequency bands of a directional audio signal of the first channel; and form a final correction signal by adding or deducting signal power for each frequency band of the intermediate correction signal to correspond to the computed ratio.
The at least one second channel may include a plurality of second channels, and the processor may be further configured to form the intermediate correction signal by removing every directional audio signal of the plurality of second channels from the non-directional audio signal.
The directional audio signal of the at least one second channel may include a major component including frequency bands having different signal powers and a minor component including frequency bands having a same signal power, and the processor may be further configured to form the intermediate correction signal by removing the major component from the non-directional audio signal.
The directional audio signal of the first channel may include a major component including frequency bands having different signal powers and a minor component including frequency bands having a same signal power, and the processor may be further configured to form the final correction signal by adding or deducting respective signal powers of frequency bands of the major component to correspond to the computed ratio.
The processor may be further configured to decrease signal power of the minor frequency band by half to form the final correction signal.
For each channel from among the plurality of channels, the processor may be further configured to form an intermediate correction signal by removing a directional audio signal of at least one other channel from the non-directional audio signal; compute a ratio of signal powers of frequency bands of a directional audio signal of the respective channel; and form a final correction signal by adding or deducting signal power for each frequency band of the intermediate correction signal according to the ratio.
The plurality of directional vibrating bodies may be arranged on a same plane to surround a central point on the plane, and a center of the non-directional vibrating body may be located directly above the central point in a direction perpendicular to the plane.
The plurality of directional vibrating bodies may be arranged on a plurality of planes, each plane from among the plurality of planes being located at a same distance from the non-directional vibrating body.
The plurality of planes may include a first plane and a second plane parallel to each other.
The plurality of planes may further include a third plane and a fourth plane perpendicular to the first plane and the second plane, the third plane and the fourth plane being parallel to each other.
The plurality of planes may further include a fifth plane and a sixth plane perpendicular to the first plane, the second plane, the third plane, and the fourth plane, the fifth plane and the sixth plane being parallel to each other.
An electronic apparatus may include the spatial audio recording device in accordance with the above-noted aspect of the disclosure.
The electronic apparatus may further include a multichannel speaker configured to reproduce a corrected audio signal based on the corrected directional audio signal.
The electronic apparatus may further include an omnidirectional imaging module configured to capture an image in a plurality of directions corresponding to the plurality of channels.
In accordance with an aspect of the disclosure, a spatial audio recording method includes receiving a directional audio signal including a plurality of channels from a plurality of directional vibrating bodies arranged such that at least one directional vibrating body from among the plurality of directional vibrating bodies selectively reacts according to a direction of input audio; receiving a non-directional audio signal from a non-directional vibrating body configured to react regardless of the direction of the input audio; and correcting the directional audio signal based on the non-directional audio signal.
The correcting the directional audio signal may include selecting a first channel from among the plurality of channels; forming an intermediate correction signal by removing a directional audio signal of at least one second channel from the non-directional audio signal; computing a ratio of signal powers of frequency bands of a directional audio signal of the first channel; and forming a final correction signal by adding or deducting signal power for each frequency band of the intermediate correction signal to correspond to the ratio.
The at least one second channel may include a plurality of second channels, and the forming the intermediate correction signal may include removing every directional audio signal of the plurality of second channels from the non-directional audio signal.
The directional audio signal of the at least one second channel may include a major component including frequency bands having different signal powers and a minor component including frequency bands having a same signal power, and the forming of the intermediate correction signal may include removing the major component from the non-directional audio signal.
The directional audio signal of the first channel may include a major component including frequency bands having different signal powers and a minor component including frequency bands having a same signal power, and the forming of the final correction signal may include adding or deducting respective signal powers of frequency bands of the major component to correspond to the computed ratio.
These and/or other aspects will become apparent and more readily appreciated from the following description of embodiments, taken in conjunction with the accompanying drawings.
Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like elements throughout. In this regard, the present embodiments may have different forms and should not be construed as being limited to the descriptions set forth herein. Accordingly, embodiments are described below by referring to the figures merely to explain aspects. Sizes of components in the drawings may be exaggerated for convenience and clarity of description. Expressions such as “at least one of” and “at least one from among”, when preceding a list of elements, modify the entire list of elements and do not modify the individual elements of the list.
It will be understood that, when a component is referred to as being “on” another component, it may be directly or indirectly on the other component.
While such terms as “first” and “second” may be used to describe various components, such components are not limited to the above terms. The above terms are used only to distinguish one component from another. These terms are not intended to imply any difference between materials or structures of components.
As used herein, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that, when a portion “includes” or “comprises” an element, another element may be further included, rather than excluding the existence of the other element, unless otherwise described.
Also, the terms, such as “unit” or “module”, used herein refer to a unit that processes at least one function or operation, and the unit may be implemented by hardware or software, or by a combination of hardware and software.
The operations of all methods described herein may be performed in any suitable order unless otherwise indicated herein or clearly indicated otherwise by context. The use of any and all examples, or exemplary language (e.g., “such as”) provided herein, is intended merely to better illuminate the disclosure and does not pose a limitation on the scope of the disclosure unless otherwise claimed.
The spatial audio recording device 100 includes a plurality of directional vibrating bodies 110_k, arranged so that at least one of them selectively reacts according to a direction of input audio, and a non-directional vibrating body 115, which reacts regardless of the direction of the input audio. When the number of directional vibrating bodies 110_k is N, k is an integer from 1 to N. The spatial audio recording device 100 also includes a read-out circuit 170, which outputs directional audio signals of a plurality of channels and a non-directional audio signal generated in response to the input audio by the plurality of directional vibrating bodies 110_k and the non-directional vibrating body 115, respectively, and a processor 180, which corrects the directional audio signals of the plurality of channels by referring to the non-directional audio signal. The spatial audio recording device 100 may also include a memory 190 that stores code to be executed by the processor 180, execution results of the processor 180, etc.
As shown in
A supporting portion 120 which supports the plurality of directional vibrating bodies 110_k and provides space where the plurality of directional vibrating bodies 110_k react to audio and vibrate may be inside the case 130. As shown in
The plurality of directional vibrating bodies 110_k are arranged so that one or more may selectively react according to a direction of audio input to the audio inlet 134. The plurality of directional vibrating bodies 110_k may surround the audio inlet 134. The plurality of directional vibrating bodies 110_k may be coplanar without overlapping each other and may be arranged so that all the plurality of directional vibrating bodies 110_k may be exposed with respect to the audio inlet 134. In other words, each of the plurality of directional vibrating bodies 110_k may be affected by sound passing through the audio inlet 134 in at least one direction. As shown in
The audio outlet 135 may face all of the plurality of directional vibrating bodies 110_k. The size of the audio outlet 135 shown is an example, and the actual size may differ. Sizes or shapes of the audio inlet 134 and the audio outlet 135 are not particularly limited, and the audio inlet 134 and the audio outlet 135 may have any size and shape sufficient to expose the plurality of directional vibrating bodies 110_k to the same extent.
For example, the non-directional vibrating body 115 may be located within the boundary of the audio outlet 135 and may be on the same plane as the plurality of directional vibrating bodies 110_k. However, the disclosure is not limited thereto, and the non-directional vibrating body 115 may be on a different plane. As shown, the plurality of directional vibrating bodies 110_k may surround the non-directional vibrating body 115. However, a location of the non-directional vibrating body 115 is not limited thereto, and the non-directional vibrating body 115 may be at other various locations. For example, the non-directional vibrating body 115 may be outside the case 130.
Unlike the plurality of directional vibrating bodies 110_k, the non-directional vibrating body 115 may have the same output or almost the same output with respect to audio input from every direction. To this end, the non-directional vibrating body 115 may have the form of a circular thin film. When the non-directional vibrating body 115 is within the boundary of the audio outlet 135, a center of the non-directional vibrating body 115 having a circular shape may coincide with a central point of the audio outlet 135.
The physical angle resolution of the spatial audio recording device 100, that is, the accuracy with which it detects the traveling direction of incident audio, may be determined by the number N of the plurality of directional vibrating bodies 110_k. The spatial audio recording device 100 may detect the direction of incident sound by comparing the magnitudes of the output signals of the plurality of directional vibrating bodies 110_k, and as the number of directional vibrating bodies 110_k to be compared with each other increases, the traveling direction of incident audio may be determined more precisely.
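The comparison of output magnitudes described above can be sketched as follows. The sketch assumes that the N directional vibrating bodies are evenly spaced in angle and that the body with the largest root-mean-square output faces the source; both assumptions, and all names in the code, are illustrative and are not stated in the disclosure.

```python
import math
from typing import Sequence


def channel_rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of one directional channel."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def angular_step_deg(n_bodies: int) -> float:
    """Angle between neighbouring bodies when N bodies are evenly spaced (assumed)."""
    return 360.0 / n_bodies


def estimate_direction_deg(channels: Sequence[Sequence[float]]) -> float:
    """Return the nominal angle of the body with the largest output level."""
    levels = [channel_rms(ch) for ch in channels]
    loudest = max(range(len(levels)), key=levels.__getitem__)
    return loudest * angular_step_deg(len(channels))
```

In this simplified model the estimate can only take N distinct values, so the angular step shrinks as N grows.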
The sensitivity resolution with which each of the plurality of directional vibrating bodies 110_k senses audio may be determined by the circuit element that converts the vibration of the directional vibrating body, which moves (i.e., vibrates) in reaction to an external force, into an electrical signal. Increasing the resolution requires a more complex and finer circuit element, and as the number N of the plurality of directional vibrating bodies 110_k increases, the complexity of the system increases. Such circuit elements may be included in the read-out circuit 170. Although the read-out circuit 170 is shown as a block in the drawing, the individual circuit elements constituting the read-out circuit 170 may be electrically connected to each of the plurality of directional vibrating bodies 110_k and to the non-directional vibrating body 115, respectively, to read the signals received by them, and may be arranged inside the case 130. As the system becomes more complex because of the demand for fine circuit elements, the volume of the spatial audio recording device 100 increases, and power consumption also increases.
The spatial audio recording device 100 according to an embodiment may set the resolution of the plurality of directional vibrating bodies 110_k to be lower than that of the non-directional vibrating body 115. As described above, increasing the resolution of the plurality of directional vibrating bodies 110_k increases the volume, complexity, and power consumption of the overall system. To increase the resolution at which the spatial audio recording device 100 senses audio more efficiently, the non-directional vibrating body 115 may have high resolution and the plurality of directional vibrating bodies 110_k may have relatively low resolution. For example, the resolution of the plurality of directional vibrating bodies 110_k may be equal to or lower than 1/10 of the resolution of the non-directional vibrating body 115. The processor 180 of the spatial audio recording device 100 may use the output signal of the non-directional vibrating body 115 to correct the low-resolution output signals of the plurality of directional vibrating bodies 110_k so that they approach the original audio.
A spatial audio recording method according to an embodiment will now be described in detail with reference to
Referring to
Next, the directional audio signal of the plurality of channels is corrected with reference to the non-directional audio signal (S30). The directional audio signal may have low resolution compared to the non-directional audio signal. Because as many directional vibrating bodies as possible are provided to obtain directional nature, giving all of the directional vibrating bodies high resolution may significantly increase system complexity and power consumption. Accordingly, a spatial audio recording method according to an embodiment involves correcting a directional audio signal of relatively low resolution by referring to a non-directional audio signal obtained at relatively high resolution.
Referring to
Next, an intermediate correction signal is formed by removing an audio signal of another channel (i.e., a second channel) other than the target channel from the non-directional audio signal (S33). When there is more than one other channel, the audio signals of all the other channels may be used to form the intermediate correction signal.
Next, a final correction signal is formed by increasing or decreasing the signal power of each frequency band of the intermediate correction signal according to the power ratio among the frequency bands of the target channel audio signal (S35).
The above process of forming an intermediate correction signal and a final correction signal, which reconstructs an audio signal of a target channel from a high-resolution non-directional audio signal, will now be described using example graphs of directional audio signals and a non-directional audio signal.
First original audio OR1 shown in
Second original audio OR2 shown in
A mixed signal SG0 is a signal sensed by a non-directional vibrating body, in which audio from every direction is mixed together without its directional nature being distinguished, and has high resolution compared to the first signal SG1 and the second signal SG2, which have directional nature.
The first signal SG1 and the second signal SG2 of
Referring to
In the description, a channel of the second signal SG2 is a target channel and an example of a signal of another channel other than the target channel is the first signal SG1. However, signals for more channels than the single first signal SG1 may be considered. In such a case, the intermediate correction signal SG_TM may be extracted by deducting all major components of signals of a plurality of other channels.
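The extraction of the intermediate correction signal SG_TM can be sketched as follows. Signals are modelled as lists of per-band signal powers, the shared level of the minor component is taken to be the most common band value in a channel, and results are floored at zero; these modelling choices, the function names, and the numbers in the usage note are illustrative assumptions, not values from the disclosure.

```python
from collections import Counter
from typing import List, Sequence


def major_component(band_power: Sequence[float]) -> List[float]:
    """Keep bands whose power differs from the shared (most common) level; zero the rest."""
    shared_level, _ = Counter(band_power).most_common(1)[0]
    return [p if p != shared_level else 0.0 for p in band_power]


def intermediate_correction(sg0: Sequence[float],
                            other_channels: Sequence[Sequence[float]]) -> List[float]:
    """Deduct the major component of every non-target channel from the mixed signal."""
    sg_tm = list(sg0)
    for channel in other_channels:
        for band, power in enumerate(major_component(channel)):
            sg_tm[band] = max(0.0, sg_tm[band] - power)  # floor at zero (assumption)
    return sg_tm
```

For example, with a hypothetical mixed signal `[5.0, 7.0, 6.0, 4.0, 9.0, 3.0, 1.0]` and a single other channel `[1.0, 1.0, 4.0, 2.0, 1.0, 1.0, 1.0]`, only the third and fourth bands of the mixed signal are reduced.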
Referring to
The second signal SG2 of
In other words, signal values of frequency bands f1, f2, and f5 in the intermediate correction signal SG_TM are adjusted to match a relative ratio of signal values in frequency bands f1, f2, and f5 of the second signal SG2. To this end, initially, a signal value P0 of the frequency band f1 may be amplified to match a signal value P1 of the frequency band f1 of the second signal SG2.
Next, based on the above correction, signal values of frequency bands f2 and f5 may be corrected to match a ratio of a signal value in the frequency band f1, a signal value in the frequency band f2, and a signal value in the frequency band f5 with a ratio in the frequency bands of the second signal SG2 of
Signal values of the minor frequency bands f3, f4, f6, and f7 may be reduced by half. This reduction is performed because a signal value of a minor frequency band includes a minor component and a noise component of an audio signal of a channel other than the target channel. However, a decrease by half is an example, and a decrease by another proportion is also possible.
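A literal reading of the final correction steps above can be sketched as follows. The first major band is used as the reference band, the band indices are supplied by the caller, and signals are lists of per-band values; these choices and all names are illustrative assumptions rather than details confirmed by the disclosure.

```python
from typing import List, Sequence


def final_correction(sg_tm: Sequence[float], target: Sequence[float],
                     major_bands: Sequence[int], minor_bands: Sequence[int],
                     minor_scale: float = 0.5) -> List[float]:
    """Adjust major bands to the target channel's band ratio and attenuate minor bands."""
    out = list(sg_tm)
    ref = major_bands[0]
    if target[ref] == 0:
        raise ValueError("reference band of the target channel must be non-zero")
    out[ref] = target[ref]  # bring the reference band to the target's value
    for band in major_bands[1:]:
        # keep the same band-to-band ratio as the target channel
        out[band] = out[ref] * target[band] / target[ref]
    for band in minor_bands:
        out[band] = sg_tm[band] * minor_scale  # one half is only an example proportion
    return out
```

The `minor_scale` parameter is exposed because the proportion by which the minor frequency bands are decreased is described as an example.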
According to the above method, it is possible to correct the second signal SG2 of relatively low resolution, which has directional nature, so as to be close to original audio by using the mixed signal SG0 of high resolution, which has no directional nature, and the first signal SG1 having different directional nature.
In the above description, setting the second signal SG2 as a target channel is given as an example, and the first signal SG1 may be set as a target channel and be corrected so as to approach the original audio through similar processes.
In the above description, directional audio signals of two channels, that is, the first signal SG1 and the second signal SG2, are given as an example. However, the disclosure is not limited thereto. For example, when audio signals of three or more channels are obtained, the process of forming a final correction signal may be repeated by successively selecting each of the plurality of channels as the target channel, one by one. Accordingly, every directional nature included in the original audio may be estimated, and the related audio signal may be corrected so as to approach the original audio.
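The repetition over target channels can be sketched as a simple loop. The per-channel correction is passed in as a callable, `correct_one`, which is a hypothetical placeholder for the intermediate and final correction steps; its signature is an assumption made for this illustration.

```python
from typing import Callable, List, Sequence

Bands = Sequence[float]


def correct_all_channels(
    sg0: Bands,
    channels: Sequence[Bands],
    correct_one: Callable[[Bands, Bands, Sequence[Bands]], List[float]],
) -> List[List[float]]:
    """Select every channel in turn as the target and correct it against the others."""
    corrected = []
    for k, target in enumerate(channels):
        others = [ch for j, ch in enumerate(channels) if j != k]
        corrected.append(correct_one(sg0, target, others))
    return corrected
```

Each call receives the mixed signal, the current target channel, and the list of all remaining channels, so a channel is never deducted from itself.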
The spatial audio recording device 200 may have substantially the same configurations as the spatial audio recording device 100 of
The plurality of directional vibrating bodies 110_k may be arranged on a plurality of planes located at the same distance from the non-directional vibrating body 115. As shown, the plurality of directional vibrating bodies 110_k may be arranged on two planes spaced parallel to each other with the non-directional vibrating body 115 located therebetween. That is, some of the plurality of directional vibrating bodies 110_k may be arranged on a plane parallel to the XY plane of
The spatial audio recording device 300 is different from the spatial audio recording device 200 of
The plurality of directional vibrating bodies 110_k may be divided into the first group GR1, the second group GR2, the third group GR3, and the fourth group GR4. The first group GR1 and the second group GR2 may be respectively located on two planes parallel to the XY plane, and the third group GR3 and the fourth group GR4 may be respectively located on two planes parallel to the YZ plane.
In some embodiments, the plurality of directional vibrating bodies 110_k may be arranged on two planes parallel to the XY plane, two planes parallel to the YZ plane, and two planes parallel to the XZ plane with the non-directional vibrating body 115 at the center.
A spatial audio recording device according to the previous embodiments may be used in various electronic apparatuses. The spatial audio recording device may be realized as a sensor in the form of a chip to perform sound source tracking, noise removal, spatial recording, etc. in the fields of mobile devices, information technology (IT), household appliances, automobiles, etc. and may also be used in the fields of panoramic imaging, augmented reality (AR), virtual reality (VR), etc.
Electronic apparatuses using a spatial audio recording device according to an embodiment will now be described.
The electronic apparatus 500 is a spatial audio recording/reproduction apparatus.
The electronic apparatus 500 includes a spatial audio recording device 510 and a multichannel speaker 550 for reproducing recorded audio in accordance with directional nature. The electronic apparatus 500 may also include a memory 530 for storing a signal sensed and corrected in the spatial audio recording device 510, and a processor 520 for controlling the multichannel speaker 550 to reproduce an audio signal stored in the memory 530 in accordance with directional nature.
Any one of the spatial audio recording devices 100, 200, and 300 according to previous embodiments or a modified and combined structure thereof may be used as the spatial audio recording device 510. As described above, the spatial audio recording device 510 may estimate directional nature of surrounding audio and may correct a sensed audio signal so as to be close to original audio.
The memory 530 may store a program for signal processing of the processor 520 and may store an execution result of the processor 520. In addition, the memory 530 may store various programs and pieces of data required for the processor 520 to control an overall operation of the electronic apparatus 500.
The memory 530 may include at least one type of storage medium from among flash memory type memory, hard disk type memory, multimedia card micro type memory, card type memory (e.g., secure digital (SD) or extreme digital (XD) memory, etc.), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, and optical disk.
The electronic apparatus 500 may perform recording focused on an intended sound source or may selectively record only an intended sound source by using a result of estimating an input direction of audio.
The electronic apparatus 500 may sense, correct, and record directional audio and reproduce a recorded sound source in accordance with its directional nature, and thus may enhance the realism of content and improve the sense of immersion.
The electronic apparatus 500 may be used in an AR or VR apparatus.
The electronic apparatus 600 is an omnidirectional camera capable of capturing panoramic images of an object placed in any direction. The electronic apparatus 600 includes a spatial audio recording device 610, an omnidirectional imaging module 640, a processor 620 for controlling the spatial audio recording device 610 and the omnidirectional imaging module 640 to match a directional audio signal sensed in the spatial audio recording device 610 with an omnidirectional image signal captured in the omnidirectional imaging module 640, and a memory 630 for storing the directional audio signal and the omnidirectional image signal.
A general panoramic imaging module may be used as the omnidirectional imaging module 640. For example, a module in which optical lenses and an image sensor are provided in a 360-degree rotatable main body may be used.
The spatial audio recording device 610 may be any one of the spatial audio recording devices 100, 200, and 300 according to previous embodiments or may have a modified and combined structure thereof. As described above, the spatial audio recording device 610 may estimate directional nature of surrounding audio and may correct a sensed audio signal so as to approach the original audio.
Under the control of the processor 620, from among the signals sensed in the spatial audio recording device 610, audio of a direction corresponding to a capturing direction of the omnidirectional imaging module 640 may be selectively stored in the memory 630. In this manner, the electronic apparatus 600 may store a 360° panoramic image signal and an audio signal matching the panoramic image in the memory 630. Such image/audio information may be reproduced by a display apparatus including a multichannel speaker to maximize realism and may also be used in an AR/VR apparatus.
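The selection of audio that matches a capturing direction can be sketched as follows. The sketch assumes that the channels are evenly spaced in azimuth with channel 0 at 0 degrees and that the capturing direction is given as an azimuth in degrees; the mapping and the names are illustrative assumptions, not details from the disclosure.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def channel_for_direction(azimuth_deg: float, n_channels: int) -> int:
    """Index of the channel whose nominal direction is nearest to the azimuth."""
    step = 360.0 / n_channels
    return int((azimuth_deg % 360.0) / step + 0.5) % n_channels


def select_audio(audio_by_channel: Sequence[T], capture_azimuth_deg: float) -> T:
    """Pick the stored audio of the channel that matches the capturing direction."""
    index = channel_for_direction(capture_azimuth_deg, len(audio_by_channel))
    return audio_by_channel[index]
```

Negative azimuths and azimuths beyond 360 degrees are wrapped into a single turn before the nearest channel is chosen.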
Electronic apparatuses described herein may include a processor, a memory for storing and executing program data, a permanent storage unit such as a disk drive, a communication port for communicating with an external apparatus, and a user interface apparatus such as a touch panel, a key, a button, etc.
Methods implemented by software modules or algorithms in electronic apparatuses described herein may be stored on a computer-readable recording medium as program instructions or computer-readable code executable on the processor. Examples of the computer-readable recording medium include semiconductor memory (e.g., ROM, RAM, etc.), magnetic storage media (e.g., floppy disk, hard disk, etc.), and optical recording media (e.g., CD-ROM, DVD, etc.). The computer-readable recording medium may also be distributed over network-coupled computer systems so that the computer-readable code is stored and executed in a distributed manner. This medium may be read by the computer, stored in the memory, and executed by the processor.
According to one or more embodiments, a spatial audio recording device and method make it possible to sense and record spatial audio with low power consumption by using a non-directional vibrating body and a plurality of directional vibrating bodies.
According to one or more embodiments, a spatial audio recording device may be used in various electronic apparatuses that may utilize the sensed spatial audio.
It should be understood that embodiments described herein should be considered in a descriptive sense only and not for purposes of limitation. Descriptions of features or aspects within each embodiment should typically be considered as available for other similar features or aspects in other embodiments.
While one or more embodiments have been described with reference to the figures, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope as defined by the following claims.
Number | Date | Country | Kind |
---|---|---|---|
10-2018-0166605 | Dec 2018 | KR | national |
Number | Date | Country | |
---|---|---|---|
20200204910 A1 | Jun 2020 | US |