This patent application is a U.S. National Stage application of International Patent Application Number PCT/IB2019/051040 filed Feb. 8, 2019, which is hereby incorporated by reference in its entirety, and claims priority to EP 18157327.0 filed Feb. 19, 2018.
This specification relates to receiving audio data from multiple directions using a user device.
When using a user device, such as a mobile communication device, to receive audio data regarding a scene, it is possible to move the user device such that different parts of the scene can be captured. An audio focus arrangement can be provided in which audio is boosted in the direction in which the user device is directed. This can lead to boosting of unwanted noise or to privacy concerns.
In a first aspect, this specification describes a method comprising: receiving audio data from multiple directions at a first user device; receiving instructions at the first user device from a remote device; and generating an audio focus arrangement, wherein the audio focus arrangement is a direction-dependent amplification of the received audio data and wherein the audio focus arrangement is dependent on an orientation direction of the first user device and is modified in accordance with the instructions from the remote device. Modifying the audio focus arrangement may include one of: attenuating audio from a first direction; neither attenuating nor amplifying audio from the first direction; and amplifying audio from the first direction.
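The three modification options can be thought of as gains applied to audio from the first direction. The sketch below is hypothetical. The `FocusMode` names and the dB values are assumptions, since the specification gives no numbers. It shows only the standard conversion from an amplitude gain in decibels to a linear factor (10^(dB/20)) and how it scales the samples.

```python
from enum import Enum


class FocusMode(Enum):
    """Hypothetical modification modes; values are assumed gains in dB."""
    ATTENUATE = -12.0  # attenuate audio from the first direction
    NEUTRAL = 0.0      # neither attenuate nor amplify
    AMPLIFY = 6.0      # amplify audio from the first direction


def db_to_linear(db: float) -> float:
    """Convert an amplitude gain in decibels to a linear factor."""
    return 10.0 ** (db / 20.0)


def apply_mode(samples: list[float], mode: FocusMode) -> list[float]:
    """Scale the samples received from one direction according to `mode`."""
    g = db_to_linear(mode.value)
    return [g * s for s in samples]


print(apply_mode([0.5, -0.25], FocusMode.NEUTRAL))  # [0.5, -0.25]
```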
An audio output may be generated based on the received audio data and the generated audio focus arrangement.
The audio data may be amplified when the audio data is received from a direction within the audio focus arrangement.
The generated audio focus arrangement may include amplifying the audio data when the audio data is in the orientation direction of the user device, unless the instructions from the remote device instruct otherwise.
Modifying the audio focus arrangement may include modifying the audio focus arrangement in a direction of said remote device relative to the first user device. Alternatively, or in addition, modifying the audio focus arrangement may include modifying the audio focus arrangement in a direction indicated by the remote device.
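One way to obtain "a direction of said remote device relative to the first user device" is to use the two devices' positions in a shared frame. The specification does not say how positions are obtained. The sketch below simply assumes that 2-D coordinates are available (the function names are hypothetical). It computes the remote device's bearing relative to the direction in which the first device is pointing.

```python
import math


def bearing_to(src: tuple[float, float], dst: tuple[float, float]) -> float:
    """Bearing from src to dst in degrees, in [0, 360), measured
    counter-clockwise from the +x axis of a shared 2-D frame (assumed)."""
    dx, dy = dst[0] - src[0], dst[1] - src[1]
    return math.degrees(math.atan2(dy, dx)) % 360.0


def relative_bearing(device_xy, device_heading_deg, remote_xy) -> float:
    """Direction of the remote device relative to the first device's
    orientation, in (-180, 180]; 0 is straight ahead and positive
    values are counter-clockwise."""
    rel = (bearing_to(device_xy, remote_xy) - device_heading_deg) % 360.0
    return rel - 360.0 if rel > 180.0 else rel


# First device at the origin pointing along +y (90 degrees);
# remote device one metre to its right, on the +x axis.
print(relative_bearing((0.0, 0.0), 90.0, (1.0, 0.0)))  # -90.0
```

Where the remote device instead indicates a direction explicitly, that direction could be used directly in place of the computed bearing.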
The said instructions may be generated automatically by the remote device.
In some example embodiments, instructions may be received at the first user device from one or more further remote devices and the audio focus arrangement may be modified in accordance with the instructions from the one or more further remote devices.
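When several remote devices send instructions, the first user device needs some rule for deciding which instructions are in force. The specification does not define one. The sketch below assumes, hypothetically, that each remote device has an identifier and that its most recent instruction replaces any earlier one from the same device.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    sender_id: str    # hypothetical identifier of the remote device
    bearing: float    # direction in degrees, in the first device's frame
    mode: str         # "attenuate", "neutral" or "amplify"
    timestamp: float  # seconds; used only to order instructions


def active_instructions(received: list[Instruction]) -> list[Instruction]:
    """Keep only the most recent instruction from each remote device."""
    latest: dict[str, Instruction] = {}
    for ins in received:
        current = latest.get(ins.sender_id)
        if current is None or ins.timestamp > current.timestamp:
            latest[ins.sender_id] = ins
    return sorted(latest.values(), key=lambda i: i.sender_id)


msgs = [Instruction("A", 30.0, "attenuate", 1.0),
        Instruction("B", 200.0, "amplify", 2.0),
        Instruction("A", 30.0, "neutral", 3.0)]
print([(i.sender_id, i.mode) for i in active_instructions(msgs)])
# [('A', 'neutral'), ('B', 'amplify')]
```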
In a second aspect, this specification describes an apparatus configured to perform any method as described with reference to the first aspect.
In a third aspect, this specification describes computer-readable instructions which, when executed by computing apparatus, cause the computing apparatus to perform any method as described with reference to the first aspect.
In a fourth aspect, this specification describes an apparatus comprising: means (such as one or more microphones) for receiving audio data from multiple directions; means (such as an input) for receiving instructions from a remote device; and means (such as a processor) for generating an audio focus arrangement, wherein the audio focus arrangement is a direction-dependent amplification of the received audio data and wherein the audio focus arrangement is dependent on an orientation direction of the apparatus and is modified in accordance with the instructions from the remote device.
The apparatus may further comprise means (such as an output) for providing an audio output based on the received audio data and the generated audio focus arrangement.
The means for generating the audio focus arrangement may be configured to modify the audio focus arrangement in a direction of said remote device relative to the first user device and/or in a direction indicated by the remote device.
The audio focus arrangement may be configured to perform one or more of: attenuating audio from a first direction; neither attenuating nor amplifying audio from the first direction; and amplifying audio from the first direction.
The apparatus may be a mobile communication device.
In a fifth aspect, this specification describes an apparatus comprising: means for receiving audio data from multiple directions at a first user device; means for receiving instructions at the first user device from a remote device; and means for generating an audio focus arrangement, wherein the audio focus arrangement is a direction-dependent amplification of the received audio data and wherein the audio focus arrangement is dependent on an orientation direction of the first user device and is modified in accordance with the instructions from the remote device.
In a sixth aspect, this specification describes a computer readable medium comprising program instructions for causing an apparatus to perform at least the following: receive audio data from multiple directions at a first user device; receive instructions at the first user device from a remote device; and generate an audio focus arrangement, wherein the audio focus arrangement is a direction-dependent amplification of the received audio data and wherein the audio focus arrangement is dependent on an orientation direction of the first user device and is modified in accordance with the instructions from the remote device.
In a seventh aspect, this specification describes an apparatus comprising: at least one processor; at least one memory including computer program code which, when executed by the at least one processor, causes the apparatus to: receive audio data from multiple directions at a first user device; receive instructions at the first user device from a remote device; and generate an audio focus arrangement, wherein the audio focus arrangement is a direction-dependent amplification of the received audio data and wherein the audio focus arrangement is dependent on an orientation direction of the first user device and is modified in accordance with the instructions from the remote device.
In an eighth aspect, this specification describes a non-transitory computer readable medium comprising program instructions for causing an apparatus to perform at least the following: receive audio data from multiple directions at a first user device; receive instructions at the first user device from a remote device; and generate an audio focus arrangement, wherein the audio focus arrangement is a direction-dependent amplification of the received audio data and wherein the audio focus arrangement is dependent on an orientation direction of the first user device and is modified in accordance with the instructions from the remote device.
Example embodiments will now be described, by way of non-limiting examples, with reference to the following schematic drawings, in which:
As described further below, the audio focus beam 8 is typically used to amplify audio recorded in the direction of orientation of the first user device 2. In the example system 1, the audio focus beam 8 is directed towards the third audio source 6. Thus, the first user device 2 can be moved to capture audio and video in different directions, with the audio being amplified in whichever direction the video images are being taken at the time. Moreover, in some example embodiments, video and audio data may be captured in different directions (providing, in effect, separate video and audio focus beams).
In the system 20a, the first user device 12 is directed towards the second audio object 15. As shown in
Consider the following arrangement in which the third source 16 is a source of potentially disturbing sounds. By way of example, consider a children's party in which the first, second, third and fourth objects represent children at the party. Assume that the third object 16 represents a child who is crying. Consider now a scenario in which the user device 12 is being used to take a video and audio recording of the birthday party by sweeping the video recording across the audio objects (for example, from being focused on the second object 15 as shown in
The system 30 also includes a second user device 39 (such as a mobile communication device) that may be similar to the first user device 32 described above. The second user device 39 is at or near the third audio object 36. The second user device 39 sends a message (labelled 39a in
The message 39a may take many forms. By way of example, the message 39a may make use of local communication protocols, such as Bluetooth® to transmit messages to other user devices (such as the first user device 32) in the vicinity of the second user device 39. The skilled person will be aware of many other suitable message formats.
It should be noted that the width of the audio focus beam 38 in the system 30 (and the width of comparable audio focus beams in other embodiments) may be a definable parameter and may, for example, be set by the second user device 39. Alternatively, that parameter could be pre-set or set in some other way.
In the event that the direction determined in operation 42 is an audio focus direction, then the algorithm 40 moves to operation 46, where the normal audio focus is used, such that audio in the relevant direction captured by the user device 32 is amplified. If the direction determined in operation 42 is not an audio focus direction, then the algorithm moves to operation 48, where the captured audio in the relevant direction is attenuated (or, in some embodiments, not amplified).
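The decision in algorithm 40 can be sketched as a single pass over a block of samples. This is a minimal sketch under stated assumptions. It assumes that operation 42 yields the device's orientation, that the samples have already been beamformed towards that direction, and that directions without focus are stored as hypothetical `(centre, width)` sectors. The `boost` and `cut` values are illustrative only.

```python
def in_sector(direction: float, centre: float, width: float) -> bool:
    """True if `direction` lies within +/- width/2 of `centre` (degrees)."""
    d = (direction - centre) % 360.0
    return min(d, 360.0 - d) <= width / 2.0


def process_block(samples, orientation, no_focus_sectors,
                  boost=2.0, cut=0.5, attenuate=True):
    """One pass of a sketch of algorithm 40 over a block of samples that
    has already been beamformed towards `orientation`.

    no_focus_sectors: list of (centre_deg, width_deg) pairs, for example
    built from 'unhear me' messages, in which focus is withheld.
    """
    direction = orientation                         # operation 42
    is_focus_direction = not any(                   # operation 44
        in_sector(direction, c, w) for c, w in no_focus_sectors)
    if is_focus_direction:
        gain = boost                                # operation 46
    else:
        gain = cut if attenuate else 1.0            # operation 48
    return [gain * s for s in samples]


print(process_block([0.5], 100.0, [(100.0, 40.0)]))  # [0.25]
```

Setting `attenuate=False` gives the variant in which operation 48 leaves the audio unamplified rather than attenuating it.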
The message 39a described above may be sent from the second user device 39 to the first user device 32 in a number of ways. For example, the user of the device 39 (such as a parent of the child that forms the audio object 36) may select an ‘unhear me’ option on the second user device 39, which causes the message 39a to be output using the Bluetooth® standard, or some other messaging scheme. The skilled person will be aware of many other suitable mechanisms for sending such a message.
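The specification leaves the message format open. The following is a hypothetical payload that carries a message type, a sender identifier and, optionally, a direction and a beam width (the width being one of the parameters the second user device may set). The field and type names are assumptions, and the Bluetooth® transport itself is not shown. The sketch only serialises and parses the payload with Python's standard `json` module.

```python
import json
from typing import Optional

MESSAGE_TYPES = {"unhear_me", "hear_me"}  # hypothetical type names


def encode_message(msg_type: str, sender_id: str,
                   bearing: Optional[float] = None,
                   width: Optional[float] = None) -> bytes:
    """Serialise a focus-control message as compact UTF-8 JSON.

    bearing and width are optional: without a bearing the receiver would use
    the sender's own direction; without a width it would use a default.
    """
    if msg_type not in MESSAGE_TYPES:
        raise ValueError(f"unknown message type: {msg_type}")
    body = {"type": msg_type, "sender": sender_id}
    if bearing is not None:
        body["bearing"] = float(bearing)
    if width is not None:
        body["width"] = float(width)
    return json.dumps(body, separators=(",", ":")).encode("utf-8")


def decode_message(payload: bytes) -> dict:
    """Parse and minimally validate a received message."""
    body = json.loads(payload.decode("utf-8"))
    if (not isinstance(body, dict) or body.get("type") not in MESSAGE_TYPES
            or "sender" not in body):
        raise ValueError("unrecognised message")
    return body


print(encode_message("unhear_me", "device-39"))
# b'{"type":"unhear_me","sender":"device-39"}'
```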
Many mechanisms exist for implementing the audio focus arrangement described above. Different arrangements are described below, by way of example, with reference to
The systems 50a, 50b and 50c include the first to fourth audio objects 34 to 37 described above and also include a user device 52 (similar to the user devices 2, 12 and 32 described above). In
Assume that the third object 36 is deemed to be a noisy object. Thus, when the user device 52 is directed towards the third object 36, the operation 44 in the algorithm 40 is answered in the negative (such that the algorithm 40 moves to operation 48). When the user device 52 is directed in any other direction, then the operation 44 is answered in the positive (such that the algorithm 40 moves to operation 46).
When the user device 52 is directed towards the second audio object 35 (as shown in
When the user device 52 is directed towards the third audio object 36 (as shown in
When the user device 52 is directed towards the fourth audio object 37 (as shown in
It can be seen in
There are many alternatives to the arrangement described above with reference to
The systems 60a, 60b, 60c and 60d include the first to fourth audio objects 34 to 37 described above and also include a user device 62 (similar to the user devices 2, 12, 32 and 52 described above). In
Assume, once again, that the third object 36 is deemed to be a noisy object. Thus, when the user device 62 is directed towards the third object 36, the operation 44 in the algorithm 40 is answered in the negative (such that the algorithm 40 moves to operation 48). When the user device 62 is directed in any other direction, then the operation 44 is answered in the positive (such that the algorithm 40 moves to operation 46).
When the user device 62 is directed towards the second audio object 35 (as shown in
When the user device 62 is directed between the second object 35 and the third object 36 (as shown in
When the user device 62 is directed between the third object 36 and the fourth object 37 (as shown in
When the user device 62 is directed towards the fourth audio object 37 (as shown in
As described above with reference to
The system 70 includes the first to fourth audio objects 34 to 37 described above and also includes a user device 72 (similar to the user devices 2, 12, 32, 52 and 62 described above). In
Assume that the third object 36 is deemed to be a noisy object. Thus, when the user device 72 is directed towards the third object 36, the operation 44 in the algorithm 40 is answered in the negative (such that the algorithm 40 moves to operation 48). When the user device 72 is directed in any other direction, then the operation 44 is answered in the positive (such that the algorithm 40 moves to operation 46).
In the system 70, there is no audio focus beam directed towards the third object 36, but audio focus regions 75 and 76 are shown either side of the third object 36. (This can be considered to be an audio focus beam 74 with the portion directed towards the third object 36 omitted.) Thus, audio from all directions other than the direction of the object 36 can be boosted. It should be noted that the width of the portion missing from the audio focus beam 74 could be a definable parameter and may, for example, be set by a remote device (such as the remote device 39 described above). Alternatively, that parameter could be pre-set.
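The beam 74 with its central portion omitted can be expressed as a gain pattern over arrival direction. In this minimal sketch, the beam width, notch width and boost factor are assumed values. The notch gives unity gain (not amplified), although an attenuating gain could be used there instead. Passing `beam_width=360.0` reproduces the case in which every direction except that of the object 36 is boosted.

```python
def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two bearings (degrees)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def notched_beam_gain(direction, orientation, noisy_bearing,
                      beam_width=90.0, notch_width=20.0, boost=2.0):
    """Gain for audio from `direction` (all angles in degrees)."""
    if angular_distance(direction, noisy_bearing) <= notch_width / 2.0:
        return 1.0    # omitted portion of beam 74: not amplified
    if angular_distance(direction, orientation) <= beam_width / 2.0:
        return boost  # focus regions 75 and 76 either side of the notch
    return 1.0        # outside the beam


# Device pointing straight at the noisy object (both at 0 degrees).
print([notched_beam_gain(d, 0.0, 0.0) for d in (-30, -5, 0, 5, 30, 60)])
# [2.0, 1.0, 1.0, 1.0, 2.0, 1.0]
```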
As described above with reference to
The attenuate flag 84 may be associated with the direction of the user device 39 such that operation 44 of the algorithm 40 can be implemented by determining whether an attenuate flag has been set for the direction identified in operation 42. Of course, this functionality could be implemented in many different ways. In particular, not all embodiments involve attenuation: in many examples described herein, directions outside the audio focus are simply left unamplified, being neither amplified nor attenuated.
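A set of attenuate flags can be stored as a small table indexed by quantised direction, so that operation 44 becomes a constant-time lookup. The class name, the 10-degree bin size and the rule for marking bins from a bearing and width are all assumptions made for illustration.

```python
class DirectionFlags:
    """Hypothetical store of attenuate flags, quantised to fixed bins."""

    def __init__(self, bin_deg: int = 10):
        if 360 % bin_deg:
            raise ValueError("bin_deg must divide 360")
        self.bin_deg = bin_deg
        self.attenuate = [False] * (360 // bin_deg)

    def _bin(self, bearing: float) -> int:
        return int((bearing % 360.0) // self.bin_deg)

    def set_attenuate(self, bearing: float, width: float) -> None:
        """Flag every bin whose centre lies within +/- width/2 of bearing."""
        for i in range(len(self.attenuate)):
            centre = (i + 0.5) * self.bin_deg
            d = (centre - bearing) % 360.0
            if min(d, 360.0 - d) <= width / 2.0:
                self.attenuate[i] = True

    def is_focus_direction(self, bearing: float) -> bool:
        """Operation 44: focus is allowed unless an attenuate flag is set."""
        return not self.attenuate[self._bin(bearing)]


flags = DirectionFlags(bin_deg=10)
flags.set_attenuate(bearing=90.0, width=20.0)  # e.g. from an 'unhear me' message
print(flags.is_focus_direction(88.0), flags.is_focus_direction(120.0))  # False True
```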
The second user device may take many forms. For example, the second user device could be a mobile communication device, such as a mobile phone. However, this is not essential to all embodiments. For example, the second user device may be a wearable device, such as a watch or a fitness monitor.
The principles described herein are not restricted to dealing with issues of noise. For example, the ‘unhear me’ arrangement may be used for privacy purposes: a person may be having a conversation that is not related to a scene being captured by the first user device 2, 12, 32, 52, 62 or 72. The ‘unhear me’ setting described herein can be used to attenuate (or at least not amplify) such a conversation. By way of example, a user may receive a telephone call on a user device (such as the second user device 39). To keep that telephone call private, the user may use the ‘unhear me’ feature described herein to prevent sounds from that call from being amplified (and, where attenuation is used, to suppress them) in the audio captured by the first user device.
In some example embodiments, a mobile device receiving or initiating a telephone call sends an ‘unhear me’ control message to all nearby mobile devices. In such an embodiment, the ‘unhear me’ control message may be output automatically by the mobile device when the telephone call is received or initiated.
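The automatic behaviour can be sketched as a handler for call-state changes. In this sketch, the state names, the message dictionary and the `broadcast` callable (a stand-in for Bluetooth® or another local transport) are hypothetical. The handler sends one ‘unhear me’ message per call and re-arms when the phone returns to idle.

```python
from typing import Callable


class CallPrivacyAgent:
    """Sketch: broadcast an 'unhear me' message when a call starts.

    `broadcast` is a hypothetical stand-in for the local transport
    (for example Bluetooth); call states are assumed to be plain strings.
    """
    TRIGGER_STATES = {"incoming", "outgoing"}

    def __init__(self, device_id: str, broadcast: Callable[[dict], None]):
        self.device_id = device_id
        self.broadcast = broadcast
        self.in_call = False

    def on_call_state(self, state: str) -> None:
        if state in self.TRIGGER_STATES and not self.in_call:
            self.in_call = True  # send only once per call
            self.broadcast({"type": "unhear_me", "sender": self.device_id})
        elif state == "idle":
            self.in_call = False  # ready for the next call


sent = []
agent = CallPrivacyAgent("phone-39", sent.append)
for s in ("incoming", "incoming", "idle", "outgoing"):
    agent.on_call_state(s)
print(len(sent))  # 2
```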
The embodiments described above relate to controlling the use of an audio focus arrangement of a user device when capturing audio data. It is also possible to use the principles described herein to modify an audio focus arrangement in different ways.
As described above, the first audio focus beam 110 is typically used to amplify audio in a direction of orientation of the first user device 102. Thus, for example, the first user device 102 can be moved to capture audio and video in different directions, with the audio being amplified in the direction in which the video images are being taken at the time.
The system 100 also includes a second user device 109 (similar to the user device 39 described above). The second user device 109 is at or near the third audio object 106. The second user device 109 sends a message (labelled 109a in
The message 109a described above may be sent from the second user device 109 to the first user device 102 in a number of ways. For example, the user of the device 109 (such as a parent of the child that forms the audio object 106) may select a ‘hear me’ option on the second user device 109, which causes the message 109a to be output using the Bluetooth® standard, or some other messaging scheme. The skilled person will be aware of many other suitable mechanisms for sending such a message.
The boost flag 134 may be associated with the direction of the second user device 109 such that audio data received at the first user device 102 in the direction indicated in the boost flag is boosted. The boost flag may therefore be used in the operation 124 of the algorithm 120 described above. Of course, this functionality could be implemented in many different ways.
In the algorithms 80, 90 and 130 described above, the direction of the second user device relative to the first user device is deemed to be the relevant direction for the instruction. This is not essential to all embodiments. For example, the message sent by the second user device 39 or 109 may include direction, location or some other data, such that the second user device 39 or 109 can be used to modify the audio amplification functionality of the first user device in some other direction. For example, in the example system 30 described above with reference to
The algorithm 40 described above may be extended such that multiple areas are defined for which the audio should be attenuated (or at least not amplified). Similarly, the algorithm 120 may be extended such that multiple areas are defined for which audio should be amplified. Furthermore, the algorithms 40 and 120 described above may be combined such that one or more areas may be defined for which audio should be attenuated (or at least not amplified) and one or more areas may be defined for which audio should be boosted.
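Combining the two algorithms amounts to building one gain pattern from a default focus around the orientation direction, a list of attenuation areas and a list of boost areas. This sketch makes two assumptions: attenuation takes precedence where areas overlap, and the gain values are illustrative. The specification fixes neither.

```python
def inside(bearing, centre, width):
    """True if `bearing` is within +/- width/2 of `centre` (degrees)."""
    d = (bearing - centre) % 360.0
    return min(d, 360.0 - d) <= width / 2.0


def gain_pattern(orientation, attenuate_areas, boost_areas,
                 focus_width=60.0, boost=2.0, cut=0.5, step=1):
    """Return {bearing_deg: gain} for bearings 0, step, ..., below 360.

    attenuate_areas / boost_areas: lists of (centre_deg, width_deg).
    Attenuation is assumed to take precedence where areas overlap.
    """
    pattern = {}
    for b in range(0, 360, step):
        if any(inside(b, c, w) for c, w in attenuate_areas):
            pattern[b] = cut
        elif inside(b, orientation, focus_width) or any(
                inside(b, c, w) for c, w in boost_areas):
            pattern[b] = boost
        else:
            pattern[b] = 1.0
    return pattern


# Pointing at 0 degrees, a crying child at 0 and a singing parent at 180.
p = gain_pattern(0.0, attenuate_areas=[(0.0, 20.0)], boost_areas=[(180.0, 30.0)])
print(p[0], p[20], p[90], p[180])  # 0.5 2.0 1.0 2.0
```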
Many implementations of the principles described herein are possible. By way of example, a first user may use a first user device (such as any one of the user devices 2, 12, 32, 52, 62, 72 or 102) to obtain audio data (and optionally also video images). At the same time, a second user may use a second user device (such as the user device 39 or 109) to define audio boosting and/or audio attenuation areas within a defined space (such audio boosting and/or audio attenuation being the boosting or attenuation of the audio content captured by the first user device).
In this way, the first user can concentrate on capturing the audio data (and, optionally, video data), whilst the second user can concentrate on the appropriate audio requirements (such as attenuating audio in the direction of a crying child or boosting audio in the direction of someone giving a speech). Returning to the example of a children's party, the second user may define zones in which audio focus should not be applied (e.g. due to one or more noisy or crying children) and/or may define one or more zones, other than the orientation direction of the first user device, in which audio focus should be applied (e.g. the direction from which a parent is singing to the children at the party).
In some implementations, a user may make use of a remote device (such as the second user device 39 or 109) to indicate a noise source. This is not essential. For example, an audio analysis engine may be used to detect noise sources automatically. Such an audio analysis engine may, for example, analyse the content of its closest sound sources and compare the obtained pattern with a database of noise sources and at least one threshold level. This may allow messages such as the ‘unhear me’ message 82 discussed above to be created and sent automatically.
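A minimal sketch of such a comparison follows. It assumes that some front end has already reduced each sound source to a feature vector (for example, band energies). It uses cosine similarity as the pattern match and an RMS level gate as the threshold. The database contents, the feature choice and both thresholds are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rms_db(samples):
    """RMS level in dB relative to full scale, for samples in [-1, 1]."""
    ms = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(ms) if ms > 0 else float("-inf")


def detect_noise(features, samples, noise_db, match_threshold=0.9, level_db=-30.0):
    """Return the name of the best-matching noise class, or None."""
    if rms_db(samples) < level_db:
        return None  # too quiet to be treated as a disturbing source
    best_name, best_score = None, match_threshold
    for name, ref in noise_db.items():
        score = cosine_similarity(features, ref)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


db = {"crying": [1.0, 0.0, 0.0], "clapping": [0.0, 1.0, 0.0]}  # hypothetical
print(detect_noise([0.9, 0.1, 0.0], [0.5, -0.5], db))  # crying
```

A match could then trigger an automatically generated ‘unhear me’ message for the direction of that source.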
At least some of the embodiments described herein may make use of spatial audio techniques in which an array of microphones is used to capture a sound scene, which is then subjected to parametric spatial audio processing so that, during rendering, sounds are heard as if coming from directions around the user that match the video recording. Such techniques are known, for example, from virtual reality and augmented reality applications. Such spatial audio processing may involve estimating the directional portion and the ambient portion of the sound scene.
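When the sound scene is described parametrically, the audio focus gain can be applied to the directional portion only. This minimal sketch follows the general style of parametric methods that estimate a direction and a direct-to-total energy ratio for each time-frequency tile. It treats the two portions as uncorrelated, so a focus gain g gives a tile gain of sqrt(r·g² + (1 − r)). The tile layout and the example `beam` function are assumptions.

```python
import math


def tile_gain(direction_deg, direct_ratio, focus_gain):
    """Energy-based gain for one time-frequency tile.

    direct_ratio: estimated direct-to-total energy ratio in [0, 1].
    focus_gain: function mapping a direction in degrees to a linear gain.
    Only the directional portion is scaled; the ambient portion keeps
    unit gain, assuming the two portions are uncorrelated.
    """
    g = focus_gain(direction_deg)
    return math.sqrt(direct_ratio * g * g + (1.0 - direct_ratio))


def apply_focus(tiles, focus_gain):
    """tiles: iterable of (value, direction_deg, direct_ratio) tuples."""
    return [x * tile_gain(theta, r, focus_gain) for x, theta, r in tiles]


def beam(d):
    """Example focus: boost x2 within +/-30 degrees of 0, unity elsewhere."""
    return 2.0 if min(d % 360.0, 360.0 - d % 360.0) <= 30.0 else 1.0


print(apply_focus([(1.0, 10.0, 1.0), (1.0, 10.0, 0.0), (1.0, 90.0, 0.5)], beam))
# [2.0, 1.0, 1.0]
```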
For completeness,
The processor 302 is connected to each of the other components in order to control operation thereof.
The memory 304 may comprise a non-volatile memory, such as a hard disk drive (HDD) or a solid state drive (SSD). The ROM 312 of the memory 304 stores, amongst other things, an operating system 315 and may store software applications 316. The RAM 314 of the memory 304 is used by the processor 302 for the temporary storage of data. The operating system 315 may contain code which, when executed by the processor, implements aspects of the algorithms 40, 80, 90, 120, 130 and 140 described above.
The processor 302 may take any suitable form. For instance, it may be a microcontroller, plural microcontrollers, a processor, or plural processors.
The processing system 300 may be a standalone computer, a server, a console, or a network thereof.
In some embodiments, the processing system 300 may also be associated with external software applications. These may be applications stored on a remote server device and may run partly or exclusively on the remote server device. These applications may be termed cloud-hosted applications. The processing system 300 may be in communication with the remote server device in order to utilize the software application stored there.
Embodiments of the present invention may be implemented in software, hardware, application logic or a combination of software, hardware and application logic. The software, application logic and/or hardware may reside on memory, or any computer media. In an example embodiment, the application logic, software or an instruction set is maintained on any one of various conventional computer-readable media. In the context of this document, a “memory” or “computer-readable medium” may be any non-transitory media or means that can contain, store, communicate, propagate or transport the instructions for use by or in connection with an instruction execution system, apparatus, or device, such as a computer.
References to, where relevant, “computer-readable storage medium”, “computer program product”, “tangibly embodied computer program” etc., or a “processor” or “processing circuitry” etc. should be understood to encompass not only computers having differing architectures such as single/multi-processor architectures and sequential/parallel architectures, but also specialised circuits such as field programmable gate arrays (FPGA), application-specific integrated circuits (ASIC), signal processing devices and other devices. References to computer program, instructions, code etc. should be understood to encompass software for a programmable processor, or firmware such as the programmable content of a hardware device, whether as instructions for a processor or as configuration settings for a fixed function device, gate array, programmable logic device, etc.
As used in this application, the term “circuitry” refers to all of the following: (a) hardware-only circuit implementations (such as implementations in only analogue and/or digital circuitry); (b) combinations of circuits and software (and/or firmware), such as (as applicable): (i) a combination of processor(s) or (ii) portions of processor(s)/software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a server, to perform various functions; and (c) circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present.
If desired, the different functions discussed herein may be performed in a different order and/or concurrently with each other. Furthermore, if desired, one or more of the above-described functions may be optional or may be combined. Similarly, it will also be appreciated that the flow diagrams of
It will be appreciated that the above described example embodiments are purely illustrative and are not limiting on the scope of the invention. Other variations and modifications will be apparent to persons skilled in the art upon reading the present specification.
Moreover, the disclosure of the present application should be understood to include any novel features or any novel combination of features either explicitly or implicitly disclosed herein or any generalization thereof and during the prosecution of the present application or of any application derived therefrom, new claims may be formulated to cover any such features and/or combination of such features.
Although various aspects of the invention are set out in the independent claims, other aspects of the invention comprise other combinations of features from the described embodiments and/or the dependent claims with the features of the independent claims, and not solely the combinations explicitly set out in the claims.
It is also noted herein that while the above describes various examples, these descriptions should not be viewed in a limiting sense. Rather, there are several variations and modifications which may be made without departing from the scope of the present invention as defined in the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
18157327 | Feb 2018 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/IB2019/051040 | 2/8/2019 | WO | 00 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2019/159050 | 8/22/2019 | WO | A |
Number | Date | Country |
---|---|---|
20200382864 A1 | Dec 2020 | US |