Examples of the disclosure relate to an apparatus, method and computer program for enabling audio zooming. Some relate to an apparatus, method and computer program for enabling audio zooming while maintaining signal levels.
Audio zoom is an audio operation where sound sources in one or more directions can be amplified compared to sound sources in other directions. This can be achieved using two or more microphones and beamforming.
According to various, but not necessarily all, examples of the disclosure there is provided an apparatus comprising means for:
The first direction may be within a region of interest and the second direction may be outside of the region of interest.
The amount of headroom provided may be controlled so as to enable audio zooming.
If the sound energy in the at least one first direction is higher than the sound energy in the at least one second direction by at least the threshold amount then the amount of headroom may be controlled to be large enough to enable amplification of the audio signal when audio zooming is selected.
If the sound energy in the at least one first direction is not higher than the sound energy in the at least one second direction by at least the threshold amount then the amount of headroom may be controlled to not be large enough to enable amplification of the audio signal when audio zooming is selected.
If the headroom provided is not large enough to enable amplification of the audio signal when audio zooming is selected the apparatus may be configured to enable audio zooming by attenuation of unwanted sound sources.
The means may be for detecting a change in whether or not the sound energy in the at least one first direction is higher than the sound energy in the at least one second direction by at least the threshold amount and adjusting the headroom provided based on the detected change.
The amount of headroom provided may be controlled by using automatic gain control.
The amount of headroom provided may be controlled by the compression used.
The sound energy may be measured as a sum of a beamformed signal.
The means may be for determining, for an audio signal, if sound energy in at least one first direction is higher than sound energy in at least one second direction by at least a threshold amount.
According to various, but not necessarily all, examples of the disclosure there is provided an apparatus comprising at least one processor; and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to perform:
According to various, but not necessarily all, examples of the disclosure there is provided an electronic device comprising an apparatus as described herein.
According to various, but not necessarily all, examples of the disclosure there is provided a method comprising:
According to various, but not necessarily all, examples of the disclosure there is provided a computer program comprising computer program instructions that, when executed by processing circuitry, cause:
Some examples will now be described with reference to the accompanying drawings in which:
Examples of the disclosure relate to apparatus, methods and computer programs for enabling audio zooming. The audio zooming can enable sounds within a region of interest to be amplified compared to sounds outside of the region of interest. Audio zoom could be used together with a camera zoom. In such examples the region of interest could be the field of view of the camera or a section of the field of view of the camera. In order to enable effective audio zooming the amount of headroom provided in the audio signals can be controlled based on the types of processing that are to be used to implement the audio zooming. The types of processing that are used to implement the audio zooming can be determined by whether or not there are any loud sound sources outside of the region of interest.
The electronic device 101 can be a user electronic device 101. In some examples the electronic device 101 could be a hand-held electronic device 101. In some examples the electronic device 101 could be a communications device. The electronic device 101 could be a mobile telephone, a tablet computer or any other suitable type of electronic device 101.
The processor 103 and the memory 105 can provide an apparatus such as a controller apparatus. An example processor 103 and memory 105 are shown in more detail in
The electronic device 101 comprises two or more microphones 107. The microphones 107 can comprise any means that can be configured to capture sound and enable a microphone audio signal to be provided. The microphones 107 can comprise omnidirectional microphones. The microphone audio signals comprise an electrical signal that represents at least some of the sound field captured by the microphones 107.
In the example shown in
The microphones 107 are coupled to the processor 103 and the memory 105 so that the microphone audio signals are provided to the processor 103 for processing. In the example of
The processing performed by the processor 103 can comprise enabling audio zooming, locating sound sources and/or any other suitable processing. The processing could comprise methods as shown in any of
The camera 115 can comprise any means that can enable images to be captured. The images could comprise video images, still images or any other suitable type of images. The images that are captured by the camera 115 can accompany the microphone audio signals from the two or more microphones 107. The camera 115 can be controlled by the processor 103 to enable images to be captured.
In some examples of the disclosure the electronic device 101 can be used to capture audio signals to accompany images captured by the camera 115. In such examples if a user zooms in on the camera 115 or on images captured by the camera this could also cause audio zooming. The audio zooming could amplify the sound sources that are within the images captured by the camera 115. The sound sources that are within the images captured by the camera 115 can be determined based on the field of view of the camera 115, the amount of zoom used by the camera 115 and the locations of the one or more sound sources. The effective amplification of the sound sources within the images captured by the camera can be achieved by amplifying the wanted sound sources and/or by attenuating unwanted sound sources.
In the example shown in
The apparatus 201 illustrated in
In some examples the apparatus 201 can be implemented using instructions that enable hardware functionality, for example, by using executable instructions of a computer program 203 in a general-purpose or special-purpose processor 103 that can be stored on a computer readable storage medium (disk, memory etc.) to be executed by such a processor 103.
The processor 103 is configured to read from and write to the memory 105. The processor 103 can also comprise an output interface via which data and/or commands are output by the processor 103 and an input interface via which data and/or commands are input to the processor 103.
The memory 105 is configured to store a computer program 203 comprising computer program instructions (computer program code 205) that controls the operation of the apparatus 201 when loaded into the processor 103. The computer program instructions, of the computer program 203, provide the logic and routines that enable the apparatus 201 to perform the methods illustrated in
The apparatus 201 therefore comprises: at least one processor 103; and at least one memory 105 including computer program code 205, the at least one memory 105 and the computer program code 205 configured to, with the at least one processor 103, cause the apparatus 201 at least to perform:
As illustrated in
The computer program 203 comprises computer program instructions for causing an apparatus 201 to perform at least the following:
The computer program instructions can be comprised in a computer program 203, a non-transitory computer readable medium, a computer program product, or a machine readable medium. In some but not necessarily all examples, the computer program instructions can be distributed over more than one computer program 203.
Although the memory 105 is illustrated as a single component/circuitry it can be implemented as one or more separate components/circuitry some or all of which can be integrated/removable and/or can provide permanent/semi-permanent/dynamic/cached storage.
Although the processor 103 is illustrated as a single component/circuitry it can be implemented as one or more separate components/circuitry some or all of which can be integrated/removable. The processor 103 can be a single core or multi-core processor.
References to “computer-readable storage medium”, “computer program product”, “tangibly embodied computer program” etc. or a “controller”, “computer”, “processor” etc. should be understood to encompass not only computers having different architectures such as single/multi-processor architectures and sequential (Von Neumann)/parallel architectures but also specialized circuits such as field-programmable gate arrays (FPGA), application specific circuits (ASIC), signal processing devices and other processing circuitry. References to computer program, instructions, code etc. should be understood to encompass software for a programmable processor or firmware such as, for example, the programmable content of a hardware device whether instructions for a processor, or configuration settings for a fixed-function device, gate array or programmable logic device etc.
As used in this application, the term “circuitry” can refer to one or more or all of the following:
(a) hardware-only circuitry implementations (such as implementations in only analog and/or digital circuitry) and
(b) combinations of hardware circuits and software, such as (as applicable):
(i) a combination of analog and/or digital hardware circuit(s) with software/firmware and
(ii) any portions of hardware processor(s) with software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a mobile phone or server, to perform various functions and
(c) hardware circuit(s) and/or processor(s), such as a microprocessor(s) or a portion of a microprocessor(s), that requires software (e.g. firmware) for operation, but the software might not be present when it is not needed for operation.
This definition of circuitry applies to all uses of this term in this application, including in any claims. As a further example, as used in this application, the term circuitry also covers an implementation of merely a hardware circuit or processor and its (or their) accompanying software and/or firmware. The term circuitry also covers, for example and if applicable to the particular claim element, a baseband integrated circuit for a mobile device or a similar integrated circuit in a server, a cellular network device, or other computing or network device.
The blocks illustrated in
At block 301 the method comprises determining, for an audio signal, if sound energy in at least one first direction is different to the sound energy in at least one second direction by at least a threshold amount. For example, the method can comprise determining if sound energy in at least one first direction is higher than the sound energy in at least one second direction by at least a threshold amount.
The first direction and the second direction can be selected so that the first direction is within a region of interest and the second direction is outside of the region of interest. For example, the first direction could be within the field of view of a camera and the second direction could be outside of the field of view of the camera. The sound sources in the first direction could therefore be wanted sound sources that a user might want to listen to. For example, sound sources in the first direction could correspond to images captured by the camera 115. The sound sources in the second direction could be unwanted sound sources that a user might not want to listen to. For example, these could comprise sound sources that are not in the field of view of the camera 115.
The first direction and the second direction can change depending upon the orientation of the camera 115, the level of zoom used by the camera 115 and/or any other suitable factor.
At block 303 the method comprises controlling an amount of headroom provided based on whether or not sound energy in at least one first direction is different to the sound energy in at least one second direction by at least a threshold amount. For example, the amount of headroom provided can be controlled based on whether or not sound energy in at least one first direction is higher than the sound energy in at least one second direction by at least a threshold amount. Any suitable means can be used to measure the sound energy in the respective directions. In some examples the sound energy can be measured as a sum of a beamformed signal.
The amount of headroom provided can be controlled so as to enable audio zooming while maximizing, or substantially maximizing, signal levels.
If sound energy in the first direction is significantly higher than the sound energy in the second direction this indicates that the loudest sounds are wanted sounds. For example, the loudest sounds could come from sound sources that are within the field of view of the camera 115.
If sound energy in the first direction is not significantly higher than the sound energy in the second direction this indicates that at least some of the loudest sounds could be unwanted sounds. For example, there could be some loud sound sources that are not located within the field of view of the camera 115.
If the loudest sounds are wanted sounds then the audio zooming can be implemented by using amplification or other suitable processes. In order to allow for the amplification sufficient headroom has to be provided within the signal. Therefore, if the loudest sounds are wanted sounds then the amount of headroom can be controlled so that a large amount of headroom is provided. The large amount of headroom is large enough so as to enable amplification of the audio signal if audio zooming is selected. In some examples the headroom could be around 12 dB. This amount of headroom can enable a clear change in the audio when the user selects audio zooming. This enables a user to clearly perceive that audio zooming has been used.
If the loudest sounds are unwanted sounds then the audio zooming can be implemented by using attenuation of the unwanted sound sources or other suitable processes. The attenuation will not use headroom and so, if the loudest sounds are unwanted sounds then the headroom can be controlled so that a small amount of headroom is provided. The small amount of headroom might not be large enough to enable amplification of the audio signals when audio zooming is selected, however this could maximise, or substantially maximise, signal levels. For example, the small amount of headroom could be much less than 12 dB. Using the small amount of headroom can maximise the loudness of the audio signal.
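The choice between a large and a small amount of headroom described above can be illustrated with a minimal sketch. This is not a definitive implementation. The 12 dB value comes from the description, whereas the 3 dB small headroom, the 6 dB threshold and the function names `energy_db` and `select_headroom_db` are assumptions introduced only for illustration.

```python
import math

LARGE_HEADROOM_DB = 12.0  # "around 12 dB" in the description
SMALL_HEADROOM_DB = 3.0   # assumption: the text only says "much less than 12 dB"
THRESHOLD_DB = 6.0        # assumption: the text does not give a threshold value


def energy_db(samples):
    """Energy of a block of samples in dB (relative scale; the floor avoids log(0))."""
    energy = sum(s * s for s in samples)
    return 10.0 * math.log10(max(energy, 1e-12))


def select_headroom_db(first_dir, second_dir, threshold_db=THRESHOLD_DB):
    """Return large headroom if the first direction dominates by the threshold, else small."""
    difference_db = energy_db(first_dir) - energy_db(second_dir)
    if difference_db >= threshold_db:
        return LARGE_HEADROOM_DB
    return SMALL_HEADROOM_DB


wanted = [0.5] * 100     # loud signal from inside the region of interest
unwanted = [0.05] * 100  # quiet signal from outside the region of interest
print(select_headroom_db(wanted, unwanted))  # 12.0
print(select_headroom_db(unwanted, wanted))  # 3.0
```

In this sketch the two inputs stand for signals already separated by direction, for example by beamforming. How that separation is obtained is outside the scope of the sketch.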
In some examples the apparatus 201 can be configured to detect a change in whether or not the sound energy in the at least one first direction is higher than the sound energy in the at least one second direction by at least the threshold amount. For example, the apparatus 201 could detect if one or more of the sound sources has moved, or if the loudness of any of the sound sources has changed or any other suitable factor.
If a change in whether or not the sound energy in the at least one first direction is higher than the sound energy in the at least one second direction by at least the threshold amount is detected then the apparatus 201 can be configured to adjust the headroom provided based on the detected change. For example, if it is detected that the sound sources have changed so that the loudest sound source is now an unwanted sound source then the headroom can be decreased. Conversely if it is detected that the sound sources have changed so that the loudest sound source is now a wanted sound source then the headroom can be increased.
Any suitable means can be used to control the amount of headroom provided. In some examples the amount of headroom provided can be controlled by using automatic gain control. In some examples the amount of headroom provided can be controlled by using different types of compression.
The method comprises, at block 401, analysing a sound signal to determine if sound energies in a first direction are larger than sound energies in a second direction. The first direction can comprise a region of interest and the second direction can comprise one or more directions outside of the region of interest. At block 401 it can be determined if the sound energies in the first direction are larger than the sound energies in the second direction by at least a threshold amount. The threshold amount can be determined by the processing that is to be used for the audio zooming or any other suitable factor.
If the sound energies in the first direction are larger than the sound energies in the second direction by at least the threshold amount then this indicates that the sound sources in the region of interest are the dominant sound sources. If this is the case then, at block 403 the method comprises controlling the amount of headroom provided in the audio file so as to leave a lot of headroom.
Leaving a lot of headroom can comprise leaving sufficient headroom to enable implementing audio zooming by using amplification. In some examples the headroom could be around 12 dB.
Any suitable means can be used to control the amount of headroom that is provided. The amount of headroom provided can be controlled by controlling an algorithm such as automatic gain control and/or by using appropriate compression and/or by using any other suitable means.
At block 405 it is determined whether or not audio zoom is selected. A user of the electronic device 101 could select audio zoom by making an input using a user interface of the electronic device 101 or by any other suitable means. For instance, a user could be zooming images captured by the camera 115 which could also cause audio zooming.
If audio zoom is selected then, at block 407, the audio zoom is implemented using a process that comprises amplification. The process can comprise amplification of the wanted sound sources. This amplification can make use of the headroom that is provided within the audio file.
If the sound energies in the first direction are not larger than the sound energies in the second direction by at least the threshold amount then this indicates that the sound sources in the region of interest are not the dominant sound sources. For instance, there could be some loud sound sources that are not in the region of interest or there could be a lot of background noise. If this is the case then, at block 409 the method comprises controlling the amount of headroom provided to leave little headroom in the audio file.
Leaving little headroom can comprise leaving insufficient headroom to enable implementing audio zooming by using amplification. Leaving little headroom can comprise leaving much less headroom compared to the cases when a lot of headroom is left. For example, the headroom provided could be much less than 12 dB.
Any suitable means can be used to control the amount of headroom that is provided. The amount of headroom provided can be controlled by controlling an algorithm such as automatic gain control and/or by using appropriate compression and/or by using any other suitable means.
At block 411 it is determined whether or not audio zoom is selected. As described above a user of the electronic device 101 could select audio zoom by making an input using a user interface of the electronic device 101 or by any other suitable means. For instance, a user could be zooming images captured by the camera 115 which could also cause audio zooming.
If audio zoom is selected then, at block 413, the audio zoom is implemented using attenuation. The attenuation does not need to make use of any headroom. The attenuation could comprise attenuating the unwanted sound sources. The attenuation could comprise attenuating the sound sources that are in the second direction.
Once the audio zoom has been implemented, or if it is determined that audio zoom has not been selected, the method returns to block 401 and the audio signals are analysed to determine, for a different time period, whether or not the sound energies are louder in the first direction than the second direction. This can enable changes in the sound sources to be detected.
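The sequence of blocks 401 to 413 can be sketched as a per-time-period decision, as below. This is only an illustration under stated assumptions: the directional energies are taken as already measured in dB, and the names `ZoomMode` and `step`, the 6 dB threshold and the 3 dB small headroom are hypothetical values not given in the description.

```python
from enum import Enum


class ZoomMode(Enum):
    NONE = "none"            # audio zoom not selected
    AMPLIFY = "amplify"      # block 407: zoom by amplifying wanted sources
    ATTENUATE = "attenuate"  # block 413: zoom by attenuating unwanted sources


def step(first_db, second_db, zoom_selected,
         threshold_db=6.0, large_db=12.0, small_db=3.0):
    """One pass through blocks 401 to 413 for a single time period."""
    if first_db - second_db >= threshold_db:  # block 401: region of interest dominates
        headroom_db = large_db                # block 403: leave a lot of headroom
        mode = ZoomMode.AMPLIFY if zoom_selected else ZoomMode.NONE  # blocks 405, 407
    else:
        headroom_db = small_db                # block 409: leave little headroom
        mode = ZoomMode.ATTENUATE if zoom_selected else ZoomMode.NONE  # blocks 411, 413
    return headroom_db, mode


# Each tuple is (first-direction energy in dB, second-direction energy in dB, zoom selected).
periods = [(-10.0, -30.0, True), (-20.0, -18.0, True), (-10.0, -30.0, False)]
decisions = [step(*p) for p in periods]  # the method returns to block 401 for each period
```

Calling `step` once per time period mirrors the return to block 401, so a change in the dominant sound sources changes the decision for the next period.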
In examples of the disclosure, such as the method shown in
In the example of
The plurality of microphones 107 provide audio signals to an automatic gain control (AGC) module 501 and also to a sound source location module 503.
The sound source location module 503 can be configured to determine the location of one or more sound sources. The sound source location module 503 can determine whether sound sources are within a region of interest or outside of a region of interest. For example, the sound source location module 503 can determine whether or not a sound source is within a field of view of a camera 115 or outside of a field of view of a camera 115.
The sound source location module 503 can also be configured to determine the relative sound energies of the different sound sources and determine whether or not sound sources within the region of interest are significantly louder than sound sources outside of the region of interest. This provides an indication as to whether or not the dominant sound sources are wanted sound sources or unwanted sound sources.
The sound source location module 503 can also be configured to determine the amount of headroom that is to be provided. For instance, if it is determined that wanted sound sources are the dominant sound sources then a large amount of headroom can be provided. If it is determined that unwanted sound sources are the dominant sound sources then a small amount of headroom can be provided. The sound source location module 503 provides a control signal to the AGC module 501 indicating the amount of headroom that is to be provided within the audio file.
The AGC module 501 is configured to receive the audio signals from the microphones 107 and the input signal from the sound source location module 503 indicating the amount of headroom that is to be provided.
The AGC module 501 can be configured to control the level of the audio signals from the microphones 107. The AGC module 501 can control the level of the audio signals so that they are set at a level which is comfortable for a user to listen to. The AGC module 501 can use the input signal from the sound source location module 503 to control the amount of headroom that is provided.
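One way in which a gain control stage could use a headroom target is sketched below. This is a simplified, block-based illustration and not the behaviour of the AGC module 501 itself: it assumes samples normalised so that full scale is 1.0, uses the block peak as the level measure, and omits the time smoothing that a practical gain control would apply. The names `agc_gain` and `apply_agc` and the 30 dB gain limit are assumptions.

```python
def agc_gain(samples, headroom_db, max_gain_db=30.0):
    """Linear gain that places the block peak headroom_db below full scale (1.0)."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return 1.0  # silent block: leave the gain unchanged
    target_peak = 10.0 ** (-headroom_db / 20.0)
    max_gain = 10.0 ** (max_gain_db / 20.0)  # assumed limit to avoid boosting noise
    return min(target_peak / peak, max_gain)


def apply_agc(samples, headroom_db):
    """Scale a block of samples so that the requested headroom is left."""
    gain = agc_gain(samples, headroom_db)
    return [gain * s for s in samples]


block = [0.1, -0.2, 0.05]
with_large_headroom = apply_agc(block, 12.0)  # peak at about 0.251 (-12 dBFS)
with_small_headroom = apply_agc(block, 3.0)   # peak at about 0.708 (-3 dBFS)
```

With the smaller headroom the same block is output at a higher level, which corresponds to maximising the signal level when amplification is not needed for the audio zooming.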
The signals from the AGC module 501 are provided to a spatial audio processing module 505. The spatial audio processing module 505 can process the audio signals to provide spatial audio output. The spatial audio output can comprise an output so that a user can perceive spatial effects of the audio when the spatial audio output is rendered and played back to a user.
Any suitable process can be used to generate the spatial audio output. The process for generating the spatial audio output can also comprise an audio zoom module 507 that can be configured to enable audio zooming. The audio zoom module 507 can indicate whether the audio zooming can be implemented by amplification of the wanted sound sources or by attenuation of the unwanted sound sources or by any other suitable process.
Once the spatial audio has been generated an output audio signal 509 is provided. The output audio signal 509 comprises the spatial audio signals. The headroom provided in the audio file comprising the output audio signal 509 is provided based on whether or not the dominant sound sources are wanted sound sources or unwanted sound sources and the processes used to implement the audio zooming.
In the example of
In
The second sound source 605 is positioned outside of the region of interest 601. The second sound source 605 can therefore be an unwanted sound source. In this example the second sound source 605 is positioned toward the rear of the electronic device 101. The second sound source 605 is provided on the opposite side of the electronic device 101 to the first sound source 603 and the region of interest 601.
In the example of
In some examples the beamformer patterns 607, 609, 611, 613 can be used to determine the sound energy within a given direction and so provide an estimate of the locations of the sound sources 603, 605. The sound energy in a given direction can be measured by summing the energy of a beamformed signal where the look direction of the beamformer corresponds to the direction. Other methods for estimating the sound energy in a given direction can be used in other examples of the disclosure. For example, direction of arrival analysis of the sound signals or any other suitable processes can be used.
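Measuring the sound energy in a direction by summing the energy of a beamformed signal can be sketched as follows. The description does not specify the type of beamformer, so this sketch assumes a two-microphone delay-and-sum beamformer with integer sample delays. The names `shifted` and `beam_energy` and the test signal are hypothetical.

```python
import math


def shifted(signal, delay):
    """Delay a signal by a non-negative whole number of samples, zero-padding the start."""
    delay = min(delay, len(signal))
    return [0.0] * delay + list(signal[:len(signal) - delay])


def beam_energy(mic_a, mic_b, lag):
    """Energy of a delay-and-sum beam for two microphones.

    lag is the number of samples by which sound from the look direction
    reaches mic_b later than mic_a (negative if it reaches mic_b first).
    """
    if lag >= 0:
        a, b = shifted(mic_a, lag), list(mic_b)
    else:
        a, b = list(mic_a), shifted(mic_b, -lag)
    beam = [0.5 * (x + y) for x, y in zip(a, b)]
    return sum(v * v for v in beam)


# Assumed test signal: a tone with a period of 8 samples that reaches mic_b 2 samples late.
source = [math.sin(2.0 * math.pi * n / 8.0) for n in range(64)]
mic_a = source
mic_b = shifted(source, 2)

front_energy = beam_energy(mic_a, mic_b, 2)   # look direction matches the source
rear_energy = beam_energy(mic_a, mic_b, -2)   # look direction opposite to the source
```

In this sketch the beam steered toward the source adds the two microphone signals in phase, while the opposite beam largely cancels them, so `front_energy` is much larger than `rear_energy`. The cancellation is exact here only because of the chosen tone period and delay; real signals would give a smaller contrast.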
The different beamformer patterns 607, 609, 611, 613 can be used to amplify or attenuate the sound sources 603, 605 as appropriate. For example, different gains can be applied to the different beamformer patterns 607, 609, 611, 613 based on the look directions of the beamformer patterns 607, 609, 611, 613 and the positions of the wanted and unwanted sound sources 603, 605.
In the example of
In this example only a small amount of headroom is provided because the audio zooming can be implemented using attenuation of the unwanted sound source 605. This maximizes, or substantially maximizes, the loudness of the audio signal 701.
In this example a large amount of headroom is provided because the audio zooming can be implemented using amplification of the wanted sound source 603. Therefore, the audio file needs to comprise sufficient headroom to enable the amplification.
In the example of
In this example a large amount of headroom is provided because the audio zooming can be implemented using amplification of the wanted sound source 603. Therefore, the audio file needs to comprise sufficient headroom to enable the amplification.
Variations to the above described examples can be used in implementations of the disclosure. For instance, in some examples processes other than AGC can be used to control the loudness of the audio signals and the amount of headroom provided. For instance, in some examples compression of the audio signal can be used to control the loudness of the audio signals and the amount of headroom provided.
The compression can comprise using different compression curves. The compression can be used with a gain factor so that the more compression is used the more the audio signal can be amplified without clipping. In some examples the compression could comprise multiband compression which could comprise using different compression in different frequency bands.
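The relationship between the amount of compression and the gain that can be applied without clipping can be illustrated with a static compression curve in the dB domain. This is a minimal sketch under stated assumptions: the hard-knee curve, the -20 dB threshold, the ratios and the names `compress_db` and `makeup_gain_db` are not taken from the description.

```python
def compress_db(level_db, threshold_db=-20.0, ratio=4.0):
    """Static hard-knee compression curve: levels above the threshold are reduced by the ratio."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio


def makeup_gain_db(threshold_db, ratio, headroom_db):
    """Largest gain that keeps a full-scale (0 dBFS) input at or below -headroom_db."""
    return -headroom_db - compress_db(0.0, threshold_db, ratio)


print(compress_db(0.0))                   # -15.0
print(makeup_gain_db(-20.0, 4.0, 12.0))   # 3.0
print(makeup_gain_db(-20.0, 2.0, 12.0))   # -2.0
print(makeup_gain_db(-20.0, 4.0, 3.0))    # 12.0
```

In this sketch the 4:1 curve permits 5 dB more gain than the 2:1 curve for the same headroom, which illustrates how more compression allows more amplification without clipping. A multiband variant could apply a different curve in each frequency band.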
The compression curve that is used can be dependent upon whether or not audio zooming is selected.
The audio zooming might be more effective in some frequency bands than others. In such examples multiband compression could be used and the compression curve might only be dependent upon whether or not audio zooming is selected for the frequencies that are affected by the audio zooming.
The different compression curves can be used to control the amount of headroom and may also be used to adjust the amount of headroom that is needed. The different compression curves could be used together with AGC and/or any other suitable processes.
Also in the above described examples the headroom is controlled to provide either a lot of headroom or a small amount of headroom. In some examples the headroom provided could be in between these two extremes. For example, if it is determined that the relative sound energies in a sound environment are changing then the amount of headroom provided could be changed to take this into account. The amount of headroom provided could be changed gradually to avoid a sudden switch between the two extremes. Therefore, for a time period over which the gradual change is taking place, the headroom provided could be in between the maximum and minimum amounts.
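A gradual change between the two amounts of headroom could, for instance, be implemented by limiting how far the headroom moves in each time period. The sketch below assumes a fixed maximum step of 0.5 dB per period and end values of 3 dB and 12 dB. These values and the name `ramp_headroom` are assumptions, since the description does not specify how the gradual change is made.

```python
def ramp_headroom(current_db, target_db, max_step_db=0.5):
    """Move the headroom toward the target by at most max_step_db per time period."""
    delta = target_db - current_db
    if abs(delta) <= max_step_db:
        return target_db
    if delta > 0:
        return current_db + max_step_db
    return current_db - max_step_db


# Wanted sources become dominant: the headroom rises from 3 dB to 12 dB over several periods.
headroom_db = 3.0
trajectory = []
while headroom_db != 12.0:
    headroom_db = ramp_headroom(headroom_db, 12.0)
    trajectory.append(headroom_db)
```

During the ramp the headroom takes intermediate values between the minimum and maximum amounts, which avoids a sudden switch between the two extremes.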
Therefore, examples of the disclosure control the amount of headroom provided based on whether dominant sounds are unwanted sounds or wanted sounds. This can enable audio zooming to be used while using the headroom available within the audio file to maximize, or substantially maximize, the loudness of the audio signals. The examples of the disclosure reduce audio clipping by ensuring that there is always sufficient headroom available for audio zooming.
The term ‘comprise’ is used in this document with an inclusive not an exclusive meaning. That is any reference to X comprising Y indicates that X may comprise only one Y or may comprise more than one Y. If it is intended to use ‘comprise’ with an exclusive meaning then it will be made clear in the context by referring to “comprising only one . . . ” or by using “consisting”.
In this description, reference has been made to various examples. The description of features or functions in relation to an example indicates that those features or functions are present in that example. The use of the term ‘example’ or ‘for example’ or ‘can’ or ‘may’ in the text denotes, whether explicitly stated or not, that such features or functions are present in at least the described example, whether described as an example or not, and that they can be, but are not necessarily, present in some of or all other examples. Thus ‘example’, ‘for example’, ‘can’ or ‘may’ refers to a particular instance in a class of examples. A property of the instance can be a property of only that instance or a property of the class or a property of a sub-class of the class that includes some but not all of the instances in the class. It is therefore implicitly disclosed that a feature described with reference to one example but not with reference to another example, can where possible be used in that other example as part of a working combination but does not necessarily have to be used in that other example.
Although examples have been described in the preceding paragraphs with reference to various examples, it should be appreciated that modifications to the examples given can be made without departing from the scope of the claims.
Features described in the preceding description may be used in combinations other than the combinations explicitly described above.
Although functions have been described with reference to certain features, those functions may be performable by other features whether described or not.
Although features have been described with reference to certain examples, those features may also be present in other examples whether described or not.
The term ‘a’ or ‘the’ is used in this document with an inclusive not an exclusive meaning. That is any reference to X comprising a/the Y indicates that X may comprise only one Y or may comprise more than one Y unless the context clearly indicates the contrary. If it is intended to use ‘a’ or ‘the’ with an exclusive meaning then it will be made clear in the context. In some circumstances the use of ‘at least one’ or ‘one or more’ may be used to emphasize an inclusive meaning but the absence of these terms should not be taken to infer any exclusive meaning.
The presence of a feature (or combination of features) in a claim is a reference to that feature or (combination of features) itself and also to features that achieve substantially the same technical effect (equivalent features). The equivalent features include, for example, features that are variants and achieve substantially the same result in substantially the same way. The equivalent features include, for example, features that perform substantially the same function, in substantially the same way to achieve substantially the same result.
In this description, reference has been made to various examples using adjectives or adjectival phrases to describe characteristics of the examples. Such a description of a characteristic in relation to an example indicates that the characteristic is present in some examples exactly as described and is present in other examples substantially as described.
Whilst endeavoring in the foregoing specification to draw attention to those features believed to be of importance it should be understood that the Applicant may seek protection via the claims in respect of any patentable feature or combination of features hereinbefore referred to and/or shown in the drawings whether or not emphasis has been placed thereon.
Number | Date | Country | Kind |
---|---|---|---|
2110058.1 | Jul 2021 | GB | national |