The disclosure relates to an active audio adjustment method; particularly, the disclosure relates to an active audio adjustment method and a host.
Open-back headphones and closed-back headphones are two of the most common types of headphones on the market. They differ in the way they seal around the ears, which has a significant impact on their sound quality, comfort, and ability to block out ambient noise.
For the closed-back headphones, active noise cancellation is a technology that uses sound waves to reduce unwanted noise (e.g., ambient noise). Active noise cancellation works by creating a sound wave that is 180 degrees out of phase with the unwanted noise. These two waves cancel each other out, creating a quieter listening environment and thereby improving the listening experience. However, for open-back headphones, since the ambient sound can pass through the headphones, the active noise cancellation may not be able to create effective sound waves to cancel out the ambient sound.
The disclosure is directed to an active audio adjustment system and an active audio adjustment method, so as to improve listening experience for wearable audio playback devices.
The embodiments of the disclosure provide an active audio adjustment method. The active audio adjustment method includes: receiving, by a host, an ambient sound from a sound pickup device; analyzing, by the host, the ambient sound to obtain an ambient parameter of the ambient sound and determine an adjustment strategy; adjusting, by the host, an original parameter of an output audio to determine an optimized parameter based on the ambient parameter of the ambient sound and the adjustment strategy; generating, by the host, an optimized output audio based on the optimized parameter; and outputting, by the host, the optimized output audio to an audio output device.
The embodiments of the disclosure provide a host. The host includes a storage circuit and a processor. The storage circuit is configured to store a program code. The processor is coupled to the storage circuit and configured to access the program code to execute: receiving an ambient sound from a sound pickup device; analyzing the ambient sound to obtain an ambient parameter of the ambient sound and determine an adjustment strategy; adjusting an original parameter of an output audio to determine an optimized parameter based on the ambient parameter of the ambient sound and the adjustment strategy; generating an optimized output audio based on the optimized parameter; and outputting the optimized output audio to an audio output device.
Based on the above, according to the active audio adjustment method and the host, by generating the output audio based on the optimized parameter, the user may clearly hear the output audio in a noisy environment without manually turning up the volume, thereby improving the listening experience for wearable audio playback devices.
To make the aforementioned more comprehensible, several embodiments accompanied with drawings are described in detail as follows.
The accompanying drawings are included to provide a further understanding of the disclosure, and are incorporated in and constitute a part of this specification. The drawings illustrate exemplary embodiments of the disclosure and, together with the description, serve to explain the principles of the disclosure.
Reference will now be made in detail to the present preferred embodiments of the invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers are used in the drawings and the description to refer to the same or like parts.
In order to bring an immersive experience to the user, technologies related to extended reality (XR), such as augmented reality (AR), virtual reality (VR), and mixed reality (MR), are constantly being developed. AR technology allows a user to bring virtual elements to the real world. VR technology allows a user to enter a whole new virtual world to experience a different life. MR technology merges the real world and the virtual world. Further, to bring a fully immersive experience to the user, visual content, audio content, or content for other senses may be provided through one or more devices.
Open-back headphones or open-back sound devices are often used to provide audio content to the user. Open-back headphones are a type of headphone that allows ambient sound to pass through. Open-back headphones often have a more natural and spacious soundstage than closed-back headphones. This is because they do not block out ambient sound, which can give the music a more realistic and immersive feel. Further, open-back headphones may be more comfortable to wear for extended periods of time than closed-back headphones. This is because they do not create a seal around the ears, which can lead to pressure buildup and fatigue.
However, open-back headphones may not be suitable for active noise cancellation, since the ambient sound can pass through the open-back headphones. That is, in noisy environments, users may need to turn up the volume of open-back headphones to hear the sound inside the headphones clearly. It is worth mentioning that manually adjusting the volume may be inconvenient and time-consuming. In addition, a loud volume may be harmful to hearing and may lead to missing important sounds in the environment. Therefore, it is a pursuit of persons skilled in the art to provide an improved listening experience for wearable audio playback devices.
In the embodiments of the disclosure, the host 100 may include a storage circuit 102 and a processor 104. The storage circuit 102 may be configured to store a program code and/or modules accessible by the processor 104, but the disclosure is not limited thereto.
The processor 104 may be coupled with the storage circuit 102, and the processor 104 may be, for example, a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Array (FPGA) circuits, any other type of integrated circuit (IC), a state machine, and the like.
In some embodiments, the host 100 may further include a sound pickup device 106 or the host 100 may be coupled to the sound pickup device 106. The sound pickup device 106 may be a microphone, a sonar, other similar devices, or a combination of these devices.
In some embodiments, the host 100 may further include an audio output device 108 or the host 100 may be coupled to the audio output device 108. The audio output device 108 may be an audio playback device, an open-back sound device, an open-back headphone, a speaker, a megaphone, other similar devices, or a combination of these devices. That is, the audio output device 108 may allow ambient sound to pass through. However, this disclosure is not limited thereto.
In some embodiments, the host 100 may further include a communication circuit and the communication circuit may include, for example, a wired network module, a wireless network module, a Bluetooth module, an infrared module, a radio frequency identification (RFID) module, a Zigbee network module, or a near field communication (NFC) network module, but the disclosure is not limited thereto. That is, the host may communicate with external device(s) (such as a microphone, a speaker, or the like) through either wired communication or wireless communication.
In the embodiments of the disclosure, the processor 104 may access the modules and/or the program code stored in the storage circuit 102 to implement the active audio adjustment method provided in the disclosure, which will be further discussed in the following.
In a step S210, an ambient sound may be obtained by the sound pickup device 106 and the ambient sound may be provided to the processor 104. Since the sound pickup device 106 (e.g., included in the host 100) may be close to the user or may be worn by the user, the ambient sound may include various sounds around the user.
In one embodiment, the ambient sound may include ambient noise (e.g., machine noise, traffic noise, sound of chatter, or the like), an important sound event (e.g., siren, warning sound, sound of ambulance, shout, yelling, or the like), or other sounds. The ambient sound may pass through the audio output device 108 (e.g., the open-back headphone) and may be heard by the user. Meanwhile, the host 100 may output audio signals through the audio output device 108 and these audio signals may be referred to as “output audio” or “device output” as shown in
It is noteworthy that, the ambient noise may make it difficult for the user to hear other sounds (e.g., the important sound event or the output audio). In one embodiment, as shown in the original frequency spectrum 310A of
In a step S220, the ambient sound may be analyzed to obtain an ambient parameter of the ambient sound and determine an adjustment strategy. The ambient parameter may include an ambient frequency of the ambient sound and/or an ambient energy level (i.e., the volume, shown as “sound pressure level” on the figure) of the ambient sound. The adjustment strategy may be used to determine an optimized parameter of an optimized output audio and/or an optimized important sound parameter of an optimized important sound event.
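As a non-limiting illustration of the step S220, the following Python sketch estimates an ambient energy level and a dominant ambient frequency from one frame of samples picked up by the sound pickup device 106. The function names, the frame length, and the dBFS reference are assumptions introduced here for illustration and are not part of the disclosure.

```python
import numpy as np

def estimate_ambient_parameter(ambient_frame, sample_rate):
    """Estimate an ambient energy level and a dominant ambient frequency for one
    frame of ambient sound from the sound pickup device.
    (Illustrative sketch; the dBFS reference is an assumption.)"""
    # Ambient energy level: RMS of the frame, in dB relative to full scale.
    rms = np.sqrt(np.mean(ambient_frame ** 2))
    energy_level_db = 20.0 * np.log10(rms + 1e-12)
    # Ambient frequency: location of the strongest bin of the magnitude spectrum.
    spectrum = np.abs(np.fft.rfft(ambient_frame * np.hanning(len(ambient_frame))))
    freqs = np.fft.rfftfreq(len(ambient_frame), d=1.0 / sample_rate)
    dominant_frequency_hz = freqs[np.argmax(spectrum)]
    return {"energy_level_db": energy_level_db,
            "dominant_frequency_hz": dominant_frequency_hz}

# Example: a 1 kHz tone in broadband noise, sampled at 48 kHz.
fs = 48000
t = np.arange(fs) / fs
frame = 0.5 * np.sin(2 * np.pi * 1000.0 * t) + 0.1 * np.random.randn(fs)
print(estimate_ambient_parameter(frame, fs))
```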
In one embodiment, the ambient sound may include a plurality of sounds (e.g., the ambient noise and/or the important sound event) and the ambient sound may be further analyzed to categorize (classify) the plurality of sounds in the ambient sound. For example, each of the plurality of sounds in the ambient sound may be categorized as either the ambient noise or the important sound event. The categorizing may be performed based on a sound database or a pre-trained model, but is not limited thereto. Further, during the analysis of the ambient sound, each of the plurality of sounds may be analyzed to determine its own parameter. For example, the ambient parameter may include a noise parameter and/or an important sound parameter. The noise parameter may include a noise frequency and/or a noise energy level of the ambient noise. The important sound parameter may include an important sound frequency and/or an important sound energy level of the important sound event. However, this disclosure is not limited thereto.
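As a non-limiting illustration of the categorizing described above, the following sketch matches the dominant frequency of one separated sound against a small, hypothetical sound database; the database entries and frequency ranges are assumptions, and in practice a pre-trained model may replace this simple lookup.

```python
import numpy as np

# Hypothetical sound database: dominant-frequency ranges (Hz) of important sound events.
SOUND_DATABASE = {
    "siren": (600.0, 1500.0),
    "warning_beep": (2000.0, 4000.0),
}

def categorize_sound(sound_frame, sample_rate, database=SOUND_DATABASE):
    """Categorize one separated sound as an important sound event or ambient noise
    by matching its dominant frequency against the sound database.
    (Sketch only; a pre-trained model may replace this lookup.)"""
    spectrum = np.abs(np.fft.rfft(sound_frame * np.hanning(len(sound_frame))))
    freqs = np.fft.rfftfreq(len(sound_frame), d=1.0 / sample_rate)
    dominant = freqs[np.argmax(spectrum)]
    for label, (low, high) in database.items():
        if low <= dominant <= high:
            return "important_sound_event", label
    return "ambient_noise", None
```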
In a step S230, an original parameter of an output audio may be adjusted to determine an optimized parameter based on the ambient parameter of the ambient sound and the adjustment strategy. The output audio may be originally designed to be played with the original parameter. Due to the influence of the ambient sound, an optimized output audio may be generated and the optimized output audio may be played with the optimized parameter. For example, the optimized parameter of the optimized output audio may be determined based on the masking effect of the ambient sound utilizing a pre-trained psychoacoustics model.
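As a non-limiting illustration of how a masking effect might be approximated, the following sketch derives a crude masking threshold by spreading the ambient-noise spectrum over neighboring bins and then flags the frequency bins in which the output audio falls below that threshold. The spreading width and offset are assumptions; this stands in for, and is much simpler than, the pre-trained psychoacoustics model referenced above.

```python
import numpy as np
from scipy.ndimage import maximum_filter1d

def estimate_masking_threshold(noise_frame, sample_rate, n_fft=2048,
                               spread_bins=8, offset_db=-10.0):
    """Crude per-bin masking-threshold estimate for one frame of ambient noise:
    spread each noise bin over its neighbours and apply a fixed offset.
    (Spreading width and offset are illustrative assumptions.)"""
    window = np.hanning(n_fft)
    noise_db = 20.0 * np.log10(np.abs(np.fft.rfft(noise_frame[:n_fft] * window)) + 1e-12)
    spread_db = maximum_filter1d(noise_db, size=2 * spread_bins + 1)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    return freqs, spread_db + offset_db

def masked_bins(output_frame, threshold_db, n_fft=2048):
    """Boolean mask of frequency bins where the output audio falls below the
    estimated masking threshold and may therefore be inaudible to the user."""
    window = np.hanning(n_fft)
    out_db = 20.0 * np.log10(np.abs(np.fft.rfft(output_frame[:n_fft] * window)) + 1e-12)
    return out_db < threshold_db
```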
In one embodiment, as shown in the original frequency spectrum 310A of
Reference is now made to the optimized frequency spectrum 320A of
In another embodiment, to enhance an auditory intelligibility of the second peak of the output audio, a frequency of the second peak may be shifted to separate the second peak from the masking threshold (masking range). This kind of adjustment strategy may be referred to as “pitch shift optimization” or “frequency modulation”, but is not limited thereto. That is, the original parameter of the output audio and the optimized parameter of the optimized output audio may include, respectively, an original frequency of the second peak and an optimized frequency of the second peak. In one embodiment, the masking threshold (masking range) of the ambient sound (e.g., the ambient noise and/or the important sound event) may be determined based on a masking effect of the ambient sound. The optimized parameter of the optimized output audio may be determined based on an overlapping frequency band of the masking threshold (masking range) and an original frequency band of the output audio. To put it briefly, the original frequency of the output audio may be adjusted to determine the optimized frequency of the optimized output audio based on an ambient frequency of the ambient sound (e.g., the noise frequency or the masking range of the ambient noise and/or the important sound frequency or the optimized important sound frequency), wherein a frequency difference between the optimized frequency and the ambient frequency is greater than a threshold value. However, this disclosure is not limited thereto.
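As a non-limiting illustration of the “pitch shift optimization” (“frequency modulation”) strategy, the following sketch moves a masked component of the output audio out of the masking range using a single-sideband (analytic-signal) frequency shifter. This is only one possible realization, and the margin parameter stands in for the threshold value mentioned above.

```python
import numpy as np
from scipy.signal import hilbert

def frequency_shift(signal, shift_hz, sample_rate):
    """Shift every spectral component of `signal` by `shift_hz` using a
    single-sideband (analytic-signal) frequency shifter."""
    analytic = hilbert(signal)                         # complex analytic signal
    t = np.arange(len(signal)) / sample_rate
    return np.real(analytic * np.exp(2j * np.pi * shift_hz * t))

def shift_out_of_masking_range(signal, peak_hz, mask_low_hz, mask_high_hz,
                               margin_hz, sample_rate):
    """If the peak of the output audio lies inside the masking range, shift it
    above the range so that the frequency difference exceeds `margin_hz`
    (standing in for the threshold value mentioned above)."""
    if mask_low_hz <= peak_hz <= mask_high_hz:
        shift_hz = (mask_high_hz + margin_hz) - peak_hz
        return frequency_shift(signal, shift_hz, sample_rate)
    return signal
```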
In addition, to further enhance the auditory intelligibility of the second peak, an energy level of the second peak may be amplified at the same time. That is, the original parameter of the output audio and the optimized parameter of the optimized output audio may further include, respectively, an original energy level of the second peak and an optimized energy level of the second peak. To put it briefly, the original energy level and the original frequency of the output audio may be adjusted, respectively, to determine the optimized energy level and the optimized frequency of the optimized output audio based on the ambient sound. However, this disclosure is not limited thereto.
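As a non-limiting illustration of the energy-level amplification described above, the following sketch computes the gain needed to place the optimized energy level slightly above the masking threshold and applies it with a cap to avoid clipping; the 3 dB headroom and the 12 dB cap are illustrative assumptions.

```python
import numpy as np

def required_gain_db(original_level_db, masking_threshold_db, headroom_db=3.0):
    """Gain needed so that the optimized energy level sits `headroom_db` above
    the masking threshold (the 3 dB headroom is an illustrative assumption)."""
    return max(0.0, masking_threshold_db + headroom_db - original_level_db)

def amplify(output_frame, gain_db, max_gain_db=12.0):
    """Apply the bounded gain and prevent clipping; the 12 dB cap is an
    illustrative assumption rather than part of the disclosure."""
    gain = 10.0 ** (min(gain_db, max_gain_db) / 20.0)
    amplified = output_frame * gain
    peak = np.max(np.abs(amplified))
    return amplified / peak if peak > 1.0 else amplified
```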
In yet another embodiment, since part of the second peak overlaps with the important sound event, the important sound event may also hinder the user's comprehension of the output audio. Further, a masking effect of the important sound event may also occur. For ease of illustration, a masking threshold (masking range) of the important sound event is not depicted on the figure. That is, the optimized parameter of the optimized output audio may be determined based on the noise parameter of the ambient noise and/or the important sound parameter of the important sound event. In other words, the whole ambient sound (including the ambient noise and/or the important sound event) may be utilized to enhance an auditory intelligibility of the output audio. However, this disclosure is not limited thereto.
In a step S240, the optimized output audio may be generated based on the optimized parameter, which is shown in the optimized frequency spectrum 320A of
In a step S250, the optimized output audio may be outputted to the audio output device 108. That is, instead of the original output audio, the user may hear the optimized output audio. Therefore, the active audio adjustment method 200 may improve the listening experience for wearable audio playback devices, thereby enhancing the user experience.
Reference is now made back to the original frequency spectrum 310A of
Reference is now made to the optimized frequency spectrum 320A of
Reference is first made to the original frequency spectrum 310B of
Reference is first made to the optimized frequency spectrum 320B of
In addition, when there is no important sound event detected in the ambient sound, only the output audio may be optimized. That is, the audio output device 108 may be configured to output the optimized output audio only. Alternatively, when there is an important sound event detected in the ambient sound, both the output audio and the important sound event may be optimized. That is, the audio output device 108 may be configured to output the optimized output audio and the optimized important sound event at the same time.
It is noteworthy that, reference is now made to
In one embodiment, during the analysis of the ambient sound, the processor 104 of the host 100 may be configured to determine a direction and a distance of the important sound event 499 relative to the user utilizing a well-known technology (e.g., time difference of arrival, beamforming, a machine learning model, or the like). For example, a distance from the user to the important sound event 499 may be determined. In addition, an elevation angle and an azimuth angle from the user to the important sound event 499 may be determined.
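As a non-limiting illustration of a time-difference-of-arrival approach, the following sketch estimates an azimuth angle from the cross-correlation of two microphone channels; the microphone spacing, the far-field plane-wave model, and the sign convention are illustrative assumptions rather than part of the disclosure.

```python
import numpy as np

def estimate_azimuth_tdoa(left_mic, right_mic, sample_rate,
                          mic_spacing_m=0.18, speed_of_sound=343.0):
    """Estimate an azimuth angle of a sound source from the time difference of
    arrival (TDOA) between two microphone channels, using the peak of their
    cross-correlation. (Illustrative sketch; 0.18 m spacing is an assumption.)"""
    corr = np.correlate(left_mic, right_mic, mode="full")
    # Positive lag: the left channel is a delayed copy of the right channel,
    # i.e., the sound reaches the right microphone first.
    lag = np.argmax(corr) - (len(right_mic) - 1)
    tdoa = lag / sample_rate
    # Far-field model: tdoa = (spacing / c) * sin(azimuth),
    # with the azimuth measured toward the right microphone.
    sin_az = np.clip(tdoa * speed_of_sound / mic_spacing_m, -1.0, 1.0)
    return np.degrees(np.arcsin(sin_az))
```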
Next, the processor 104 may be configured to generate the optimized important sound event based on the direction and the distance utilizing a spatial audio effect algorithm. That is, while the user hears the optimized important sound event from the audio output device 108, the user may be able to know the direction and the distance of the important sound event 499. In one embodiment, a left head related transfer function (HRTF) 402 and a right HRTF 404 may be utilized (e.g., by convolution) to generate the optimized important sound event. Further, the details of a process of generating the optimized important sound event will be described below with the components shown in
In a step S410, a time-frequency analysis may be performed to analyze a change of frequency distribution of the important sound event 499 over time. It is worth mentioning that a traditional Fourier transform can only obtain the overall frequency distribution of a signal, while a time-frequency analysis (e.g., a Short Time Fourier Transform, STFT) can obtain the frequency distribution of a signal at different time points. In a step S420, based on a result of the time-frequency analysis, an audio optimization may be performed to generate an optimized important sound event (e.g., the optimized important sound event as depicted in the optimized frequency spectrum 320A or the optimized frequency spectrum 320B). In one embodiment, the optimized important sound event may be generated by optimizing the important sound event 499 based on the output audio and the ambient noise. However, this disclosure is not limited thereto.
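As a non-limiting illustration of the time-frequency analysis of the step S410, the following sketch obtains the frequency distribution of the important sound event at different time points via a short-time Fourier transform; the FFT size and hop length are illustrative assumptions.

```python
import numpy as np
from scipy.signal import stft

def time_frequency_analysis(important_sound, sample_rate, n_fft=1024, hop=256):
    """Step S410 (sketch): frequency distribution of the important sound event
    at different time points via a short-time Fourier transform."""
    freqs, times, spectrogram = stft(important_sound, fs=sample_rate,
                                     nperseg=n_fft, noverlap=n_fft - hop)
    magnitude_db = 20.0 * np.log10(np.abs(spectrogram) + 1e-12)
    return freqs, times, magnitude_db
```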
In a step S430, a sound location analysis may be performed to determine a spatial origin of the important sound event 499 within the environment. In one embodiment, a direction and a distance of the important sound event 499 relative to the user may be determined. In a step S440 and a step S450, a left HRTF and a right HRTF corresponding to a head of the user may be generated to reconstruct a spatial dimension of the important sound event 499 for the left ear and the right ear, respectively. The left HRTF and the right HRTF may be generated based on an HRTF database 406. However, this disclosure is not limited thereto.
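As a non-limiting illustration of the steps S440 to S470, the following sketch convolves the mono optimized important sound event with a left and a right head-related impulse response (time-domain HRTFs) to reconstruct its spatial dimension for the two ears; selection or interpolation of the impulse responses from the HRTF database 406 is omitted and left as an assumption.

```python
import numpy as np
from scipy.signal import fftconvolve

def spatialize_important_sound(mono_event, left_hrir, right_hrir):
    """Render the mono optimized important sound event binaurally by convolving it
    with left and right head-related impulse responses chosen for the analyzed
    direction and distance. (Sketch; database lookup/interpolation is omitted.)"""
    left_out = fftconvolve(mono_event, left_hrir, mode="full")
    right_out = fftconvolve(mono_event, right_hrir, mode="full")
    # Normalize the two channels jointly so the inter-aural level difference,
    # which carries the direction cue, is preserved.
    peak = max(np.max(np.abs(left_out)), np.max(np.abs(right_out)), 1e-12)
    return left_out / peak, right_out / peak
```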
In a step S460 and a step S470, the optimized important sound event with the reconstructed spatial dimension may be output through a left speaker and a right speaker, respectively. In this manner, the user may clearly hear the optimized important sound event under the influence of the ambient noise and the output audio. Further, the user may be able to know the direction and the distance of the important sound event 499 through the optimized important sound event, thereby enhancing safety.
In a step S510, the ambient sound around the user may be recorded through a microphone of an AR device (e.g., an HMD device). In a step S520, the sounds in the ambient sound may be classified (categorized) and separated from each other. In one embodiment, the sounds in the ambient sound may be classified as either the ambient noise 502 or the important sound event 504. In a step S530, a sound location analysis may be performed to determine a spatial origin of the important sound event 504 within the environment. Further, the HRTF corresponding to the head of the user may be calculated.
In a step S540, a step S550, and a step S560, a time-frequency analysis may be respectively performed on the ambient noise 502, the important sound event 504, and the output audio 506. Next, in a step S570, based on the analysis result, the important sound event 504 and the output audio 506 may be optimized to generate the optimized important sound event and the optimized output audio. In a step S580, the spatial audio effect may be applied to the optimized important sound event based on the calculated HRTF. In a step S590, the optimized important sound event and the optimized output audio may be output through a speaker of the HMD device.
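As a compact, non-limiting illustration of one processing frame of the active audio adjustment method 500, the following self-contained sketch raises the output audio 506 and, when present, the important sound event 504 a few decibels above the measured ambient-noise level; the gain law, the headroom, and the cap are assumptions, and the spatial audio effect of the step S580 is omitted here.

```python
import numpy as np

def active_audio_adjustment_frame(ambient_noise, important_event, output_audio,
                                  headroom_db=3.0, max_gain_db=12.0):
    """Compact sketch of one processing frame of the active audio adjustment
    method 500. (Gain law, headroom, and cap are illustrative assumptions;
    the spatial audio effect of the step S580 is omitted.)"""
    def level_db(x):
        return 20.0 * np.log10(np.sqrt(np.mean(x ** 2)) + 1e-12)

    noise_db = level_db(ambient_noise)

    def optimize(x):
        # Raise the signal until it sits `headroom_db` above the noise level,
        # bounded by `max_gain_db`, then guard against clipping.
        gain_db = min(max(0.0, noise_db + headroom_db - level_db(x)), max_gain_db)
        y = x * 10.0 ** (gain_db / 20.0)
        peak = np.max(np.abs(y))
        return y / peak if peak > 1.0 else y

    optimized_output = optimize(output_audio)
    mix = optimized_output
    if important_event is not None:
        mix = optimized_output + optimize(important_event)

    # Step S590 (sketch): the mix is what would be sent to the speaker of the HMD device.
    peak = np.max(np.abs(mix))
    return mix / peak if peak > 1.0 else mix
```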
In addition, for the implementation details of the active audio adjustment method 500, reference may be made to the descriptions of
In summary, according to the host 100 and the active audio adjustment method 200, since the output audio and/or the important sound event are optimized, the user may still hear the optimized output audio and/or the optimized important sound event clearly in a noisy or complicated environment, thereby enhancing both immersion and safety.
It will be apparent to those skilled in the art that various modifications and variations can be made to the structure of the present invention without departing from the scope or spirit of the disclosure. In view of the foregoing, it is intended that the present disclosure cover modifications and variations of this invention provided they fall within the scope of the following claims and their equivalents.
This application claims the priority benefit of U.S. provisional application Ser. No. 63/449,602, filed on Mar. 3, 2023. The entirety of the above-mentioned patent application is hereby incorporated by reference herein and made a part of this specification.