An aspect of the disclosure relates to a system that adapts volume to changes in the environment based on historical user context and user behavior. Other aspects are also described.
Headphones are an audio device that includes a pair of speakers, each of which is placed on top of a user's ear when the headphones are worn on or around the user's head. Similar to headphones, earphones (or in-ear headphones) are two separate audio devices, each having a speaker that is inserted into the user's ear. Both headphones and earphones are normally wired to a separate playback device, such as an MP3 player, that drives each of the speakers of the devices with an audio signal in order to produce sound (e.g., music). Headphones and earphones provide a convenient method by which the user can individually listen to audio content without having to broadcast the audio content to others who are nearby.
According to an aspect of the disclosure, a method includes: receiving audio content; playing back the audio content through a speaker at a first volume setting of a volume control, where the volume control includes a plurality of sequential volume settings; determining a noise level within an ambient environment captured by a microphone; receiving a single adjustment to the volume control to change the first volume setting by one volume setting; responsive to receiving the single adjustment, determining a second volume setting based on at least one of the noise level, a content level of the audio content, and historical data indicating past adjustments to the volume control, where the second volume setting is either greater than or less than the first volume setting by more than one volume setting; and changing the first volume setting of the volume control to the second volume setting.
In one aspect, determining the second volume setting includes using at least one of the noise level, the content level, and historical behavior data as input into a machine learning model that produces the second volume setting as output. In another aspect, the volume control includes a volume up and a volume down, where receiving the single adjustment includes receiving one user selection of either the volume up or the volume down. In some aspects, the second volume setting is determined responsive to receiving the one selection but is not determined based on which of the volume up or the volume down is selected.
In one aspect, each pair of adjacent volume settings is separated by a first value, where the single adjustment is to increase the first volume setting by the first value and the second volume setting is greater than the first volume setting by a second value that is different than the first value. In another aspect, the second value is greater than the first value, where the method further includes: receiving another single adjustment to the volume control to decrease the second volume setting by the first value; and determining, based at least in part on the historical data and the second volume setting, a third volume setting that is less than the second volume setting by a third value that is less than the second value. In another aspect, the method further includes determining that the noise level within the ambient environment has not increased above a threshold since a previous adjustment to the volume control has been received, where responsive to determining that the noise level has not increased, the third value is determined to be less than the first value. In one aspect, the method further includes determining whether the noise level within the ambient environment has increased above a threshold since a previous adjustment to the volume control has been received, where the second volume setting is determined responsive to determining that the noise level has increased above the threshold and responsive to receiving the single adjustment.
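The relationship between the first, second, and third values described above may be easier to follow with a small sketch. The following Python is illustrative only and is not taken from the disclosure: the function names, the numeric step sizes, the 6 dB threshold, and the reading of "increased above a threshold" as "risen by more than the threshold since the previous adjustment" are all assumptions. Clamping to the ends of the volume range is omitted for brevity.

```python
FIRST_VALUE = 1.0   # spacing between adjacent volume settings
SECOND_VALUE = 3.0  # hypothetical adaptive jump, larger than FIRST_VALUE


def noise_increased(previous_db, current_db, threshold_db=6.0):
    """Assumed reading: noise rose by more than threshold_db since the
    previous adjustment to the volume control."""
    return (current_db - previous_db) > threshold_db


def adjust_up(setting, previous_db, current_db):
    """Handle one 'volume up' selection.

    The larger SECOND_VALUE jump is applied only when the noise level has
    risen; otherwise the setting moves by the ordinary FIRST_VALUE.
    """
    if noise_increased(previous_db, current_db):
        return setting + SECOND_VALUE
    return setting + FIRST_VALUE


def adjust_down(setting, previous_db, current_db):
    """Handle one 'volume down' selection after an adaptive jump.

    The step (the 'third value') is always smaller than SECOND_VALUE. When
    the noise level has not risen, it is also smaller than FIRST_VALUE,
    giving a fine trim. Both numbers here are hypothetical.
    """
    third_value = 2.0 if noise_increased(previous_db, current_db) else 0.5
    return setting - third_value
```

In this sketch, a single "volume up" at setting 3 after the noise rises from 50 dB to 62 dB moves the setting to 6, and a following "volume down" with steady noise trims it to 5.5 rather than stepping back a full setting.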
According to another aspect of the disclosure, an electronic device including: a speaker; at least one processor; a volume control; and memory having instructions stored therein which when executed by the processor causes the electronic device to: drive the speaker with an audio signal at a volume setting of the volume control, where the volume control includes a series of incremental volume settings, each volume setting associated with a different volume level of the electronic device, determine a noise level within an ambient environment in which the electronic device is located, receive an adjustment to the volume control to increase the volume setting, determine a new volume setting based on the noise level and historical behavior data indicating past adjustments to the volume control, where the new volume setting is higher than the volume setting in the series by at least two volume settings, and responsive to determining the new volume setting, drive the speaker with the audio signal at the new volume setting.
In one aspect, the electronic device is a headset. In another aspect, the electronic device does not include a touch-sensitive display screen that is arranged to display a user interface. In another aspect, the memory has further instructions to retrieve, from the memory, the historical behavior that indicates past volume settings of the volume control with respect to one or more contexts in which a user used the electronic device. In some aspects, the instructions to determine the new volume setting includes instructions to: determine a context in which the user is currently using the electronic device; and produce the new volume setting as output of a machine learning model in response to input that is based on the historical behavior that indicates past volume settings of the volume control with respect to the context and the noise level. In another aspect, the memory has further instructions to receive sensor data from one or more sensors of the electronic device; and determine one or more characteristics of the electronic device based on the sensor data and the noise level, where the context is determined based on the one or more characteristics.
In one aspect, the adjustment is a first adjustment and the new volume setting is a first new volume setting, where the memory has further instructions to: receive a second adjustment to the volume control to decrease the first new volume setting; determine whether a noise of the ambient environment that is captured by a microphone includes speech or an ambient sound; responsive to determining that the noise includes the ambient sound, determine a second new volume setting that is less than the first new volume setting; and responsive to determining that the noise includes speech, determine a third new volume setting that is less than the first and second new volume settings.
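As a rough illustration of the speech-versus-ambient behavior just described, the sketch below lowers the setting further when the captured noise is classified as speech. It assumes a separate classifier has already labeled the noise. The function name, the label strings, and the step sizes are hypothetical and are not specified by the disclosure.

```python
def setting_after_decrease(current_setting, noise_kind,
                           ambient_step=1, speech_step=3):
    """Return the new volume setting after one 'decrease' adjustment.

    noise_kind is 'speech' or 'ambient' (assumed output of a separate
    classifier). Speech yields a lower setting than ambient sound does.
    """
    if noise_kind == "speech":
        step = speech_step
    elif noise_kind == "ambient":
        step = ambient_step
    else:
        raise ValueError(f"unknown noise kind: {noise_kind!r}")
    # Never go below the lowest setting (assumed here to be 0).
    return max(0, current_setting - step)
```

With these assumed step sizes, a decrease from setting 6 yields 5 for ambient sound and 3 for speech, so the speech result is lower than both the starting setting and the ambient result.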
According to another aspect of the disclosure, a non-transitory machine-readable medium having instructions which when executed by at least one processor of a headset, causes the headset to: receive audio content; play back the audio content through a speaker at a first volume setting of a volume control, where the volume control includes a plurality of sequential volume settings; determine a noise level within an ambient environment captured by a microphone; receive a single adjustment to the volume control to change the first volume setting by one volume setting; responsive to receiving the single adjustment, determine a second volume setting based on at least one of the noise level, a content level of the audio content, and historical data indicating past adjustments to the volume control, where the second volume setting is either greater than or less than the first volume setting by more than one volume setting; and change the first volume setting of the volume control to the second volume setting.
In one aspect, the instructions to determine the second volume setting includes using at least one of the noise level, the content level, and historical behavior data as input into a machine learning model that produces the second volume setting as output. In another aspect, each pair of adjacent volume settings in the plurality of sequential volume settings is separated by a first value, where the single adjustment is to increase the first volume setting by the first value and the second volume setting is greater than the first volume setting by a second value that is different than the first value.
In one aspect, the second value is greater than the first value, where the non-transitory machine-readable medium includes further instructions to: receive another single adjustment to the volume control to decrease the second volume setting by the first value; and determine, based at least in part on the historical data and the second volume setting, a third volume setting that is less than the second volume setting by a third value that is less than the second value. In another aspect, the non-transitory machine-readable medium includes further instructions to determine that the noise level within the ambient environment has not increased above a threshold since a previous adjustment to the volume control has been received, where responsive to determining that the noise level has not increased, the third value is determined to be less than the first value. In another aspect, the non-transitory machine-readable medium includes further instructions to determine whether the noise level within the ambient environment has increased above a threshold since a previous adjustment to the volume control has been received, where the second volume setting is determined responsive to determining that the noise level has increased above the threshold and responsive to receiving the single adjustment.
The above summary does not include an exhaustive list of all aspects of the disclosure. It is contemplated that the disclosure includes all systems and methods that can be practiced from all suitable combinations of the various aspects summarized above, as well as those disclosed in the Detailed Description below and particularly pointed out in the claims. Such combinations may have particular advantages not specifically recited in the above summary.
The aspects are illustrated by way of example and not by way of limitation in the figures of the accompanying drawings in which like references indicate similar elements. It should be noted that references to “an” or “one” aspect of this disclosure are not necessarily to the same aspect, and they mean at least one. Also, in the interest of conciseness and reducing the total number of figures, a given figure may be used to illustrate the features of more than one aspect, and not all elements in the figure may be required for a given aspect.
Several aspects of the disclosure with reference to the appended drawings are now explained. Whenever the shapes, relative positions and other aspects of the parts described in a given aspect are not explicitly defined, the scope of the disclosure here is not limited only to the parts shown, which are meant merely for the purpose of illustration. Also, while numerous details are set forth, it is understood that some aspects may be practiced without these details. In other instances, well-known circuits, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description. Furthermore, unless the meaning is clearly to the contrary, all ranges set forth herein are deemed to be inclusive of each range's endpoints.
Electronic headsets have become increasingly popular with users because they are capable of reproducing media content such as music, podcasts, and movie sound tracks with high fidelity while at the same time not disturbing others who are nearby. Such headsets may also allow users the ability to move about an environment or between different environments while continuing to hear audio that is reproduced through speakers of the headset. For example, a runner may listen to a podcast while navigating through a running trail. As another example, a commuter who rides a train to work may wear a wireless headset while riding the train to and from work.
As a wearer of a headset that is reproducing audio content moves throughout an environment, noise levels within the environment may change. Returning to the previous examples, as the runner navigates the running trail there may be times at which there is little noise, such as portions of the trail that pass through a forested area, whereas there may be other portions of the trail that are close to a highway that have a considerable amount of highway noise. With the commuter example, a noise level of the train may be high, due to the movement of the train, which may contrast with less noise around the train station platform where the user boarded the train. In either case, the fluctuations in noise level may cause the user to periodically change the volume level of the sound output. For example, the runner may wish to change the volume between different environments in order to maintain user awareness within the environments, such as turning down the volume in quiet areas. The commuter, on the other hand, may want a higher volume level while on the train in order to prevent the background noise from interfering with or masking the reproduced audio content that is perceived by the user.
In both of these scenarios, the users are required to manually change the volume to their desired levels. This has many disadvantages that may reduce the overall user experience. First, most wireless headsets stream audio from a user companion device, such as a smartphone or a desktop computer. In the case of a smartphone, each time the user wishes to change the volume, the user must acquire the smartphone (e.g., pulling the smartphone from a pocket), and adjust a volume control of the smartphone. Having to interact with the smartphone to manually change the volume each time the noise level increases or decreases may be cumbersome and time-consuming. In addition, although the user may change the volume, the user may not adjust the volume setting to a user-preferred setting. For example, when there is a lot of noise, the user may turn up the volume, which may result in the volume setting being too loud. As a result, the user may then again have to adjust the volume downward in order to fine-tune the output. Again, such actions may be cumbersome and may detract from the overall user experience. Therefore, there is a need for a system that performs adaptive volume control based on the environment and based on historical user context and/or user behavior.
To overcome these deficiencies, the present disclosure describes a contextual adaptive volume (or auto-volume) system that is capable of automatically (e.g., without user intervention) adapting volume control based on environmental conditions (or characteristics), such as noise level, and based on learned user preferences. Returning to the commuter example, the system may be configured to automatically adapt volume control while the user is on the train based on historical data that indicates that the user tends to have the volume louder while on the train as opposed to while at the train station. In particular, the system performs at least some operations automatically by not requiring user intervention, such as requiring that the user adjust a volume control on a companion device or on the headset itself. As a result, the system is capable of providing the user with an enjoyable user experience between different environments with minimal to no user interaction with the system, by taking into account the context in which the user is using the headset and past user behavior.
As described herein, the system may perform adaptive volume control operations in order to dynamically and automatically (e.g., without user intervention, such as through a user-selection of a volume control) adapt the volume level of the system. In some cases, however, a user may wish to have manual control of the volume level of the system. For example, as the user moves between environments, the user may wish to maintain the volume level for at least a period of time, even though the noise level has changed. This may provide the user with a more consistent listening experience, especially when the user is not aware of environmental changes. Once the user decides to change the volume level, however, the user may be forced to manually adjust the volume through multiple user selections of the volume control in order to get to a desired volume level. For example, when the user is adjusting the volume control, the user may overshoot (e.g., decrease the volume beyond a desired threshold) a desired volume level, and may be forced to fine-tune the volume adjustment (e.g., by increasing the volume level after having already decreased the volume). This has many disadvantages. For example, having to manually adjust the volume may take a considerable amount of time and concentration, which may degrade the user experience. Therefore, there is a need for a system that provides adaptive volume operations in response to receiving one or more user adjustments to a volume control.
The present disclosure describes a system that provides adaptive volume control operations in order to adapt a volume setting responsive to receiving a single user adjustment to a volume control. For example, a volume control may have a series of incremental volume settings, such as having a first volume setting of zero that may be associated with a lowest volume level (e.g., mute or fully ducked), and a last volume setting of ten that may be associated with a highest volume level (e.g., 0 decibels relative to full scale (dBFS)), where each volume setting in the series increases by a value, such as one. Upon receiving a user adjustment to the volume control, such as by changing a current volume setting of three to a next volume setting of four (e.g., twisting a volume dial by a particular degree), the system may perform adaptive volume control operations to predict (determine) a better user-desired volume setting. In particular, the system may draw from historical data that indicates past user-set volume settings. In which case, the system may determine that a higher volume setting may be more preferable, such as a volume setting of six. For instance, this may be due to the historical data indicating that in the past the user has listened to audio content (e.g., on average) at a volume level of six. Upon determining the volume setting, the system may adapt the volume level. As a result, the system may provide the user with a user-desired volume setting, without requiring the user to manually adjust the volume setting several times (e.g., by receiving two more user adjustments).
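The worked example above (a single adjustment from three to four that results in a setting of six) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function name, the use of a plain average of past settings as the "historical data", and the rule that the prediction is only applied when it lies in the direction of the user's adjustment are all hypothetical choices made for the example. Other aspects described herein use a machine learning model and may not depend on the direction of the adjustment.

```python
from statistics import mean

MIN_SETTING = 0   # lowest volume setting (e.g., mute)
MAX_SETTING = 10  # highest volume setting (e.g., 0 dBFS)


def predict_setting(current, requested, history):
    """Return the setting to apply after one single-step user adjustment.

    current   -- setting before the adjustment (e.g., 3)
    requested -- setting the single adjustment asks for (e.g., 4)
    history   -- hypothetical list of settings the user chose in the past
    """
    if not history:
        return requested  # nothing learned yet: honor the single step

    preferred = round(mean(history))
    going_up = requested > current

    # Assumed rule: only jump past the requested step when the historical
    # preference lies further in the same direction as the adjustment.
    if going_up and preferred > requested:
        target = preferred
    elif not going_up and preferred < requested:
        target = preferred
    else:
        target = requested

    return max(MIN_SETTING, min(MAX_SETTING, target))
```

For instance, `predict_setting(3, 4, [6, 6, 5, 7])` returns 6, which spares the user two further adjustments, while an empty history simply returns the requested setting of 4.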
As referenced herein, “audio content” may be (and include) any type of audio, such as a musical composition, a podcast, audio of a virtual reality (VR) environment, a sound track of a motion picture, etc. In one aspect, the audio content may be a part of a piece of audio content, which may be an audio program or audio file that includes one or more audio signals that includes at least a portion of the audio content. In some aspects, the audio program may be any type of audio content format. In one aspect, an audio program may include audio content for spatial rendering as one or more data files in one or various three-dimensional (3D) audio formats, such as having one or more audio channels. For instance, an audio program may include a mono audio channel or may be a multi-audio channel format (e.g., two stereo channels, six surround source channels (in 5.1 surround format), etc.). In another aspect, the audio program may include one or more audio objects, each having at least one audio signal, and positional data (for spatially rendering the object's audio signals) in 3D sound. In another aspect, the audio program may be represented in a spherical audio format, such as higher order ambisonics (HOA) audio format.
In one aspect, the system 10 may adapt volume in “real-time” such that the volume may be changed as environmental conditions, user context, and/or user behavior change. In some aspects, the system may perform adaptive volume operations in real-time such that the volume may be changed with a minimum amount of time (e.g., accounting for processing time of one or more electronic components of the system, such as one or more processors) from when changes to the environment and/or user context are detected by the system.
As shown, the system 10 includes a source (or audio source) device 14, an output (or audio output) device 15, a (e.g., computer) network (e.g., the Internet) 12, and a media content server 13. In one aspect, the system may include more or fewer elements, such as having additional content servers, or not including content servers and/or a source device. In which case, the output device may perform all (or most) operations, such as the adaptive volume control operations, as described herein.
In one aspect, the media content server 13 may be a stand-alone electronics server, a computer (e.g., desktop computer), or a cluster of server computers that are configured to store, stream, and/or receive digital content, such as audio content (e.g., as one or more audio signals in any audio format). In another aspect, the content server may store video and/or audio content, such as movies, for streaming (transmitting) to one or more electronic devices. As shown, the server is communicatively coupled (e.g., via the network 12) to the source device 14 in order to stream (e.g., audio) content for playback (e.g., via the output device). In another aspect, the content server may be communicatively coupled (e.g., directly) to the output device.
In one aspect, the source device 14 may be any electronic device (e.g., with electronic components, such as one or more processors, memory, etc.) that is capable of streaming audio content, in any format, such as stereo audio signals, for playback (e.g., via one or more speakers integrated within the source device and/or via one or more output devices, as described herein). For example, the source device may be a desktop computer, a laptop computer, a digital media player, etc. In one aspect, the device may be a portable electronic device (e.g., being handheld operable), such as a tablet computer, a smart phone, etc. In another aspect, the source device may be a wearable device (e.g., a device that is designed to be worn on (e.g., attached to clothing and/or a body of) a user), such as a smart watch.
In one aspect, the output device 15 may be any (e.g., portable) electronic device that includes at least one speaker and is configured to output (or playback) sound by driving the speaker(s) with audio signal(s). For instance, as illustrated the device is a wireless headset (e.g., in-ear headphones or earphones) that is designed to be positioned on (or in) a user's ears and is designed to output sound into the user's ear canal. In some aspects, the earphone may be a sealing type that has a flexible ear tip that serves to acoustically seal off the entrance of the user's ear canal from an ambient environment by blocking or occluding the ear canal. As shown, the output device includes a left earphone for the user's left ear and a right earphone for the user's right ear. In this case, each earphone may be configured to output at least one audio channel of audio content (e.g., the right earphone outputting a right audio channel and the left earphone outputting a left audio channel of a two-channel input of a stereophonic recording, such as a musical work). In another aspect, the output device may be any electronic device that includes at least one speaker and is arranged to be worn by the user and arranged to output sound by driving the speaker with an audio signal. As another example, the output device may be any type of headset, such as an over-the-ear (or on-the-ear) headset that at least partially covers the user's ears and is arranged to direct sound into the ears of the user. In another aspect, the output device may be a wearable electronic device, such as smart glasses or a smart watch.
In some aspects, the output device may be a head-worn device, as illustrated herein. In another aspect, the output device may be any electronic device that is arranged to output sound into an ambient environment. Examples may include a stand-alone speaker, a smart speaker, a home theater system, or an infotainment system that is integrated within a vehicle. In another aspect, the output device as a head-worn device may be arranged to output sound into the ambient environment. For instance, when the output device is a pair of smart glasses, the output device may include “extra-aural” speakers that are arranged to project sound into the ambient environment (e.g., in a direction that is away from at least a portion, such as ears or ear canals, of a wearer), which are in contrast to “internal” speakers of a pair of headphones that are arranged to project sound into (or towards) a user's ear canal when worn.
As described herein, the output device may be a wireless device that may be communicatively coupled to the source device in order to exchange (e.g., audio) data. For instance, the source device may be configured to establish the wireless connection with the output device via a wireless communication protocol (e.g., BLUETOOTH protocol or any other wireless communication protocol). During the established wireless connection, the source device may exchange (e.g., transmit and receive) data packets (e.g., Internet Protocol (IP) packets) with the output device, which may include audio digital data in any audio format.
In another aspect, the source device 14 may communicatively couple with the output device 15 via other methods. For example, both devices may couple via a wired connection. In this case, one end of the wired connection may be (e.g., fixedly) connected to the output device, while another end may have a connector, such as a media jack or a universal serial bus (USB) connector, which plugs into a socket of the source device. Once connected, the source device may be configured to drive one or more speakers of the output device with one or more audio signals, via the wired connection. For instance, the source device may transmit the audio signals as digital audio (e.g., PCM digital audio). In another aspect, the audio may be transmitted in analog format.
The system 10 also includes a case 38 that is designed to hold the output device 15 while it is not in use by the user. The case includes a lid 39 that may be closed in order to at least partially seal the output device within an interior space of the case. As described herein, the output device 15 may be configured to perform one or more contextual adaptive volume control operations (which may be referred to as “adaptive volume control operations” as described herein) based on the state of the lid and/or of the output device. In particular, once the lid is opened, by a user of the output device for example, the output device may be configured to perform at least some adaptive volume control operations. More about the output device performing adaptive volume control operations based on the state of the lid is described herein.
In one aspect, the output device 15 may detect the state of the lid. For example, the output device may (periodically) monitor sensor data from one or more sensors, such as a proximity sensor (not shown). In another aspect, the case 38 may include circuitry, such as one or more processors, memory, and a network interface, which allows it to (wirelessly) communicate with the output device. As a result, the case may detect (e.g., based on sensor data captured by one or more sensors of the case) that the state of the lid 39 has moved from a closed state to an open state, and in response may transmit a message to the output device, indicating that the lid's state has changed. In another aspect, the output device may be configured to determine the state of the lid through any known method.
In some aspects, the source device 14 and the output device 15 may be distinct (separate) electronic devices, as shown herein. In another aspect, the source device may be a part of (or integrated with) the output device. For example, at least some of the components of the source device (such as one or more processors, memory, etc.) may be part of the output device, and/or at least some of the components of the output device may be part of the source device. In which case, at least some of the operations performed by the source device (e.g., streaming audio content from the media content server 13) may be performed by the output device.
The controller 20 may be a special-purpose processor such as an application-specific integrated circuit (ASIC), a general-purpose microprocessor, a field-programmable gate array (FPGA), a digital signal controller, or a set of hardware logic structures (e.g., filters, arithmetic logic units, and dedicated state machines). The controller is configured to perform adaptive volume control operations, audio signal processing operations and/or networking operations. More about the operations that may be performed by the controller 20 are described herein.
In one aspect, the memory 21 may be any type of non-transitory machine-readable storage medium. Examples may include read-only memory, random-access memory, CD-ROMS, DVDs, magnetic tape, optical data storage devices, flash memory devices, and phase change memory.
The camera 18 may be a complementary metal-oxide-semiconductor (CMOS) image sensor that is capable of capturing digital images including image data that represent a field of view of the camera, where the field of view includes a scene of an environment in which the source device 14 is located. In some aspects, the camera may be a charge-coupled device (CCD) camera type. The camera is configured to capture still digital images and/or video that is represented by a series of digital images. In one aspect, the camera may be positioned anywhere about/on the source device. In some aspects, the source device may include multiple cameras (e.g., where each camera may have a different field of view).
The microphone 17 may be any type of microphone (e.g., a differential pressure gradient micro-electro-mechanical system (MEMS) microphone) that is configured to convert acoustical energy caused by sound waves propagating in an acoustic environment into a microphone signal. In some aspects, the microphone may be an “external” (or reference) microphone that is arranged to capture sound from the acoustic environment. In another aspect, the microphone may be an “internal” (or error) microphone that is arranged to capture sound (and/or sense pressure changes) inside a user's ear (or ear canal).
The speaker 23 may be an electrodynamic driver that may be specifically designed for sound output at certain frequency bands, such as a woofer, tweeter, or midrange driver, for example. In one aspect, the speaker 23 may be a “full-range” (or “full-band”) electrodynamic driver that reproduces as much of an audible frequency range as possible.
The display 131 (or display screen) is designed to present (or display) digital images or videos of video (or image) data. In one aspect, the display may use liquid crystal display (LCD) technology, light emitting polymer display (LPD) technology, or light emitting diode (LED) technology, although other display technologies may be used in other aspects. In some aspects, the display may be a touch-sensitive display screen that is configured to sense user input as touches or taps on the screen, and in response produce one or more control signals. In some aspects, the display may use any touch sensing technologies, including but not limited to capacitive, resistive, infrared, and surface acoustic wave technologies.
The volume control 16 is configured to adjust a volume level of sound output of the source device (and/or the output device 15) in response to receiving a user adjustment (e.g., user input) at the control. In one aspect, the volume control may be a “master” volume control that is configured to control the overall volume level (e.g., sound output level of the speaker 23) of the source device. In another aspect, when the output device is communicatively coupled with the source device in order to stream audio content to the output device, for playback, the volume control may control the overall volume level of the output device (as well). In one aspect, the volume control may be a physical volume control that may be a dedicated volume input control, such as one or more buttons, a rotatable knob, or a physical slider. For instance, the volume control 16 may include at least two buttons, a “volume up” button, which may be configured to perform a stepwise increase to the volume each time it is pressed by the user, and a “volume down” button that may be configured to perform a stepwise decrease to the volume each time it is pressed by the user. In some aspects, the volume control may be any type of physical input device that can adjust the volume level.
In one aspect, the system 10 may include one or more volume settings, each setting defining a different volume level (e.g., dB). In particular, volume control 16 of the source device and/or volume control 19 of the output device may include a series of one or more volume settings (or positions), where each setting defines a different volume level (e.g., a different sound output level (e.g., dB SPL)) of the system 10. The volume control may (e.g., in response to a user adjustment) incrementally increase or decrease the volume level based on a user adjusting the control's volume setting or position. For example, when the volume control is a rotatable volume knob, the control may have several (e.g., 18) volume settings, where each successive volume setting may correspond to a degree of rotation and may increase the overall volume by a particular gain value. Specifically, each volume setting may result in a stepwise increase (or decrease) of the system volume. In this case, each volume setting may correspond to a 20° rotation about a center axis of the volume control. For instance, a first volume setting may be 0°, where the overall volume is muted (e.g., having a sound output level of 0 dB), and a second volume setting may be 20°, which may increase the output level by a particular value, such as 20 dB. Thus, the knob produces a control signal that either incrementally increases or decreases the volume based on how much the knob is twisted and in what direction (e.g., turning clockwise increases the volume, whereas turning counterclockwise decreases the volume). In one aspect, the volume control may be a master volume control that is configured to provide bi-directional control for either incrementally increasing or decreasing an overall volume level of (e.g., sound output of) the device. In one aspect, the control may be a part of the source device (e.g., integrated on the device).
In another aspect, the volume control may be a part of an electronic device that is communicatively coupled with the source device.
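As a non-limiting illustration of the rotatable-knob example, the following sketch maps a knob angle to a volume setting, assuming 18 settings spaced 20° apart with the first setting at 0°. The function and constant names are hypothetical, and clamping out-of-range angles is an assumption rather than a behavior stated in the disclosure.

```python
# Hypothetical constants taken from the knob example: 18 settings, 20 degrees apart.
DEGREES_PER_SETTING = 20.0
NUM_SETTINGS = 18


def angle_to_setting(angle_deg: float) -> int:
    """Return the zero-based volume setting for a knob angle in degrees.

    Angles between two detents map to the lower setting; angles outside
    the knob's range are clamped (an assumption made for this sketch).
    """
    index = int(angle_deg // DEGREES_PER_SETTING)
    return max(0, min(NUM_SETTINGS - 1, index))


print(angle_to_setting(0.0))   # first setting (0 degrees)
print(angle_to_setting(20.0))  # second setting (20 degrees)
```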
In one aspect, each volume setting may be associated with a volume (or a particular gain value) that may be applied to one or more audio signals. In particular, in the case in which the volume control is a button, each volume setting of the button may correspond to a particular gain value that may be applied to audio content (that is being played back by the system 10). For example, when a volume control has a volume setting range from 0-10, the highest volume setting, 10, may have a volume level of 0 decibels full scale (dBFS), while the next volume setting down, 9, may have a volume level of −10 dBFS. In which case, when the volume setting is reduced from 10 to 9, the system may be configured to apply attenuation (or gain) of −10 dB to an output audio signal. If the button is pressed by the user again, the volume setting would be reduced to 8, causing the total amount of attenuation applied to be −20 dB.
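As a non-limiting illustration, the button example may be sketched as follows. The sketch assumes that the 10 dB step between settings 10 and 9 continues uniformly down the range, which the example only states for settings 10, 9, and 8; the names `setting_to_gain_db` and `press_volume_down` are hypothetical.

```python
# Hypothetical mapping: settings 0..10, top setting = 0 dBFS, 10 dB per step.
# The uniform step size is an assumption extrapolated from the example.
MAX_SETTING = 10
STEP_DB = 10.0


def setting_to_gain_db(setting: int) -> float:
    """Return the gain in dB (relative to full scale) for a volume setting."""
    if not 0 <= setting <= MAX_SETTING:
        raise ValueError(f"setting must be in 0..{MAX_SETTING}")
    return STEP_DB * (setting - MAX_SETTING)


def press_volume_down(setting: int) -> int:
    """One press of the volume-down button lowers the setting by one step."""
    return max(0, setting - 1)


setting = 10
for _ in range(2):
    setting = press_volume_down(setting)
    print(setting, setting_to_gain_db(setting))
```

Two presses starting from setting 10 yield settings 9 and 8, with gains of −10 dB and −20 dB respectively, matching the example.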
In one aspect, the volume control may be a software volume control such that a software application (e.g., a media player application) that may be executing on the source device performs one or more digital signal processing operations to modify one or more digital audio signals associated with audio content. In some aspects, adjustments to the control may result in one or more gains being applied to the digital audio signal.
In one aspect, the volume control 16 may be a user interface (UI) item that is displayed on (e.g., a graphical user interface (GUI) within) the display 131 of the source device. For example, the volume control may be a slider that may be translated along a predefined slidable range. When user input is received to adjust (or translate) the position of the slider (e.g., by the user touching the slider on the display screen and dragging it in one or more directions), the volume control adjusts the overall volume level based on the position of the slider. In one aspect, similar to the example of the physical control, the UI item may include several volume settings, where each position of the slider may correspond to a different volume level for the device. In one aspect, the software volume control may adjust the volume by applying one or more scalar gain values upon one or more digital audio signals in order to scale the levels of the signals.
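As a non-limiting illustration of a software volume control that applies a scalar gain to a digital audio signal, the following sketch converts a gain in dB to a linear factor and scales each sample. The function names are hypothetical, and the representation of the signal as a list of floating-point samples is an assumption made for brevity.

```python
def db_to_linear(gain_db: float) -> float:
    """Convert a gain in decibels to a linear amplitude scale factor."""
    return 10.0 ** (gain_db / 20.0)


def apply_scalar_gain(samples: list[float], gain_db: float) -> list[float]:
    """Scale every sample of a digital audio signal by one wideband gain."""
    factor = db_to_linear(gain_db)
    return [factor * sample for sample in samples]


signal = [0.5, -0.25, 1.0]
quieter = apply_scalar_gain(signal, -20.0)  # -20 dB scales amplitude by 0.1
```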
In some aspects, the volume control may be any input by a user of the device. For example, the input may include a gesture (e.g., a hand gesture, a finger gesture, a head gesture, etc.) made by the user and detected by the device (e.g., based on movement detected by a motion sensor, such as an inertial measurement unit (IMU), of the source device that is caused by the hand gesture). In another aspect, the volume control may be a voice command that is received via the microphone 17. In which case, the voice command may include a request to turn up or down the volume.
The output device 15 includes a controller 24, a network interface 25, a speaker 26, two microphones 28 and 29, one or more (other) sensors 45, a volume control 19, and memory 27. In one aspect, the device may include more or fewer elements. For example, the output device may include one or more speakers, and/or may include one or more microphones. In one aspect, microphone 28 may be a reference microphone and microphone 29 may be an error microphone, as described herein. In another aspect, the output device may not include an error microphone, or may include at least one reference microphone and at least one error microphone. For example, in the case of a headset, an error microphone may sense sound inside the user's ear when the headset is positioned on (or in) the user's ear.
The sensor(s) 45 may include one or more other sensors that are designed to produce sensor data. For instance, the sensors may include an IMU or a proximity sensor. In which case, the IMU may be designed to produce motion data that indicates (changes in) the position and/or orientation of the output device. The sensors may include a location sensor, such as a Global Positioning System (GPS) sensor that provides location data (e.g., position and/or orientation data) of the output device 15. In another example, the sensors may include an accelerometer that may be arranged and configured to receive (detect or sense) speech vibrations that are produced while a user (e.g., who may be wearing the output device) is speaking, and produce an accelerometer signal that represents (or contains) the speech vibrations. Specifically, the accelerometer is configured to sense bone conduction vibrations that are transmitted from the vocal cords of the user to the user's ear (ear canal), while speaking and/or humming. For example, when the audio output device is a wireless headset, the accelerometer may be positioned anywhere on or within the headphone, which may touch a portion of the user's body in order to sense vibrations.
In one aspect, controller 24 may be configured to perform contextual adaptive volume control operations, (other) audio signal processing operations and/or networking operations, as described herein. For instance, the controller may be configured to obtain (or receive) audio data (as an analog or digital audio signal) that includes audio content, such as music for playback through the speaker 26. In some aspects, the controller may obtain audio data from memory 27, or the controller may obtain audio data from another device, such as the source device via the network interface 25. For instance, the output device may stream an audio signal from the source device (e.g., via the BLUETOOTH connection) for playback through the speaker 26. The audio signal may be a single input audio channel (e.g., mono). In another aspect, the controller may obtain two or more input audio channels (e.g., stereo) for output through two or more speakers. In one aspect, in the case in which the output device includes two or more speakers, the controller may perform additional audio signal processing operations.
In one aspect, the volume control 19 may perform similar operations as the volume control 16 of the source device. For instance, upon receiving user input, the control 19 may adjust the (e.g., overall) volume of sound output by the (e.g., speaker 26 of the) output device 15. In some aspects, the volume control 19 may be used to adjust the volume at the source device, and as a result, upon receiving user input, the output device 15 may transmit a message (control) signal to the source device 14 indicating a user adjustment of the volume control 19. In particular, the message may include the resulting volume setting to which the user adjusted the volume control. In one aspect, the source device may use the signal to adjust the volume of one or more audio signals at the source device. More about adjusting the volume is described herein.
In one aspect, the volume control 19 may be a software volume control. In another aspect, the control 19 may be a “hardware” volume control, which may set (or adjust) volume settings for one or more hardware components (e.g., the digital-to-analog converter (DAC) 49 or the amplifier (AMP) 48, as shown in
In one aspect, the output device 15 may be an “active” or “powered” device, which draws power from an (external and/or internal) power source to power at least some of its components. For instance, the output device may be a wired or wireless headset, which when paired with (e.g., communicatively coupled via a wired and/or wireless connection to) the source device, draws power from an internal source (e.g., a battery storage) to power (at least) the AMP 48 for driving the speaker 26. An example of such an output device may be a pair of wireless earphones that may pair with the audio source device via any wireless protocol, such as BLUETOOTH protocol, as illustrated in
In another aspect, the output device 15 may be a “passive” or “non-powered” device, which is designed to draw power from an external power source, such as a companion device to which it is coupled. For example, the output device 15 may be wired in-ear headphones that may be coupled to a companion (or source) device (e.g., a multi-media device), which may drive one or more speakers of the output device. In which case, the source device to which the output device is coupled may perform the contextual adaptive volume control operations described herein.
In one aspect, one or more of the devices of the system 10 may be configured to perform (e.g., additional) audio signal processing operations based on one or more elements, such as one or more microphones and/or speakers, which are coupled to a device's respective controller. For instance, when the output device includes two or more “extra-aural” speakers, which are arranged to output sound into the acoustic environment rather than speakers that are arranged to output sound into a user's ear (e.g., as speakers of an in-ear headphone), the controller may include a sound-output beamformer that is configured to produce speaker driver signals which when driving the two or more speakers produce spatially selective sound output. Thus, when used to drive the speakers, the output device may produce directional beam patterns that may be directed to locations within the environment.
In some aspects, the controller 24 may include a sound-pickup beamformer that can be configured to process the audio (or microphone) signals produced by two or more microphones of the output device to form directional beam patterns (as one or more audio signals) for spatially selective sound pickup in certain directions, so as to be more sensitive to one or more sound source locations. In some aspects, the controller may perform audio processing operations upon the audio signals that contain the directional beam patterns (e.g., perform spectral shaping).
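As a non-limiting illustration of spatially selective sound pickup, the following sketch implements a delay-and-sum beamformer, a common textbook technique. The disclosure does not specify which beamforming method the controller uses, so this choice, the integer-sample steering delays, and the function name `delay_and_sum` are all assumptions.

```python
def delay_and_sum(mic_signals: list[list[float]], delays: list[int]) -> list[float]:
    """Delay each microphone signal by an integer number of samples and average.

    mic_signals: equal-length sample lists, one per microphone.
    delays: non-negative steering delay (in samples) for each microphone,
            chosen so that sound from the target direction lines up in time.
    """
    if len(mic_signals) != len(delays):
        raise ValueError("one delay per microphone is required")
    length = len(mic_signals[0])
    summed = [0.0] * length
    for signal, delay in zip(mic_signals, delays):
        for i in range(delay, length):
            summed[i] += signal[i - delay]
    return [value / len(mic_signals) for value in summed]


# A pulse reaches microphone 0 one sample before microphone 1.
mic0 = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
mic1 = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
steered = delay_and_sum([mic0, mic1], [1, 0])    # aligned: pulse adds coherently
unsteered = delay_and_sum([mic0, mic1], [0, 0])  # misaligned: pulse is smeared
```

With the steering delays matched to the arrival-time difference, the pulse is preserved at full amplitude, whereas without steering it is spread over two samples at half amplitude.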
In one aspect, operations performed by the controllers may be implemented in software (e.g., as instructions stored in memory and executed by either controller) and/or may be implemented by hardware logic structures as described herein.
In another aspect, at least some of the operations performed by the system 10 as described herein may be performed by the source device 14 and/or by the output device 15. For instance, the source device may include two or more speakers and may be configured to perform sound-output beamformer operations. In another aspect, at least some of the operations may be performed by a remote server that is communicatively coupled with either device, for example over the network (e.g., Internet).
In one aspect, at least some elements of the source device 14 and/or the output device 15 may be integrated into (or a part of) the respective device. For example, when the output device is on-ear headphones, the microphone and the speaker may be a part of at least one earcup of the headphones that is placed on a user's ear. In another aspect, at least some of the elements may be separate electronic devices that are communicatively coupled to the device. For instance, the display 131 may be a separate device (e.g., being a display monitor or television) that is communicatively coupled (e.g., wired or wirelessly connected) with the source device to receive image data for display. As another example, the camera 18 may be a part of a separate electronic device (e.g., a webcam) that is coupled to the local device to provide captured image data.
The input audio source 132 may include a programmed processor that is running a media player software application and may include a decoder that is producing an input audio signal as digital audio input to the (e.g., controller 24 of the) output device. In one aspect, the programmed processor may be a part of the output device, such that the media player is executed within the device. In another aspect, the media player may be executed by (e.g., one or more programmed processors of) another electronic device, such as the source device 14 or the media content server 13. For instance, the source device may execute the media player and may (e.g., wirelessly) transmit audio content (as one or more audio signals) to the output device. In some aspects, the decoder may be capable of decoding an encoded audio signal, which has been encoded using any suitable audio codec, such as, e.g., Advanced Audio Coding (AAC), MPEG Audio Layer II, MPEG Audio Layer III, or Free Lossless Audio Codec (FLAC). Alternatively, the input audio source 132 may include a codec that is converting an analog or optical audio signal, from a line input, for example, into digital form for the controller. Alternatively, there may be more than one input audio channel, such as a two-channel input, namely left and right channels of a stereophonic recording of a musical work, or there may be more than two input audio channels, such as for example the entire audio soundtrack in 5.1-surround format of a motion picture film or movie. In one aspect, the input audio source 132 may provide a digital input or an analog input.
In one aspect, the audio content received from the input audio source may be any type of audio content. As described herein, the source may be a media player software application that may provide media audio content associated with the (executing) application. For example, the audio content may be a musical composition, a movie soundtrack, a podcast, etc. In another aspect, the input audio source may be any type of software application that may be configured to provide audio content to the output device. As an example, the source 132 may include a telephony software application that may be configured to provide telephony audio content as a downlink audio signal of a call that may be established between the output device 15 and another (e.g., remote) electronic device. As another example, the source may include a virtual personal assistant (VPA) software application that may be configured to provide VPA audio content that may include audible notifications of the VPA, which may be provided responsive to commands or requests of a user of the output device, for playback through the speaker 26 of the output device 15. In another aspect, the source 132 may be any type of media application that produces (provides) audio content for playback.
As shown, the controller 24 of the output device has several operational blocks for performing one or more digital signal processing operations, such as contextual adaptive volume control operations, as described herein. In particular, the controller 24 includes an environmental/playback characteristics estimator 31, a context & behavior engine 30, a volume model 32, a sound enhancer 46, a volume setting 35, a ramp 36, a scalar gain 37, and an audio renderer 133. In one aspect, at least some of these operational blocks may perform one or more operations while the output device 15 is in an “on-state” or in use by a user. For instance, in the case in which the output device is an in-ear headset, the operations described herein may be performed while the in-ear headset is worn by and is at least partially inserted into the ear of the user. In another aspect, the operations may be performed while audio content is streamed by the output device 15 (e.g., and is being played back through the speaker 26).
During playback of audio content, the scalar gain 37 may be configured to receive an input audio signal (or one or more signals) of the audio content from the input audio source 132, and apply a gain based on the volume setting 35, such that the audio content is played back at a desired volume level (associated with the volume setting). The controller 24 may be configured to play back the gain-adjusted audio signal by using the signal to drive the speaker 26. Thus, the output device may be configured to play back the audio content at a given (or target) volume setting. As described herein, the contextual adaptive volume control operations performed by the controller may adjust playback of the audio content by applying one or more gains to the audio signal based on a target volume setting.
In one aspect, the output device may be configured to operate in one of several audio processing modes. For instance, the output device may be user-configured (e.g., through an input device) to operate in a mode, or may be automatically configured by the output device based on one or more criteria. The audio processing modes include an active noise cancelation (ANC) mode, an ambient sound enhancement (ASE) mode, a combination of both modes, or a passive mode. More about these modes is described herein.
In one aspect, the sound enhancer 46 may be configured to cause the output device to operate in one of the audio processing modes. In particular, the enhancer may be configured to receive a (reference) microphone signal that includes sounds of the environment captured by the reference microphone 28 and/or receive an (error) microphone signal that includes sounds captured by the error microphone 29 inside or next to a user's ear while the output device is worn by the user. The enhancer may be configured to perform an ANC function, while the output device is in an audio processing mode that operates in the ANC mode, to cause the speaker 26 to produce anti-noise in order to reduce ambient noise from the environment that is leaking into the user's ears. In particular, the enhancer may use one or more of the microphone signals to implement a feedforward ANC, a feedback ANC, or a combination thereof. In one aspect, the ANC function may be a feedforward ANC that is configured to generate an anti-noise signal based on sound captured (e.g., by one or more reference microphones) in the acoustic environment. In another aspect, the ANC function may be a feedback ANC that is configured to generate an anti-noise signal based on sound captured by one or more error microphones. In some aspects, the ANC function may implement a combination of the feedforward and feedback ANC to produce the anti-noise. In one aspect, the ANC function performed by the sound enhancer 46 may be adaptive to changing sounds. In particular, the sound enhancer 46 may adjust one or more ANC gains, which may be scalar (wideband) gain blocks configured to raise (or lower) a level of produced anti-noise (signal) and/or may adjust one or more ANC filters based on changes to the environment (e.g., based on changes to the environmental noise level).
In another aspect, the sound enhancer 46 may be configured to perform an ASE function, while the output device is in an audio processing mode that operates in the ASE mode, in which sound played back by the output device may be a reproduction of ambient sound that is captured by one or more reference microphones. Such a function may be referred to as a “pass through” or “transparency” mode in which sounds of the environment are reproduced by the speaker 26 (as one or more ASE signals) in a “transparent” manner, e.g., as if the output device were not being worn by the user (when the output device is a headset). The sound enhancer processes at least one microphone signal captured by at least one reference microphone 28 and filters the signal through a transparency filter and/or transparency gain to produce one or more ASE signals, which may reduce acoustic occlusion due to the audio output device being on, in, or over the user's ear, while also preserving the spatial filtering effect of the wearer's anatomical features (e.g., head, pinna, shoulder, etc.). The filter also helps preserve the timbre and spatial cues associated with the actual ambient sound. In one aspect, the filter of the transparency function may be user specific according to specific measurements of the user's head. For instance, the sound enhancer may determine the transparency filter according to a head-related transfer function (HRTF) or, equivalently, head-related impulse response (HRIR) that is based on the user's anthropometrics.
In another example, the sound enhancer 46 may perform a combination of both ANC and ASE functions. Specifically, the enhancer may be configured to produce one or more anti-noise signals and/or one or more ASE filtered signals, which when used to drive the speaker 26 may attenuate at least some ambient noise and/or pass through one or more ambient noises. While operating in the passive mode, the enhancer may be configured to perform neither the ANC function nor the ASE function. In which case, the output device may provide only passive attenuation from the ambient environmental noise, while neither the ANC mode nor the ASE mode is active.
The audio renderer 133 may be configured to receive one or more audio signals from the input audio source 132 (or from the scalar gain 37), and render the signals into one or more driver signals for driving the speaker 26. In particular, the renderer may receive a gain-adjusted audio signal from the scalar gain 37 that may adjust the signal according to the volume setting (e.g., adapted volume setting) of the output device. In one aspect, the renderer may receive one or more anti-noise signals and/or one or more ASE signals from the enhancer 46, and may render the signals for playback through the speaker 26. In another aspect, the renderer may combine the signals from the enhancer 46 and from the scalar gain 37 (e.g., via matrix multiplication), and may render the combined signals.
In some aspects, the audio renderer 133 may perform spatial audio rendering operations in order to spatially render one or more audio signals. For example, the renderer may apply spatial filters (e.g., HRTFs) that are personalized for the user of the audio system in order to account for the user's anthropometrics. In another aspect, the spatial filters may be default filters. As a result, the renderer is configured to produce spatial audio signals (e.g., binaural audio signals), which when outputted through speaker 26 produce a 3D sound (e.g., giving the user the perception that sounds are being emitted from a particular location within an acoustic space).
The environmental/playback characteristic estimator 31 may be configured to determine (or estimate) environmental characteristics of the output device 15 and/or playback characteristics of the output device. In one aspect, playback characteristics may be associated with features or conditions of the output device 15 during audio playback. In one aspect, characteristics may indicate user context in which the user is using the output device 15. In one aspect, environmental characteristics may be measured (or estimated) features of or conditions within the environment of the output device 15. In one aspect, the characteristics may be determined while the output device is in an “on-state” in which the output device may be used by a user of the system 10. For instance, when the output device is a headset, it may be in an on-state when worn on the user's head, and/or performing one or more operations, such as an ANC function and/or playing back audio content. In another aspect, the device may be in an on-state when the device is active (powered) or performing one or more operations, such as playing back audio content. In another aspect, the estimator 31 may determine at least some characteristics while the output device is in an “off-state,” such as while the device is off the user's head, but may remain powered. In this case, characteristics may be determined while the output device is housed within case 38.
In one aspect, one or more environmental characteristics may be determined based on sensor data from one or more sensors, such as sensors 45 and/or one or more microphones 28 and 29. For instance, environmental characteristics may be determined based on an acoustic analysis of one or more microphone signals captured by one or more reference microphones and/or one or more error microphones. For example, the estimator 31 may be configured to determine the environmental noise level of ambient noise from within an acoustic environment in which the output device and/or source device are located. In particular, the estimator may be configured to receive a microphone signal captured by microphone 28, where the microphone signal may include one or more ambient noises within the acoustic environment. The estimator may use the microphone signal to measure (or estimate) an environmental noise level of the acoustic environment (e.g., as a dB sound pressure level (SPL) value). In one aspect, the environmental noise level may be an A-weighted sound pressure level value (e.g., as a dBA value).
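As a non-limiting illustration of measuring an environmental noise level from a microphone signal, the following sketch computes an unweighted level in dB SPL from the root-mean-square of a block of samples. The calibration constant `FULL_SCALE_DB_SPL` (the sound pressure level assumed to correspond to a full-scale signal) and the function name are hypothetical; an actual device would rely on the measured sensitivity of its microphone.

```python
import math

# Hypothetical calibration: SPL that a full-scale (RMS = 1.0) signal represents.
FULL_SCALE_DB_SPL = 120.0


def noise_level_db_spl(samples: list[float],
                       full_scale_db_spl: float = FULL_SCALE_DB_SPL) -> float:
    """Estimate an unweighted noise level (dB SPL) from microphone samples."""
    if not samples:
        raise ValueError("at least one sample is required")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence has no measurable level
    # 10*log10 of the mean square equals 20*log10 of the RMS amplitude.
    return full_scale_db_spl + 10.0 * math.log10(mean_square)


level = noise_level_db_spl([0.1, -0.1, 0.1, -0.1])  # RMS of 0.1 is -20 dBFS
```

Under the assumed calibration, a signal with an RMS amplitude of 0.1 sits 20 dB below full scale. Producing a dBA value would additionally require applying A-weighting across frequency before summation.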
In another aspect, the environmental noise level may include (or be) an “in-ear” or “headset” noise level estimate that may refer to the amount of noise exposure perceived by the user, while wearing the output device. In particular, when the output device 15 is an in-ear headset, the in-ear noise level may be the amount of noise that leaks from the environment into the user's ear. In one aspect, this environmental noise level may be captured by the error microphone. The estimator 31 may receive a microphone signal captured by the error microphone 29, and may use the microphone signal to estimate the in-ear noise level (e.g., as a dBA value).
In another aspect, the estimator 31 may estimate the in-ear noise level based on the reference microphone signal captured by the reference microphone 28 and based on the audio processing mode in which the output device 15 may be operating. In particular, the estimator may use an estimated amount of attenuation due to active and/or passive attenuation by the output device due to the mode in which the output device is operating and use the reference microphone to determine the in-ear noise level. In particular, the estimator may determine the audio processing mode in which the output device may be operating, and may determine the environmental noise level as the in-ear noise level based on the microphone signal and the audio processing mode.
In one aspect, the estimator 31 estimates the in-ear noise level by determining an amount of attenuation due to the audio processing mode that the sound enhancer 46 is operating. For example, the estimator may determine an amount of attenuation based on one or more ANC and/or ASE characteristics and/or based on the passive attenuation due to the output device. For instance, when ANC is active, the estimator may determine the attenuation level based on one or more ANC filter characteristics and/or ANC gain that is being applied by the sound enhancer while the ANC function is active. Similarly, when the ASE function is active, the estimator may determine an amount of attenuation due to the ASE function based on one or more ASE filters and/or one or more ASE gains that are being applied to one or more reference microphone signals. In one aspect, the estimator may perform a table lookup into a data structure that associates attenuation levels with ANC and/or ASE characteristics, described herein.
As described herein, the estimator 31 may determine a passive attenuation level based on the coupling of the output device 15. In which case, the estimator may perform a fit estimation process to determine the quality of the coupling of the output device to the user. In particular, when the output device is a headset, the estimator may perform a fitting process to determine how leaky the output device is (e.g., due to openings between the headset and a portion of the user's ear or head, due to the headset not providing an optimal fit). For instance, the estimator may receive the error microphone signal of the error microphone 29, and estimate (measure or determine) a transfer function (or frequency response) that represents a travel path between the speaker 26 and the error microphone 29 (e.g., as the output device plays back audio content or a test signal). In another aspect, the transfer function may be determined as part of an active ANC function. For instance, the controller may be configured to perform adaptive feedback ANC to adapt an ANC filter according to an estimate of a secondary path (or S-path) transfer function that represents the travel path between the speaker and the microphone, which the ANC function may use to adjust an adaptive digital filter used to produce the anti-noise signal. In one aspect, the S-path transfer function may be estimated, while the ANC function is activated (e.g., based on user input received by the output device), or may be estimated as part of the ANC function, but while the ANC function is not generated anti-noise. In another aspect, the estimator may determine a transfer function between the reference microphone and the error microphone, and may use this transfer function to determine the fit of the output device.
With the attenuation levels associated with the ANC/ASE functions and/or the passive attenuation determined, the estimator may be configured to determine (and produce) the in-ear noise level based on the sum of the attenuation level(s) and the measured sound level of the reference microphone signal(s).
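The table lookup and summation described above may be illustrated with a minimal sketch. The table contents, the mode names, and the function name are hypothetical and chosen only for illustration; attenuation is expressed here as a negative gain in dB so that the in-ear level is a plain sum, which is an assumption about sign convention.

```python
# Hypothetical lookup table: audio processing mode -> attenuation in dB.
# Values are illustrative only and are written as negative gains.
ATTENUATION_DB = {
    "anc": -25.0,
    "ase": -5.0,
    "off": 0.0,
}


def in_ear_noise_level(reference_level_db, mode, passive_attenuation_db):
    """Estimate the in-ear noise level (dB) from the reference microphone level.

    passive_attenuation_db is also a negative gain, e.g. -10.0 for a good fit.
    """
    active_attenuation_db = ATTENUATION_DB[mode]
    # Sum of the measured level and the attenuation level(s).
    return reference_level_db + active_attenuation_db + passive_attenuation_db
```

For example, a reference level of 80 dB with the hypothetical ANC entry and 10 dB of passive attenuation yields an estimated in-ear level of 45 dB.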
In one aspect, the estimator 31 may determine the in-ear noise level after the sound enhancer 46 performs the dynamic adaptive ANC or ASE functions to produce anti-noise through the speaker 26. For example, as described herein, the output device may be an in-ear headset, where sound at the user's ear may include noise that leaks into (through the coupling of the in-ear headset and the user's ear) the user's ear canal and sound produced by the speaker 26. In which case, when the ANC function is performed, the perceived noise by the user may be reduced. As a result, in order to determine a more accurate in-ear noise level measurement, the estimator 31 may receive a microphone signal from the reference microphone after the speaker produces (or plays back) the anti-noise. The anti-noise cancels out at least some of the environmental noise, thereby resulting in less perceived noise by the user. In which case, the estimator may determine the in-ear noise level from the reference microphone to account for canceled noise at the user's ear based on the produced anti-noise, as described herein (e.g., based on the ANC filter(s) and/or ANC gain(s) used to cancel out the noise that leaked into the user's ear).
As described thus far, the estimator 31 may determine the in-ear noise level based on the reference microphone signal and an attenuation level due to active ANC/ASE functions and/or the coupling of the output device to the user. In which case, the attenuation may be frequency dependent, such that it accounts for different attenuation effects across one or more frequency bands. In particular, the estimator may determine, for each frequency band of several bands across a frequency range, one or more attenuation levels based on the audio processing mode, such as based on the characteristics described herein. For instance, the attenuation level may include several attenuation levels (e.g., in dB), each level associated with a different frequency band within a range of frequency bands. In one aspect, the in-ear noise level may be produced (e.g., as an A-weighted value) based on the reference microphone signal, such as one or more noise levels across one or more frequency bands and the determined attenuation levels for those respective frequency bands.
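As a minimal sketch of the frequency-dependent case, the per-band levels may be attenuated, A-weighted, and combined on a power basis into a single value. The function name and the choice of octave bands are assumptions made for illustration; the A-weighting corrections are the standard values at those octave-band center frequencies, and attenuation is again written as a negative gain.

```python
import math

# Standard A-weighting corrections (dB) at octave-band center frequencies (Hz).
A_WEIGHTING_DB = {125: -16.1, 250: -8.6, 500: -3.2, 1000: 0.0, 2000: 1.2, 4000: 1.0}


def a_weighted_in_ear_level(band_levels_db, band_attenuation_db):
    """Combine per-band reference levels and attenuations into one dBA value.

    Both arguments map a band center frequency (Hz) to a value in dB;
    attenuation values are negative gains (an assumed convention).
    """
    total_power = 0.0
    for freq, level_db in band_levels_db.items():
        in_ear_db = level_db + band_attenuation_db[freq] + A_WEIGHTING_DB[freq]
        # Bands are summed as powers, not as decibel values.
        total_power += 10.0 ** (in_ear_db / 10.0)
    return 10.0 * math.log10(total_power)
```

Two bands that each contribute 50 dB after attenuation and weighting combine to roughly 53 dBA, since doubling the power adds about 3 dB.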
In one aspect, the estimator 31 may take into account audio playback by the output device when determining the in-ear noise level. In particular, during audio playback, the in-ear noise level may include an output sound level of the audio playback in combination with a noise level of noise that leaks into the user's ears. In which case, the estimator may account for the audio playback and determine the noise level as representing only (or the majority of) the noise from the environment, which may not have been canceled by the anti-noise produced by the controller 24, as described herein, that leaks into the user's ears. In some aspects, to account for the audio content, the estimator may subtract the content level of the audio content from the in-ear noise level. In another aspect, the controller may account for the audio playback by combining the sound level of the playback with the estimated noise level. As a result, the in-ear noise level may be a combination of the audio playback and the environmental noise that leaks into the user's ears, thus being an aggregate noise exposure level value perceived by the user. In which case, the in-ear noise level may be a sound output level (dBA) at the user's ear.
In one aspect, the estimator 31 may determine the environmental noise level based on one or more audio samples of the microphone signals captured by the reference microphone 28 and/or the error microphone 29. For instance, each audio sample may represent a portion of a microphone signal over a period of time. The estimator may be configured to remove outlier samples from the captured samples. For example, the estimator may determine whether spectral content (energy level) of a sample exceeds a standard deviation of a median (or average) spectral content. If so, the sample may be used to determine at least one of the noise levels. In one aspect, the estimator may determine the noise levels upon determining that a group of one or more samples exceeds the standard deviation. In particular, upon determining that the group of samples exceeds a threshold number of samples, one or more noise levels may be determined based on the group of samples. If so, the estimator may process the samples to determine the noise levels, such as by applying one or more filters, such as a median filter, to the group of samples.
In one aspect, the estimator 31 may perform smoothing operations in order to filter out transient or unwanted sounds (e.g., sounds that may cause an inaccurate estimation of the environmental noise level). For instance, the estimator 31 may be configured to analyze one or more microphone signals of one or more of the microphones to identify sounds of the environment that may be captured by the microphones. For instance, the estimator may be configured to perform a spectral analysis upon the reference microphone signal captured by the reference microphone to detect a presence of sounds, such as wind noise or speech. For instance, the estimator may estimate a wind noise level (e.g., based on spectral content across one or more frequency bands) within the environment. In another aspect, the estimator may be configured to perform a speech recognition algorithm upon the microphone signals to detect speech captured from within the environment. In one aspect, the estimator may be configured to distinguish between the speech of the user of the output device (own voice) and speech of someone else. For example, the estimator may receive an accelerometer signal from an accelerometer of the output device, and may be configured to determine whether the user is speaking based on the accelerometer signal. In another aspect, the estimator may determine whether the user is speaking based on a comparison between the microphone signals and the accelerometer signal (e.g., by comparing one or more frequency bands). In which case, the estimator may remove or filter out audio samples that may be associated with the transient sounds.
In another aspect, the estimator 31 may perform a crosstalk analysis to determine whether the reference microphone signal is capturing audio content that is being played back through the speaker of the output device. For instance, the estimator may analyze the reference microphone signal with respect to an audio signal used to drive the speaker (e.g., to determine whether there is a threshold of correlation). If so, the estimator may filter out audio samples that may include crosstalk.
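One way to picture the correlation test is a normalized (Pearson) correlation between a frame of the reference microphone signal and the corresponding frame of the speaker drive signal. The sketch below is an assumption about how such a test might look: the function names and the 0.8 threshold are hypothetical, and only zero-lag correlation is checked, so any acoustic delay between the two signals is ignored.

```python
import math


def correlation(x, y):
    """Zero-lag Pearson correlation of two equal-length frames."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0  # a constant frame carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def has_crosstalk(mic_frame, drive_frame, threshold=0.8):
    """Flag a frame whose correlation with the drive signal meets the threshold."""
    return abs(correlation(mic_frame, drive_frame)) >= threshold
```

Frames flagged in this way could then be excluded from the noise estimate, consistent with the filtering described above.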
The estimator 31 may be configured to estimate the content level (or signal level) of the audio content. For instance, the estimator may measure the (e.g., instantaneous) signal level of one or more audio signals of the audio content, in dBFS, for example. In another aspect, the estimator may determine the content level as a loudness of the audio content. For example, the estimator may determine the loudness as an A-weighted measurement of the audio content or as a loudness, K-weighted, relative to full scale (LKFS) measurement of the audio content. In one aspect, the measurement may be an average loudness over a duration of the audio content. In another aspect, the estimator may estimate a momentary (or instantaneous) loudness measurement. In which case, the estimator may determine the loudness of the audio signal over a period of time, such as one second, two seconds, three seconds, etc.
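As a minimal sketch of a signal level measurement in dBFS, the root-mean-square level of a block of samples may be expressed relative to full scale. The function name is hypothetical, samples are assumed to be normalized to the range -1.0 to 1.0, and the convention assumed here places a full-scale square wave at 0 dBFS. An LKFS measurement would additionally require K-weighting filters and gating, which this sketch does not implement.

```python
import math


def level_dbfs(samples):
    """RMS level of normalized samples (range -1.0 to 1.0) in dBFS."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence has no finite level
    # 10*log10 of the mean square equals 20*log10 of the RMS amplitude.
    return 10.0 * math.log10(mean_square)
```

Halving the amplitude of a signal lowers the result by about 6 dB.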
In one aspect, the estimator 31 may be configured to determine a signal-to-noise ratio (SNR) of the output device 15. In particular, the estimator may estimate the SNR of one or more audio signals of the audio content. The estimator may monitor spectral content of an audio signal that is being played back by the speaker 26 within certain time intervals, such as 10 milliseconds, and may monitor the in-ear noise level during the same period of time. The estimator may subtract the audio signal (e.g., its spectral content) from the in-ear noise level to determine the ratio of audio content with respect to noise that is being heard by the user. In another aspect, the estimator may determine the SNR by subtracting the audio signal from a microphone signal of the error microphone 29.
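The subtraction may be sketched as follows, under the assumption that the in-ear level is an aggregate of playback and leaked noise and that both inputs are levels in dB for the same time interval. The content is removed on a power basis to recover the noise, and the SNR is then the difference of the two levels in dB. The function name is hypothetical.

```python
import math


def snr_db(total_in_ear_db, content_db):
    """SNR (dB) of playback content relative to the residual in-ear noise."""
    total_power = 10.0 ** (total_in_ear_db / 10.0)
    content_power = 10.0 ** (content_db / 10.0)
    # Remove the playback contribution to leave the noise contribution.
    noise_power = total_power - content_power
    if noise_power <= 0.0:
        return math.inf  # no measurable noise beyond the content itself
    return content_db - 10.0 * math.log10(noise_power)
```

With content at 70 dB and an aggregate in-ear level of about 70.4 dB, the recovered noise is about 60 dB and the SNR is about 10 dB.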
The estimator 31 may receive sensor data from one or more sensors 45 that may provide characteristics of the environment and/or the output device. In one aspect, the sensor data may be from one or more sensors of the output device 15 and/or of the source device 14. For instance, the data may include location data (e.g., position and/or orientation within the environment) of the output device, which may be received from a location sensor, such as a Global Positioning System (GPS) sensor of the output device. In another aspect, the sensor data may include image data captured by one or more cameras, such as camera 18 of the source device 14. In one aspect, the controller 24 may be configured to perform an image recognition algorithm upon the image data to detect objects within the environment. For instance, the estimator 31 may be configured to identify objects and/or people within the environment using the image data. In one aspect, the estimator 31 may be configured to determine a location of the output device based on detected objects within the environment. In another aspect, the estimator 31 may use the image data to determine the position and/or orientation of the output device.
In one aspect, the estimator 31 may determine an activity being performed by a user of the output device. For instance, the estimator may determine activity based on sensor data, such as determining that the user is walking to work based on location data, image data, microphone data, etc. In another aspect, the estimator may determine activity based on other data, such as calendar data of a calendar application.
In another aspect, the estimator 31 may receive other data that may indicate or provide characteristics. For instance, the other data may be received from one or more other electronic devices, such as the source device 14. In another aspect, the other data may include metadata regarding the audio content and/or the environment in which the output device is located. For instance, the metadata may indicate a type of audio content that is playing (or is to be played) back by the output device, a content level of the audio content, or any other associated metadata. For instance, the source device may be configured to perform one or more of the operations of the estimator 31 and may be configured to transmit that information to the output device 15. In which case, the other data may include determined characteristics by the source device.
The estimator 31 may be configured to receive audio content from the input audio source 132, as one or more input audio signals, and may be configured to perform an audio content analysis of the audio content to determine one or more playback characteristics, such as a content level, metadata related to the audio content, and/or other audio characteristics. As an example, the estimator may perform a spectral analysis upon the audio content to determine at least some characteristics. For instance, the spectral analysis may identify descriptive information of audio content, such as a type of the audio content, a title of the audio content, a genre of the audio content (e.g., when the audio content is a musical composition), a duration of the audio content, a performer/writer of the audio content, etc. In particular, the estimator may use the spectral content to perform a table lookup into a data structure (stored within memory 27) that associates spectral content with descriptive information (metadata) associated with the audio content. The estimator may also determine one or more software applications to which the audio content belongs. In particular, a characteristic determined by the estimator may be an identification of a software application that is producing the audio content, such as a VPA application, a telephony application, or a media playback application. In another aspect, the estimator may determine (e.g., based on metadata contained with the audio data), whether the audio content is media content, telephony content, or VPA content. As another example, the estimator may determine time of day as a playback characteristic (e.g., based on internal or external clock data), since the user of the output device may have varying user preferences based on the time of day (e.g., listening to quieter audio content at night time as opposed to louder audio content during the day time).
In another aspect, the estimator 31 may determine other characteristics of the audio content. For example, the estimator may be configured to identify one or more sound sources within the audio content. The estimator may perform a spectral analysis, such as blind source separation, to identify one or more sound sources (locations and/or identification of the sound source, such as the type of sound associated with the sound source) within the audio content. In another aspect, the estimator 31 may determine characteristics based on metadata that may be received with the audio content. For instance, when the audio content is a song, audio data of the song may include metadata that indicates a title of the song and a duration of the song. In which case, the estimator may analyze the metadata to determine at least some of the characteristics of the content. Other playback characteristics may include other audio signal processing operations being performed by the controller 24, such as ambient sound enhancement operations, as described herein.
In one aspect, the estimator 31 may be configured to detect changes of one or more characteristics. For instance, the estimator may periodically estimate characteristics, such as the environmental noise level and the in-ear noise level for any changes, and determine whether a characteristic increases or decreases (e.g., above a threshold) from a previously estimated characteristic. As described herein, the controller 24 may adapt the volume setting of the output device based on detected changes by the estimator.
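A minimal sketch of the change detection follows, assuming the characteristic is a scalar level in dB and that a change counts only when it moves by more than a threshold from the previous estimate. The function name, the return values, and the threshold are hypothetical.

```python
def detect_change(previous_db, current_db, threshold_db):
    """Classify the change between two successive estimates.

    Returns "increase", "decrease", or None when the difference stays
    within the threshold.
    """
    delta = current_db - previous_db
    if delta > threshold_db:
        return "increase"
    if delta < -threshold_db:
        return "decrease"
    return None
```

For example, with a 6 dB threshold, a rise in the environmental noise level from 55 dB to 70 dB is reported as an increase, while a rise from 55 dB to 58 dB is not reported.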
The context and behavior engine 30 may be configured to receive data, such as one or more characteristics generated by the estimator 31 and/or changes to the volume setting 35 that may be based on user adjustments to the volume control 19 or based on adaptive volume operations of the volume model 32. The engine 30 may be configured to store the received data as historical context and/or behavior data (which may be referred to as “historical data”) 33 in the memory 27. In one aspect, “user context” may indicate how the user is using the output device, which may be based on one or more characteristics from the estimator 31. For example, a user context may be using the output device to listen to music while running through the park in the morning, which may be determined based on one or more characteristics provided by the estimator 31, such as location data, time of day, metadata associated with the music, environmental noise along the route the user is running, etc. “User behavior” may indicate user action and/or inaction based on the user context of the output device. Returning to the previous example, user behavior may represent one or more user adjustments to the volume while running through the park. User behavior may be the result of changes to one or more characteristics, such as fluctuations in environmental noise, thereby causing the user to adjust the volume setting. User behavior may indicate actions or inactions performed by the user, with respect to the context in which the device was used. As described herein, the user behavior may be based on whether the user increases or decreases the volume level based on the context of the device, such as whether the device is in a noisy environment or a quiet environment, based on environmental noise estimation. The engine 30 may store at least some of the received data, which indicates the context in which the output device 15 has been used by the user.
For instance, the data may indicate environmental data and playback data that may be gathered (e.g., periodically), as the user uses the output device (e.g., while the device is in an on-state).
In one aspect, the engine 30 may be configured to determine user context of the (user of the) output device based on one or more characteristics. User context may be determined based on one or more characteristics received from the environmental/playback characteristic estimator 31. For example, metadata of audio content that is being played back may indicate user context (e.g., whether the user is listening to loud content or whether the user is listening to content that requires their attention, such as a podcast). In one aspect, metadata of audio content may include descriptive information regarding audio content that is (currently) being played back, such as the type of audio content (e.g., whether a VPA notification, whether media player audio content, etc.), the title of the audio content, the genre of audio content, etc. In another aspect, other characteristics provided to the engine may include the location (e.g., based on location data received by the controller 24) at which the user is listening to the audio content. For example, the location may be an environment, such as a train station, a gymnasium, or a park. The received characteristics may indicate the context in which the output device is being used. For instance, the characteristics of the environment may indicate that the output device is being used in a quiet environment based on the location of the output device (e.g., location data indicating that the device is inside or at a library), the environmental noise level, and/or the in-ear noise level. As another example, the characteristics may indicate whether there are ambient noises or sounds within the environment, such as speech of the user of the output device or speech of another person within the environment and/or other sounds, such as traffic sounds, dogs barking, etc. As another example, the user context may be determined based on several characteristics. 
In which case, user context may be based on the type of audio content that is being played back and the location at which the user is listening to the content. Returning to the commuter example, the engine 30 may determine the context in which the user is using the output device as the user listening to a particular song (or genre of songs) while riding on a train to work.
In one aspect, the engine 30 may determine user context according to one or more characteristics. For example, the engine 30 may determine that the user of the output device 15 is listening to music while on a train commuting to work based on (e.g., historical) location data, audio content metadata, time of day, etc. Thus, the user context may indicate a specific description of the context in which the user is using the output device and/or listening to audio content.
In another aspect, user context may be defined with respect to one or more characteristics. For instance, rather than specifying a specific task or action being performed by the user while using the output device, user context may be determined as one or more environmental and/or playback characteristics. For example, user context may include an environmental noise level, a type of audio content that the user is listening to while experiencing that noise level, a time of day (or period of time the user is listening to the audio content), location data, a current volume setting, etc.
In one aspect, user behavior may include user action and/or inaction. In particular, a user action may include the system 10 receiving user input via one or more input devices, such as the volume control 19 receiving user input (e.g., by pressing down a button of the volume control 19) to change the volume setting 35 to either increase or decrease the volume level of the system. In another aspect, the engine may store data based on other actions. For example, the user action may be a user adjustment via other methods, such as via a voice command that may be received by the controller 24 through one or more microphones.
In another aspect, user behavior may include an action performed by the user that is not explicitly received to change the volume setting of the output device. For instance, an action may be user movement within the environment of the output device, such as moving from one location in a room, which may be loud, to another location in the same room, being a quiet corner. As another example, the user movement may be between two different environments, such as moving from a quiet restaurant onto a noisy sidewalk adjacent to a busy intersection. Although the movement may not change the volume, the resulting change in noise (e.g., a reduction in noise when moving to the quiet corner) may change (e.g., increase) the perceptibility of the audio content. Movement may be determined based on IMU data or GPS data (as the other data) that is received from an IMU or a GPS sensor, respectively, of the system 10. User inaction, on the other hand, may relate to the system 10 not receiving user input and/or detecting a non-explicit action. More about user inaction is described herein.
The engine 30 may be configured to store at least some data according to a determined user context of the system 10 and/or according to user behavior. For example, upon determining the user context as the user listening to music while commuting on a train to work, the engine 30 may associate and store volume settings (e.g., adapted volume settings) with respect to the determined context in the historical context and behavior data 33. In another aspect, the engine may store or keep track of user context as one or more characteristics and/or data. For example, upon determining the user context as the user listening to music while commuting, the engine 30 may store characteristics used by the engine to make such a determination, such as location data, environmental noise level, audio content metadata, volume settings, etc., as the historical context and behavior data 33. In one aspect, the engine may group this data based on the determined user context. Thus, the stored historical data may indicate how a user listens to audio content as the user uses the output device by including volume settings and volume adjustments with respect to one or more characteristics.
In another aspect, the engine 30 may store data according to user behavior. For example, the engine may store volume settings which may be user-defined based on user action/inaction (and/or model-defined, as described herein) with respect to user context, e.g., one or more characteristics, such as environmental noise levels and in-ear noise levels that may be measured with respect to the volume settings. In which case, the engine may store volume settings at which the user listens to audio content with respect to the noise levels of the environment in which the user is located while listening to the audio content and in-ear noise levels. In particular, the engine may store volume settings as the user's behavior with respect to various noise levels in which the user is listening to audio content through the output device. The engine 30 may also store other characteristics that may provide insight into the user context, such as the content level of the audio content that may be played back by the output device. In another aspect, the engine may store other information, such as whether the ANC function (and/or which ANC function, such as feedback ANC) is active, whether the ASE function is active, and whether environmental sounds, such as wind noise (and/or the wind noise level) or speech, are present within the environment.
In another aspect, the engine 30 may store historical context and behavior data 33 based on changes to the user context and/or changes to one or more characteristics. Continuing with the previous example in which the user is listening to music while commuting on a train, the engine may store data as characteristics, such as environmental noise changes, playback changes, etc. In which case, the engine may store the changes and other changes, such as volume setting changes, which in this case may be volume setting increases based on the environmental noise increasing.
In one aspect, the historical context and behavior data 33 may include a data structure that associates user behavior with user context. The data structure may include a table with one or more rows, where each row may include one or more volume settings associated with a user context. For example, a row may be associated with the user listening to music while commuting on a train, where the row includes one or more volume settings associated with the user listening to music at that location. In another aspect, each row may associate a current (or changed) volume setting with at least some of the received data, such as the environmental noise level, the in-ear noise level, etc., which gives context to the user. In another aspect, the engine may store volume settings associated with changes to one or more characteristics, as described herein. In particular, each volume setting associated with the user context stored in the row may be associated with (e.g., changes to) one or more characteristics. For instance, each volume setting may be associated with a particular environmental noise level. In another aspect, the data 33 may store volume settings of the output device associated with one or more characteristics. As an example, for each environmental noise level, of several noise levels ranging from a low noise level (e.g., 20 dB) to a high noise level (e.g., 100 dB), the data 33 may include a rolling list of one or more volume settings that have been set by the user and/or by the system or model (as described herein) over a period of time when the output device is exposed to that specific environmental noise level. In another aspect, each row of the table may be associated with a (different) range of one or more environmental noise levels, where each range may be associated with one or more volume settings that have been set within that range.
For instance, a range may include from 45 dB to 55 dB, where the range may include several volume settings of the output device within that 10 dB range. In another aspect, the engine may store other characteristics within the data 33.
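The table of noise level ranges and rolling lists may be sketched as a mapping from the lower edge of each range to a bounded queue of volume settings. The class name, the method names, the 10 dB range width, the bin edges, and the list length are all assumptions made for illustration; the edges are chosen so that one row spans 45 dB to 55 dB.

```python
from collections import deque


class VolumeHistory:
    """Rolling lists of volume settings, one list per noise level range."""

    def __init__(self, low_db=15, high_db=105, width_db=10, max_entries=50):
        self.low_db = low_db
        self.width_db = width_db
        starts = list(range(low_db, high_db, width_db))
        self.last_index = len(starts) - 1
        # deque(maxlen=...) discards the oldest entry once the list is full.
        self.rows = {start: deque(maxlen=max_entries) for start in starts}

    def _row_key(self, noise_db):
        index = int((noise_db - self.low_db) // self.width_db)
        index = min(max(index, 0), self.last_index)  # clamp out-of-range levels
        return self.low_db + index * self.width_db

    def record(self, noise_db, volume_setting):
        self.rows[self._row_key(noise_db)].append(volume_setting)

    def settings_for(self, noise_db):
        return list(self.rows[self._row_key(noise_db)])
```

A bounded queue is used here so that older settings age out automatically, which is one way to realize a rolling list; a timestamp-based expiry would be an alternative.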
The historical context and behavior data 33 may be “historical” in that the engine 30 may be configured to store at least some of the received data, while the output device is in use, for a period of time, such as a day, a week, a month, etc. In which case, the historical data 33 may be personalized for a user of the output device 15. In particular, the historical data may indicate the context and the behavior of a user (or owner) of the output device, as the user uses the device. As described herein, the culmination of this data may indicate user preferences in various contexts. For example, the output device may collect data for a period of time, where the data may indicate the user's behavior in various contexts, such as how loud (e.g., on average) the user listens to audio content in a quiet environment (e.g., associated with a particular environmental noise level), or how loud the user listens to music while commuting on a train. Once a sufficient amount of data is collected, the controller 24 may use the data to train the volume model 32 to perform adaptive volume control operations that may be personalized for the user. Returning to the previous example, the trained model may then adapt volume when a determination is made that the user is commuting on the train. As described herein, the model may use the historical data to determine the volume level, while the user listens to audio content in another quiet environment. More about the volume model is described herein.
In one aspect, the engine may store the historical data 33 periodically. For example, the engine may store the data in memory 27, every second, minute, hour, etc. In which case, the engine may store data that is received at that time. In which case, each time the engine stores data, it may associate the data together, such as storing the data in a common table (data structure) and/or by associating the data with a timestamp. As a result, the historical data may keep track of the context of the output device and the user's behavior at any given time.
In another aspect, the engine 30 may store data based on user behavior. In particular, data may be stored based on a user action (or input), such as when a user changes the volume setting 35 (e.g., in response to receiving a user adjustment to the volume control 19). In which case, at each volume change, the engine may receive at least some data, and store it together in the data 33. As a result, the historical data may indicate the context (e.g., characteristics) associated with a user changing the volume. This may provide the system insight into how the user prefers the volume level when certain characteristics are present, such as the in-ear noise level. In another aspect, once a determination is made to store data, the engine 30 may gather data for a period of time, and store a median or average value based on the gathered data. For example, once a volume setting change occurs, the engine may receive a plurality of environmental noise level estimates within a time window, such as the next five seconds. The engine may calculate a median noise level, and store that median level. In one aspect, when there are multiple consecutive user changes to the volume control, the engine may store data associated with or received at the last volume change.
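The windowed median may be sketched as follows. The function name and the record layout are hypothetical; the sketch assumes that volume changes and noise estimates arrive as time-stamped pairs, that the change events passed in form one run of consecutive adjustments sorted by time, and that only the last adjustment in the run is stored.

```python
from statistics import median


def summarize_volume_change(change_events, noise_estimates, window_s=5.0):
    """Build one history record for a run of consecutive volume changes.

    change_events: list of (timestamp_s, volume_setting), sorted by time.
    noise_estimates: list of (timestamp_s, noise_db).
    """
    # Only the last change of the run is kept.
    last_time, last_setting = change_events[-1]
    window = [db for t, db in noise_estimates
              if last_time <= t <= last_time + window_s]
    if not window:
        return None  # nothing was measured during the window
    return {"volume_setting": last_setting, "median_noise_db": median(window)}
```

Using the median rather than the mean keeps a single brief loud event within the window from skewing the stored noise level.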
In one aspect, the engine 30 may store data based on user inaction. For example, the engine 30 may store data based on whether one or more characteristics change, while the user does not change the volume setting. In particular, if the user context changes, the engine 30 may begin storing data according to a newly determined user context. In one aspect, if the environmental noise level increases above a threshold, the engine 30 may store data associated with that change, even though the user may have not changed the volume setting. In particular, the engine 30 may wait a threshold period of time (e.g., 10 seconds) after the change in the environmental noise level has exceeded the threshold to determine whether the user has taken an action, such as increasing the volume. If, however, no action is received after the threshold period of time, the engine 30 may store data. This may indicate that the user is satisfied listening to the audio content with a lower SNR. In another aspect, the engine may store data based on user action that is not explicitly received to change the volume setting, such as moving between environments, as described herein.
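The waiting period may be sketched as a check of whether any volume adjustment falls within a threshold period after the noise change. The function name is hypothetical, timestamps are assumed to be in seconds, and the 10 second default mirrors the example above.

```python
def is_user_inaction(noise_change_time_s, volume_change_times_s, wait_s=10.0):
    """Return True when no volume adjustment follows the noise change in time.

    volume_change_times_s: timestamps of user adjustments to the volume control.
    """
    deadline_s = noise_change_time_s + wait_s
    return not any(noise_change_time_s <= t <= deadline_s
                   for t in volume_change_times_s)
```

When the function returns True, the engine could store the current volume setting together with the new noise level as an instance of user inaction.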
As described thus far, the engine 30 may store the historical context and behavior data 33 that includes various characteristics, such as environmental noise levels (e.g., in-ear noise levels) with respect to volume settings that have been user-defined. In another aspect, the engine 30 may store other data described herein that indicate different contexts in which the user is using the output device as historical data. As one example, the engine may store (e.g., in one or more associated tables) a volume setting of the output device with respect to the type of audio content that is being played back, the location at which the user is playing back the audio content, and/or other estimated characteristics. Returning to the commuter example, the historical data may include volume settings of the output device for different types of audio content, such as audio content played back by a media player, audio content of a VPA application, etc., that are being played back while the user is riding the train. In another aspect, the engine may store volume settings according to any variation of characteristics as user context. As another example, the engine 30 may store volume settings within a particular environment when particular sounds are captured from within the environment, such as speech. In which case, the engine may store historical data 33 that indicates the volume setting of the output device while the user engages in a conversation with another person. As described herein, this historical data may be used to train the volume model 32 in order to adapt the volume setting of the output device when future contexts (similar to or the same as those stored in the historical data) occur.
As described thus far, the historical data stored by the engine provides user context and the user behavior while the user uses the output device. In one aspect, the engine may store, or group together, data based on a determined user context and/or user behavior. Specifically, the engine 30 may be configured to store at least some received data in memory 27, according to user context and/or user behavior. For instance, the engine 30 may be configured to determine the context (or conditions) in which the user is wearing or holding the output device (e.g., on the user's head) and/or may indicate the context in which the user is using the output device, such as conditions during which the user is using the output device to play back audio content.
In one aspect, the engine may store data in memory based on the determined context. For example, upon determining that there is traffic noise in a noisy environment, where the environmental noise level may be above a threshold, the engine may store data, along with user behavior (e.g., whether the user increased or decreased the volume setting) in a table in data 33. As another example, upon determining that the user is on a jog, based on location data and motion data, the engine may store the data along with the user's listening habits, such as volume settings along a path through which the user is jogging. In one aspect, aggregating context-like data together may be used by the volume model 32 to better estimate an adapted volume setting when the output device encounters a similar situation.
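As a non-limiting illustration, grouping stored data by determined context may be sketched as follows. The dictionary keys (`location`, `content_type`, `noise_db`, `volume_setting`) and the choice of a (location, content type) pair as the context key are assumptions made for illustration only.

```python
from collections import defaultdict


def group_by_context(samples):
    """Group (noise level, volume setting) pairs under a context key.

    Each sample is assumed to be a dict with hypothetical keys
    'location', 'content_type', 'noise_db' and 'volume_setting'.
    """
    groups = defaultdict(list)
    for sample in samples:
        key = (sample["location"], sample["content_type"])
        groups[key].append((sample["noise_db"], sample["volume_setting"]))
    return dict(groups)


history = [
    {"location": "train", "content_type": "media", "noise_db": 70.0, "volume_setting": 0.6},
    {"location": "train", "content_type": "media", "noise_db": 75.0, "volume_setting": 0.7},
    {"location": "jog", "content_type": "media", "noise_db": 60.0, "volume_setting": 0.5},
]
tables = group_by_context(history)
```

Each resulting group may then serve as the per-context table from which a volume setting is later estimated.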
As described thus far, the engine 30 may store received data while the output device is in an on-state. In another aspect, the engine 30 may store received data while the output device is in an “off-state”, such as while the device is stored within the case 38. Again, in the case of the output device being a headset, the device may be in an off-state when not worn on the user's head and/or when the headset is not performing one or more operations, such as playing back audio content. In the case in which the output device 15 is an in-ear headset, the controller 24 may be configured to produce data, as described herein, and store the data into memory as a user wears the in-ear headset.
The volume model 32, which may be stored in memory 27, may be configured to generate a target volume setting, which is adapted to a user context of the output device, based on the historical context and behavior data 33. In particular, the volume model 32 may be trained using at least some of the historical data 33 in order to optimally predict (or estimate) target volume settings based on future user context that may be detected based on received characteristics, such as changes to the environmental noise level within an environment of the output device, changes in listening location, etc. As a result, the historical data allows the volume model to be personalized for the user. In addition to being personalized for the user, the model may also be updated based on changes to user habits. For example, as the user changes listening habits, such as listening to audio content at lower volume settings at different locations, this data may be stored within the historical data, and periodically used to update the volume model. Thus, having the historical data stored in the device allows the volume model to dynamically update efficiently and effectively to changes in user habit and behavior.
In one aspect, the controller may abstain from using the volume model to determine a target volume setting until a sufficient amount of training data has been measured and stored in the historical data. In particular, the controller may determine (or gather) historical data, over a period of time, that indicates user behavior as user adjustments to the volume setting of the volume control 19 in one or more user contexts (e.g., based on environmental noise levels), while audio data is played back through the speaker, and store the data on the output device. In one aspect, the controller may gather and store data until a sufficient number of samples of characteristics are measured and stored. For example, the controller may gather a specific number (e.g., threshold) of samples of at least some characteristics, such as environmental noise levels with respect to user-defined volume settings. In one aspect, the threshold may be between 50 samples and 200 samples. In another aspect, the threshold may be preferably 100 samples. Upon gathering a sufficient amount of training data, the controller may generate (or train) the volume model using the data, thereby personalizing the model for the user, as described herein.
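As a non-limiting illustration, abstaining from the volume model until enough samples are stored may be sketched as follows. The function name `target_or_current` and the callable `model` interface are hypothetical; the default of 100 samples follows the example threshold given above.

```python
MIN_SAMPLES = 100  # example threshold from the text (range given: 50 to 200)


def target_or_current(history, current_setting, model, noise_db,
                      min_samples=MIN_SAMPLES):
    """Return the model's target only once enough samples are stored.

    'model' is assumed to be any callable mapping a noise level to a
    volume setting; until it can be trained, the current setting is kept.
    """
    if len(history) < min_samples:
        return current_setting
    return model(noise_db)
```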
The volume model 32 may be trained to adapt the volume setting 35 based on changes to one or more (e.g., environmental and/or playback) characteristics and/or changes to determined user context or behavior. For instance, the estimator 31 may detect a change to the environmental noise level (e.g., the in-ear noise level exceeding a threshold) based on one or more microphone signals, as described herein. This change may be a most recent in-ear noise level estimate. Upon detecting the change, the controller 24 may provide the last estimated characteristics, such as the in-ear noise level, the content level of the audio content that is being played back, and/or the last (or most recent) volume setting of the output device, to the volume model, which in response may produce a target volume setting as output. For example, the characteristics may be input into the model, which in response produces an adapted volume setting. In one aspect, the content level may be an average content level of audio content (e.g., a LKFS value) over at least a portion of audio content. In one aspect, the model may include other characteristics as input, which may be used by the model to produce a target volume setting as output.
As described herein, the model may be trained based on various contexts, such that the model may generate an optimal target volume setting once the output device is in a future context. Returning to a previous example, the volume model 32 may be trained using characteristics associated with the commuter on the train. Once trained, the volume model may generate a target volume setting that adapts the volume level of the output device for when the user of the output device 15 rides the train, or is in a similar context, such as riding any train. In which case, the volume model 32 may adapt volume settings upon detecting changes to one or more characteristics associated with the user context, such as detecting that the user is no longer on the commuter train (based on location data).
In one aspect, the volume model may be configured to generate one or more volume curves 34 based on the data 33 and store the curves in memory 27, and may use the curves to determine a target volume setting. Specifically, the model may be trained using the data 33 to output the volume curve(s) 34, each of which may be a function of volume setting with respect to user context as at least one characteristic. In particular, each volume curve may correspond to a user context (one or more characteristics), such as commuting on a train, in which the user uses the headset and user behavior with respect to a volume control while the user uses the headset. In one aspect, user context may be various (one or more) environmental noise levels (e.g., with respect to a user context, such as commuting on a train). In which case, the volume model may be configured to retrieve, from memory 27 of the output device 15, historical data 33 that may indicate user behavior of at least some past user adjustments to the volume control (e.g., that may be collected by the output device over a period of time) with respect to various environmental noise levels, such as over a past period of time, and may determine a target volume setting as output of a machine learning (ML) model responsive to the historical data as input. For instance, the ML model may generate a volume curve based on the input that may indicate one or more volume settings with respect to various noise levels using the historical data, and may use the generated volume curve to determine (or select) a volume setting, as described herein.
The volume model 32 may use the generated volume curves to determine the target volume setting. For instance, as described herein, the data 33 may include a data structure that associates one or more environmental noise levels with one or more historic volume settings as a volume curve. The model may retrieve one of the volume settings that may be associated with the changed environmental noise level. In particular, the volume model may retrieve a target volume setting that corresponds to the changed environmental noise level along a retrieved volume curve 34. As a result, the volume model may retrieve a target volume setting from the curve that is associated with an environmental noise level, which may be a current (or last) estimated environmental noise level by the estimator 31.
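As a non-limiting illustration, retrieving a target volume setting along a volume curve may be sketched as follows. The curve is assumed, for illustration, to be stored as points sorted by noise level with distinct noise values; linear interpolation between points and clamping at the ends are assumptions, since the disclosure does not specify how intermediate noise levels are handled.

```python
from bisect import bisect_left


def lookup_volume(curve, noise_db):
    """Return the volume setting on the curve for a given noise level.

    curve: list of (noise_db, volume_setting) pairs sorted by noise_db,
    with distinct noise_db values (assumed storage format).
    """
    noise_points = [point[0] for point in curve]
    if noise_db <= noise_points[0]:
        return curve[0][1]           # clamp below the first point
    if noise_db >= noise_points[-1]:
        return curve[-1][1]          # clamp above the last point
    i = bisect_left(noise_points, noise_db)
    x0, y0 = curve[i - 1]
    x1, y1 = curve[i]
    fraction = (noise_db - x0) / (x1 - x0)
    return y0 + fraction * (y1 - y0)


# Hypothetical curve: quieter environments map to lower volume settings.
example_curve = [(40.0, 0.1), (50.0, 0.2), (60.0, 0.5), (70.0, 0.7)]
target = lookup_volume(example_curve, 60.0)
```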
In one aspect, each of the volume curves may be generated by the volume model 32 to account for user behavior in various user contexts of the output device. Specifically, the volume model may be configured to generate different volume curves based on user behavior associated with environmental and/or playback characteristics stored within the historical data 33. For example, the volume model 32 may generate a different volume curve for different types of audio content, such as having a media curve for audio content played back by a media player application, a telephony curve of audio content during a call (e.g., a voice-only call, a voice over Internet protocol (VoIP) call, etc.) and having a VPA curve for playing back notifications of a VPA application. In which case, the volume model may retrieve data 33 from the memory 27 that may be associated with the different types of audio content and user behavior, such as various previous volume settings by the user while the user listened to the audio content and a measured environmental noise level associated with the previous volume settings.
In another aspect, the volume curves may be more granular with respect to taking into account user behavior/context. For instance, the volume model may generate a volume curve for adapting the volume setting while the user plays back a particular movie soundtrack. In which case, the movie soundtrack volume curve may be a function of volume setting with respect to environmental noise level, which may account for user volume settings while listening to a movie soundtrack in environments with various noise levels. As another example, along with different volume curves for different types of audio content, the volume model may generate volume curves based on the location of the user while listening to the movie soundtrack. Returning to the commuter example, the volume model may use historical context and behavior data 33 of the user while the user rode the train (e.g., the volume setting of the output device, the location of the user, and the environmental noise level), and may produce a volume curve to define the volume setting of the output device for listening to the movie soundtrack, as the user rides the train to work.
In one aspect, the volume model 32 may be configured to generate one or more curve parameters in order to produce the volume curves 34. In particular, the volume model may use historical data 33, which may indicate past (user-defined) volume settings with respect to environmental noise levels, for example, to determine the curve parameters. Curve parameters may include volume setting thresholds that bound a volume curve within a volume setting range. The thresholds may include a maximum threshold that may be a maximum volume setting and a minimum threshold that may be a minimum volume setting to which the volume control may be adapted. In one aspect, the model may generate the max/min thresholds based on the highest and lowest, respectively, volume setting according to user context. For instance, one or both thresholds may be the highest and lowest volume setting set by the user of the output device across the one or more environmental noise levels. In another aspect, the thresholds may be determined based on a median volume setting across one or more environmental noise levels. In some aspects, a threshold may be determined based on an average of one or more volume settings across one or more environmental noise levels. In another aspect, the max/min thresholds may be determined based on other context, such as environmental noise levels for one or more locations, the content level of the audio content, etc.
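As a non-limiting illustration, the maximum and minimum thresholds may be sketched as the highest and lowest user-defined volume settings in the historical data, with a candidate target clamped to that range. The function names and the (noise level, volume setting) pair format are hypothetical; the median-based and average-based variants described above are not shown.

```python
def curve_bounds(history):
    """Return (minimum, maximum) volume settings the user has chosen.

    history: list of (noise_db, volume_setting) pairs (assumed format).
    """
    volumes = [volume for _, volume in history]
    return min(volumes), max(volumes)


def clamp_to_bounds(target, bounds):
    """Keep a candidate target setting within the curve's thresholds."""
    low, high = bounds
    return max(low, min(high, target))


bounds = curve_bounds([(45.0, 0.2), (60.0, 0.5), (75.0, 0.8)])
```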
The volume model may generate an average SNR of the audio content as one or more slopes of a volume curve. A slope of the curve may indicate how the volume setting changes with respect to changes to characteristics, such as the environmental noise level. To generate a slope, the volume model may average volume settings associated with one or more neighboring environmental noise levels. In one aspect, the slope may be defined using a portion of the historical data. For example, the volume model may generate a minimum volume setting based on historical volume settings across one or more low environmental noise levels, and may generate a maximum volume setting based on historical volume settings across one or more high environmental noise levels. To generate the slope, the volume model may average historical volume settings across one or more environmental noise levels that are between the low and high environmental noise levels.
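As a non-limiting illustration, a slope may be estimated from the historical volume settings that fall between the low and high environmental noise levels. The text above describes averaging volume settings; the sketch below substitutes an ordinary least-squares fit over the mid-range samples, which is an assumption chosen for illustration and not the method of the disclosure. The function name `fit_slope` is hypothetical.

```python
def fit_slope(samples, low_db, high_db):
    """Least-squares slope of volume setting versus noise level.

    samples: list of (noise_db, volume_setting) pairs (assumed format).
    Only samples with low_db <= noise_db <= high_db are used.
    Returns 0.0 when a slope cannot be estimated.
    """
    mid = [(n, v) for n, v in samples if low_db <= n <= high_db]
    if len(mid) < 2:
        return 0.0
    mean_n = sum(n for n, _ in mid) / len(mid)
    mean_v = sum(v for _, v in mid) / len(mid)
    variance = sum((n - mean_n) ** 2 for n, _ in mid)
    if variance == 0:
        return 0.0
    covariance = sum((n - mean_n) * (v - mean_v) for n, v in mid)
    return covariance / variance


# Hypothetical data: volume setting rises 0.03 per dB of noise.
slope = fit_slope([(50.0, 0.2), (55.0, 0.35), (60.0, 0.5), (90.0, 1.0)], 50.0, 60.0)
```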
In another aspect, the volume model may determine the adapted volume setting based on user context of the output device received from the engine 30. For example, the engine 30 may be configured to determine the user context (e.g., a current user context) of the output device in which the user is currently using the output device, such as a content type of audio content that is being played back by the output device. As another example, the engine 30 may determine a more specific user context, such as the user listening to music on a train while commuting based on one or more characteristics and/or other data, as described herein. The volume model 32 may receive current user context and/or user behavior from the engine 30, and may determine an adapted volume setting based on the current user context and/or user behavior. As an example, data from the engine 30 may indicate that the audio content that is being played back is of a particular type and that the user is at a particular location. Using this data, the volume model 32 may select a previously generated volume curve 34 that may be associated with that user context. In particular, the selected volume curve may have been generated by the ML model to indicate the volume setting with respect to one or more characteristics, such as environmental noise, as the user plays back the type of audio content at the particular location. The volume model may determine, using the selected curve, the volume setting associated with at least one characteristic, such as a current environmental noise level.
In another aspect, upon detecting a change in the environmental noise level, the volume model may be configured to retrieve the current content level (which may be an average content level over at least a portion) of the audio content that is being played back and the environmental noise level, and may determine whether a target volume setting is to be generated based on the levels. In particular, the volume model may determine whether the change to the environmental noise level is significant enough to be perceivable to the user, which may mask at least a portion of the audio content. For example, if the content level is high (e.g., above a threshold), then a small change to the environmental noise level (e.g., below the threshold) may not be perceivable to the user. As a result, the volume model may not adapt the volume setting based on the change to the environmental noise level. In one aspect, this determination is based on whether the content level is greater than the environmental noise level. In some aspects, the volume model may provide this result to the engine 30 in order to save it as historical data 33. In another aspect, the volume model may determine whether to adapt the volume setting based on whether the content level of the audio content is above a threshold, such as being above a maximum threshold of a volume curve. If so, the volume model may not generate a target volume setting that is greater than the content level, since doing so would exceed the maximum threshold.
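As a non-limiting illustration, the determination of whether a change to the environmental noise level warrants a new target volume setting may be sketched as follows. The threshold values and the function name `should_adapt` are hypothetical; the content level is assumed to be a loudness value (e.g., in LKFS) and the noise change a difference in dB.

```python
def should_adapt(noise_delta_db, content_level,
                 content_threshold=-14.0, delta_threshold_db=3.0):
    """Decide whether to generate a new target volume setting.

    A small noise change while the content is loud is treated as not
    perceivable, so no adaptation is made. Both thresholds are
    illustrative values, not values from the disclosure.
    """
    small_change = abs(noise_delta_db) < delta_threshold_db
    loud_content = content_level > content_threshold
    return not (small_change and loud_content)
```

In this sketch, a result of `False` corresponds to the case in which the volume model may not adapt the volume setting, and the result itself may be provided to the engine 30 for storage.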
In some aspects, the volume model 32 may use one or more volume curves 34 (or more specifically one or more curve parameters) and historical data to generate a target volume setting. For instance, the volume model may retrieve a slope of a previously generated volume curve, which may have been generated based on the current context of the output device, the (current) content level of the audio content, the last (or current) volume setting of the output device, and the last environmental noise level measured by the output device as input for generating the target volume setting. For example, the volume model may use the input to generate a volume curve, from which the volume model may select a target volume setting that is associated with the last environmental noise level, as described herein.
In one aspect, the controller 24 may be configured to store the target volume setting, the generated volume curve, and/or associated characteristics in the historical data 33. For example, the controller may store the changed environmental noise level, the target volume setting, any associated characteristics that indicate the context of the output device in the historical data, and/or generated volume curves (or curve parameters).
The volume setting 35 may be configured to receive the target volume setting from the volume model 32, and provide the target volume setting along with the current volume setting, e.g., the volume setting at which the volume control is set before the controller 24 adapts the volume, as described herein, to the ramp 36. In addition, the volume setting 35 may provide the target volume setting to the context & behavior engine 30, in order for the engine to store the generated target volume setting, along with one or more characteristics (and/or user context) used by the volume model 32 to generate the setting in historical data 33. Such data may then be used, along with the already stored data, to update the model, as described herein.
The ramp 36 may be configured to receive the last volume setting and the target volume setting, and may be configured to ramp up or down the scalar gain 37, such that the gain that may be applied to an input audio signal either increases or decreases over a period of time from the last volume setting to the target volume setting. As a result, the playback of the audio content may be adjusted by transitioning the last volume setting to the target volume setting. In particular, the applied gain 37 to the audio signal either increases or decreases a signal level of the audio signal over a period of time from a starting signal level to an ending signal level. Specifically, the controller may apply, without user intervention (e.g., adjusting a volume control), one or more gains to an audio signal as a function of time to ramp up or ramp down sound output according to (resulting in) the target volume setting. As an example, when the volume level associated with the last volume setting may be −20 dB, and the volume level of the target volume setting may be −10 dB, the ramp 36 may ramp down the attenuation of (e.g., increasing) the applied scalar gain 37 on the audio signal by 10 dB over a (e.g., predefined) period of time (e.g., one second, two seconds, three seconds, etc.) from a current volume setting to a target. In which case, the gain may be applied as a function of time, as opposed to adjusting the gain stepwise, which may be the case when the user changes the volume setting by adjusting the volume control. The ramping of gain may be better than adjusting the gain stepwise, since the adapted volume control may be performed without user intervention. As a result, the volume level may be increased gradually to provide a better user experience, as opposed to an abrupt volume change that would otherwise be the case when the gain is increased in a stepwise fashion.
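As a non-limiting illustration, the −20 dB to −10 dB example may be sketched as a sequence of scalar gains applied over time. A ramp that is linear in decibels and a 0.1-second update interval are assumptions made for illustration; the conversion from decibels to a linear amplitude gain uses the standard relation gain = 10^(dB/20).

```python
def ramp_gains(start_db, end_db, duration_s, step_s=0.1):
    """Return linear scalar gains for a ramp from start_db to end_db.

    The ramp is linear in dB (an assumption) and is sampled every
    step_s seconds, including both endpoints.
    """
    steps = max(1, round(duration_s / step_s))
    gains = []
    for i in range(steps + 1):
        level_db = start_db + (end_db - start_db) * i / steps
        gains.append(10 ** (level_db / 20))  # dB to linear amplitude gain
    return gains


# Ramp from -20 dB to -10 dB over one second.
gains = ramp_gains(-20.0, -10.0, 1.0)
```

Each successive gain in the sequence is slightly larger than the previous one, in contrast to a stepwise adjustment that would jump directly to the final gain.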
In one aspect, the period of time may be a ramp rate at which the ramp either ramps up or down the scalar gain 37. In one aspect, the ramp rate may be predefined. In which case, the ramp 36 may retrieve the ramp rate from memory. In one aspect, the ramp may select a ramp rate from several ramp rates stored in memory, based on various criteria. For example, the ramp 36 may be configured to determine the ramp rate to increase or decrease the gain based on the user context or the user behavior, as described herein. For instance, the memory may include different ramp rates based on the different context at which the output device is playing back audio content. For example, memory 27 may include a ramp rate associated with media content, while having a different ramp rate for VPA audio content.
In one aspect, the ramp rate may be adjustable or updatable by the controller 24. As described herein, the ramp rate may be associated with the amount of time it takes to increase or decrease the volume from an initial volume setting to a new volume setting. This rate may be based on user adjustments to the volume control 19. In particular, the controller may adjust or update ramp rates based on historical data 33 that indicates user adjustments to the volume control. The controller may monitor how fast and/or how often a user adjustment to the volume control is received. For example, the controller may monitor how fast the user adjusts the volume setting by measuring the period of time from an initial volume setting to a final user-adjusted volume setting. The controller may base the ramp rate on this period of time. In another aspect, the controller may be configured to update the ramp rate based on whether a user adjustment of the volume control 16 is received while the volume setting is being ramped up or ramped down. More about adjusting the ramp rate is described herein.
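As a non-limiting illustration, per-context ramp periods that are updated from observed user adjustments may be sketched as follows. The class name `RampRates`, the default period of two seconds, and the use of a weighted running average to blend in each observed adjustment duration are all assumptions made for illustration.

```python
class RampRates:
    """Ramp periods (in seconds) stored per context, with a default."""

    def __init__(self, default_s=2.0):
        self.default_s = default_s
        self.by_context = {}

    def get(self, context):
        """Return the ramp period for a context such as 'media' or 'vpa'."""
        return self.by_context.get(context, self.default_s)

    def observe_user_adjustment(self, context, duration_s, weight=0.25):
        """Blend the time the user took to adjust the volume into the rate.

        duration_s: measured period from the initial volume setting to the
        final user-adjusted setting. 'weight' is an illustrative value.
        """
        current = self.get(context)
        self.by_context[context] = (1 - weight) * current + weight * duration_s


rates = RampRates()
rates.observe_user_adjustment("media", 1.0)
```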
As described herein, the controller may be configured to adapt the volume setting using the volume ML model 32, which may be configured to decide whether or not to adapt the volume setting based on various criteria. In one aspect, the ML model may be any type of machine learning model, such as a convolutional neural network, which may have an objective function for updating the volume setting of the output device with a minimal amount of user intervention (e.g., with minimal or no user adjustments to the volume control 19).
In one aspect, the controller 24 may be configured to update the ML model 32 based on new (or newer) data that may be stored within the historical data 33. For example, the controller may be configured to continuously (or periodically) store new historical data in memory 27, and may be configured to update the volume model using the historical data 33 that may have old data and new data as training data. For instance, as described herein, the context & behavior engine 30 may receive the target volume setting, and store the setting with one or more characteristics in the data 33, as described herein. The controller may be configured to update the volume model 32 based on the updated data 33. As a result, the volume model 32 may be constantly updated based on changes to user context and/or user behavior in order to maximize its objective function. As described herein, one or more volume curves 34 may be updated when the volume model is updated.
In one aspect, the volume model 32 may include a reinforcement learning or adaptive framework that may update the volume model based on user action or inaction, while the ML model 32 performs adaptive volume control operations. As described herein, the model's objective function may be to minimize user intervention (or actions) based on changes to the environment, such as varying environmental noise levels. As a result, the volume model 32 may be configured to monitor the volume setting 35 (e.g., for a period of time) after adapting the volume setting to determine whether a user adjustment to the volume control occurs, which may indicate that the adapted volume setting is an undesirable setting for the user. Upon determining that the user has adjusted the volume control, the volume model may be configured to be updated in order to take into account the user's adjustment. More about updating the volume model is described herein.
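As a non-limiting illustration, the feedback derived from monitoring the volume setting after an adaptation may be sketched as follows. This is not a reinforcement learning implementation; it only shows how a user correction (or the absence of one) during the monitoring period might be turned into a new training sample. The function name and tuple layout are hypothetical.

```python
def feedback_sample(noise_db, adapted_setting, user_setting_after):
    """Return (noise_db, preferred_setting, was_corrected).

    user_setting_after is the setting chosen by the user during the
    monitoring period, or None if no adjustment was received.
    """
    if user_setting_after is None or user_setting_after == adapted_setting:
        # No intervention: the adapted setting is treated as acceptable.
        return (noise_db, adapted_setting, False)
    # Intervention: the user's choice replaces the adapted setting.
    return (noise_db, user_setting_after, True)
```

Samples flagged as corrected may then be added to the historical data so that a later update of the volume model takes the user's adjustment into account.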
In one aspect, the output device 15 may provide data to the source device 14, in order for both devices to reflect the most up-to-date volume setting. This may ensure that both devices reflect the same volume setting so that if the user adjusts the volume setting at volume control 16 of the source device, the change is accurately detected by the controller 24. This may ensure that changed volume settings are accurately depicted in the historical data 33. Thus, the output device may be configured to transmit (e.g., over a wireless connection, such as a BLUETOOTH connection) the target volume setting (from the volume setting 35) to the controller 20, which may store the setting in memory. In addition, the output device may transmit the ramp rate at which the controller 24 is to ramp up or down the volume level. In addition, the source device 14 may transmit a control signal (e.g., over the BLUETOOTH connection) that indicates a user adjustment to the volume of the output device has been received through the volume control 16. In response, the volume setting 35 may adjust the volume level based on the user-adjustment. In which case, the controller 24 may perform a stepwise adjustment from a previously set volume setting. In addition, the volume setting 35 may provide the user adjustment to the engine 30 to be stored, along with one or more characteristics, in the historical data, as described herein.
In another aspect, the source device 14 may be configured to provide (or stream) the audio content to the output device 15. In which case, the input audio source 132 may be a part of the source device.
In one aspect, the curve may be any type of non-linear curve, which may include multiple curve parameters, such as having multiple slopes. In another aspect, the volume curve 40 may be a linear curve. In another aspect, the generated volume curve 40 may be a function of volume setting with respect to one or more characteristics determined by the controller 24.
In one aspect, the curve represents various volume settings for the output device based on different noise levels. For example, the curve shows the last volume setting 47, which is 0.2 at a noise level of approximately 50 dB, and shows a target volume setting 41 of approximately 0.5 at a noise level of 60 dB. In this case, the volume model 32 may have received an indication that the noise level has increased by 10 dB, which may be due to the user moving from a quiet location to a louder location, and then based on the new noise level may select the volume setting 41 of the curve 40 that is associated with 60 dB.
Turning to
The controller 24 (optionally) determines one or more playback characteristics of the output device 15 (at block 52). In particular, the controller may determine, while audio content is being played back, one or more characteristics, such as a content level, (e.g., a loudness value) of the audio content. In addition, the controller 24 may determine a current (or last) volume setting of the output device. In another aspect, the controller may determine other characteristics, such as a type of audio content that is being (or to be) played back, such as whether the audio content is media audio content. In this case, the characteristics may indicate descriptive information of the media audio content. For example, the descriptive information may indicate what type of media, such as a movie sound track or a musical composition. When the type is a musical composition, the information may include the title of the musical composition, the genre, etc. In another aspect, the controller may determine how the user is listening to the audio content, such as a location of the user, a time of day, a destination of the user, etc., as described herein.
The controller 24 receives one or more microphone signals captured by one or more microphones (at block 53). For instance, the controller may receive a microphone signal that includes ambient noise of an acoustic environment from the reference microphone 28 of the output device and/or a microphone signal that includes sound from within (or near) the user's ear from the error microphone 29 of the output device.
The controller 24 determines one or more environmental characteristics of the acoustic environment based on the one or more microphone signals (at block 54). For example, the controller may determine (or detect) the environmental noise level within the ambient environment. In one aspect, the noise level may include the in-ear noise level. In which case, the controller may use at least a portion of a reference microphone signal captured by the reference microphone to determine the in-ear noise level. For instance, the controller may determine the in-ear noise level by taking into account the audio processing mode in which the output device is operating to determine a level of attenuation of the output device, and then may produce the in-ear noise level (e.g., as a dBA value) based on the reference microphone and the mode. In another aspect, the controller may determine the in-ear noise level based on the error microphone signal.
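As a non-limiting illustration, estimating the in-ear noise level from the reference microphone level and the audio processing mode may be sketched as follows. The mode names and attenuation values are hypothetical, and subtracting a single broadband attenuation figure is a simplification, since attenuation in practice may vary with frequency.

```python
# Hypothetical attenuation (in dB) provided by each audio processing mode.
ATTENUATION_DB = {
    "anc": 25.0,           # active noise cancellation
    "passive": 10.0,       # passive isolation only
    "transparency": 0.0,   # ambient sound passed through
}


def in_ear_noise_dba(reference_dba, mode):
    """Estimate in-ear noise as the ambient level minus mode attenuation."""
    return reference_dba - ATTENUATION_DB[mode]
```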
In another aspect, the controller 24 may determine other environmental characteristics. For example, the controller 24 may use the reference microphone signal to determine whether there are certain sounds of the environment that may have been captured by the microphones, such as speech of a person. The controller 24 may also determine other information using other data, such as determining the environment in which the output device is located using data, such as image data captured by one or more cameras.
The controller 24 determines whether the volume setting should be adapted to the environment (at decision block 55). In particular, the controller 24 may determine whether there is a change (e.g., above a threshold) to one or more environmental characteristics, such as a measured environmental noise level of ambient noise within the environment captured by one or more microphones. For example, the controller may determine whether a measured noise level within the acoustic environment has increased or decreased from a previous measurement. If so, the controller determines, using the volume model 32, a target volume setting (at block 56). In particular, the controller may retrieve a target volume setting by using the volume model 32 to produce the target volume setting based on the (e.g., changed) environmental noise level and/or one or more determined characteristics. For instance, the volume model 32 may generate a target volume setting based on one or more characteristics as input into the model. For example, the controller may input the latest (e.g., changed) environmental noise level (e.g., in-ear noise level), the (average) content level of the audio content, the last (or most recent) volume setting into the model, and/or other characteristics or curve parameters, such as a slope of a previously generated volume curve, and in response the volume model may generate the target volume setting based on the input. Thus, a new volume setting to adapt to the environmental noise level and/or characteristics may be output of the volume model. In one aspect, to determine the target volume setting, the volume model may retrieve a volume curve 34 from memory 27 and may determine the target volume setting that may be associated with the latest environmental characteristic, as described herein.
In another aspect, the volume model may determine the target volume setting based on other inputs, such as other environmental and/or playback characteristics. As described herein, the additional input data may allow the volume model to provide a more specialized volume estimate for a particular context of the output device.
The controller 24 adjusts playback of the audio content by transitioning the (last) volume setting to the target volume setting (at block 57). Specifically, the controller adapts the existing volume setting to the target volume setting. As described herein, the controller 24 may ramp up or ramp down the last volume setting to the target volume setting. In particular, the controller 24 may transition, without user intervention, from the last volume setting to the target volume setting such that the output volume level of the audio content is either ramped up or ramped down to a new output volume level associated with the new volume setting over a period of time. More about ramping is described herein. The controller stores the characteristics and/or adapted volume setting as historical data in memory of the output device (at block 58). For instance, the controller may store the last volume setting, the target volume setting, and the environmental noise level, and other data/characteristics that were used (or not used) to determine the target volume setting into data 33. As described herein, this stored information may be used to update the volume model 32.
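The transition at block 57 may be sketched, under stated assumptions, as a sequence of intermediate volume settings that step from the last volume setting toward the target volume setting. The function name `ramp_steps` and the fixed per-step increment are hypothetical and stand in for whatever ramp rate the controller uses.

```python
def ramp_steps(last, target, rate_per_step):
    """Return the intermediate settings from `last` to `target`.

    `rate_per_step` is an assumed fixed increment applied once per
    update interval; the final value is always exactly `target`.
    """
    if rate_per_step <= 0:
        raise ValueError("rate_per_step must be positive")
    values = []
    current = last
    # Step toward the target until the remaining distance fits in one step.
    while abs(target - current) > rate_per_step:
        current += rate_per_step if target > current else -rate_per_step
        values.append(current)
    values.append(target)
    return values
```

For example, ramping from a setting of four to a setting of six with a step of one yields the settings five and six in order.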
Returning to decision block 55, if, however, it is determined that the volume setting is not to be adapted, the controller 24 continues to play back the audio content at the volume setting (at block 59). For instance, if the environmental noise level has not changed by more than a threshold, such as the content level of the audio content (e.g., because the user has moved between environments with similar noise levels), the system determines that the best course of action is to maintain the current volume level. In this case, the controller may store the determined characteristics and the current (or non-adapted) volume setting in memory of the output device at block 58.
As described herein, the controller may determine a target volume setting based on the context in which the user is using the output device. For example, the volume model may generate a target volume setting for a particular type of audio content, such as a musical composition, while the user is commuting on a train. In particular, the volume model may be trained to generate a volume setting to compensate for changes to the environmental noise level while the user is listening to the audio content on a train. In another aspect, the controller may determine one or more target volume settings based on one or more contexts. In which case, to determine whether the volume setting should be adapted at decision block 55, the controller 24 may determine the context in which the user is using the output device based on the determined characteristics (e.g., location data, time of day, etc.). Upon determining the context, the controller 24 may retrieve a corresponding volume model, and determine whether volume should be adapted using the model. For example, the model may output a target volume setting in response to input, such as one or more current characteristics (e.g., a current environmental noise measurement), a current volume setting, etc.
In one aspect, the volume model may determine several target volume settings based on the change to the environmental noise level, each associated with a different context. For example, several contexts may include the user listening to different types of audio content while riding a train. In which case, the volume model may determine a target volume setting for listening to a musical composition, a target volume setting for listening to VPA audio content, and a target volume setting for listening to telephony audio content. Each target volume setting may be an optimal volume setting for a change to the environmental noise with respect to listening to a particular audio content while riding a train. In one aspect, this may allow the controller 24 to adapt volume settings as context changes, which in this case may include the user changing the audio content that is being played back. For example, the controller may determine a first target setting for media content that is being played back, a second target setting for VPA audio content, and a third target setting for another type of media content, where each of the target settings may be associated with a particular environmental noise level and/or other characteristics. At a later time, when the user interacts with the VPA application, which as a result will playback VPA audio content, the controller 24 may adapt the volume setting to the second target volume setting while playing back the VPA. This may allow the output device to adapt the volume setting for changes in the context in which the user is listening to the output device.
As described herein, the output device 15 may transmit data relating to the target volume setting to the source device 14. For example, the output device may transmit the target volume setting and the ramp rate at which the volume level is adjusted. In one aspect, the output device may transmit other data, such as other target volume settings that are generated, as described herein. In another aspect, the output device may transmit determined characteristics to the source device, which may store the characteristics (and/or other data) as historical data on the source device.
In one aspect, the model may be updated by retraining the model with at least some old data (e.g., historical data 33) and/or new data (e.g., characteristics and/or user behavior) in order to update model weights (or parameters) according to the new data. In another aspect, one or more volume curves 34 that are stored in memory 27 may be updated when the model is updated. For instance, the model may recreate the volume curves each time new data is used to train the volume model in order to further optimize the curves according to the new data. In which case, the model may use at least some of the curve parameters, such as slopes, max/min thresholds, etc., as input into the model, along with other historical data, in order to update at least one of the curve parameters.
In one aspect, the operations of process 80 may be performed by the controller 24 while audio content is being played back to a user of the output device. As described herein, the volume model 32 may be any type of ML model that may be trained using at least some of the historical data 33 stored in memory to determine a target volume setting based on one or more criteria. In particular, the model may be a reinforcement learning model that may be trained to determine whether to take an action of adapting the volume setting of the volume control based on one or more playback and/or environmental characteristics. Such a model may have an objective function of minimizing user intervention, whereby the model may be updated based on (e.g., positive and/or negative) feedback from the user in order to optimize the objective function. As described herein, feedback may be based on user action or user in-action that may be received while the model performs adaptive volume control operations. Thus, by optimizing the model based on feedback of a user of the output device, the model may be personalized for the user of the output device. For instance, the model may learn user preferences, such as the volume setting, based on the context of the output device (based upon which the model may be trained, as described herein).
The process 80 begins with the controller 24 using the volume model 32 to decide whether a volume setting should be adapted, to the environment for example (at block 81). For instance, the model may receive a last environmental noise level, a last volume setting, and/or may receive one or more other characteristics described herein as input, and may either generate a target volume setting as output or may determine that the last volume setting may be the most optimal volume setting based on the input. As described herein, this decision may be based on the content level of the audio content with respect to the environmental noise level. As another example, this decision may be based on whether a volume curve generated by the model associates the last environmental noise level (e.g., within a threshold) with the last volume setting. In other words, the model may determine that the environmental noise has not changed sufficiently to warrant generating a new adapted volume setting.
The controller 24 determines whether the volume setting is to be adapted based on the decision made by the model (at decision block 82). If so, the controller adapts the volume setting of the volume control (at block 83). In particular, the volume setting 35 may be adapted to the target volume setting generated by the model. In which case, when the controller 24 is playing back audio content, playback may be adjusted by transitioning the last volume setting to the target volume setting. As described herein, the controller may ramp up or down the volume of the audio content from the last volume setting to the new, target volume setting at a ramp rate. The controller 24 determines whether a user adjustment to the volume control has been received within a period of time from when the volume setting of the volume control was adapted (at decision block 84). In particular, the controller determines whether the target volume setting is an optimal volume setting for the user based on whether a user action of a volume adjustment has been received. In one aspect, this determination may be made from when playback of the audio content was adjusted. For example, the period of time may be from when the controller 24 began to ramp up or down the volume setting to the target volume setting, as described herein.
If so, meaning that a user adjustment at a volume control has been received to adjust the target volume setting, the controller 24 changes the volume setting based on the user adjustment to the volume control (at block 85). For example, the controller may change the volume setting 35 and adjust the gain 37 applied to an audio signal of the audio content according to the changed setting, as described herein. For instance, the controller may perform a stepwise increase of the volume level based on the user adjustment to the volume control. The controller 24 updates the volume model based on the user adjustment (at block 86). In particular, the controller may update the volume model based on one or more current environmental characteristics and/or playback characteristics and/or based on the user adjustment, such that, responsive to detecting future changes to environmental and/or playback characteristics, the updated volume model may produce a different target volume setting than would otherwise be produced by the volume model before being updated. For example, if the target volume setting by the model was too high for the user, the user may turn down the volume setting via the volume control. In response, the volume model may be updated to account for the user's reduction of the volume setting, such that when a future target volume setting is generated it may account for the reduction (e.g., by being less than the previously generated target volume setting). As another example, the updated model may include an adjusted (or updated) volume curve (e.g., one or more updated curve parameters) that accounts for (is based on) the adjusted volume setting. For example, as described herein, the updated model may adjust the curve slope based on the user adjustment.
Returning to decision block 84, if, however, no user adjustment has been received (within the period of time), the controller may update the model based on user inaction (at block 87). As a result, the model may update model parameters to indicate that the policy taken by the model, by adapting the volume setting due to the environment and/or environmental changes, is more optimal than otherwise not taking the action of adapting the volume. In particular, determining that the user has not changed the volume setting may indicate that this is a user-desired volume setting, which may be used to reinforce the learning of the volume model.
Returning to decision block 82, if, however, the volume setting is not to be adapted, the controller 24 determines whether a user adjustment to the volume control has been received within a period of time from when the volume model decided not to adapt the volume setting (at decision block 88). In particular, the controller 24 is determining, using the ML model, that the volume setting is not to be updated. This decision may be based on changes (or no changes) to one or more characteristics (as input into the model). At decision block 88, however, the controller 24 is determining whether the decision to not change the volume setting is the most optimal for the user. If a user adjustment is received, indicating that the model should have adapted the volume setting (e.g., a false negative), the controller 24 proceeds to block 85 to change the volume setting based on the user adjustment. For instance, when the volume adjustment is to change the current volume setting from one value (e.g., four) to another value (e.g., six), the controller may adjust the applied gain accordingly. The ML model may be updated in response to the user adjustment being received after a determination by the ML model is made that the volume setting is not to be updated.
If, however, no user adjustment has been received, meaning that the most optimal action was to not change the volume, the controller proceeds to block 87.
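The feedback handling of blocks 85 through 87 may be illustrated by the following simplified sketch. It is not a reinforcement learning implementation; it substitutes a per-noise-bucket preference table that is nudged toward the user's setting on negative feedback and counts confirmations on inaction. The class name `FeedbackModel`, the 10 dB bucket width, and the learning rate are all assumptions introduced for illustration.

```python
class FeedbackModel:
    """Hypothetical stand-in for the volume model's feedback update."""

    def __init__(self, learning_rate=0.5):
        self.learning_rate = learning_rate
        self.preferred = {}      # noise bucket -> preferred volume setting
        self.confirmations = {}  # noise bucket -> count of accepted decisions

    @staticmethod
    def bucket(noise_db):
        # Group noise levels into assumed 10 dB wide buckets.
        return int(noise_db // 10)

    def predict(self, noise_db, current_volume):
        # With no learned preference, keep the current setting.
        return self.preferred.get(self.bucket(noise_db), current_volume)

    def update(self, noise_db, model_volume, user_volume=None):
        b = self.bucket(noise_db)
        if user_volume is None:
            # User inaction (block 87): treat the decision as accepted.
            self.preferred[b] = model_volume
            self.confirmations[b] = self.confirmations.get(b, 0) + 1
        else:
            # User adjustment (block 86): move the preference part of the
            # way from the model's setting toward the user's setting.
            delta = user_volume - model_volume
            self.preferred[b] = model_volume + self.learning_rate * delta
            self.confirmations[b] = 0
```

In this sketch, if the model chose a setting of six and the user turned it down to four, the stored preference for that noise bucket becomes five, so a later prediction at a similar noise level is lower than before the update.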
As described herein, the model may be updated when user feedback, in the form of user adjustments to the volume control, is received within a period of time. In another aspect, the model may be updated regardless of whether the user adjustment is received within a period of time, as described herein. In one aspect, the model may be updated based on user feedback, so long as characteristics of the system 10 that were used by the volume model to decide whether the volume setting should be adapted remain (approximately) the same. As described in block 81, the model may make this decision based on the last environmental noise level. Once the volume setting is adapted (at block 83), the controller may determine whether a user adjustment is received while the last environmental noise level remains (approximately) the same. If a user adjustment is not received while the environmental noise has not changed, this may be used as positive feedback for the ML model, indicating that it made the most optimal choice, and the ML model may be updated according to the positive feedback.
In one aspect, the model may be stored in memory 27 of the output device. For instance, the model may be retrieved (downloaded) from a remote device, such as a remote server, via a network (e.g., the Internet). As another example, the model may be stored in memory by the manufacturer, before leaving the factory. In one aspect, the controller 24 may receive multiple models, each model configured to adapt volume settings, as described herein. In particular, the controller may receive one type of model, such as the reinforcement learning model described herein, and may receive another type of model that may have a different framework than the model 32, such as being a supervised or unsupervised framework. In which case, the controller may be configured to execute both models, using at least some of the historical data 33 to generate an adapted volume setting as output. The controller 24 may monitor user feedback for a period of time to determine which model makes the most optimal decisions. Once that period of time has ended, the controller may select the model that has the highest percentage of correct decisions to be used for adapting the volume setting. For instance, during the period of time, the model 32 (first ML model) may continue to adapt volume settings, while a second ML model operates in the background. If the second ML model is determined to be more accurate, the controller may use the second ML model instead of the first model.
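As a non-limiting illustration of selecting between models after the monitoring period, the following sketch compares the fraction of correct decisions logged for each model. The function name `select_model` and the representation of the log as lists of booleans are assumptions; how a decision is judged "correct" from user feedback is left outside the sketch.

```python
def select_model(decision_log):
    """Return the name of the model with the highest share of correct decisions.

    `decision_log` maps a model name to a list of booleans, where True
    means the decision was consistent with user feedback (assumed format).
    """
    def accuracy(outcomes):
        return sum(outcomes) / len(outcomes) if outcomes else 0.0

    # max() keeps the first name on ties, so listing the currently active
    # model first means it is retained unless another model is strictly better.
    return max(decision_log, key=lambda name: accuracy(decision_log[name]))
```

For example, a log in which the first model was correct in two of four decisions and the second model in three of four would select the second model.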
As described thus far, the feedback received by the controller may be “explicit” positive or negative feedback. In particular, the feedback may be explicit in that the model is being updated based on user action or inaction according to a decision being made by the ML model. In another aspect, the controller may use “implicit” feedback that indicates user behavior with respect to user context. In particular, implicit feedback may be user behavior with respect to context, as the user is using the output device 15. For example, implicit feedback may be data gathered and stored on the output device, as described herein, before the volume model may be stored and executed on the output device. As another example, the implicit feedback may be historical data gathered while the volume model is not running, such as while the model is deactivated, which may be based on a device setting of the output device.
The first stage 90 shows the curve 40 that is bounded by a max threshold 42 and a min threshold 43, with a slope 44. This stage shows the curve 40 having the same parameters as shown in
The second stage 91 shows the result of the user adjustment. In particular, once the user adjustment is received, the ML model identifies it as negative feedback, and performs an update in which the volume curve 40 is adjusted to compensate for the negative feedback. As shown, the max threshold 42 has been reduced from 0.88 to approximately 0.74, and the slope has rotated downward, such as by rotating about a Z-axis by 10°. As a result, the new target volume setting 95 is 0.69, which is 0.11 less than the original target volume setting 93.
As shown, the curve 40 in the second stage 91 does not pass through the user-adjusted volume setting 94, but instead is between the target volume setting 93 and the setting 94. This may be due to the model taking into account historical data 33 that indicates that the user normally has a higher volume setting at this noise level. In another aspect, the volume curve may pass through the user-adjusted volume setting.
As described thus far, the adaptive volume control operations may be performed by the controller 24 of the output device. In one aspect, at least some of the operations may be performed (e.g., by the controller 20 of) the source device 14. For example, the source device may be configured to apply the volume adjustment (e.g., via an application of one or more scalar gains). This may be the case when the source device is streaming the audio content to the output device for playback. In which case, the output device may be configured to provide the adapted volume setting and the ramp rate to the source device, which may be configured to perform the ramping operations and the gain application upon one or more audio signals, as described with respect to ramp 36 and gain 37. The source device may transmit the one or more gain-adjusted audio signals to the output device, which may be used to drive the speaker 26. An example of the source device performing at least some of these operations is described with respect to
As shown, the controller 20 of the source device 14 may include a volume setting 96, a ramp 97, and a scalar gain 98. In one aspect, the controller may include other operational blocks, described herein. For example, the controller 20 may include the input audio source 132. As shown, the source device 14 receives one or more target volume settings from the output device 15 and receives a ramp rate from the output device 15. In one aspect, the source device may receive this data once the output device performs contextual adaptive volume control operations to generate the ramp rate and target volume(s), responsive to a change in the environment, as described herein. For instance, the output device may determine the ramp rate (e.g., by retrieving the rate from memory), and transmit the ramp rate and the target volume setting to the source device 14 in order for the source device to ramp up or down the output volume level of the audio content to a target output volume level of the target volume setting and to transmit the audio content, while it is being ramped up or down, back to the output device for playback.
As described, the source device 14 may receive one or more target volume settings. For example, the source device may receive different target volume settings based on different contexts, such as receiving different target volume settings for different types of audio content. In particular, the volume model 32 of the output device 15 may generate several target volume settings (e.g., in response to detecting a change in the environmental noise level) at least in part based on a context in which the user is using the headset, such as how the user is using the headset (e.g., listening to audio content) or environmental characteristics, as described herein. In particular, the source device may receive a target volume setting associated with the audio content that is currently being played back and other target volume settings for other types of audio content. As described herein, this may ensure that the most optimal target volume setting may be applied as the context changes (e.g., as the audio content changes from media content to telephony audio content).
In one aspect, the controller 20 may be configured to select one of the received target volume settings based on the context in which the output device 15 and/or the source device 14 is being used by the user. For example, when the received target volume settings are for different types of audio content, the controller 20 may be configured to determine the content type of the audio content from the input audio source 132 and may select a target volume setting of the received volume settings associated with that type. In one aspect, the controller 20 may store the target volume settings for later use, as described herein.
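The selection by the controller 20 may be sketched as a lookup keyed by content type. The function name `select_target_setting`, the string labels for content types, and the fallback to the current setting when no matching target was received are assumptions made for illustration.

```python
from typing import Dict


def select_target_setting(received: Dict[str, float], content_type: str,
                          current_setting: float) -> float:
    # Pick the received target for this content type; if none was received
    # (an assumed fallback), keep the current volume setting.
    return received.get(content_type, current_setting)
```

For example, when targets were received for media, VPA, and telephony content and the user starts a telephone call, the sketch returns the telephony target.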
The volume setting 96 receives the target volume setting, which then provides the last volume setting and the target volume setting to the ramp 97. The ramp 97 receives the ramp rate (from the output device 15), and ramps up or down the scalar gain 98 that is applied to the audio content received from the input audio source 132 according to the ramp rate from a last gain that was applied according to the last volume setting to a new gain that is associated with the target volume setting. The source device 14 transmits the gain-adjusted audio content back to the output device for playback. In one aspect, the source device may transmit the audio content as its volume level is being ramped up or down by the scalar gain 98. For example, when the target volume setting is higher than the last volume setting, the audio content may be transmitted as the volume level of the audio content is being increased. As a result, the user of the output device may perceive the volume level of the audio content being played back by the output device increasing over the period of time associated with the ramp rate to (or ending at) the target volume level associated with the target volume setting.
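The operation of the ramp 97 and the scalar gain 98 upon an audio signal may be illustrated as follows. This sketch assumes a linear interpolation of the gain over a fixed number of samples; the function name `apply_ramped_gain` and the expression of the ramp rate as a sample count are hypothetical.

```python
def apply_ramped_gain(samples, last_gain, target_gain, ramp_samples):
    """Apply a gain that moves linearly from last_gain to target_gain.

    The gain is interpolated over the first `ramp_samples` samples
    (an assumed way to express the ramp rate) and held at target_gain after.
    """
    out = []
    for i, sample in enumerate(samples):
        if ramp_samples <= 0 or i >= ramp_samples:
            gain = target_gain
        else:
            gain = last_gain + (target_gain - last_gain) * (i / ramp_samples)
        out.append(sample * gain)
    return out
```

Because the gain-adjusted samples are produced in order, they may be transmitted while the ramp is still in progress, consistent with the audio content being transmitted as its volume level is ramped.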
In one aspect, the controller 20 may switch target volume settings based on changes to user context. For instance, the source device 14 may receive user input, indicating a user-desire to playback different audio content. For example, the user may interact with a user interface (UI) displayed on the display 131 for the system 10 to playback a different type of audio content. As an example, the user may initiate a telephone call through a telephony application executing on the source device. As a result, the controller 20 may receive a downlink audio signal of the call. Thus, responsive to the user input, the controller 20 may receive another audio signal that includes telephony audio content, instead of media content, which may have been playing back before. The controller 20 may determine that the different audio content is of a different type, telephony instead of media, and may select a target volume setting from the received target volume settings that is associated with the different type. The source device may produce a gain-adjusted downlink audio signal by applying a gain based on the target volume setting, and may transmit the signal instead of the previous audio signal. As a result, the system may optimize the volume output to the user's preference, for different types of audio content.
As described in
In one aspect, the source device 14 may transmit data to the output device 15 indicating that the ramping has been paused. For example, the source device may transmit a message to the output device that includes the current volume setting associated with a current applied gain, where the ramping stopped, and other data. In one aspect, the current volume setting may be the volume setting at which the ramping was paused, or may be the user-adjusted volume setting, as described herein. The source device may transmit a ramp cancellation reason to the output device. Returning to the previous example, the source device may transmit a message indicating that the ramping up or ramping down of the volume output level has been canceled based on a user adjustment (e.g., to increase or decrease the volume level) to the volume control 16. In one aspect, the output device may use this data to update the volume model 32, as described herein. For instance, the volume model 32 may update one or more curve parameters, such as the curve slope that the volume model used to generate the target volume setting. For example, the angle of the slope may increase, meaning that the updated model may predict a higher volume setting for a detected environmental noise level than the pre-updated model would for the same detected level.
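One possible shape for the message transmitted when ramping is paused is sketched below. The field names, the `RampCancelReason` enumeration, and the use of JSON as an encoding are assumptions for illustration; the disclosure does not specify a message format.

```python
import json
from dataclasses import dataclass
from enum import Enum


class RampCancelReason(Enum):
    # Hypothetical reason code; only a user adjustment is described herein.
    USER_ADJUSTMENT = "user_adjustment"


@dataclass
class RampPausedMessage:
    current_volume_setting: float  # setting at which the ramping stopped
    reason: RampCancelReason


def encode(message: RampPausedMessage) -> str:
    # Serialize to JSON (assumed transport encoding).
    return json.dumps({
        "current_volume_setting": message.current_volume_setting,
        "reason": message.reason.value,
    })
```

The output device could decode such a message and pass the reported setting and cancellation reason to the volume model 32 as feedback.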
As described herein, the system 10 may perform contextual adaptive volume control operations responsive to receiving one or more user adjustments to an input device, such as a volume control. Unlike the adaptive volume control operations described thus far, the system may perform the operations responsive to user input, as opposed to performing the operations without user intervention (e.g., without receiving user input). This may have the benefit of allowing the user to maintain a consistent volume level, until the user wishes for the volume to change. In addition, the system may adapt the volume setting in response to minimal user intervention, such as a single adjustment of a volume control by a user. This has several advantages. First, the system 10 may be configured to determine (predict) a user's desired volume setting, without the user needing to provide multiple volume adjustments. Second, such a system may reduce the need for a complex user interface, such as a touch-sensitive display screen that may be arranged to display a user interface for adjusting the volume, such as a slider. Instead, an electronic device, such as an in-ear headset that may include one or more input devices, such as physical buttons, may perform the volume adaptations responsive to receiving a single user input via one of the input devices.
In one aspect, the contextual volume control operations described herein may be performed while the output device 15 plays back audio content. For example, the controller 24 may receive audio content, as one or more audio signals, and may playback the audio content through the speaker 26. In particular, the controller may playback the audio content at a volume level associated with a current volume setting of the volume control 19. For instance, the current volume setting may indicate a volume level, according to which the scalar gain 37 may be set, such that when applied to the audio content it is at the volume level (e.g., the content level of the audio content is adjusted according to the volume level). In one aspect, the current volume setting may have been set by a user adjustment to the volume control 19 or may have been a target volume setting by the volume model 32, as described herein.
As described herein, the volume control 19 may include a series of sequential volume settings, where each volume setting defines a different volume level of the output device. In one aspect, each volume setting may be incremental to an adjacent volume setting by a value. In particular, each pair of adjacent volume settings may be separated by the value, such as a value of one. For example, when the series of settings ranges from zero to ten, the volume control may include eleven volume settings, each setting being a different integer. Each volume setting may define a different volume level, whereby each incremental volume setting may increase the volume level by a given amount (e.g., each incremental volume setting increasing the volume level by 10 dB).
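The relationship between sequential volume settings and an applied scalar gain may be sketched as follows, using the standard conversion from a level in decibels to a linear amplitude gain, 10^(dB/20). The function name `setting_to_gain`, the choice of the highest setting as the 0 dB reference, and the treatment of every setting (including zero) as an attenuation step rather than a mute are assumptions.

```python
def setting_to_gain(setting: int, max_setting: int = 10,
                    step_db: float = 10.0) -> float:
    """Map an integer volume setting to a linear amplitude gain.

    Assumes the highest setting corresponds to 0 dB and that each lower
    setting attenuates by `step_db` decibels.
    """
    if not 0 <= setting <= max_setting:
        raise ValueError("setting out of range")
    level_db = (setting - max_setting) * step_db
    return 10.0 ** (level_db / 20.0)
```

With a 10 dB step, two settings below the maximum correspond to a level of -20 dB, which is a linear gain of 0.1.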
The volume control 19 may be configured to receive a single user adjustment to change a current volume setting of the output device. For instance, when the volume control is a physical button (e.g., a volume-up button or a volume-down button), the volume control may receive a single user input to press down the button and then release the button. In the case of a volume-up button, when pressed once, the volume setting may increase by one volume setting, from a volume setting of four to five, for example. In another aspect, the volume control may be a software application that may be executing on the output device. For example, the volume control may be provided through a VPA application, where the user may control the volume through a voice command that is received, via the microphone 28, by the VPA application. In which case, the voice command may include an audible command, such as “Volume up,” which may indicate that the controller is to increase the volume setting incrementally. As a result, the user-set volume setting 135 may change by one volume setting. For example, when the volume control includes a series of incremental volume settings, the user adjustment may change the user-set volume setting 135 by one (e.g., increasing the setting by one value).
In response, the user-set volume setting 135 may provide the user-adjusted volume setting to the volume model 32, which may be configured to determine an adapted volume setting based on the user adjustment. In particular, the volume model 32 may perform at least some contextual volume control operations, as described herein to determine an adapted volume setting. For example, the volume model 32 may determine an adapted volume setting, responsive to the reception of the single user adjustment to the volume control 19 (e.g., a current volume setting) and/or based on user context (e.g., one or more characteristics). In particular, the volume setting may be adapted based on at least one of 1) a noise level of the ambient environment captured by a microphone, which may be produced by the estimator 31, as described herein, 2) a content level (e.g., a signal level) of the audio content that is to be played back at the adapted volume setting, and 3) historical data 33 that indicates past user adjustments (volume settings) to the volume control (e.g., with respect to environmental noise levels, while the headset has played back the same or different audio content).
In one aspect, the controller may retrieve historical data 33 that indicates past volume settings of the volume control with respect to one or more user contexts in which the user has used the output device. For example, the historical data 33 may indicate that at this noise level the user normally (e.g., on average) listens to a higher volume setting than the volume setting that was set by the received single user adjustment. In which case, to determine a target volume setting, the context engine 30 may determine a context in which the user is currently using the output device. As described herein, to determine context the estimator may determine one or more characteristics of the output device (e.g., based on sensor data received from one or more sensors 45) and/or based on other data, such as noise level, where the context may be determined based on the characteristics. The volume model may retrieve historical data with respect to that context and may produce the target volume setting, as described herein. As a result, the controller may set the model-set volume setting 134 according to the output of the volume model 32. As described herein, the model-set volume setting 134 may be different than the user-set volume setting 135, based on the context of the output device and based on past user preferences according to the context.
As described herein, the volume model 32 may determine the adapted volume setting based on user context and user behavior according to environmental and/or playback characteristics that may be determined by the estimator 31 and/or stored as the historical data 33. For example, the estimator 31 may determine that the output device 15 is at a particular location and/or that the audio content that is being played back is of a particular type. In response to identifying the context in which the output device is operating, the volume model 32 may adapt the volume setting according to these characteristics with respect to historical data, and responsive to receiving a user adjustment to volume.
In another aspect, when the volume control includes two or more inputs (or input devices), the volume model 32 may take into account which of the inputs received the user adjustment. In particular, the volume model 32 may take into account whether a user increases or decreases the volume. For example, when the volume control 19 includes a volume-up button and a volume-down button, the volume model 32 may determine an adapted volume setting based on which of the buttons the user presses. In one aspect, the volume model may determine different adapted volume settings based on whether the user increases or decreases the volume (e.g., based on which button the user presses). For example, the volume model 32 may determine a first volume setting when the user presses the volume-up button that may increase the current volume setting by a threshold (e.g., one or more volume settings), while the model may determine a second volume setting that may be less than the current volume setting by less than the threshold when the user presses the volume-down button. In another aspect, the model may determine the adapted volume setting, agnostic to which input the user selects. In particular, the adapted volume setting may be determined by the model responsive to a volume control receiving a single selection but is not determined based on which volume control is selected. In other words, the adapted volume setting may increase or decrease the current volume setting, regardless of whether the user selects the volume-up button or the volume-down button, because (at least) the historical data 33 may indicate that the user increases the volume given the current context of the output device.
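The difference between the direction-aware and direction-agnostic aspects may be illustrated with the following minimal sketch. The function name `adapted_setting`, the `preferred` input (standing in for whatever setting the historical data suggests), and the rule of moving at least one setting in the pressed direction are hypothetical choices made only for illustration.

```python
def adapted_setting(current, direction, preferred, direction_aware=True):
    """Return an adapted volume setting for a single press.

    direction: +1 for volume-up, -1 for volume-down.
    preferred: setting suggested by historical data (hypothetical input).
    """
    if not direction_aware:
        # Agnostic aspect: the press only triggers adaptation; the result
        # does not depend on which button was selected.
        return preferred
    if direction > 0:
        # Direction-aware aspect: never end below one step up.
        return max(preferred, current + 1)
    # Direction-aware aspect: never end above one step down.
    return min(preferred, current - 1)
```

With a current setting of two and a preferred setting of six, a volume-down press yields one in the direction-aware case but six in the agnostic case, which mirrors the distinction drawn above.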
In another aspect, the volume model 32 may be trained to predict an adapted volume setting based on user adjustments to the volume control with respect to the context in which the user is using the output device. For example, while listening to a musical composition in a noisy environment, the user may adjust the volume control multiple times, such as pressing the volume-up button four times and pressing the volume-down button one time. In which case, the historical data stored in memory may include past instances in which previous user adjustments to the volume control have been received, and may be used to train the volume model to predict an adapted volume setting based on the same or similar context, and responsive to receiving a user adjustment to the control. In one aspect, the historical data may include a final volume setting that resulted from several previous user adjustments to the volume control within a period of time, which may be used to train the volume model 32. As a result, if a user were to provide a single selection of the volume control in a context that is similar to (or the same as) the context on which the model was trained, the volume model may generate an adapted volume setting that increases the volume setting by three volume settings (four increases less one decrease), which was the net result of the previous user adjustments.
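One way to derive such a final volume setting from a burst of adjustments is sketched below. The event format `(timestamp_seconds, delta)`, the five-second grouping window, and the function name `final_settings` are assumptions for illustration; the disclosure does not specify how the period of time is chosen.

```python
def final_settings(events, start_setting, window_s=5.0):
    """Group adjustments into bursts and return (start, final) per burst.

    events: list of (timestamp_seconds, delta) with delta = +1 or -1.
    A gap longer than window_s starts a new burst.
    """
    sessions = []
    current = start_setting
    burst_start = None
    last_t = None
    for t, delta in events:
        if last_t is None or t - last_t > window_s:
            if last_t is not None:
                sessions.append((burst_start, current))
            burst_start = current
        current += delta
        last_t = t
    if last_t is not None:
        sessions.append((burst_start, current))
    return sessions


# Four volume-up presses and one volume-down press in quick succession.
presses = [(0.0, +1), (1.0, +1), (2.0, +1), (3.0, +1), (4.0, -1)]
bursts = final_settings(presses, start_setting=2)
```

Each `(start, final)` pair could serve as one training example, labeled with the context in which the burst occurred.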
The volume model 32 provides the adapted volume setting to the model-set volume setting 134. As described herein, the model-set volume setting may be different than the user-set volume setting 135. In particular, the model-set volume setting may be more or less than the user-set volume setting 135 by more than one volume setting in the sequence of volume settings of the volume control. For example, when the sequence of volume settings increments by a value, such as one, the model-set volume setting may differ from the last (or current) volume setting by more than that value. As an example, when the last volume setting is two out of ten, and the user-set volume setting increases the last volume setting by one, resulting in a user-set setting of three, the model-set volume setting may be determined to be six based on past user context and user behavior.
In another aspect, the model-set volume setting 134 may be a different volume setting from the sequence of volume settings of the volume control. In one aspect, the series of volume settings of the volume control may be predefined, each setting being associated with a different volume level. Returning to the previous example, the volume control may include eleven volume settings, zero to ten, each volume setting changing the volume level by 10 dB. In this case, the model-set volume setting may be between a pair of adjacent volume settings. For example, when the last volume setting is two, the model-set volume setting may be 5.5, thereby increasing the volume level by 35 dB. The volume control, however, may still increment (or reduce) the volume settings by the defined sequence of volume settings. For instance, if the user presses the volume control in order to increase the volume setting again, the setting may increase to six, since the volume setting of six is the next volume setting from 5.5 of the eleven volume setting values.
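The arithmetic of this example may be checked with the short sketch below, which uses the eleven settings (zero to ten) and the 10 dB spacing given above. The function names are hypothetical, and the behavior for a volume-down press from a fractional setting (stepping to the next lower predefined setting) is an assumption, since only the volume-up case is described.

```python
import math

DB_PER_SETTING = 10.0            # spacing from the example above
MIN_SETTING, MAX_SETTING = 0, 10  # eleven predefined settings


def level_change_db(old_setting, new_setting):
    """Volume level change, in dB, between two (possibly fractional) settings."""
    return (new_setting - old_setting) * DB_PER_SETTING


def next_setting(current, direction):
    """Step to the next predefined integer setting above or below current."""
    if direction > 0:
        nxt = math.floor(current) + 1
    else:
        nxt = math.ceil(current) - 1  # assumed behavior for volume-down
    return max(MIN_SETTING, min(MAX_SETTING, nxt))
```

Here `level_change_db(2, 5.5)` gives 35 dB, and a further volume-up press from 5.5 lands on six, consistent with the example.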
Thus, the volume model 32 may be configured to determine an adapted volume setting, and may change the existing setting of the model-set volume setting 134 to the adapted volume setting. The model-set volume setting may adjust the scalar gain 37 according to the volume level associated with the model-set volume setting. In one aspect, the scalar gain may apply a stepwise gain adjustment to the audio content, as opposed to ramping up or ramping down the gain, as described in
The model-set volume setting 134 may provide the adapted volume setting and/or the user-set volume setting 135 may provide the user-adjusted volume setting to various components. For example, both volume settings may be provided to the context & behavior engine 30 in order for the controller 24 to store the settings as historical data 33. In particular, the engine 30 may store these settings based on a determined user context of the output device as historical data 33, such that the data may be used by the volume model 32 at a future time. In one aspect, the controller 24 may use the user-set volume setting 135 and the model-set volume setting 134 to train the volume model, as described herein. In addition, the user-set volume setting 135 may be set to the adapted model-set volume setting, such that if a future user adjustment to the volume control is received, the user-set volume setting 135 may be increased or decreased incrementally along the sequence of volume settings of the volume control, as described herein. In addition, the controller 24 may provide the target volume setting to the source device 14, such that both the source device and the output device may maintain a same current volume setting of the system 10.
The controller 24 receives a single user adjustment to a volume control to change the volume setting (at block 154). For example, the volume control 19 may be a physical button that is arranged to produce a control signal in response to receiving user input (e.g., a selection) that indicates that the user-set volume setting 135 is to increase or decrease the existing volume setting by one volume setting. In this case, the single user adjustment may change the current volume setting by one volume setting, due to the user selecting (e.g., pressing) the volume control once. As a result, the controller may determine, responsive to receiving the single user adjustment, a next volume setting within the series of volume settings of the volume control from the current volume setting. In one aspect, the volume control may not be a dedicated volume control but instead may be any input device that may provide the controller 24 an indication that a volume adjustment is to be performed. For instance, the controller 24 may receive a control signal indicating that the user desires to adjust the volume setting from any input device. As another example, the control signal may be received from a software application, such as a VPA application, that is being executed by the system 10. In another example, the controller may receive an indication to change the volume setting based on sensor data captured by sensors 45. For example, when the output device is an in-ear headset that includes a proximity sensor, the controller may determine that the volume setting is to be changed upon detecting the user's hand being placed close (within a threshold proximity) to the in-ear headset, based on sensor data from the proximity sensor.
The controller 24 determines a target volume setting based on at least one of the noise level, a content level of the audio content, and historical data indicating past user adjustments to the volume control (at block 155). In particular, the target (or second) volume setting may be output of the volume model 32 in response to at least one of the noise level, the content level, and at least some historical data 33 stored in memory 27 as input into the model. In one aspect, the controller may determine the volume setting responsive to receiving the single user adjustment. In one aspect, the target volume setting may be greater or less than the last volume setting by more than one (adjacent) volume setting to the last volume setting in the sequence of volume settings of the volume control. In particular, the target volume setting may increase or decrease by more than one volume setting (e.g., at least two volume settings). In one aspect, the target setting may increase or decrease by more than the single user adjustment would normally adjust the volume setting (e.g., either up or down by one volume setting of the several volume settings associated with the volume control).
As described herein, the volume model 32 may adapt the volume setting responsive to receiving a single user adjustment. In one aspect, when the volume control includes multiple user-adjustable input devices, the model may adapt the volume setting based on which of the input devices is adjusted by the user. For example, when the volume control includes a volume-up button and/or a volume-down button, the model may adapt the volume setting based on which button the user selects. For example, when the single user adjustment (e.g., user selection) is to the volume-up button, the volume model may be configured to generate the adapted volume setting to be greater than the last volume setting. Conversely, if the user selects the volume-down button, the adapted volume setting may be less than the last volume setting. In this case, both volume settings may be the same amount of adjustment from the last volume setting, with one increasing sound output and the other decreasing sound output. For instance, the greater adapted volume setting may be 10 dB more than the last volume setting, whereas the lesser adapted volume setting may be 10 dB less than the last volume setting.
In another aspect, when the volume control may adjust volume up or down, the direction towards which the user adjustment is made may impact the adapted volume setting predicted by the volume model 32. In particular, if the single user adjustment is to increase the volume, the volume model may increase the volume setting more than if the single user adjustment is to decrease the volume. For example, when the user adjustment is to a volume-up button, thereby adjusting the last volume setting to an incremental next volume setting (e.g., from a volume setting of five to six), the adapted volume setting may be greater than six, whereas, when the user adjustment is to a volume-down button, thereby adjusting the last volume setting to a decremental next volume setting (e.g., from the volume setting of five to four), the adapted volume setting may be less than four.
In one aspect, the volume model may determine, based on historical user context and behavior data 33, that in an environment in which the output device is located, the user generally (e.g., on average) listens to audio content at a higher-than-average volume setting, such as above a threshold. Thus, if the user adjustment is to increase the volume, the volume model may be configured to determine an adapted volume setting that is higher than average, such as being at or above the threshold. Conversely, if the user adjustment is to turn down the volume, the model may decrease the volume setting slightly (e.g., less than if the user had increased the volume). Although the user has requested that the volume be turned down, the user would most likely not want the volume to decrease sharply, since the user normally listens to audio content at a higher level in this type of environment.
The controller 24 changes the last volume setting of the volume control to the target volume setting (at block 156). For instance, the model-set volume setting 134 (and the user-set volume setting 135) may both be set to the target volume setting, and the scalar gain 37 may be adjusted such that the applied gain either increases or decreases the volume level according to the target volume setting. As a result, the audio content may be volume-adjusted more than if the controller applied a volume adjustment based on the user adjustment that only increased or decreased the last volume setting by one volume setting. Responsive to determining the new volume setting, the controller drives the speaker of the output device with the audio signal of the audio content at the new volume setting. The controller 24 stores the volume settings in memory, as historical data 33 (at block 157). As described herein, the controller may store this information in the historical data 33, which may be retrieved and used by the volume model 32 for a future prediction of the adapted volume setting. In one aspect, the controller 24 may store other information, as described herein, such as the content level of the audio content and the noise level of the environment in order to provide context to the adaptive volume.
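As an illustrative sketch of adjusting the scalar gain 37 according to a target volume setting, the following converts a setting to a linear gain using the standard amplitude relation gain = 10^(dB/20). The 10 dB spacing between settings, the choice of unity gain at the highest setting, and the function name `scalar_gain` are assumptions made for this example only.

```python
def scalar_gain(setting, max_setting=10.0, db_per_setting=10.0):
    """Linear amplitude gain for a (possibly fractional) volume setting.

    Assumes the highest setting corresponds to 0 dB (unity gain) and each
    setting below it attenuates by db_per_setting decibels.
    """
    gain_db = (setting - max_setting) * db_per_setting
    return 10.0 ** (gain_db / 20.0)


def apply_gain(samples, setting):
    """Apply a stepwise (non-ramped) gain to a block of audio samples."""
    g = scalar_gain(setting)
    return [s * g for s in samples]
```

Under these assumptions, moving from the highest setting down by two settings attenuates the signal by 20 dB, i.e., a linear gain of 0.1.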
The controller 24 determines whether another user adjustment has been received (at block 158). In particular, the controller may determine whether a user adjustment has been received within a period of time from when the volume setting has been adapted. In another aspect, the additional user adjustment may be received differently than the previously received user adjustment. For example, the previous user adjustment may have been received via the volume control 19 of the output device. The subsequent user adjustment, however, may be received via a VPA application, based on a voice command of the user. As described herein, the volume model 32 may be trained to predict an adapted volume setting based on historical user data 33 in order to provide an optimal volume setting. In some instances, however, the user may wish to further adjust the volume after the controller 24 has changed the volume level according to the adapted volume setting. In which case, the controller may determine another adapted volume setting that is either greater or less than the last volume setting, based on the additional user adjustment (at block 159).
Receiving a user adjustment quickly (e.g., within the period of time) after the volume had been previously adapted by the volume model 32 may indicate that the user wishes to fine-tune the volume setting. This may be the case since the volume model 32 bases its adapted volume setting on past user behavior. And since past user behavior indicates that the user is most likely to listen to audio content at the adapted volume setting, it may be assumed that a subsequent user adjustment may be an indication of a slight volume adjustment, as opposed to a drastic one. For example, the user may wish to increase the volume of a musical composition that is currently being played back by the output device. In which case, the volume model 32 may be configured to produce an additional adapted volume setting that changes the volume less than the previous adapted volume setting, thereby reducing the range (or step size) by which the adapted volume setting may change the volume level. This may satisfy the user's desire to increase the volume level, while keeping the volume at a comfortable level (based on historical data) for the given environment. In another aspect, the volume model may increase or decrease the step size of the adapted volume setting.
In another aspect, the controller 24 may determine how to adapt the volume setting based on changes to one or more characteristics. For example, the controller may determine whether the environment has changed, such as determining whether the noise level within the ambient environment has increased above a threshold since a previous user adjustment to the volume control has been received. If so, the controller may adapt the volume responsive to the increased noise level. In which case, the volume setting may increase as much or more than a previous adjustment. Otherwise, if the environment has not changed (e.g., the noise level has not increased above a threshold), the controller may adapt the volume setting slightly. In one aspect, the controller may reduce the step-size that was used for a previous adjustment when the environment has not (substantially) changed (e.g., when one or more characteristics have not (substantially) changed since the last adaptation). For example, a previous adapted volume setting may have increased the volume by four volume settings. The volume model may produce a subsequent adapted volume setting that only increases the volume by one volume setting, in cases when the environment has not changed, in order to allow the user to fine-tune the volume. In another aspect, when no change is detected, the controller 24 may reduce the step below one volume setting in order to further fine-tune the volume level. In this case, the subsequent adapted volume setting may change the volume by less than one volume setting, such as a half of a volume setting.
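The step-size behavior described above may be sketched as follows. The 6 dB noise-change threshold, the half-setting minimum step, and the function name `next_step` are hypothetical values and names chosen for illustration; the disclosure leaves the threshold and the exact reduction rule open.

```python
def next_step(prev_step, noise_then_db, noise_now_db,
              threshold_db=6.0, min_step=0.5):
    """Return the step size (in volume settings) for a follow-up adjustment.

    If the noise level rose by more than threshold_db since the previous
    adjustment, reuse the previous step. Otherwise shrink the step so the
    user can fine-tune: first to one setting, then toward min_step.
    """
    if noise_now_db - noise_then_db > threshold_db:
        return prev_step
    if prev_step > 1:
        return 1.0
    return max(min_step, prev_step / 2)
```

For instance, after a four-setting adaptation in an unchanged environment, the next press moves the volume by one setting, and a press after that by half a setting.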
In another aspect, the controller may adjust the volume differently based on different sounds within the environment. For example, the controller may determine whether a noise captured by the reference microphone 28 is speech or an ambient sound. The volume model 32 may be configured to decrease the volume more when speech is present than if an ambient sound (e.g., a barking dog) were present. In particular, the volume model 32 may produce an adapted volume setting that is less than a last volume setting by more than a first subset of volume settings of the sequential volume settings when the noise is an ambient sound, and may produce an adapted volume setting that is less than the last volume setting by a second subset of volume settings that is greater than the first subset of volume settings when the noise is speech. For example, when the user selects the volume-down button, the adapted volume setting may reduce the volume setting by eight volume settings when speech is present, whereas the adapted volume setting may reduce the volume setting by six volume settings when an ambient sound is present.
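A minimal sketch of this sound-dependent reduction follows. It reads the example reductions above as eight and six volume settings, which is an interpretation rather than something stated outright, and it takes the sound classification as a given label; the names `DOWN_STEPS` and `adapted_down` are hypothetical.

```python
# Reduction, in volume settings, applied on a volume-down press
# (values taken from the example above; the mapping itself is illustrative).
DOWN_STEPS = {"speech": 8, "ambient": 6}


def adapted_down(current, noise_kind, min_setting=0):
    """Lower the setting more for speech than for other ambient sound."""
    return max(min_setting, current - DOWN_STEPS[noise_kind])
```

Starting from a setting of ten, a volume-down press would yield two when speech is detected and four when another ambient sound is detected.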
Upon determining another adapted volume setting, the controller may return to block 156 to change the volume setting to the new adapted volume setting. In one aspect, the controller may continue to perform at least some of these operations until no user adjustment is received (e.g., no user adjustment is received after a period of time passes when the volume level is adapted).
As described herein, the controller 24 may determine the noise level within the ambient environment captured by the microphone at block 153. In one aspect, the controller 24 may determine other characteristics, as described herein. In determining the target volume setting, the controller 24 may determine (e.g., retrieve) a volume model based on user context associated with the determined one or more characteristics. In particular, the volume model may be trained using historical data that indicates user behavior of past user adjustments to a volume control of the output device with respect to one or more characteristics. In this way, the volume model may be trained according to one or more user contexts associated with the characteristics. The controller 24 may be configured to determine the target volume setting using the determined volume model. In particular, the controller may determine the target volume setting as output of the model responsive to one or more characteristics, such as a change to an environmental noise level as input.
Between T0 and T1, the volume setting of the output device has a value of two. As shown, both the user volume curve and the model volume curve are aligned during this time. This may be due to the volume model 32 having adapted the volume setting to a volume setting that was set by the user. For example, prior to T0, the volume setting of the output device may have been one. The user may have provided a single user adjustment to the volume control to increase the volume. In response, the volume model may have adapted the volume setting of the output device to a volume setting of two, based on historical user data, as described herein.
At T1, the user increases the volume level by one (e.g., by tapping a volume-up button once). Responsive to receiving the user input, the volume model may perform adaptive volume control operations to determine a user-desired volume setting 134. For example, the volume model may take the changed volume setting and other data, such as historical data, as input to determine an adapted volume setting, as described herein. In this case, the model determines that the most optimal setting for the user is a volume setting of six. As a result, the volume setting of the output device is set to six to increase the volume level accordingly, whereas the user volume curve 162 increases only to a volume setting of three, since the user-input only adjusted the volume setting by one.
At T2, however, the volume control 19 of the output device receives user input to increase the volume again. This increases the user volume curve 162 by one step. The volume model again performs adaptive volume control operations, and determines that the new adapted volume goes from a volume setting of six to a volume setting of eight. In one aspect, since the first adapted volume setting at T1 was determined to be the optimal volume setting for the user, the second adapted volume setting at T2 may be a smaller increase (two steps) than the first adapted volume setting (four steps). This may be the case when environmental and/or playback characteristics used by the volume model 32 to predict the adapted volume setting have not changed (substantially) from T1. In another aspect, the controller 24 may perform adaptive volume control operations to determine the new volume setting. In this case, the controller may take into account similar characteristics as a previous performance. In some aspects, the controller may take into account the current volume setting. For instance, if the current volume setting is high, the controller may reduce a stepwise increase/decrease of the volume to avoid reaching and/or exceeding a max/min threshold. In particular, the controller may determine whether the current volume setting is within a max or min threshold, and if so, may reduce the adapted volume setting. In another aspect, the volume model may take into account the current volume setting's position with respect to the min/max thresholds. In one aspect, subsequent adaptations to the volume setting from an initial adaptation may change the volume setting by an amount that is equal to or less than the volume change that occurred due to the initial adaptation. In another aspect, subsequent adaptations may change the volume setting less than one or more previous adaptations as the number of adaptations increases.
In some aspects, this may be the case if the subsequent user adjustments to the volume control occur within a period of time from when the initial adaptation occurred. If, however, at least one characteristic changes, such as the noise level, the volume model 32 may start over again (e.g., determining an adapted volume setting that is greater than the one determined at T1).
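The handling of the min/max thresholds may be illustrated with the sketch below. The two-setting margin, the halving rule, and the function name `limited_step` are assumptions for this example; the disclosure states only that the step may be reduced near a threshold.

```python
def limited_step(current, step, min_setting=0.0, max_setting=10.0, margin=2.0):
    """Apply a signed step, shrinking it near the min/max thresholds.

    When the current setting is within `margin` settings of the limit it is
    moving toward, the step is halved; the result is then clamped to range.
    """
    if step > 0 and max_setting - current <= margin:
        step = step / 2
    elif step < 0 and current - min_setting <= margin:
        step = step / 2
    return max(min_setting, min(max_setting, current + step))
```

With these values, a two-setting increase requested at a setting of eight is reduced to one setting, so the volume approaches the maximum more gradually.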
At T3, the user decides to turn down the volume, such as by selecting a volume-down button. In response to the user increasing the volume twice and now decreasing the volume once, the volume model may determine that the user is attempting to fine-tune the volume setting between the last two volume settings set by the user. Therefore, the volume model 32 may determine that the adapted volume setting may be between volume setting eight and six. In this case, the volume model 32 reduces the volume setting by 0.5, to 7.5. The volume model may reduce the volume setting by this small amount, due to the user having increased the volume several times before.
As a result of the volume model adapting the volume setting, the volume setting of the output device ended at T3 on 7.5, which is in contrast to a volume setting of three, which would have otherwise been set if the volume setting of the output device were set to the number of user adjustments received through the volume control.
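The timeline from T0 to T3 may be replayed with the following sketch, which tracks the user volume curve and the model volume curve side by side. The per-event model changes (+4, +2, -0.5) are taken directly from the description above; the function name `replay` and the event tuple format are hypothetical.

```python
def replay(events, start=2.0):
    """Replay (label, user_delta, model_delta) events from a shared start.

    Returns a list of (label, user_setting, model_setting) after each event.
    """
    user = model = start
    trace = []
    for label, user_delta, model_delta in events:
        user += user_delta
        model += model_delta
        trace.append((label, user, model))
    return trace


timeline = [
    ("T1", +1, +4),    # one volume-up press; model adapts from two to six
    ("T2", +1, +2),    # second press; smaller adaptation from six to eight
    ("T3", -1, -0.5),  # volume-down press; fine-tuning from eight to 7.5
]
trace = replay(timeline)
```

The final entry shows a user-curve setting of three against a model-curve setting of 7.5, matching the contrast drawn above.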
As shown in
Memory 176 can be connected to the bus and can include DRAM, a hard disk drive or a flash memory or a magnetic optical drive or magnetic memory or an optical drive or other types of memory systems that maintain data even after power is removed from the system. In one aspect, the processor 177 retrieves computer program instructions stored in a machine-readable storage medium (memory) and executes those instructions to perform operations described herein.
Audio hardware, although not shown, can be coupled to the one or more buses 178 in order to receive audio signals to be processed and output (or played back) by speakers 173. Audio hardware can include digital to analog and/or analog to digital converters. Audio hardware can also include audio amplifiers and filters. The audio hardware can also interface with microphones 172 (e.g., microphone arrays) to receive audio signals (whether analog or digital), digitize them if necessary, and communicate the signals to the bus 178.
The network interface 175 may communicate with one or more remote devices and networks. For example, the interface can communicate over known technologies such as Wi-Fi, 3G, 4G, 5G, Bluetooth, ZigBee, or other equivalent technologies. The interface can include wired or wireless transmitters and receivers that can communicate (e.g., receive and transmit data) with networked devices such as servers (e.g., the cloud) and/or other devices such as remote speakers and remote microphones.
It will be appreciated that the aspects disclosed herein can utilize memory that is remote from the system, such as a network storage device which is coupled to the audio processing system through a network interface such as a modem or Ethernet interface. The buses 178 can be connected to each other through various bridges, controllers and/or adapters as is well known in the art. In one aspect, one or more network device(s) can be coupled to the bus 178. The network device(s) can be wired network devices (e.g., Ethernet) or wireless network devices (e.g., Wi-Fi, Bluetooth). In some aspects, various operations described herein may be performed by a networked server in communication with one or more devices of the system 10.
Various aspects described herein may be embodied, at least in part, in software. That is, the techniques may be carried out in an audio processing system in response to its processor executing a sequence of instructions contained in a storage medium, such as a non-transitory machine-readable storage medium (e.g., DRAM or flash memory). In various aspects, hardwired circuitry may be used in combination with software instructions to implement the techniques described herein. Thus, the techniques are not limited to any specific combination of hardware circuitry and software, or to any particular source for the instructions executed by the audio processing system.
In the description, certain terminology is used to describe features of various aspects. For example, in certain situations, the terms “enhancer”, “renderer”, “estimator”, “gain”, “controller”, “component”, “unit”, “module”, “logic”, “setting”, “ramp”, “model”, “engine”, “compressor”, “filter”, “SNR”, “sensitivity”, “generator”, “optimizer”, “processor”, “mixer”, “detector”, “encoder” and “decoder” are representative of hardware and/or software configured to perform one or more processes or functions. For instance, examples of “hardware” include, but are not limited or restricted to, an integrated circuit such as a processor (e.g., a digital signal processor, microprocessor, application specific integrated circuit, a micro-controller, etc.). Thus, different combinations of hardware and/or software can be implemented to perform the processes or functions described by the above terms, as understood by one skilled in the art. Of course, the hardware may be alternatively implemented as a finite state machine or even combinatorial logic. An example of “software” includes executable code in the form of an application, an applet, a routine or even a series of instructions. As mentioned above, the software may be stored in any type of machine-readable medium.
Some portions of the preceding detailed descriptions have been presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the ways used by those skilled in the audio processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of operations leading to a desired result. The operations are those requiring physical manipulations of physical quantities. It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the above discussion, it is appreciated that throughout the description, discussions utilizing terms such as those set forth in the claims below, refer to the action and processes of an audio processing system, or similar electronic device, that manipulates and transforms data represented as physical (electronic) quantities within the system's registers and memories into other data similarly represented as physical quantities within the system memories or registers or other such information storage, transmission or display devices.
The processes and blocks described herein are not limited to the specific examples described and are not limited to the specific orders used as examples herein. Rather, any of the processing blocks may be re-ordered, combined, or removed, performed in parallel or in serial, as necessary, to achieve the results set forth above. The processing blocks associated with implementing the audio processing system may be performed by one or more programmable processors executing one or more computer programs stored on a non-transitory computer readable storage medium to perform the functions of the system. All or part of the audio processing system may be implemented as special purpose logic circuitry (e.g., an FPGA (field-programmable gate array) and/or an ASIC (application-specific integrated circuit)). All or part of the audio system may be implemented using electronic hardware circuitry that includes electronic devices such as, for example, at least one of a processor, a memory, a programmable logic device or a logic gate. Further, processes can be implemented in any combination of hardware devices and software components.
According to another aspect, each volume setting in the sequence of volume settings defines a different volume level of the audio content, where the method performed by the output device further includes, responsive to receiving the single adjustment, determining a third volume setting that is a next volume setting within the sequence of volume settings from the original volume setting. In one aspect, each pair of adjacent volume settings is separated by a value, where the target volume setting is greater than the original volume setting by more than the value responsive to the third volume setting being an incremental next volume setting from the original volume setting, and where the target volume setting is less than the first volume setting by more than the value responsive to the third volume setting being a decremental next volume setting from the original volume setting. In another aspect, the value is one and the volume settings in the sequence are different integers that are sequential by one. In some aspects, the method further includes, responsive to determining that the noise level within the ambient environment has not increased above the threshold, determining a third volume setting that is less than the second volume setting. In another aspect, the third volume setting is either more than or less than the first volume setting by the one volume setting. In one aspect, the method is performed by a programmed processor of an electronic device. In some aspects, the electronic device is a portable device.
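By way of illustration only, the relationship between the original volume setting, the third (next) volume setting, and the target volume setting may be sketched as follows. This is a minimal sketch under stated assumptions and not a definitive implementation: the integer range `MIN_SETTING` to `MAX_SETTING`, the function names, the decibel-valued noise inputs, and the `extra_steps` parameter are all hypothetical. In particular, `extra_steps` merely stands in for whatever the noise level, the content level, or the historical data would contribute, which this sketch does not model.

```python
STEP = 1          # value separating adjacent volume settings (the "value is one" aspect)
MIN_SETTING = 0   # hypothetical lower bound of the sequence of volume settings
MAX_SETTING = 16  # hypothetical upper bound of the sequence of volume settings


def clamp(setting: int) -> int:
    """Keep a setting inside the assumed sequence of volume settings."""
    return max(MIN_SETTING, min(MAX_SETTING, setting))


def next_setting(original: int, direction: int) -> int:
    """Third volume setting: the adjacent setting in the selected direction.

    direction is +1 for a volume-up selection and -1 for a volume-down selection.
    """
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    return clamp(original + direction * STEP)


def target_setting(original: int, direction: int, extra_steps: int) -> int:
    """Target volume setting: differs from the original by more than STEP.

    extra_steps (hypothetical) is the number of additional settings to move
    beyond the third volume setting, in the same direction.
    """
    if extra_steps < 1:
        raise ValueError("extra_steps must be at least 1")
    third = next_setting(original, direction)
    return clamp(third + direction * extra_steps * STEP)


def resolve_adjustment(original: int, direction: int, noise_db: float,
                       threshold_db: float, extra_steps: int) -> int:
    """Choose between the target setting and the plain next setting.

    Assumption: the larger jump applies only when the ambient noise level
    exceeds the threshold; otherwise the single adjustment moves one setting.
    """
    if noise_db > threshold_db:
        return target_setting(original, direction, extra_steps)
    return next_setting(original, direction)
```

For example, with an original volume setting of 5 and a single volume-up selection, `next_setting(5, 1)` yields the incremental next setting, while `target_setting(5, 1, 2)` yields a setting that exceeds the original by more than one setting. The clamping step is a design choice of this sketch so that a large jump near either end of the sequence stays within the available settings.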
As previously explained, an aspect of the disclosure may be a non-transitory machine-readable medium (such as microelectronic memory) having stored thereon instructions, which program one or more data processing components (generically referred to here as a “processor”) to perform contextual adaptive volume control operations, digital signal processing operations, volume modeling, characteristic estimation operations, audio compression operations, rendering operations, network operations, and audio signal processing operations, as described herein. In other aspects, some of these operations might be performed by specific hardware components that contain hardwired logic. Those operations might alternatively be performed by any combination of programmed data processing components and fixed hardwired circuit components.
To aid the Patent Office and any readers of any patent issued on this application in interpreting the claims appended hereto, applicants wish to note that they do not intend any of the appended claims or claim elements to invoke 35 U.S.C. 112(f) unless the words “means for” or “step for” are explicitly used in the particular claim.
It is well understood that the use of personally identifiable information should follow privacy policies and practices that are generally recognized as meeting or exceeding industry or governmental requirements for maintaining the privacy of users. In particular, personally identifiable information data should be managed and handled so as to minimize risks of unintentional or unauthorized access or use, and the nature of authorized use should be clearly indicated to users.
While certain aspects have been described and shown in the accompanying drawings, it is to be understood that such aspects are merely illustrative of and not restrictive on the broad disclosure, and that the disclosure is not limited to the specific constructions and arrangements shown and described, since various other modifications may occur to those of ordinary skill in the art. The description is thus to be regarded as illustrative instead of limiting.
In some aspects, this disclosure may include the language, for example, “at least one of [element A] and [element B].” This language may refer to one or more of the elements. For example, “at least one of A and B” may refer to “A,” “B,” or “A and B.” Specifically, “at least one of A and B” may refer to “at least one of A and at least one of B,” or “at least one of either A or B.” In some aspects, this disclosure may include the language, for example, “[element A], [element B], and/or [element C].” This language may refer to any one of the elements or any combination thereof. For instance, “A, B, and/or C” may refer to “A,” “B,” “C,” “A and B,” “A and C,” “B and C,” or “A, B, and C.”
This application claims the benefit of and priority to U.S. Provisional Patent Application No. 63/499,186, filed Apr. 28, 2023, which is hereby incorporated by this reference in its entirety.
Number | Date | Country
---|---|---
63499186 | Apr 2023 | US