This disclosure generally relates to systems and methods for providing spatialized audio with dynamic head tracking.
All examples and features mentioned below can be combined in any technically possible way.
According to an aspect, a pair of headphones includes: a sensor outputting a sensor signal representative of an orientation of a user's head; and a controller, receiving the sensor signal, the controller programmed to output a spatialized audio signal, based on the sensor signal, to a pair of electroacoustic transducers for transduction into a spatialized acoustic signal, wherein the spatialized acoustic signal is perceived by the user as originating from a virtual soundstage comprising at least one virtual source, each virtual source of the virtual soundstage being perceived as located at a respective position distinct from the location of the electroacoustic transducers and being referenced to an audio frame of the virtual soundstage, the audio frame being disposed at a first location aligned with a reference axis of the user's head; wherein the controller is further programmed to determine, from the sensor signal, whether a characteristic of the user's head satisfies at least one predetermined condition, the at least one predetermined condition including whether the orientation of the user's head is outside a predetermined angular bound, wherein, upon determining the characteristic of the user's head does not satisfy the at least one predetermined condition, the controller is programmed to maintain the audio frame at the first location, wherein, upon determining the orientation of the user's head is outside the predetermined angular bound, the controller is programmed to rotate the location of the audio frame about an axis of rotation to reduce an angular offset with the reference axis of the user's head.
In an example, rotating the location of the audio frame comprises rotating the location of the audio frame to align with the reference axis of the user's head as a turn of the user's head comes to an end.
In an example, while the user's head has an increasing angular acceleration, the angular velocity of the rotation of the audio frame is based, at least in part, on an angular velocity of the user's head.
In an example, while the user's head has a decreasing angular acceleration, the angular velocity of the rotation of the audio frame is selected such that the audio frame will align with a predicted location of the reference axis of the user's head as the turn of the user's head comes to an end.
In an example, the predicted location of the reference axis of the user's head as the turn of the user's head comes to an end is updated each sample that the user's head has a decreasing angular acceleration.
In an example, the at least one predetermined condition further includes whether an angular jerk of the user's head exceeds a predetermined threshold, wherein the at least one predetermined condition is satisfied if either the orientation of the user's head is outside the predetermined angular bound or the angular jerk of the user's head exceeds the predetermined threshold.
In an example, upon determining the orientation of the user's head is outside the predetermined angular bound, the controller is programmed to rotate the angular bound about the axis of rotation in conjunction with the audio frame, wherein the location of the audio frame is rotated about an axis of rotation to reduce an angular offset with the reference axis of the user's head for a predetermined period of time after the orientation of the user's head is again within the angular bound.
In an example, upon determining the orientation of the user's head is outside the predetermined angular bound, a second angular bound, narrower than the angular bound, is rotated about the axis of rotation in conjunction with the audio frame, wherein the location of the audio frame is rotated about an axis of rotation to reduce an angular offset with the reference axis of the user's head until the orientation of the user's head is within the second angular bound.
In an example, the at least one virtual source comprises a first virtual source and a second virtual source, the first virtual source being disposed in a first location and the second virtual source being disposed in a second location, wherein the first location and the second location are referenced to the audio frame.
In an example, maintaining the audio frame at a first location comprises rotating the audio frame at a rate tailored to eliminate drift of the sensor.
In an example, the sensor outputting the sensor signal comprises a plurality of sensors outputting a plurality of signals.
According to another example, a method for providing spatialized audio includes: outputting a spatialized audio signal, based on a sensor signal representative of an orientation of a user's head, to a pair of electroacoustic transducers for transduction into a spatialized acoustic signal, wherein the spatialized acoustic signal is perceived by the user as originating from a virtual soundstage comprising at least one virtual source, each virtual source of the virtual soundstage being perceived as located at a respective position distinct from the location of the electroacoustic transducers and being referenced to an audio frame of the virtual soundstage, the audio frame being disposed at a first location aligned with a reference axis of the user's head; determining, from the sensor signal, whether a characteristic of the user's head satisfies at least one predetermined condition, the at least one predetermined condition including whether the orientation of the user's head is outside a predetermined angular bound; and rotating, upon determining the orientation of the user's head is outside the predetermined angular bound, the location of the audio frame about an axis of rotation to reduce an angular offset with the reference axis of the user's head.
In an example, rotating the location of the audio frame comprises rotating the location of the audio frame to align with the reference axis of the user's head as a turn of the user's head comes to an end.
In an example, while the user's head has an increasing angular acceleration, the angular velocity of the rotation of the audio frame is based, at least in part, on an angular velocity of the user's head.
In an example, while the user's head has a decreasing angular acceleration, the angular velocity of the rotation of the audio frame is selected such that the audio frame will align with a predicted location of the reference axis of the user's head as the turn of the user's head comes to an end.
In an example, the predicted location of the reference axis of the user's head as the turn of the user's head comes to an end is updated each sample that the user's head has a decreasing angular acceleration.
In an example, the at least one predetermined condition further includes whether an angular jerk of the user's head exceeds a predetermined threshold, wherein the at least one predetermined condition is satisfied if either the orientation of the user's head is outside the predetermined angular bound or the angular jerk of the user's head exceeds the predetermined threshold.
In an example, upon determining the orientation of the user's head is outside the predetermined angular bound, the angular bound is rotated about the axis of rotation in conjunction with the audio frame, wherein the location of the audio frame is rotated about an axis of rotation to reduce an angular offset with the reference axis of the user's head for a predetermined period of time after the orientation of the user's head is within the angular bound.
In an example, upon determining the orientation of the user's head is outside the predetermined angular bound, a second angular bound, narrower than the angular bound, is rotated about the axis of rotation in conjunction with the audio frame, wherein the location of the audio frame is rotated about an axis of rotation to reduce an angular offset with the reference axis of the user's head until the orientation of the user's head is within the second angular bound.
In an example, the at least one virtual source comprises a first virtual source and a second virtual source, the first virtual source being disposed in a first location and the second virtual source being disposed in a second location, wherein the first location and the second location are referenced to the audio frame.
In an example, maintaining the audio frame at a first location comprises rotating the audio frame at a rate tailored to eliminate drift of the sensor.
In an example, the sensor outputting the sensor signal comprises a plurality of sensors outputting a plurality of signals.
The details of one or more implementations are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and the drawings, and from the claims.
In the drawings, like reference characters generally refer to the same parts throughout the different views. Also, the drawings are not necessarily to scale, emphasis instead generally being placed upon illustrating the principles of the various aspects.
Current headphones that provide spatialized audio fail to deliver a consistent and pleasant experience for users engaged in activities that include frequent head turning, such as walking, running, cycling, etc. To remedy this, such headphones often provide a “fixed mode” in which the audio is “fixed to the user's head,” meaning that the audio is always rendered in front of the user (i.e., without head tracking). While this effectively resolves the issues encountered with spatialized audio when engaged in these activities, it also fails to deliver truly spatialized audio. Rendering audio in front of the user's head effectively destroys the auditory illusion of spatialized audio, leading to a “collapse” of the externalized audio inside of the user's head, meaning that the user perceives the audio as originating from the speakers.
Accordingly, there exists a need for headphones that can provide spatialized audio, with head tracking, in a manner that remains a consistent and pleasant experience for users engaged in activities that require frequent head turning.
There is shown in the figures a block diagram of a pair of headphones 100, including ear cups 102, 104, a controller 110, a sensor 112, and a pair of electroacoustic transducers.
For the purposes of simplicity, and to emphasize the more relevant aspects of headphones 100, certain features have been omitted from the block diagram.
Controller 110 comprises a processor 116 and a memory 118 storing program code for execution by processor 116 to perform the various functions for providing the spatialized audio as described in this disclosure, including, as appropriate, the steps of method 1000, described below. It should be understood that the processor 116 and memory 118 of controller 110 need not be disposed within the same housing, such as part of a dedicated integrated circuit, but can be disposed in separate housings. Further, controller 110 can include multiple physically distinct memories to store the program code necessary for its functioning and can include multiple processors 116 for executing the program code. The various components of controller 110, further, need not be disposed in the same ear cup (or corollary part in other headphone form factors) but can be distributed between ear cups. For example, each of ear cups 102 and 104 can include a processor and a memory working in concert to perform the various functions for providing the spatialized audio as described in this disclosure, the processor and memory in both ear cups 102, 104 forming the controller.
As described above, sensor 112 generates a sensor signal representative of an orientation of the user's head. In an example, sensor 112 is an inertial measurement unit used for head tracking; however, it should be understood that sensor 112 can be implemented as any sensor suitable for measuring the orientation of the user's head. Further, sensor 112 can comprise multiple sensors acting in concert to generate the sensor signal. Indeed, an inertial measurement unit itself typically includes multiple sensors—e.g., accelerometers, gyroscopes, and/or magnetometers acting in concert to generate the sensor signal. The sensor signal representative of the orientation of the user's head can be a data signal that represents orientation directly, e.g., as changes in pitch, roll, and yaw, or can contain other data from which orientation can be derived, such as the specific force and angular rate of the user's head. In addition, the sensor signal can itself comprise multiple sensor signals, such as where multiple separate sensors are used to measure the orientation of the user's head. In an example, separate inertial measurement units can be respectively disposed in ear cups 102, 104 (or corollary part in other headphone form factors), or any other suitable location, and together form sensor 112, with the signals from the separate inertial measurement units forming the sensor signal.
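As one illustrative sketch (not a required implementation), orientation about the vertical axis can be derived from angular-rate data by numerical integration, and rate samples from IMUs in each ear cup can be fused into a single sensor signal. The Python function and parameter names below are hypothetical and not part of this disclosure.

    def fuse_ear_cup_rates(gyro_left_dps, gyro_right_dps):
        """Combine yaw-rate samples (deg/s) from IMUs in each ear cup into a
        single sensor signal, here by simple averaging."""
        return 0.5 * (gyro_left_dps + gyro_right_dps)

    def update_yaw(yaw_deg, yaw_rate_dps, dt_s):
        """Integrate the fused yaw-rate sample over one sample period (s) to
        update the estimated yaw orientation (deg) of the user's head."""
        return yaw_deg + yaw_rate_dps * dt_s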
Controller 110 can be configured to generate the audio signal in one or more modes. These modes include, for example, an active noise-reduction mode or a hear-through mode. In addition, controller 110 can produce spatialized audio in a mode that fixes the virtualized soundstage in space (e.g., in front of the user) and does not change the perceived location of the virtualized soundstage in response to the motion of the user's head, or only changes it in response to the user spending a predetermined period of time facing a direction that is greater than a predetermined angular rotation away from the virtual soundstage. For the purposes of this disclosure, this will be referred to as a “room-fixed” mode, referring to the fact that the virtual soundstage is perceived as fixed in place in the room. Additional details regarding the room-fixed mode are described in U.S. patent application Ser. No. 16/592,454 filed Oct. 3, 2019, titled SYSTEMS AND METHODS FOR SOUND SOURCE VIRTUALIZATION, which is published as U.S. Patent Application Publication No. 2020/0037097; and U.S. Patent Application Ser. No. 63/415,783 filed Oct. 13, 2022, titled SCENE RECENTERING, the complete disclosures of which are incorporated herein by reference. The room-fixed mode is, as described above, best suited for users who are relatively stationary, such as sitting at a desk, and is ill-suited for active users who are, for example, walking or running. To address this, controller 110 is programmed to operate, either by user selection or through some trigger condition, in a “head-fixed” mode, which maintains the virtual soundstage at a point fixed in space (similar to the room-fixed mode, referred to in this disclosure as “static phase” of the head-fixed mode) until a predetermined condition is met, at which point the virtual soundstage can be dynamically rotated following the rotation of the user's head (referred to in this disclosure as the “dynamic phase” of the head-fixed mode). The details of the head-fixed mode are described in greater detail below.
Turning to
For the purposes of this disclosure, and for the sake of simplicity, the location of a virtual soundstage and virtual speakers will often be discussed as though the virtual soundstage or virtual speakers are physically disposed at a given location. Even where not explicitly stated, it should be understood that the location of the virtual soundstage or the virtual speakers is a perceived location only. Stated differently, to the extent that the virtual soundstage or virtual speakers are described as having a physical location in space, it refers only to the perceived location of the virtual soundstage or virtual speakers and not to an actual location in space.
The location of the virtualized speakers 206, 208 can be a location at which the virtualized speakers 206, 208 were initialized. The initialization of the speakers can occur, for example, when the user first selects a spatialized audio mode (such as the room-fixed mode or the head-fixed mode), and the speakers can be initially placed in front of the user (e.g., as shown in the figures).
As described above, after the virtualized speakers are initialized, controller 110 maintains the virtual soundstage 204, in the static phase, at the same location until one or more predetermined conditions are satisfied. Once at least one of the predetermined conditions is met, the spatialized audio signal can be adjusted by controller 110, in the dynamic phase, such that virtual soundstage 204 rotates to track the movement of the user's head. The movement of virtual soundstage 204 is represented in the figures.
The rotation of the virtual soundstage 204 is accomplished by the rotation of each virtual speaker 206, 208 (as the virtual soundstage 204 comprises the collection of virtual speakers). Stated differently, as virtual soundstage 204 angularly rotates by angle αc about the Z-axis from time t0 to time t1, each virtual speaker 206, 208 likewise rotates by angle αc about the Z-axis from its initial position at time t0 to its position at time t1. For the purposes of this disclosure, however, the rotation of virtual soundstage 204 is described with respect to a single reference point, denoted as point A in the figures.
Virtual soundstage 204 (and consequently, each virtual speaker 206, 208) angularly rotates about the Z-axis. Typically, the Z-axis corresponds to the axis about which the user's head rotates; otherwise, the virtual soundstage 204 will not be perceived as remaining a fixed distance from the user throughout the course of a rotation. In practice, this axis can be approximated by any axis for which the changes in distance over the course of the rotation are not noticeable to a user.
Tracking of the user's head movement by virtual soundstage 204 is accomplished by rotating virtual soundstage 204 to reduce an angular offset between virtual soundstage 204 and the orientation of the user's head. This is shown in the figures.
Once at least one of the predetermined conditions is met, virtual soundstage 204 begins moving toward the current orientation of the user's head 202 at an angular velocity of μdps degrees per second (the selection of a value for μdps can be a dynamic process and will be described in more detail below). The angle of rotation αc can be given by integrating the value of μdps per sample from time t0 to t1.
Similarly, the turn angle αturn of user's head 202 can be found by integrating the angular velocity of the user's head, Ωz (in degrees per second), measured each sample by controller 110 according to the input from sensor 112, from time t0 to t1.
Accordingly, the angular offset αoff at time t1 can be found by integrating the difference between the angular velocity Ωz of user's head 202 and the angular velocity μdps of soundstage 204 from time t0 to t1 and summing the result with the offset that existed at time t0.
Equation (3) can be rewritten as the sum of the offset at time t0 with the difference between the turn angle αturn of user's head 202 and the angle of rotation αc of virtual soundstage 204 from time t0 to t1.
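The equations referenced in the preceding paragraphs are not reproduced above; reconstructed from the surrounding definitions (written in LaTeX, with the original equation numbering omitted), they take approximately the following form:

    \alpha_c = \int_{t_0}^{t_1} \mu_{dps}\, dt

    \alpha_{turn} = \int_{t_0}^{t_1} \Omega_z\, dt

    \alpha_{off}(t_1) = \alpha_{off}(t_0) + \int_{t_0}^{t_1} \left( \Omega_z - \mu_{dps} \right) dt

    \alpha_{off}(t_1) = \alpha_{off}(t_0) + \left( \alpha_{turn} - \alpha_c \right)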
As described above, virtual soundstage 204 remains fixed in space while certain predetermined conditions are not met. Such predetermined conditions can include, for example, whether the orientation of the user's head is outside an angular bound (such as a wedge or a cone) disposed about the user's head, indicating that the user's head has turned beyond a predetermined maximum, and whether the angular jerk of the user's head exceeds a threshold, indicating that the user's head is quickly turning, an early indication of a turn that will exceed the angular bound. Other predetermined conditions are conceivable and within the scope of this disclosure.
Turning to
While
However, employing the same reference axis for the offset of virtual soundstage 204 and for detecting when the orientation of the user's head exceeds the angular bound allows the angular offset αoff to be used as a proxy for the position of the reference axis of the user's head 202 within the bound. In other words, if the same reference axis is used for determining the alignment of the virtual soundstage 204 with the user's head 202 and for determining whether the user's head 202 exceeds angular bound αmax, then while the user's head 202 remains within the angular bound αmax, the angular offset αoff is equal to the distance from the center of the angular bound αmax, permitting a certain economy of calculations. (This assumes that the angular bound is disposed symmetrically around the reference axis, although this would typically be the case.)
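A minimal Python sketch of this economy of calculation, assuming the bound is symmetric about the reference axis and treating αmax as a half-angle (both of which, like the names, are assumptions):

    def head_outside_bound(alpha_off_deg, alpha_max_deg):
        """While the soundstage is fixed, the angular offset between the audio
        frame and the head's reference axis equals the head's excursion from the
        center of the bound, so the same value can test the bound condition."""
        return abs(alpha_off_deg) > alpha_max_deg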
It should further be understood that other suitable shapes of angular bounds can be used. For example, a three-dimensional cone, which can be used to determine whether the orientation of user's head 202 exceeds a bound in the vertical dimension (i.e., the pitch of the user's head), can be used in place of the two-dimensional cone.
While the user's head remains within the maximum angular bound αmax, controller 110 operates in the static phase of the head-fixed mode, meaning that the user perceives virtual soundstage 204 as being fixed in space. This is equivalent to operating in the room-fixed mode described above. In general, during this period, the angular velocity μdps of virtual soundstage 204 is either 0 deg/s or is held at a very low value to compensate for drift in the sensor 112 (so the user will not perceive any motion in the virtual soundstage 204).
Upon determining that the user's head is outside of angular bound αmax, the angular velocity μdps of virtual soundstage 204 is increased in a direction that reduces the angular offset αoff, that is, in the direction of the angular velocity Ωz of the user's head 202. This is shown in more detail in the figures.
In
The controller will continue in the dynamic phase until the second predetermined condition is met. In one example of such a second predetermined condition, virtual soundstage 204 tracks the movement of user's head 202 for a predetermined length of time after the user's head returns to angular bound αmax. For example, as shown in
In an alternative example, the second predetermined condition can be a separate angular bound, narrower than angular bound αmax, established to determine when virtual soundstage 204 is aligned with longitudinal axis Z-P. Stated differently, in this example, angular bound αmax is used to determine when virtual soundstage 204 begins tracking user's head 202 (i.e., enters the dynamic phase), but a narrower angular bound is used to determine when virtual soundstage 204 stops tracking user's head 202 and again becomes fixed in space (i.e., enters the static phase). An example of this is shown in the figures.
In
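The following Python sketch combines the wide entry bound with the two exit strategies described above (a hold timer or a narrower exit bound); the bound widths, hold time, and names are illustrative assumptions only:

    STATIC, DYNAMIC = "static", "dynamic"

    def update_phase(phase, alpha_head_deg, time_inside_s,
                     alpha_max_deg=30.0, alpha_exit_deg=3.0, hold_s=1.0):
        """alpha_head_deg: head orientation relative to the center of the bound.
        time_inside_s: time spent back inside the wide bound while tracking.
        Enter the dynamic phase when the head leaves the wide bound; return to
        the static phase after a hold time back inside the wide bound, or once
        the head is inside the narrower exit bound (the two alternatives above)."""
        if phase == STATIC:
            return DYNAMIC if abs(alpha_head_deg) > alpha_max_deg else STATIC
        if abs(alpha_head_deg) <= alpha_exit_deg:
            return STATIC  # narrow-bound exit
        if abs(alpha_head_deg) <= alpha_max_deg and time_inside_s >= hold_s:
            return STATIC  # timer exit
        return DYNAMIC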
Turning now to
with the length of the curved arrow representing the value of the angular jerk with respect to the threshold value, represented by the dashed line labeled T. If the angular jerk exceeds a threshold value, the user's head is turning quickly in a direction, suggesting that the user's head will imminently exceed the angular bound. The angular jerk thus represents an early indication of a head turn that requires adjusting the location of the virtual soundstage 204. The angular jerk can be directly received from sensor 112, but more typically can be calculated by comparing changes in orientation from one sample to the next, although any suitable method for calculating angular jerk can be used.
but has not yet exceeded the threshold; accordingly, controller 110 remains in the static phase and virtual soundstage is perceived as fixed in place.
In general, monitoring an angular jerk of user's head 202 is useful for early detection of a head turn, but it will not (by design) detect slower movements, even movements that result in the user's head rotating significantly to the left or right. Accordingly, the angular jerk of user's head 202 is conceived of as being used in tandem with angular bound αmax, with either the angular jerk exceeding the threshold or the orientation of the user's head exceeding the angular bound being sufficient to enter the dynamic phase and adjust the location of virtual soundstage 204; however, it is conceivable that either the angular bound condition or the angular jerk threshold condition could be used as the only predetermined condition for initiating tracking of the user's head. It should further be understood that, instead of or in addition to the two methods described above, any suitable predetermined condition for detecting or predicting rotation of the user's head that exceeds a predetermined extent can be used.
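One possible way to estimate the angular jerk from successive angular-velocity samples and to combine it with the angular bound is sketched below in Python; the thresholds are illustrative, and the disclosure permits any suitable method:

    def angular_jerk(omega_prev2_dps, omega_prev_dps, omega_dps, dt_s):
        """Estimate angular jerk (deg/s^3) by twice differencing the three most
        recent angular-velocity samples; any suitable estimate may be used."""
        accel_prev = (omega_prev_dps - omega_prev2_dps) / dt_s
        accel_curr = (omega_dps - omega_prev_dps) / dt_s
        return (accel_curr - accel_prev) / dt_s

    def should_enter_dynamic_phase(alpha_head_deg, jerk_dps3,
                                   alpha_max_deg=30.0, jerk_threshold_dps3=500.0):
        """Either condition alone is sufficient to begin tracking the head."""
        return abs(alpha_head_deg) > alpha_max_deg or abs(jerk_dps3) > jerk_threshold_dps3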
The angular velocity μdps of virtual soundstage 204 can be based on the angular velocity of the user's head Ωz. In general, when soundstage tracking is triggered, the goal is to rapidly cancel large head rotations (so that, typically, the user perceives the soundstage predominantly in front of user's head 202), and to eliminate the most offensive artifacts of recentering the virtual soundstage, while also permitting some lag to be present so that the illusion of the virtual soundstage is preserved. Applicant has also appreciated that it is typically an unpleasant experience for the soundstage to lag the user's head once the user's head has stopped moving. In other words, audio frame A of virtual soundstage 204, ideally, should align with longitudinal axis Z-P (or other reference axis) as the user's head comes to a stop. Accordingly, to track the motion of the user's head, controller 110 can dynamically adjust the angular velocity μdps of virtual soundstage 204, in a manner that is based on the motion of user's head 202 but times the alignment of the virtual soundstage 204 with longitudinal axis Z-P to coincide with the end of the head turn.
To accomplish these goals, controller 110 can dynamically adjust the angular velocity μdps of virtual soundstage 204 according to two separate stages: (1) when the user's head is accelerating, and (2) when the user's head is decelerating.
In this stage, the value of angular velocity μdps of the virtual soundstage 204—denoted μdps,1 for the angular velocity in the acceleration stage—is based on the angular velocity of the user's head Ωz, such that virtual soundstage 204 tracks the user's head 202 while permitting some amount of lag that allows the user to experience some spatial cues of a head turn relative to virtual soundstage 204 (and preventing the perceived “collapse” of virtual soundstage 204). In an example, this can be accomplished according to the following equation:
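The equation itself is not reproduced above; reconstructed from the surrounding description (with the original numbering omitted), it plausibly takes the form

    \mu_{dps,1}(t) = f_{turn,acceleration} \cdot \Omega_z(t)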
where fturn,acceleration represents a scale factor applied to the angular velocity Ωz of the user's head and is a design choice.
In
pointing to the right. (It should be understood that a deceleration is an acceleration in a different direction. The deceleration, as described herein, is with respect to the direction of the initial acceleration.) Once the user's head begins decelerating, indicating that the end of the user's head turn is imminent, the final orientation of the user's head at the end of the turn is predicted, represented in dashed lines 202p, and used to select a value for μdps,2—the angular velocity in the deceleration stage—such that virtual soundstage 204 arrives in front of user's head 202 as the head turn is completed (i.e., virtual soundstage 204 arriving in front of user's head 202 and the user completing the head turn occur at approximately the same time).
This can be accomplished by predicting the time that the user's head turn will complete and setting μdps,2 so that virtual soundstage 204 traverses the remaining angular offset αoff between its current location and the predicted orientation of user's head 202 at the end of the turn. For example, if h0 represents the time from the current sample to the end of the head turn, then virtual soundstage 204 must compensate for (i.e., traverse) the existing angular offset αoff and the additional angular offset accrued from the current time t to the time that the head turn ends, t+h0.
The angular velocity of the user's head at a future time h can be approximated from the linear approximation:
(This linear approximation has been truncated to a second term. It should, however, be understood that this Taylor series, and any others described in this disclosure, can be expanded to any number of terms.) Assuming that, at the end of the user's head turn, the angular velocity is zero (given that the user's head has come to a stop), the linear approximation at time t+h0 can be written as follows:
Accordingly, the linear approximation at future time h can be rewritten:
The additional angle accrued from the current time t to the time that the head turn ends, t+h0, can thus be approximated as follows:
The angular velocity μdps,2(t) of the virtual soundstage 204 at future time h can be assumed to have a linear profile, similar to Ωz(t+h), and thus can be written as a linear approximation:
And thus, the angle compensated (traversed) by virtual soundstage 204 from the current time to the time that the head turn ends, t+h0, can be approximated as follows:
The angular velocity μdps,2(t) can be set so that Equation 9 cancels Equation 11 and the angular offset αoff that existed at the current time t. In other words, μdps,2(t) is selected so that:
Solving for μdps,2(t) yields the angular velocity that results in the virtual soundstage 204 arriving in front of user's head 202 as the head turn is completed:
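The equations referenced in the preceding paragraphs are not reproduced above; the following is one consistent reconstruction of the derivation from the surrounding description, written in LaTeX, with the original equation numbering omitted and the exact forms to be understood as assumptions (here \dot{\Omega}_z denotes the angular acceleration of the user's head). The linear approximation of the head's angular velocity at a future time h is

    \Omega_z(t+h) \approx \Omega_z(t) + h\,\dot{\Omega}_z(t).

Setting \Omega_z(t+h_0) = 0 at the end of the turn gives the predicted remaining turn time

    h_0 \approx -\,\Omega_z(t) / \dot{\Omega}_z(t),

so the approximation can be rewritten as

    \Omega_z(t+h) \approx \Omega_z(t)\left(1 - h/h_0\right).

The additional head angle accrued from time t to time t+h_0 is then

    \int_0^{h_0} \Omega_z(t+h)\, dh \approx \tfrac{1}{2}\,\Omega_z(t)\, h_0.

Assuming the soundstage angular velocity has a similar linear profile,

    \mu_{dps,2}(t+h) \approx \mu_{dps,2}(t)\left(1 - h/h_0\right),

the angle traversed by the soundstage over the same interval is

    \int_0^{h_0} \mu_{dps,2}(t+h)\, dh \approx \tfrac{1}{2}\,\mu_{dps,2}(t)\, h_0.

Requiring the soundstage to traverse both the accrued head angle and the existing offset,

    \tfrac{1}{2}\,\mu_{dps,2}(t)\, h_0 = \tfrac{1}{2}\,\Omega_z(t)\, h_0 + \alpha_{off}(t),

and solving for the soundstage angular velocity yields

    \mu_{dps,2}(t) = \Omega_z(t) + \frac{2\,\alpha_{off}(t)}{h_0}.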
This angular velocity can be recalculated for each incoming sample to adjust for changes in the angular velocity of the user's head Ωz.
Regardless of the stage of the user's head turn, the dynamic angular velocity can include a baseline angular velocity μbaseline that is summed with the calculated values of μdps,1 and μdps,2. Baseline angular velocity μbaseline can be added to account for very slow head turns or the user's head coming back into the angular bound αmax with a residual offset. The baseline angular velocity μbaseline ensures that the virtual soundstage 204 does not stray far from longitudinal axis Z-P (or other reference axis) and erases residual angular offsets. In an example, baseline angular velocity μbaseline can be 20 deg/sec, although other suitable values are contemplated herein.
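A per-sample Python sketch combining the acceleration-stage scaling, the deceleration-stage prediction, and the baseline term is given below; the scale factor, baseline value, and sign conventions are illustrative assumptions rather than values taken from this disclosure:

    import math

    def soundstage_velocity(omega_dps, accel_dps2, alpha_off_deg,
                            f_turn_acceleration=0.7, mu_baseline_dps=20.0):
        """Select the soundstage angular velocity (deg/s) for one sample.
        omega_dps, accel_dps2: head angular velocity and acceleration about Z.
        alpha_off_deg: current offset between the head's reference axis and the
        audio frame (taken as positive in the direction of the head turn)."""
        if accel_dps2 * omega_dps >= 0.0:
            # Acceleration stage: track a scaled copy of the head's velocity,
            # leaving some lag so spatial cues of the turn are preserved.
            mu = f_turn_acceleration * omega_dps
        else:
            # Deceleration stage: predict the remaining turn time h0 and choose a
            # velocity that lands the audio frame in front of the head at that time.
            h0 = -omega_dps / accel_dps2
            mu = omega_dps + 2.0 * alpha_off_deg / h0
        # Baseline term nudges the frame toward the head to erase slow drifts
        # and residual offsets.
        bias = math.copysign(mu_baseline_dps, alpha_off_deg) if alpha_off_deg else 0.0
        return mu + bias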
Turning to
At point 1b, the second dynamic phase begins, this time as a result of the angular jerk of the user's head 202 exceeding the threshold T, both of which are represented in the middle plot.
Turning now to
At step 1002, a sensor input signal or a selection of a mode of operation is received. In an example, the headphones (and, particularly, the controller) can operate in more than one mode of operation, which include different spatialized audio modes. These modes include, for example, a room-fixed mode, in which the virtual soundstage is perceived as fixed to a particular location in space that does not move in response to the movement of the user's head, except, in certain examples, if the user's head has turned away from the virtual soundstage for at least a predetermined period of time. These modes also include the head-fixed mode, which is described above and in more detail below.
The sensor input signal can, for example, be an input from a sensor, such as an inertial measurement unit, that can provide an input indicative of the activity of a user, such as walking or running, for which the head-tracking mode is better suited (other suitable types of sensors, such as accelerometers, gyroscopes, etc., are contemplated). The sensor input can be received from the same sensor detecting the orientation of the user's head or from a different sensor. In yet another example, the sensor signal can be mediated by a secondary device, such as a mobile phone or a wearable, such as a smart watch, that includes the sensor. Alternatively, an input can be received from a user—e.g., using a dedicated application or through a web interface, or through a button or other input on the headphones—to directly select between modes.
Step 1004 is a decision block that represents whether the selection of the mode of operation or the sensor signal satisfies the requirements for the head-fixed mode. A selection of a mode of operation received from the user is typically sufficient to satisfy the requirement, without the need for any further action or decisions. The sensor signal, however, requires certain analysis to determine whether it is indicative of an activity that merits switching to the head-fixed mode. Such analysis can, for example, determine whether the user has taken a predetermined number of steps in a predetermined period of time or whether the user has completed a predetermined number of head turns (as, for example, determined by measuring a reference axis of the user's head with respect to an angular bound) within a predetermined period of time. Other suitable measures for determining that the user is engaged in an activity that would be aided (i.e., made more comfortable) through the implementation of the head-fixed mode are contemplated; indeed, many tests already exist for identifying when a user is engaged in an activity or particular types of activities, and any such suitable test can be used. In alternative examples, step 1004 can be conducted by the secondary device (e.g., mobile phone or wearable), which can direct the controller to enter the head-fixed mode or otherwise notify it that a certain activity is occurring, following analysis of the sensor signal by the secondary device.
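One hedged Python sketch of such an analysis, counting recent step events or head-turn events within a sliding window (the window length and counts are illustrative thresholds, not values from this disclosure):

    def head_fixed_mode_warranted(step_times_s, turn_times_s, now_s,
                                  window_s=60.0, min_steps=30, min_turns=4):
        """Decide whether recent activity justifies the head-fixed mode, based on
        timestamped step events or head-turn events (seconds) within a window."""
        recent_steps = sum(1 for t in step_times_s if now_s - t <= window_s)
        recent_turns = sum(1 for t in turn_times_s if now_s - t <= window_s)
        return recent_steps >= min_steps or recent_turns >= min_turns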
If the requirements for the head-fixed mode are not satisfied, then at step 1006, the controller operates in the room-fixed mode, in which, as described above, the virtual soundstage remains fixed except in narrow circumstances in which the user has faced a different direction for an extended period of time. As mentioned above, additional details regarding the room-fixed mode are described in U.S. patent application Ser. Nos. 16/592,454 and 63/415,783, the disclosures of which have been incorporated herein by reference. Further, although the room-fixed mode is listed as the only alternative to the head-fixed mode, it should be understood that the head-fixed mode could be one of any number of potential modes, which may or may not be spatialized audio modes. If the requirements for the head-fixed mode are satisfied, then the method progresses to step 1008, described below.
At step 1008, the spatialized audio signal is output by the controller, based on the sensor signal representative of an orientation of the user's head, to a pair of electroacoustic transducers for transduction into a spatialized acoustic signal. The spatialized acoustic signal is perceived as originating from a virtual soundstage that comprises at least one virtual source, each of which the user perceives as being located in a position distinct from the location of the electroacoustic transducers. The virtual sources are also referenced to an audio frame of the virtual soundstage, which is disposed at a first location and aligned with a reference axis of the user's head. In other words, the audio frame is used as a singular point to describe the location and rotation of the virtual soundstage. In one example, the longitudinal axis of the user's head can be used as the reference axis; however, the reference axis can be any suitable axis for determining an angular offset between the user's head and the virtual soundstage, as the user's head and the virtual soundstage rotate in the manner described below.
The position of the first location depends on the manner and time at which the head-fixed mode was selected in step 1004, as the spatialized audio signal can be initialized at step 1008 or it can be initialized earlier, such as in connection with the room-fixed mode. In the former instance, the spatialized audio signal is initialized at step 1008, and the first location is thus determined according to user input or, automatically, according to the direction the user is facing at step 1008. In the latter instance, the first location can depend upon the location of the audio frame, determined in connection with the room-fixed mode, which can be the location at which the room-fixed mode was initialized or the location to which it was subsequently adjusted.
Step 1010 is a decision block that determines whether a characteristic of the user's head satisfies at least one predetermined condition. Such predetermined conditions can be, for example, whether the orientation of the user's head is outside a predetermined angular bound or whether the angular jerk of the user's head exceeds a predetermined threshold.
Turning briefly to
Step 1020 is a decision block that represents determining whether the angular jerk of the user's head exceeds a predetermined threshold. In this example, the angular jerk of the user's head is compared to a threshold as early evidence of a turn that will likely exceed the angular bound. An example of comparing the angular jerk against a threshold is depicted and described above.
The angular jerk thus serves as another predetermined condition that can trigger the dynamic phase. Although an angular bound and an angular jerk threshold are described, it should be understood that other examples of suitable predetermined conditions, i.e., that are indicative or predictive of a head turn of at least a predetermined extent, are contemplated. It is also contemplated that only one such predetermined condition—e.g., only the angular bound or only the angular jerk threshold—can be used.
Upon determining the at least one predetermined condition is not satisfied (in other words, in the above examples, upon determining the user's head is within the angular bound and the angular jerk is below the threshold), then, at step 1012, the audio frame is maintained at its current location. Maintaining the audio frame at the current location is implemented by rendering the at least one virtual source such that it is perceived as fixed in space, regardless of the movement of the user's head (in actuality, this requires adjusting the spatialized acoustic signal, based on detected changes to the orientation of the user's head, in a manner such that the virtual sources are perceived as fixed in space). Thus, upon determining the user's head is relatively stationary (e.g., the user is seated at a desk), the user will perceive the virtual soundstage as fixed in space.
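As a simplified illustration of what rendering the sources as fixed in space implies, the following Python sketch re-expresses each virtual source's world-frame azimuth relative to the current head yaw before binaural rendering; the rendering step itself is not shown, and the names are hypothetical:

    def head_relative_azimuths(source_azimuths_deg, head_yaw_deg):
        """Re-express each source's world-frame azimuth relative to the current
        head yaw, wrapped to [-180, 180), so the sources are perceived as fixed
        in space as the head moves."""
        return [((az - head_yaw_deg + 180.0) % 360.0) - 180.0
                for az in source_azimuths_deg]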
Returning to
Turning momentarily to
Upon determining, however, at step 1022, the user's head does not have an angular acceleration, then at step 1026 the angular velocity of the rotation of the audio frame is selected such that the audio frame will align with a predicted location of the reference axis as the turn of the user's head comes to an end. This can be accomplished by first predicting the time at which the user's head will stop turning (by assuming that the current deceleration continues) and the angular rotation traveled by the user's head from its current location to that time. The angular velocity of the virtual soundstage (and, thus, of each virtual source) can then be selected so that, at the predicted time, the virtual soundstage will align with the user's head, meaning that it will compensate for both the existing offset between the virtual soundstage and the user's head, and the additional angle that the user's head will travel between the current time and the predicted time (as described in connection with Eq. (5) above).
Returning to
Looking at
Not shown in method 1000 is an additional condition to exit the head-fixed mode entirely. This can be accomplished, for example, by returning, each sample or periodically, to step 1004 to determine whether the requirements for the head-fixed mode continue to be satisfied. Upon determining the requirements are no longer satisfied, the method can enter the room-fixed mode or some other mode.
The functionality described herein, or portions thereof, and its various modifications (hereinafter “the functions”) can be implemented, at least in part, via a computer program product, e.g., a computer program tangibly embodied in an information carrier, such as one or more non-transitory machine-readable media or storage devices, for execution by, or to control the operation of, one or more data processing apparatus, e.g., a programmable processor, a computer, multiple computers, and/or programmable logic components.
A computer program can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program can be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a network.
Actions associated with implementing all or part of the functions can be performed by one or more programmable processors executing one or more computer programs to perform the functions of the calibration process. All or part of the functions can be implemented as special purpose logic circuitry, e.g., an FPGA (field-programmable gate array) and/or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. Components of a computer include a processor for executing instructions and one or more memory devices for storing instructions and data.
While several inventive embodiments have been described and illustrated herein, those of ordinary skill in the art will readily envision a variety of other means and/or structures for performing the function and/or obtaining the results and/or one or more of the advantages described herein, and each of such variations and/or modifications is deemed to be within the scope of the inventive embodiments described herein. More generally, those skilled in the art will readily appreciate that all parameters, dimensions, materials, and configurations described herein are meant to be exemplary and that the actual parameters, dimensions, materials, and/or configurations will depend upon the specific application or applications for which the inventive teachings is/are used. Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific inventive embodiments described herein. It is, therefore, to be understood that the foregoing embodiments are presented by way of example only and that, within the scope of the appended claims and equivalents thereto, inventive embodiments may be practiced otherwise than as specifically described and claimed. Inventive embodiments of the present disclosure are directed to each individual feature, system, article, material, and/or method described herein. In addition, any combination of two or more such features, systems, articles, materials, and/or methods, if such features, systems, articles, materials, and/or methods are not mutually inconsistent, is included within the inventive scope of the present disclosure.