This disclosure relates generally to head pose tracking for spatial audio applications.
Spatial audio creates a three-dimensional (3D) virtual auditory space that allows a user wearing a headset to pinpoint where a sound source is located in the 3D virtual auditory space, while watching a movie, playing a video game or interacting with augmented reality (AR) content displayed on a source device (e.g., a computer screen). Some existing spatial audio platforms include a head pose tracker that uses a video camera to track the head pose of the user. Other existing spatial audio platforms use a single inertial measurement unit (IMU) in the headset for head pose tracking. If the source device is a mobile device (e.g., smartphone, tablet computer), then the source device and the headset are free to move relative to each other, which may adversely impact the user's perception of the 3D spatial audio. For example, in platforms that rely on a single IMU, the audio would swivel off-center in cases such as watching a movie on a bus or plane that is turning, because the single headset IMU tracking solution interprets the vehicle's turn as the user turning their head.
Embodiments are disclosed for correlated motion detection for spatial audio applications.
In an embodiment, a method comprises: obtaining, using one or more processors of a source device, source device motion data from the source device and headset motion data from a headset; determining, using the one or more processors, correlation measures using the source device motion data and the headset motion data; updating, using the one or more processors, a motion tracking state based on the determined correlation measures; and initiating motion tracking in accordance with the updated motion tracking state. The motion tracking state determines whether tracking is performed relative to the source device's rotation or whether the source device's rotation is ignored.
In an embodiment, updating the motion tracking state based on the determined correlation measures further comprises: transitioning from a single inertial measurement unit (IMU) tracking state to a two IMU tracking state, wherein the motion tracking is performed using relative motion data computed from the headset motion data and source device motion data.
In an embodiment, different size windows of motion data are used to compute short term and long term correlation measures.
In an embodiment, the short term correlation measures are computed based on a short term window of rotation rate data obtained from the source device, a short term window of rotation rate data obtained from the headset, a short term window of relative rotation rate data about a gravity vector, and a variance of the relative rotation rate data.
In an embodiment, the long term correlation measures are computed based on a long term window of rotation rate data obtained from the source device, a long term window of rotation rate data obtained from the headset, a long term window of relative rotation rate data about a gravity vector, and a variance of the relative rotation rate data.
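A minimal sketch of how such windowed statistics might be computed is shown below. It assumes rotation-rate samples and a unit gravity vector supplied by the caller; the type and property names (RotationRateSample, WindowStats, meanAboutGravity, and so on) are illustrative and do not appear in this disclosure. The same reduction serves both window sizes: a short slice of the buffered samples yields the short term measures and a longer slice yields the long term measures.

```swift
// Illustrative 3-axis rotation-rate sample (rad/s); names are hypothetical.
struct RotationRateSample {
    var x: Double, y: Double, z: Double
    var magnitude: Double { (x * x + y * y + z * z).squareRoot() }
}

// Statistics over one window of samples (short term or long term).
struct WindowStats {
    let meanRate: Double               // mean rotation-rate magnitude over the window
    let meanAboutGravity: Double       // mean rotation rate about the gravity vector
    let varianceAboutGravity: Double   // variance of the rotation rate about gravity

    // `gravity` is a unit vector; projecting each sample onto it gives the
    // component of rotation rate about the gravity direction.
    init(samples: [RotationRateSample], gravity: (x: Double, y: Double, z: Double)) {
        func mean(_ v: [Double]) -> Double { v.isEmpty ? 0 : v.reduce(0, +) / Double(v.count) }
        let aboutGravity = samples.map { $0.x * gravity.x + $0.y * gravity.y + $0.z * gravity.z }
        let mu = mean(aboutGravity)
        meanRate = mean(samples.map { $0.magnitude })
        meanAboutGravity = mu
        varianceAboutGravity = mean(aboutGravity.map { ($0 - mu) * ($0 - mu) })
    }
}
```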
In an embodiment, the correlation measures are logically combined into a single correlation measure indicating whether the source device motion and headset motion are correlated, and the single correlation measure triggers the updating of the motion tracking state from a single inertial measurement unit (IMU) tracking state to a two IMU tracking state.
In an embodiment, the single correlation measure includes a confidence measure that indicates a confidence that the user is engaged in a particular activity that results in correlated motion.
In an embodiment, the particular activity includes at least one of walking or driving in a vehicle.
In an embodiment, the single correlation measure logically combines a mean relative rotation rate about a gravity vector, a determination that a mean short term rotation rate of the source device is less than a mean short term rotation rate of the headset, and the confidence measure.
In an embodiment, the motion tracking state is updated from a two inertial measurement unit (IMU) tracking state to a single IMU tracking state based on a determination that the source device is rotating faster than the headset and that the source device rotation is inconsistent.
In an embodiment, a system comprises: one or more processors; memory storing instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising: obtaining, using one or more processors of a source device, source device motion data from the source device and headset motion data from a headset worn on a head of a user; determining, using the one or more processors, correlation measures using the source device motion data and the headset motion data; updating, using the one or more processors, a motion tracking state based on the determined correlation measures; and initiating head pose tracking in accordance with the updated motion tracking state.
Other embodiments can include an apparatus, computing device and non-transitory, computer-readable storage medium.
Particular embodiments disclosed herein provide one or more of the following advantages. The disclosed embodiments allow a head pose tracker to transition to a relative motion head tracking state when motion data from a source device and headset are determined to be correlated. The relative motion head tracking state tracks the user's head rotations relative to the source device. For example, if the user turns their head to the side, the center audio channel will sound as if it is coming from the side of the user's head, such that the audio appears to remain fixed at the location of the source device rather than rotating with the user's head.
The details of one or more implementations of the subject matter are set forth in the accompanying drawings and the description below. Other features, aspects and advantages of the subject matter will become apparent from the description, the drawings and the claims.
Source device 101 includes any device capable of displaying AV content and that can be wired or wirelessly coupled to headset 102, including but not limited to a smartphone, tablet computer, laptop computer, wearable computer, game console, television, etc. Source device 101 includes a display for presenting the visual portion of the AV content and IMU 707 that includes motion sensors (e.g., 3-axis MEMS gyro, 3-axis MEMS accelerometer) that output source device motion data (e.g., rotation rate, acceleration). Source device 101 further includes a spatial audio rendering engine (e.g., a binaural rendering engine) that simulates the main audio cues humans use to localize sounds, including interaural time differences, interaural level differences, and spectral filtering done by the outer ears. An example source device architecture 700 is described in reference to FIG. 7.
Headset 102 is any device that includes loudspeakers for projecting acoustic audio, including but not limited to: headsets, earbuds, earphones and loudspeakers (e.g., smart speakers). In an embodiment, headset 102 includes stereo (Left/Right) loudspeakers that output rendered spatial audio content generated by source device 101. Headset 102 also includes inertial measurement unit (IMU) 811 that includes motion sensors (e.g., 3-axis MEMS gyro, 3-axis MEMS accelerometer) that output headset motion data (e.g., rotation rate, acceleration).
In an embodiment, the headset motion data is transmitted to source device 101 over a short-range wireless communication channel (e.g., a Bluetooth channel). At source device 101, correlated motion detector 103 determines similarities (e.g., similar rotational and/or acceleration features) between the headset motion data and the source device motion data. If the headset motion data and the source device motion data are determined to not be correlated, a head tracker is transitioned into a 1-IMU tracking state 104, where head tracking is performed using only the headset motion data. The 1-IMU tracking state 104 allows arbitrary rotation of the source device (e.g., picking up the source device or rotating it around in the user's hands) to be ignored, so that this uncorrelated source device rotation does not cause the audio to shift around. If the headset motion data and the source device motion data are determined to be correlated, the head tracker is transitioned into a 2-IMU fusion tracking state 105, where head tracking is performed using relative motion data computed from the headset motion data and the source device motion data. In both the 1-IMU and 2-IMU states, a boresight vector is tracked, which represents the location of the source device from the perspective of the user's head. A relative pose tracking model, described in Appendix A, is used in both tracking states. The difference is that in the 1-IMU state, the rotation of the source device is ignored and does not affect the tracked boresight vector location, whereas in the 2-IMU state, the boresight vector is updated to compensate for the rotation of the source device.
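The following sketch illustrates one way the two states could select the rotation rate that drives the boresight update, assuming the source device rate has already been rotated into the head frame using the estimated relative attitude; the enum and function names are hypothetical and not taken from this disclosure.

```swift
// Hypothetical sketch: which rotation rate drives the boresight update in each state.
enum TrackingState {
    case oneIMU   // headset-only tracking; source device rotation is ignored
    case twoIMU   // relative tracking; source device rotation is compensated
}

struct AngularRate { var x: Double, y: Double, z: Double }

// In the 1-IMU state the source device is treated as static, so only the headset
// rate propagates the boresight. In the 2-IMU state the boresight is driven by the
// headset rate minus the source device rate (expressed in the head frame via the
// estimated relative attitude).
func boresightRate(state: TrackingState,
                   headsetRate: AngularRate,
                   sourceRateInHeadFrame: AngularRate) -> AngularRate {
    switch state {
    case .oneIMU:
        return headsetRate
    case .twoIMU:
        return AngularRate(x: headsetRate.x - sourceRateInHeadFrame.x,
                           y: headsetRate.y - sourceRateInHeadFrame.y,
                           z: headsetRate.z - sourceRateInHeadFrame.z)
    }
}
```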
This alignment process causes the spatial audio to be “centered.” When the spatial audio is centered, the user perceives audio from the center channel (e.g., spoken dialogue) as coming directly from the display of source device 101. The centering is accomplished by tracking boresight vector 203 to the location of source device 101 from the head reference frame using an extended Kalman filter (EKF) tracking system, as described in Appendix A. Estimated boresight vector 203 only determines the location of the center channel. A second tracker takes as input the estimated boresight vector 203 and provides an output orientation of ambience bed 202, which determines the location of the L/L-S and R/R-S surround channels around the user in addition to the center channel. Aligning the center channel of ambience bed 202 with boresight vector 203 allows rendering the center channel at the estimated location of source device 101 for the user's perception.
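As a rough, yaw-only illustration of the alignment step, the sketch below derives an azimuth offset from an estimated boresight vector and applies it to a nominal set of ambience bed channel azimuths. The coordinate convention (+x straight ahead, +y to the listener's left), the five-channel layout, and the names are assumptions for illustration only, not the disclosed rendering method.

```swift
import Foundation

// Estimated boresight vector in the head frame (unit vector toward the source device).
struct Boresight { var x: Double, y: Double, z: Double }

// Yaw offset (radians) that points the center channel at the boresight, assuming
// +x is straight ahead and +y is to the listener's left (yaw-only simplification).
func centerChannelAzimuth(for boresight: Boresight) -> Double {
    atan2(boresight.y, boresight.x)
}

// Nominal ambience bed azimuths (radians) rotated so the center channel lands at the
// boresight direction; the channel layout here is illustrative only.
func alignedBedAzimuths(boresight: Boresight) -> [String: Double] {
    let offset = centerChannelAzimuth(for: boresight)
    let nominal: [String: Double] = [
        "C": 0.0,
        "L": Double.pi / 6.0, "R": -Double.pi / 6.0,
        "L-S": 2.0 * Double.pi / 3.0, "R-S": -2.0 * Double.pi / 3.0
    ]
    return nominal.mapValues { $0 + offset }
}
```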
If boresight vector 203 is not centered on source device 101 (e.g., due to tracking error), then aligning the center channel of ambience bed 202 will not “center” the audio, since the center channel will still be rendered at the erroneous estimate of the location of source device 101. Note that boresight vector 203 changes whenever the user's head rotates with respect to source device 101, such as when source device 101 is stationary in front of the user and the user's head is rotating. In this case, the motion of the user's head is accurately tracked as the head rotates, so that even when boresight vector 203 changes, the audio stays centered on the estimated location of source device 101 because the EKF is providing accurate tracking of how the true boresight vector 203 is changing. Also note that spatial audio becomes uncentered when the estimated boresight vector 203 is not the true location of source device 101 due to tracking error, which may come from drift over time, such as IMU propagation errors from gyro bias, etc., or other sources of error. In an embodiment, the tracking error is corrected using a bleed-to-zero (BTZ) process when the user is quiescent or a complex transition is detected, as described in Appendix A.
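Appendix A is not reproduced here, so the following is only a guess at the general shape of a bleed-to-zero step: when quiescence is detected, the boresight estimate is blended a small fraction per update toward the nominal straight-ahead direction so accumulated drift is removed gradually rather than with an audible jump. The blend rate, the target direction, and the function name are illustrative assumptions.

```swift
// Hypothetical bleed-to-zero (BTZ) step: when quiescence is detected, nudge the
// estimated boresight a small fraction per update toward the nominal straight-ahead
// direction so accumulated drift is washed out without an audible jump.
func bleedToZero(boresight: inout (x: Double, y: Double, z: Double),
                 isQuiescent: Bool,
                 rate: Double = 0.01) {              // per-update blend fraction (illustrative)
    guard isQuiescent else { return }
    let target = (x: 1.0, y: 0.0, z: 0.0)            // assumed straight-ahead direction in the head frame
    boresight.x += rate * (target.x - boresight.x)
    boresight.y += rate * (target.y - boresight.y)
    boresight.z += rate * (target.z - boresight.z)
    // Re-normalize so the boresight remains a unit vector.
    let n = (boresight.x * boresight.x + boresight.y * boresight.y + boresight.z * boresight.z).squareRoot()
    if n > 0 { boresight.x /= n; boresight.y /= n; boresight.z /= n }
}
```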
Note that ambience bed 202 shown in
Headset motion data received from headset 102 is stored in motion data buffer 301 and source device motion data is stored in motion data buffer 302. In an embodiment, there is also a motion buffer 308 for storing relative rotation rate samples. In an embodiment, several seconds of data is stored. Correlated motion detector 303 takes as input different size windows of the motion data from buffers 301, 302 for use in computing short term and long term correlation measures, as illustrated in
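One plausible shape for motion data buffers 301, 302 and 308 is a fixed-capacity buffer from which short-term and long-term windows are sliced, as sketched below. The capacity, sample rate, and window lengths in the example are illustrative assumptions, not values from the disclosure.

```swift
// Fixed-capacity buffer holding the most recent motion samples (e.g., a few seconds
// at the IMU rate), from which short-term and long-term windows are sliced.
struct MotionBuffer<Sample> {
    private var samples: [Sample] = []
    let capacity: Int

    init(capacity: Int) { self.capacity = capacity }

    mutating func append(_ sample: Sample) {
        samples.append(sample)
        if samples.count > capacity {
            samples.removeFirst(samples.count - capacity)
        }
    }

    // Most recent `count` samples, e.g. a short-term or a long-term window.
    func window(lastCount count: Int) -> ArraySlice<Sample> {
        samples.suffix(count)
    }
}

// Example (illustrative values): a 100 Hz rotation-rate stream with 5 s retained;
// the short window is the last 1 s and the long window is the full 5 s.
// var headsetRates = MotionBuffer<(x: Double, y: Double, z: Double)>(capacity: 500)
// let shortWindow = headsetRates.window(lastCount: 100)   // last 1 s
// let longWindow  = headsetRates.window(lastCount: 500)   // last 5 s
```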
The example correlation measures computed above are used by state machine 304 to transition from a 1-IMU state 501 to a 2-IMU state 502 and back again. For example, a transition will occur from 1-IMU state 501 to 2-IMU state 502 when rotation is correlated and thus satisfies Equation [6]:
(correlatedRotation == true || (isInCorrelatedActivity && rotationAroundGravityLongBufferMeanDiff(src, aux) < θ)) && srcRotationRateMeanShort < auxRotationRateMeanShort + δ  Equation [6]
where correlatedRotation, computed according to Equation [3], is TRUE; srcMotionActivity is a state variable in a motion activity state machine implemented in the source device that indicates (based on analysis of inertial sensor and digital pedometer data) an estimated motion activity state; and VehicularOrWalkingHighConf is a particular motion activity state in the motion activity state machine that indicates with high confidence that the source device is in a vehicle or attached to a user who is walking. Note that isInCorrelatedActivity indicates that the user is walking, in a vehicle, in a plane, etc., and can be provided by an activity classifier, as previously described. Also note that correlatedRotation is about the inertial gravity vector, e.g., it is TRUE if both devices are rotating or maintaining their yaw rate similarly.
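Equation [6] transcribes fairly directly into code. In the sketch below, the inputs and thresholds (theta for θ, delta for δ) are passed in as parameters; the function name is a hypothetical stand-in for whatever the state machine actually calls.

```swift
// Sketch of the Equation [6] check for promoting 1-IMU tracking to 2-IMU tracking.
// All parameter names and thresholds are illustrative stand-ins.
func shouldPromoteToTwoIMU(correlatedRotation: Bool,
                           isInCorrelatedActivity: Bool,
                           rotationAroundGravityLongBufferMeanDiff: Double, // long-window mean difference about gravity
                           srcRotationRateMeanShort: Double,
                           auxRotationRateMeanShort: Double,
                           theta: Double,
                           delta: Double) -> Bool {
    // (correlatedRotation || (isInCorrelatedActivity && meanDiff < θ))
    //   && srcRotationRateMeanShort < auxRotationRateMeanShort + δ
    let rotationAgrees = correlatedRotation ||
        (isInCorrelatedActivity && rotationAroundGravityLongBufferMeanDiff < theta)
    let sourceNotFaster = srcRotationRateMeanShort < auxRotationRateMeanShort + delta
    return rotationAgrees && sourceNotFaster
}
```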
A transition from 2-IMU state 502 back to 1-IMU state 501 will occur when the source device is rotating faster than the headset or its rotation is inconsistent, thus satisfying Equation [7]:
srcRotatingFaster || (‖Var(ω_rel^short)‖ > τ1 && inconsistentSrcRotation),  Equation [7]
where srcRotatingFaster and inconsistentSrcRotation are computed using Equations [4] and [5], respectively.
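Similarly, Equation [7] and the two transitions can be sketched as a small state machine. The variance norm ‖Var(ω_rel^short)‖ and the threshold τ1 are assumed to be precomputed and passed in; the names remain illustrative.

```swift
// Sketch of the Equation [7] check for demoting 2-IMU tracking back to 1-IMU tracking,
// plus a minimal state machine applying both checks. Names/thresholds are illustrative.
enum IMUTrackingState { case oneIMU, twoIMU }

func shouldDemoteToOneIMU(srcRotatingFaster: Bool,
                          relativeRateVarianceNorm: Double,   // ‖Var(ω_rel^short)‖
                          inconsistentSrcRotation: Bool,
                          tau1: Double) -> Bool {
    srcRotatingFaster || (relativeRateVarianceNorm > tau1 && inconsistentSrcRotation)
}

func nextState(current: IMUTrackingState,
               promote: Bool,    // result of the Equation [6] check
               demote: Bool)     // result of the Equation [7] check
               -> IMUTrackingState {
    switch current {
    case .oneIMU: return promote ? .twoIMU : .oneIMU
    case .twoIMU: return demote ? .oneIMU : .twoIMU
    }
}
```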
The reason for having a 1-IMU state 501 and a 2-IMU state 502 is to prevent an undesirable listener experience in un-correlated motion scenarios, where head tracking relative to the source device position/attitude can result in potential ill effects (e.g., causing the user to be nauseated) due to the audio source moving around too much. 1-IMU state 501 allows tracking of the user's head rotations relative to an assumed static source device in such situations, hence limiting the potential ill effects. Conversely, during correlated motion scenarios, where the source device is moving/rotating with the headset (e.g., while the user is walking or watching content in a vehicle/plane), it is desirable that tracking is performed in 2-IMU state 502 (estimating the relative position/attitude between the two devices) to maintain the illusion of 3D sound originating from the source device, even when the user or the vehicle is turning.
The output of correlated motion detector 303 (correlatedRotation) is input into motion tracker 306 and relative motion tracker 307. Note that motion tracker 306 outputs relative position and relative attitude, assuming the source device remains stationary.
The process described above meets multiple design criteria: 1) to operate in the 1-IMU state 501, unless the devices are detected (with good confidence) to be in a moving frame; 2) to detect un-correlated/complex motion and transition to the 1-IMU state 501 with minimal delay (i.e., minimizing tracking error); and 3) to minimize unnecessary transitions between 1-IMU state 501 and 2-IMU state 502.
Process 600 begins by obtaining source device and headset motion data (601). For example, motion data output by IMUs in the source device and headset can be stored in buffers as shown in FIG. 3.
Process 600 continues by determining correlation measures using the source device motion data and the headset motion data (602). For example, the correlation measures shown in Equations [1]-[5] are computed using the respective rotation rates output by the source device and headset IMUs and relative rotation rates computed from the respective source device and headset rotation rates and the estimation of relative attitude.
Process 600 continues by updating a motion tracking state based on the determined correlation measures (603), and initiating head pose tracking in accordance with the updated motion tracking state (604), as described in reference to
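A high-level sketch of how steps 601-604 could be wired together follows; the helper checks are placeholders standing in for the Equation [1]-[7] computations sketched earlier, and the class and method names are assumptions rather than the disclosed implementation.

```swift
// High-level sketch of process 600: buffer motion data, derive correlation measures,
// update the tracking state, then run head pose tracking in that state.
final class CorrelatedMotionPipeline {
    enum State { case oneIMU, twoIMU }
    private(set) var state: State = .oneIMU

    func step(sourceRates: [Double], headsetRates: [Double]) {
        // (601)-(602): windows of buffered rotation rates reduced to correlation measures.
        let promote = computePromoteCheck(sourceRates, headsetRates)   // Equation [6]
        let demote = computeDemoteCheck(sourceRates, headsetRates)     // Equation [7]

        // (603): update the motion tracking state from the measures.
        switch state {
        case .oneIMU: if promote { state = .twoIMU }
        case .twoIMU: if demote { state = .oneIMU }
        }

        // (604): head pose tracking then proceeds with headset-only motion data (1-IMU)
        // or relative motion data (2-IMU).
    }

    // Placeholder checks standing in for the Equation [6]/[7] logic sketched earlier.
    private func computePromoteCheck(_ src: [Double], _ aux: [Double]) -> Bool { false }
    private func computeDemoteCheck(_ src: [Double], _ aux: [Double]) -> Bool { false }
}
```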
Sensors, devices and subsystems can be coupled to peripherals interface 720 to provide multiple functionalities. For example, IMU 707, light sensor 708 and proximity sensor 709 can be coupled to peripherals interface 720 to facilitate motion sensing (e.g., acceleration, rotation rates), lighting and proximity functions of the wearable computer. Location processor 710 can be connected to peripherals interface 720 to provide geo-positioning. In some implementations, location processor 710 can be a GNSS receiver, such as a Global Positioning System (GPS) receiver. Electronic magnetometer 711 (e.g., an integrated circuit chip) can also be connected to peripherals interface 720 to provide data that can be used to determine the direction of magnetic North. Electronic magnetometer 711 can provide data to an electronic compass application. IMU 707 includes one or more accelerometers and/or gyros (e.g., 3-axis MEMS accelerometer and 3-axis MEMS gyro) configured to determine acceleration and attitude (e.g., rotation rate) of the source device, as described in reference to
Camera/3D depth sensor 702 captures digital images and video and can include both forward-facing and rear-facing cameras. The 3D depth sensor can be any sensor capable of capturing 3D data or point clouds, such as a time of flight (TOF) sensor or LiDAR.
Communication functions can be facilitated through wireless communication subsystems 712, which can include radio frequency (RF) receivers and transmitters (or transceivers) and/or optical (e.g., infrared) receivers and transmitters. The specific design and implementation of the wireless communication subsystem 712 can depend on the communication network(s) over which a mobile device is intended to operate. For example, architecture 700 can include communication subsystems 712 designed to operate over a GSM network, a GPRS network, an EDGE network, a Wi-Fi™ network and a Bluetooth™ network. In particular, the wireless communication subsystems 712 can include hosting protocols, such that the mobile device can be configured as a base station for other wireless devices.
Audio subsystem 705 can be coupled to a speaker 703 and one or more microphones 704 to facilitate voice-enabled functions, such as voice recognition, voice replication, digital recording and telephony functions. Audio subsystem 705 can be configured to receive and interpret voice commands from the user using a speech detection and recognition engine.
I/O subsystem 713 can include touch surface controller 717 and/or other input controller(s) 715. Touch surface controller 717 can be coupled to a touch surface 718. Touch surface 718 and touch surface controller 717 can, for example, detect contact and movement or break thereof using any of a plurality of touch sensitivity technologies, including but not limited to capacitive, resistive, infrared and surface acoustic wave technologies, as well as other proximity sensor arrays or other elements for determining one or more points of contact with touch surface 718. Touch surface 718 can include, for example, a touch screen or the digital crown of a smart watch. I/O subsystem 713 can include a haptic engine or device for providing haptic feedback (e.g., vibration) in response to commands from a processor or a digital signal processor (DSP) 722. In an embodiment, touch surface 718 can be a pressure-sensitive surface.
Other input controller(s) 715 can be coupled to other input/control devices 716, such as one or more buttons, rocker switches, thumb-wheel, infrared port and USB port. The one or more buttons (not shown) can include an up/down button for volume control of speaker 703 and/or microphones 704. Touch surface 718 or other input control devices 716 (e.g., a button) can include, or be coupled to, fingerprint identification circuitry for use with a fingerprint authentication application to authenticate a user based on their fingerprint(s).
In one implementation, a pressing of the button for a first duration may disengage a lock of the touch surface 718; and a pressing of the button for a second duration that is longer than the first duration may turn power to the mobile device on or off. The user may be able to customize a functionality of one or more of the buttons. The touch surface 718 can, for example, also be used to implement virtual or soft buttons.
In some implementations, the mobile device can present recorded audio and/or video files, such as MP3, AAC and MPEG files. In some implementations, the mobile device can include the functionality of an MP3 player. Other input/output and control devices can also be used.
Memory interface 721 can be coupled to memory 723. Memory 723 can include high-speed random access memory and/or non-volatile memory, such as one or more magnetic disk storage devices, one or more optical storage devices and/or flash memory (e.g., NAND, NOR). Memory 723 can store operating system 724, such as the iOS operating system developed by Apple Inc. of Cupertino, Calif. Operating system 724 may include instructions for handling basic system services and for performing hardware dependent tasks. In some implementations, operating system 724 can include a kernel (e.g., UNIX kernel).
Memory 723 may also store communication instructions 725 to facilitate communicating with one or more additional devices, one or more computers and/or one or more servers, such as, for example, instructions for implementing a software stack for wired or wireless communications with other devices. Memory 723 may include graphical user interface instructions 726 to facilitate graphic user interface processing; sensor processing instructions 727 to facilitate sensor-related processing and functions; phone instructions 728 to facilitate phone-related processes and functions; electronic messaging instructions 729 to facilitate electronic-messaging related processes and functions; web browsing instructions 730 to facilitate web browsing-related processes and functions; media processing instructions 731 to facilitate media processing-related processes and functions; GNSS/Location instructions 732 to facilitate generic GNSS and location-related processes; and camera/3D depth sensor instructions 733 for capturing images (e.g., video, still images) and depth data (e.g., a point cloud). Memory 723 further includes spatial audio instructions 734 for use in spatial audio applications, including but not limited to AR and immersive video applications.
Each of the above identified instructions and applications can correspond to a set of instructions for performing one or more functions described above. These instructions need not be implemented as separate software programs, procedures, or modules. Memory 723 can include additional instructions or fewer instructions. Furthermore, various functions of the mobile device may be implemented in hardware and/or in software, including in one or more signal processing and/or application specific integrated circuits.
SoC 801 further includes various modules, such as a radio frequency (RF) radio (wireless transceiver) for wireless bi-directional communication with other devices, such as a source device 101, as described in reference to
The described features can be implemented advantageously in one or more computer programs that are executable on a programmable system including at least one programmable processor coupled to receive data and instructions from, and to transmit data and instructions to, a data storage system, at least one input device, and at least one output device. A computer program is a set of instructions that can be used, directly or indirectly, in a computer to perform a certain activity or bring about a certain result. A computer program can be written in any form of programming language (e.g., SWIFT, Objective-C, C#, Java), including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, a browser-based web application, or other unit suitable for use in a computing environment.
While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any inventions or of what may be claimed, but rather as descriptions of features specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
As described above, some aspects of the subject matter of this specification include gathering and use of data available from various sources to improve services a mobile device can provide to a user. The present disclosure contemplates that in some instances, this gathered data may identify a particular location or an address based on device usage. Such personal information data can include location-based data, addresses, subscriber account identifiers, or other identifying information.
The present disclosure further contemplates that the entities responsible for the collection, analysis, disclosure, transfer, storage, or other use of such personal information data will comply with well-established privacy policies and/or privacy practices. In particular, such entities should implement and consistently use privacy policies and practices that are generally recognized as meeting or exceeding industry or governmental requirements for maintaining personal information data private and secure. For example, personal information from users should be collected for legitimate and reasonable uses of the entity and not shared or sold outside of those legitimate uses. Further, such collection should occur only after receiving the informed consent of the users. Additionally, such entities would take any needed steps for safeguarding and securing access to such personal information data and ensuring that others with access to the personal information data adhere to their privacy policies and procedures. Further, such entities can subject themselves to evaluation by third parties to certify their adherence to widely accepted privacy policies and practices.
In the case of advertisement delivery services, the present disclosure also contemplates embodiments in which users selectively block the use of, or access to, personal information data. That is, the present disclosure contemplates that hardware and/or software elements can be provided to prevent or block access to such personal information data. For example, in the case of advertisement delivery services, the present technology can be configured to allow users to select to “opt in” or “opt out” of participation in the collection of personal information data during registration for services.
Therefore, although the present disclosure broadly covers use of personal information data to implement one or more various disclosed embodiments, the present disclosure also contemplates that the various embodiments can also be implemented without the need for accessing such personal information data. That is, the various embodiments of the present technology are not rendered inoperable due to the lack of all or a portion of such personal information data. For example, content can be selected and delivered to users by inferring preferences based on non-personal information data or a bare minimum amount of personal information, such as the content being requested by the device associated with a user, other non-personal information available to the content delivery services, or publicly available information.
This application claims priority to U.S. Provisional Patent Application No. 63/041,876, filed Jun. 20, 2020, the entire contents of which are incorporated herein by reference.