The present disclosure generally relates to adjusting inaccurate position and/or orientation information obtained via relative motion sensors. This can be useful in the field of Virtual Reality (VR), for example.
Virtual Reality (VR) drives innovation in a wide range of applications including theme parks, museums, architecture, training, and simulation. All of them benefit from multi-user interaction and from large-scale areas, for example areas larger than 25 m×25 m. Today's state-of-the-art VR systems mostly use camera-based motion tracking. However, tracking accuracy decreases with decreasing camera resolution and increasing environment size, and tracking more users requires more cameras to avoid occlusion. The cost of those systems grows exponentially with the number of users and the size of the tracking area. In contrast, room-scale tracking is available for a few hundred dollars.
Conceptually, so-called No-Pose (NP) tracking systems that only track single positions per user/object instead of the complete pose (position and orientation) can work with larger tracking areas and more users at significantly lower total cost. NP tracking systems can be based on Radio Frequency (RF) tracking systems, for example. But there are a number of technical obstacles that still limit their applicability. Most importantly, in contrast to camera-based motion capturing systems (that provide the full pose), NP tracking systems only provide single positions per object that cannot be combined to derive the pose as tracking accuracy is insufficient. Hence, an object's orientation (such as a head's orientation with respect to a user's body, for example) has to be estimated separately.
Current low-cost Head-Mounted Display (HMD) units are equipped with local Inertial Measurement Units (IMUs) such as accelerometers, gyroscopes, and magnetometers that can be used to estimate the object's orientation (e.g., head orientation). This on-client processing can also reduce latency, which is a serious problem of camera-based pose estimation systems. Reducing latency in VR systems can significantly improve immersion.
But in practice, IMU-based orientation estimation is far from accurate for a number of reasons. First, as magnetometers are unreliable in many indoor and magnetic environments, they often provide a wrong absolute orientation. Second, dead reckoning based on relative IMU data leads to drift and (after a while) to a wrong orientation estimate. In navigation, dead reckoning (or dead-reckoning) is the process of calculating one's current position by using a previously determined position and advancing that position based upon known or estimated speeds over elapsed time and course. Third, state-of-the-art orientation filters fail, as the low-cost sensors of the HMD provide unreliable motion direction estimates. Fourth, besides sensor noise, rotations (such as head rotations, for example) make it impossible to reliably estimate the linear and gravity components of the acceleration while moving and turning the object. However, linear acceleration components are necessary to estimate a movement direction, displacement and/or position.
A wrong orientation estimation can result in a significant mismatch of the real world and a VR display, as illustrated in the upper row of the accompanying figures.
Thus, it is desirable to better align the sensor-based orientation or view direction $\vec{v}$ with the real orientation or view direction $\vec{r}$.
An idea of the present disclosure is to combine positional tracking with relative IMU data to achieve a long-time stable object orientation while a user or object is naturally moving (e.g., walking and rotating his/her head).
According to one aspect of the present disclosure, there is provided a method for correcting orientation information which is based on inertial sensor data from one or more inertial sensors mounted to an object. The object can in principle be any kind of animate or inanimate movable or moving object having one or more IMUs mounted thereto. For example, the object can be a human head or an HMD in some examples. The sensor data can comprise multi-dimensional acceleration data and/or multi-dimensional rotational velocity data in some examples. The method includes receiving position data indicative of a current absolute position of the object. In some examples, the position data can be indicative of a single absolute position of the object stemming from an NP tracking system. The method also includes determining a direction of movement of the object based on the position data and correcting the object's orientation information based on the determined direction of movement.
In example applications related to VR, the object's orientation information, which is based on inertial sensor data, can also be considered as the object's virtual orientation, which might differ from its real orientation due to sensor inaccuracies.
In some examples, the object's orientation information may be indicative of rotational orientation around the object's (e.g., a user's head) yaw axis. Various objects are free to rotate in three dimensions: pitch (up or down about an axis running horizontally), yaw (left or right about an axis running vertically), and roll (rotation about a horizontal axis perpendicular to the pitch axis). The axes can alternatively be designated as lateral, vertical, and longitudinal. These axes move with the object and rotate relative to the earth along with the object. A yaw rotation is a movement around the yaw axis of a rigid body that changes the direction it is pointing, to the left or right of its direction of motion. The yaw rate or yaw velocity of an object is the angular velocity of this rotation. It is commonly measured in degrees per second or radians per second.
In some examples, the direction of movement can be determined based on position data corresponding to subsequent time instants. The position data can be indicative of a 2- or 3-dimensional position (x, y, z) of the object and can be provided by a position tracking system. Based on a first multi-dimensional position at a first time instant and a second multi-dimensional position at a subsequent second time instant it is possible to derive a current or instantaneous multi-dimensional motion vector pointing from the first position to the second position.
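For illustration, such a derivation could be sketched in Python as follows; the 2-dimensional positions, the function name and the example coordinates are merely assumptions of the sketch:

```python
import numpy as np

def movement_direction(p_first, p_second):
    """Derive the motion vector and heading angle (yaw) of the object
    from two successive absolute positions reported by the tracker."""
    p_first, p_second = np.asarray(p_first), np.asarray(p_second)
    m = p_second - p_first  # vector pointing from the first to the second position
    heading_deg = np.degrees(np.arctan2(m[1], m[0])) % 360.0
    return m, heading_deg

# e.g., two 2D positions reported by an NP tracking system at subsequent time instants
m, heading = movement_direction([1.0, 2.0], [1.1, 2.3])
```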
In some examples, correcting the object's orientation information can include estimating, based on the sensor data, a relationship between a real orientation of the object and the object's (real) direction of movement. If the estimated relationship indicates that the object's real orientation (e.g., user's head orientation) corresponds to the object's real direction of movement, the object's orientation information can be corrected based on the determined real direction of movement.
In some examples, assuming that animate objects, such as humans, mostly walk towards their viewing direction, correcting the object's orientation information can include aligning the object's orientation information with the object's direction of movement. As such, an inaccurate orientation estimate provided by the one or more IMUs can be aligned with the object's measured (real) direction of movement.
In some examples, the method can further optionally comprise preprocessing the sensor data with a smoothing filter to generate smoothed sensor data. An example of such a smoothing filter would be a Savitzky-Golay filter, which is a digital filter that can be applied to a set of digital data points for the purpose of smoothing the data, that is, to increase the signal-to-noise ratio without greatly distorting the signal. This can be achieved, in a process known as convolution, by fitting successive sub-sets of adjacent data points with a low-degree polynomial by the method of linear least squares.
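By way of example, such smoothing could be sketched with SciPy's savgol_filter; the window length and polynomial order (which reappear further below as F = 25 and N = 3) and the synthetic input data are assumptions of the sketch:

```python
import numpy as np
from scipy.signal import savgol_filter

acc_raw = np.random.randn(1000, 3)  # synthetic 3-axis accelerometer samples

# smooth each axis with a Savitzky-Golay filter (an odd window length is required)
acc_smooth = savgol_filter(acc_raw, window_length=25, polyorder=3, axis=0)
```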
In some examples, the method can further optionally comprise filtering the (smoothed) sensor data with a low pass filter and/or a high pass filter. In some applications this can be beneficial to avoid or reduce unwanted sensor signal components, such as acceleration signal components related to gravity, for example.
In some examples, estimating the relationship between the object's real orientation and the object's (real) direction of movement can further include compressing the (filtered) sensor data. In signal processing, data compression involves encoding information using fewer bits than the original representation. Compression can be either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy, so that no information is lost. Lossy compression reduces bits by removing unnecessary or less important information.
In some examples, compressing the sensor data can include extracting one or more statistical and/or heuristic features from the sensor data to generate sensor data feature vectors. Such features can include time domain features (e.g., mean, standard deviation, peaks), frequency domain features (e.g., FFT, energy, entropy), heuristic features (e.g., signal magnitude area/vector, axis correlation), time-frequency domain features (e.g., wavelets), and domain-specific features (e.g., gait detection).
Although a variety of statistical and/or heuristic features is principally possible, compressing the sensor data can comprise extracting a mean value, standard deviation, and can comprise a Principal Component Analysis (PCA) of the sensor data in some examples. As such, numerous sensor data samples can be reduced to only one or a few samples representing the statistical and/or heuristic features. PCA is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. The number of principal components is less than or equal to the number of original variables. This transformation is defined in such a way that the first principal component has the largest possible variance (that is, accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to the preceding components. The resulting vectors are an uncorrelated orthogonal basis set.
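A compact sketch of such a compression step might look as follows; the window shape, the function name and the use of scikit-learn's PCA are assumptions, but the resulting 18-dimensional feature vector matches the feature count discussed further below:

```python
import numpy as np
from sklearn.decomposition import PCA

def feature_vector(window):
    """Compress one sensor window (samples x 6 axes: 3-axis accelerometer
    plus 3-axis gyroscope) into a small statistical feature vector."""
    mean = window.mean(axis=0)               # 6 mean features
    std = window.std(axis=0)                 # 6 standard-deviation features
    pc1 = PCA(n_components=1).fit(window).components_[0]  # 6 PCA features
    return np.concatenate([mean, std, pc1])  # 18 features in total

features = feature_vector(np.random.randn(200, 6))
```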
In some examples, estimating the relationship between the object's real orientation and the object's real direction of movement can further include classifying the relationship based on the compressed sensor data and generating a statistical confidence with respect to the classification result. In machine learning and statistics, classification refers to the problem of identifying to which of a set of categories a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known. Example categories can be indicative of the relationship between the object's real orientation and the object's real direction of movement, such as “head right while moving forward”, “head left while moving forward”, or “head straight while moving forward”. In the terminology of machine learning, classification is considered an instance of supervised learning, i.e. learning where a training set of correctly identified observations is available. Often, the individual observations are analyzed into a set of quantifiable properties, known variously as explanatory variables or features.
Classifying the relation between the object's real orientation and the object's real direction of movement can be performed by a variety of classification algorithms, such as, for example a Support Vector Machine (SVM). In machine learning, SVMs are supervised learning models with associated learning algorithms that analyze data used for classification and regression analysis. Given a set of training examples, each marked as belonging to one or the other of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier. An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall. In addition to performing linear classification, SVMs can efficiently perform a non-linear classification using what is called the kernel trick, implicitly mapping their inputs into high-dimensional feature spaces. When data are not labeled, supervised learning is not possible, and an unsupervised learning approach is required, which attempts to find natural clustering of the data to groups, and then map new data to these formed groups. The clustering algorithm which provides an improvement to the SVMs is called support vector clustering and is often used in industrial applications either when data are not labeled or when only some data are labeled as a preprocessing for a classification pass.
In some examples, the statistical confidence of the classification can further be verified based on predetermined physical properties or limitations of the object. For example, a human often looks into the direction he/she is moving. Also, a human is not capable of turning his/her head from left to right, or vice versa, within certain short time periods. For example, if two subsequent estimation or prediction periods are within 100-250 ms, and both predictions yield contradicting results with respect to head orientation, their respective confidence level can be lowered.
In some examples, an error of the object's orientation information can be corrected incrementally or iteratively. That is to say, the error can be divided into smaller portions which can be applied to the VR over time. In VR applications, this can reduce or even avoid so-called motion sickness. In some examples, spherical linear interpolation (SLERP) can be used to correct the error.
In some examples, the method can include a live as well as a training mode. During training mode a supervised learning model (such as e.g. SVM) may be trained for classifying a relation between a real orientation of the object and the object's real direction of movement based on training sensor data corresponding to a predefined relation between a predefined real orientation and a predefined real direction of movement of the object.
According to a further aspect of the present disclosure it is provided an apparatus for correcting orientation information based on inertial sensor data from one or more inertial sensors mounted to an object. The apparatus, when operational, can perform methods according to the present disclosure. It comprises an input configured to receive position data indicative of a current absolute position of the object and processing circuitry configured to determine a direction of movement of the object based on the position data and to correct the object's orientation information based on the determined direction of movement.
Thus, some examples propose to combine positional tracking with relative IMU data to achieve a long-time stable object (e.g. head) orientation while the user is naturally moving (e.g. walking and rotating his/her head). Under the assumption that humans mostly walk towards their viewing direction, some examples propose to extract features from the sensor signals, classify the relation between real movement direction (e.g., of the body) and real object (e.g. head) orientation, and combine this with absolute tracking information. This then yields the absolute head orientation that can be used to adapt the offset of a user's virtual view. The fact that humans tend to walk and look in the same direction can further be exploited to reduce classification errors, as it constitutes a high prior probability of matching forward movement and view direction.
Some examples of apparatuses and/or methods will be described in the following by way of example only, and with reference to the accompanying figures.
Various examples will now be described more fully with reference to the accompanying drawings in which some examples are illustrated. In the figures, the thicknesses of lines, layers and/or regions may be exaggerated for clarity.
Accordingly, while further examples are capable of various modifications and alternative forms, some particular examples thereof are shown in the figures and will subsequently be described in detail. However, this detailed description does not limit further examples to the particular forms described. Further examples may cover all modifications, equivalents, and alternatives falling within the scope of the disclosure. Like numbers refer to like or similar elements throughout the description of the figures, which may be implemented identically or in modified form when compared to one another while providing for the same or a similar functionality.
It will be understood that when an element is referred to as being “connected” or “coupled” to another element, the elements may be directly connected or coupled or via one or more intervening elements. If two elements A and B are combined using an “or”, this is to be understood to disclose all possible combinations, i.e. only A, only B, as well as A and B. An alternative wording for the same combinations is “at least one of A and B”. The same applies for combinations of more than two elements.
The terminology used herein for the purpose of describing particular examples is not intended to be limiting for further examples. Whenever a singular form such as “a,” “an” and “the” is used and using only a single element is neither explicitly nor implicitly defined as being mandatory, further examples may also use plural elements to implement the same functionality. Likewise, when a functionality is subsequently described as being implemented using multiple elements, further examples may implement the same functionality using a single element or processing entity. It will be further understood that the terms “comprises,” “comprising,” “includes” and/or “including,” when used, specify the presence of the stated features, integers, steps, operations, processes, acts, elements and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, processes, acts, elements, components and/or any group thereof.
Unless otherwise defined, all terms (including technical and scientific terms) are used herein in their ordinary meaning of the art to which the examples belong.
Although the principles of the present disclosure will be mainly exemplified in the context of VR in the following, the skilled person having benefit from the present disclosure will appreciate that these principles can also straightforwardly be translated to numerous other fields of technology where sensor data can be used to provide orientation information of an animate or inanimate movable object. In case of IMUs, the relative sensor data will inevitably lead to error accumulation over time and thus needs to be corrected now and then. The present disclosure proposes a concept for such correction by combining the IMU sensor data with positions or position tracking data.
As absolute sensors (such as magnetometers) do not work reliably in practice, only relative movement estimation sensors can be exploited. For HMDs used in VR applications, for example, this inevitably leads to a wrong head orientation in the long-term due to sensor drift.
Motion sickness is caused when the user moves straight forward in reality (in the direction of $\vec{m}$) to reach the pillar 303, while the VR view shows a sideways movement (see also the bottom row of the figures).
Method 400 includes receiving 410 position data indicative of a current absolute position of the object 310. For that purpose the object 310 may generate or cause generation of NP position data, for example by means of an integrated GPS sensor. Another option could be to use a Radio-Frequency (RF) NP tracking system. In this case the object 310 could be tracked by means of an active or passive RF position tag attached to the object and emitting RF signals to a plurality of antennas. The object's position can then be determined based on the different times-of-flight of the RF signals to the different antennas. A real direction of movement $\vec{m}$ of the object can be determined 420 based on the position data. The object's orientation information $\vec{v}$ can be corrected 430 based on the determined real direction of movement $\vec{m}$.
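As a rough illustration of such time-of-flight positioning, a tag position could be estimated from per-antenna ranges by non-linear least squares; this sketch is a strong simplification (real RF systems typically use time differences of arrival and careful synchronization), and all names and values are assumptions:

```python
import numpy as np
from scipy.optimize import least_squares

C = 299_792_458.0  # propagation speed of the RF signals in m/s

def locate(antennas, tofs):
    """Estimate a 2D tag position from times-of-flight to known antennas."""
    antennas = np.asarray(antennas, dtype=float)
    ranges = np.asarray(tofs, dtype=float) * C
    # residual: difference between geometric distances and measured ranges
    residual = lambda p: np.linalg.norm(antennas - p, axis=1) - ranges
    return least_squares(residual, x0=antennas.mean(axis=0)).x
```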
Apparatus 500 comprises an input 510 configured to receive position data indicative of a current absolute position of the object 310. Apparatus 500 can further comprise an input 520 to receive inertial sensor data from one or more inertial sensors. Orientation information $\vec{v}$ can be derived from the inertial sensor data. Processing circuitry 530 of apparatus 500 is configured to determine a real direction of movement $\vec{m}$ of the object 310 based on the position data and to correct the object's orientation information $\vec{v}$ based on the determined real direction of movement $\vec{m}$. The corrected orientation information can be provided via output 540.
The skilled person having benefit from the present disclosure will appreciate that apparatus 500 can be implemented in numerous ways. For example, it can be an accordingly programmed programmable hardware device, such as a general purpose processor, a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), or an Application Specific Integrated Circuit (ASIC). In one example, apparatus 500 may be integrated into an HMD 310 for VR applications or into another (remote) device controlling the HMD. The HMD may comprise a smartphone or another portable device in some examples.
As absolute position tracking is used to generate the (absolute) position data, a solution is to combine the position data or positional tracking information with the relative motion sensors. Assuming that persons look forward during forward movements (ƒ/0, forward at 0 degrees), his/her positions $[p_0 \ldots p_t]$ can be recorded for t time steps. A position trajectory vector $\vec{m} = p_t - p_0 \approx \vec{r}$ can be extracted from the recorded positions. Thus, determining 420 the real direction of movement $\vec{m}$ can include determining $\vec{m}$ based on position data corresponding to subsequent time instants. We can use $\vec{m}$ to derive an offset to the sensor-based view direction $\vec{v}$.
Finally, we can add a correction factor ϕ to correct the result for the correct quadrant $Q_i$ ($\phi \in \{Q_1 = 90°, Q_2 = 180°, Q_3 = 270°, Q_4 = 360°\}$) of the coordinate system. Thus,

$\vec{r} = (\vec{r}\,' + \phi) \bmod 360°.$
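In code, the trajectory-based heading and the offset to the sensor-based view direction could be sketched as follows; note that atan2 together with the modulo operation already realizes the quadrant correction expressed by ϕ (the function names are assumptions):

```python
import numpy as np

def heading_from_trajectory(positions):
    """Estimate the real heading r (in degrees) from recorded positions
    [p0 ... pt], assuming an f/0 movement (looking where one walks)."""
    p = np.asarray(positions)
    m = p[-1] - p[0]                                    # trajectory vector p0 -> pt
    return np.degrees(np.arctan2(m[1], m[0])) % 360.0   # fold into [0, 360)

def yaw_offset(r_deg, v_deg):
    """Offset between real heading r and sensor-based view direction v,
    wrapped into (-180, 180] degrees."""
    return (r_deg - v_deg + 180.0) % 360.0 - 180.0
```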
This basic implementation can estimate correct offsets if the user's head orientation equals his/her movement direction, i.e., if $\vec{r} \approx \vec{m}$. However, in reality this is not always the case and a wrong orientation might be estimated (see the figures).
Thus, some examples propose to continuously analyze the IMU sensor data and automatically detect ƒ/0 movements to trigger orientation estimation. The skilled person having benefit from the present disclosure will appreciate, however, that any other predefined head/body relation could also be used and trained for correcting the IMU-based head orientation. Besides, the automatic movement detection can respect the maximum tolerated heading drift to keep immersion at a high level. Therefore, some examples monitor the drift permanently and keep it as small as possible.
In some implementations, the sensor data comprises 3-dimensional acceleration data and 3-dimensional rotational velocity data. Some of today's low-cost accelerometers track gravity and linear acceleration, $acc_{raw} = acc_{lin} + acc_{grav}$, at 200 Hz with a maximum of ±16 g, and some gyroscopes track the velocity of rotations at 200 Hz with a maximum of ±2000°/s.
The LP and HP filters can be implemented as IIR difference equations of the form

$y[n] = \sum_{i=0}^{N} b_i\, x[n-i] - \sum_{j=1}^{M} a_j\, y[n-j],$

where x can be the raw acceleration, the $a_j$ are the coefficients of the feedback filter with filter order $M_{low} = 3$ and $M_{high} = 1$, and the $b_i$ are the coefficients of the feed-forward filter with filter order $N_{low} = 3$ and $N_{high} = 1$. Therefore, each filter (LP, HP) can have its own Butterworth filter design (a, b). For an uneven order (e.g. N = 1) the feed-forward coefficients reduce to $b_i = 1$.
Example feature extractions can fuse linear acceleration data with smoothed gyroscope data. Thus, method 400 can optionally further include filtering the (raw) sensor data with a smoothing filter to generate smoothed sensor data. Smoothing of the raw input data while retaining signal characteristics can be achieved by a Savitzky-Golay filter, with frame size F=25 and polynomial order N=3, for example.
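The preprocessing chain described so far could be sketched as follows; the 0.3 Hz cutoff separating gravity from linear acceleration and the use of zero-phase filtering (filtfilt) are assumptions of the sketch:

```python
import numpy as np
from scipy.signal import butter, filtfilt, savgol_filter

FS = 200.0  # IMU sample rate in Hz

# Butterworth designs: order-3 low pass and order-1 high pass
b_lp, a_lp = butter(3, 0.3, btype='low', fs=FS)
b_hp, a_hp = butter(1, 0.3, btype='high', fs=FS)

def preprocess(acc_raw, gyro_raw):
    """Smooth raw IMU data (Savitzky-Golay, F=25, N=3), then split the
    acceleration into gravity and linear components by LP/HP filtering."""
    acc = savgol_filter(acc_raw, 25, 3, axis=0)
    gyro = savgol_filter(gyro_raw, 25, 3, axis=0)
    acc_grav = filtfilt(b_lp, a_lp, acc, axis=0)  # gravity component
    acc_lin = filtfilt(b_hp, a_hp, acc, axis=0)   # linear component
    return acc_lin, acc_grav, gyro
```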
In a training phase the input data can be sliced into windows of a constant number of samples. The data can be analyzed by a motion state module that can detect motions in the acceleration data by min/max-thresholding acceleration peaks and the time between them. By specifying the number of zero-crossings and their direction we can deduce additional information (step with foot ∈ [l, r]) about the current window. In a live phase, data can be processed in sliding windows; in contrast to the commonly used window overlap of 50%, a sliding-window approach can be used, as this can also be beneficial for automatic motion detection. The length of the sliding window can adapt to the available CPU time and the required response time via a number of future samples $\omega_{wait}$, bounded by physical limitations, upon which the new data frame is created; see the sketch after this paragraph. However, the window length should be long enough to capture the sensor data of an activity completely. As a human performs a minimum of 1.5 steps/s while walking at 1.4 m/s in reality (users tend to walk slower in VR: slow 0.75 m/s, normal 1.0 m/s, fast 1.25 m/s), a minimal length of 1000 ms can be used to yield high confidence.
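The following minimal sketch illustrates the sliding-window slicing; the sample rate and window length are the example values from above, while the step size (standing in for $\omega_{wait}$) is an assumption:

```python
def sliding_windows(samples, fs=200.0, window_ms=1000, step=25):
    """Yield sliding windows over the live sample stream. The 1000 ms
    minimum window covers at least ~1.5 walking steps; the step size of
    25 samples stands in for the CPU/response-time budget."""
    win = int(fs * window_ms / 1000.0)
    for start in range(0, len(samples) - win + 1, step):
        yield samples[start:start + win]
```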
The aforementioned example preprocessing of the raw inertial sensor data is summarized in the accompanying figures.
Raw sensor data (e.g., acceleration data and/or gyroscope data) can be smoothed by a smoothing filter 810 (e.g., a Savitzky-Golay filter) without significant feature loss. In order to isolate unwanted signal components (such as gravity, for example) the (smoothed) sensor data can be LP and/or HP filtered 820. The filtering can relate to both acceleration data and gyroscope data or only to one of them. The preprocessed sensor data can then be used for data compression. One example of data compression is to extract one or more statistical and/or heuristic features from the sensor data to generate sensor data feature vectors. It is proposed to use a minimum number of statistical and/or heuristic features to save performance while still providing highly confident results. Basically, the features can be selected to maximize variance between and minimize variance within predefined movement classes. Table I below introduces some commonly used features and shows the degrees of freedom and the number of features necessary.
One example uses 18 features for the data: the 3 axes of accelerometer and gyroscope, each represented by mean, StD and PCA features. Thus, in some examples compressing the sensor data can comprise extracting a mean value, standard deviation, and can comprise a Principal Component Analysis (PCA) of the sensor data.
Mean. The mean

$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$

yields one feature value per axis, with the number of samples N and the input data X.

StD. The standard deviation

$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$

yields one value based on the variance ($\sigma^2$), with mean (μ), the number of samples N and signal data X.
PCA. The score values provided by PCA can be obtained based on a singular value decomposition (SVD). Given an arbitrary matrix X of dimension n×p (matrix of n observations on p variables measured about their means), we can write $X = U L A'$, where U is an (n×r) matrix and A′ is the (p×r) adjoint of a matrix A, each with orthonormal columns so that $U'U = I_r$ and $A'A = I_r$. L is an (r×r) diagonal matrix ($L = \Sigma$, with the singular values on its diagonal), where r is the rank of our arbitrary matrix X. Element $u_{ik}$ is the (i, k)th element of U, $a_{jk}$ is the (j, k)th element of A, and $l_k$ is the kth element of the diagonal matrix L. This yields the principal component scores

$z_{ik} = u_{ik}\, l_k, \quad \text{equivalently} \quad X a_k = l_k u_k,$

with i = 1, 2, …, n and k = 1, 2, …, r. In the context of an SVM classifier the determined PC scores $z_{ik}$ represent the perpendicular distances of the observations n from the best-fitting hyperplane.
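The relation between the SVD and the PC scores can be checked with a few lines of NumPy; the data here are synthetic:

```python
import numpy as np

X = np.random.randn(200, 6)
X = X - X.mean(axis=0)       # observations measured about their means

U, L, At = np.linalg.svd(X, full_matrices=False)  # X = U L A'
Z = U * L                    # PC scores: z_ik = u_ik * l_k
assert np.allclose(Z, X @ At.T)  # equivalently X a_k = l_k u_k
```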
In some examples, the method includes classifying a relation between the real orientation $\vec{r}$ of the object and the object's real direction of movement $\vec{m}$ based on the compressed sensor data or the sensor data feature vectors. Optionally, a confidence level can be generated with respect to the classification result. To classify movement types, several classifiers can be used alternatively or in combination, such as, for example, Decision Trees (DT), cubic K-Nearest Neighbor (K-NN), and cubic Support Vector Machines (SVM). That is to say, classifying the relation between the object's real orientation $\vec{r}$ and the object's real direction of movement $\vec{m}$ can be performed using one or more classification algorithms.
1) Decision Tree: Here a Classification And Regression Tree (CART) can be used to achieve reasonable results with a maximum of 100 splits (complex tree) and a minimal number of leaves of 1. The less CPU-intensive Gini diversity index $I_G$ can be used as split criterion:

$I_G = 1 - \sum_{i=1}^{C} p(i|t)^2,$

subjecting node $t \in S^{C-1}$ with $S^{C-1} = \{x : x \in [0,1]^C,\ \sum_{i=1}^{C} x_i = 1\}$ and observed fractions p(i|t) of all classes C with class i that reaches the node t. Additionally, we can save performance as we use CART without surrogate decision splits, as we do not miss any data in the classification process. We can prune the tree until the number of parents for each leaf is greater than 10 at the Gini impurity criterion $I_G$. We can allow merging of leaves (child nodes C) that both originate from the same parent node (P) and that yield a sum of risk values ($R_i$) greater than or equal to the risk related to the parent node ($R_P$): $\sum_{i=1}^{|C|} R_i \ge R_P$.
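A rough scikit-learn analog of these CART settings might be sketched as follows; max_leaf_nodes=101 approximates a limit of 100 splits, and since scikit-learn exposes neither surrogate splits nor the exact pruning/merging rules, those remain approximations:

```python
from sklearn.tree import DecisionTreeClassifier

dt = DecisionTreeClassifier(criterion='gini',    # Gini diversity index I_G
                            max_leaf_nodes=101,  # ~100 splits
                            min_samples_leaf=1)
# dt.fit(X_train, y_train); dt.predict_proba(X_live)
```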
2) K-Nearest Neighbor: Some implementations may also use a cubic K-Nearest-Neighbor classifier with a distance parameter (k = 3) and a distance function $(X \times Y)^n \to (X \to Y)$. As an example, the cubic Minkowski distance metric can be chosen, which holds instance x, label y, distance weight $\omega_i$, and dimensionality m:

$d(x, y) = \left(\sum_{i=1}^{m} \omega_i\, |x_i - y_i|^3\right)^{1/3},$

where $x, y \in X = \mathbb{R}^m$. Ties that occur when at least two classes have the same number of nearest points among the k nearest neighbors can be broken based on their smallest index value. Instead of a less accurate unmodified K-dimensional tree, a CPU-intensive but sufficiently accurate exhaustive (brute force) search algorithm can be used. The distance weights $\omega_i$ are the squared inverse of the distances between instances x and labels y: $\omega_i = \left(\left((x - y)^T \Sigma (x - y)\right)^2\right)^{-1}$. The dimensionality of neighbors m = 10 can provide reasonable results. A data standardization approach can be used that rescales data to improve situations where predictors have widely different scales. This can be achieved by centering and scaling each predictor data by the mean and standard deviation.
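In scikit-learn terms this roughly corresponds to the following sketch; weights='distance' uses plain inverse-distance weights, so a callable would be needed to reproduce the squared-inverse weighting described above:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

knn = make_pipeline(
    StandardScaler(),                              # center/scale each predictor
    KNeighborsClassifier(n_neighbors=3,            # k = 3
                         metric='minkowski', p=3,  # cubic Minkowski distance
                         algorithm='brute',        # exhaustive search
                         weights='distance'))
# knn.fit(X_train, y_train); knn.predict(X_live)
```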
3) Support Vector Machine: Another example implementation uses a cubic SVM with a polynomial kernel $K(x_q, x_i) = \phi(x_q)^T \phi(x_i)$ of order d applied to input sensor data feature vectors, with space mapping function ϕ. We define t training vectors $x_i \in \mathbb{R}^n$ with i = 1, …, t, divided into classes, and a label vector $y \in \mathbb{R}^t$ for which $y_i \in \{1, -1\}$ holds. The SVM can solve the optimization problem

$\min_{w, b, \xi}\ \tfrac{1}{2} w^T w + C \sum_{i=1}^{t} \xi_i,$

subject to

(i) $y_i (w^T \phi(x_i) + b) \ge 1 - \xi_i$ and

(ii) $\xi_i \ge 0$ for i = 1, …, t,

with $\phi(x_i)$ mapping $x_i$ into a higher-dimensional space, with regularization parameter (box constraint level) C and with slack variables $\xi_i$. The solution of this optimization provides an optimal normal vector w to the hyperplane that satisfies $w = \sum_{i=1}^{t} y_i \alpha_i \phi(x_i)$ with weight variables $\alpha_i$. Finally, the decision function $f(x_q)$ for a feature vector $x_q$ holds:

$f(x_q) = \mathrm{sgn}\left(w^T \phi(x_q) + b\right) = \mathrm{sgn}\left(\sum_{i=1}^{t} \alpha_i y_i K(x_q, x_i) + b\right).$

An example SVM classifier can use a cubic (d = 3) kernel function $K(x_q, x_i) = (1 + \gamma \cdot x_q^T x_i)^d$ to be able to separate non-linear features, with C = 1 and kernel scale γ = 4. The hyperparameters can be obtained by a 10-fold cross validation based on the training data (70%). The cross validation determines the average error over all testing folds (10 divisions), which results in an accuracy overview. A multiclass SVM or a One-vs-All SVM type can be used to provide multiclass classification.
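These settings map onto scikit-learn's polynomial kernel $(\gamma \cdot x_q^T x_i + coef0)^{degree}$ as sketched below; probability=True and the one-vs-rest shape are assumptions for obtaining per-class confidences:

```python
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

svm = SVC(kernel='poly', degree=3,         # cubic kernel, d = 3
          gamma=4.0, coef0=1.0, C=1.0,     # K = (1 + 4 * x_q^T x_i)^3
          probability=True,                # per-class confidences
          decision_function_shape='ovr')   # one-vs-all multiclass
# 10-fold cross validation on the training data:
# scores = cross_val_score(svm, X_train, y_train, cv=10)
```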
Each of the mentioned example predictors or classifiers (DT, K-NN and SVM) can estimate a class label and its probability or confidence. In some examples, the class labels can be estimated with a rate or frequency higher than a maximum frequency component of the object's movement. For example, during the time it takes a human to turn his head from left to right, numerous class label estimates can be predicted, each with a related probability/confidence ζ. Optionally, the confidences ζ that are provided by our trained classifiers can be further improved by relating them over time and/or considering human-centric motion behavior. In other words, a present confidence of an estimated class label output can be verified by taking into account previous confidences of previous class label outputs or class label hypotheses and/or by taking into account one or more predetermined physical properties or limitations of the object, in particular limitations of human motion (e.g. head turn rate). For example, a current confidence of a current most probable class label estimate can be compared with one or more previous confidences of previous class label outputs or a mean value thereof. In more complex scenarios, for example when there are numerous different class labels, more complex models can be employed, such as Hidden Markov Models (HMMs). There, not only confidences of previous actual class label outputs but also confidences of previous class label hypotheses corresponding to numerous different class labels can be taken into account.
In some examples, we can predefine the amount s of historical confidences $\zeta_H$ we want to consider. Hence, the classifier can predict confidences $\zeta(t_i)$ which can be compared to previous (historical) confidences $\zeta_H$ at times $t_i \in [t_{n-s}, \ldots, t_n]$. A starting probability can be provided to the first confidence observation $P(\zeta_H, t_{i=0})$. As every confidence depends on the probability of its ancestors $P(\zeta_H, t_{n-s})$, the probability of the current confidence can be determined with respect to the n−s past confidences. Therefore, we can significantly improve the trustworthiness of the current confidence and also identify single outliers. The probability of the current confidence $P(\zeta, t_n)$ and the current confidence ζ(n), based on all n (with s > 0) historic confidences, are determined from the probability-weighted historical confidences, with initial historic probability $P(\zeta_H, t_i) = 1.0$ at $t_i = 0$:

$\zeta_H(t_i) = P(\zeta_H, t_i) \cdot \zeta(t_i).$
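One plausible reading of this recursive weighting can be sketched in Python; the multiplicative chaining is an assumption, and any monotone combination of historical confidences would serve the same outlier-damping purpose:

```python
def current_confidence(history, zeta_now, p0=1.0):
    """Chain s historical confidences so that every confidence inherits
    the probability of its ancestors; single outliers are thereby damped."""
    p = p0                       # P(zeta_H, t_i=0) = 1.0
    for zeta in history:         # zeta_H(t_i) = P(zeta_H, t_i) * zeta(t_i)
        p = p * zeta
    return p * zeta_now          # probability-weighted current confidence

conf = current_confidence([0.9, 0.85, 0.92], 0.4)  # contradicting outlier is damped
```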
A summarizing overview of the aforementioned processing acts distinguishes two phases:
(i) Training
(ii) Live
In both cases feature vectors can be extracted from the input data (see reference numeral 1010). During the training phase 1020 smoothed sensor data can be provided to the classifier (e.g., SVM) 1030. The smoothed sensor data can be analyzed by feature extraction algorithms 1010 in order to lift the input data into their feature space. The extracted features can be used as inputs to train or optimize the classifier 1030. During the live phase the trained classifier 1030 can predict a motion class or label together with its confidence. For example, the classifier 1030 receives an input signal corresponding to a ƒ/0 movement and predicts the label ƒ/0 with a confidence of 90%. The classifier can use a so-called cross-fold validation principle to judge how well it can classify the input data. With a probabilistic assumption of human motion over time the result can further be improved, since a human mostly looks in the direction (s)he is moving. Also, a human cannot change his/her head orientation from left to right (or vice versa) within two predictions, e.g. ([5, . . . , 20] ms). Thus, probabilistic dependencies over time and/or a history of predicted confidences can be used to predict the current probability of the predicted confidence and to correct the label if necessary (see reference numeral 1040).
An example view adaptation (post-)procedure is illustrated in the accompanying figures.
Some examples estimate the head orientation using a 6 degree-of-freedom (DOF) sensor and implement a complementary filter. This filter also accounts for static and temperature-dependent biases as well as additive, zero-mean Gaussian noise. We define the current gyroscope-based angular velocity $\hat{\omega}_{x,y,z} = \omega_{x,y,z} + b + n$, with static and temperature-dependent bias b and additive, zero-mean Gaussian noise n. Subjecting the filter coefficient α and current accelerometer data $a_{x,y,z}$ and converting radians to degrees (180/π), we can determine the roll ϕ and pitch θ orientations, for example as:

$\phi_t = \alpha\,(\phi_{t-1} + \hat{\omega}_x \Delta t) + (1 - \alpha)\,\mathrm{atan2}(a_y, a_z) \cdot \tfrac{180}{\pi},$

$\theta_t = \alpha\,(\theta_{t-1} + \hat{\omega}_y \Delta t) + (1 - \alpha)\,\mathrm{atan2}\!\left(-a_x, \sqrt{a_y^2 + a_z^2}\right) \cdot \tfrac{180}{\pi}.$
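One complementary-filter step could be sketched as follows; the filter coefficient, sample period and sign conventions are assumptions of the sketch:

```python
import numpy as np

ALPHA, DT = 0.98, 1.0 / 200.0  # assumed filter coefficient and sample period

def complementary_step(roll, pitch, gyro, acc):
    """Fuse integrated gyroscope rates with the accelerometer's gravity
    direction to keep roll and pitch long-time stable (degrees in/out)."""
    acc_roll = np.degrees(np.arctan2(acc[1], acc[2]))
    acc_pitch = np.degrees(np.arctan2(-acc[0], np.hypot(acc[1], acc[2])))
    roll = ALPHA * (roll + gyro[0] * DT) + (1 - ALPHA) * acc_roll
    pitch = ALPHA * (pitch + gyro[1] * DT) + (1 - ALPHA) * acc_pitch
    return roll, pitch
```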
The roll ϕ and pitch θ orientations can be estimated stably over long time periods. The yaw orientation can be determined by fusing accelerometer and gyroscope (neglecting the magnetometer). An error of the object's (yaw) orientation information can be corrected incrementally or iteratively. For example, a spherical linear interpolation (SLERP) mechanism can be applied that linearly interpolates the determined heading orientation error $\psi_{err}$ into the current view orientation $\psi_{cur}$ of the user. The interpolation approximates the current drift $\psi_{err}$ into the user's view by applying a small immersive portion of the offset, $\omega_{imm} \cdot \psi_{err}$, while the sign of the current rotation equals that of the error, $\mathrm{sgn}(\psi_{cur}) = \mathrm{sgn}(\psi_{err})$. The immersion can be optimized by adjusting $\omega_{imm}$ (number of degrees per second) per iteration. The corrected heading orientation based on an initial head orientation with yaw $\psi_{in}$ can be written, for example, as

$\psi_{cor} = \psi_{in} + \mathrm{sgn}(\psi_{err}) \cdot \omega_{imm} \cdot \Delta t$

per iteration.
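The incremental yaw correction could be sketched like this; the per-second budget omega_imm and the clamping against overshoot are assumptions:

```python
def correct_yaw(psi_cur, psi_err, omega_imm=2.0, dt=1.0 / 200.0):
    """Bleed a small portion of the heading error into the current view
    per iteration so the correction stays below the user's perception
    threshold; ideally applied while the user rotates in the same direction."""
    step = min(abs(psi_err), omega_imm * dt)          # never overshoot the error
    return psi_cur + (step if psi_err > 0 else -step)
```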
A complete example orientation correction procedure is summarized in the accompanying figures.
Some implementations propose to use supervised machine learning to classify various ranges of ω. If among all the ranges the ω = 0° class has the highest probability, we have detected an $\vec{m} = \vec{r}$ moment. From the input data of the IMU (accelerometer and gyroscope) we can extract the linear acceleration component, i.e., the movement energy in every direction axis, and can define specific features that characterize and represent a certain range of ω. We can use these features to train our classifier for all the ω-classes a priori on pre-recorded and labeled training data. At runtime, we can use these models to classify ω on live sensor data, and hence detect ω = 0° moments if the corresponding classifier yields the best fit/highest confidence.
While conventional concepts lack a long-time stable heading bias estimation and hence decrease immersion, some examples of the present disclosure propose to combine signal processing, feature extraction, classification and view harmonization that can enable immersive head orientation estimation.
The aspects and features mentioned and described together with one or more of the previously detailed examples and figures, may as well be combined with one or more of the other examples in order to replace a like feature of the other example or in order to additionally introduce the feature to the other example.
Examples may further be or relate to a computer program having a program code for performing one or more of the above methods, when the computer program is executed on a computer or processor. Steps, operations or processes of various above-described methods may be performed by programmed computers or processors. Examples may also cover program storage devices such as digital data storage media, which are machine, processor or computer readable and encode machine-executable, processor-executable or computer-executable programs of instructions. The instructions perform or cause performing some or all of the acts of the above-described methods. The program storage devices may comprise or be, for instance, digital memories, magnetic storage media such as magnetic disks and magnetic tapes, hard drives, or optically readable digital data storage media. Further examples may also cover computers, processors or control units, e.g. implemented in smartphones, programmed to perform the acts of the above-described methods or (field) programmable logic arrays ((F)PLAs) or (field) programmable gate arrays ((F)PGAs), programmed to perform the acts of the above-described methods.
The description and drawings merely illustrate the principles of the disclosure. Furthermore, all examples recited herein are principally intended expressly to be only for pedagogical purposes to aid the reader in understanding the principles of the disclosure and the concepts contributed by the inventor(s) to furthering the art. All statements herein reciting principles, aspects, and examples of the disclosure, as well as specific examples thereof, are intended to encompass equivalents thereof.
A functional block denoted as “means for . . . ” performing a certain function may refer to a circuit that is configured to perform a certain function. Hence, a “means for s.th.” may be implemented as a “means configured to or suited for s.th.”, such as a device or a circuit configured to or suited for the respective task.
Functions of various elements shown in the figures, including any functional blocks labeled as “means”, “means for providing a sensor signal”, “means for generating a transmit signal.”, etc., may be implemented in the form of dedicated hardware, such as “a signal provider”, “a signal processing unit”, “a processor”, “a controller”, etc. as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which or all of which may be shared. However, the term “processor” or “controller” is by far not limited to hardware exclusively capable of executing software, but may include digital signal processor (DSP) hardware, network processor, application specific integrated circuit (ASIC), field programmable gate array (FPGA), read only memory (ROM) for storing software, random access memory (RAM), and non-volatile storage. Other hardware, conventional and/or custom, may also be included.
A block diagram may, for instance, illustrate a high-level circuit diagram implementing the principles of the disclosure. Similarly, a flow chart, a flow diagram, a state transition diagram, a pseudo code, and the like may represent various processes, operations or steps, which may, for instance, be substantially represented in computer readable medium and so executed by a computer or processor, whether or not such computer or processor is explicitly shown. Methods disclosed in the specification or in the claims may be implemented by a device having means for performing each of the respective acts of these methods.
It is to be understood that the disclosure of multiple acts, processes, operations, steps or functions disclosed in the specification or claims may not be construed as to be within the specific order, unless explicitly or implicitly stated otherwise, for instance for technical reasons. Therefore, the disclosure of multiple acts or functions will not limit these to a particular order unless such acts or functions are not interchangeable for technical reasons. Furthermore, in some examples a single act, function, process, operation or step may include or may be broken into multiple sub-acts, -functions, -processes, -operations or -steps, respectively. Such sub acts may be included and part of the disclosure of this single act unless explicitly excluded.
Furthermore, the following claims are hereby incorporated into the detailed description, where each claim may stand on its own as a separate example. While each claim may stand on its own as a separate example, it is to be noted that—although a dependent claim may refer in the claims to a specific combination with one or more other claims—other examples may also include a combination of the dependent claim with the subject matter of each other dependent or independent claim. Such combinations are explicitly proposed herein unless it is stated that a specific combination is not intended. Furthermore, it is intended to include also features of a claim to any other independent claim even if this claim is not directly made dependent to the independent claim.