Estimation of the direction of arrival (DOA) and delay of room reflections from reverberant sound may be useful for a wide range of applications such as speech enhancement, dereverberation, source separation, optimal beamforming and room geometry inference.
Early reflections have a key role in sound perception, as they can improve speech intelligibility and listener envelopment. They are also related to the impression of source width, loudness and distance. Therefore, exploitation of the early reflections can be beneficial in parametric spatial audio methods and spatial audio coding.
Existing methods for the estimation of the parameters of early reflections can be categorized as blind and non-blind. Non-blind methods operate on room impulse response signals or, alternatively, assume that an anechoic recording of the sound source is available. Blind methods operate on microphone signals directly and assume that no other information is available, as is often the case in many real-world applications.
Such methods employ spatial filtering, subspace methods (e.g., MUSIC or ESPRIT), or sparse recovery methods, which attempt to find the smallest subset of sources that explains the measured signals. Another class of methods, which can localize correlated sources, is based on modeling the source signals as deterministic unknowns.
The description above is presented as a general overview of related art in this field and should not be construed as an admission that any of the information it contains constitutes prior art against the present patent application.
The figures illustrate generally, by way of example, but not by way of limitation, various embodiments discussed in the present document.
For simplicity and clarity of illustration, elements shown in the figures have not necessarily been drawn to scale. For example, the dimensions of some of the elements may be exaggerated relative to other elements for clarity of presentation. Furthermore, reference numerals may be repeated among the figures to indicate corresponding or analogous elements. References to previously presented elements are implied without necessarily further citing the drawing or description in which they appear. The figures are listed below.
Aspects of disclosed embodiments pertain to systems and/or methods for blind estimation of direction of arrival and delays of reflected and non-reflected sound waves or sources originating from, for example, a single sound wave output or speaker, or multiple speakers, located in a room. It is noted that sources may contain direct sound waves and/or reflected sound waves. The systems and/or methods disclosed herein may be employed in conjunction with a variety of applications including, for example, audio signal processing, improving audio output, improving perceived sound quality, improving Virtual or Augmented Reality experience, estimating position of walls in a room based on reflections, etc.
The term “source” as used herein may also pertain to source signals or source data descriptive of the source.
Early reflections are delayed and attenuated copies of the direct sound (also: non-reflected source). Consequently, a narrowband correlation between a non-reflected source and its reflections has a phase that is linear or substantially linear in frequency. The property of linearity or substantial linearity can be utilized to construct a transform that can separate reflections having different delays. The transform also enables differentiating between multiple reflections having identical or similar directions of arrival (DOAs). In some examples, the transform employed for implementing part of the method may herein be referred to as Phase Aligned Correlation (PHALCOR). The expression “early reflection” may for example pertain to the first room reflection, to the first 2 or 3 reflections, to at least the first 10 or at least the first 20 room reflections or more, or to any number of initial reflections up to a certain threshold. The number of early reflections that may be characterized with the method presented herein may, for example, be up to 100.
The term “Phase Aligned” is chosen for the reasons set forth herein. PHALCOR is based on a signal model in which the reflection signals are explicitly modeled as delayed and scaled copies of the direct sound. Before the phase alignment transformation, at each frequency, the contribution of each source to the correlation matrix is the product of a matrix with a complex number. The said matrix is not dependent on frequency. However, the complex number does depend on frequency, and specifically at each frequency, the phase (aka argument) of the number is different.
The phase alignment transform can “align” all these complex numbers to have the same phase. If this alignment is successful, the summation of all the aligned correlation matrices (of different frequencies) is coherent and the source becomes more dominant. When a selected transform parameter τ matches the delay of a certain source, the contribution of that reflection is significant. The alignment may be considered successful if the parameter τ of the transformation is approximately equal to the delay of the source.
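By way of non-limiting numerical illustration, the coherent-summation effect described above may be sketched as follows. All values in this sketch (the 2 ms delay, the 0.5 amplitude, the bin layout) are hypothetical and chosen only for the example; the per-frequency complex coefficient of a single reflection term is summed with and without phase alignment:

```python
import numpy as np

# Hypothetical setup: Jf frequency bins, one reflection with delay tau_true.
Jf = 64
delta_f = 50.0                         # frequency resolution [Hz] (made up)
freqs = delta_f * np.arange(1, Jf + 1)
tau_true = 2e-3                        # reflection delay: 2 ms (made up)

# Per-frequency coefficient of the reflection's outer-product term:
# constant magnitude, phase linear in frequency.
coeffs = 0.5 * np.exp(-2j * np.pi * freqs * tau_true)

def aligned_sum(tau):
    # Multiply each coefficient by exp(+i 2 pi f tau) and sum over frequency.
    # When tau matches the delay, all terms share the same phase.
    return np.abs(np.sum(coeffs * np.exp(2j * np.pi * freqs * tau)))

matched = aligned_sum(tau_true)   # coherent: magnitudes add up
mismatched = aligned_sum(0.0)     # incoherent: terms largely cancel
```

With a matching τ the 64 terms of magnitude 0.5 add coherently, whereas with τ=0 the rotating phases largely cancel, which is the enhancement effect exploited by the transform.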
The transform can separate reflections with different delays, enabling the detection and localization of reflections with similar DOAs. The DOAs and delays of the early reflections can be estimated by separately analyzing (e.g., processing) the left and right singular vectors of the transformed matrices using sparse recovery techniques.
The method may be applied on sound data that are descriptive of electronic signals which are representative of received sound waves. Sources may contain a plurality of frequencies, received via audio channels. Every audio channel may be considered to contain a mixture of all reflected sources and the direct source. It is noted that received sound waves may also encompass the meaning of the term “simulated or virtual sound waves”. In some examples, the method may include capturing reflected and non-reflected sound waves, by converting the reflected and non-reflected sound waves incident onto a plurality of microphones of one or more microphone arrays, into corresponding electronic signals. The reflected and non-reflected sound waves may be created by generating sound waves which are output by a single sound output located in the room.
In some embodiments, the method may include determining, based on the sound data, a correlation value between at least two audio channels or between each two audio channels of a plurality of audio channels to obtain, for a same frequency of a plurality of frequencies, a set of frequency-related correlation values. The set of frequency-related correlation values are represented in a first multidimensional array, which may be referred to as a volumetric or 3D spatial correlation matrix or 3D SCM. The 3D SCM may be described as being composed of a plurality of 2D SCMs for a respective plurality of frequencies. For example, each such 2D SCM relates to a spatial correlation of received sound waves of a same frequency.
In some embodiments, the method may further include performing a transform of the frequency-related correlation values of the 3D SCM from the frequency domain to the time domain. This may be accomplished by performing an inverse weighted Fourier transform of the 3D SCM (i.e., on the values represented by the 3D SCM) to obtain a plurality of subsets (“slices”) of time delay values, represented by a second multidimensional array, which may be referred to as a 3D spatial time-delay representation matrix or 3D TDM. The 3D TDM contains a plurality of 2D TDMs relating to a respective plurality of time delays. For example, each one of such 2D TDMs of the 3D TDM relates to a different time delay between two received sound waves.
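A minimal sketch of this frequency-to-time step is given below. The array shapes, the function name and the unit-weight default are assumptions made for the example; the transform multiplies each 2D SCM slice by exp(+i2πfτ) and sums over frequency, producing one 2D TDM slice per candidate delay τ:

```python
import numpy as np

def phase_aligned_transform(scm, freqs, taus, weights=None):
    """Map a 3D SCM (Jf x Q x Q) to a 3D TDM (Jtau x Q x Q)."""
    Jf = scm.shape[0]
    w = np.ones(Jf) if weights is None else weights
    # Phase-alignment kernel exp(+i 2 pi f tau) for every (tau, f) pair.
    kernel = np.exp(2j * np.pi * np.outer(taus, freqs))
    # Weighted sum over frequency: one 2D TDM slice per candidate delay tau.
    return np.einsum('tf,f,fij->tij', kernel, w, scm) / Jf
```

For a single source term whose SCM phase rotates as exp(-i2πfτ0), the slice at τ=τ0 accumulates coherently while slices at other delays remain small.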
In some embodiments, the method may further include analyzing each 2D TDM of time delay representations to extract information about the non-reflected sound wave and early reflected sound waves. In some embodiments, the method may include analyzing each 2D TDM of time delay representations to obtain or construct a plurality of sets of values representing a linear combination of a plurality of reflections having a same or substantially same delay. In some examples, the analyzing of the plurality of 2D TDMs of the 3D TDM may be performed through a method called Singular Value Decomposition (SVD), or any other low-rank approximation method.
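The SVD-based analysis of a single 2D TDM slice may be sketched as follows (the function name is made up; the example uses a synthetic rank-1 matrix rather than real sound data):

```python
import numpy as np

def dominant_component(tdm_slice):
    """Best rank-1 approximation (least-squares sense) of one 2D TDM slice."""
    u, s, vh = np.linalg.svd(tdm_slice)
    # First singular triplet: the strongest linear combination of steering
    # vectors sharing (substantially) the same delay at this slice.
    return s[0], u[:, 0], vh[0, :].conj()
```

The left and right singular vectors returned here are the quantities analyzed separately in the subsequent sparse-recovery step.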
In some embodiments, the method may further include analyzing the different linear combinations of sources of each set or vector to differentiate between the source and at least one early reflection for a given delay. This may, for example, be performed through sparse recovery techniques such as Orthogonal Matching Pursuit (OMP). The column entries of the dictionary matrix are steering vectors, which are known if the geometry of the array is known, or from a calibration step.
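An illustrative OMP implementation over a finite dictionary is sketched below. In the method the dictionary columns would be steering vectors y(Ω) on a grid of DOAs; the test below substitutes made-up orthonormal atoms so that recovery is exact:

```python
import numpy as np

def omp(dictionary, x, max_atoms, tol=1e-6):
    """Greedy sparse recovery: repeatedly pick the atom most correlated
    with the current residual, then re-project."""
    residual = x.astype(complex)
    support = []
    for _ in range(max_atoms):
        if np.linalg.norm(residual) <= tol * np.linalg.norm(x):
            break
        # Atom with the largest correlation magnitude to the residual.
        k = int(np.argmax(np.abs(dictionary.conj().T @ residual)))
        support.append(k)
        # Re-project x onto the span of all atoms selected so far.
        sub = dictionary[:, support]
        coef, *_ = np.linalg.lstsq(sub, x, rcond=None)
        residual = x - sub @ coef
    return support
```

The projection step removes only the contribution of atoms already selected, which matches the behavior desired for well-separated DOAs discussed further below.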
Differentiating between at least two reflections at a given delay relative to the direct source allows determining Direction of Arrival (DOA) estimations for each one of such reflections. DOA estimates may be determined, for example, by matching the obtained differentiated reflections and their associated delays with previously determined steering or reference vectors. A match is considered to be found if a certain matching threshold criterion is met, as is outlined herein below in more detail.
It is noted that the term “match”, as well as grammatical variations thereof also encompasses the meaning of the term “substantial match”.
In some embodiments, the method may further include performing clustering on the reflection data information for identifying outliers and clusters.
The notation used herein is presented briefly in this section. Lowercase boldface letters denote vectors, and uppercase boldface letters denote matrices. The k,l entry of a matrix A is denoted by [A]k,l. The complex conjugate, transpose, and conjugate transpose are denoted by (⋅)*, (⋅)T and (⋅)H, respectively. The Euclidean norm of a vector is denoted by ∥⋅∥. The outer product of two vectors a and b is the matrix abH. The imaginary unit is denoted by i.
S2 denotes the unit sphere in R3. The symbol Ω∈S2 represents a direction (DOA) in 3D-space, i.e., a pair of azimuth-elevation angles.
∠(Ω,Ω′)≙arccos(Ω·Ω′) is the angle between directions Ω and Ω′, where “·” is the dot product in R3. Ω0 represents the DOA of the non-reflected source.
It is noted that the delays and DOAs are determined with respect to the source signal.
Consider a sound field composed of M plane waves with amplitudes a1(f), . . . , aM(f) at frequency f, and directions Ω1, . . . , ΩM. The sound pressure p at any point in space x∈R3 can be formulated as follows:
where k=2πf/c is the wave-number, and c is the speed of sound. When the sound field is composed of a continuum of plane waves, the summation is replaced by an integral over the entire sphere, and the amplitudes are replaced by the plane wave amplitude density (PWAD) a(f,Ω):
For a fixed frequency, the PWAD is a function on the unit sphere. As such, it is possible to describe it by its spherical Fourier transform (SFT) coefficients (B. Rafaely, Fundamentals of Spherical Array Processing. Berlin, Germany: Springer, 2015, vol. 8. (Rafaely 2015)).
where Ynm is the order-n and degree-m spherical harmonic. The SFT of the PWAD can be used to represent the sound pressure as follows (Rafaely 2015):
where r=∥x∥, Ω=x/r, and jn is the n'th order spherical Bessel function of the first kind. Equation (3) may be approximated by truncating the infinite sum to order N=┌kr┐ (Rafaely 2015).
k and k′ are two indices in the summation: k is the index of the first source in a pair, and k′ is the index of the second source in the pair.
A microphone array can be used to estimate the coefficients of the SFT of the PWAD with order less than or equal to N, by inverting the (truncated) linear transformation (3), a process known as plane wave decomposition. The existence and stability of the inverse transform depend on the frequency f and physical properties of the (typically spherical) microphone array. Further details can be found in Rafaely 2015.
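The truncation rule N=┌kr┐ and the resulting (N+1)2 channel count may be computed as follows (the example frequency, array radius and the speed of sound c=343 m/s are hypothetical values chosen for illustration):

```python
import numpy as np

def truncation_order(freq_hz, radius_m, c=343.0):
    """Rule-of-thumb SFT truncation order N = ceil(k*r)."""
    k = 2.0 * np.pi * freq_hz / c      # wave-number
    return int(np.ceil(k * radius_m))

N = truncation_order(4000.0, 0.042)    # e.g. a 4.2 cm spherical array at 4 kHz
channels = (N + 1) ** 2                # length of the stacked anm vector
```

This illustrates how the usable order, and hence the number of spherical-harmonic channels, grows with frequency and array radius.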
The resulting signals are stacked in a vector of length (N+1)2 as follows:
Embodiments may herein be described in terms of the SFT of the PWAD. Processing and analysis in this domain may offer several advantages. First, the PWAD provides a description of the sound field that is independent of the microphone array. Second, the steering vectors, i.e., the response to a plane-wave from a given direction, are frequency independent. A steering or reference vector y(Ω) may for example be defined by the following expression:
In some examples, a constant
chosen for convenience such that ∥y(Ω)∥=1.
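A unit-norm steering vector of this kind may be sketched as below. The conjugated-spherical-harmonic convention and the (n, m) stacking order are assumptions for the example; SciPy's spherical-harmonic routine is resolved for both older and newer library versions:

```python
import numpy as np
try:                                    # SciPy >= 1.15
    from scipy.special import sph_harm_y
    def _Y(m, n, azimuth, colat):
        return sph_harm_y(n, m, colat, azimuth)
except ImportError:                     # older SciPy
    from scipy.special import sph_harm
    def _Y(m, n, azimuth, colat):
        return sph_harm(m, n, azimuth, colat)

def steering_vector(azimuth, elevation, N):
    """Unit-norm steering vector of length (N+1)^2 built from Y_n^m(Omega)*."""
    colat = np.pi / 2.0 - elevation     # the routines expect colatitude
    y = np.array([np.conj(_Y(m, n, azimuth, colat))
                  for n in range(N + 1) for m in range(-n, n + 1)])
    return y / np.linalg.norm(y)        # constant chosen so that ||y|| = 1
```

The explicit normalization at the end plays the role of the convenience constant mentioned above.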
This section presents an example model for describing and implementing embodiments. Considering for example a sound field in a room and which is comprised of a single source, with a frequency domain signal s(f), and a DOA Ω0, relative to a measurement point in the room. As the sound wave output by the speaker propagates in the room, it is reflected from the room boundaries (e.g., walls, ceiling, floor). The k'th reflection may for example be modeled as a separate source with DOA Ωk and signal sk(f), which is a delayed and scaled copy of the source signal (see also: J. B. Allen and D. A. Berkley, “Image method for efficiently simulating small-room acoustics,” J. Acoustical Soc. Amer., vol. 65, no. 4 pp. 943-950, 1979 (Allen et al. 1979)), and may be expressed for instance, by the following mathematical expression:
where τk is the delay relative to the direct sound, and αk is the scaling factor. τ0 and α0 are accordingly normalized to 0 and 1, respectively. In some examples it may be assumed that the delays are sorted such that τk−1≤τk.
Furthermore, anm(f) denotes the vector of the SFT coefficients of the PWAD, up to order N, as a function of frequency. Assuming the sources are in the far field, anm(f) can, for example, be described by the following model:
n(f) represents noise and late reverberation terms, and K is the number of early reflections. Note that although both the model and the proposed method can be generalized to include near-field sources, this generalization may require near-field steering vectors and some information or estimation of source distances, which could relate to source delays. Nevertheless, near-field steering vectors become useful mainly for sources very close to the array (E. Fisher and B. Rafaely, “Near-field spherical microphone array processing with radial filtering,” IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 2, pp. 256-265, February 2011 (Fisher 2011)), in which case room reflections may be considered negligible. Therefore, embodiments are herein described with the far-field assumption, which is valid for comparatively compact microphone arrays.
Any assumption described herein shall not be construed in a limiting manner. Hence, embodiments may also encompass models where these assumptions may not be applied. For example, embodiments may encompass models where far-field assumptions do not apply.
In the discussion that follows, R(f) denotes the spatial correlation matrix (SCM) at frequency f:
Substituting Eq. (6) into Eq. (9), and assuming n(f) and s(f) are uncorrelated, yields:
In some embodiments, the method may be considered to include applying a transform or transformation on the SCM. The transform may be considered to be based on a “phase alignment” transformation, e.g., as defined herein.
Equation (10) can be rewritten as:
When neglecting N, the matrix R is a mixture (also: linear combination) of the outer products of every pair of sources' steering vectors. The mixing coefficients are the entries of M, and therefore it is henceforth referred to as the mixing matrix. Note that the mixing coefficients are frequency dependent, but the steering vectors are not. The k,k′ entry of matrix M(f) is the correlation value between source k and source k′.
Suppose that the k,k′ entry of M(f) has a dominant magnitude relative to all other entries. This leads to the following, for some c∈C:
Estimating Ωk and Ωk′ in a case where a dominant entry is likely is easier than in the general case. However, according to (12), there may not be a dominant entry in M. Not only is the matrix M Hermitian, but the magnitudes of its entries are also products of the amplitudes of pairs of sources. Assuming the amplitude of the direct sound is dominant, only the 0,0'th entry, which corresponds to the direct sound, may indeed be dominant. Clearly, this is not useful for localizing the early reflections. The method presented herein enhances specific entries in M, allowing the localization of “non-dominant” entries or reflections.
The following matrix is defined, herein referred to as “Phase-Aligned SCM”:
where τ≥0, Δf is the frequency resolution, Jf is an integer parameter representing the overall number of frequency points, and wj are non-negative weights. Note that when τ=0 and wj=1 for all j, then R is identical to the SCM that may be obtained by frequency smoothing. The matrices
Similarly to Eq. (13), Eq. (16) presents
First, an explicit expression for the absolute value of the entries of
The second equality in Eq. (17) is due to the definition of M(f) in Eq. (12), while the third equality is due to Eq. (5). Considering the triangle-inequality:
which is true since σs2 and wj are non-negative.
Along with Equation (16), this result implies that among all possible delays τ, it is the delay between two sources that maximizes the contribution of the outer product of their steering vectors to
In this subsection the source signal is assumed to be white, such that σs2(f) is constant in f. In examples in which the weights wj are all set to 1, Equation (17) can thus be further simplified:
Dn(x) often arises in Fourier analysis, and is related to the Dirichlet kernel. It has a sinc-like behavior, with a main lobe centered around x=0, and a null-to-null width of 2/n. Correspondingly, |[M(τ,f)]k,k′| has a main lobe centered at τ=τk−τk′, and a width of 2(JfΔf)−1. Therefore, Jf determines the temporal resolution, which affects the ability to separate reflections with different delays. This result can be used as a guideline for choosing Jf. Note that the same analysis is valid for non-white signals, if the weights satisfy
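The resolution argument above may be verified numerically; the bin count and spacing below are made-up example values giving JfΔf=2000 Hz, so the first null of the aligned response falls 0.5 ms away from the true delay:

```python
import numpy as np

def aligned_magnitude(tau, tau_k, Jf, delta_f):
    """Dirichlet-kernel-like response of the aligned sum to a delay mismatch."""
    f = delta_f * np.arange(Jf)
    return np.abs(np.sum(np.exp(2j * np.pi * f * (tau - tau_k)))) / Jf

Jf, delta_f = 40, 50.0                                # Jf * delta_f = 2000 Hz
peak = aligned_magnitude(1e-3, 1e-3, Jf, delta_f)     # on the true delay
null = aligned_magnitude(1.5e-3, 1e-3, Jf, delta_f)   # (Jf*delta_f)^-1 = 0.5 ms away
```

The main lobe thus has null-to-null width 2(JfΔf)−1, matching the guideline for choosing Jf.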
With reference to
(see Eqs. (5) and (12)). Its real and imaginary parts are added to the k and k′ axes, respectively, for the purpose of illustration of the complex function. This illustration demonstrates that as the delay between two sources is decreased, the period of the corresponding entry, as a function of f, increases. At the extreme, the delay between a source and itself is zero, and so the diagonal entries are constant in frequency.
Since in that case the off-diagonal entries can be interpreted as correlations between different sources, this demonstrates that optional frequency smoothing (τ=0) performs decorrelation of the sources. If the reflections had the same delay,
The analysis presented in the previous subsection shows that if the weights {wj}j are inversely proportional to σs2(fj), the phase alignment transform can effectively separate reflections with different delays. Assuming wj=c/σs2(fj) (where c is some positive constant) is equivalent to assuming wj=1 and σs2(fj)=c for all j, since only their product matters.
As σs2(f) is usually unknown, it must be estimated from the data. A very coarse, yet simple, estimate is given by the trace of R(f). By neglecting N in Eq. (13) and substituting Eqs. (12), (5) and (18), we get:
where Re{⋅} returns the real part of a complex scalar. We argue that bk is typically small in comparison to 1, since usually the amplitudes decay rapidly. Furthermore, when two reflections have similar amplitudes, it is usually the case that they have very different DOAs, which implies that the inner product of their steering vectors is small (Rafaely 2015).
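The coarse trace-based weighting may be sketched as follows (the function name and the normalization to unit sum are choices made for the example; only the relative scale of the weights matters):

```python
import numpy as np

def alignment_weights(scm):
    """Weights inversely proportional to trace(R(f_j)),
    a coarse proxy for the source power sigma_s^2(f_j)."""
    traces = np.real(np.trace(scm, axis1=1, axis2=2))
    w = 1.0 / np.maximum(traces, 1e-12)     # guard against nearly-empty bins
    return w / w.sum()                      # overall scale is immaterial
```

Frequency bins with stronger source power thus receive proportionally smaller weights, flattening the effective spectrum before alignment.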
In some examples, the weights could have another role. Eq. (20) suggests that even if the weights are inversely proportional to σs2, reflections with delays other than τ may still be dominant in
In the following, the dependence on the frequency f is omitted for brevity. It is important to note that there is no direct access to
The optimal rank 1 approximation (in the least squares sense) of (τ) is denoted by
where στ denotes the first (largest) singular value of
The method may be based on the analysis of the first singular vectors of the phase aligned SCM R(τ, f) that are presented herein.
In some embodiments, the method may first include performing plane wave decomposition on captured microphone signals (block 2110).
In some examples, analysis of SCM R(τ, f) may require that the plane wave decomposition signals to be in the frequency domain.
In some other examples, the plane wave decomposition signals may be approximated using, for example, the short-time Fourier transform (STFT) (block 2120). The STFT may enable localized analysis in both time and frequency. In some examples, it may be assumed that the window length of the STFT is sufficiently larger than τK, such that the multiplicative transfer function (MTF) approximation in the STFT is applicable for Eq. (5) (Y. Avargel and I. Cohen, “On multiplicative transfer function approximation in the short-time Fourier transform domain,” IEEE Signal Process. Lett., vol. 14, no. 5, pp. 337-340, May 2007 (Avargel 2007)).
In the following, τ is the parameter of the phase alignment transform as in Eq. (15), and should not be confused with the time index of the STFT.
In some embodiments, the method may include, for each time-frequency bin, estimating R (also: “3D SCM”) by replacing the expectation in Eq. (9) with averaging across Jt adjacent bins in time (block 2130). SCM calculation may, in some embodiments, involve time smoothing. Blocks 2110, 2120 and 2130 may herein be referred to as a “pre-processing” block 2100.
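The expectation-by-averaging step for one frequency bin may be sketched as below (the function name and frame layout are assumptions for the example):

```python
import numpy as np

def estimate_scm(frames):
    """Estimate the SCM of one frequency bin by averaging outer products
    over Jt adjacent STFT time frames.  frames: (Jt, Q) complex vectors."""
    return np.einsum('tq,tp->qp', frames, frames.conj()) / frames.shape[0]
```

The returned (Q x Q) matrix is Hermitian by construction, one slice of the 3D SCM per time-frequency bin.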
Frequency-to-time transform
In some embodiments,
Examples for selecting these parameters is discussed further below in more detail, as well as a method to efficiently calculate
where wj is the j'th sample of a Kaiser window of length Jf. In some examples, the β parameter may be set to, for example, 3 (Oppenheim 1999). In some other example implementations, the β parameter may be set to values ranging from 2 to 7.
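Such a tapered weight sequence may be generated directly with numpy (the window length below is a made-up example value; β=3 follows the suggestion above):

```python
import numpy as np

Jf = 64                          # number of frequency points (made up)
w = np.kaiser(Jf, 3.0)           # non-negative, symmetric Kaiser weights
```

The taper de-emphasizes the band edges of the frequency summation, reducing sidelobes of the delay response at a modest cost in main-lobe width.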
The method may further include detecting delay values τ that are equal to (or substantially equal to, or match with) reflection delays (block 2230).
The detection of such values may for example be accomplished by thresholding the following signal:
where vτ is a first right singular vector of
The direction that attains the maximum in Eq. (25) is denoted by {circumflex over (Ω)}′(τ):
When τ is equal to a reflection's delay, {circumflex over (Ω)}′(τ) is an estimate of Ω0 (the DOA of the direct sound). Note that when τ is equal to a delay between two reflections (and not a delay between a reflection and the direct sound), ρ(τ) may be high as well, leading to false detections. However, by employing cluster analysis for example, such detections may be distinguishable from valid ones, as {circumflex over (Ω)}′(τ) will be different from Ω0.
3) DOA Estimation: The method may further include estimating the DOAs of the reflections, for example, by employing a sparse recovery technique (block 2240). Estimating the DOAs may be performed separately for every τ selected in the previous step. Let uτ denote a first left singular vector of
where ∈u∈(0,1) may be a predefined threshold, set experimentally, for example, to 0.4. In some examples, the parameter value may range from 0.2 to 0.8. The parameter ∈u represents the error between linear combinations of the steering vectors.
In the context of sparse recovery, the set of vectors {y(Ω):Ω∈S2} is known as the dictionary, and its elements are known as atoms. The optimization problem is computationally intractable and cannot be exactly solved in practice.
In some embodiments, the method may include applying the orthogonal matching pursuit (OMP) algorithm (T. T. Cai and L. Wang, “Orthogonal matching pursuit for sparse signal recovery with noise,” IEEE Trans. Inf. Theory, vol. 57, no. 7, pp. 4680-4688, July 2011 (Cal 2011)) or any other algorithm that may be extendable to infinite dictionaries.
In some examples, it may be assumed that early reflections with similar delays have very different DOAs, as they originate from different walls. If the angle between the DOAs is larger than π/N, the corresponding steering vectors are approximately orthogonal (Rafaely 2015). The sparse recovery technique may be selected such that the projection step only removes the contribution of steering vectors of DOAs that have already been found, without affecting the rest.
In some embodiments, the sparse recovery (e.g., OMP) algorithm is applied on uτ for every detected τ. In some example implementations, values of τ for which the resulting s is larger than smax are discarded, s being the size of the smallest set of directions meeting the criterion of Equation 26. In some examples, the value of smax may be (e.g., empirically) set to 3. In some other example implementations, the smallest set of directions may contain 2, 3, 4, 5, 6, 7, or 8 directions.
As a result of finding the smallest set of reflections meeting the requirements defined in Equation 26, a plurality of tuples can be defined, where each tuple pertains to a reflection delay estimation relative to the direct sound, a corresponding DOA estimation of the same reflection, and the DOA of the direct sound. Hence, in some embodiments, each estimate is a triplet of the form ({circumflex over (τ)}, {circumflex over (Ω)}, {circumflex over (Ω)}′) corresponding to the delay of a reflection, its DOA, and the DOA of the direct sound.
In some embodiments, a triplet ({circumflex over (τ)}, {circumflex over (Ω)}, {circumflex over (Ω)}′) is obtained in a process of extracting direction and reflection pairs (block 2310). Clustering (block 2320) may be performed on the triplet, for example, to identify and remove outliers, for obtaining global estimates for the DOAs and delays of the early reflections (output 2400).
In some embodiments, the detection and removal of outliers may be performed by discarding reflection estimates where the angle between {circumflex over (Ω)}′ and Ω0 is larger than a predefined outlier threshold angle. In some examples, the outlier threshold angle may be set to 10 degrees. In some other examples, the outlier threshold angle may range from 5 to 30 degrees.
In general, the DOA of the direct sound Ω0 may not be known in advance. However, Ω0 may be derivable by selecting the peak in the histogram of {circumflex over (Ω)}′. By discarding reflection estimates from ({circumflex over (τ)}, {circumflex over (Ω)}, {circumflex over (Ω)}′) an updated or remaining set of tuples ({circumflex over (τ)}, {circumflex over (Ω)}, {circumflex over (Ω)}′) is obtained.
In some embodiments, cluster analysis may be performed on the remaining set of tuples ({circumflex over (τ)}, {circumflex over (Ω)}, {circumflex over (Ω)}′), for example, by employing the DBSCAN algorithm (M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” K dd, vol. 96, no. 34, pp. 226-231, 1996 (Ester 1996)) or any other algorithm that, for example, may not require an initial estimate of the number of clusters. Hence, in some example implementations, a clustering algorithm may be selected where the number of reflections is automatically estimated by the algorithm. Furthermore, in some examples, the clustering algorithm may be selected such that assigning an outlier to a specific cluster may not be required. Hence, in some examples, the DBSCAN cluster algorithm may be selected (Ester 1996).
However, in some embodiments, other clustering algorithms may be selected which may, for example, provide improved performance compared to DBSCAN with respect to other performance parameters. For example, a clustering algorithm may be selected from a computational viewpoint, as the complexity of DBSCAN in the present implementation is quadratic in the number of data points.
In the DBSCAN algorithm, two points are defined as neighbors if the distance between them is less than a distance threshold ε. A core point of a cluster may be defined as a point having a number of neighbors that is equal to or greater than MINPTS. A noise point is a non-core point for which none of its neighbors are core points. The algorithm iterates over all the points in the dataset, and assigns two points to the same cluster if one of them is identified as a core point. Noise points are not assigned to any cluster.
In some examples, the following metrics may be used:
where γΩ and γτ are normalization constants, set for example empirically to, e.g., 15 degrees and 500 microseconds, respectively. As the metric is normalized, the parameter ε is simply set to 1. MINPTS may, for example, be empirically set to, e.g., 10 percent of the number of neighbors of the point that has the largest number of neighbors.
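One plausible form of such a normalized metric is sketched below; the exact combination used in the method is not reproduced here, and the Euclidean combination of the normalized angle and delay terms is an assumption made for the example (DOAs are represented as unit vectors):

```python
import numpy as np

def reflection_distance(doa1, tau1, doa2, tau2,
                        gamma_ang=np.deg2rad(15.0), gamma_tau=500e-6):
    """Hypothetical normalized distance between (DOA, delay) estimates.
    doa1, doa2 are unit vectors; tau1, tau2 are delays in seconds."""
    ang = np.arccos(np.clip(np.dot(doa1, doa2), -1.0, 1.0))  # angle between DOAs
    return np.hypot(ang / gamma_ang, (tau1 - tau2) / gamma_tau)
```

With both terms normalized by γΩ and γτ, a unit distance corresponds roughly to a 15-degree DOA gap or a 500-microsecond delay gap, consistent with setting ε=1.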
In some embodiments, global delay and DOA estimates are calculated for each cluster, for example, by determining, for each cluster, an average or median of the local delay and DOA estimates.
The local delay estimates may be confined to a grid of resolution Δτ. However, the global delay estimates, being averages of local delay estimates, are not. The fact that each DOA estimate has an associated delay, enables the separation of clusters from one another based on the delay, even if they have similar DOAs.
The information captured in
As mentioned herein, the method comprises performing a frequency-to-time transformation on the correlation matrix to obtain
is equal (up to scaling and appropriate zero-padding) to the first Jτ terms of the inverse discrete Fourier transform (taken entry-wise) of the sequence
so
Thus, it may in some embodiments be sufficient to perform the FFT on only the upper triangular entries of R.
It is apparent from Equation (15) that
When the STFT window size is T, the frequency resolution of the STFT Δf satisfies Δf≤1/T. Therefore, reflections with delay τ larger than Δf−1 necessarily do not satisfy the MTF approximation criteria, since τ>T. This analysis also shows that to avoid unnecessary calculations, Jτ may in some embodiments be chosen such that JτΔτ<Δf−1/2.
The calculation of the phase-aligned SCM may require the selection of three parameters: Jf, Δτ and Jτ. The number of frequency bins Jf may be chosen such that the temporal resolution (given by (ΔfJf)−1) is sufficient. For example, if JfΔf=2000 Hz, then the phase alignment transform can separate two reflections if their delays are spaced by more than 0.5 ms. However, Jf may not be set arbitrarily high. First, the frequency independence of the steering vectors is in practice limited to a given band, depending on the geometry of the microphone array. Second, some model assumptions may only be valid for bands of limited width. For example, the linear phase assumption in Equation (5) may, in practice, only hold within a local frequency region.
Once Jf has been determined, Δτ, the delay estimation resolution, may for example be set as follows:
where M is an integer that satisfies M≥Jf. This choice guarantees that Δτ≤(JfΔf)−1, and also that (Δτ·Δf)−1∈N, so the FFT can be used to calculate
In some embodiments, Jτ, the number of grid points over τ, may be chosen such that (Jτ−1)Δτ, the maximal detectable delay, is sufficiently small relative to T, the window length of the STFT, such that the MTF criterion for Equation (5) holds. In some examples, (Jτ−1)Δτ≈T/10 may be considered sufficient.
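The parameter guidelines above may be followed as in the sketch below; all concrete numbers (the STFT resolution, window length, and the choice M=2Jf) are made-up example values:

```python
# Example parameter selection per the guidelines above (values made up).
delta_f = 62.5                      # STFT frequency resolution [Hz]
Jf = 32                             # temporal resolution (Jf*delta_f)^-1 = 0.5 ms
M = 2 * Jf                          # any integer satisfying M >= Jf
delta_tau = 1.0 / (M * delta_f)     # guarantees delta_tau <= (Jf*delta_f)^-1
T = 0.256                           # STFT window length [s]
J_tau = int((T / 10.0) / delta_tau) + 1   # maximal detectable delay ~ T/10
```

With these values (Δτ·Δf)−1=M is an integer, so the FFT shortcut applies, and the maximal detectable delay stays near T/10 as suggested.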
Both Equation 25 and the OMP algorithm may require maximizing functions of the form f(Ω)=|y(Ω)Hx| over the sphere. This may be considered equivalent to finding the maximum of a signal on the sphere whose SFT is given by x.
In some embodiments, Newton's method may be employed for performing this maximization, with initialization obtained by sampling the sphere, for example, with a nearly uniform grid of 900 directions (J. Fliege and U. Maier, “A two-stage approach for computing cubature formulae for the sphere,” Mathematik 139T, Fachbereich Mathematik, Universitat Dortmund, 1996 (Fliege 1996)). Clearly, any other suitable number of grid directions may be chosen.
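A simple sketch of this maximization is given below. For simplicity, a coarse latitude-longitude grid and a Nelder-Mead refinement stand in for the nearly uniform 900-direction grid and Newton's method (illustrative substitutions); the spherical harmonics are built from associated Legendre functions:

```python
import numpy as np
from math import factorial
from scipy.special import lpmv
from scipy.optimize import minimize

N = 4  # spherical harmonics order; (N + 1)**2 = 25 coefficients

def Ynm(n, m, theta, phi):
    # Complex spherical harmonic at azimuth theta, colatitude phi;
    # scipy's lpmv includes the Condon-Shortley phase.
    if m < 0:
        return (-1) ** (-m) * np.conj(Ynm(n, -m, theta, phi))
    norm = np.sqrt((2 * n + 1) / (4 * np.pi)
                   * factorial(n - m) / factorial(n + m))
    return norm * lpmv(m, n, np.cos(phi)) * np.exp(1j * m * theta)

def steering(theta, phi):
    # y(Omega): spherical harmonics up to order N, stacked into a vector.
    return np.array([Ynm(n, m, theta, phi)
                     for n in range(N + 1) for m in range(-n, n + 1)])

def maximize_on_sphere(x):
    # Coarse grid search followed by local refinement of f(Omega) = |y(Omega)^H x|.
    f = lambda ang: -np.abs(np.vdot(steering(ang[0], ang[1]), x))
    grid = [(t, p) for t in np.linspace(0, 2 * np.pi, 45, endpoint=False)
                   for p in np.linspace(0.05, np.pi - 0.05, 20)]
    best = min(grid, key=f)
    res = minimize(f, best, method="Nelder-Mead")
    return res.x, -res.fun
```

When x is itself a steering vector, the maximum is attained at its DOA, with value given by the spherical harmonics addition theorem.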
In some embodiments, Part 1 of PHALCOR may be performed independently for every selected time-frequency region, and therefore the total computation time grows linearly with the duration of the input signal. As the phase alignment transform can be calculated comparatively efficiently using the FFT, the main bottlenecks here are the SVD, delay detection and DOA estimation.
While calculating the SVD of an SCM is common in many localization methods, it is usually calculated once for every selected time-frequency region. In PHALCOR, however, it is calculated Jτ (the size of the τ-grid) times for every selected time-frequency region. As Jτ controls the maximal detectable delay, there may be a trade-off between the maximal detectable delay and the computational complexity. By decreasing Jτ and increasing Δτ (the τ-grid spacing), one can decrease run time without changing the maximal detectable delay. However, increasing Δτ comes at the cost of poorer delay resolution.
The calculation of ρ and {circumflex over (Ω)}′ (Equations (25) and (26)) may be comparatively computationally expensive as it requires a global maximum search over the sphere. For example, similar to the SVD decomposition (block 2220), in PHALCOR this calculation may be required for every τ on the grid. Similarly, since the OMP algorithm is applied for every detected delay, the computational complexity of the DOA estimation step also increases with the number of delays. Accordingly, separating reflections of different delays may require separate processing for each delay.
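The per-τ SVD step discussed above may be sketched as follows (illustrative; one SVD is computed for each point on the τ-grid, which is the cost driver of the trade-off described above; the function name is an assumption):

```python
import numpy as np

def leading_svd_per_tau(R_hat):
    # R_hat: (J_tau, Q, Q) phase-aligned SCMs, one per tau-grid point.
    # Returns the first left singular vector u[t] and the largest singular
    # value s[t] for each tau; one full SVD per grid point.
    J_tau, Q, _ = R_hat.shape
    u = np.empty((J_tau, Q), dtype=complex)
    s = np.empty(J_tau)
    for t in range(J_tau):
        U, S, _ = np.linalg.svd(R_hat[t])
        u[t], s[t] = U[:, 0], S[0]
    return u, s
```

The loop over the τ-grid makes the total SVD cost proportional to Jτ, as noted above.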
When τ=0 and wj=1,
Frequency smoothing is a common procedure in source localization in the presence of reverberation, as it can decorrelate signals, which is necessary for subspace methods such as MUSIC. Furthermore,
L1-SVD is a source localization method that can address correlated sources (D. Malioutov, M. Cetin, and A. S. Willsky, “A sparse signal reconstruction perspective for source localization with sensor arrays,” IEEE Trans. Signal Process., vol. 53, no. 8, pp. 3010-3022, August 2005 (Malioutov 2005)).
L1-SVD is based on the observation that the first eigenvectors of the SCM are linear combinations of steering vectors. The DOAs are estimated by decomposing the eigenvectors of the SCM into a sparse combination of steering vectors. This is similar to the method disclosed herein, which decomposes a first left singular vector of the phase-aligned SCM into a sparse linear combination of steering vectors. In general, the performance of sparse recovery methods improves as the vectors become sparser. While in L1-SVD the sparsity is determined by the total number of reflections, in the PHALCOR employed in some embodiments, the sparsity is determined by the number of sources at a specific delay, which is significantly lower. Furthermore, like MUSIC, in L1-SVD the number of detectable sources is limited by the number of input channels ((N+1)2 in our case). In PHALCOR, on the other hand, it is possible to detect more reflections than input channels, as each delay is processed independently.
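A generic orthogonal matching pursuit sketch for such a sparse decomposition is given below (illustrative; the dictionary D of steering vectors sampled on a DOA grid, and the function name, are assumptions):

```python
import numpy as np

def omp(D, x, k):
    # Orthogonal matching pursuit: greedily select up to k dictionary columns
    # (e.g., steering vectors on a DOA grid) whose sparse combination best
    # explains x, refitting the coefficients by least squares at each step.
    residual, support, coef = x.astype(complex), [], np.array([])
    for _ in range(k):
        j = int(np.argmax(np.abs(D.conj().T @ residual)))  # best-matching atom
        if j in support:
            break
        support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    return support, coef
```

In the context above, x would be a first left singular vector of the phase-aligned SCM, and k the (small) number of reflections at the delay under consideration.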
The relation of PHALCOR to MUSIC and L1-SVD concerns only DOA estimation. However, PHALCOR is also related to delay estimation methods that are based on generalized cross correlation analysis. (J. Hassab and R. Boucher, “Optimum estimation of time delay by a generalized correlator,” IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-27, no. 4, pp. 373-380, August 1979 (Hassab 1979))
It can be shown that the entries of {circumflex over (R)}(τ) contain a generalized cross correlation between each pair of input channels, at lag τ. Although similar, there are some important distinctions between the two methods. While cross correlation analysis is typically used to estimate the delay between two signals that are observed directly, in embodiments, PHALCOR aims to estimate the delay between multiple signals that are observed indirectly, where each input channel is a linear combination of the delayed signals, given by the unknown steering matrix, which is estimated as well.
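For reference, a basic generalized cross correlation delay estimator between two directly observed channels may be sketched as follows (illustrative; PHAT weighting is used here as one common choice of the weighting function, and is not implied to be the weighting of the embodiments):

```python
import numpy as np

def gcc_delay(x1, x2, fs):
    # Generalized cross correlation with PHAT weighting; the peak lag
    # estimates the delay of x2 relative to x1 (positive if x2 lags x1).
    n = len(x1)
    cs = np.conj(np.fft.rfft(x1)) * np.fft.rfft(x2)  # cross spectrum
    cs /= np.maximum(np.abs(cs), 1e-12)              # PHAT: keep phase only
    cc = np.fft.irfft(cs, n=n)                       # generalized cross correlation
    lag = int(np.argmax(cc))
    if lag > n // 2:                                 # map wrap-around to negative lags
        lag -= n
    return lag / fs
```

As stated above, the entries of {circumflex over (R)}(τ) play an analogous role, but across all channel pairs simultaneously and for indirectly observed signals.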
The following section provides further example implementations in conjunction with performed simulations. It is noted that the simulation configurations described herein may be analogously implemented in a real-world scenario.
An example simulation study is herein presented, demonstrating the performance of the PHALCOR method, according to some embodiments. First, a detailed analysis of the different steps of the algorithm is presented on a specific test case. Next, a Monte Carlo analysis is presented, demonstrating the robustness of PHALCOR.
The setup of the simulations, common to both the case study and the Monte Carlo study, is as follows. An acoustic scene that consists of a speaker and a rigid spherical microphone array in a shoe box room, was simulated using the image method (Allen et al. 1979). The speech signal is a 5-second sample, drawn randomly from the TSP Speech Database (P. Kabal, “TSP speech database,” McGill University, Database Version, vol. 1, no. 0, pp. 09-02, 2002 (Kabal 2002)).
The array has 32 microphones and a radius of 4.2 cm (similar to the Eigenmike (M. Acoustics, “EM32 Eigenmike Microphone Array Release Notes (v17.0),” 25 Summit Ave, Summit, NJ 07901, USA, 2013)), facilitating plane wave decomposition with spherical harmonics order N=4. The microphone signals were sampled at 48 kHz. Sensor noise was added, such that the direct sound to noise ratio is 30 dB. The positions of the speaker and the array were chosen at random inside the room, with the restriction that their distance from each other and from the room boundaries is no less than 1 m. Three different room sizes are considered. Their dimensions and several acoustic parameters are presented in Table 3.
The signals recorded by the microphones were used to compute anm(J). An STFT was applied to the PWAD coefficient signals using a Hanning window of 8192 samples, and an overlap of 75%. A frequency range of [500,5000] Hz was chosen for the analysis. The number of time bins used for averaging, Jt, was set to 6, while the number of frequency bins used for the phase alignment transform, Jf, was set such that JfΔf=2000 Hz. The delay resolution Δτ was set to 83.33 microseconds (equivalent to setting M=2048 in Eq. (32)), while Jτ was chosen such that the maximal delay is 20 ms.
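The analysis front-end described above may be sketched as follows (illustrative; the placeholder a_nm stands in for the actual PWAD coefficient signals, and one second of random data is used only to make the sketch self-contained):

```python
import numpy as np
from scipy.signal import stft

fs = 48000
win, overlap = 8192, 0.75
hop = int(win * (1 - overlap))                 # hop of 2048 samples

# Placeholder for the PWAD coefficient signals: (N+1)^2 = 25 channels.
a_nm = np.random.default_rng(0).standard_normal((25, fs))

f, t, A = stft(a_nm, fs=fs, window="hann", nperseg=win, noverlap=win - hop)
band = (f >= 500) & (f <= 5000)                # analysis band of [500, 5000] Hz
A = A[:, band, :]                              # (coeffs, freq bins, time frames)
```

With an 8192-sample window at 48 kHz, the frequency resolution is Δf = 48000/8192 ≈ 5.86 Hz, consistent with the parameter values above.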
With these parameters, PHALCOR was applied to the simulated data. The values of the different hyper-parameters of PHALCOR, including ρmin, εμ, Smax, γΩ and γτ, were set as exemplified herein above.
The MUSIC algorithm (Khaykin 2009) was applied as a reference method for DOA estimation, by selecting the peaks in the MUSIC spectrum ∥y(Ω)HU∥, where U is a matrix whose columns are orthonormal eigenvectors that correspond to the L largest eigenvalues of the time and frequency smoothed SCM. The time and frequency smoothing parameters are the same as in PHALCOR. The dimension of the signal subspace L was determined using the SORTE method. (K. Han and A. Nehorai, “Improved source number detection and direction estimation with nested arrays and ulas using jackknifing,” IEEE Trans. Signal Process., vol. 61, no. 23, pp. 6118-6128, December 2013).
To reduce false positives, peaks for which the MUSIC spectrum magnitude is lower than 0.75 were discarded. The local estimates were clustered using DBSCAN, to obtain global estimates. The delays of the detected reflections were estimated using the following method. First, each reflection signal was estimated by solving Eq. (6) for s in the least squares sense. Then, the delay of the k'th reflection was estimated by selecting the maximum of the cross correlation values between sk and s0. Note that in contrast to PHALCOR, the delays are estimated after the clustering.
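The clustering of local DOA estimates into global estimates may, for example, be sketched with DBSCAN as follows (illustrative; the 10-degree eps value and the chordal-distance conversion are assumptions, not values from the study):

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_doas(doas_xyz, eps_deg=10.0, min_samples=5):
    # Cluster local DOA estimates (unit vectors, one per row) into global
    # estimates. The chordal distance between unit vectors relates to the
    # angle between them via ||u - v|| = 2 sin(angle / 2).
    eps = 2 * np.sin(np.radians(eps_deg) / 2)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(doas_xyz)
    centers = [doas_xyz[labels == c].mean(axis=0)
               for c in range(labels.max() + 1)]
    return labels, [c / np.linalg.norm(c) for c in centers]
```

Points labeled −1 are outliers; each cluster mean, renormalized to the unit sphere, serves as a global DOA estimate.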
For both PHALCOR and the reference method, a detected reflection was considered a true positive if its delay and DOA simultaneously matched the delay and DOA of a true reflection. The matching tolerance was chosen to be 500 μs for the delay, and 15 degrees for the DOA. The probability of detection (PD) at a given maximal delay t is defined as the fraction of true positive detections of reflections with a delay smaller than or equal to t, out of the total number of reflections with a delay smaller than or equal to t:
where is the set of all ground truth reflections, and is the set of all estimated reflections. The false positive rate (FPR) at a given maximal delay t is defined as the fraction of false positive detections with a delay smaller than or equal to t, out of the total number of detections with a delay smaller than or equal to t:
Here, |⋅| denotes the cardinality of the set.
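The PD and FPR metrics defined above may be computed as in the following sketch (illustrative; the tolerances are those stated above, and reflections are represented as (delay, unit DOA vector) pairs):

```python
import numpy as np

def angle_deg(u, v):
    # Angle in degrees between two unit vectors.
    return np.degrees(np.arccos(np.clip(np.dot(u, v), -1.0, 1.0)))

def pd_fpr(truths, estimates, t, tol_delay=500e-6, tol_doa=15.0):
    # truths / estimates: lists of (delay_in_seconds, unit_doa_vector).
    # A detection is a true positive if both its delay and its DOA match
    # some ground-truth reflection within the given tolerances.
    def matched(est):
        return any(abs(est[0] - tr[0]) <= tol_delay
                   and angle_deg(est[1], tr[1]) <= tol_doa for tr in truths)
    est_t = [e for e in estimates if e[0] <= t]
    tru_t = [tr for tr in truths if tr[0] <= t]
    tp = sum(matched(e) for e in est_t)
    pd = tp / len(tru_t) if tru_t else float("nan")
    fpr = (len(est_t) - tp) / len(est_t) if est_t else float("nan")
    return pd, fpr
```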
The test case presented in this section is of a female speaker in the medium sized room. There are K=31 reflections with a delay less than 20 ms in this case.
These correspond to delays between two reflections (as opposed to delays between a reflection and the direct sound). For example, the peak near τ=2 ms, corresponds to the delay between the second and third reflections, whose delays are about 3 ms and 5 ms, respectively. Such cases may be identified by testing ∠({circumflex over (Ω)}′(τ), Ω0), the angle between the DOA of the steering vector that is most similar to vτ, and the DOA of the direct sound. As shown in
In the top row, the function |y(Ω)Hvτ| is shown, where each column corresponds to a different value of τ. When τ equals a reflection's delay, the direction that maximizes the response is expected to be that of the direct sound. Indeed, as τ varies across columns, the location of the peak remains fixed, and is equal to Ω0, the DOA of the direct sound.
In the middle row, the function |y(Ω)Huτ| is shown. It is similar to the top row, except that a first left singular vector is used instead of a right one. When τ is a reflection's delay, uτ is approximately equal to a linear combination of the steering vectors of reflections with delays of approximately τ. When the DOAs are sufficiently separated, they can be identified as peaks in |y(Ω)Huτ|. For τ1 and τ2, only one such peak is apparent, and its location matches the DOA of the corresponding reflection. When τ=τ4, it is apparent that there are two dominant peaks, at directions Ω4 and Ω5. This is due to the fact that the 4th and 5th reflections have similar delays. Similarly, since the 8th and 9th reflections have similar delays, when τ=τ8 the two peaks correspond to Ω8 and Ω9.
In
It is apparent that, compared to the reference, in the example discussed, PHALCOR is able to detect significantly more reflections. PHALCOR successfully detected 29 reflections, while the MUSIC based method could only detect 8 (not including the direct sound). Furthermore,
Finally, it is noted that while the performance of the proposed method is superior, the difference in computation time is quite significant: 303 seconds for the proposed method, versus only 19 seconds for the reference method (as obtained using MATLAB 2020a, on a MacBook Pro 2019 with a 2.3 GHz 8-Core Intel Core i9 processor and 16 GB RAM).
The simulation described above is repeated 50 times for each of the 3 rooms, varying the speakers, their location, and the microphone array location.
Compared with the reference method, the performance of PHALCOR is significantly better, both in terms of probability of detection and false positive rates, by a factor ranging from 3 to 20. As the delay of a reflection increases, the probability of detection decreases. This is because later reflections usually have lower amplitudes. Furthermore, the reflection density is higher as the delay increases, making it more difficult to separate the reflections spatially.
The root mean square (RMS) for DOA and delay estimation errors for each method are computed and averaged for all the estimates in this Monte Carlo simulation, and are presented in Table 4. The RMS is calculated excluding the direct sound. Table 4 shows that the performance in terms of DOA estimation error is comparable between the two methods. In terms of delay estimation error, the reference method is superior, but note that the errors are calculated only on true positive detections, which are considerably more frequent in PHALCOR, as is evident from
Reference is now made to
The term “processor”, as used herein, may additionally or alternatively refer to a controller. Processor 1200 may be implemented by various types of processor devices and/or processor architectures including, for example, embedded processors, communication processors, graphics processing unit (GPU)-accelerated computing, soft-core processors, quantum processors, and/or general purpose processors.
Memory 1300 may be implemented by various types of memories, including transactional memory and/or long-term storage memory facilities and may function as file storage, document storage, program storage, or as a working memory. The latter may for example be in the form of a static random access memory (SRAM), dynamic random access memory (DRAM), read-only memory (ROM), cache and/or flash memory. As working memory, memory 1300 may, for example, include, e.g., temporally-based and/or non-temporally based instructions. As long-term memory, memory 1300 may for example include a volatile or non-volatile computer storage medium, a hard disk drive, a solid state drive, a magnetic storage medium, a flash memory and/or other storage facility. A hardware memory facility may for example store a fixed information set (e.g., software code) including, but not limited to, a file, program, application, source code, object code, data, and/or the like.
System 1000 may further include an input/output device 1500 which may be configured to provide or receive any type of data or information. Input/output device 1500 may include, for example, visual presentation devices or systems such as, for example, computer screen(s), head mounted display (HMD) device(s), first person view (FPV) display device(s), device interfaces (e.g., a Universal Serial Bus interface), and/or audio output device(s) such as, for example, speaker(s) and/or earphones. Input/output device 1500 may be employed to access information generated by the system and/or to provide inputs including, for instance, control commands, operating parameters, queries and/or the like. For example, input/output device 1500 may allow a user of system 1000 to view and/or otherwise receive information of at least one (early) reflection and its associated DOA.
System 1000 may further comprise at least one communication module 1600 configured to enable wired and/or wireless communication between the various components and/or modules of the system and which may communicate with each other over one or more communication buses (not shown), signal lines (not shown) and/or a network infrastructure.
System 1000 may further include a power module 1700 for powering the various components and/or modules and/or subsystems of the system. Power module 1700 may comprise an internal power supply (e.g., a rechargeable battery) and/or an interface for allowing connection to an external power supply.
It will be appreciated that separate hardware components such as processors and/or memories may be allocated to each component and/or module of system 1000. However, for simplicity and without this being construed in a limiting manner, the description and claims may refer to a single module and/or component. For example, although processor 1200 may be implemented by several processors, the following description will refer to processor 1200 as the component that conducts all the necessary processing functions of system 1000.
Functionalities of system 1000 may be implemented fully or partially by a multifunction mobile communication device also known as “smartphone”, a mobile or portable device, a non-mobile or non-portable device, a digital video camera, a personal computer, a laptop computer, a tablet computer, a server (which may relate to one or more servers or storage systems and/or services associated with a business or corporate entity, including for example, a file hosting service, cloud storage service, online file storage provider, peer-to-peer file storage or hosting service and/or a cyberlocker), personal digital assistant, a workstation, a wearable device, a handheld computer, a notebook computer, a vehicular device, a non-vehicular device, a robot, a stationary device and/or a home appliances control system. For example, microphone array 1100 may be part of a smartphone camera or of an autonomous or semi-autonomous vehicle, and some of RDOA engine 1400 functionalities may be implemented by the smartphone or the vehicle, and some by devices and/or system external to the smartphone or vehicle. Alternative configurations may also be conceived.
Reference is now made to
In some embodiments, the method may further include determining, based on the sound data, a correlation value between at least two audio channels of a plurality of audio channels to obtain, for a same frequency of the plurality of frequencies, a set of correlation values (e.g., a 3D correlation matrix) (block 11200).
In some embodiments, the method may include performing an inverse weighted Fourier transform on each set of frequency-related correlation values to obtain a plurality of subsets of delay representations (e.g., composing a set of values represented in a 3D delay matrix) (block 11300). Each subset of delay representations may relate to a different time delay between two received sound waves.
In some embodiments, the method may further include analyzing each subset of time delay representations to extract, for a selected delay, information about the non-reflected and reflected sources to obtain a set of linear combinations of the sources (block 11400). Each linear combination may pertain to a same or substantially same delay.
In some embodiments, the method may include analyzing the set to identify at least one reflection at a given delay (block 11500). In some examples, the set of values may represent linear combinations of a direct source and at least one reflection or of a plurality of reflections. In some examples, the analyzing may be performed to differentiate between at least two reflections of a selected linear combination having an about same delay.
In some embodiments, the method may include determining, for the at least one identified reflection at the given delay, a direction-of-arrival (DOA) estimate (block 11600). This may for instance be performed by determining or searching for a match for the at least one identified reflection with steering vectors to estimate the respective DOA. In some examples, the method may include determining which of the at least two obtained differentiated reflections and their associated delays match with steering vectors, to estimate their DOAs (e.g., respectively).
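The overall flow of blocks 11200 to 11600 may be illustrated with the following toy sketch (a hypothetical 4-channel, single-reflection example in arbitrary units; it is not the full PHALCOR implementation, and all names and values are illustrative):

```python
import numpy as np

# Toy end-to-end sketch: build per-frequency SCMs for a direct path plus one
# reflection, apply the phase alignment transform, and detect the reflection
# delay via the SVD of the phase-aligned SCM at each tau-grid point.
rng = np.random.default_rng(1)
J_f, df = 64, 1.0                          # frequency bins and spacing (toy units)
a0 = rng.standard_normal(4) + 1j * rng.standard_normal(4)  # direct steering
a1 = rng.standard_normal(4) + 1j * rng.standard_normal(4)  # reflection steering
tau1, alpha = 5 / (J_f * df), 0.7          # reflection delay and amplitude

R = np.empty((J_f, 4, 4), dtype=complex)
for j in range(J_f):
    x = a0 + alpha * a1 * np.exp(-2j * np.pi * j * df * tau1)
    R[j] = np.outer(x, x.conj())           # expected SCM at bin j (unit source power)

taus = np.arange(1, J_f // 2) / (J_f * df) # tau-grid (skip tau = 0, the direct path)
score = [np.linalg.svd(np.sum(R * np.exp(2j * np.pi
         * np.arange(J_f)[:, None, None] * df * t), axis=0),
         compute_uv=False)[0] for t in taus]
tau_hat = taus[int(np.argmax(score))]      # peaks at the true reflection delay
```

In this toy construction, the phase-aligned sum is nonzero only at the true reflection delay, so the largest singular value over the τ-grid directly identifies it.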
Example 1 pertains to a method for estimating direction of arrival and delays of reflected and non-reflected sound waves in a room received by a plurality of sound capturing devices in the room, the method comprising:
Example 2 includes the subject matter of example 1 and, optionally, performing clustering on the reflection data information for identifying outliers and clusters.
Example 3 includes the subject matter of example 1 and/or example 2 and, optionally, wherein analyzing each set of time delay representations is performed using singular value decomposition (SVD).
Example 4 includes the subject matter of any one or more of the examples 1 to 3 and, optionally, wherein the step of analyzing the set of linear combinations of reflections is performed using a sparse recovery algorithm.
Example 5 includes the subject matter of example 4 and, optionally, wherein the sparse recovery algorithm is an orthogonal matching pursuit (OMP) algorithm.
Example 6 includes the subject matter of any one or more of the examples 1 to 5 and, optionally, wherein the sound data are created based on electronic signals generated by a microphone array located in the room.
Example 7 includes the subject matter of any one or more of the examples 1 to 6 and, optionally, wherein the sound waves are produced by at least one speaker located in the room.
Example 8 includes the subject matter of any one or more of the examples 1 to 7 and, optionally, comprising determining a match for a plurality of reflections with a plurality of steering vectors, respectively to estimate the direction-of-arrival (DOA) for the plurality of reflections.
Example 9 includes the subject matter of any one or more of the examples 1 to 8 and, optionally, wherein the reflections are early room reflections.
Example 10 includes the subject matter of example 9 and, optionally, wherein the early room reflections pertain to the 1st reflection (only), the first two room reflections (only), the first 3 room reflections (only), or up to the first 10 or 20 room reflections.
Example 11 pertains to a system for estimating direction of arrival and delays of reflected and non-reflected sound waves in a room received by a plurality of sound capturing devices in the room, the system comprising:
Example 12 includes the subject matter of example 11 and, optionally, performing clustering on the reflection data information for identifying outliers and clusters.
Example 13 includes the subject matter of examples 11 and/or 12 and, optionally, wherein the analyzing of each set of time delay representations is performed using singular value decomposition (SVD).
Example 14 includes the subject matter of any one or more of the examples 11 to 13 and, optionally, wherein the step of analyzing the set of linear combinations of reflections is performed using a sparse recovery algorithm.
Example 15 includes the subject matter of example 14 and, optionally, wherein the sparse recovery algorithm is an orthogonal matching pursuit (OMP) algorithm.
Example 16 includes the subject matter of any one or more of the examples 11 to 15 and, optionally, wherein the sound data are created based on electronic signals generated by a microphone array located in the room.
Example 17 includes the subject matter of any one or more of the examples 11 to 16 and, optionally, wherein the sound waves are produced by at least one speaker located in the room.
Example 18 includes the subject matter of any one or more of the examples 11 to 17 and, optionally, determining a match for a plurality of reflections with a plurality of steering vectors, respectively to estimate the direction-of-arrival (DOA) for the plurality of reflections.
Example 19 includes the subject matter of any one or more of the examples 11 to 18 and, optionally, wherein the reflections are early room reflections.
Example 20 includes the subject matter of example 19 and, optionally, wherein the early room reflections contain the 1st reflection, the first two room reflections, the first 3 room reflections, or up to the first 10 or 20 room reflections.
It is important to note that the methods described herein and illustrated in the accompanying diagrams shall not be construed in a limiting manner. For example, methods described herein may include additional or even fewer processes or operations in comparison to what is described herein and/or illustrated in the diagrams. In addition, method steps are not necessarily limited to the chronological order as illustrated and described herein.
Any digital computer system, unit, device, module and/or engine exemplified herein can be configured or otherwise programmed to implement a method disclosed herein, and to the extent that the system, module and/or engine is configured to implement such a method, it is within the scope and spirit of the disclosure. Once the system, module and/or engine are programmed to perform particular functions pursuant to computer readable and executable instructions from program software that implements a method disclosed herein, it in effect becomes a special purpose computer particular to embodiments of the method disclosed herein. The methods and/or processes disclosed herein may be implemented as a computer program product that may be tangibly embodied in an information carrier including, for example, in a non-transitory tangible computer-readable and/or non-transitory tangible machine-readable storage device. The computer program product may be directly loadable into an internal memory of a digital computer, comprising software code portions for performing the methods and/or processes as disclosed herein.
The methods and/or processes disclosed herein may be implemented as a computer program that may be intangibly embodied by a computer readable signal medium. A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a non-transitory computer or machine-readable storage device and that can communicate, propagate, or transport a program for use by or in connection with apparatuses, systems, platforms, methods, operations and/or processes discussed herein.
The terms “non-transitory computer-readable storage device” and “non-transitory machine-readable storage device” encompass distribution media, intermediate storage media, execution memory of a computer, and any other medium or device capable of storing for later reading by a computer program implementing embodiments of a method disclosed herein. A computer program product can be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by one or more communication networks.
These computer readable and executable instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable and executable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable and executable instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The term “engine” may comprise one or more computer modules, wherein a module may be a self-contained hardware and/or software component that interfaces with a larger system. A module may comprise a machine or machines executable instructions. A module may be embodied by a circuit or a controller programmed to cause the system to implement the method, process and/or operation as disclosed herein. For example, a module may be implemented as a hardware circuit comprising, e.g., custom VLSI circuits or gate arrays, an Application-specific integrated circuit (ASIC), off-the-shelf semiconductors such as logic chips, transistors, and/or other discrete components. A module may also be implemented in programmable hardware devices such as field programmable gate arrays, programmable array logic, programmable logic devices and/or the like.
The term “random” also encompasses the meaning of the term “substantially randomly” or “pseudo-randomly”.
The expression “real-time” as used herein generally refers to the updating of information based on received data, at essentially the same rate as the data is received, for instance, without user-noticeable judder, latency or lag.
In the discussion, unless otherwise stated, adjectives such as “substantially” and “about” that modify a condition or relationship characteristic of a feature or features of an embodiment of the invention, are to be understood to mean that the condition or characteristic is defined to within tolerances that are acceptable for operation of the embodiment for an application for which it is intended.
Unless otherwise specified, the terms “substantially”, “about” and/or “close” with respect to a magnitude or a numerical value may imply to be within an inclusive range of −10% to +10% of the respective magnitude or value.
It is important to note that the method is not limited to those diagrams or to the corresponding descriptions. For example, the method may include additional or even fewer processes or operations in comparison to what is described in the figures. In addition, embodiments of the method are not necessarily limited to the chronological order as illustrated and described herein.
Discussions herein utilizing terms such as, for example, “processing”, “computing”, “calculating”, “determining”, “establishing”, “analyzing”, “checking”, “estimating”, “deriving”, “selecting”, “inferring” or the like, may refer to operation(s) and/or process(es) of a computer, a computing platform, a computing system, or other electronic computing device, that manipulate and/or transform data represented as physical (e.g., electronic) quantities within the computer's registers and/or memories into other data similarly represented as physical quantities within the computer's registers and/or memories or other information storage medium that may store instructions to perform operations and/or processes. The term determining may, where applicable, also refer to “heuristically determining”.
It should be noted that where an embodiment refers to a condition of “above a threshold”, this should not be construed as excluding an embodiment referring to a condition of “equal or above a threshold”. Analogously, where an embodiment refers to a condition “below a threshold”, this should not be construed as excluding an embodiment referring to a condition “equal or below a threshold”. It is clear that should a condition be interpreted as being fulfilled if the value of a given parameter is above a threshold, then the same condition is considered as not being fulfilled if the value of the given parameter is equal or below the given threshold. Conversely, should a condition be interpreted as being fulfilled if the value of a given parameter is equal or above a threshold, then the same condition is considered as not being fulfilled if the value of the given parameter is below (and only below) the given threshold.
It should be understood that where the claims or specification refer to “a” or “an” element and/or feature, such reference is not to be construed as there being only one of that element. Hence, reference to “an element” or “at least one element” for instance may also encompass “one or more elements”.
Terms used in the singular shall also include the plural, except where expressly otherwise stated or where the context otherwise requires.
In the description and claims of the present application, each of the verbs “comprise”, “include” and “have”, and conjugates thereof, are used to indicate that the object or objects of the verb are not necessarily a complete listing of components, elements or parts of the subject or subjects of the verb.
Unless otherwise stated, the use of the expression “and/or” between the last two members of a list of options for selection indicates that a selection of one or more of the listed options is appropriate and may be made. Further, the use of the expression “and/or” may be used interchangeably with the expressions “at least one of the following”, “any one of the following” or “one or more of the following”, followed by a listing of the various options.
As used herein, the phrase “A, B, C, or any combination of the aforesaid” should be interpreted as meaning all of the following: (i) A or B or C or any combination of A, B, and C; (ii) at least one of A, B, and C; (iii) A, and/or B and/or C; and (iv) A, B and/or C. Where appropriate, the phrase A, B and/or C can be interpreted as meaning A, B or C. The phrase A, B or C should be interpreted as meaning “selected from the group consisting of A, B and C”. This concept is illustrated for three elements (i.e., A, B, C), but extends to fewer and greater numbers of elements (e.g., A, B, C, D, etc.).
It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments or examples, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, example and/or option, may also be provided separately or in any suitable sub-combination, or as suitable in any other described embodiment, example or option of the invention. Certain features described in the context of various embodiments, examples and/or optional implementations are not to be considered essential features of those embodiments, unless the embodiment, example and/or optional implementation is inoperative without those features.
It is noted that the terms “in some embodiments”, “according to some embodiments”, “for example”, “e.g.”, “for instance” and “optionally” may herein be used interchangeably.
The number of elements shown in the Figures should by no means be construed as limiting and is for illustrative purposes only.
“Real-time” as used herein generally refers to the updating of information at essentially the same rate as the data is received. More specifically, in the context of the present invention “real-time” is intended to mean that the data is acquired, processed, and transmitted from a sensor at a high enough data rate and at a low enough time delay that, when the data is displayed, data portions presented and/or displayed in the visualization move smoothly without user-noticeable judder, latency or lag.
It is noted that the term “operable to” can encompass the meaning of the term “modified or configured to”. In other words, a machine “operable to” perform a task can, in some embodiments, embrace a mere capability (e.g., “modified”) to perform the function and, in some other embodiments, a machine that is actually made (e.g., “configured”) to perform the function.
Throughout this application, various embodiments may be presented in and/or relate to a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the embodiments. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
The phrases “ranging/ranges between” a first indicated number and a second indicated number and “ranging/ranges from” a first indicated number “to” a second indicated number are used herein interchangeably and are meant to include the first and second indicated numbers and all the fractional and integral numerals therebetween.
While the invention has been described with respect to a limited number of embodiments, these should not be construed as limitations on the scope of the invention, but rather as exemplifications of some of the embodiments.
The present application claims priority to U.S. Provisional Patent Application 63/174,039, filed on Apr. 13, 2021, which is incorporated by reference herein in its entirety.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/IB2022/053478 | 4/13/2022 | WO |
Number | Date | Country
---|---|---
63174039 | Apr 2021 | US