The present invention relates to simultaneous indoor localization and, more particularly, to room shape reconstruction using a single mobile computing device.
With the development of mobile devices, many applications related to public safety, medical care, or commercial use have become available by using sensory information collected by the devices. In many cases, these applications rely heavily on the localization feature provided by the devices. Therefore, localization becomes an integral part of applications where location information is critical.
Outdoor localization is largely considered a solved problem. The satellite-based Global Positioning System (GPS) is able to provide satisfactory accuracy and coverage in most outdoor environments. However, it cannot offer acceptable performance for indoor localization, as the microwave signals are heavily attenuated when penetrating construction materials. In addition, the multi-path propagation caused by reflections off building surfaces leads to significant losses of localization accuracy.
Indoor localization has been an active research area in recent years. Most works focus on simultaneous localization and mapping (SLAM), which builds a map of the environment while determining the device's position within the map. Several techniques have been demonstrated to be effective for indoor localization, such as those utilizing location-specific signatures from WiFi, Bluetooth, and UWB signals as well as LED light. Most existing techniques require some prior information about the surrounding environment, such as anchor nodes in UWB-based systems whose positions are fixed and known. Additionally, these techniques invariably require the availability of infrastructure that is functioning (i.e., powered up) during the localization and mapping process. There are applications, however, where indoor mapping and localization may be required in the absence of pre-established infrastructure. A simple example is the need of first responders when a natural disaster leads to a power outage that in turn renders any pre-established infrastructure inaccessible.
The present invention comprises a device for performing simultaneous localization and mapping in an enclosed space having a loudspeaker capable of emitting a predetermined sound, a microphone co-located with the loudspeaker, and a processor interconnected to the microphone, wherein the processor is programmed to receive a series of echoes of the predetermined sound when emitted by the loudspeaker from a corresponding series of non-collinear locations within the enclosed space and to determine the shape of the enclosed space based on the series of echoes from the corresponding series of locations. The processor is programmed to determine the shape of the enclosed space by measuring the distance between each of the series of locations and a preceding one of the series of locations. The processor is programmed to determine the shape of the enclosed space by measuring the distance between each of the series of locations and all walls of the enclosed space. The processor is programmed to determine the shape of the enclosed space by identifying first order echoes from within the series of echoes received at each of the corresponding series of locations. The processor is programmed to determine the shape of the enclosed space by reconstructing all possible shapes of the enclosed space and selecting the shape with the greatest number of edges. The processor is programmed to determine the location of the series of non-collinear locations within the shape of the enclosed space. The predetermined sound may comprise a chirp signal sweeping from a first frequency to a second frequency, where the first frequency is 30 Hz and the second frequency is 8 kHz.
The invention thus involves a single mobile computing device that is equipped with a loudspeaker and a microphone as well as various motion sensors and that is programmed to perform room shape reconstruction. The requisite equipment is generally available in conventional smartphones and laptop computers, which can be programmed using applications to implement the present invention. The invention provides a technology that allows simultaneous room shape recovery and self-localization without any external infrastructure. Mobile device 10 serves as a co-located acoustic transmitter and receiver that emits acoustic signals and receives their echoes; together with the information gathered through internal sensors, the device can autonomously reconstruct any 2-D convex polygonal room shape while self-localizing with respect to the reconstructed room shape.
The present invention also encompasses a method of performing simultaneous localization and mapping in an enclosed space, comprising the steps of providing a loudspeaker capable of emitting a predetermined sound, emitting the predetermined sound from the loudspeaker from each of a series of locations within the enclosed space, receiving a corresponding series of echoes of the predetermined sound from each of the series of locations with a microphone co-located with the loudspeaker, and using a processor interconnected to the microphone to determine shape of the enclosed space based on the series of echoes received from the corresponding series of locations.
The method of the present invention can thus use a single mobile device with acoustic features and motion sensors to simultaneously recover the room shape and localize the device itself. The effectiveness of the invention was demonstrated for SLAM in 2-D convex polygonal rooms. In the method of the invention, the mobile device serves as a co-located acoustic transmitter and receiver. Specifically, it transmits a probing signal to excite the acoustic response in the indoor environment, and receives and records the echoes. By measuring the time of arrival (ToA) of the echoes, the distance between mobile device 10 and each reflector (wall) can be recovered. To establish the geometry of the environment from the ToA information, it can be shown that the transmission-reception process needs to be performed at least three times at three distinct non-collinear positions. Moreover, to obtain better reconstruction performance, the inertial sensors mounted in the mobile device, such as the accelerometer and magnetometer, are used to track the trajectory of the device. However, the motion direction information estimated by the inertial sensors is known to be highly inaccurate, and will not lead to acceptable performance for localization and mapping. Therefore, in this method, only the path lengths, i.e., the distances between consecutive measurement points, are estimated and used. Given the ToA information collected at three distinct non-collinear measurement points and the distance information between consecutive measurement points, the developed technology can reconstruct any convex polygon in 2-D, as well as localize the device itself using acoustic echoes. Thus, in the technique of the present invention, 2-D SLAM can be achieved by using the acoustic functions and motion sensors of a single mobile device, without any pre-established infrastructure or external power supply.
The present invention will be more fully understood and appreciated by reading the following Detailed Description in conjunction with the accompanying drawings, in which:
Referring to the figures, wherein like numerals refer to like parts throughout, there is seen in
As seen in
Mobile device 10 provides co-located loudspeaker 14 and microphone 16, and is moved around inside the room whose shape is to be reconstructed. At each measurement point, device 10 is programmed to emit a probing acoustic signal s(t) and to receive and record the echoes r(t). As seen in
Image Source Model
The basic technique to link the acoustic echoes and the room shape begins with a classic model widely used in acoustics and optics, called image source model (ISM). In
where τi(j) is the travelling time of the probing signal being reflected by the edge W and returning to the source Oj, and c is the speed of sound. Here, it is possible to assume the emission time is set at t=0.
All the distances collected at a single source are denoted as a vector $\vec{r}_j$. It can be shown that three sets of distances $\vec{r}_j$, or equivalently, three sets of ToA information collected respectively at three distinct non-collinear locations inside the room, are sufficient to reconstruct the room shape. As seen in
Room Shape Reconstruction and Self-Localization with Known Path Lengths
With only first order echo information and no additional information, such as the relative distance of the measurement points, it is conventionally known to be impossible to reconstruct all 2-D convex polygons. In particular, if the room shape is a rectangle, it has been shown that there are infinitely many parallelograms of completely different shapes that yield the same set of first order echoes. However, with various internal sensors, it is now feasible using the present invention to measure, for example, the distance between two measurement points or even the angles if the user of device 10 walks along different straight lines after each measurement point. In the present invention, the distance information measured by motion sensor 18 between two neighboring measurement points is used to supply the necessary information. Specifically, the distances between O1 and O2, as well as between O2 and O3, which are denoted as d12 and d23 in
Peak Detection
To achieve better resolution for determining τi(j), or equivalently rj,i, wide-band signals are usually used. In the case of acoustic signals, a chirp signal is used as its auto-correlation provides a good approximation to the Dirac delta function. Therefore, to obtain the ToA information, the received signal r(t) is first correlated with the probing signal s(t). Whenever there is an echo (first or higher order), a peak will occur at the output of the correlator. The first and most significant peak corresponds to the line-of-sight (LOS) path (i.e., the signal directly received by the microphone without reflecting off any wall). This LOS arrival time is recorded and subtracted from the arrival times of subsequent echoes, and the differences are precisely the times each echo travels along a certain path. All detected echoes are collected into the distance set $\vec{r}_j$ at each source Oj.
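The correlation step can be illustrated with a minimal sketch. Only the 30 Hz to 8 kHz sweep comes from the description; the sampling rate, chirp duration, echo delay, guard interval, speed of sound and all function names are illustrative assumptions, and the synthetic recording contains a single noiseless echo.

```python
import math

C_SOUND = 343.0   # assumed speed of sound in air (m/s)
FS = 48_000       # assumed sampling rate (Hz)

def linear_chirp(f0, f1, duration, fs=FS):
    """Linear chirp sweeping from f0 to f1 (Hz) over `duration` seconds."""
    n = int(round(duration * fs))
    rate = (f1 - f0) / duration                  # sweep rate in Hz/s
    return [math.sin(2.0 * math.pi * (f0 * t + 0.5 * rate * t * t))
            for t in (i / fs for i in range(n))]

def correlate(received, probe):
    """Correlator output m[lag] = sum_n received[lag + n] * probe[n]."""
    return [sum(received[lag + n] * p for n, p in enumerate(probe))
            for lag in range(len(received) - len(probe) + 1)]

def strongest_echo(received, probe, guard, fs=FS, c=C_SOUND):
    """Return (LOS lag, echo lag, wall distance in metres)."""
    m = [abs(v) for v in correlate(received, probe)]
    los = max(range(len(m)), key=m.__getitem__)            # strongest peak: direct path
    echo = max(range(los + guard, len(m)), key=m.__getitem__)
    return los, echo, c * (echo - los) / fs / 2.0          # r = c * tau / 2

# Synthetic recording: direct path at sample 0 plus one echo 600 samples later.
probe = linear_chirp(30.0, 8000.0, 0.01)
received = [0.0] * 1500
for n, p in enumerate(probe):
    received[n] += p
    received[600 + n] += 0.5 * p
los_lag, echo_lag, distance = strongest_echo(received, probe, guard=200)
```

The guard interval skips the sidelobes of the LOS peak; a real recording would need the fuller peak detection discussed later rather than a single arg-max.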
Reconstruction for the Ideal Case
Consider a convex planar K-polygon as shown in
(r2,i−r1,i)+d12 cos θi=0, (1)
(r3,i−r2,i)+d23 cos(θi−φ)=0. (2)
The ideal case refers to the case when echoes corresponding to different walls are correctly labeled at different nodes, and only the first-order echoes are present in the distance sets. Thus, in each $\vec{r}_j$, the rj,i's are sorted in the same order as i=1, . . . , K though they may not arrive in this order. The system needs to determine the uniqueness of φ and θi's according to (1) and (2). The solutions to (1) and (2) are given by:
θi=±arccos((r1,i−r2,i)/d12), (3)
θi−φ=±arccos((r2,i−r3,i)/d23), (4)
and these two equations yield four possible sign combinations. However, there are only two sign combinations which satisfy (1) and (2) simultaneously for all i=1, . . . , K, and those two are reflections of each other with respect to O1O2.
Notice that in such a coordinate system, the first two sources are located at (0, 0) and (d12, 0), and once φ and θi are determined, the coordinate of O3 is determined as well. Hence, the self-localization can be accomplished.
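The ideal case can be sketched in a few lines. The sketch assumes that wall i is the line ⟨x, ni⟩ = ρi with outward unit normal ni = (cos θi, sin θi), that O1 = (0, 0), O2 = (d12, 0) and O3 = O2 + d23(cos φ, sin φ), which is one parametrization consistent with (1) and (2). The function names, tolerance and room dimensions are hypothetical.

```python
import math

def wrap(angle):
    """Wrap an angle to the interval (-pi, pi]."""
    a = math.fmod(angle + math.pi, 2.0 * math.pi)
    if a <= 0.0:
        a += 2.0 * math.pi
    return a - math.pi

def wall_distances(point, thetas, rhos):
    """Distance from `point` to each wall <x, n_i> = rho_i (assumed model)."""
    return [rho - (point[0] * math.cos(t) + point[1] * math.sin(t))
            for t, rho in zip(thetas, rhos)]

def solve_ideal_case(r1, r2, r3, d12, d23, tol=1e-6):
    """Return every (phi, thetas, O3) that satisfies (1) and (2) for all walls."""
    a = [math.acos(max(-1.0, min(1.0, (x1 - x2) / d12))) for x1, x2 in zip(r1, r2)]
    b = [math.acos(max(-1.0, min(1.0, (x2 - x3) / d23))) for x2, x3 in zip(r2, r3)]
    solutions = []
    for s1 in (1, -1):                       # four sign combinations from wall 1
        for s2 in (1, -1):
            phi = wrap(s1 * a[0] - s2 * b[0])
            thetas = []
            for ai, bi in zip(a, b):         # every wall must agree with this phi
                ok = [t for t in (ai, -ai)
                      if any(abs(wrap(t - phi - sb)) < tol for sb in (bi, -bi))]
                if not ok:
                    break
                thetas.append(ok[0])
            else:
                o3 = (d12 + d23 * math.cos(phi), d23 * math.sin(phi))
                solutions.append((phi, thetas, o3))
    return solutions

# Hypothetical triangular room and trajectory used only for illustration.
true_thetas, rhos = [0.3, 2.5, -1.9], [3.0, 2.5, 2.0]
d12, d23, true_phi = 1.0, 0.8, 0.7
o2 = (d12, 0.0)
o3 = (d12 + d23 * math.cos(true_phi), d23 * math.sin(true_phi))
r1 = wall_distances((0.0, 0.0), true_thetas, rhos)
r2 = wall_distances(o2, true_thetas, rhos)
r3 = wall_distances(o3, true_thetas, rhos)
solutions = solve_ideal_case(r1, r2, r3, d12, d23)
```

For this room the search returns two solutions that are mirror images of each other about O1O2, in line with the reflection ambiguity noted above; the one with φ in (0, π) matches the simulated trajectory.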
Echo Labelling
Practically, the received echoes are not correctly labeled at different measurement points, i.e., one does not know a priori which are the first order echoes corresponding to the same wall; notice that at different nodes, echoes from different walls may not arrive in the same order. In addition, $\vec{r}_j$ may contain higher-order echoes. Therefore, the higher-order echoes have to be eliminated, and the first-order echoes have to be labeled in the correct order; this is done by trying different echo combinations to solve (1) and (2). With random measurement points, no solutions to (1) and (2) can be obtained for all i=1, . . . , K except for the correct set of first order echoes. The length of $\vec{r}_j$ is denoted as Nj, and N=min{N1,N2,N3}. To find the correct labels of the echoes, K out of N distances are selected from each distance set $\vec{r}_j$ and plugged into (3) and (4) to determine if they can yield a valid solution to (1) and (2) for all i=1, . . . ,K. As the actual number of walls is unknown a priori, K needs to vary from 3 to N, corresponding to polygons with a varying number of sides. There may be multiple polygons satisfying (1) and (2); for example, if the original shape is a pentagon, four sets of first order echoes may also correctly solve (1) and (2), yielding a quadrilateral. Thus, the polygon with the greatest number of edges is chosen as the final reconstructed shape.
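The exhaustive labelling search described above can be sketched as follows. This is a brute-force illustration under the same assumed wall parametrization (wall i is ⟨x, ni⟩ = ρi, O1 at the origin, O2 on the x-axis), with φ restricted to (0, π) to remove the reflection. The function names, tolerance, room and the three spurious "echoes" (20, 40 and 60) are hypothetical, and the search cost grows factorially with the number of echoes.

```python
import itertools
import math

def wrap(angle):
    """Wrap an angle to the interval (-pi, pi]."""
    a = math.fmod(angle + math.pi, 2.0 * math.pi)
    if a <= 0.0:
        a += 2.0 * math.pi
    return a - math.pi

def consistent_phi(triples, d12, d23, tol=1e-6):
    """Return phi in (0, pi) solving (1) and (2) for every triple, else None."""
    candidates = None
    for x1, x2, x3 in triples:
        ca, cb = (x1 - x2) / d12, (x2 - x3) / d23
        if abs(ca) > 1.0 or abs(cb) > 1.0:       # no real angle: wrong grouping
            return None
        a, b = math.acos(ca), math.acos(cb)
        here = [wrap(s1 * a - s2 * b) for s1 in (1, -1) for s2 in (1, -1)]
        if candidates is None:
            candidates = here
        else:
            candidates = [p for p in candidates
                          if any(abs(wrap(p - q)) < tol for q in here)]
        if not candidates:
            return None
    positive = [p for p in candidates if 0.0 < p < math.pi]
    return positive[0] if positive else None

def label_echoes(e1, e2, e3, d12, d23):
    """Try K = N, N-1, ..., 3 and return the first consistent grouping."""
    n = min(len(e1), len(e2), len(e3))
    for k in range(n, 2, -1):                    # most edges first
        for i1 in itertools.combinations(range(len(e1)), k):
            for i2 in itertools.permutations(range(len(e2)), k):
                for i3 in itertools.permutations(range(len(e3)), k):
                    triples = [(e1[p], e2[q], e3[r])
                               for p, q, r in zip(i1, i2, i3)]
                    phi = consistent_phi(triples, d12, d23)
                    if phi is not None:
                        return k, triples, phi
    return None

# Hypothetical triangular room; echoes are shuffled and padded with one
# spurious value per measurement point.
thetas, rhos = [0.3, 2.5, -1.9], [3.0, 2.5, 2.0]
d12, d23, phi_true = 1.0, 0.8, 0.7
points = [(0.0, 0.0), (d12, 0.0),
          (d12 + d23 * math.cos(phi_true), d23 * math.sin(phi_true))]
true = [[rho - (x * math.cos(t) + y * math.sin(t)) for t, rho in zip(thetas, rhos)]
        for x, y in points]
e1 = [true[0][0], true[0][1], true[0][2], 20.0]
e2 = [40.0, true[1][2], true[1][0], true[1][1]]
e3 = [true[2][1], 60.0, true[2][2], true[2][0]]
num_walls, grouped, phi_hat = label_echoes(e1, e2, e3, d12, d23)
```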
Self-Localizing
Once the 2-D room shape is reconstructed from at least three measurement points, the coordinates of the three measurement points are automatically recovered in the process. Subsequently, echoes collected at other points are used to determine the location of those points, i.e., self-localization can be trivially accomplished.
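Localizing a further measurement point can be sketched as a small least-squares problem. The sketch assumes the walls have already been recovered as outward normal angles θi with O1 at the origin, so that wall i is the line ⟨x, ni⟩ = r1,i, and it assumes the echoes at the new point are already labelled, which the text does not guarantee. The function name and numbers are hypothetical.

```python
import math

def localize(thetas, r1, r_new):
    """Least-squares position o solving <n_i, o> = r1_i - r_new_i for all walls."""
    sxx = sxy = syy = bx = by = 0.0
    for t, ra, rb in zip(thetas, r1, r_new):
        c, s = math.cos(t), math.sin(t)
        rhs = ra - rb                      # projection of the new point on n_i
        sxx += c * c
        sxy += c * s
        syy += s * s
        bx += c * rhs
        by += s * rhs
    det = sxx * syy - sxy * sxy            # zero only if all walls are parallel
    return ((syy * bx - sxy * by) / det, (sxx * by - sxy * bx) / det)

# Hypothetical reconstructed room and an unknown fourth measurement point.
thetas = [0.3, 2.5, -1.9]
r1 = [3.0, 2.5, 2.0]                       # distances from O1 = (0, 0) to the walls
hidden = (0.4, -0.6)
r_new = [ra - (hidden[0] * math.cos(t) + hidden[1] * math.sin(t))
         for t, ra in zip(thetas, r1)]
estimate = localize(thetas, r1, r_new)
```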
The concrete steps of the system are seen in
Room Impulse Response Model
Acoustic signal propagation from a loudspeaker to a microphone in a room can be described by the room impulse response (RIR), which can be formulated as the summation of both line-of-sight (LOS) and reflected components. In practice, if the microphone and loudspeaker are much closer to each other compared to the distance between the device and the walls, the device is referred to as a co-located device. For a co-located device at a measurement point denoted by Oj, the RIR is
where αi(j)'s and τi(j)'s are path gains and delays from the transmitter to the receiver, respectively. Since higher order reflective paths typically have much weaker power compared with the lower order ones, (1) can be approximately expressed by the first Nj+1 components including LOS and Nj reflective paths:
It is possible to assume that the N reflective paths contain all first order reflections and higher order ones that are detectable. Given the transmitted signal s(t), the received signal at Oj is
$r^{(j)}(t) = s(t) * h^{(j)}(t) + \omega(t),$
where ω(t) is the additive noise. The τi(j)'s can be obtained directly from r(j)(t) if s(t−τi(j)) decays before s(t−τi+1(j)) arrives at the receiver. However, such acoustic signals are difficult to generate because they require extremely wide bandwidth. A better way to obtain the τi(j)'s is to consider the correlator output:
$m^{(j)}(t) = r^{(j)}(t) * s(t).$
If s(t) has a good auto-correlation property, the first peak of m(j)(t) corresponds to the LOS component, while the other peaks correspond to reflective components. Hence the time difference of arrival (TDOA) can be obtained even with an asynchronous loudspeaker and microphone. Chirp signals, which are easy to generate and have good auto-correlation properties, are used here.
Since the loudspeaker and microphone are co-located, τ0, which corresponds to the delay of the LOS path, is close to zero. Define a column vector
where c is the speed of sound. Then $\tilde{r}_j$ contains all the distances between the device and the walls. Hence synchronization between the loudspeaker and the microphone is not required for a co-located device if only the distances between the measurement point and the walls are of interest.
Image Source Model
By conventional image source model, reflections within a constrained space can be viewed as LOS propagation from virtual sources to the receiver in the free space. Suppose the coordinate of Oj is denoted by oj. As shown in
$\tilde{o}_{j,i} = 2\langle p_i - o_j, n_i\rangle n_i + o_j,$
where pi is any point on the ith wall and ni is the outward unit normal vector of the ith wall. Thus
Let rj,i be the distance between Oj and the ith wall, then rj,i=½τi(j)c which is equal to half of the distance between oj and õj,i. The second order image source of Oj with respect to the ith and the kth wall is
$\tilde{o}_{j,ik} = 2\langle p_k - \tilde{o}_{j,i}, n_k\rangle n_k + \tilde{o}_{j,i}.$
Similarly, let rj,ik be half of the distance between oj and õj,ik. Following the same steps, higher order image sources can be represented by lower order image sources. Then $\tilde{r}_j$ is associated with image sources. The term echo is used to refer to either the delay τi(j) or the corresponding distance if no ambiguity occurs.
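The image source construction can be checked numerically with a short sketch. The wall positions, source position and function names are hypothetical; the reflection formula assumes ni is a unit vector.

```python
import math

def image_source(o, p, n):
    """Mirror point `o` across the wall through `p` with unit normal `n`."""
    scale = 2.0 * ((p[0] - o[0]) * n[0] + (p[1] - o[1]) * n[1])
    return (o[0] + scale * n[0], o[1] + scale * n[1])

def half_distance(a, b):
    """Half of the Euclidean distance between two points."""
    return math.hypot(a[0] - b[0], a[1] - b[1]) / 2.0

source = (1.0, 2.0)
wall_i = ((3.0, 0.0), (1.0, 0.0))    # wall x = 3, outward normal +x
wall_k = ((0.0, 4.0), (0.0, 1.0))    # wall y = 4, outward normal +y

first = image_source(source, *wall_i)     # first-order image w.r.t. wall i
second = image_source(first, *wall_k)     # second-order image w.r.t. walls i, k
r_i = half_distance(source, first)        # equals the distance to wall i
r_ik = half_distance(source, second)      # second-order echo distance
```

Here the first-order echo distance equals the perpendicular distance from the source to the wall, as stated for rj,i.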
Two Extreme Cases
There are some special cases for room shape reconstruction and mobile device location. For instance, suppose distances between each pair of measurement points are given and the three measurement points are not collinear. In this case, only the room shape is of interest. By geometry, there exists at most one common tangent line for three circles with non-collinear centers. Thus, the room shape is uniquely determined by first-order echoes.
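When the measurement positions are known, recovering one wall from one correctly grouped set of first-order echoes reduces to a 2×2 linear solve followed by a unit-norm check. The sketch below assumes the wall is written as ⟨x, n⟩ = ρ with unit outward normal n; the function name, tolerance and numbers are hypothetical.

```python
import math

def wall_from_three_points(centers, radii, tol=1e-9):
    """Return (nx, ny, rho) of the common tangent line, or None if inconsistent.

    Uses <n, o_j> + r_j = rho for j = 1, 2, 3 and subtracts the first equation.
    """
    (x1, y1), (x2, y2), (x3, y3) = centers
    r1, r2, r3 = radii
    dx2, dy2, dx3, dy3 = x2 - x1, y2 - y1, x3 - x1, y3 - y1
    det = dx2 * dy3 - dy2 * dx3            # zero if the centers are collinear
    if abs(det) < tol:
        return None
    b2, b3 = r1 - r2, r1 - r3
    nx = (b2 * dy3 - dy2 * b3) / det
    ny = (dx2 * b3 - b2 * dx3) / det
    if abs(math.hypot(nx, ny) - 1.0) > tol:   # not a unit normal: wrong grouping
        return None
    return nx, ny, r1 + nx * x1 + ny * y1

centers = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]
wall = wall_from_three_points(centers, [3.0, 2.0, 2.5])       # wall x = 3
mislabeled = wall_from_three_points(centers, [3.0, 2.0, 1.0])
```

The unit-norm test is what rejects distances that do not belong to the same wall.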
The second special case is when the reconstruction is free of geometry information of the measurement points. In this case, both room shape and the position of the device are of interest. The conventional art has shown that a large class of convex polygons can be reconstructed by first order echoes that are correctly labeled. The basic idea is that many convex polygons can be generated by the intersection of a triangle and some lines. As long as the triangle is obtained the coordinate of the measurement points are also determined. Therefore the rest of the reconstruction work is exactly the same as the previous case. However, parallelograms cannot be reconstructed uniquely under this assumption.
Recovery with Known Path Lengths
Geometry
Consider a convex planar K-polygon. As shown in
From
(r2,i−r1,i)+d12 cos θi=0,
(r3,i−r2,i)+d23 cos(θi−φ)=0.
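The two relations follow from a short projection argument. The parametrization below is an assumption chosen to match the coordinate system in which the first two measurement points lie at (0, 0) and (d12, 0): wall i is the line ⟨x, ni⟩ = ρi with outward unit normal ni = (cos θi, sin θi), and φ is the angle of O2O3 measured from O1O2.

```latex
\begin{align*}
o_1 &= (0,0), \qquad o_2 = (d_{12},0), \qquad
o_3 = o_2 + d_{23}(\cos\varphi,\ \sin\varphi),\\
r_{j,i} &= \rho_i - \langle o_j, n_i\rangle, \qquad n_i = (\cos\theta_i,\ \sin\theta_i),\\
r_{2,i}-r_{1,i} &= -\langle o_2-o_1,\ n_i\rangle = -d_{12}\cos\theta_i,\\
r_{3,i}-r_{2,i} &= -\langle o_3-o_2,\ n_i\rangle
  = -d_{23}\left(\cos\varphi\cos\theta_i+\sin\varphi\sin\theta_i\right)
  = -d_{23}\cos(\theta_i-\varphi).
\end{align*}
```

Moving the right-hand sides to the left gives the two displayed equations; the wall offsets ρi cancel, which is why only differences of echo distances and the path lengths appear.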
Ideal Case
Let $r_j = \{r_{j,i}\}_{i=1}^{K}$ be a column vector. Here, it is possible to assume that for all j's, the one-to-one mapping $f_j: r_j \to \tilde{r}_j$ is known. In other words, the rj,i's have been correctly chosen from $\tilde{r}_j$ for j=1, 2, 3 and i=1, . . . , K. In the following, we say that the received echoes are grouped if echoes are chosen from the $\tilde{r}_j$'s according to the fj's. The remaining problem is to determine the uniqueness of θi's and φ given (3) and (4).
Define
For simplicity we denote αi,j and βii by αi and βi, respectively. Given correctly labeled echoes, by (3) and (4), we have
θi=±arccos αi and θi−φ=±arccos βi. (5)
Thus, there are four possible sign combinations for a given i,
θi=arccos αi and θi−φ=arccos βi, (6)
θi=arccos αi and θi−φ=−arccos βi, (7)
θi=−arccos αi and θi−φ=arccos βi, (8)
θi=−arccos αi and θi−φ=−arccos βi. (9)
Definition III.1. Given a room R and a location O, O is feasible if the co-located device at O can receive all the first-order echoes of a signal emitted at O.
Lemma III.1. Suppose O1, O2 and O3 are feasible and not collinear. Given grouped first-order echoes, with probability 1, there exist exactly two sign combinations such that (3) and (4) hold simultaneously for all i if φ and the directions of both $\overrightarrow{O_1O_2}$ and $\overrightarrow{O_2O_3}$ are randomly chosen. The two possible sign combinations have opposite signs for φ and all θi's and correspond to reflections of each other.
Proof.
Assume that the ground truth of the polygon is (6) for all i∈{1, . . . ,K}. Note that (6) implies that (9) holds for θ′i=−θi and φ′=−φ for all i, which is the reflection of the room.
Suppose multiple sign combinations hold for a wall. Without loss of generality, let i=1. From (6) we have
φ=arccos α1−arccos β1. (10)
Assume that one of the following equations also holds,
φ=−arccos α1−arccos β1, (11)
φ=arccos α1+arccos β1, (12)
φ=−arccos α1+arccos β1. (13)
Then, the following three cases exist:
1) If (10) and (11) hold, θ1=0 which implies that O1O2 is perpendicular to the first wall, and φ=−arccos β1.
2) If (10) and (12) hold, arccos β1=0, which implies that O2O3 is perpendicular to the first wall.
3) If (10) and (13) hold, φ=0, which contradicts with the assumption that O1, O2 and O3 are not collinear.
With probability 1, the first two cases do not occur since both φ and the directions of $\overrightarrow{O_1O_2}$ and $\overrightarrow{O_2O_3}$ are randomly chosen.
If a subset of (7)-(9) holds for i and i′ simultaneously, then (θi,θi′)∈{θi=0,θi=φ,φ=0}×{θi′=0,θi′=φ,φ=0}, which again does not occur due to the randomly chosen measurement points. Similarly, it can be shown that for more than two walls, (6) would imply none of (7)-(9) holds for all walls.
Echo Labeling
Since echoes may arrive in different orders at different Oj's and $\tilde{r}_j$ contains higher order echoes if Nj>K, fj is unknown. Then θi's and φ are also unknown. Therefore we need to find the mapping fj first. We can then estimate θi's, the room shape and the location of the device. We say the received echoes are ungrouped if echoes are chosen according to f′j≠fj for some j.
Lemma III.2. Given ungrouped echoes, with probability 1, there are only two possible cases:
1) there exists no solution to (3) and (4) given no parallel edges;
2) the reconstructed room shape has larger dimension with respect to parallel edges.
Proof. The proof is illustrated by considering only the case of K=4. The result can be easily extended to K=3 and K>4.
The ground truth is (6) for all i. Consider first parallelograms, excluding odd higher order echoes resulting from a pair of parallel walls. The distances between Oj (j=1,2,3) and the four walls satisfy
$r_{1,1}+r_{1,2}=r_{2,1}+r_{2,2}=r_{3,1}+r_{3,2}=a$ (14)
and
$r_{1,3}+r_{1,4}=r_{2,3}+r_{2,4}=r_{3,3}+r_{3,4}=b$ (15)
We can see that for some fj's, pairs of {αii′,βii′} (i,i′∈{1, 2, 3, 4}) are related to each other. Consider for example the fj's resulting in {α12,α21,α34,α43} and {β12,β21,β34,β43}. Since α12+α21=0, α34+α43=0, β12+β21=0 and β34+β43=0, we have
arccos α21=π±arccos α12
arccos α43=π±arccos α34
arccos β21=π±arccos β12
arccos β43=π±arccos β34
Thus (5) reduces to two equations.
φ=±arccos α12±arccos β12
φ=±arccos α34±arccos β34
With probability 1, these two equations do not hold simultaneously, as α12, β12 are independent of α34, β34 due to the randomly chosen measurement points. Other f′j(≠fj)'s always have at least two equations with independent choices of α and β. Hence no solution can be found for those instances.
Suppose f′j's are chosen such that we have αii′ and βii′ (i≠i′, i′≠i″). For rooms with no more than one pair of parallel walls, almost surely only echoes chosen according to f′j's can make (6) hold for all i. This is because for those rooms, at least one of (14) and (15) does not hold. Thus some αii′'s and βii′'s are not related since r1i′, r2i, and r3i″ are randomly chosen from $\tilde{r}_1$, $\tilde{r}_2$, $\tilde{r}_3$, respectively.
Given parallel edges, however, higher order echoes may also satisfy (3) and (4). For instance, as shown in
$r_{j,131}-r_{j',131}=r_{j,1}-r_{j',1}$
and
$r_{j,313}-r_{j',313}=r_{j,3}-r_{j',3},$
where j≠j′. Hence, (3) and (4) provide the same cos θ1, cos θ3, cos(θ1−φ) and cos(θ3−φ) if rj,1 and rj,3 are replaced by rj,131 and rj,313, respectively. By Lemma III.1, the third-order echoes resulting from a pair of parallel edges may lead to a larger room with the same normal vectors. Similarly, one can prove that any odd higher order echoes resulting from a pair of parallel edges lead to a larger room with the same normal vectors. Therefore, Lemma III.2 is proved.
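The invariance of the echo differences between two parallel walls can be checked with a one-dimensional sketch, since only the coordinate perpendicular to the two walls matters. The wall positions, device positions and function names below are hypothetical.

```python
def mirror(x, wall):
    """Mirror coordinate x across a wall located at coordinate `wall`."""
    return 2.0 * wall - x

def echo_distance(x, wall_sequence):
    """Half the distance from x to its image after reflecting off each wall in turn."""
    image = x
    for wall in wall_sequence:
        image = mirror(image, wall)
    return abs(image - x) / 2.0

WALL_1, WALL_3 = 5.0, 0.0          # two parallel walls, 5 m apart (assumed)
x_a, x_b = 1.2, 3.7                # perpendicular coordinates of two measurement points

first_a = echo_distance(x_a, [WALL_1])
first_b = echo_distance(x_b, [WALL_1])
third_a = echo_distance(x_a, [WALL_1, WALL_3, WALL_1])   # path 1-3-1
third_b = echo_distance(x_b, [WALL_1, WALL_3, WALL_1])
```

Each third-order distance exceeds the first-order one by the wall separation, so the differences between measurement points are unchanged and the same normal directions are recovered with a larger room.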
Given Lemma III.1 and Lemma III.2, it is possible to conclude that the grouped first-order echoes provide either a unique room or a room with the smallest dimension. Then we have the following result on the identifiability of any convex polygonal room by using only first-order echoes.
Theorem III.3. One can recover, with probability 1, any convex planar K-polygon subject to reflection ambiguity, by using the first order echoes received at three random points in the feasible region, with known d12 and d23 and unknown φ∈(0, 2π).
Remark 1: Both the room shape and the coordinate of O3 are subject to reflection ambiguity for φ∈(0, 2π). If, however, it is possible to limit φ to (0,π), the SLAM will be free of such ambiguity.
Remark 2: In reality, it is inevitable to collect reflections from the ceiling and the floor. However, by Theorem III.3, if distances corresponding to the echoes from the ceiling and the floor are chosen, no polygon can be recovered as long as the trajectory is perpendicular to the walls.
Recovery with Known Length of O1O2
Geometry
The path length obtained by motion sensors may have some errors. Additionally, some of the path lengths may not be accurate enough. In the case where either d12 or d23 is not accurate enough, the inaccurate path length is removed. Without loss of generality, assume only d12 is known. As shown in
(r3,i−r1,i)+x3 cos θi+y3 sin θi=0. (16)
(16) can also be rewritten in a matrix form
$A[x_3, y_3]^T = b,$ (17)
where A=[cos θi, sin θi]K×2 and b=[−(r3,i−r1,i)]K×1. Let A(:,i) and A(j,:) be the ith column and jth row of A, respectively.
Ideal Case
Similar to the previous section, it may be assumed that rj,i's have been correctly chosen from $\tilde{r}_j$ for j=1, 2, 3 and i=1, . . . ,K. Then, since cos θi is uniquely determined by (3), the remaining question is whether (17) provides a unique solution to (x3,y3) and θi's given cos θi's and b.
Lemma IV.1. Suppose acoustic signals are emitted and received at three non-collinear feasible points Oi (i=1, 2, 3), where the coordinates of Oi are randomly chosen. If either d12 or d23 is missing, then SLAM can be done for non-parallelogram rooms subject to reflection ambiguity given grouped first-order echoes.
Proof. Given grouped echoes, we can compute cos θi by (3) for i∈{1, . . . , K}. Then $\sin\theta_i=\pm\sqrt{1-\cos^2\theta_i}$. For simplicity, it is possible to assume that the ground truth of sin θi is $\sqrt{1-\cos^2\theta_i}$ for all i. Note that if $A[x,y]^T=b$ has a solution (x3,y3) (y3>0), then $A^{-}[x,y]^T=b$ also has a solution (x3,−y3), where
$A^{-}=[\cos\theta_i,\ -\sin\theta_i]_{K\times 2},$
which is the reflection of the ground truth.
Assume there exist α and β (α,β≠0) such that α cos θi+β sin θi=0 for all i∈{1, . . . ,K}. Then
$\sqrt{\alpha^2+\beta^2}\,\sin(\theta_i+\arctan(\alpha/\beta))=0.$ (18)
Only θi's that differ from one another by integer multiples of π, i.e., parallel walls, make (18) hold. Since there are at least three walls with different θi, rank(A)=2. Recall that as (x3,y3) is a solution to (17),
rank(A)=rank(Ã)=2,
where Ã=[A,b]. In other words, given grouped first-order echoes and the correct sign combination of $\{\sin\theta_i\}_{i=1}^{K}$, the room shape can be recovered without ambiguity if y3>0. If the sign of y3 is unknown, the reconstruction result is subject to reflection ambiguity.
Let Aπ be a matrix with a sign combination of {sin θi} different from the ground truth and its reflection, and let Ãπ=[Aπ,b]. Without loss of generality, it is possible to assume that the first two rows of Ã are linearly independent. As a result, there is a linear row transform F(·) such that
where Ã*2×3=Ã(1:2,:) is a full row rank matrix. Applying the linear row transformation F(·) to Ãπ, we have
where A*′(:,2) has at least 1 non-zero entry. Hence, rank(Ãπ)=3 and no solution can be found.
Therefore only A and A− provide unique solution of (x,y) and (x,−y) respectively. In other words, SLAM is accomplished.
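The sign search behind Lemma IV.1 can be sketched by enumerating the sign patterns of sin θi and keeping those for which the overdetermined system is consistent. The sketch takes b with entries −(r3,i−r1,i), as (16) implies, which equals x3 cos θi + y3 sin θi. The function names, tolerance and the triangular room are hypothetical.

```python
import itertools
import math

def least_squares_2d(rows, b):
    """Solve the K x 2 system rows @ [x, y] = b in the least-squares sense."""
    sxx = sum(c * c for c, _ in rows)
    sxy = sum(c * s for c, s in rows)
    syy = sum(s * s for _, s in rows)
    bx = sum(c * v for (c, _), v in zip(rows, b))
    by = sum(s * v for (_, s), v in zip(rows, b))
    det = sxx * syy - sxy * sxy
    return (syy * bx - sxy * by) / det, (sxx * by - sxy * bx) / det

def solve_o3(cos_thetas, b, tol=1e-9):
    """Return (x3, y3, signs) for every sign pattern with zero residual."""
    solutions = []
    for signs in itertools.product((1, -1), repeat=len(cos_thetas)):
        rows = [(c, sg * math.sqrt(max(0.0, 1.0 - c * c)))
                for c, sg in zip(cos_thetas, signs)]
        x, y = least_squares_2d(rows, b)
        residual = max(abs(c * x + s * y - v) for (c, s), v in zip(rows, b))
        if residual < tol:                 # consistent: rank([A, b]) = 2
            solutions.append((x, y, signs))
    return solutions

# Hypothetical triangular (non-parallelogram) room and third measurement point.
thetas = [0.3, 2.5, -1.9]
x3, y3 = 1.0 + 0.8 * math.cos(0.7), 0.8 * math.sin(0.7)
cos_thetas = [math.cos(t) for t in thetas]
b = [x3 * math.cos(t) + y3 * math.sin(t) for t in thetas]   # equals r1_i - r3_i
solutions = solve_o3(cos_thetas, b)
```

For this room only the true sign pattern and its mirror image survive, giving (x3, y3) and (x3, −y3).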
Echo Labeling
The following lemma guarantees that given ungrouped echoes, SLAM can be achieved in any convex polygon except parallelogram.
Lemma IV.2. Suppose acoustic signals are emitted and received at three non-collinear feasible points Oj (j=1, 2, 3), where the coordinates of Oj are randomly chosen. If either d12 or d23 is missing: (i) no solution to (3) and (16) can be found given ungrouped echoes collected in any convex polygon free of parallel edges; and (ii) multiple solutions to (3) and (16) can be found given ungrouped echoes collected in any non-parallelogram convex polygon with parallel edges, but the dimension of the reconstructed room is greater than the ground truth.
Proof. All odd higher order echoes resulting from parallel edges are excluded first. Given ungrouped echoes resulting from at least three non-parallel walls:
A′=[cos θii′, sin θii′]K′×2
and
Ã′=[A′,b′]
where i∈{1, . . . , N2}, i′∈{1, . . . , N1}, i≠i′ for at least one entry, K′ is not necessarily equal to K and the jth entry of b′ is −(r3,j′−r1,j). For simplicity, consider the case where $\sin\theta_{ii'}=\sqrt{1-\cos^2\theta_{ii'}}$ for all i. Similar to the proof of Lemma IV.1:
rank(A′)=rank(A′π)=2,
where A′π is a matrix with signs of {sin θii′} different from A′. Let Ã′π=[A′π,b′]. Since b′ is independent of A′π,
rank(Ã′)=rank(Ã′π)=3.
Therefore, with probability 1, no solution can be found if the echoes chosen according to some f′j contain echoes resulting from at least three non-parallel walls.
If the echoes chosen contain odd higher order echoes resulting from a pair of parallel walls, then the outward normal vectors remain invariant but the dimension becomes larger, which is similar to Lemma III.2.
Lemma IV.1 and IV.2 imply that for a non-parallelogram convex polygon the grouped first-order echoes provide a unique solution (subject to reflection ambiguity) to (3) and (16) such that the reconstructed room shape is either the smallest one or the unique one. In other words, SLAM is accomplished by choosing the smallest room shape and the corresponding coordinate of O3. The following lemma establishes that if either d12 or d23 is missing, a parallelogram cannot be recovered uniquely.
Lemma IV.3. Suppose acoustic signals are emitted and received at three non-collinear feasible points Oj (j=1, 2, 3) where the coordinates of Oj are randomly chosen. If either d12 or d23 is missing, then a parallelogram cannot be reconstructed given ungrouped first-order echoes.
Proof.
An example may be given to show that if the shape of the room is a parallelogram, there exist multiple rooms satisfying (3) and (16). The ground truth is assumed to be
where
$r_{1,i}+r_{1,i'}=r_{2,i}+r_{2,i'}=r_{3,i}+r_{3,i'}$
and
$r_{1,j}+r_{1,j'}=r_{2,j}+r_{2,j'}=r_{3,j}+r_{3,j'}.$
Let
cos θii′+cos θi′i=0
and
cos θjj′+cos θj′j=0.
Moreover, since $\sin\theta=\pm\sqrt{1-\cos^2\theta}$,
sin θii′+sin θi′i=0
and
sin θjj′+sin θj′j=0
can hold if we manipulate the sign of the square root properly. Then, rank(A′)=rank([A′,b′])=2. Thus, a room shape and a coordinate of O3 other than those of the ground truth and its reflection satisfy both (3) and (16).
Given Lemmas IV.1-IV.3, the following result on the identifiability of any convex polygon except a parallelogram is obtained by using only first-order echoes.
Theorem IV.4. Suppose acoustic signals are emitted and received at three non-collinear feasible points Oj (j=1, 2, 3) where the coordinates of Oj are randomly chosen. If only the distance between two of the three measurement points is known, then SLAM can be accomplished given ungrouped echoes in any convex polygon except a parallelogram.
Practical Algorithm
Two distances between three consecutive measurement points are sufficient and necessary for SLAM given any convex polygon in 2-D. The remaining question is how to make the algorithm robust in the noisy case.
Peak-Detection Algorithm
A simple peak-detection algorithm may be used based on the idea that peaks have steep slopes. At the receiver, |m(j)(t)| is used instead of the original one. Since the LOS component is much stronger than reflective component, the LOS peak can be easily detected. Let t0(j) be the time that at which the LOS peak in the correlator output. Suppose the nth local maxima after the LOS peak appear at tn(j) with magnitude mn(j) (n=1, 2, 3, . . . ) Then (tn(j),mn(j)) are some points in the 2-D plane. Define the slopes of the peak centered at (tn(j),mn(j)) to be
A peak centered at (tn(j),mn(j)) is said to be “steep” if gl,n(j) and −g)r,n(j) are greater than the given positive threshold gth. The experiment result suggest that gl,n±1(j) and gr,n±1(j) should also be considered. As a result, a peak centered at (tn(j),mn(j)) is “steep” if one of the following conditions is satisfied:
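1) $g_{l,n}^{(j)} > g_{th}$ and $-g_{r,n}^{(j)} > g_{th}$,
2) $\alpha_l g_{l,n}^{(j)} + (1-\alpha_l) g_{l,n-1}^{(j)} > g_{th}$ and $-g_{r,n}^{(j)} > g_{th}$,
3) $g_{l,n}^{(j)} > g_{th}$ and $-\alpha_r g_{r,n}^{(j)} - (1-\alpha_r) g_{r,n+1}^{(j)} > g_{th}$,
4) $\alpha_l g_{l,n}^{(j)} + (1-\alpha_l) g_{l,n-1}^{(j)} > g_{th}$ and $-\alpha_r g_{r,n}^{(j)} - (1-\alpha_r) g_{r,n+1}^{(j)} > g_{th}$,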
where αl, αr ∈ (0,1) depend on {tn−2(j),tn−1(j),tn(j)} and {tn(j),tn+1(j),tn+2(j)}, respectively. Hence, the τi(j)'s can be obtained from the detected peaks.
In practice, due to the non-ideal auto-correlation property, it is necessary to assume that no TDOA exists in [0,tmin] and that the time difference between contiguous peaks is greater than Δt. Two peaks are “close” to each other if the difference between their times of appearance is less than Δt. Let M be the set of sufficiently steep peaks and P be the set of detected peaks. Suppose the maximum distance between the measurement points and the walls is less than tmax·c/2. The peak detection algorithm can be summarized as Algorithm 1.
Then the candidate distances are obtained by (2).
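The steepness test and the handling of “close” peaks described above can be sketched as follows. This is only an illustrative sketch, not a reproduction of Algorithm 1: the function name `detect_steep_peaks` is hypothetical, the left and right slopes are assumed to be finite differences between neighbouring local maxima, only the first (unweighted) steepness condition is applied, and keeping the larger of two close peaks is an assumed tie-breaking rule.

```python
def detect_steep_peaks(times, mags, g_th, t_min, t_max, dt):
    """Return (time, magnitude) pairs of steep, well-separated local maxima.

    times, mags: local maxima found after the LOS peak, in increasing time,
                 with times measured relative to the LOS peak (assumption).
    g_th:        positive slope threshold.
    t_min/t_max: window outside which no echo is expected.
    dt:          two peaks closer than dt are treated as one.
    """
    steep = []  # plays the role of the set M of steep peaks
    for n in range(1, len(times) - 1):
        if not (t_min < times[n] <= t_max):
            continue
        # Assumed slope definition: finite differences to the neighbours.
        g_left = (mags[n] - mags[n - 1]) / (times[n] - times[n - 1])
        g_right = (mags[n + 1] - mags[n]) / (times[n + 1] - times[n])
        if g_left > g_th and -g_right > g_th:
            steep.append((times[n], mags[n]))

    detected = []  # plays the role of the set P of detected peaks
    for t, m in steep:
        if detected and t - detected[-1][0] < dt:
            # Close peaks: keep only the stronger one (assumed rule).
            if m > detected[-1][1]:
                detected[-1] = (t, m)
        else:
            detected.append((t, m))
    return detected


# Hypothetical local maxima, for illustration only.
example_times = [0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007]
example_mags = [0.1, 0.9, 0.1, 0.15, 0.12, 0.8, 0.1]
example_peaks = detect_steep_peaks(
    example_times, example_mags, g_th=100.0, t_min=0.0005, t_max=0.01, dt=0.0005
)
print(example_peaks)
```

The weighted conditions that involve αl and αr could be added as further alternatives in the steepness test once the weights are specified.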
SLAM Given Distances Between Consecutive Measurement Points
In the noisy case, the distances extracted from m(j)(t) are corrupted by noise. Define $\vec{r}_j=\tilde{r}_j+n_j$ as the corrupted distances, where nj is the error. In the presence of noise, $\tilde{r}_j$ is subject to measurement errors; hence the values of φ solved from (5) for different i's are not identical. The essential idea of a straightforward practical algorithm that handles the measurement errors is given below:
The corresponding algorithm is summarized as Algorithm 2.
SLAM Given The Distance Between Two Measurement Points
In the noisy case, the echo and sign combination is chosen such that the matrix is as close as possible to a rank-2 matrix. The idea of a practical algorithm that handles the measurement errors is given below, and the algorithm is summarized as Algorithm 3.
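One possible way to quantify how close a candidate matrix is to rank 2 is sketched below. The text does not specify the closeness measure, so everything here is an assumption: the matrix is taken to have three columns, the score is the determinant of the Gram matrix normalized by the cube of its mean eigenvalue (zero exactly when the rank is at most 2, and at most 1), and the names `rank2_score` and `pick_combination` are hypothetical.

```python
def rank2_score(rows):
    """Score in [0, 1] for a matrix given as a list of 3-element rows.

    The score is det(M^T M) divided by (trace(M^T M) / 3) ** 3. It is zero
    exactly when the three columns are linearly dependent (rank <= 2) and
    does not change when the whole matrix is rescaled.
    """
    # Gram matrix G = M^T M (3 x 3, symmetric, positive semidefinite).
    g = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    det = (
        g[0][0] * (g[1][1] * g[2][2] - g[1][2] * g[2][1])
        - g[0][1] * (g[1][0] * g[2][2] - g[1][2] * g[2][0])
        + g[0][2] * (g[1][0] * g[2][1] - g[1][1] * g[2][0])
    )
    mean_eig = (g[0][0] + g[1][1] + g[2][2]) / 3.0
    if mean_eig == 0:
        return 0.0
    return max(det, 0.0) / mean_eig ** 3


def pick_combination(candidates):
    """Return the key of the candidate matrix closest to rank 2."""
    return min(candidates, key=lambda name: rank2_score(candidates[name]))


# Hypothetical candidate matrices, for illustration only.
candidates = {
    # Third column equals column 1 plus twice column 2, so the rank is 2.
    "combination_a": [[1, 0, 1], [0, 1, 2], [1, 1, 3], [2, 1, 4]],
    # Third column is not a combination of the first two, so the rank is 3.
    "combination_b": [[1, 0, 1], [0, 1, 2], [1, 1, 0], [2, 1, 5]],
}
best = pick_combination(candidates)
print(best)
```

With noisy measurements no candidate scores exactly zero, so the candidate with the smallest score would be selected; a singular-value-based measure would be an equally reasonable choice.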
Since a rectangle is the most common room shape, the method proposed in Section III for the present invention was tested in a real room. Since the third-order echoes resulting from parallel walls only change the dimension of the room, only the first- and second-order echoes were considered.
Given Two Distances of Three Consecutive Measurement Points
Using a laptop as the microphone 18 and an HTC M8 phone as the loudspeaker 16, the speaker of the cell phone was pointed towards each wall to ensure that the corresponding first-order echo is strong enough, since the loudspeaker of the cell phone is not omnidirectional and is power limited. Note that the microphone will record both first-order echoes and some higher-order ones.
A chirp signal sweeping linearly from 30 Hz to 8 kHz was emitted by the cell phone. The sample rate at the receiver is fs=96 kHz. It has been shown in the art that if the input chirp signal is correlated with its windowed version, the output may resemble a delta function. The simulation shows that the candidate distances obtained by correlating the received signal with the triangularly windowed version of the chirp outperform those obtained from the correlator output using the original chirp. The comparison is shown in
where c=346 m/s. gth is set to be 5fs. Under these assumptions, local maxima of
The proposed algorithm for SLAM is verified by an experiment in which d12 and d23 are measured with a tape measure. Even if some elements of rj have measurement errors of up to 10 cm, the proposed algorithm accomplishes SLAM with small errors in both the room shape and the coordinate of O3 given only first-order echoes. In the presence of higher-order echoes, the proposed algorithm performs poorly when the variance criterion is the only criterion used to determine the correct combination of echoes. Since most rooms are regular, a heuristic constraint is added: the angle between any two adjacent walls is between 50° and 130°. An interesting phenomenon is that the proposed algorithm is sometimes unable to provide the correct room shape, but the estimate of c is always close to the true value. Therefore, one can use the algorithm in Section III to obtain c and then reconstruct the room shape independently with full knowledge of the geometry of the measurement points. The comparison between the SLAM result and the ground truth is illustrated in
Given the Distance Between O1 and O2
Here it is assumed that O3 always lies above the x-axis, i.e., y3>0. Thus the SLAM result is free of ambiguity. In the noiseless case, simulations show that the algorithm of the present invention achieves successful SLAM given all the first-order echoes and some second-order echoes. In the noisy case, the candidate distances, including all those that correspond to first-order echoes and some that correspond to second-order echoes, are corrupted by Gaussian noise distributed as N(0, 0.005²). The heuristic constraint of the last section is also applied. Two rooms are used to test the proposed algorithm. For room 1, assume that O2(1,0) and O3(1,1). Then d12=1 and d23=1.1180. The distances between the walls and the measurement points and the real angles of the walls are given in Tables I and II.
A sample of the corrupted distances and the recovered angles are given in Tables III and IV, respectively.
The parameters of the second room are given in Tables V and VI. Assume that O2(0.5,0) and O3(⅓,0.5). Then d12=0.5 and d23=0.5270.
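The inter-point distances quoted for the second room, and the way noise may be added to candidate distances in such a simulation, can be checked with a short sketch. Placing O1 at the origin is an assumption consistent with d12=0.5 and O2 on the x-axis; the helper `corrupt`, its seed, and the reading of the noise level as a standard deviation of 0.005 are likewise assumptions made for illustration.

```python
import math
import random

# Measurement points of the second room; O1 at the origin is an assumption.
O1 = (0.0, 0.0)
O2 = (0.5, 0.0)
O3 = (1.0 / 3.0, 0.5)

# Distances between consecutive measurement points.
d12 = math.dist(O1, O2)
d23 = math.dist(O2, O3)


def corrupt(distances, sigma=0.005, seed=0):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    return [d + rng.gauss(0.0, sigma) for d in distances]


print(f"d12 = {d12:.4f}, d23 = {d23:.4f}")
```

Rounded to four decimal places, the computed values agree with d12=0.5 and d23=0.5270 as stated.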
The simulation results are shown in Tables VII and VIII.
From the simulation results, it may be seen that in the noisy case the present invention can reconstruct the room shape given d12. In both cases, however, the present invention was unable to recover the coordinate of O3 from the corrupted distances. A possible reason is that the angles of the walls are estimated directly from the elements of A, while the coordinate of O3 is obtained by
A−1 is more vulnerable to noise than A. Thus, the coordinate of O3 may not be obtainable in the noisy case even though the normal vectors of the walls can be estimated.
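The sensitivity of a solution that depends on A−1 can be illustrated with a small numerical sketch. The matrix and right-hand side below are hypothetical and are not the A of the simulations; they only show that when A is nearly singular, a perturbation of 0.005 in a single entry changes the entries of A very little but changes the solution of Ax=b substantially.

```python
def solve_2x2(a, b):
    """Solve a 2 x 2 linear system a x = b with Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    x1 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    x2 = (a[0][0] * b[1] - a[1][0] * b[0]) / det
    return [x1, x2]


# Hypothetical, nearly singular matrix and right-hand side.
A_clean = [[1.0, 1.0], [1.0, 1.01]]
b = [2.0, 2.01]

# The same matrix with one entry perturbed by 0.005.
A_noisy = [[1.0, 1.0], [1.0, 1.015]]

x_clean = solve_2x2(A_clean, b)
x_noisy = solve_2x2(A_noisy, b)

# Largest change in any entry of A versus largest change in the solution.
entry_change = max(
    abs(A_noisy[i][j] - A_clean[i][j]) for i in range(2) for j in range(2)
)
solution_change = max(abs(x_noisy[k] - x_clean[k]) for k in range(2))
print(entry_change, solution_change)
```

In this example the solution moves by roughly a third of its magnitude, which is consistent with the observation that quantities read directly from A can remain usable while quantities computed through A−1 do not.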
The present invention makes progress in acoustic SLAM by integrating measurements from internal motion sensors with echo measurements for localization and mapping. A simple approach based on a gradient test is used to detect peaks of the correlator output, which are used to compute candidate distances. Experimental results show that the developed system can recover all desired first-order echoes along with some higher-order echoes as well as some spurious peaks. With the distances between consecutive measurement points obtained through internal sensors, the present invention can recover any 2-D convex polygon while self-localizing using the collected acoustic echoes. In the presence of noise, a simple algorithm is devised that is effective in recovering the room shape even in the presence of higher-order echoes.
The present invention may also be applied to 3D SLAM, which has found applications in both navigation and construction monitoring. The present invention can be extended to the 3D case: it can be shown that, in an idealized case, four measurement points that do not reside on a single plane can recover any convex 3D polyhedron when the distances between consecutive measurement points (in this case there are three of them) are known. Other interesting problems include 3D SLAM for shoebox rooms, as they are among the most commonly encountered rooms in practice. For a shoebox, the outward normal vectors are always subject to rotation and translation ambiguity due to its symmetry; therefore, only the coordinates of the measurement points and the dimensions of the shoebox are of interest. For a shoebox, fewer than four measurement points may be needed when a complete set of first-order echoes (in this case including those from the floor and ceiling) is available. Additionally, many room shapes besides the shoebox have special structural information that can be exploited. For instance, the floor is almost always perpendicular to the walls, and there often exist two adjacent walls that are perpendicular to each other. This structural information, namely that three connected planes are perpendicular to each other, can be exploited for echo labeling, which is more challenging for 3D SLAM. Even with labeled echoes, 3D SLAM for arbitrary convex polyhedra often requires solving a bilinear optimization problem whose cost function is non-convex, so that multiple local minima exist. Clearly, having more measurement points or other geometry information may impose additional constraints and can help resolve the inherent ambiguity, i.e., help identify the correct solution.
As described above, the present invention may be a system, a method, and/or a computer program associated therewith and is described herein with reference to flowcharts and block diagrams of methods and systems. The flowchart and block diagrams illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer programs of the present invention. It should be understood that each block of the flowcharts and block diagrams can be implemented by computer readable program instructions in software, firmware, or dedicated analog or digital circuits. These computer readable program instructions may be implemented on the processor of a general purpose computer, a special purpose computer, or other programmable data processing apparatus to produce a machine that implements a part or all of any of the blocks in the flowcharts and block diagrams. Each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical functions. It should also be noted that each block of the block diagrams and flowchart illustrations, or combinations of blocks in the block diagrams and flowcharts, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.
The present application claims priority to U.S. Provisional Application No. 62/354,482, filed on Jun. 24, 2016.
Number | Date | Country | |
---|---|---|---|
62354482 | Jun 2016 | US |