This invention relates generally to an object tracking and recognition system, and in particular, to a method and apparatus for object tracking and recognition including both RFID and video devices.
The idea of ubiquitous computing is to let people benefit from computers or other electronic devices without noticing the devices as they use them. People's natural activities can drive the applications, which are executed as needed without explicit commands.
The first task in building a ubiquitous computing platform is to carry out object tracking and recognition to acquire the activity status of a user. For example, in a digital home environment, the activity status can be any information on the location, motion, gestures, voice, or facial expression of the user. By detecting and analyzing the user's activity status, the system can take corresponding actions without explicit operations on home devices.
A simple application scenario can be described as follows: 1) A TV set, for example, stays off regardless of a user moving in and out of the living room where the TV set is placed. 2) Once the user shows an intention to watch the TV, for example by sitting on the couch or standing still in front of the TV, the TV flashes (like blinking its eyes). 3) When the user waves to the TV with his hand a couple of times, the TV turns on. 4) The user can continue to operate the TV with hand gestures or voice commands. 5) When the user moves out of the living room, the TV program is paused; and when the user moves back, the TV program is resumed.
Traditionally, a radio frequency identification (RFID) location determining system is used for user location detection. However, location detection remains difficult because the received power strength is not always stable in a multipath indoor environment. In addition, using video cameras to track objects is common in the art. The image information of the object, such as location including the “real world” coordinates of the object, the “physical” location of the object, the “3D” coordinates of the object, the motion, the gesture and so on, is determined from the appearance of the object in the field of view of one or more cameras.
One simple hybrid location detection and object recognition method is as follows: each home user wears an active RFID tag that transmits radio signals to receivers inside the home. A simple setup is to place one RFID receiver on top of the electronic device to be controlled, such as a TV. The tag can be attached to the user's shoes or clothing. In addition, multiple video devices, such as cameras or video tracking devices, are also arranged on top of the TV or at a specific place in the room.
The invention concerns an object tracking and recognition system comprising: a video device for getting image information of an object in the detection range of the object tracking and recognition system; and an RFID device for detecting the signal strength of an RF tag of the object; wherein the video device is turned on to get the image information of the object upon the detected signal strength reaching a predetermined signal strength threshold.
The invention also concerns a method used in an object tracking and recognition system comprising: getting, by a video device, image information of an object in the detection range of the object tracking and recognition system; and detecting, by an RFID device, the signal strength of an RF tag of the object; wherein the video device is turned on to get the image information of the object when the detected signal strength reaches a predetermined signal strength threshold.
Exemplary embodiments of the invention are described with reference to the accompanying drawings, which show in
The system 100 also includes an RFID device 130, which includes one or more RFID receivers arranged on the display device, for detecting the signal strength of RFID tags (not shown) that are attached to the objects or users and emit RF energy. The RFID device 130 is coupled to, or includes, a coordinate generator 140 to get the location of the objects based on the coordinate and image model 150, and to detect the distance from the user to the display device. The technology of using the RFID device 130 and the coordinate generator 140 to get the coordinates and distance of an object is known to one skilled in the art, and the details thereof are omitted here.
It is known that the video device can get more accurate image information of the object than the RFID device. However, considering the user's privacy and the energy consumption, it is better to turn on the video device only when necessary, especially when the camera used as the video device is a wireless one. Therefore, according to the embodiment of the invention, a central processor 160 is used to control the video device 110 and the RFID device 130. When the RF signal strength is greater than a predetermined threshold, which indicates that the object or the user is close to a display device such as a TV, the video device is powered on. When the object or user is far from the TV, the RF signal strength is low and the video device is cut off.
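The control rule above can be illustrated with a minimal sketch. The class name `VideoPowerController`, the dBm unit, and the sample readings are assumptions made for illustration only; the source specifies just a comparison of signal strength against a predetermined threshold.

```python
from dataclasses import dataclass


@dataclass
class VideoPowerController:
    """Hypothetical controller: decides camera power from RFID signal strength."""

    threshold_dbm: float  # assumed unit; the source only says "signal strength"
    video_on: bool = False

    def update(self, rss_dbm: float) -> bool:
        # Signal at or above the threshold: the tag (user) is near the display,
        # so the camera is powered on. Below it: the camera is cut off.
        self.video_on = rss_dbm >= self.threshold_dbm
        return self.video_on


# Made-up readings as a user approaches the TV and then walks away.
controller = VideoPowerController(threshold_dbm=-60.0)
states = [controller.update(rss) for rss in (-80.0, -65.0, -58.0, -72.0)]
print(states)
```

A practical controller would probably smooth the readings before comparing, since the text notes that received strength is unstable in multipath indoor environments; that step is left out here.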
The predetermined threshold is the signal strength measured when the object is located in a specific place or at a specific distance (such as 6 meters) from the display device. The threshold reflects the environment at the time it is generated. If the environment changes, the threshold may become invalid, and the specific place or distance it corresponds to will then be in error.
According to the embodiment of the invention, a relationship model is built to describe the signal strength relation between two points in the detection range of the system 100, so as to estimate and dynamically update the signal strength of the threshold using the two points.
It is known that the signal strength relationship between two grids among these grids (R1-R9 and DR1-DR3) can be modeled by the following linear model:
$$r_i = \omega_{i,j}^{0} + \omega_{i,j}^{1}\, r_j \tag{1}$$
wherein $r_i$ represents a received signal strength sample at the ith grid, $r_j$ represents a received signal strength sample at the jth grid, and $\omega_{i,j}^{0}$ and $\omega_{i,j}^{1}$ are the coefficients of the linear model. For example, these coefficients can be obtained by collecting multiple samples at the ith grid and the jth grid during offline training and applying the minimum mean-square error (MMSE) criterion to these samples.
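As a hedged illustration of how such coefficients might be obtained, the sketch below fits the linear model of equation (1) to paired samples by ordinary least squares, which is one reading of applying MMSE to training samples. The function name `fit_linear_model` and the sample values are hypothetical.

```python
from typing import Sequence, Tuple


def fit_linear_model(r_j: Sequence[float], r_i: Sequence[float]) -> Tuple[float, float]:
    """Return (w0, w1) minimising the mean squared error of r_i = w0 + w1 * r_j."""
    n = len(r_j)
    if n != len(r_i) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_j = sum(r_j) / n
    mean_i = sum(r_i) / n
    var_j = sum((x - mean_j) ** 2 for x in r_j)
    if var_j == 0:
        raise ValueError("samples at grid j must not all be equal")
    cov = sum((x - mean_j) * (y - mean_i) for x, y in zip(r_j, r_i))
    w1 = cov / var_j             # slope, i.e. the coefficient with superscript 1
    w0 = mean_i - w1 * mean_j    # intercept, i.e. the coefficient with superscript 0
    return w0, w1


# Made-up offline training samples (illustrative only).
samples_j = [-70.0, -68.0, -66.0, -64.0]
samples_i = [-61.0, -60.0, -59.0, -58.0]
w0, w1 = fit_linear_model(samples_j, samples_i)
print(w0, w1)
```

With these invented samples the relation is exactly linear, so the fit recovers an intercept of -26 and a slope of 0.5.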
In addition, it is known that the linear relationship is stable or fixed; that is, the relationship is normally not influenced by environment changes. A linear model matrix (assumed to be composed of M×M elements) is therefore used to describe the relationship between the M grids in the detection range, wherein the element in the ith row and the jth column is the pair of coefficients ($\omega_{i,j}^{0}$, $\omega_{i,j}^{1}$) that represents the RSS relationship between the ith and jth grids. This linear model matrix can be used in the online phase to help estimate the signal strength in the detection range, such as the RSS value in the threshold ranges DR1, DR2, and DR3 in
One example of the linear model matrix is shown in
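The following sketch shows one possible in-memory form of a linear model matrix and its use for a single-point estimate. The dictionary layout, the helper names, and the training values are assumptions for illustration; the diagonal elements are omitted because a grid trivially predicts itself.

```python
from typing import Dict, List, Tuple

Coeffs = Tuple[float, float]  # (w0, w1) for r_i = w0 + w1 * r_j


def fit_pair(x: List[float], y: List[float]) -> Coeffs:
    """Least-squares fit of y = w0 + w1 * x (x must not be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    w1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - w1 * mx, w1


def build_model_matrix(training: Dict[str, List[float]]) -> Dict[Tuple[str, str], Coeffs]:
    """Element (i, j) holds the coefficients that predict grid i from grid j."""
    return {
        (i, j): fit_pair(training[j], training[i])
        for i in training
        for j in training
        if i != j
    }


def estimate(matrix: Dict[Tuple[str, str], Coeffs], target: str, source: str,
             rss_source: float) -> float:
    """Apply equation (1): estimate the target grid's RSS from one source sample."""
    w0, w1 = matrix[(target, source)]
    return w0 + w1 * rss_source


# Made-up offline training samples, paired by collection time.
training = {
    "R7": [-70.0, -68.0, -66.0, -64.0],
    "R8": [-72.0, -69.0, -66.0, -63.0],
    "DR1": [-61.0, -60.0, -59.0, -58.0],
}
matrix = build_model_matrix(training)
dr1_estimate = estimate(matrix, "DR1", "R7", -67.0)
print(round(dr1_estimate, 3))
```

Here a fresh reading at R7 is mapped to an estimated threshold value for DR1 without measuring at DR1 again.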
In addition, a multiple point estimation method based on the linear model matrix can also be used. It applies when a plurality of signal strength samples have been captured through image analysis of the camera in the online phase, so that the threshold is estimated from multiple RSS samples at multiple points. For example, assume that the threshold ($r_i$) corresponding to DR1 is to be estimated and that the signal strength samples at R4, R5, R7, and R8 have been captured by image analysis. The two grids (R7, R8) nearest to DR1 are chosen for the estimation. The respective linear models are then used independently to estimate the signal strength of DR1, and the two estimation results are combined by a maximal ratio combining algorithm. The details are as follows:
Choose the two nearest grids (R7, R8), with $r_j$ and $r_k$ representing the signal strength samples of R7 and R8 respectively. In order to give more weight to the more accurate grid when forming the estimate, a confidence for each grid, $C_{1,i}$ for $r_j$ and $C_{2,i}$ for $r_k$, is introduced as follows:
Find the adjacent grid R4 ($r_l$ is its signal strength) of the grid R7, and estimate $r_j$ from $r_l$ using equation (1). Then get the confidence of R7 by equation (2):
$$C_{1,i} = \frac{1}{\left(r_j - \left(\omega_{j,l}^{0} + \omega_{j,l}^{1}\, r_l\right)\right)^{2}} \tag{2}$$
wherein $r_j$ is the detected signal strength at R7, and $\omega_{j,l}^{0} + \omega_{j,l}^{1}\, r_l$ is the estimated signal strength at R7 obtained by using $r_l$ and the linear model matrix in
Then find the adjacent grid R5 ($r_m$ is its signal strength) of the grid R8, and estimate $r_k$ from $r_m$ using equation (1). Then get the confidence of R8 by equation (3):
$$C_{2,i} = \frac{1}{\left(r_k - \left(\omega_{k,m}^{0} + \omega_{k,m}^{1}\, r_m\right)\right)^{2}} \tag{3}$$
Normalize $C_{1,i}$ and $C_{2,i}$ using the following equations:
$$C_{1,i} = \frac{C_{1,i}}{C_{1,i} + C_{2,i}} \tag{4}$$
$$C_{2,i} = \frac{C_{2,i}}{C_{1,i} + C_{2,i}} \tag{5}$$
Combine the estimates of the threshold $r_i$ with the maximal ratio combining (MRC) method using the following equations. First, estimate the threshold $r_i$ (expressed as $r_{1,i}$ in the equation) using $r_j$:
$$r_{1,i} = \omega_{i,j}^{0} + \omega_{i,j}^{1}\, r_j \tag{6}$$
Estimate $r_i$ (expressed as $r_{2,i}$ in the equation) using $r_k$:
$$r_{2,i} = \omega_{i,k}^{0} + \omega_{i,k}^{1}\, r_k \tag{7}$$
Obtain the final estimate of $r_i$ by the MRC method:
$$r_i = C_{1,i}\, r_{1,i} + C_{2,i}\, r_{2,i} \tag{8}$$
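The multiple point estimation of equations (2) through (8) can be sketched as below. All coefficient values and readings are invented for illustration, the function names are hypothetical, and the small `eps` term is an added assumption that guards against division by zero when a prediction happens to be exact; it is not part of the equations.

```python
from typing import Tuple

Coeffs = Tuple[float, float]  # (w0, w1) for r_target = w0 + w1 * r_source


def predict(coeffs: Coeffs, rss: float) -> float:
    """Equation (1): linear prediction of one grid's RSS from another's."""
    w0, w1 = coeffs
    return w0 + w1 * rss


def confidence(measured: float, coeffs_from_neighbor: Coeffs, rss_neighbor: float,
               eps: float = 1e-9) -> float:
    """Equations (2)/(3): inverse squared prediction error (eps is an added guard)."""
    err = measured - predict(coeffs_from_neighbor, rss_neighbor)
    return 1.0 / (err * err + eps)


def mrc_threshold(r_j: float, r_k: float, r_l: float, r_m: float,
                  w_jl: Coeffs, w_km: Coeffs, w_ij: Coeffs, w_ik: Coeffs) -> float:
    c1 = confidence(r_j, w_jl, r_l)        # equation (2), R7 checked against R4
    c2 = confidence(r_k, w_km, r_m)        # equation (3), R8 checked against R5
    total = c1 + c2
    c1, c2 = c1 / total, c2 / total        # equations (4) and (5)
    r1 = predict(w_ij, r_j)                # equation (6)
    r2 = predict(w_ik, r_k)                # equation (7)
    return c1 * r1 + c2 * r2               # equation (8)


# Invented readings at R7, R8, R4, R5 and invented matrix entries.
threshold = mrc_threshold(
    r_j=-66.0, r_k=-64.0, r_l=-70.0, r_m=-68.0,
    w_jl=(3.0, 1.0), w_km=(2.0, 1.0),
    w_ij=(-26.0, 0.5), w_ik=(-28.0, 0.5),
)
print(round(threshold, 3))
```

In this example the R7 reading deviates from its prediction by 1 and the R8 reading by 2, so the normalized weights are about 0.8 and 0.2, and the estimate drawn from R7 dominates the combined threshold.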
The threshold is updated using the obtained $r_i$, so that the camera can power on when an object moves into the threshold grids and capture the status information of the object.
The foregoing merely illustrates the embodiment of the invention and it will thus be appreciated that those skilled in the art will be able to devise numerous alternative arrangements which, although not explicitly described herein, embody the principles of the invention and are within its spirit and scope.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/CN2010/001000 | 7/2/2010 | WO | 00 | 1/2/2013 |