REAL TIME OBJECT TAGGING FOR INTERACTIVE IMAGE DISPLAY APPLICATIONS

Abstract
Apparatus and methods that track the location of an object within a video image at the time of capture of the video image are described. The location of the object within each frame can be recorded as meta-data for the video image so that when the video image is played back, a viewer can select the object using suitable interaction means and be linked through to a source of additional information about the object, such as a product website or the like. A device emitting radio frequency (RF) signals is attached to an object that is to be identified and tracked within a video image. Using an RF receiver with multiple antennas and applying trilateration techniques, the object's location within the video image is determined in real time and recorded as the video image is recorded. Where multiple objects are to be tracked, each object is provided with a radio device having a unique ID and the location of each device within the video image is recorded. The described solution automates an otherwise manual, error-prone and time-consuming process.
Description
FIELD OF THE INVENTION

The present invention relates to the field of interactive image display, and more specifically to apparatus and methods relating to the real-time tagging, positioning, and tracking of objects for interactive image display applications such as interactive television.


BACKGROUND INFORMATION

Object identification and hyperlink tagging in video media allows a viewer to learn more about displayed objects by selecting an object and being linked to a website with additional information about the object. This provides sponsors of a television program or a movie production with a means to effectively embed advertising in a program or to display advertisements that will allow interested viewers to learn more about products or services displayed therein.


Currently, no object tagging or tracking procedures are considered at the time of filming. The object identification and tagging in the video medium is done at the post-editing stage. This task is typically done by a human manually entering the object information in a database. A more automated approach has been to use image recognition technology to track the object of interest in the captured video stream. This, however, is more error-prone even with current state-of-the-art image processing algorithms.


SUMMARY OF THE INVENTION

The present invention is directed to apparatus and methods that track the location of an object within a video image at the time of capture of the video image. The location of the object within each frame can be recorded as meta-data for the video image so that when the video image is played back, a viewer can select the object using suitable interaction means and be linked through to a source of additional information about the object, such as a product website or the like. Preferably, the present invention allows multiple objects in an image to be individually tracked and identified.


In accordance with an exemplary embodiment of the present invention, a device emitting radio frequency (RF) signals is attached to an object that is to be identified and tracked within a video image. Using an RF receiver with multiple antennas and applying trilateration techniques, the object's location within the video image is determined in real time and recorded as the video image is recorded. Where multiple objects are to be tracked, each object is provided with a radio device having a unique ID and the location of each device within the video image is recorded.


Using a projection algorithm, positions of the objects in the 3-D field can be mapped to a set of pixels on the 2-D screen on which the image is displayed. The coordinate information, the frame number of the filmed video, the ID of the radio device, and other relevant or useful information can be stored in a database, as meta-data, or in any appropriate form, at the time of recording.


In a further exemplary embodiment, a camera capturing an image containing the tagged object is also provided with RF emitting devices which allow for the determination of the camera position and orientation using trilateration techniques. Using additional camera information such as focal length and field of view, the 2-D virtual screen representing the captured image can be derived.


The aforementioned and other features and aspects of the present invention are described in greater detail below.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 is a high-level block diagram of an exemplary embodiment of an object tagging system in accordance with the present invention.



FIG. 2 is a high-level flow chart illustrating the operation of the system of FIG. 1.



FIG. 3 is a schematic representation of a trilateration technique used in an exemplary embodiment of the present invention.



FIGS. 4A through 4D are diagrams illustrating an exemplary technique of mapping the three-dimensional location of an object onto a virtual, two-dimensional screen representative of an image captured by a camera.





DETAILED DESCRIPTION


FIG. 1 is a block diagram of an exemplary embodiment of an object tagging system 100 in accordance with the present invention. The system 100 comprises a positioning block 110, a computing block 120, and media storage 130. The positioning block 110 tracks and determines positional information relating to a camera 140 and one or more objects 150.


As contemplated in the exemplary system 100, each object 150 is provided with a radio device or tag 155 that allows the positioning block 110 to locate the object and track its position in real time using trilateration techniques, described below in greater detail. Any of a variety of suitable radio technologies, including, for example, RFID, Bluetooth, or UWB, can be exploited for this purpose. The tag 155 may be an active device which emits a signal under its own power, or it may be a passive device which emits a signal derived from a signal with which it is illuminated. Where multiple objects 150 are to be tagged, each tag 155 preferably emits a unique ID to allow individual tracking of the multiple objects.


As the camera 140 captures images of a scene including the tagged object 150, the object's location in three dimensions is determined by the positioning block 110. For determining the location of the object 150 with trilateration, the positioning block 110 uses multiple antennas for receiving signals from the tag 155. (An additional, emitting antenna may be included for implementations using passive tags.) In addition, the location, shooting angle, focal length, and/or field of view of the camera 140 are provided to the positioning block 110. The camera information can be provided to the positioning block 110 over a dedicated interface (wireless or hard-wired) or, like the object 150, the camera 140 may have one or more tags attached thereto, with the tags providing the camera information. An exemplary trilateration arrangement in which the camera is provided with multiple tags is described below. In a further exemplary embodiment, the relevant camera information can be determined by the camera itself or by data collection apparatus associated with the camera and sent therefrom to the positioning block.


The camera information and object location information are provided in real time to the computing block 120. Using a projection algorithm described in greater detail below, the computing block maps the three-dimensional object location information onto a two-dimensional field representing the viewing screen of the captured video image. The location of the tagged object 150 within a scene can be represented in terms of pixel locations in the captured image.


The 2D location information of the tagged object 150 within each frame of a captured video stream is provided and recorded in the media storage 130. For multiple tagged objects, the location information for each object is associated with the object's ID. Each tagged object is associated with a hyperlink so that when the viewer of the video stream points to and selects the object (with a suitable interaction device such as, for example, a mouse or a television remote control), the user can navigate to a website with additional information about the object.
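The selection step described above can be sketched as a simple hit test at playback time. The following Python fragment is only an illustration under stated assumptions: the record layout (`TagRecord`), the function `hit_test`, the 40-pixel selection radius, and the example URLs are hypothetical and are not taken from the described system.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class TagRecord:
    """One recorded 2D location of a tagged object (hypothetical layout)."""
    frame: int   # frame number of the captured video
    tag_id: str  # unique ID emitted by the tag
    x: int       # pixel column of the object in the frame
    y: int       # pixel row of the object in the frame
    url: str     # hyperlink associated with the object


def hit_test(records: Iterable[TagRecord], frame: int, px: int, py: int,
             radius: int = 40) -> Optional[TagRecord]:
    """Return the tagged object nearest to (px, py) in `frame`,
    or None if no object lies within `radius` pixels."""
    best = None
    best_d2 = radius * radius
    for rec in records:
        if rec.frame != frame:
            continue
        d2 = (rec.x - px) ** 2 + (rec.y - py) ** 2
        if d2 <= best_d2:
            best, best_d2 = rec, d2
    return best


# Illustrative data only.
records = [
    TagRecord(frame=120, tag_id="tag-01", x=640, y=360, url="https://example.com/handbag"),
    TagRecord(frame=120, tag_id="tag-02", x=100, y=80, url="https://example.com/watch"),
]
selected = hit_test(records, frame=120, px=650, py=350)
print(selected.url if selected else "no object selected")
```

A nearest-within-radius rule is one possible design choice; a real player might instead store a bounding region per object.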



FIG. 2 is a high-level flow chart illustrating an exemplary method in accordance with the present invention. As mentioned above, the location of the tagged object in three-dimensional space is first determined, at step 201. At step 202, the 3D location of the object is mapped onto a two-dimensional virtual screen representative of the image captured by a camera viewing a scene containing the object. The processing of the object location takes place while the image is captured, as represented by step 203. The location information and the image are recorded at step 204. Additional information may also be recorded, including, for example, object ID, time, and frame number, among others. The data and image recording are preferably done simultaneously.
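The recording step (step 204) could, for example, write one row per tag per frame. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table name `tag_location`, its columns, and the helper functions are assumptions made for illustration, since the text leaves the storage format open ("a database, as meta-data, or in any appropriate form").

```python
import sqlite3

# In-memory database for illustration; a real system would use persistent storage.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE tag_location (
        frame  INTEGER NOT NULL,  -- frame number of the captured video
        tag_id TEXT    NOT NULL,  -- unique ID of the RF tag
        x_px   INTEGER NOT NULL,  -- mapped horizontal pixel position
        y_px   INTEGER NOT NULL,  -- mapped vertical pixel position
        t_sec  REAL    NOT NULL,  -- capture time in seconds
        PRIMARY KEY (frame, tag_id)
    )
    """
)


def record_location(conn, frame, tag_id, x_px, y_px, t_sec):
    """Store (or overwrite) the mapped location of one tag in one frame."""
    conn.execute(
        "INSERT OR REPLACE INTO tag_location VALUES (?, ?, ?, ?, ?)",
        (frame, tag_id, x_px, y_px, t_sec),
    )


def locations_in_frame(conn, frame):
    """Return (tag_id, x_px, y_px) for every tag recorded in `frame`."""
    cur = conn.execute(
        "SELECT tag_id, x_px, y_px FROM tag_location WHERE frame = ? ORDER BY tag_id",
        (frame,),
    )
    return cur.fetchall()


record_location(conn, 120, "tag-01", 640, 360, 5.0)
record_location(conn, 120, "tag-02", 100, 80, 5.0)
print(locations_in_frame(conn, 120))
```

Keying each row on the frame number and tag ID keeps one location per object per frame, which matches the per-frame lookup needed at playback.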


Exemplary techniques for carrying out the steps illustrated in FIG. 2 will now be described in greater detail.


An exemplary arrangement for determining the coordinates in three-dimensional space of an object will now be described with reference to FIG. 3. The points R0, R1, R2, and R3 are stationary, known reference points from which distances to any RF transmission point, P, can be measured. In the exemplary system described above, the points R0, R1, R2, and R3 represent the locations of antennas receiving emissions from an RF tag located at point P. The receiving antennas are used in a time difference of arrival (TDOA) scheme in which the differences in the times of arrival at the antennas of a signal emitted from the tag are used to determine the distances from each antenna to the tag.


R0 is treated as the origin of the Cartesian coordinate system and the line R0R1 is in the yz-plane. The line R0R2 is on the z-axis. R1 and R3 can be placed anywhere in the domain except on the z-axis. In an exemplary embodiment, the points R1, R2, and R3 are on the y, z, and x axes, respectively, equidistant from the origin R0 of the three-dimensional Cartesian coordinate system.


For an arbitrary transmission point P=(x,y,z), r0, r1, r2, and r3 are the distances between point P and points R0, R1, R2, and R3, respectively, and are determined using the aforementioned TDOA technique. The RF signal receiving points and the transmission points can be arranged so as to have non-negative coordinates by proper placement of R0, R1, R2, and R3.


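The coordinates of the reference points can be represented by d1, d2, d3, d4, d5 and d6, the distances between the reference points. As used in Eqs. 1 through 3, d1, d2, and d3 are the distances from R0 to R1, R2, and R3, respectively, d4 is the distance between R1 and R2, d5 is the distance between R2 and R3, and d6 is the distance between R1 and R3. These distances are fixed and known. The angles among the line segments connecting reference points can be obtained from basic trigonometric relationships, as follows: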










$$\alpha = \angle R_1 R_0 R_2 = \arccos\left(\frac{d_1^2 + d_2^2 - d_4^2}{2\,d_1 d_2}\right). \tag{1}$$







Then, the coordinates R1(0,y1,z1) and R2(0,0,z2) are given by:











$$\begin{aligned}
y_1 &= d_1 \cos\left(\frac{\pi}{2} - \alpha\right) \\
z_1 &= d_1 \sin\left(\frac{\pi}{2} - \alpha\right) \\
z_2 &= d_2.
\end{aligned} \tag{2}$$







The coordinates of R3(x3,y3,z3) can be obtained by solving the following equations:






$$\begin{aligned}
d_3^2 &= x_3^2 + y_3^2 + z_3^2 \\
d_5^2 &= x_3^2 + y_3^2 + (z_3 - z_2)^2 \\
d_6^2 &= x_3^2 + (y_3 - y_1)^2 + (z_3 - z_1)^2.
\end{aligned} \tag{3}$$


These equations yield the following solutions:











$$\begin{aligned}
x_3 &= \sqrt{d_3^2 - \frac{\left[d_5^2 - d_6^2 + y_1^2 + z_1^2 - z_2^2 + \left(z_2^2 + d_3^2 - d_5^2\right)\left(1 - z_1/z_2\right)\right]^2}{4 y_1^2} - \frac{\left[z_2^2 + d_3^2 - d_5^2\right]^2}{4 z_2^2}} \\
y_3 &= \frac{d_5^2 - d_6^2 + y_1^2 + z_1^2 - z_2^2 + \left(z_2^2 + d_3^2 - d_5^2\right)\left(1 - z_1/z_2\right)}{2 y_1} \\
z_3 &= \frac{z_2^2 + d_3^2 - d_5^2}{2 z_2}
\end{aligned} \tag{4}$$
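The computation of the reference-point coordinates in Eqs. 1, 2 and 4 can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function name `reference_coordinates` is hypothetical, the assignment of d1 through d6 to antenna pairs is the one implied by Eqs. 1 through 3, and the clamp to zero before the square root is an added guard against floating-point rounding.

```python
import math


def reference_coordinates(d1, d2, d3, d4, d5, d6):
    """Return (R1, R2, R3) with R0 at the origin, following Eqs. 1, 2 and 4.

    Assumed distance labels: d1=|R0R1|, d2=|R0R2|, d3=|R0R3|,
    d4=|R1R2|, d5=|R2R3|, d6=|R1R3|.
    """
    # Eq. 1: angle at R0 between R0R1 and R0R2 (law of cosines).
    alpha = math.acos((d1**2 + d2**2 - d4**2) / (2 * d1 * d2))
    # Eq. 2: R1 lies in the yz-plane, R2 on the z-axis.
    y1 = d1 * math.cos(math.pi / 2 - alpha)
    z1 = d1 * math.sin(math.pi / 2 - alpha)
    z2 = d2
    # Eq. 4: coordinates of R3.
    a = z2**2 + d3**2 - d5**2
    z3 = a / (2 * z2)
    y3 = (d5**2 - d6**2 + y1**2 + z1**2 - z2**2 + a * (1 - z1 / z2)) / (2 * y1)
    # x3^2 = d3^2 - y3^2 - z3^2; clamp tiny negative values caused by rounding.
    x3 = math.sqrt(max(d3**2 - y3**2 - z3**2, 0.0))
    return (0.0, y1, z1), (0.0, 0.0, z2), (x3, y3, z3)


# Example: recover a known antenna layout from its pairwise distances.
R0, R1, R2, R3 = (0, 0, 0), (0, 3, 1), (0, 0, 4), (2, 1, 1)
pairs = ((R0, R1), (R0, R2), (R0, R3), (R1, R2), (R2, R3), (R1, R3))
d = [math.dist(a, b) for a, b in pairs]
print(reference_coordinates(*d))
```

Because the antenna positions are fixed, this computation only needs to run once per installation.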







Once the coordinates of the reference points R1, R2 and R3 are determined, the coordinates of point P=(x,y,z) can be obtained by solving the following system of equations:






$$\begin{aligned}
r_0^2 &= x^2 + y^2 + z^2 \\
r_1^2 &= x^2 + (y - y_1)^2 + (z - z_1)^2 \\
r_2^2 &= x^2 + y^2 + (z - z_2)^2 \\
r_3^2 &= (x - x_3)^2 + (y - y_3)^2 + (z - z_3)^2
\end{aligned} \tag{5}$$


This system of equations yields the following solution:










$$\begin{aligned}
x &= \pm\sqrt{r_0^2 - \frac{\left[r_0^2 - r_1^2 + y_1^2 + z_1^2 - \left(r_0^2 z_1 - r_2^2 z_1\right)/z_2 - z_1 z_2\right]^2}{4 y_1^2} - \frac{\left[r_0^2 - r_2^2 + z_2^2\right]^2}{4 z_2^2}} \\
y &= \frac{r_0^2 - r_1^2 + y_1^2 + z_1^2 - \left(r_0^2 z_1 - r_2^2 z_1\right)/z_2 - z_1 z_2}{2 y_1} \\
z &= \frac{r_0^2 - r_2^2 + z_2^2}{2 z_2}
\end{aligned} \tag{6}$$
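The closed-form solution of Eq. 6 can be sketched in Python as follows. The function name `trilaterate` and the example coordinates are hypothetical. Note that Eq. 6 as written uses only r0, r1 and r2; this sketch takes the non-negative root for x, and the clamp before the square root is an added guard against measurement noise and rounding.

```python
import math


def trilaterate(r0, r1, r2, R1, R2):
    """Solve Eq. 5 for P=(x, y, z) using the closed form of Eq. 6.

    R0 is the origin, R1=(0, y1, z1) and R2=(0, 0, z2).
    The non-negative root is returned for x.
    """
    _, y1, z1 = R1
    z2 = R2[2]
    z = (r0**2 - r2**2 + z2**2) / (2 * z2)
    y = (r0**2 - r1**2 + y1**2 + z1**2
         - (r0**2 * z1 - r2**2 * z1) / z2 - z1 * z2) / (2 * y1)
    # x^2 = r0^2 - y^2 - z^2; clamp small negative values caused by noise.
    x = math.sqrt(max(r0**2 - y**2 - z**2, 0.0))
    return x, y, z


# Example: simulate ranges from a known tag position and recover it.
R0, R1, R2 = (0.0, 0.0, 0.0), (0.0, 3.0, 1.0), (0.0, 0.0, 4.0)
P = (1.0, 2.0, 3.0)
r0, r1, r2 = (math.dist(P, R) for R in (R0, R1, R2))
print(trilaterate(r0, r1, r2, R1, R2))
```

With noisy range measurements, a least-squares fit over all four antennas would be a common alternative, but that is outside what Eq. 6 describes.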







The sign of x should be positive due to the assumptions made above.
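As a worked special case, consider the exemplary arrangement mentioned above in which R1, R2, and R3 lie on the y, z, and x axes at a common distance from R0. The symbol d for that common distance is introduced here only for illustration. In that case y1 = z2 = x3 = d and z1 = y3 = z3 = 0, and Eq. 6 reduces to:

$$\begin{aligned}
y &= \frac{r_0^2 - r_1^2 + d^2}{2d} \\
z &= \frac{r_0^2 - r_2^2 + d^2}{2d} \\
x &= \pm\sqrt{r_0^2 - y^2 - z^2}.
\end{aligned}$$

Subtracting the fourth equation of Eq. 5 from the first gives, for this arrangement,

$$r_0^2 - r_3^2 = 2xd - d^2 \quad\Longrightarrow\quad x = \frac{r_0^2 - r_3^2 + d^2}{2d},$$

so in this special case the measurement r3 can also be used to fix the sign of x directly.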


As such, using the exemplary trilateration technique described, the 3D coordinates of the tagged object (at point P) can be determined from the distances between the receiving antennas (d1, d2, d3, d4, d5 and d6) and the distances between the receiving antennas and the tagged object (r0, r1, r2, and r3).


Ultimately, the object appears on a two-dimensional screen; the object coordinates in three-dimensional space should therefore be mapped onto a virtual planar surface which represents the screen to be viewed. An exemplary procedure for performing such a mapping will now be described with reference to FIGS. 4A-4D, which show a camera 310, a tagged object 320, and a two-dimensional plane or virtual screen 350 representative of the image (still or moving) captured by the camera. FIG. 4A shows a plan view, FIG. 4B an elevation view, and FIG. 4C an isometric view of the aforementioned elements. The screen 350 extends horizontally and vertically by dimensions h and v, respectively, about a center point Co.


Three points, Ca, Cb, and Cc, are shown on the camera 310, at which emitters, such as the tag used for the object 320, are located, in accordance with an exemplary embodiment of the invention. The coordinates of each of these points, Ca=(xa,ya,za), Cb=(xb,yb,zb), Cc=(xc,yc,zc), can be determined from the distances between these points and the reference points R0, R1, R2, and R3, using a similar procedure and arrangement as described above for the coordinates of the object 320, P=(xp,yp,zp). With reference to FIG. 1, the same positioning block 110 and receiving antennas used to locate the tagged object(s) 150 can be used for determining the location and orientation of the camera 140. As shown in FIG. 4A, the points Cb and Cc are arranged in a line that is substantially perpendicular to a line Lc which includes the point Ca and is substantially at the center of the field of view of the camera 310. The line Lc is also perpendicular to the two-dimensional plane 350 of the scene, which is defined, as shown in FIG. 4C, by the lines Lx and Ly.


Ideally, the point Ca is at the center of the lens of the camera, but because of the physical limitations of placing an emitting device there, it is preferably placed as close to the lens center as possible, such as centered directly above the lens. In this embodiment, the points Cb and Cc are equidistant from the center of the camera lens, in which case the line Lc includes the midpoint between the points Cb and Cc, namely Cm=(xm,ym,zm), where xm=(xb+xc)/2, ym=(yb+yc)/2, and zm=(zb+zc)/2. The line Lc, through Ca and the midpoint Cm of Cb and Cc, can be expressed as follows:











$$\frac{x - x_a}{x_m - x_a} = \frac{y - y_a}{y_m - y_a} = \frac{z - z_a}{z_m - z_a}. \tag{7}$$







Let l, m, and n be the direction cosines of the line Lc; they are given by:










$$\begin{aligned}
l &= \frac{x_m - x_a}{\sqrt{(x_m - x_a)^2 + (y_m - y_a)^2 + (z_m - z_a)^2}} \\
m &= \frac{y_m - y_a}{\sqrt{(x_m - x_a)^2 + (y_m - y_a)^2 + (z_m - z_a)^2}} \\
n &= \frac{z_m - z_a}{\sqrt{(x_m - x_a)^2 + (y_m - y_a)^2 + (z_m - z_a)^2}}
\end{aligned} \tag{8}$$
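The midpoint Cm and the direction cosines of Eq. 8 amount to normalizing the vector from Ca to Cm. The following Python sketch illustrates this; the helper names `midpoint` and `direction_cosines` and the sample camera tag positions are hypothetical.

```python
import math


def midpoint(cb, cc):
    """Midpoint Cm of the camera tag positions Cb and Cc."""
    return tuple((b + c) / 2 for b, c in zip(cb, cc))


def direction_cosines(p_from, p_to):
    """Direction cosines (l, m, n) of the line from p_from to p_to (Eq. 8)."""
    delta = [t - f for f, t in zip(p_from, p_to)]
    norm = math.sqrt(sum(c * c for c in delta))
    if norm == 0:
        raise ValueError("points coincide; direction is undefined")
    return tuple(c / norm for c in delta)


# Illustrative tag positions on the camera (not from the source).
ca = (1.0, 1.0, 1.0)
cb = (1.5, 3.0, 3.0)
cc = (0.5, 3.0, 3.0)
cm = midpoint(cb, cc)
l, m, n = direction_cosines(ca, cm)
print(cm, (l, m, n))
```

By construction the three direction cosines satisfy l² + m² + n² = 1, which is a convenient sanity check on measured tag positions.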







The image of the object point P on the screen 350 is designated as point Pi=(xi,yi,zi). A line Lp from the point Ca to the object image point Pi=(xi,yi,zi) is:











$$\frac{x - x_a}{x_i - x_a} = \frac{y - y_a}{y_i - y_a} = \frac{z - z_a}{z_i - z_a}. \tag{9}$$







Because the line Lc is perpendicular to the plane 350 and the point Co=(xo,yo,zo) is in the plane 350, the equation of the plane 350 becomes






$$l(x - x_o) + m(y - y_o) + n(z - z_o) = 0. \tag{10}$$


The center point of the screen plane 350 can be used as the origin of a two-dimensional coordinate system for the screen plane 350. Since the center point Co=(xo,yo,zo) is on the line Lc, it satisfies the following:












$$\frac{x_o - x_a}{x_m - x_a} = \frac{y_o - y_a}{y_m - y_a} = \frac{z_o - z_a}{z_m - z_a}. \tag{11}$$







Another equation is needed to close the system and to determine the coordinates of the point Co. The focal length f of the camera is the distance from the lens of the camera Ca to the focal point of the camera, which corresponds to the center point Co. As such:






$$f = \sqrt{(x_a - x_o)^2 + (y_a - y_o)^2 + (z_a - z_o)^2}. \tag{12}$$


Let ko be a constant which satisfies:













$$\frac{x_o - x_a}{x_m - x_a} = \frac{y_o - y_a}{y_m - y_a} = \frac{z_o - z_a}{z_m - z_a} = k_o, \tag{13}$$







in which case the focal length f and ko have the following relationship:










$$k_o = \frac{f}{\sqrt{(x_m - x_a)^2 + (y_m - y_a)^2 + (z_m - z_a)^2}}. \tag{14}$$







The coordinates of point Co are:






$$\begin{aligned}
x_o &= x_a + k_o (x_m - x_a) \\
y_o &= y_a + k_o (y_m - y_a) \\
z_o &= z_a + k_o (z_m - z_a)
\end{aligned} \tag{15}$$
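Eqs. 12 through 15 place the screen center Co on the line Lc at a distance f from Ca. A minimal Python sketch follows; the function name `screen_center` and the numeric values are assumptions made for illustration.

```python
import math


def screen_center(ca, cm, f):
    """Return Co, the point on the line through Ca and Cm at distance f from Ca.

    k_o follows Eq. 14 and the coordinates follow Eq. 15.
    """
    delta = [m - a for a, m in zip(ca, cm)]
    norm = math.sqrt(sum(c * c for c in delta))
    if norm == 0:
        raise ValueError("Ca and Cm coincide; the line Lc is undefined")
    k_o = f / norm                                        # Eq. 14
    return tuple(a + k_o * d for a, d in zip(ca, delta))  # Eq. 15


# Illustrative values (not from the source): camera at the origin, f = 0.05.
ca = (0.0, 0.0, 0.0)
cm = (0.0, 0.0, 2.0)
co = screen_center(ca, cm, 0.05)
print(co)
```

Because Eq. 14 gives a positive k_o, the computed Co lies on the same side of Ca as Cm; the placement of the camera tags therefore determines which way the virtual screen faces.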


The coordinates of the object image point Pi can be obtained from the following system of equations:











$$l(x_i - x_o) + m(y_i - y_o) + n(z_i - z_o) = 0 \tag{16}$$

$$\frac{x_i - x_a}{x_p - x_a} = \frac{y_i - y_a}{y_p - y_a} = \frac{z_i - z_a}{z_p - z_a} = k_p \tag{17}$$







Eq. 16 follows from the fact that the point Pi is on the screen 350. Eq. 17 is valid since the point Pi is on the line connecting the point Ca and the object point P=(xp,yp,zp); kp is a constant which satisfies the line equation. The coordinates of the point Pi become:











$$\begin{aligned}
x_i &= x_a + k_p (x_p - x_a) \\
y_i &= y_a + k_p (y_p - y_a), \\
z_i &= z_a + k_p (z_p - z_a)
\end{aligned} \tag{18}$$

where

$$k_p = \frac{l(x_o - x_a) + m(y_o - y_a) + n(z_o - z_a)}{l(x_p - x_a) + m(y_p - y_a) + n(z_p - z_a)}. \tag{19}$$
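Eqs. 16 through 19 describe the intersection of the line of sight from Ca to P with the screen plane. A short Python sketch of that computation is given below; the function name `project_to_screen` and the sample coordinates are hypothetical.

```python
def project_to_screen(ca, co, normal, p):
    """Return Pi, the intersection of the line Ca->P with the screen plane.

    `normal` holds the direction cosines (l, m, n) of Lc, which is
    perpendicular to the plane through Co. k_p follows Eq. 19 and the
    coordinates of Pi follow Eq. 18.
    """
    num = sum(c * (o - a) for c, o, a in zip(normal, co, ca))
    den = sum(c * (q - a) for c, q, a in zip(normal, p, ca))
    if den == 0:
        raise ValueError("line of sight is parallel to the screen plane")
    k_p = num / den                                       # Eq. 19
    return tuple(a + k_p * (q - a) for a, q in zip(ca, p))  # Eq. 18


# Illustrative values (not from the source): camera at the origin looking along +z.
ca = (0.0, 0.0, 0.0)
co = (0.0, 0.0, 1.0)
normal = (0.0, 0.0, 1.0)
p = (2.0, 4.0, 4.0)
p_i = project_to_screen(ca, co, normal, p)
print(p_i)
```

A negative k_p would indicate that the object is behind the camera, which a caller may want to treat as "not visible".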







Now, we have all the coordinate information for the center point Co and the object image point Pi. A line through these two points is:












$$\frac{x - x_o}{l_{io}} = \frac{y - y_o}{m_{io}} = \frac{z - z_o}{n_{io}}, \tag{20}$$

where

$$\begin{aligned}
l_{io} &= \frac{x_i - x_o}{\sqrt{(x_i - x_o)^2 + (y_i - y_o)^2 + (z_i - z_o)^2}} \\
m_{io} &= \frac{y_i - y_o}{\sqrt{(x_i - x_o)^2 + (y_i - y_o)^2 + (z_i - z_o)^2}}. \\
n_{io} &= \frac{z_i - z_o}{\sqrt{(x_i - x_o)^2 + (y_i - y_o)^2 + (z_i - z_o)^2}}
\end{aligned} \tag{21}$$







The line equations for Lx and Ly will give the values of the angles θ and φ shown in FIGS. 4A and 4B. Suppose that the equations of Lx and Ly are:












$$\frac{x - x_o}{l_1} = \frac{y - y_o}{m_1} = \frac{z - z_o}{n_1}, \text{ and} \tag{22}$$

$$\frac{x - x_o}{l_2} = \frac{y - y_o}{m_2} = \frac{z - z_o}{n_2}. \tag{23}$$







The direction cosines of line Lx are proportional to those of a line passing through the points Cb and Cc, since the two lines are parallel. More precisely, the direction cosines (lbc,mbc,nbc) of a line through the points Cb and Cc are:











$$\begin{aligned}
l_{bc} &= \frac{x_b - x_c}{\sqrt{(x_b - x_c)^2 + (y_b - y_c)^2 + (z_b - z_c)^2}} \\
m_{bc} &= \frac{y_b - y_c}{\sqrt{(x_b - x_c)^2 + (y_b - y_c)^2 + (z_b - z_c)^2}}. \\
n_{bc} &= \frac{z_b - z_c}{\sqrt{(x_b - x_c)^2 + (y_b - y_c)^2 + (z_b - z_c)^2}}
\end{aligned} \tag{24}$$







We then have l1=k·lbc, m1=k·mbc, and n1=k·nbc for a certain constant k. The equation of line Lx can be rewritten as:











$$\frac{x - x_o}{l_{bc}} = \frac{y - y_o}{m_{bc}} = \frac{z - z_o}{n_{bc}}. \tag{25}$$







To obtain the direction cosines of Ly, we have two equations:






$$l_2 l_{bc} + m_2 m_{bc} + n_2 n_{bc} = 0, \tag{26}$$


since Lx⊥Ly, and






$$l_2 l + m_2 m + n_2 n = 0, \tag{27}$$


since Ly is on the plane 350. This system of equations yields the following solution for the direction cosines of Ly:











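$$\begin{aligned}
l_2 &= h \\
m_2 &= h \cdot \frac{n \cdot l_{bc} - l \cdot n_{bc}}{m \cdot n_{bc} - m_{bc} \cdot n} \\
n_2 &= h \cdot \frac{l \cdot m_{bc} - m \cdot l_{bc}}{m \cdot n_{bc} - m_{bc} \cdot n}
\end{aligned} \tag{28}$$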







for a constant h. The equation of line Ly becomes











$$\frac{x - x_o}{m \cdot n_{bc} - n \cdot m_{bc}} = \frac{y - y_o}{n \cdot l_{bc} - l \cdot n_{bc}} = \frac{z - z_o}{l \cdot m_{bc} - m \cdot l_{bc}}. \tag{29}$$







The direction cosines of Ly can be rewritten as:











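$$\begin{aligned}
l_2 &= \frac{m \cdot n_{bc} - n \cdot m_{bc}}{\sqrt{(m \cdot n_{bc} - n \cdot m_{bc})^2 + (n \cdot l_{bc} - l \cdot n_{bc})^2 + (l \cdot m_{bc} - m \cdot l_{bc})^2}} \\
m_2 &= \frac{n \cdot l_{bc} - l \cdot n_{bc}}{\sqrt{(m \cdot n_{bc} - n \cdot m_{bc})^2 + (n \cdot l_{bc} - l \cdot n_{bc})^2 + (l \cdot m_{bc} - m \cdot l_{bc})^2}} \\
n_2 &= \frac{l \cdot m_{bc} - m \cdot l_{bc}}{\sqrt{(m \cdot n_{bc} - n \cdot m_{bc})^2 + (n \cdot l_{bc} - l \cdot n_{bc})^2 + (l \cdot m_{bc} - m \cdot l_{bc})^2}}
\end{aligned} \tag{30}$$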
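The numerators in Eqs. 29 and 30 are the components of the cross product of (l, m, n) with (lbc, mbc, nbc), so the screen axes can be computed as sketched below. The function names `unit` and `screen_axes` and the sample values are hypothetical, and taking k = 1 for Lx (that is, using the unit vector along CbCc directly) is an assumption of this sketch.

```python
import math


def unit(v):
    """Scale a 3-vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0:
        raise ValueError("zero-length vector has no direction")
    return tuple(c / norm for c in v)


def screen_axes(normal, cb, cc):
    """Return unit direction cosines of Lx and Ly.

    Lx is parallel to the line through Cb and Cc (Eq. 24, with k = 1).
    Ly is perpendicular to both Lx and the plane normal (l, m, n),
    i.e. the normalized cross product of Eq. 30.
    """
    l, m, n = normal
    lbc, mbc, nbc = unit(tuple(b - c for b, c in zip(cb, cc)))
    ly = (m * nbc - n * mbc, n * lbc - l * nbc, l * mbc - m * lbc)
    return (lbc, mbc, nbc), unit(ly)


# Illustrative values (not from the source): camera looking along +z,
# with the tags Cb and Cc placed along the x direction.
lx, ly = screen_axes((0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0))
print(lx, ly)
```

Swapping Cb and Cc flips the sign of both axes, so the tag labeling fixes the left/right and up/down sense of the virtual screen.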







Let line LIO be defined by the two points Co and Pi. Then, the angle φ between Lx and LIO becomes





$$\varphi = \arccos\left(l_1 l_{io} + m_1 m_{io} + n_1 n_{io}\right) \tag{31}$$


The angle θ, between Ly and LIO is





$$\theta = \arccos\left(l_2 l_{io} + m_2 m_{io} + n_2 n_{io}\right) \tag{32}$$


Since f, h, and v are readily available, the angles δh and δv can be derived as:











$$\delta_h = \arctan\left(\frac{h}{f}\right), \text{ and} \tag{33}$$

$$\delta_v = \arctan\left(\frac{v}{f}\right). \tag{34}$$







The ratios θ/δv and φ/δh are sufficient to determine, respectively, the relative vertical and horizontal positions of the object image point Pi on the screen 350. This is shown in FIG. 4D.
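The angle computations of Eqs. 21 and 31 through 34 can be sketched as follows. This is an illustration only: the function names and sample values are hypothetical, the direction cosines passed for Lx and Ly are assumed to be unit vectors, and the clamp before `acos` is an added guard against rounding. How the resulting ratios are converted to pixel positions is left as described with reference to FIG. 4D.

```python
import math


def angle_between(u, v):
    """Angle between two unit direction-cosine triples (Eqs. 31 and 32)."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot)))


def screen_angles(co, p_i, lx, ly, f, h, v):
    """Return (phi, theta, delta_h, delta_v) for the image point Pi."""
    delta = [p - c for c, p in zip(co, p_i)]
    norm = math.sqrt(sum(c * c for c in delta))
    if norm == 0:
        raise ValueError("Pi coincides with Co; the line through them is undefined")
    lio = tuple(c / norm for c in delta)  # Eq. 21
    phi = angle_between(lx, lio)          # Eq. 31
    theta = angle_between(ly, lio)        # Eq. 32
    delta_h = math.atan(h / f)            # Eq. 33
    delta_v = math.atan(v / f)            # Eq. 34
    return phi, theta, delta_h, delta_v


# Illustrative values (not from the source).
phi, theta, delta_h, delta_v = screen_angles(
    co=(0.0, 0.0, 1.0), p_i=(0.5, 0.5, 1.0),
    lx=(1.0, 0.0, 0.0), ly=(0.0, 1.0, 0.0),
    f=1.0, h=1.0, v=1.0,
)
print(phi / delta_h, theta / delta_v)
```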


Once the coordinates of the object within the camera image have been determined, as described above, this information along with any other relevant information that may be desired, is recorded, as discussed above with reference to FIG. 2.


The present invention can be used in a variety of applications. Consider an illustrative application of the present invention in which a movie studio is filming a scene in Central Park in which the main actor and actress are sitting on a bench. A sponsor of the movie is a well-known fashion company that wants to advertise a new handbag held by the actress on her lap. The fashion company wants to provide a direct link to its online shop if a viewer moves the pointer, available with an interactive TV set, to the proximity of the handbag. At the time of filming, a Bluetooth radio device, or the like, is placed inside the handbag. Four radio antennas placed around the bench receive the radio signals from the Bluetooth device and send them to a laptop computer. Simultaneously, the video camera sends frame numbers to the laptop computer, where the concurrently generated object position and frame numbers are associated and stored in a database. The present invention allows the producer to build a database of all the necessary information regarding the location of the object (i.e., handbag) in the video screen, its identity, and the frame number. Advantageously, this can be done without human intervention or error-prone image recognition technologies. The trilateration positioning device, video camera, and computer can communicate over wired or wireless connections.


The present invention provides accurate means of object tracking and tagging in real time for interactive TV applications, streaming video, or the like. This eliminates time consuming and/or error-prone post processing steps involved in locating objects in the video. It is a useful tool for a variety of applications such as advertising and marketing in interactive video. Additionally, the present invention can help advertisers track the amount of time that their products are seen on the screen, and provide other useful information.


Note that while the apparatus and methods of the present invention are most advantageously used in conjunction with video or moving images, the present invention can just as readily be applied to still imaging as well, where individual images are captured.


It is understood that the above-described embodiments are illustrative of only a few of the possible specific embodiments which can represent applications of the invention. Numerous and varied other arrangements can be made by those skilled in the art without departing from the spirit and scope of the invention.

Claims
  • 1. A method of tracking an object in an image comprising: determining the location of the object in three-dimensional space based on trilateration of emissions from a radio frequency (RF) tag device attached to the object; determining the location and orientation of a camera; mapping the location of the object from three-dimensional space onto a two-dimensional virtual screen defined by the location and orientation of the camera; and recording the mapped location of the object.
  • 2. The method of claim 1, comprising: recording an image containing the object, wherein the recording of the image and the recording of the mapped location of the object occur simultaneously.
  • 3. The method of claim 1, wherein the RF emissions from the tag device contain identification information associated with the object.
  • 4. The method of claim 1, wherein the object is associated with a hyperlink.
  • 5. The method of claim 1, wherein determining the location and orientation of the camera is based on trilateration of emissions from a plurality of RF tag devices attached to the camera.
  • 6. The method of claim 1, wherein the image comprises a video image.
  • 7. The method of claim 2, wherein the mapped location of the object and the image are recorded in the same medium.
  • 8. A system for tracking an object in an image comprising: a positioning apparatus, the positioning apparatus determining the location of the object in three-dimensional space based on trilateration of emissions from a radio frequency (RF) tag device attached to the object, the positioning apparatus also determining the location and orientation of a camera; a computing apparatus, the computing apparatus mapping the location of the object from three-dimensional space onto a two-dimensional virtual screen defined by the location and orientation of the camera; and a recording apparatus, the recording apparatus recording the mapped location of the object.
  • 9. The system of claim 8, wherein the recording apparatus records an image containing the object, and wherein the recording of the image and the recording of the mapped location of the object occur simultaneously.
  • 10. The system of claim 8, wherein the RF emissions from the tag device contain identification information associated with the object.
  • 11. The system of claim 8, wherein the object is associated with a hyperlink.
  • 12. The system of claim 8, wherein the positioning apparatus determines the location and orientation of the camera based on trilateration of emissions from a plurality of RF tag devices attached to the camera.
  • 13. The system of claim 8, wherein the image comprises a video image.
  • 14. The system of claim 9, wherein the mapped location of the object and the image are recorded in the same medium.